Hi Pals,
We have a remote position open for a Data Engineer.
Title: Data Engineer (AI/ML & MLOps)
Location: Remote
Duration: Long-term contract
Job Summary
We are seeking a highly skilled Data Engineer with expertise in AI/ML, MLOps, and Databricks to design, develop, and optimize scalable data
platforms that support advanced analytics, machine learning, and AI-driven solutions. The ideal candidate will have hands-on experience building
modern data pipelines, implementing MLOps practices, and leveraging cloud-native technologies to deliver high-quality data products.
Key Responsibilities
Design, develop, and maintain scalable batch and real-time data pipelines using Databricks, Apache Spark, and cloud technologies.
Build and optimize data ingestion, transformation, and storage solutions for structured and unstructured data.
Collaborate with Data Scientists, ML Engineers, and business stakeholders to support AI/ML model development and deployment.
Implement MLOps frameworks for model training, versioning, monitoring, deployment, and lifecycle management.
Develop feature engineering pipelines and data preparation workflows for machine learning applications.
Automate CI/CD processes for data and ML workloads.
Ensure data quality, governance, security, and compliance across enterprise data platforms.
Optimize performance and cost efficiency of cloud-based data solutions.
Monitor, troubleshoot, and resolve data pipeline and model deployment issues.
Create technical documentation, architecture diagrams, and operational runbooks.
Required Qualifications
Bachelor's degree in Computer Science, Information Systems, Engineering, or a related field.
5+ years of experience in Data Engineering.
2+ years of experience supporting AI/ML workloads and production ML systems.
Strong expertise with Databricks, Apache Spark, PySpark, and Delta Lake.
Experience building data pipelines using Python and SQL.
Hands-on experience with MLOps tools such as MLflow, Kubeflow, Azure ML, SageMaker, or Vertex AI.
Experience with cloud platforms such as AWS, Azure, or Google Cloud Platform.
Strong knowledge of data warehousing, ETL/ELT, data modeling, and data governance.
Experience with orchestration tools such as Apache Airflow, Azure Data Factory, or Prefect.
Familiarity with Git, CI/CD pipelines, Docker, and Kubernetes.
Strong analytical, troubleshooting, and communication skills.
Preferred Qualifications
Experience with Generative AI, Large Language Models (LLMs), and Retrieval-Augmented Generation (RAG) architectures.
Knowledge of vector databases such as Pinecone, Weaviate, Chroma, or FAISS.
Experience implementing data governance and security frameworks.
Relevant cloud certifications (AWS, Azure, Google Cloud Platform, Databricks).
Technical Skills
Data Engineering
Databricks
Apache Spark / PySpark
Delta Lake
SQL
Data Modeling
ETL/ELT Development
AI/ML & MLOps
MLflow
Model Monitoring
Feature Engineering
Model Deployment
CI/CD for ML Pipelines
LLM & GenAI Integration
Cloud Platforms
AWS (S3, Glue, EMR, Lambda, SageMaker)
Azure (ADF, Synapse, Azure Databricks, Azure ML)
Google Cloud Platform (BigQuery, Dataflow, Vertex AI)
DevOps & Automation
Docker
Kubernetes
GitHub Actions
Jenkins
Terraform
Apache Airflow
Nice to Have
Experience with streaming technologies such as Kafka, Event Hubs, or Kinesis.
Experience building enterprise AI platforms and data lakes.
Knowledge of data mesh and lakehouse architectures.
Thanks & Regards
Suman | Technical Recruiter | BrightSol