Job Title: Lead Data Engineer (AI/ML)
Location: Charlotte, NC (Hybrid/Onsite)
Contract: W2
Duration: 12+ Months
Position Summary
We are seeking a Lead Data Engineer with 12+ years of experience in enterprise data engineering and AI/ML data platforms. The ideal candidate will lead the design, development, and implementation of scalable data architectures, cloud-native data platforms, and AI/ML data pipelines supporting advanced analytics, Generative AI, and Machine Learning initiatives.
This role requires strong expertise in Python, PySpark, Spark, Snowflake, Databricks, Azure/AWS/Google Cloud Platform, AI/ML frameworks, data lakes, ETL/ELT, MLOps, and modern cloud technologies. The candidate will collaborate with Data Scientists, ML Engineers, Architects, DevOps, and business stakeholders to deliver enterprise-grade AI-enabled data solutions.
Key Responsibilities
- Lead the design and implementation of enterprise-scale data engineering and AI/ML data platforms.
- Architect scalable batch and real-time data pipelines supporting analytics and machine learning workloads.
- Build and optimize cloud-native data lakes, data warehouses, and Lakehouse architectures.
- Design and implement ETL/ELT pipelines using modern cloud technologies.
- Develop feature engineering pipelines supporting ML model training and inference.
- Build scalable data pipelines for Large Language Models (LLMs), Generative AI, and AI-powered applications.
- Design data ingestion frameworks for structured, semi-structured, and unstructured datasets.
- Implement data validation, profiling, governance, lineage, and quality monitoring solutions.
- Optimize Spark, SQL, and distributed processing workloads for performance and scalability.
- Lead cloud migration and application modernization initiatives.
- Collaborate with Data Scientists and ML Engineers to productionize AI/ML models.
- Build and maintain MLOps pipelines for automated model deployment, monitoring, and retraining.
- Integrate AI-powered solutions using OpenAI, Azure OpenAI, AWS Bedrock, or Vertex AI.
- Implement CI/CD pipelines for data engineering and machine learning workflows.
- Mentor junior engineers and establish engineering best practices.
- Participate in architecture reviews, code reviews, and technical decision-making.
Required Technical Skills
Programming Languages
- Python
- SQL
- PySpark
- Scala
- Java
Big Data Technologies
- Apache Spark
- PySpark
- Hadoop
- Hive
- Kafka
- Delta Lake
- Apache Airflow
Cloud Platforms
- Microsoft Azure
- Amazon Web Services (AWS)
- Google Cloud Platform (GCP)
Data Engineering
- Snowflake
- Databricks
- Azure Data Factory (ADF)
- Azure Synapse Analytics
- AWS Glue
- Amazon Redshift
- Google BigQuery
- dbt
- Informatica
- Matillion
AI / Machine Learning
- Machine Learning Pipelines
- Feature Engineering
- Model Training Pipelines
- Model Deployment
- Model Monitoring
- MLOps
- MLflow
- Kubeflow
- SageMaker
- Azure ML
- Vertex AI
Generative AI
- OpenAI APIs
- Azure OpenAI
- AWS Bedrock
- LangChain
- LlamaIndex
- Vector Databases (Pinecone, FAISS, ChromaDB)
- RAG (Retrieval-Augmented Generation)
- Prompt Engineering
- AI Agents
- MCP (Model Context Protocol)
Streaming Technologies
- Kafka
- Spark Streaming
- Azure Event Hub
- AWS Kinesis
Databases
- SQL Server
- PostgreSQL
- Oracle
- MongoDB
- Cassandra
- NoSQL
- Cosmos DB
DevOps / CI/CD
- Azure DevOps
- GitHub Actions
- Jenkins
- GitLab CI/CD
- Docker
- Kubernetes
- Terraform
Data Governance
- Collibra
- Alation
- Microsoft Purview
- Apache Atlas
- Data Lineage
- Metadata Management
- Data Catalog
BI & Analytics
Leadership Responsibilities
- Lead and mentor a team of Data Engineers and ML Engineers.
- Drive enterprise AI and Data Engineering strategy.
- Define data architecture standards and engineering best practices.
- Conduct architecture and code reviews.
- Collaborate with enterprise architects, business leaders, and product owners.
- Lead Agile ceremonies, sprint planning, and technical estimations.
- Drive continuous improvement initiatives across data engineering and AI platforms.
- Ensure security, scalability, reliability, and governance of enterprise data assets.
Required Qualifications
- Bachelor's or Master's degree in Computer Science, Data Science, Information Systems, Engineering, or a related field.
- 12+ years of Data Engineering experience.
- 8+ years of Python and SQL development.
- 6+ years of PySpark and Spark development.
- 5+ years of Snowflake or Databricks experience.
- 5+ years of cloud platform experience (Azure, AWS, or Google Cloud Platform).
- 4+ years of AI/ML data engineering experience.
- Strong experience developing enterprise ETL/ELT pipelines.
- Experience implementing Lakehouse architectures.
- Hands-on experience with MLOps platforms and AI model deployment.
- Strong understanding of Data Modeling (Star Schema, Snowflake Schema, Data Vault).
- Experience supporting enterprise AI initiatives.
- Strong Agile/Scrum experience.
Preferred Qualifications
- Experience with Large Language Models (LLMs).
- Experience building RAG-based applications.
- Experience with AI Agents and autonomous workflows.
- Knowledge of Agentic AI architectures.
- Experience with graph databases (Neo4j).
- Experience with vector search and semantic retrieval.
- SnowPro, Databricks, Azure, AWS, Google Cloud Platform, or AI/ML certifications.
- Financial Services, Banking, Healthcare, or Retail domain experience.