Job Details
Role: AI Engineer
Client: Sodexo
Location: Remote (USA)
In collaboration with a cross-functional feature team (AI Product Manager, Data Engineer,
DevOps, etc.), the Senior Data Engineer's objective is to design, build, and optimize
scalable and reliable AI/ML and Generative AI solutions, integrating advanced
algorithms and data pipelines across the product lifecycle (e.g., MVM, MVP,
Industrialization).

2. Key Responsibilities and Expected Deliverables
Support Scoping and Minimum Viable Model Phases (30%)
- Conduct data and tech due diligence, including State-of-the-Art (SoTA) analysis for
AI/ML and Generative AI solutions.
- Set up exploratory environments using tools like Jupyter Notebook, Databricks, or
MLFlow for rapid prototyping (see the experiment-tracking sketch after this list).
- Provide accurate workload estimations for product backlogs, collaborating closely with AI
Product Managers.
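By way of illustration, a minimal experiment-tracking sketch of the kind an exploratory MVM phase might start from, assuming MLFlow and scikit-learn are available; the experiment name, synthetic data, and parameters are hypothetical stand-ins:

```python
import mlflow
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical experiment name for a scoping-phase prototype.
mlflow.set_experiment("mvm-scoping-prototype")

# Synthetic data standing in for a real exploratory dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    params = {"n_estimators": 100, "max_depth": 5}
    model = RandomForestClassifier(**params, random_state=42).fit(X_train, y_train)
    # Log parameters and a headline metric so prototype runs stay comparable.
    mlflow.log_params(params)
    mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))
```

Tracking even throwaway prototypes this way keeps workload estimations grounded in measured baselines rather than guesses.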
Design and Implement Scalable AI/ML and Data Pipelines (50%)
- Develop and optimize end-to-end data pipelines using Apache Kafka, PySpark, Azure
Data Factory, and Databricks to integrate heterogeneous data sources.
- Build and deploy Retrieval-Augmented Generation (RAG) solutions and Large Language
Models (LLMs) such as Llama or GPT-3.5, leveraging frameworks like LangChain,
Hugging Face Transformers, and vector databases (e.g., FAISS, Pinecone); a minimal
RAG sketch follows this list.
- Implement anomaly detection frameworks combining statistical methods (e.g., Z-score,
control charts) and ML-based approaches (e.g., Isolation Forests, Autoencoders),
optimized for real-time analytics; a sketch of both approaches also follows this list.
- Containerize AI/ML applications using Docker and Kubernetes to ensure scalable,
consistent deployments across environments.
- Embed monitoring metrics and CI/CD pipelines using Azure DevOps and Git to
streamline model testing, deployment, and monitoring, and to reduce deployment time.
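To make the RAG bullet concrete, here is a minimal retrieval sketch, assuming the classic LangChain module layout (import paths have moved between LangChain releases), FAISS as the vector store, and a local sentence-transformers embedding model; the documents and question are hypothetical, and the final generation call to an LLM such as Llama or GPT-3.5 is deliberately omitted:

```python
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# Hypothetical documents standing in for heterogeneous data sources.
documents = [
    "Site A reported a 12% increase in catering volume in Q3.",
    "The facilities team at Site B escalated two HVAC incidents in June.",
]

# Chunk the text so each embedding covers a focused span.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = [c for doc in documents for c in splitter.split_text(doc)]

# Embed the chunks and index them in FAISS for similarity search.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
index = FAISS.from_texts(chunks, embeddings)

# Retrieve the chunks most relevant to the question and build the prompt
# that would be sent to the LLM (the generation step is omitted here).
question = "What happened at Site B?"
context = "\n".join(d.page_content for d in index.similarity_search(question, k=2))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```

Swapping FAISS for Pinecone changes only how the index is constructed; the retrieve-then-prompt shape stays the same.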
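Likewise, a small sketch contrasting the two anomaly detection families named above, assuming scikit-learn and a synthetic metric stream; the 3-sigma threshold and contamination rate are illustrative defaults, not production settings:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Synthetic metric stream: mostly normal readings plus a few injected spikes.
values = np.concatenate([rng.normal(100, 5, 500), [160.0, 40.0, 155.0]])

# Statistical approach: flag points more than 3 standard deviations from the mean.
z_scores = (values - values.mean()) / values.std()
z_flags = np.abs(z_scores) > 3

# ML approach: Isolation Forest isolates outliers via short random partition paths.
forest = IsolationForest(contamination=0.01, random_state=42)
ml_flags = forest.fit_predict(values.reshape(-1, 1)) == -1  # -1 marks anomalies

print(f"Z-score flagged {z_flags.sum()} points; Isolation Forest flagged {ml_flags.sum()}.")
```

For real-time use, the same scoring logic would run over windowed aggregates of the stream rather than the full batch shown here.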
Drive Innovation and Technology Adoption (20%)
- Monitor emerging AI/ML, NLP, and Generative AI technologies, evaluating tools such as
TensorFlow, PyTorch, and Snowflake against concrete use cases.
- Recommend and prototype new market technologies to enhance Sodexo's data-driven
capabilities.
- Participate in selecting service providers through RFPs, focusing on cloud and AI
solutions.
3. Qualifications and Skills

- MSc in Data Science, Computer Science, or a related field (e.g., Information Technology).
- 8-10 years of relevant experience as a Data Engineer or Data Scientist, with a focus on
AI/ML, Generative AI, and data pipeline development.
Programming Languages
- Expert knowledge of Python (mandatory), with experience in R, PL/SQL, and Spark.
- Familiarity with GraphQL and with NoSQL (e.g., MongoDB) and relational (e.g.,
PostgreSQL) databases is a plus.
Big Data & Distributed Systems
- Proficient in Apache Spark and Kafka for real-time data streaming and processing.
- Extensive experience with Azure Databricks, Azure Data Factory, and Azure Data Lake
Storage Gen2.
- Skilled in container technologies (Docker, Kubernetes) and microservices architectures.
- Expertise in Snowflake for data warehousing, with knowledge of big data file formats
(e.g., Parquet, Delta).
- Familiarity with AWS (S3, Glue) and Google Cloud Platform (BigQuery) is a plus.
Machine Learning and Analytics
- Deep expertise in ML frameworks (TensorFlow, PyTorch, Scikit-learn) and NLP tools
(NLTK, SpaCy, Hugging Face Transformers).
- Experience with Generative AI, LLMs (e.g., Llama, GPT-3.5), and RAG implementations
using LangChain and vector databases.
- Proficient in anomaly detection (e.g., Isolation Forests, Autoencoders) and predictive
modeling (e.g., XGBoost, Random Forest).
- Skilled in visualization tools like Tableau and Power BI for creating impactful dashboards.
- Experience with MLOps workflows using Azure ML Services, MLFlow, or Azure DevOps.
Cloud
- Expert in Azure services (Data Factory, Databricks, Cognitive Search) with a focus on
data and AI solutions.
- Strong understanding of cloud governance, security, and networking (e.g., resource
groups, SSO).
- Knowledge of AWS and Google Cloud Platform is a plus.
Software Engineering
- Proficient in Git for version control and Azure DevOps for CI/CD pipeline creation.
- Strong ability to write clean, modular code and document infrastructure for team
collaboration.
Hard Skills
- System design and data pipeline architecture
- Data modeling, ETL processes, and data integration
- AI/ML model development, including NLP and Generative AI
- Anomaly detection and real-time analytics
- Data visualization and dashboard creation
- DevOps and MLOps practices
- Agile methodology and user story creation
- Validation, testing, and performance optimization
Soft Skills
- Collaborates (Level 3: Leading collaboration across teams)
- Ensures accountability (Level 3: Driving team accountability)
- Drives results (Level 3: Consistently delivering high-impact outcomes)
- Nimble learning (Level 3: Proactively adopting new technologies)