Machine Learning Engineer (AI Quality & Governance) – Contract (CST)
This role sits within the AI Quality and Governance team, which builds guardrails for internal AI/ML and GenAI platforms. The engineer will help develop frameworks that evaluate and monitor ML and GenAI solutions, measuring relevance, cost, latency, and token usage across systems such as LLM-based applications.
The position emphasizes building scalable evaluation systems, improving observability, and ensuring responsible and reliable AI performance. This is a hands-on engineering role with strong exposure to real-world ML and GenAI evaluation challenges.
What You’ll Do:
· Design and contribute to systems that enable rapid ML and GenAI development with high availability and strong observability
· Build and enhance evaluation frameworks for ML, GenAI, and agentic AI solutions, focusing on metrics such as accuracy, relevance, cost, latency, and token usage
· Develop APIs and services to support access to AI solutions and their evaluation pipelines
· Contribute to ML infrastructure, including deployment, monitoring, and performance optimization
· Write clean, scalable, and maintainable code, primarily in Python
· Develop and maintain automated tests (unit, integration, functional) to ensure system reliability
· Troubleshoot and debug issues related to ML performance, evaluation metrics, and production systems
· Collaborate with product managers, data teams, and engineers to deliver end-to-end AI solutions
· Partner with senior engineers to apply best practices and improve development processes
· Support production systems and participate in on-call rotations as needed
· Explore and experiment with emerging tools and techniques in ML and GenAI evaluation
What You’ll Need:
· 2+ years of experience in Machine Learning Engineering, or an advanced degree in a related field
· Strong proficiency in Python (primary language); Java experience is a plus
· Hands-on experience with AWS or similar cloud platforms
· Experience with ML lifecycle including model deployment, data pipelines, and CI/CD
· Familiarity with Generative AI systems and LLM-based application development
· Exposure to ML/LLM evaluation frameworks such as RAGAS, DeepEval, or similar tools (tooling may evolve)
· Experience building or working with evaluation, monitoring, or observability frameworks for ML/GenAI systems
· Understanding of metrics-driven evaluation, including relevance, accuracy, latency, and cost optimization
· Experience with Docker and Kubernetes is preferred
· Strong foundation in data structures, algorithms, and distributed systems
· Experience debugging production systems and working in Agile environments
· Basic knowledge of APIs, web protocols (HTTP), and databases
· Awareness of Responsible AI, AI governance, or audit frameworks (e.g., ISO 42001) is a plus
· Strong communication skills and the ability to collaborate across teams (critical for success in this role)
Location: Brazil (nearshore)
Working Hours: CST time zone
Duration: 6+ months