Overview
On Site
Depends on Experience
Full Time
Skills
ML/AI
LLM
GenAI
CI/CD
AWS
Job Details
Full-time role, on-site at any of the locations below:
Denver, Colorado
Springfield Gardens, New York
San Francisco, California
Atlanta, Georgia
Dallas, Texas
Description:
We're looking for a hands-on Lead AI/ML Platform Engineer to architect and evolve the GenAI platform that drives business value across our global operations. You'll lead the execution of the strategy and technical direction of our AI platform, mentoring engineers, shaping standards, and driving adoption across decentralized teams, helping to scale reusable frameworks and support production-grade AI deployments.
Key responsibilities include:
- Design and own AI/ML infrastructure for scalable, secure, cloud-native platforms (Dataiku, AWS, OpenAI, Pinecone).
- Lead GenAI platform development including prompt workflows, agent context, memory, and retrieval architectures.
- Design and own custom Model Context Protocol (MCP) server architecture.
- Build scalable EKS-based backends to support MCP services and real-time AI API endpoints.
- Define and enforce CI/CD, MLOps, and IaC standards across all AI projects.
- Architect agent evaluation tooling, cost tracking, and feedback loops.
- Establish agent scalability and governance: develop template-driven scaling patterns and define platform standards and lifecycle governance for reusable agents.
- Design internal self-service agent toolkits with permission controls.
- Enable and mentor development teams in AI/ML techniques.
- Work effectively with offshore teams to coordinate and integrate AI/ML developments.
- Communicate effectively, translating complex technical details into understandable concepts for non-technical stakeholders.
Requirements
- Bachelor's degree in Computer Science, Engineering, or related field; Master's a plus.
- 8+ years building and operating production ML/AI platforms, including 3+ years in a technical-lead capacity.
- Deep expertise with AWS (EC2, S3, Lambda, EKS) and infrastructure-as-code with Terraform or CloudFormation.
- Hands-on Kubernetes experience and strong grasp of containerized microservice architectures.
- Advanced software engineering skills in Python with experience building high-throughput APIs and real-time serving systems.
- Strong SQL skills.
- Proven track record implementing CI/CD, MLOps frameworks, and observability tooling.
- Practical experience with LLM tooling and vector databases (OpenAI, LangChain, Pinecone) and designing agent context, memory, and retrieval.
- Demonstrated ability to design secure, compliant systems and manage cost efficiency.
- Skilled communicator and mentor, able to lead distributed teams and influence senior stakeholders.
- Ability to multitask, prioritize effectively, and thrive in a fast-paced, dynamic environment.
Preferred:
- Experience with Dataiku and Snowflake strongly preferred.
- Experience designing multi-agent systems and agent lifecycle standards.
- Exposure to agent evaluation systems and ROI tracking.