Job Title: AI/ML Engineer/Architect
Location: Frisco, TX
Duration/Term: 6+ months (Contract)
Job Description:
Experience Desired: 10+ Years.
Qualifications:
We are seeking a hands-on Architect/Principal Software Engineer with 10+ years of experience designing, developing, and deploying large-scale distributed applications, along with proven expertise in AI/ML and Agentic AI solutions. The ideal candidate has strong hands-on coding experience in Python (mandatory) and in at least one additional programming language such as Java, Go, Rust, or C++. Candidates must have experience building and deploying production-grade AI/ML applications, LLM-based systems, multi-agent architectures, and end-to-end MLOps pipelines.
Responsibilities:
- Lead the architecture, design, development, and deployment of scalable, high-performance AI/ML and Agentic AI applications with a strong hands-on coding approach.
- Design and build cloud-native, distributed microservices and full-stack applications using Python and other modern programming languages on AWS and Google Cloud Platform.
- Develop, deploy, and optimize production-ready LLM-based, multi-agent AI systems and end-to-end MLOps pipelines.
- Architect and implement Kubernetes-based infrastructure using Docker, Helm, ArgoCD, Istio/Linkerd, Cilium, and cloud-native networking best practices.
- Collaborate with product managers, data scientists, and engineering teams to translate business requirements into scalable technical solutions.
- Provide technical leadership, mentor engineering teams, conduct code reviews, and establish software engineering best practices.
- Drive AI-assisted software development by leveraging tools such as GitHub Copilot, ChatGPT, Claude, and other developer productivity solutions.
- Design and optimize REST APIs, gRPC services, databases, and distributed systems for high availability, scalability, and low latency.
- Implement CI/CD pipelines, DevOps automation, monitoring, observability, and security best practices across cloud environments.
- Troubleshoot complex production issues, optimize system performance, and continuously improve application reliability and operational excellence.
Key Skills:
AI/ML, Agentic AI, LLMs, Python, Java, AWS, Google Cloud Platform, Kubernetes, Docker, MLOps