Job Description
Role Summary
We are looking for a Senior AI Engineer to design, build, and operate production-grade Agentic AI
systems that power intelligent assistants and autonomous workflows across regulated enterprise
environments.
This role will focus on LLM-based agents, multi-agent orchestration, guardrails, and end-to-end
deployment pipelines, with direct exposure to Copilot Studio, LangChain, LangGraph, and
enterprise CI/CD practices.
The ideal candidate combines strong Python engineering, agent architecture expertise, and
rigorous testing/observability discipline to ensure safe, reliable, and accurate AI responses at
scale.
Key Responsibilities
Agentic AI & LLM Engineering
Design and implement Agentic AI systems using LangChain and LangGraph, including
planner-executor, router, and evaluator patterns.
Build multi-agent workflows for intent classification, probing, reasoning, and decision
orchestration.
Develop tool-using agents (retrieval, rules engines, APIs, enterprise services).
Optimize prompt strategies, state management, memory, and reasoning flows to minimize
hallucinations and maximize accuracy.
Copilot & Agent Platforms
Build and extend agents using Copilot Studio and Power Platform-based agent frameworks.
Integrate custom Python-based agents with the Copilot runtime, connectors, and enterprise data
sources.
Collaborate with product and platform teams to operationalize agents across real business
workflows.
Guardrails, Safety & Compliance
Implement AI guardrails including:
o Policy enforcement
o Output validation
o Grounding checks (RAG / knowledge-based verification)
o Human-in-the-loop and escalation patterns
Ensure agents comply with enterprise risk, regulatory, and data security standards.
Design architectures that are auditable, observable, and deterministic where required.
Testing, Evaluation & Quality Assurance
Build automated agent testing frameworks to validate:
o Correct intent classification
o Accurate probing behavior
o Expected response generation
o Regression prevention across prompt and model changes
Implement offline and online evaluation (golden datasets, synthetic tests, confidence scoring).
Partner with QA and Ops to monitor accuracy, failure modes, and drift.
CI/CD & Platform Engineering
Develop CI/CD pipelines for AI agents (prompt versioning, agent configs, model updates).
Support containerized deployments and environment promotion (dev → test → prod).
Integrate logging, observability, alerts, and performance metrics for agent behavior.
Required Qualifications
Technical Skills (Must Have)
Strong Python engineering experience (async, APIs, services).
Hands-on experience with LangChain and/or LangGraph in real-world agent implementations.
Experience building Agentic AI systems (not just prompts or chatbots).
Understanding of LLM tooling, RAG, function/tool calling, and orchestration patterns.
Experience implementing CI/CD pipelines for ML or AI-driven systems.
Proven experience in testing LLM outputs and agent behavior.
Platform & Architecture
Experience with enterprise AI platforms (Copilot Studio, Power Platform, or equivalent).
Familiarity with microservices, APIs, event-driven systems, and cloud-native design.
Experience designing governed, production-ready AI architectures.
Preferred / Nice to Have
Experience with Copilot Studio custom agents or connectors.
Knowledge of LLMOps / AI Ops practices.
Experience in regulated domains (financial services, healthcare, compliance-heavy
environments).
Familiarity with evaluation frameworks, agent observability tools, and policy engines.
Exposure to graph-based reasoning or knowledge graphs.
What Success Looks Like
Agents consistently generate accurate, policy-compliant, and explainable responses.
New agent capabilities move from prototype to production safely and quickly.
CI/CD pipelines catch regressions before agents reach users.
Guardrails prevent hallucinations and incorrect guidance at scale.
AI systems are trusted by both business users and risk/compliance teams.