What you'll do:
Build agentic AI systems: Design and implement tool-calling agents that combine retrieval, structured reasoning, and secure action execution (function calling, change orchestration, policy enforcement) following the Model Context Protocol (MCP). Engineer robust guardrails for safety, compliance, and least-privilege access.
Productionize LLMs: Build an evaluation framework for open-source and foundation LLMs; implement retrieval pipelines, prompt synthesis, response validation, and self-correction loops tailored to production operations.
Integrate with runtime ecosystems: Connect agents to observability, incident management, and deployment systems to enable automated diagnostics, runbook execution, remediation, and post-incident summarization with full traceability.
Collaborate directly with users: Partner with production engineers and application teams to translate production pain points into agentic AI roadmaps; define objective functions linked to reliability, risk reduction, and cost; and deliver auditable, business-aligned outcomes.
Safety, reliability, and governance: Build validator models, adversarial prompts, and policy checks into the stack; enforce deterministic fallbacks, circuit breakers, and rollback strategies; instrument continuous evaluations for usefulness, correctness, and risk.
Scale and performance: Optimize cost and latency via prompt engineering, context management, caching, model routing, and distillation; leverage batching, streaming, and parallel tool calls to meet stringent SLOs under real-world load.
Build a RAG pipeline: Curate domain knowledge; build a data-quality validation framework; establish feedback loops and a milestone framework to maintain knowledge freshness.
Raise the bar: Drive design reviews, experiment rigor, and high-quality engineering practices; mentor peers on agent architectures, evaluation methodologies, and safe deployment patterns.