Develops and implements an industry-leading, self-service AI platform. This platform will offer standardized blueprints for engineering teams to utilize, including "macro-agents" and "micro-tools."
Develops and articulates the long-term, multi-year technical roadmap for the AI Platform, ensuring its capabilities are strategically aligned with the overarching business objectives.
Develops and implements complex state graphs to manage edge cases and enable self-correction in autonomous planning.
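For illustration, a minimal sketch of the self-correcting planning loop this responsibility describes, written in plain Python rather than any specific framework; the node names, state fields, and validation heuristic are hypothetical:

```python
# Hand-rolled state graph: plan -> validate -> (retry plan | execute).
# Frameworks such as LangGraph or ADK formalize this same pattern with
# typed state, conditional edges, and checkpointing.
from typing import Callable

State = dict  # e.g. {"goal": ..., "plan": ..., "errors": [...], "attempts": 0}

def plan(state: State) -> State:
    # Draft a plan for the goal; a real node would call an LLM here.
    state["attempts"] += 1
    state["plan"] = f"steps for {state['goal']} (attempt {state['attempts']})"
    return state

def validate(state: State) -> State:
    # Edge-case check; recorded errors drive the self-correction loop below.
    state["errors"] = [] if state["attempts"] >= 2 else ["plan too shallow"]
    return state

def execute(state: State) -> State:
    state["result"] = f"executed: {state['plan']}"
    return state

NODES: dict[str, Callable[[State], State]] = {
    "plan": plan, "validate": validate, "execute": execute,
}

def route(state: State) -> str:
    # Conditional edge: loop back to planning until validation passes,
    # with a hard cap so the graph cannot spin forever.
    return "plan" if state["errors"] and state["attempts"] < 3 else "execute"

def run(goal: str) -> State:
    state: State = {"goal": goal, "attempts": 0, "errors": []}
    node = "plan"
    while True:
        state = NODES[node](state)
        if node == "plan":
            node = "validate"
        elif node == "validate":
            node = route(state)
        else:  # "execute" is terminal
            return state

print(run("summarize quarterly incidents")["result"])
```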
Leads the architecture and hands-on development of remote MCP servers and implements custom function calling to securely connect agents with sensitive enterprise data.
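As a hedged sketch of the custom function-calling layer: a JSON-schema tool definition plus a dispatcher that enforces permissions before any enterprise data is touched. The tool name, scopes, and data source are hypothetical, and the same tool definition could equally be exposed by an MCP server.

```python
import json

# Tool schema in the JSON-schema style used for LLM function calling.
TOOL_SCHEMA = {
    "name": "search_hr_records",
    "description": "Look up employee records by department.",
    "parameters": {
        "type": "object",
        "properties": {"department": {"type": "string"}},
        "required": ["department"],
    },
}

def search_hr_records(department: str) -> list[dict]:
    # Placeholder for a call into the real system of record.
    return [{"department": department, "headcount": 42}]

# Registry maps tool names to implementations plus the scopes they require.
REGISTRY = {"search_hr_records": (search_hr_records, {"hr.read"})}

def dispatch(tool_call: dict, caller_scopes: set[str]) -> str:
    """Execute a model-emitted tool call only if the caller holds the scopes."""
    fn, required_scopes = REGISTRY[tool_call["name"]]
    if not required_scopes <= caller_scopes:
        return json.dumps({"error": "permission denied"})
    args = json.loads(tool_call["arguments"])
    return json.dumps(fn(**args))

# Example: a tool call as it might arrive from the model.
call = {"name": "search_hr_records", "arguments": json.dumps({"department": "finance"})}
print(dispatch(call, caller_scopes={"hr.read"}))
```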
Defines and implements the communication standards for agent-to-agent interactions to facilitate autonomous discovery and task hand-offs between agents developed by various business units.
Ensures the agent identity layer is architected for granular permissioning and non-repudiation of every autonomous system action.
Develops a unified knowledge layer for the platform, leveraging a semantic retrieval engine and multimodal grounding. This layer will serve as the single source of truth for all connected agents, providing "truth-as-a-service."
Develops and implements a global memory bank architecture, leveraging a semantic retrieval engine and graph databases. This system will be essential for preserving context and capturing "institutional knowledge" from interactions over time.
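A toy sketch of the idea, with a fake embedding standing in for the semantic retrieval engine and an in-memory adjacency map standing in for the graph database; all content and helper names are illustrative:

```python
import math
from collections import defaultdict

def embed(text: str) -> list[float]:
    # Trivial character-frequency embedding, purely for demonstration.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

class MemoryBank:
    def __init__(self) -> None:
        self.items: list[tuple[str, list[float]]] = []
        self.edges: dict[str, set[str]] = defaultdict(set)  # graph relations

    def remember(self, text: str, related_to: str = "") -> None:
        self.items.append((text, embed(text)))
        if related_to:
            self.edges[related_to].add(text)

    def recall(self, query: str, k: int = 1) -> list[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[1]), reverse=True)
        hits = [text for text, _ in ranked[:k]]
        # Expand with graph neighbours so related institutional knowledge rides along.
        for hit in list(hits):
            hits.extend(m for m in self.edges.get(hit, ()) if m not in hits)
        return hits

bank = MemoryBank()
bank.remember("Customer Acme prefers weekly status reports")
bank.remember("Acme escalation resolved by the billing team",
              related_to="Customer Acme prefers weekly status reports")
print(bank.recall("what does Acme prefer?"))
```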
Develops the platform's trust layer by automating rapid evaluation pipelines that measure key success, cost, and safety metrics for agentic behavior across all tenants.
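For example, an evaluation entry point might replay recorded agent traces and report per-tenant success, cost, and safety metrics; the trace schema, pricing constant, and metric names below are assumptions, not the platform's real contract:

```python
from dataclasses import dataclass

@dataclass
class Trace:
    tenant: str
    succeeded: bool
    tokens_used: int
    safety_flags: int  # e.g. count of blocked tool calls or policy hits

def evaluate(traces: list[Trace], cost_per_1k_tokens: float = 0.002) -> dict[str, dict]:
    # Aggregate success rate, average cost, and safety incidents per tenant.
    report: dict[str, dict] = {}
    for tenant in {t.tenant for t in traces}:
        subset = [t for t in traces if t.tenant == tenant]
        report[tenant] = {
            "success_rate": sum(t.succeeded for t in subset) / len(subset),
            "avg_cost_usd": sum(t.tokens_used for t in subset) / len(subset) / 1000 * cost_per_1k_tokens,
            "safety_incidents": sum(t.safety_flags for t in subset),
        }
    return report

traces = [
    Trace("tenant-a", True, 1800, 0),
    Trace("tenant-a", False, 3200, 1),
    Trace("tenant-b", True, 900, 0),
]
print(evaluate(traces))
```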
Ensures the agent runtime environment lifecycle is managed to guarantee high availability, session persistence, and global scalability for the company's digital workforce.
Performs in-depth, rigorous code reviews with a specific focus on identifying and mitigating the unique failure modes inherent in agentic systems, such as state bloat, tool call hallucinations, and infinite loops.
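The kinds of guards such a review looks for can be sketched in a few lines; the KNOWN_TOOLS registry, the caps, and the agent_step() stub below are hypothetical placeholders:

```python
MAX_STEPS = 10      # guard: no unbounded loops
MAX_HISTORY = 50    # guard: cap state growth
KNOWN_TOOLS = {"search_docs", "file_ticket"}

def agent_step(history: list[dict]) -> dict:
    # Stand-in for a model call that proposes the next tool invocation.
    return {"tool": "search_docs", "done": len(history) >= 3}

def run_agent(task: str) -> list[dict]:
    history: list[dict] = [{"role": "user", "content": task}]
    for _ in range(MAX_STEPS):                 # infinite-loop guard
        action = agent_step(history)
        if action["tool"] not in KNOWN_TOOLS:  # hallucinated-tool guard
            history.append({"role": "system", "content": f"unknown tool {action['tool']}"})
            continue
        history.append({"role": "tool", "content": f"called {action['tool']}"})
        history = history[-MAX_HISTORY:]       # state-bloat guard
        if action["done"]:
            break
    return history

print(len(run_agent("triage the open incidents")))
```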
REQUIRED
10+ years in Software Engineering, with at least 4 years in a Principal or Architect-level role.
2+ years specifically architecting LLM-based systems, with a proven track record of moving agentic projects into production at scale.
5+ years of experience developing within an agile methodology.
Certified Google Cloud Professional Cloud Architect.
Experience leading technical workstreams, translating business problems into AI-native architectures.
Expertise in asynchronous programming in Python (e.g., asyncio) and proficiency in statically typed languages (e.g., Java, Go, or Rust), required to engineer high-concurrency agentic middleware using stateful graph orchestration (e.g., ADK or LangGraph) that powers robust, autonomous reasoning engines.
Expertise in cloud-native CLI tools and Infrastructure-as-Code frameworks for automating agentic infrastructure deployment. Proven track record of deploying and scaling containerized autonomous workloads using enterprise-grade container orchestration and serverless execution platforms.
Experience managing high-scale distributed architecture, vector databases, graph databases, and structured data pipelines.
Deep knowledge of stateful orchestration frameworks and multi-agent design patterns, with the architectural expertise to engineer custom reasoning engines and proprietary orchestration logic when off-the-shelf solutions reach their scaling or safety limits.
Practitioner-level understanding of Chain-of-Thought, ReAct, Tree-of-Thoughts, and Self-Reflection architectures.
Experience managing systems with millions of daily requests or handling multi-petabyte datasets.
Proficiency in architecting semantic retrieval layers, attribute-aware discovery, and stateful persistence systems to provide high-fidelity long-term context for autonomous agents.
Deep understanding of MCP, A2A, REST/gRPC APIs, OAuth2 security, and function calling mechanics.
Familiarity with software design patterns and microservices-based architectures.
Please apply with your expected salary.