About AgentOps
AgentOps is the enterprise engineering foundation for building, operating, and governing AI agents and digital workers as production-grade systems. We are enabling the shift from simple chat-based experiences to agentic systems that can reason, plan, use tools, and execute complex workflows reliably across the enterprise.
Our mission is to provide the platform capabilities, reusable skills, and operational controls required to scale intelligent digital workers with strong standards for reliability, security, observability, and compliance.
The Role
As a Software Engineer III on the Agent Engineering team, you will design and build core platform capabilities that power intelligent, stateful, and production-ready agents, digital workers, and reusable skills. This is a hands-on senior engineering role focused on orchestration, agent runtime patterns, resilience, memory, retrieval, and observability.
You will help define reusable engineering patterns for how digital workers are built, how skills are packaged and reused, and how agentic workflows are operated across the platform. You will work closely with partner teams to translate complex business workflows into robust, governed, and scalable agentic services.
Key Responsibilities
1) Advanced Orchestration & Digital Worker Execution
- Design and implement multi-agent and digital worker orchestration patterns that enable specialized agents to delegate, collaborate, and complete multi-step business goals.
- Build stateful and cyclic workflows using frameworks such as LangGraph, CrewAI, AutoGen, or similar, enabling reflection, recovery, and adaptive execution beyond simple linear chains.
- Develop reusable orchestration components for routing, retries, fallback logic, structured outputs, and human-in-the-loop interventions.
- Define how digital workers compose and invoke reusable skills across common enterprise workflows.
2) Skills, Tooling & Interoperability
- Build and maintain reusable skills that encapsulate business actions, domain logic, tool usage, and workflow steps in a standardized way.
- Define contracts and standards for how skills are exposed, discovered, versioned, and consumed by agents and digital workers.
- Contribute to standards for MCP, tool calling, and agent interaction contracts across the platform.
- Integrate enterprise APIs, services, and data systems into reusable skills with strong attention to safety, governance, and maintainability.
3) Stateful Execution, Reliability & Agent Runtime Engineering
- Design systems for long-running, resumable workflows for agents and digital workers, including checkpointing, persistence, context restoration, and lifecycle management.
- Implement resilience patterns for non-deterministic AI systems, including timeout handling, intelligent retries, degraded execution modes, and escalation paths.
- Improve runtime reliability, scalability, and cost efficiency of agent and digital worker workloads in production.
- Partner with infrastructure and platform teams to harden execution across cloud-native environments.
4) RAG, Memory & Knowledge-Augmented Intelligence
- Build and optimize retrieval-augmented generation pipelines using vector databases, hybrid retrieval, re-ranking, and grounding strategies.
- Design memory patterns that improve continuity and contextual relevance across agent and digital worker sessions, including short-term, episodic, and semantic memory approaches.
- Integrate enterprise knowledge sources and structured systems securely into workflows and skills.
- Evaluate and improve answer quality, retrieval performance, and contextual fidelity.
5) Evaluation, Guardrails & Observability
- Build automated evaluation frameworks to measure workflow quality, skill execution quality, tool-use accuracy, groundedness, safety, and task success.
- Instrument deep tracing and operational observability using tools such as Langfuse, LangSmith, Arize Phoenix, OpenTelemetry, or similar.
- Define and monitor engineering KPIs such as latency, cost per run, fallback rates, workflow completion success, skill reliability, and production health.
- Contribute to guardrails for safe execution, prompt injection resistance, and policy-compliant agent behavior.
6) Technical Leadership & Platform Contribution
- Drive reusable engineering standards, shared libraries, and reference patterns for agent development, digital workers, and skills across the platform.
- Mentor other engineers through design reviews, code reviews, and implementation guidance.
- Partner with product, architecture, and domain teams to shape scalable solutions for enterprise use cases.
- Stay current on the evolving agentic AI ecosystem and evaluate new frameworks, techniques, and runtime patterns pragmatically for enterprise adoption.
What Makes You a Fit
Required Qualifications
- 8+ years of software engineering experience with strong proficiency in Python and backend/platform engineering.
- Hands-on experience building LLM-powered systems, agents, digital workers, or workflow automation platforms in production.
- Experience with frameworks such as LangGraph, CrewAI, AutoGen, LangChain, LlamaIndex, or similar.
- Strong experience in APIs, distributed systems, cloud-native engineering, and production reliability.
- Experience designing and integrating RAG pipelines, tool-calling systems, reusable skills, and structured output patterns.
- Experience with at least one major cloud platform such as AWS, Azure, or Google Cloud Platform, along with Docker, Kubernetes, and CI/CD practices.
- Ability to design systems with strong trade-off awareness across quality, latency, cost, resilience, and maintainability.
Preferred Qualifications
- Experience with MCP or similar tool/context interoperability protocols.
- Experience with Redis, DynamoDB, Postgres, or workflow/state stores for orchestration and persistence.
- Familiarity with multi-agent systems, digital worker architectures, skill registries, and human-in-the-loop execution models.
- Experience with AI observability, evaluation frameworks, and operational telemetry for LLM systems.
- Understanding of secure execution patterns, sandboxing, and prompt injection mitigation.
- Ability to translate emerging research and ecosystem patterns into pragmatic production solutions.
Who we are:
At Pearson, our purpose is simple: to help people realize the life they imagine through learning. We believe that every learning opportunity is a chance for a personal breakthrough. We are the world's lifelong learning company. For us, learning isn't just what we do. It's who we are. To learn more: We are Pearson.
Pearson is an Equal Opportunity Employer and a member of E-Verify. Employment decisions are based on qualifications, merit and business need. Qualified applicants will receive consideration for employment without regard to race, ethnicity, color, religion, sex, sexual orientation, gender identity, gender expression, age, national origin, protected veteran status, disability status or any other group protected by law. We actively seek qualified candidates who are protected veterans and individuals with disabilities as defined under VEVRAA and Section 503 of the Rehabilitation Act.
If you are an individual with a disability and are unable or limited in your ability to use or access our career site as a result of your disability, you may request reasonable accommodations by emailing
Job: Engineering
Job Family: TECHNOLOGY
Organization: Corporate Strategy & Technology
Schedule: FULL_TIME
Workplace Type: Hybrid
Req ID: 23253