Title: AI/LLM and Backend Developer
Location: 3 days a week on-site in the Redwood City, CA area (Tuesday-Thursday)
Contract Length: 2 Years
Mode of Interview: 2 virtual interviews
Minimum Experience: 7 years
Job Description:
The Role
You’ll own the backend systems that make AI agents reliable in production: agent runtime services, integrations, data modeling, observability, and platform reliability. You’ll design, ship, measure, and harden systems that power real customer workflows at scale.
What You’ll Do
- Own agent runtime services: tool execution, state management, orchestration, retries/idempotency, rate limiting.
- Design APIs & contracts: stable, versioned internal/external APIs, webhooks/events, integration adapters.
- Model complex domain data: schemas for agent memory/state, workflow history, audit trails, permissions, multi-tenant isolation.
- Build integrations at scale: OAuth, webhooks, sync engines, connectors with robust observability and failure handling.
- Reliability engineering: define SLIs/SLOs, implement tracing, timeouts, circuit breakers, error budgets, and incident response.
- Performance & cost controls: optimize latency/throughput, queues, caches, storage; manage inference/tool-call costs and runaway tasks.
- Raise the bar: code quality, testing strategy, on-call hygiene, runbooks, postmortems, mentoring.
What We’re Looking For (Required)
- 6+ years building backend systems for production SaaS, platforms, or distributed systems.
- Strong fundamentals in distributed systems, concurrency, queues/workers, caching, and production ops.
- Data modeling depth: relational design (Postgres/MySQL), migrations, indexing, query optimization, data correctness.
- API design excellence: clear, evolvable contracts across internal services and external partners.
- Thrive in high-velocity environments without compromising reliability/security.
- Ownership mindset: build → ship → operate; comfortable with ambiguity and rapid iteration.
- LLM product experience: prompting, tool calling, evals, latency/cost tradeoffs.
- Agent architectures: planning/execution loops, memory/state, sandboxed tools, human-in-the-loop (HITL), safety constraints.
- Frameworks/SDKs: Vercel AI SDK, LangChain/LangGraph, Anthropic Agents, OpenAI tool calling, sandboxed runtimes.
- Infra familiarity: Kubernetes, serverless, stream processing, feature stores, vector search.