Location: Phoenix, AZ (hybrid, 3 days in office)
Python AI Engineer - Knowledge Base Infrastructure
We are hiring a Python Platform Engineer to operate and evolve the infrastructure behind an enterprise knowledge base platform that is moving from a Confluence-focused RAG chatbot into a broader agentic knowledge system.
Today, the platform supports Confluence/GitHub ingestion → chunking → pgvector → RAG retrieval → FastAPI serving. Over the next phase, we are expanding toward hybrid retrieval (vector + sparse + graph), multi-source ingestion, evaluation pipelines, agent infrastructure and harness, and shared chat-platform primitives.
What You'll Own:
· Architect and implement backend services in Python 3.11, FastAPI, Pydantic, SQLAlchemy async, and asyncpg
· Design retrieval and orchestration trade-offs around quality, latency, cost, safety, and operational simplicity
· Build production-grade agent runtime capabilities: memory boundaries, tool sandboxing, permissions, and budget controls
· Improve answer grounding, failure analysis, and citation enforcement rather than optimizing for demo behavior
· Create observability and operational feedback loops with OpenTelemetry, Prometheus, Grafana, Docker/Helm, and GitHub Actions
· Work closely with product and engineering partners to support multiple conversational surfaces through one knowledge platform
· Ingestion infrastructure across current and future content sources
· Observability across application, pipeline, database, and model-serving behavior
· Cost, latency, throughput, and failure-mode management for AI-heavy workloads
· Release workflows that validate AI behavior changes, not just code compilation
Qualifications:
· Strong hands-on experience with Python in platform, automation, or infrastructure-heavy environments
· Experience building CLI tools in Python, Go, or Rust
· Hands-on experience with LangGraph, LangChain, pgvector, and modern retrieval pipelines
· Experience designing evaluation frameworks for LLM-backed systems, including regression detection and quality measurement
· Strong experience with Docker, Helm, GitHub Actions, and Kubernetes-oriented workflows
· Familiarity with the operational characteristics of embedding pipelines, vector search, and LLM-backed systems
· Strong observability skills across metrics, tracing, dashboards, alerting, and log analysis
· Experience with ingestion, ETL, or content-processing pipelines at scale
· Ability to think in terms of reliability, cost, latency, throughput, and recovery
Nice to Have:
· Experience with Qdrant, Neo4j, or other vector/graph infrastructure
· Experience supporting RAG, search, evaluation, or agent platforms
· Experience in enterprise or regulated environments
· Familiarity with Vault, Splunk, Artifactory, and ECR
· Comfort using AI-assisted engineering workflows in day-to-day work