Job Description:
Applications without a GitHub link and Claude demo will not be reviewed.
Candidates should have a background in the banking or financial services domain.
Candidates should have a background in data engineering, Python development, or AI/ML.
We are hiring a Senior AI Engineer who is a deep practitioner. This role is engineering-first: you will spend the majority of your time writing code, designing LLM pipelines, building RAG architectures, wiring up agentic workflows, and ensuring AI systems behave reliably in client environments.
You work directly under the Head of AI & Data. You do not need to lead client meetings, but you must be able to explain your technical work clearly in writing and in team reviews.
Core AI Engineering
• Design and build LLM-powered applications using the Anthropic Claude API -- including prompt engineering, system prompt architecture, tool use, multi-turn session management, and streaming.
• Build and maintain production-quality RAG pipelines: document ingestion, chunking strategies, embedding models, vector database management, and retrieval optimization.
• Develop agentic AI workflows using LangChain, LlamaIndex, or custom agent architectures: multi-step reasoning, tool orchestration, memory, and controlled output formatting.
• Write clean, tested, documented Python -- production-grade, not prototype-grade. All code lives in Git.
• Build and maintain data pipelines that feed AI systems: ingestion, transformation, enrichment, and delivery to LLM contexts.
Anthropic / Claude Engineering
• Implement and optimize Claude API integrations: model selection (Haiku / Sonnet / Opus), context window utilization, token budgeting, response caching, and latency optimization.
• Design robust system prompts for financial services use cases: client communication drafting, document analysis, compliance review assistance, portfolio commentary, and meeting prep.
• Configure Claude Enterprise deployments: org API key structure, Zero Data Retention (ZDR) setup, rate limit management, and audit logging.
• Implement prompt injection defenses and output validation guardrails appropriate for regulated financial environments.
• Evaluate and benchmark Claude model outputs using structured evaluation frameworks; document findings to inform prompt and architecture decisions.
Evaluation, Testing & Reliability
• Build evaluation pipelines and test suites: define success metrics, write automated evaluation scripts, and track output quality over time.
• Use LLM evaluation tools (LangSmith, RAGAS, Promptfoo, or custom frameworks) to measure retrieval quality, answer faithfulness, and output consistency.
• Instrument AI systems for observability: logging, tracing, cost tracking, and error monitoring.
• Participate in code reviews, architecture reviews, and documentation reviews -- work must be reproducible by others.
Collaboration & Delivery
• Work closely with the Head and Manager of AI & Data on client engagements -- own technical components end-to-end.
• Contribute to internal tooling, reusable libraries, and AI practice assets.
• Write clear technical documentation, architecture diagrams, and runbooks for deployed systems.
Required Qualifications
Experience
• 3-6 years of software engineering experience, with at least 2 years focused on AI/ML systems and at least 1 year building LLM-powered applications in production.
• Demonstrable production experience: you have shipped an AI system that real users use, and you can describe its architecture, trade-offs, and lessons learned.
• Strong Python fundamentals: data structures, async programming, API design, testing (pytest), and packaging.
• Git fluency: version-controlled workflow, PR discipline, meaningful commit messages.
Anthropic / Claude -- Must-Have
• Direct, hands-on experience with the Anthropic Claude API -- you have built at least one real project with it and can demonstrate it.
• Working knowledge of Claude prompt engineering: system prompts, few-shot examples, tool use (function calling), and structured output design.
• Understanding of Claude's model family: when to use Haiku vs. Sonnet vs. Opus, context window trade-offs, and cost/latency optimization.
• Awareness of Anthropic's safety design principles and how they affect prompt engineering and output handling in regulated environments.
LLM & AI Engineering Skills
• RAG pipeline experience: chunking, embedding, vector search, re-ranking, and answer synthesis. You have debugged retrieval quality issues and improved them.
• Experience with at least one vector database: Pinecone, Chroma, Weaviate, pgvector, Qdrant, or equivalent.
• Agentic AI experience: built a multi-step, tool-using agent that does something non-trivial -- not just a hello-world LangChain demo.
• API integration skills: REST, OAuth, JSON schemas, error handling, and retry logic for connecting LLM systems to external data sources.
• Cloud deployment: AWS, Google Cloud Platform, or Azure for deploying and maintaining AI workloads.
Application Requirements -- Non-Negotiable
We hire engineers based on what they have built, not what they say they can build. The following are required to be considered:
• Public GitHub Profile (Required): Active GitHub showing real work -- original repositories, meaningful commit history, readable code, and READMEs. We will look at your code.
• Claude Demo Project (Required): A project using the Anthropic Claude API that you built. GitHub repo, live demo, or Loom walkthrough. Must show: Claude API integration, at least one of tool use / RAG / agentic behavior, and engineering depth beyond a tutorial.
• Additional Portfolio Project (Strongly Preferred): A second project showing range -- RAG system, fine-tuned model, AI data pipeline, agent with memory, or production deployment you owned. Describe the architecture and what made it hard.
Preferred (Not Required)
• Experience with LLM evaluation tooling: LangSmith, RAGAS, Promptfoo, Braintrust, or custom eval frameworks.
• MLOps experience: experiment tracking (MLflow, Weights & Biases), CI/CD for AI systems, model versioning.
• TypeScript/Node.js or Streamlit/Gradio for lightweight AI demos or internal tooling.
• Financial services domain exposure: wealth management workflows, CRM platforms, or financial data APIs.
• Experience with document parsing pipelines (PDFs, structured financial documents, earnings reports).
• Contributions to open-source AI/LLM projects.