Role Overview
We are looking for an AI Platform Engineer: a builder who can architect the "factory" where AI is made.
Our goal is to build an internal, on-premises AI ecosystem that mimics the capabilities of AWS or Azure. You will be responsible for creating a horizontal platform used by various lines of business to deploy AI projects simultaneously.
Key Responsibilities
- Platform Architecture: Design and develop a "Model-as-a-Service" platform that allows non-experts to use drag-and-drop components to build AI solutions.
- RAG-as-a-Service: Build and optimize end-to-end Retrieval-Augmented Generation (RAG) pipelines, including sophisticated chunking strategies and vector database management.
- Tooling & Libraries: Develop and maintain MCP (Model Context Protocol) libraries, clients, and servers to connect various data sources to the AI engine.
- Infrastructure Management: Help manage and optimize one of the largest on-premises GPU farms in the U.S. banking sector (500+ NVIDIA nodes).
- Agentic AI: Build a repository for Agentic AI where users can select existing agents or build custom ones for specialized tasks.
- CI/CD Integration: Integrate AI deployment pipelines with enterprise-level CI/CD tools like Jenkins and Ansible.
- Compliance & Guardrails: Implement corporate-level guardrails and work within Model Risk Management (MRM) frameworks to ensure all AI deployments are secure and compliant.
Required Technical Skills
- Expert Python: Deep, hands-on knowledge is mandatory.
- Data Engineering: Extensive experience with large-scale data ingestion and processing.
- RAG Expertise: Deep understanding of vector databases, inferencing, and advanced chunking strategies.
- Platform Engineering: Proven experience building tools/platforms that other developers or business units use.
- Infrastructure Knowledge: Experience mimicking cloud capabilities (AWS/Azure) within a strictly on-premises environment.
- DevOps: Familiarity with Jenkins, Ansible, and automated deployment pipelines.
Experience & Qualifications
- Seniority: This is a senior-level role. We are looking for someone with a proven track record (10-15+ years) of building production-grade platforms.
- Industry Knowledge: You must stay current with the "latest and greatest" in AI (e.g., RAG-less inferencing, agentic frameworks).
- Problem Solver: Must be able to take a use case from a business unit and translate it into a scalable platform service.
- Experience with Scale: Experience working with large-scale GPU farms and high-volume data environments is highly preferred.