AI Architect
Hybrid - 30% onsite requirement
Location options - Charlotte, NC; Dallas, TX; or Iselin, NJ
12-month project
Job Description:
Role Overview:
We are seeking a Principal GenAI Architect to serve as a hands-on practitioner and core technical visionary. This is a rare, high-impact role requiring deep expertise in Generative AI, distributed systems, and agentic architectures. You will act as the central design authority for our GenAI capabilities within a matrixed organization, bridging internal platform development, third-party vendor reviews, and cutting-edge agentic workflows.
Your primary mandate is to "push the thinking": elevating our AI strategy while remaining deeply hands-on. You will oversee all GenAI use cases, driving architectural excellence across cloud, on-premise, and edge environments, with a specific focus on applications within the regional banking and financial services sector.
Key Responsibilities:
GenAI Architecture & Thought Leadership:
Serve as the ultimate technical authority for GenAI architecture across the enterprise, reviewing and guiding all AI/ML use cases within a matrixed organization.
Push the boundaries of our technical vision, acting as a forward-thinking catalyst for how GenAI is built and deployed.
Lead the architectural review process for all third-party AI integrations coming into the bank (e.g., ServiceNow, Five9, Pega), ensuring they meet strict security, performance, and integration standards.
Agentic Stack & AI Platform Engineering:
Spearhead the growth and development of our agentic stack, designing agentic frameworks that incorporate robust workflow (WF) logic.
Architect sophisticated retrieval systems and agent data stacks, utilizing vector databases, hybrid search, BM25, and graph-based reasoning.
Implement solutions for externalized long-term memory, contextual data freshness, and Model Context Protocol (MCP) servers.
Lead prompt and context engineering strategies to maximize model accuracy and reliability.
Infrastructure, Inference & Edge Computing:
Design, implement, and scale high-performance distributed systems and AI/ML platforms.
Optimize LLM inference, implementing advanced batching, caching strategies, and load balancing techniques.
Evaluate and implement dynamic deployment strategies, weighing the trade-offs of deploying small/local LLMs at the edge versus leveraging hyperscaler inferencing via cloud APIs.
Architect and test distributed API gateways across hybrid (cloud and on-premise) environments.
Oversee on-premise hardware strategy, including rigorous GPU management, utilization, and thermal/compute optimization.
Required Qualifications:
- Engineering Foundation: 12-15 years of experience, with strong proficiency in at least one core programming language (e.g., Python, Go, C++) and deep experience building large-scale distributed systems.
- GenAI & LLM Expertise: 5-7 years of hands-on, practitioner-level experience with LLM inference optimization, fine-tuning, and deployment strategies.
- Agentic Architectures: 3-5 years of experience, with a proven track record of building complex agentic systems, evaluation frameworks, and advanced retrieval pipelines (RAG, vector DBs, graph reasoning).
- Cloud & Infrastructure: 10-12 years of extensive experience with Kubernetes, cloud infrastructure (AWS, Google Cloud Platform, or Azure), and managing high-availability platforms.
- Hardware / On-Premise Knowledge: 8-10 years of experience with, and understanding of, GPU orchestration, resource management, and hardware optimization in on-premise or hybrid data centers.
- Strategic Communication: 12-15 years of experience and the ability to navigate a matrixed organization, translate complex technical trade-offs to leadership, and rigorously evaluate third-party enterprise platforms.
Nice to Have
- Domain experience in the Banking or Financial Services industry.
- Interest in, or hands-on experience with, integrating blockchain technologies and decentralized frameworks.