Job Details
Role: AI SME/Architect with AWS Experience
Location: NYC, NY (Hybrid, 3 days onsite)
Duration: Long Term Contract
Our challenge
We are seeking an exceptional hands-on Technical Lead to spearhead our enterprise GenAI engineering program. This is a unique opportunity for a seasoned technologist who combines deep AI/ML expertise with practical engineering skills to build and operationalize cutting-edge generative AI solutions. The candidate will lead the development of AI agents and platforms while remaining deeply involved in the technical implementation.
Responsibilities:
Technical Leadership & Development
- Lead the design, development, and deployment of enterprise-scale GenAI solutions using a mix of custom-developed solutions and open-source platforms (Dify, OpenWebUI, etc.)
- Architect and implement AI agents using Python frameworks including LlamaIndex and LangGraph
- Drive hands-on development while providing technical guidance to the engineering team
- Establish best practices for GenAI development, deployment, and operations
AI/ML Engineering
- Design and implement LLM-based solutions with a deep understanding of model architectures, fine-tuning, and prompt engineering
- Apply classical machine learning techniques where appropriate to complement GenAI solutions
- Optimize AI pipelines for performance, cost, and scalability
- Implement RAG (Retrieval-Augmented Generation) patterns and integrate vector databases
Context Engineering & Advanced RAG
- Design and implement sophisticated context engineering strategies for optimal LLM performance
- Build advanced RAG systems including multi-hop reasoning, hybrid search, and re-ranking mechanisms
- Develop agentic RAG architectures where agents dynamically query, synthesize, and validate information
- Implement context window optimization techniques and dynamic context selection strategies
- Create self-improving RAG systems with feedback loops and quality assessment
LLM Optimization & Fine-tuning
- Lead fine-tuning initiatives for domain-specific LLMs using techniques such as LoRA, QLoRA, and full fine-tuning
- Implement performance optimization strategies including quantization, pruning, and distillation
- Design and execute benchmark suites to measure and improve model performance
- Optimize inference latency and throughput for production workloads
- Implement prompt optimization and few-shot learning strategies
Platform & Infrastructure
- Design event-driven architectures for asynchronous AI processing at scale
- Build and deploy containerized AI applications using Kubernetes on AWS
- Implement AI workloads on AWS services (SageMaker, Bedrock, Lambda, EKS, SQS, SNS, etc.)
- Establish CI/CD pipelines for AI model and application deployment
Security & Governance
- Implement secure design principles for AI systems including data privacy and model security
- Establish AI security frameworks covering prompt injection prevention, model access controls, and data governance
- Ensure compliance with enterprise security standards and AI ethics guidelines
- Design audit trails and monitoring for AI system behavior
Requirements:
AI/ML Expertise
- Deep understanding of LLMs: Architecture, training, fine-tuning, and deployment strategies
- Context engineering proficiency: Expert-level understanding of context window management, prompt engineering, and context optimization techniques
- Advanced RAG implementation: Hands-on experience building sophisticated RAG systems with hybrid search, metadata filtering, and agentic capabilities
- Fine-tuning expertise: Proven experience fine-tuning LLMs for specific domains using modern techniques (LoRA, PEFT, etc.)
- Performance optimization: Track record of optimizing LLM inference for latency, throughput, and cost
- Classical ML proficiency: Strong foundation in traditional machine learning algorithms and applications
- Python mastery: Expert-level Python with extensive experience in ML libraries (PyTorch, TensorFlow, Pandas, NumPy)
- GenAI frameworks: Hands-on experience with LlamaIndex, LangChain, LangGraph, or similar frameworks
- Open-source GenAI platforms: Experience with Dify, OpenWebUI, or comparable platforms
Engineering Excellence
- Cloud architecture: Proven experience designing and implementing AWS solutions using multiple services
- Event-driven systems: Expertise in asynchronous, event-driven architectures for scalable AI processing
- Containerization: Advanced knowledge of Docker, Kubernetes, and container orchestration
- DevOps/MLOps: Experience with CI/CD, infrastructure as code, and ML model lifecycle management
Security & Enterprise Standards
- Secure development: Strong understanding of secure coding practices and security design patterns
- AI security: Knowledge of AI-specific security concerns (adversarial attacks, data poisoning, prompt injection)
- Enterprise integration: Experience with enterprise authentication, authorization, and compliance requirements
Leadership & Communication
- 12+ years of hands-on technical experience, with at least 5 years in AI/ML
- Proven track record of leading technical teams while remaining hands-on
- Excellent communication skills, with the ability to articulate complex technical concepts to diverse stakeholders
- Experience working in enterprise environments with multiple stakeholders
Preferred Qualifications
- Experience with multi-agent systems and agent orchestration
- Knowledge of vector databases (Qdrant, OpenSearch, pgvector)
- Expertise in embedding models and semantic search optimization
- Contributions to open-source AI/ML projects
- Experience with model quantization and edge deployment
- Knowledge of graph-based RAG and knowledge graph integration
- Certifications in AWS, Kubernetes, or ML platforms