Job Details
Role - Generative AI Engineer
Location - Remote (NJ)
W2 Only
Role Overview:
Join an emerging R&D team building a production-grade Generative AI platform. You will develop robust microservices and Retrieval-Augmented Generation (RAG) pipelines that power a wide range of critical applications, from regulatory-grade document quality control to agentic assistants for sales and medical claims reasoning. This role is essential for creating reliable, scalable AI systems that augment the work of our subject matter experts and drive innovation across the organization.
Key Responsibilities:
High-Throughput RAG Pipeline Development:
- Design and build scalable document processing pipelines to ingest and semantically chunk large batches of documents (PDF/DOCX) from sources like Azure Blob Storage and AWS S3.
- Integrate embedding models and tune vector databases like Milvus for high-performance, sub-100 ms k-NN retrieval.
- Implement hybrid retrieval systems using BM25 and vector search, and continually track and improve retrieval performance using metrics like MRR and recall@k.
Model Fine-Tuning & Prompt Engineering:
- Apply large language models (LLMs) and NLP techniques to solve complex problems such as named-entity recognition, question answering, and summarization.
- Build fine-tuning pipelines using parameter-efficient methods such as LoRA (e.g., via the PEFT library) and run hyperparameter sweeps in Azure ML.
- Author multi-step prompt chains, enforce structured JSON outputs, and use validation guards to reduce hallucinations and improve model consistency.
MLOps & Production Deployment:
- Develop and containerize agent-based microservices using FastAPI or Azure Functions.
- Define Infrastructure as Code using Terraform/ARM and build CI/CD workflows in GitHub Actions for automated testing and canary rollouts.
- Implement robust monitoring and alerting for latency (p50/p95) and error rates using tools like Prometheus, Grafana, or Azure Monitor to ensure SLA compliance.
Performance, Cost & Standards:
- Profile API calls and implement cost-reduction strategies such as batching, caching, and early stopping of token generation.
- Produce high-quality documentation, including architecture diagrams, sequence flows, and data schemas.
- Enforce security and compliance standards, including data encryption and PII redaction, to align with HIPAA/GxP requirements.
Required Qualifications:
BS/MS in Computer Science, AI/ML, or a related field.
3+ years of experience building end-to-end LLM/RAG systems in a production environment.
Deep Python expertise, including libraries such as FastAPI, pandas, and NumPy.
Hands-on experience with LLM orchestration frameworks (LangChain, LlamaIndex), NLP libraries (HuggingFace), and OpenAI/Azure SDKs.
Proven expertise in MLOps including CI/CD (GitHub Actions/Azure DevOps) and containerization (Docker/Kubernetes).
Preferred Qualifications (Nice-to-Haves):
- Experience working in a regulated industry such as pharmaceuticals or life sciences.
- Hands-on experience with vector databases like Milvus or Pinecone.
- Familiarity with chatbot frameworks like Rasa or Botpress.
- Experience with data-centric AI tools for validation and monitoring, such as Great Expectations or Deepchecks.