Senior Machine Learning Engineer (AI/ML, PyTorch, TensorFlow, RAG, GPU computing, CUDA)
Washington DC-Baltimore, MD Area (Hybrid Onsite Job)
Full-time / Permanent Position
Responsibilities:
Productionize AI models from research prototypes into scalable, deployable systems used in real-world applications.
Develop, fine-tune, and optimize models using PyTorch, TensorFlow, or Hugging Face Transformers, adapting both open- and closed-source models.
Implement model optimization techniques such as quantization, pruning, distillation, and hardware-specific acceleration.
Engineer systems for dynamic model adaptation using low-rank adaptation (LoRA), parameter-efficient fine-tuning (PEFT), and on-device inference strategies.
Build and maintain Retrieval-Augmented Generation (RAG) pipelines, including vector database integration for contextual retrieval.
Work with multimodal AI systems across computer vision, audio, and natural language domains.
Employ synthetic data generation and digital twinning techniques (GANs, diffusion models, or simulation-based methods) to create robust datasets for edge cases.
Develop GPU-accelerated and low-level system code in C, C++, or Rust for performance-critical operations.
Optimize model execution for distributed and resource-constrained environments, ensuring reliability under variable connectivity conditions.
Collaborate cross-functionally with Infrastructure, MLOps, and Security teams to deliver secure, compliant, and high-performance AI solutions for government partners.
Qualifications:
Active U.S. security clearance, or eligibility and willingness to obtain one.
5+ years of experience in applied AI, ML engineering, or production AI systems.
Deep proficiency in PyTorch, TensorFlow, or Hugging Face Transformers.
Proven experience deploying AI models across cloud, edge, and mobile hardware environments.
Expertise in model compression and optimization (quantization, pruning, distillation).
Strong understanding of GPU computing, CUDA, and performance profiling.
Experience building RAG pipelines and integrating vector databases (e.g., FAISS, Milvus, Pinecone).
Familiarity with multimodal models and synthetic data generation methods.
Low-level programming experience in C, C++, or Rust, with an understanding of computer memory and concurrency.
Strong algorithmic and problem-solving skills, especially in distributed or constrained compute environments.