GenAI/LLM Engineer

Overview

Hybrid

Depends on Experience

Accepts corp to corp applications

Contract - Independent

Contract - W2

Contract - 12 Month(s)

No Travel Required

Skills

PyTorch/TensorFlow

LoRA/QLoRA fine-tuning

LLMOps workflows

Job Details

GenAI/LLM Engineer

Location: Remote; candidate should be in the PST time zone

Top Skills Needed: PyTorch/TensorFlow, LoRA/QLoRA fine-tuning, Prompt engineering, Model compression (quantization/pruning), Retrieval-augmented generation (RAG), Hugging Face Transformers, Multi-GPU training, Memory optimization techniques, LLMOps workflows

Implementing GenAI requires specialized expertise in large language models. Traditional data scientists often haven't had the opportunity to dive deep into the practical intricacies of LLMs, particularly advanced fine-tuning techniques, model compression strategies, memory optimization approaches, and specialized training workflows. This role requires a hands-on deep learning practitioner comfortable with modern frameworks and libraries specific to LLM development.

Enables domain-specific fine-tuning of models to Company's unique utility context
Improves model performance while reducing computational costs through advanced optimization techniques
Creates Company-specific AI capabilities that address our unique operational challenges
Enables the CoE to move beyond generic AI tools to customized solutions that deliver higher business value

Key Responsibilities:

Implement and optimize advanced fine-tuning approaches (LoRA, PEFT, QLoRA) to adapt foundation models to Company's domain
Develop systematic prompt engineering methodologies specific to public sector and utility operations, regulatory compliance, and technical documentation
Create reusable prompt templates and libraries to standardize interactions across multiple LLM applications and use cases
Implement prompt testing frameworks to quantitatively evaluate and iteratively improve prompt effectiveness
Establish prompt versioning systems and governance to maintain consistency and quality across applications
Apply model customization techniques like knowledge distillation, quantization, and pruning to reduce memory footprint and inference costs
Tackle memory constraints using techniques such as sharded data parallelism, GPU offloading, or CPU+GPU hybrid approaches
Build robust retrieval-augmented generation (RAG) pipelines with vector databases, embedding pipelines, and optimized chunking strategies
Design advanced prompting strategies including chain-of-thought reasoning, conversation orchestration, and agent-based approaches
Collaborate with the MLOps engineer to ensure models are efficiently deployed, monitored, and retrained as needed

Expected Skillset:

Deep Learning & NLP: Proficiency with PyTorch/TensorFlow, Hugging Face Transformers, DSPy, and advanced LLM training techniques
GPU/Hardware Knowledge: Experience with multi-GPU training, memory optimization, and parallelization strategies
LLMOps: Familiarity with workflows for maintaining LLM-based applications in production and monitoring model performance
Technical Adaptability: Ability to interpret research papers and implement emerging techniques (without necessarily requiring PhD-level mathematics)
Domain Adaptation: Skills in creating data pipelines for fine-tuning models with utility-specific content

Thanks,

Vinutha

