Senior LLM Engineer (NLP Specialist)
Key Responsibilities
1. LLM Development & Optimization
Fine-tune and deploy large language models (GPT, BERT, T5, etc.) and vision-language models (Florence-2, PaliGemma) for domain-specific NLP tasks such as text classification, summarization, and entity extraction.
Apply advanced NLP techniques to ensure accurate text recognition of plan annotations (e.g., pipe materials, dimensions).
2. Annotation Workflow Integration
Design custom annotation techniques that produce effective annotations and support the project's goals.
Automate or streamline annotation tasks wherever possible (e.g., partial auto-labeling) to reduce manual effort and error rates.
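As an illustration of partial auto-labeling, one common pattern is to accept model predictions above a confidence threshold and send the rest to human annotators. The sketch below is only an assumption about how such routing might look: the `Prediction` class, the `route_predictions` function, and the 0.9 threshold are hypothetical names and values, not part of any existing tool.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Prediction:
    """Hypothetical model output for one item awaiting annotation."""
    item_id: str
    label: str
    confidence: float


def route_predictions(
    predictions: Iterable[Prediction], threshold: float = 0.9
) -> Tuple[List[Prediction], List[Prediction]]:
    """Split predictions into auto-accepted labels and a manual review queue."""
    auto_labeled: List[Prediction] = []
    needs_review: List[Prediction] = []
    for prediction in predictions:
        if prediction.confidence >= threshold:
            auto_labeled.append(prediction)
        else:
            needs_review.append(prediction)
    return auto_labeled, needs_review
```

In practice the threshold would be tuned against a held-out, human-verified sample so that auto-accepted labels do not raise the error rate.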
3. Experimentation & Evaluation
Establish robust evaluation metrics (perplexity, precision, recall, F1-score, Levenshtein distance, character array, BLEU score) for NLP components, including text extraction quality.
Set up an iterative experimentation framework to track model versioning, data changes, and performance gains over time.
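For reference, two of the metrics listed above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a prescribed implementation: the function names `levenshtein` and `precision_recall_f1` are hypothetical, and the set-based matching of extracted entities is an assumed simplification.

```python
from typing import Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def precision_recall_f1(
    predicted: Sequence[str], expected: Sequence[str]
) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F1 over extracted items."""
    predicted_set, expected_set = set(predicted), set(expected)
    true_positives = len(predicted_set & expected_set)
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(expected_set) if expected_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

For example, `levenshtein("PVC 150mm", "PVC 160mm")` counts a single substituted character, which is the kind of text extraction error these metrics are meant to surface.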
4. Data Management & Accountability
Coordinate with the Project Manager and AI Engineer to ensure new data (annotated or collected) is properly versioned and accessible.
Implement best practices for dataset growth, including tracking who annotated or curated the data and how these changes affect model performance.
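One possible way to track who changed a dataset and how the change affected performance is to log a content fingerprint together with the annotator and the resulting metrics. The sketch below assumes records are JSON-serializable dictionaries; `DatasetVersion`, `fingerprint`, and `register_version` are hypothetical names used only for illustration.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DatasetVersion:
    """Hypothetical log entry linking a dataset snapshot to its author and metrics."""
    fingerprint: str
    annotator: str
    note: str
    metrics: Dict[str, float]


def fingerprint(records: List[dict]) -> str:
    """SHA-256 over a canonical JSON encoding, so dictionary key order does not matter."""
    canonical = json.dumps(records, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def register_version(
    log: List[DatasetVersion],
    records: List[dict],
    annotator: str,
    note: str,
    metrics: Dict[str, float],
) -> DatasetVersion:
    """Append a new version entry to the log and return it."""
    entry = DatasetVersion(fingerprint(records), annotator, note, dict(metrics))
    log.append(entry)
    return entry
```

Comparing the `metrics` of consecutive entries then shows how each annotation batch moved model performance.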
5. Deployment & Scalability
Integrate with existing infrastructure and pipelines, ensuring minimal disruption to ongoing model development.
Optimize model sizes and queries to reduce latency between input and output (e.g., quantization, batch optimization).
Required Qualifications
Education & Experience
Master's/PhD in Computer Science/Engineering or Computational Linguistics.
Extensive hands-on experience with custom LLMs.
Technical Expertise
Proficiency in Python and deep learning frameworks (PyTorch preferred).
Proven track record of deploying NLP or LLM systems in production (cloud or on-premises).
Solid understanding of tokenization, embedding techniques, and advanced fine-tuning strategies.
Experience using open-source, closed-source, or custom annotation tools (e.g., Label Studio) to coordinate across multiple annotation formats and annotation teams.