Job Details
Job Title: GenAI Architect (Eval Framework)
Job Location: Fremont, CA, USA
Mandatory Skills:
Langfuse (including v3 features)
Azure AI Evaluation SDK
Azure AI services
LLMOps, prompt engineering, and GenAI lifecycle management
Python
Hands-on experience with Langfuse (including v3 features) and integrations.
Experience with other GenAI observability tools (e.g., TruLens, W&B, Helicone).
Knowledge of Retrieval-Augmented Generation (RAG), fine-tuning, and multi-agent orchestration.
Strong understanding of Azure AI services, especially the Evaluation SDK.
Deep expertise in LLMOps, prompt engineering, and GenAI lifecycle management.
Proficiency in Python, TypeScript, or similar languages used in GenAI frameworks.
Experience with cloud-native architectures (Azure preferred).
Familiarity with tracing tools, observability platforms, and evaluation metrics.
Excellent communication and documentation skills.
Key Responsibilities:
Set up and deploy Langfuse v3 in a production environment.
Architect and implement the upgrade from Langfuse v2 to v3 within the LamBots framework, ensuring backward compatibility and performance optimization.
Design modular components for prompt management, tracing, metrics, evaluation, and playground features using Langfuse v3.
Leverage Langfuse's full feature set (see the illustrative sketch after this list):
Prompt Management: versioning, templating, and optimization
Tracing: end-to-end visibility into GenAI workflows
Metrics: performance, latency, and usage analytics
Evaluation: automated and manual scoring of model outputs
Playground: interactive testing and debugging of prompts
Integrate the Azure AI Evaluation SDK into LamBots to enable scalable, enterprise-grade evaluation pipelines and workflows (see the second sketch after this list), including:
Build reusable components and templates for evaluation across diverse GenAI use cases.
Collaborate with cross-functional teams to integrate evaluation capabilities into production pipelines and systems.
Ensure scalability and reliability of evaluation tools in both offline and online environments.
Define and enforce evaluation standards and best practices for GenAI agents, RAG pipelines, and multi-agent orchestration.
Collaborate with product, engineering, and data science teams to align evaluation metrics with business KPIs.
Drive observability, debugging, and traceability features for GenAI workflows.
Stay current with emerging GenAI evaluation tools, frameworks, and methodologies.
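For illustration only: a minimal sketch of the prompt-management, tracing, and scoring workflow referenced above, written against the Langfuse Python SDK's v2-style client API (the v3 SDK moves to an OpenTelemetry-based interface, so call names differ there). The prompt name, model name, and values are hypothetical.

from langfuse import Langfuse

# Assumes LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST are set in the environment.
langfuse = Langfuse()

# Prompt Management: fetch a versioned prompt template and fill in its variables.
# "support-triage" is a hypothetical prompt name used only for illustration.
prompt = langfuse.get_prompt("support-triage")
compiled = prompt.compile(ticket_text="Example customer message")

# Tracing: record one end-to-end workflow run and the LLM call inside it.
trace = langfuse.trace(name="triage-run", input={"ticket_text": "Example customer message"})
trace.generation(
    name="triage-llm-call",
    model="gpt-4o",  # illustrative model name
    input=compiled,
    output="Category: billing",
)

# Evaluation: attach an automated or manual score to the trace.
trace.score(name="category_accuracy", value=1.0)

# Flush buffered events before the process exits.
langfuse.flush()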
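Likewise, a hedged sketch of a batch evaluation run with the Azure AI Evaluation SDK (azure-ai-evaluation package), using the built-in GroundednessEvaluator and RelevanceEvaluator. The dataset file, endpoint, key, and deployment values are placeholders to be replaced with real Azure OpenAI details.

from azure.ai.evaluation import evaluate, GroundednessEvaluator, RelevanceEvaluator

# Assumed Azure OpenAI connection details; replace with real values.
model_config = {
    "azure_endpoint": "https://<your-resource>.openai.azure.com",
    "api_key": "<api-key>",
    "azure_deployment": "<deployment-name>",
}

# Built-in AI-assisted evaluators score each record in a JSONL dataset
# (hypothetical file with query/response/context columns).
result = evaluate(
    data="eval_dataset.jsonl",
    evaluators={
        "groundedness": GroundednessEvaluator(model_config=model_config),
        "relevance": RelevanceEvaluator(model_config=model_config),
    },
    output_path="eval_results.json",
)

# Aggregate metrics across the dataset.
print(result["metrics"])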