Langfuse GenAI Architect

Overview

On Site
Depends on Experience
Full Time

Skills

Evaluation
Langfuse v3
Python

Job Details

Title: GenAI Architect (Eval Framework)
Location: Fremont, CA, USA
Mandatory skills:
  • Langfuse (including v3 features)
  • Azure AI Evaluation SDK
  • Azure AI services
  • LLMOps, prompt engineering, and GenAI lifecycle management
  • Python

Required Skills and Experience:
Hands-on experience with Langfuse (including v3 features) and its integrations.
Experience with other GenAI observability tools (e.g., TruLens, W&B, Helicone).
Knowledge of Retrieval-Augmented Generation (RAG), fine-tuning, and multi-agent orchestration.
Strong understanding of Azure AI services, especially the Evaluation SDK.
Deep expertise in LLMOps, prompt engineering, and GenAI lifecycle management.
Proficiency in Python, TypeScript, or similar languages used in GenAI frameworks.
Experience with cloud-native architectures (Azure preferred).
Familiarity with tracing tools, observability platforms, and evaluation metrics.
Excellent communication and documentation skills.
Key Responsibilities:
Set up and deploy Langfuse v3 in a production environment.
Architect and implement the upgrade from Langfuse v2 to v3 within the LamBots framework, ensuring backward compatibility and performance optimization.
Design modular components for prompt management, tracing, metrics, evaluation, and playground features using Langfuse v3.
Leverage Langfuse's full feature set (see the first sketch following this list):
Prompt Management: versioning, templating, and optimization
Tracing: end-to-end visibility into GenAI workflows
Metrics: performance, latency, and usage analytics
Evaluation: automated and manual scoring of model outputs
Playground: interactive testing and debugging of prompts
Integrate the Azure AI Evaluation SDK into LamBots to enable scalable, enterprise-grade evaluation pipelines and workflows (see the second sketch following this list).
Build reusable components and templates for evaluation across diverse GenAI use cases.
Collaborate with cross-functional teams to integrate evaluation capabilities into production pipelines and systems.
Ensure scalability and reliability of evaluation tools in both offline and online environments.
Define and enforce evaluation standards and best practices for GenAI agents, RAG pipelines, and multi-agent orchestration.
Collaborate with product, engineering, and data science teams to align evaluation metrics with business KPIs.
Drive observability, debugging, and traceability features for GenAI workflows.
Stay current with emerging GenAI evaluation tools, frameworks, and methodologies.
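For orientation, a minimal sketch of how the Langfuse capabilities listed above (prompt management, tracing, evaluation scores) surface in the Langfuse Python SDK is shown below. The prompt name, trace names, and workflow are hypothetical, and the calls follow the v2-style Python SDK; the SDK generation that targets Langfuse v3 exposes equivalent features under partly different method names.

```python
# Illustrative sketch only: prompt management, tracing, and scoring with the
# Langfuse Python SDK (v2-style API). All names below are assumptions, not
# details from this posting.
from langfuse import Langfuse

# Reads LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_HOST from the environment.
langfuse = Langfuse()

# Prompt Management: fetch a versioned prompt template by name (hypothetical name,
# assumed to already exist in the Langfuse project).
prompt = langfuse.get_prompt("lambots-support-answer")
compiled = prompt.compile(question="How do I reset my password?")

# Tracing: record a trace and a generation for end-to-end visibility.
trace = langfuse.trace(name="lambots-support-flow", user_id="demo-user")
generation = trace.generation(
    name="answer-generation",
    model="gpt-4o",
    input=compiled,
)
generation.end(output="You can reset your password from the account settings page.")

# Evaluation: attach a manual or automated score to the trace.
langfuse.score(trace_id=trace.id, name="helpfulness", value=0.9)

# Flush buffered events before the process exits.
langfuse.flush()
```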
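Similarly, a hedged sketch of an offline evaluation run with the Azure AI Evaluation SDK (the azure-ai-evaluation Python package) is shown below. The dataset path, Azure OpenAI deployment name, and choice of built-in evaluators are illustrative assumptions, not details from this posting.

```python
# Illustrative sketch only: an offline evaluation run using built-in evaluators
# from the azure-ai-evaluation package. Endpoint, key, deployment, and dataset
# path are placeholders/assumptions.
from azure.ai.evaluation import evaluate, RelevanceEvaluator, GroundednessEvaluator

# Model configuration for the judge model (an Azure OpenAI deployment is assumed).
model_config = {
    "azure_endpoint": "https://<your-resource>.openai.azure.com",
    "api_key": "<api-key>",
    "azure_deployment": "gpt-4o",
}

# Built-in evaluators score each row of a JSONL dataset; the dataset is assumed
# to contain query, response, and context columns.
relevance = RelevanceEvaluator(model_config)
groundedness = GroundednessEvaluator(model_config)

results = evaluate(
    data="eval_dataset.jsonl",  # hypothetical dataset path
    evaluators={
        "relevance": relevance,
        "groundedness": groundedness,
    },
)
print(results["metrics"])
```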

About Laiba Technologies LLC