Langfuse Evaluation Architect

Overview

On Site
Depends on Experience
Accepts corp to corp applications
Contract - W2
Contract - 24 Month(s)

Skills

Langfuse v3
Azure
Python
Architect

Job Details

Job Title: GenAI Architect (Eval Framework)

Job Location: Fremont, CA, USA

Mandatory skills

Langfuse (including v3 features)

Azure AI Evaluation SDK

Azure AI services

LLMOps, prompt engineering, and GenAI lifecycle management

Python

Hands-on experience with Langfuse (including v3 features) and its integrations.

Experience with other GenAI observability tools (e.g., TruLens, W&B, Helicone).

Knowledge of Retrieval-Augmented Generation (RAG), fine-tuning, and multi-agent orchestration.

Strong understanding of Azure AI services, especially the Evaluation SDK.

Deep expertise in LLMOps, prompt engineering, and GenAI lifecycle management.

Proficiency in Python, TypeScript, or similar languages used in GenAI frameworks.

Experience with cloud-native architectures (Azure preferred).

Familiarity with tracing tools, observability platforms, and evaluation metrics.

Excellent communication and documentation skills.

Key Responsibilities:

Set up and deploy Langfuse v3 in a production environment.

Architect and implement the upgrade of Langfuse v2 to v3 within the LamBots framework, ensuring backward compatibility and performance optimization.

Design modular components for prompt management, tracing, metrics, evaluation, and playground features using Langfuse v3.

Leverage Langfuse's full feature set (see the sketch after this list):

Prompt Management: versioning, templating, and optimization

Tracing: end-to-end visibility into GenAI workflows

Metrics: performance, latency, and usage analytics

Evaluation: automated and manual scoring of model outputs

Playground: interactive testing and debugging of prompts
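As a rough, non-authoritative illustration of how these features surface in code, the sketch below assumes the Langfuse Python SDK and uses hypothetical names for the prompt, model, and score: it fetches a managed prompt, records a trace with a generation, and attaches a score that feeds the metrics and evaluation views. Exact calls differ between Langfuse SDK versions, so treat this as a sketch rather than the project's implementation.

    from langfuse import Langfuse

    # The client reads LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST
    # from the environment (Langfuse Cloud or a self-hosted v3 deployment).
    langfuse = Langfuse()

    # Prompt Management: fetch a versioned prompt template and fill in variables.
    prompt = langfuse.get_prompt("lambots-support-answer")  # hypothetical prompt name
    compiled = prompt.compile(question="How do I reset my password?")

    # Tracing: record one end-to-end workflow run and the LLM call inside it.
    trace = langfuse.trace(name="lambots-support-flow", user_id="demo-user")
    trace.generation(
        name="answer-generation",
        model="gpt-4o",  # hypothetical model name
        input=compiled,
        output="You can reset your password from the account settings page.",
    )

    # Evaluation / Metrics: attach a score to the trace; scores feed the
    # metrics and evaluation views in the Langfuse UI.
    trace.score(name="answer-relevance", value=0.9, comment="manual spot check")

    langfuse.flush()  # send buffered events before the process exits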

Integrate the Azure AI Evaluation SDK into LamBots to enable scalable, enterprise-grade evaluation pipelines and workflows (see the sketch after this list), including:

Build reusable components and templates for evaluation across diverse GenAI use cases.

Collaborate with cross-functional teams to integrate evaluation capabilities into production pipelines and systems.

Ensure scalability and reliability of evaluation tools in both offline and online environments.
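As a minimal sketch of such an evaluation pipeline, the example below uses built-in quality evaluators from the azure-ai-evaluation package over a batch of records; the dataset file, column names, endpoint, and deployment are placeholders, and the evaluator set for LamBots would be chosen per use case.

    import os

    from azure.ai.evaluation import GroundednessEvaluator, RelevanceEvaluator, evaluate

    # Hypothetical Azure OpenAI judge model used by the built-in evaluators.
    model_config = {
        "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
        "api_key": os.environ["AZURE_OPENAI_API_KEY"],
        "azure_deployment": "gpt-4o",  # placeholder deployment name
    }

    # Batch evaluation over a JSONL file whose rows carry query, context, and
    # response fields (file name and column mapping are placeholders).
    result = evaluate(
        data="lambots_eval_data.jsonl",
        evaluators={
            "relevance": RelevanceEvaluator(model_config),
            "groundedness": GroundednessEvaluator(model_config),
        },
        evaluator_config={
            "default": {
                "column_mapping": {
                    "query": "${data.query}",
                    "context": "${data.context}",
                    "response": "${data.response}",
                }
            }
        },
        output_path="lambots_eval_results.json",
    )

    print(result["metrics"])  # aggregate scores for the evaluation run

In practice, a reusable LamBots evaluation component would wrap a call like this and log the resulting scores back to the corresponding Langfuse traces.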

Define and enforce evaluation standards and best practices for GenAI agents, RAG pipelines, and multi-agent orchestration.

Collaborate with product, engineering, and data science teams to align evaluation metrics with business KPIs.

Drive observability, debugging, and traceability features for GenAI workflows.

Stay current with emerging GenAI evaluation tools, frameworks, and methodologies.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

About Laiba Technologies LLC