Key Responsibilities:
Conversation Analytics & Insight Generation
· Design and implement analytics pipelines that extract meaningful patterns from large-scale chat conversation data
· Develop facet extraction approaches using LLMs to categorize conversations by request type, task performed, and topic discussed
· Build dashboards and reporting artifacts that communicate usage trends, emerging topics, and user behavior to stakeholders
· Identify and quantify shifts in conversation patterns over time to inform product roadmap and content strategy
· Translate analytical findings into actionable recommendations for platform improvement
Clustering & Unsupervised Learning
· Architect and optimize hierarchical clustering pipelines using density-based algorithms (e.g., HDBSCAN) to group conversations by semantic similarity
· Generate and manage text embeddings at scale for downstream clustering and similarity tasks
· Design multi-level clustering strategies that produce both granular groupings and higher-order category taxonomies
· Evaluate cluster quality using persistence metrics, silhouette analysis, and domain-informed validation
· Experiment with clustering parameters, distance metrics, and dimensionality reduction techniques to improve grouping coherence
Data Engineering & Pipeline Development
· Build and maintain data pipelines using Python for ingesting, transforming, and analyzing conversation datasets
· Develop automated workflows using cloud-native orchestration and compute services to run analytics at scale on a regular schedule
· Work with object storage, search engines, and relational databases to store and query analytical outputs
· Implement caching, batching, and incremental processing strategies to handle large embedding and clustering workloads efficiently
· Maintain reproducible analysis environments and version analytical artifacts (models, cluster outputs, embeddings)
LLM-Assisted Analysis
· Design and refine LLM prompts for facet extraction, cluster labeling, and conversation summarization
· Evaluate LLM output quality for analytical tasks and iterate on prompt strategies to improve accuracy
· Use existing model-serving infrastructure for embedding generation and LLM inference
· Explore emerging techniques in LLM-driven data analysis, topic modeling, and automated insight generation
Quality & Testing
· Develop evaluation frameworks to measure clustering quality, facet extraction accuracy, and analytical pipeline correctness
· Build automated regression tests to detect drift in clustering outputs or degradation in categorization quality
· Validate analytical results against known baselines and domain expertise
· Document methodologies, assumptions, and limitations of analytical approaches
Security & Compliance
· Support adherence to technology policies and comply with all security controls
· Implement secure coding practices, particularly in handling personally identifiable information (PII) and sensitive data
· Participate in threat modeling and security discussions for API and infrastructure components
· Understand and apply organizational security standards and best practices