Job Title: Data Engineer / AI Engineer (Agentic AI Platform – Financial Data)
Location: Philadelphia, PA (Hybrid)
Duration: 12+ month contract
About the Role
We are building a platform that converts unstructured financial data (emails, corporate actions, index announcements) into high-quality, structured datasets used by financial institutions.
This is not a typical “LLM wrapper” role.
You will work on systems that:
- Extract data from noisy, inconsistent sources
- Validate and reconcile outputs across multiple inputs
- Ensure correctness, traceability, and auditability
The challenge is not just applying LLMs; it is making them reliable in production for financial workflows.
What You’ll Work On
- Designing pipelines that process high-volume financial documents (batch + near real-time)
- Building LLM-powered extraction workflows (classification, parsing, summarization)
- Implementing validation layers (rule-based + model-based) to reduce hallucinations
- Developing retrieval systems using embeddings and vector search
- Architecting end-to-end systems: ingestion → processing → storage → serving
- Ensuring data quality, observability, and fault tolerance
- Collaborating with product to turn messy data into usable financial intelligence
Core Requirements
- Strong Python and backend/data engineering experience
- Experience building production data pipelines (ETL, streaming, or async systems)
- Solid understanding of distributed systems and failure modes
- Experience working with LLM-based systems in production:
  - Prompt design
  - Output validation
  - Retry/fallback strategies
  - Evaluation and monitoring
- Experience with data storage systems (SQL + NoSQL)
- Familiarity with cloud infrastructure (AWS or similar)
Preferred Experience
- Experience with RAG / vector search systems
- Background in financial data or capital markets
- Experience with streaming systems (Kafka, etc.)
- Experience building multi-step or agent-style workflows
What Makes This Role Interesting
- Work on high-accuracy AI systems where correctness matters
- Solve real problems around:
  - LLM reliability and hallucination mitigation
  - Data consistency across conflicting sources
  - Real-time vs. correctness tradeoffs
- Build systems used in financial decision-making workflows
- High ownership over core architecture in an early-stage environment
Nice to Have (Not Required)
- Experience with orchestration tools (Airflow, etc.)
- Exposure to evaluation frameworks for LLMs
- Experience working with large-scale document processing
Tech Stack (Representative, not exhaustive)
- Python, APIs, async processing
- LLM APIs + embeddings
- SQL / NoSQL databases
- Cloud infrastructure (AWS)
- Data pipelines and streaming systems
- Vector databases
Best Regards,
Ashish Singh
Truehire Staffing
5900 Balcones Drive, Suite 100, Austin, TX 78731
Email ID:
Web: