Job Details
Skills and Qualifications
Experience in data engineering, ML platforms, or backend systems.
Strong experience with Python, SQL, and modern data frameworks (e.g., Apache Spark, Airflow, dbt, Kafka).
Familiarity with LLM-based agent frameworks (e.g., LangChain, LlamaIndex, AutoGen, CrewAI).
Experience with vector databases (e.g., Pinecone, Weaviate, FAISS) and semantic search.
Experience building event-driven architectures (Kafka, Pub/Sub, or similar).
Understanding of LLM orchestration, prompt engineering, and tool-use paradigms.
Experience working with cloud platforms (AWS, Google Cloud Platform, or Azure).
Communication Skills: Excellent written and verbal communication in English.
Problem-Solving Skills: Strong analytical skills for identifying data issues and performance bottlenecks.
Key Responsibilities
Design and implement scalable data ingestion and transformation pipelines (real-time and batch) to support autonomous AI agents. Sources include third-party APIs, ServiceNow, Google Analytics 4 (GA4), Jira, Zendesk, and other complex systems.
Collaborate on developing Data Quality and Ops Platforms for real-time data anomaly detection and data profiling.
Develop and maintain agent memory architecture (vector stores, knowledge graphs, temporal data stores).
Integrate LLMs (e.g., OpenAI models, Claude, Gemini) and agent frameworks (e.g., LangChain, AutoGen, CrewAI, MetaGPT).
Build and manage APIs, event buses, or pub/sub systems that enable agent-to-agent communication and coordination.
Optimize platform performance, including data indexing, latency reduction, and load balancing for multi-agent orchestration.
Develop tools and dashboards for platform observability, including agent metrics, prompt lifecycle tracking, and model drift detection.
Implement security, data governance, and versioning practices in agent workflows and interactions.
Collaborate with AI research, backend, and product teams to ship features powered by multi-modal, memory-augmented agents.
Develop CI/CD processes for continuous delivery on AWS and Snowflake.
Support and Troubleshooting: Troubleshoot and resolve production issues, using observability to detect problems early.