Job Description
Key Responsibilities
- Design, build, and maintain scalable batch and streaming data pipelines for ingesting and processing large datasets
- Transform, model, and optimize data for analytics, reporting, and downstream applications
- Implement data validation, monitoring, and security controls to ensure data quality, reliability, and compliance
- Contribute to the design and evolution of the data platform with a focus on scalability, performance, and maintainability
- Collaborate with BI, analytics, AI, and product teams to deliver data solutions aligned with business needs
- Develop automated workflows and observability mechanisms to ensure pipeline reliability and system visibility
- Create and maintain documentation for pipelines, data models, and platform components
- Evaluate and improve tools, frameworks, and processes to enhance efficiency and maintainability
Required Qualifications
- Strong experience with Databricks, Apache Spark, and PySpark for large-scale data processing
- Experience building and optimizing data pipelines at scale, including parallelization and performance tuning
- Experience with near real-time or streaming data systems
- Proficiency in Python and SQL for data engineering and transformation workflows
- Experience with ETL/ELT processes and tools
- Hands-on experience with cloud data platforms (Azure, AWS, or Google Cloud Platform)
- Solid understanding of data modeling and dataset design for analytics and downstream applications
- Experience tuning queries and optimizing compute performance
- Knowledge of data governance, security, and compliance practices
- Strong communication skills and ability to work cross-functionally
Preferred Qualifications
- Experience with cloud platforms (Azure, AWS, or Google Cloud Platform)
- Experience with vector databases and embedding-based systems
- Experience with streaming frameworks and data quality tools
- Familiarity with knowledge graphs and graph-based data modeling
- Experience with CI/CD pipelines and deployment automation
- Familiarity with BI tools and machine learning pipelines
Education & Experience
- Bachelor's degree or equivalent experience
- 3-7 years of data engineering experience