Design, develop, and maintain scalable, high-performance data pipelines for ingestion, processing, and transformation.
Build and optimize ETL/ELT workflows handling 30+ TB of data each day.
Architect scalable and distributed solutions for data processing and storage.
Work closely with data science, analytics, and platform teams to deliver clean, reliable, and well-structured datasets.
Implement data quality validation, monitoring, and error-handling processes.
Optimize query and pipeline performance for large datasets.
Manage both structured and unstructured data from multiple sources.
Contribute to architectural discussions and recommend improvements to scalability and reliability.
Adhere to best practices in data governance, security, and lifecycle management.
Bachelor’s or Master’s degree in Computer Science, Information Systems, or a related field.
5+ years of experience in Data Engineering or related roles (16+ years of overall experience preferred).
Proven expertise in building large-scale data pipelines and distributed data processing systems.
Strong programming skills in Python, Scala, or Java for data engineering applications.
Proficiency in SQL and experience with big data ecosystems (Spark, Hadoop, etc.).
Hands-on experience with data lakes, data warehouses, and ETL/ELT frameworks.
Familiarity with workflow orchestration tools such as Airflow.
Strong analytical and problem-solving skills, with attention to performance and data reliability.
Experience with cloud data platforms (AWS, Google Cloud Platform, or Azure).
Exposure to real-time/streaming platforms such as Apache Kafka.
Knowledge of containerization and orchestration tools such as Docker and Kubernetes.
Experience in financial analytics or supporting quantitative research environments.
Understanding of data governance, metadata management, and data cataloging tools.