Job Description:
We are seeking a highly experienced and skilled Senior Python/DAG Developer to join our data engineering team. In this role, you will be responsible for designing, developing, and maintaining complex data pipelines and workflows. The ideal candidate will have a deep understanding of data orchestration principles, extensive experience with Python, and a proven track record of building robust and scalable Directed Acyclic Graphs (DAGs) using tools like Apache Airflow. You will be a key player in our effort to build the next generation of data infrastructure, ensuring data is processed efficiently and reliably across the organization.
Key Responsibilities:
Design & Development: Architect, build, and maintain efficient and scalable data pipelines using Python and DAG-based orchestration tools (e.g., Apache Airflow, Dagster, Prefect).
Orchestration: Develop, schedule, and monitor complex data workflows, ensuring timely and accurate data delivery for business intelligence, analytics, and machine learning initiatives.
Optimization: Identify performance bottlenecks and refactor data pipelines to improve efficiency, reliability, and cost-effectiveness.
Collaboration: Work closely with data scientists, analysts, and other engineers to understand data requirements and deliver solutions that meet business needs.
Code Quality: Uphold and promote best practices in coding, including code reviews, documentation, and automated testing to ensure the long-term maintainability of data pipelines.
Troubleshooting: Diagnose and resolve issues within data pipelines and orchestration systems, responding to incidents and minimizing downtime.
Mentorship: Act as a subject matter expert and mentor junior developers, sharing knowledge of best practices in Python development and data engineering.
Required Qualifications:
Experience: A minimum of 7 years of professional experience in software development, with a strong focus on Python for data engineering and ETL (Extract, Transform, Load) processes.
Python: Expert-level proficiency in Python, including writing clean, well-documented, and production-ready code.
DAGs & Orchestration: Extensive hands-on experience (3-5 years) designing, implementing, and managing data pipelines using DAG-based orchestration platforms like Apache Airflow. A strong understanding of Airflow concepts (operators, sensors, hooks, XComs) is essential; a minimal illustrative DAG appears after this list.
Database Skills: Solid experience with SQL and relational databases (e.g., PostgreSQL, MySQL). Experience with NoSQL databases and data warehouses (e.g., Snowflake, BigQuery) is a plus.
Cloud Platforms: Proven experience working with at least one major cloud provider (AWS, Google Cloud Platform, or Azure), including familiarity with their data-related services (e.g., S3, Cloud Storage, EMR, Dataproc).
Data Formats: Experience with various data formats (e.g., Parquet, Avro, JSON) and data transformation techniques.
Version Control: Strong knowledge of Git and collaborative development workflows.
Problem-Solving: Excellent analytical and problem-solving skills with meticulous attention to detail.
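For context on the Airflow concepts named above, the following is a minimal sketch of a TaskFlow-style DAG using recent Airflow 2.x syntax. The DAG id, schedule, and task bodies are illustrative placeholders, not code from our pipelines.

    from datetime import datetime

    from airflow.decorators import dag, task

    @dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
    def daily_example_etl():
        # Each @task wraps a Python callable in an operator; return values
        # are handed between tasks via XComs behind the scenes.
        @task
        def extract():
            return [1, 2, 3]  # stand-in for a real source query or hook call

        @task
        def transform(records):
            return [r * 2 for r in records]

        @task
        def load(records):
            print(f"loading {len(records)} records")  # stand-in for a warehouse write

        load(transform(extract()))

    daily_example_etl()

Classic operators, sensors, and hooks plug into the same dependency graph; the decorator form simply makes the XCom hand-off between tasks implicit.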
Preferred Qualifications:
Experience with streaming data technologies (e.g., Kafka, Spark Streaming, Flink).
Knowledge of containerization technologies (Docker, Kubernetes).
Experience with CI/CD pipelines for data engineering workflows.
Familiarity with data governance and security best practices.
Bachelor's or Master's degree in Computer Science, Engineering, or a related field.