Role: Data Engineer - Python/PySpark
Location: Irving, TX (3 days onsite/week)
Full-time
Job Description:
Requirements:
• Strong hands-on development experience in Python, PySpark, and SQL.
• Experience building large-scale ETL/ELT pipelines for structured and unstructured data.
• Deep understanding of Spark and distributed computing fundamentals (transformations, shuffles, optimization), along with experience in the broader Hadoop ecosystem.
• Proficiency with Git-based repositories (Bitbucket / GitHub).
• Experience working with AWS, Azure, or Google Cloud Platform environments.
• Strong understanding of database design, data modeling, and warehouse schemas (star/snowflake).
• Experience with CI/CD automation and pipeline development.
• Strong analytical and troubleshooting skills for resolving complex data issues.
• Ability to collaborate with cross-functional teams and convert business requirements into technical solutions.
Responsibilities:
• Design, develop, and maintain robust, scalable ETL/ELT pipelines.
• Write efficient, reusable, and scalable code in Python and PySpark for distributed data processing.
• Review existing data engineering code and identify opportunities for refactoring or performance improvement.
• Implement data validation, cleansing, reconciliation, and quality checks across the data lifecycle.
• Collaborate with IT and business stakeholders to understand data requirements and translate them into solutions.
• Monitor pipeline performance, troubleshoot failures, and optimize for latency, throughput, and cost.
• Participate in code reviews, enforce coding standards, and contribute to engineering best practices.
• Build and maintain CI/CD pipelines for testing, packaging, and deployment of data pipelines.
• Ensure data reliability, security, and consistency across environments.
• Work with cloud services and big data platforms to support modern data architecture.