Key Responsibilities:
• Design, develop, and maintain scalable data pipelines using Python and PySpark
• Build and optimize ETL/ELT processes in Databricks
• Implement data transformation and ingestion workflows
• Work with structured and unstructured datasets
• Ensure data quality, integrity, and compliance with IRS data governance policies
• Collaborate with data scientists and analysts to support analytics use cases
• Optimize the performance of large-scale distributed data systems
• Support cloud-based data architecture initiatives
Required Qualifications:
• 5+ years of experience in Data Engineering
• Strong hands-on experience with Python, PySpark, and Databricks
• Experience with large-scale distributed data processing
• Experience working in a federal government environment (IRS preferred)
• Knowledge of data warehousing concepts and data modeling
• Experience with SQL and relational databases
• Familiarity with cloud platforms (Azure preferred)
Preferred Qualifications:
• Experience supporting IRS modernization programs
• Knowledge of federal data security and compliance standards
• Experience with CI/CD pipelines and DevOps practices