Senior Data Engineer (Databricks / PySpark / AWS)
Location: New York, NY (local candidates only)
Work Model: 100% Onsite
Contract: Long-term contract (12+ months, open-ended) with right to hire
Rate: $85-$100/hr (W2 only)
Employment Type: W2 only | No C2C or 3rd-party agencies
Security Requirement: L2 Security Clearance required
Top Required Skills: Databricks (SaaS), Python, PySpark, Advanced SQL, ETL/ELT
We are seeking a Senior Data Engineer to join a highly collaborative, onsite data engineering team supporting large-scale cloud migration and data platform modernization initiatives. This is a hands-on senior role requiring deep expertise in Databricks, PySpark, and SQL, with a strong focus on building, deploying, and optimizing end-to-end, AWS-based ETL pipelines.
The ideal candidate can not only execute solutions but also evaluate existing architectures, identify gaps, and propose improved approaches using Databricks and modern data engineering best practices. This role operates in an Agile development environment and requires close collaboration with engineering peers and business stakeholders.
Key Responsibilities
- Design, develop, and maintain scalable ETL/ELT pipelines using Databricks and Apache Spark (PySpark)
- Lead and support the migration of applications and data pipelines from on-premises environments to AWS
- Build robust data ingestion workflows from structured, semi-structured, and unstructured data sources
- Write, optimize, and maintain complex SQL queries across RDBMS, data lakes, and federated data environments
- Analyze existing data solutions and recommend alternative designs or improvements aligned with performance, scalability, and maintainability goals
- Develop reusable frameworks and components to standardize data processing patterns
- Tune and optimize Spark jobs for performance and scalability in cloud environments
- Build, deploy, and support code across environments from development through production, using version control and CI/CD pipelines
- Implement and enforce best practices for data quality, validation, security, and governance
- Partner closely with data architects, analysts, and business stakeholders to translate data requirements into technical solutions
- Participate in Agile ceremonies and contribute to sprint planning, estimation, and delivery
- Troubleshoot, debug, and resolve production data pipeline issues
Required Qualifications
- Strong hands-on experience with Databricks SaaS, Python, and PySpark (must be able to independently build ETL pipelines)
- Expert-level SQL skills, including writing and optimizing complex queries (SQL proficiency will be assessed during interviews)
- Experience building and supporting AWS-based ETL solutions, including Glue, EC2, EMR, and S3
- Solid understanding of Data Lake and Lakehouse architectures
- Experience working with RDBMS and large-scale analytical datasets
- Proven experience deploying code using version control and CI/CD pipelines
- Ability to work onsite full-time and collaborate closely with cross-functional teams
- Strong communication skills, positive attitude, and a solution-oriented mindset
- Candidates must be comfortable owning solutions end-to-end, from design through production support
Education
Bachelor's degree in Computer Science or equivalent professional experience
Preferred / Nice-to-Have Skills
- Scala
- Experience with Starburst or Trino and federated SQL architectures
- Containerization and orchestration tools such as Docker and Kubernetes
- AWS IAM, networking, and monitoring tools
- Infrastructure-as-code tools such as Terraform