Title: Big Data Engineer (W2 only)
Duration: 6+ months
Work location: Rockville, MD or Tysons Corner, VA
Visa: Any
Mode of interview: 2 rounds of video calls
Seeking a highly skilled and experienced Big Data Engineer to design, build, and optimize large-scale data platforms and distributed processing systems at our FinTech customer. This role is critical in enabling data-driven decision-making across the organization by delivering scalable, reliable, and high-performance data solutions.
The ideal candidate has deep expertise in distributed computing, cloud platforms, and modern big data technologies such as Apache Spark, Hadoop, Hive, and Trino. This individual will work closely with data scientists, analysts, product teams, and engineering stakeholders to architect and implement robust data pipelines and enterprise-grade data platforms. The role also requires strong software engineering practices, proficiency with AI-assisted development, and the ability to optimize systems that handle petabyte-scale data.
Responsibilities
Design, develop, and maintain large-scale data pipelines using modern big data technologies such as Spark, Hadoop, Hive, and Trino.
Build scalable and reliable solutions for data ingestion, transformation, storage, and analytics.
Architect distributed data platforms capable of processing massive (petabyte-scale) datasets.
Optimize and enhance existing data pipelines for performance, scalability, cost efficiency, and reliability.
Implement automated testing frameworks and continuous validation for data quality and pipeline accuracy.
Develop unit, integration, and end-to-end test strategies for data platforms.
Collaborate with cross-functional teams to translate business requirements into scalable data solutions.
Support data scientists and analytics teams by delivering high-quality, production-ready datasets.
Monitor, troubleshoot, and resolve data pipeline issues in production environments.
Investigate and resolve challenges such as data skew, resource constraints, job failures, and large-scale system bottlenecks.
Apply Spark tuning techniques including partitioning, caching, broadcast joins, and performance optimization (see the sketch after this list).
Ensure strong software engineering practices, including version control, code quality, and CI/CD automation.
Stay current with emerging big data, cloud, and AI technologies to continuously improve data architecture.
Drive AI-enabled development practices, including prompt engineering, AI-assisted coding, and workflow optimization.
Partner with stakeholders to ensure regulatory, governance, and financial data integrity requirements are met.
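To make the Spark tuning bullet above concrete, here is a minimal PySpark sketch combining explicit repartitioning, caching, and a broadcast join. The S3 paths, table names, and columns (trades, symbols, symbol_id, and so on) are illustrative assumptions, not part of the actual engagement.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("tuning-sketch").getOrCreate()

# Hypothetical inputs: a large fact table and a small dimension table.
trades = spark.read.parquet("s3://example-bucket/trades/")
symbols = spark.read.parquet("s3://example-bucket/symbols/")

# Repartition on the join key to control parallelism and mitigate skew.
trades = trades.repartition(400, "symbol_id")

# Cache a DataFrame that several downstream jobs reuse, avoiding recomputation.
trades.cache()

# Broadcast the small table so the join runs map-side, skipping a shuffle
# of the large side.
enriched = trades.join(F.broadcast(symbols), on="symbol_id", how="left")

daily_volume = enriched.groupBy("symbol_id", "trade_date").agg(
    F.sum("quantity").alias("total_volume")
)
daily_volume.write.mode("overwrite").parquet("s3://example-bucket/daily_volume/")
```

The broadcast hint is the key move: it trades a small amount of executor memory for the elimination of an expensive shuffle, which is typically the dominant cost when joining petabyte-scale fact data to a small lookup table.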
Qualifications
Required:
Bachelor's degree in Computer Science, Information Systems, or a related discipline, or equivalent practical experience.
5+ years of experience designing and implementing big data and distributed systems.
Strong expertise in Apache Spark and its architecture (executors, stages, DAG, tasks).
Hands-on experience with big data technologies such as Hadoop, Hive, and Trino.
Strong proficiency in Python, Scala, or Java with a focus on scalable and modular code.
Extensive experience writing advanced SQL queries including window functions, complex joins, and aggregations (see the sketch at the end of this listing).
Experience working with large-scale datasets and troubleshooting performance or scalability challenges.
Hands-on experience with cloud platforms such as AWS, including S3, EMR, Glue, Lambda, and Athena.
Experience designing and maintaining production ETL and data processing systems.
Strong understanding of distributed system performance tuning and resource optimization.
Experience implementing CI/CD pipelines and automated testing in data engineering environments.
Strong understanding of Agile methodologies such as Scrum and Kanban.
Excellent communication and collaboration skills.
Ability to work in fast-paced, dynamic environments and manage competing priorities.
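As a companion to the SQL requirement above, here is a minimal Spark SQL sketch of a common window-function pattern: deduplicating to the latest snapshot per key. The account_balances table and its columns are hypothetical, used only for illustration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-sketch").getOrCreate()

# Assumes a table named account_balances is already registered (e.g. in the
# Hive metastore or via createOrReplaceTempView); it is hypothetical here.
latest_balances = spark.sql("""
    WITH ranked AS (
        SELECT
            account_id,
            balance,
            snapshot_date,
            ROW_NUMBER() OVER (
                PARTITION BY account_id
                ORDER BY snapshot_date DESC
            ) AS rn
        FROM account_balances
    )
    SELECT account_id, balance, snapshot_date
    FROM ranked
    WHERE rn = 1
""")
latest_balances.show()
```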