Job Details:
Duration: 3+ months (the project extends quarterly based on budget and performance)
Work Location: Two weeks of onsite work in a nearby coworking space (such as Regus offices in your city), followed by remote work thereafter.
Job Overview:
We are seeking a highly skilled, detail-oriented Data Engineer with strong expertise in SQL and Apache Spark to join our growing data team. You will build scalable, efficient data pipelines, transform large volumes of data, and support data analytics initiatives across the organization.
Key Responsibilities:
- Design, develop, and maintain robust data pipelines and ETL/ELT processes using Spark and SQL (see the pipeline sketch after this list).
- Process and transform large datasets from varied sources while maintaining high performance and data quality.
- Optimize Spark jobs for performance, scalability, and cost-efficiency in distributed environments.
- Work closely with data analysts, data scientists, and business stakeholders to understand data requirements and deliver solutions.
- Build and manage data models and data marts in cloud data warehouses (e.g., Snowflake, Redshift, BigQuery).
- Ensure data accuracy, integrity, and availability across systems.
- Participate in code reviews, troubleshooting, and performance tuning of existing data processes.
- Maintain documentation for data flows, transformations, and processes.
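For context on the day-to-day work, the following is a minimal PySpark sketch of a pipeline along these lines; the bucket paths, the orders table, and its columns (order_id, created_at, amount) are hypothetical placeholders, not references to any actual system.

```python
# A minimal sketch of the kind of Spark + SQL pipeline described above.
# All paths, the table name, and the columns are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_daily_etl").getOrCreate()

# Extract: read raw JSON events from a (hypothetical) data-lake path.
raw = spark.read.json("s3://example-bucket/raw/orders/")

# Transform: deduplicate, type the timestamp, and drop invalid rows.
orders = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_date", F.to_date("created_at"))
       .filter(F.col("amount") > 0)
)

# SQL step: register the DataFrame as a view and aggregate with plain SQL.
orders.createOrReplaceTempView("orders")
daily = spark.sql("""
    SELECT order_date, COUNT(*) AS order_count, SUM(amount) AS revenue
    FROM orders
    GROUP BY order_date
""")

# Load: write partitioned Parquet for downstream analytics.
daily.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-bucket/curated/orders_daily/"
)
```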
Required Skills & Qualifications:
- 9+ years of hands-on experience as a Data Engineer or in a similar role.
- Strong proficiency in SQL for data transformation, querying, and performance tuning.
- Experience working with Apache Spark (PySpark or Scala) for large-scale data processing (see the join-tuning sketch after this list).
- Familiarity with data lakes, data warehouses, and cloud data platforms (AWS, Google Cloud Platform, Azure).
- Proficiency in scripting with Python or Scala.
- Solid understanding of data modeling concepts and data architecture.
- Experience with version control tools like Git.
- Strong analytical and problem-solving skills.
- Excellent communication and collaboration skills.
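As one concrete illustration of the Spark experience listed above, the sketch below shows a broadcast join, a common tuning technique for joining a large fact table to a small dimension table; the dataset paths and join key are hypothetical.

```python
# A small illustration of one common Spark tuning technique: broadcasting a
# small dimension table so the join avoids a full shuffle. The dataset paths
# and the region_id join key are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("join_tuning_demo").getOrCreate()

facts = spark.read.parquet("s3://example-bucket/curated/orders/")
dims = spark.read.parquet("s3://example-bucket/curated/regions/")  # small table

# broadcast() hints Spark to ship the small table to every executor,
# replacing an expensive shuffle join with a local hash join.
joined = facts.join(broadcast(dims), on="region_id", how="left")

joined.explain()  # the physical plan should show a BroadcastHashJoin
```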
Preferred Qualifications:
- Experience with Airflow or similar workflow orchestration platforms (see the DAG sketch after this list).
- Knowledge of cloud-native services (e.g., AWS Glue, EMR, Databricks).
- Experience with real-time data streaming tools such as Kafka or Spark Streaming.
- Exposure to BI tools like Tableau, Power BI, or Looker.
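To illustrate the orchestration experience mentioned above, here is a minimal Airflow DAG sketch that schedules a daily spark-submit run; the DAG id, schedule, and script path are hypothetical, and the imports assume Airflow 2.4+.

```python
# A minimal Airflow sketch for scheduling a daily spark-submit run.
# The DAG id, schedule, and script path are hypothetical; imports assume
# Airflow 2.4+.
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="orders_daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    run_etl = BashOperator(
        task_id="spark_submit_orders_etl",
        bash_command="spark-submit /opt/jobs/orders_daily_etl.py",  # hypothetical path
    )
```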