Role: Data Engineer
Location: New York (100% onsite; candidates must be local to NY)
Duration: Long-term project
Must have:
Languages & Scripting: Spark, Python, Java, Scala, Hive, Kafka, SQL
Cloud Platforms: AWS or Google Cloud Platform
Data Warehousing & Analytics: Redshift, Snowflake, or BigQuery
Data Integration & ETL: AWS Glue, AWS EMR, Spark, Databricks
CI/CD: AWS CodePipeline, Jenkins, CloudFormation, Docker, Kubernetes
JD:
· Results-driven Data Engineer with 12 years in IT, including a decade of data engineering expertise across cloud platforms.
· Extensive experience utilizing Google Cloud Platform (GCP) services, including BigQuery, Dataflow, Dataprep, and Pub/Sub, for data engineering solutions.
· Proficient in building and managing Google Cloud Platform data pipelines with tools like Cloud Composer and Cloud Dataflow.
· Proven ability in developing and deploying applications on Google Kubernetes Engine (GKE).
· Strong background in implementing security and compliance on Google Cloud Platform, ensuring data privacy and regulatory adherence.
· Track record of optimizing cost and resource usage within Google Cloud Platform environments.
· Skilled in AWS services such as Amazon EMR, Redshift, and Glue for efficient data processing.
· Expertise in architecting scalable, cost-effective solutions on AWS, with proficiency in configuring AWS Lambda for serverless computing.
· Adept at setting up AWS Kinesis streams to process real-time data, enhancing system responsiveness and data-driven decision-making.
· Proficient in leveraging AWS DynamoDB to create scalable, low-latency NoSQL databases for dynamic applications.
· Deep expertise in optimizing and managing Amazon Redshift data warehouses to deliver high-performance analytics and business insights.
· Experienced in integrating AWS services into CI/CD pipelines, streamlining automation for continuous integration, delivery, and deployment.
· Skilled in setting up and securing AWS Virtual Private Cloud (VPC) environments.
· Proficient in managing Azure virtual machines (VMs) for cloud infrastructure operations.
· Extensive experience managing on-premises data infrastructure, including data warehouses and databases.
· Familiar with AWS DevOps practices for continuous integration and deployment.
· Expertise in using Git for version control in dbt projects, ensuring proper tracking and documentation of data model changes.
· Skilled in performance optimization and tuning of on-premises data systems.
· Proficient in data migration strategies between on-premises and cloud environments.
· Strong troubleshooting skills in resolving issues within on-premises data systems.
· Proven ability to maintain high availability and disaster recovery solutions in on-premises environments.
· Experienced in implementing CI/CD pipelines using tools like Jenkins and GitLab CI/CD.
· Adept in automated testing processes, including unit, integration, and regression testing.
· Skilled in gathering and analyzing project requirements to ensure alignment with business goals.
· Experienced in Agile project management, contributing to successful outcomes through data-driven analytics and collaborative teamwork.