Big Data Engineer - W2 Only

Rockville, MD, US • Posted 14 hours ago • Updated 14 hours ago

Contract W2

12 Months

No Travel Required

Able to Sponsor

On-site

Depends on Experience

NGTalentTech Group LLC

Job Details

Skills

Big Data

Summary

Role: Big Data Engineer
Location: Rockville, MD (Onsite)
Duration: 12+ Months with possible extension

Job Description Summary
We are seeking a highly skilled and experienced Big Data Engineer to design, develop, and optimize large-scale data processing systems. In this role, you will work closely with cross-functional teams to architect data pipelines, implement data integration solutions, and ensure the performance, scalability, and reliability of big data platforms. The ideal candidate will have deep expertise in distributed systems, cloud platforms, and modern big data technologies such as Hadoop, Spark, and Kubernetes-based orchestration.

Responsibilities:
Design, develop, and maintain large-scale data processing pipelines using Big Data technologies (e.g., Hadoop, Spark, Python, Scala).
Architect and deploy containerized big data workloads on Amazon EMR on EKS (Elastic Kubernetes Service).
Design and implement Kubernetes-based infrastructure for running Spark applications at scale.
Implement data ingestion, storage, transformation, and analysis solutions that are scalable, efficient, and reliable.
Stay current with industry trends and emerging Big Data technologies to continuously improve the data architecture.
Collaborate with cross-functional teams to understand business requirements and translate them into technical solutions.
Optimize and enhance existing data pipelines for performance, scalability, and reliability.
Develop automated testing frameworks and implement continuous testing for data quality assurance.
Conduct unit, integration, and system testing to ensure the robustness and accuracy of data pipelines.
Work with data scientists and analysts to support data-driven decision-making across the organization.
Write and maintain automated unit, integration, and end-to-end tests.
Monitor and troubleshoot data pipelines in production environments to identify and resolve issues.
Manage Kubernetes clusters, pods, services, and deployments for big data workloads.

Essential Technical Skills:
AI Tool Proficiency:
Hands-on experience with AI development tools (GitHub Copilot, Amazon Q Developer, ChatGPT, Claude, etc.)

Big Data Technologies:
Experience with big data technologies such as Hadoop, Spark, Hive, and Trino
Understanding of common issues such as data skew and strategies to mitigate it, working with petabyte-scale data volumes, and troubleshooting job failures caused by resource limitations, bad data, and scalability challenges.
Real-world experience with debugging and mitigation strategies.

Container Orchestration & Kubernetes:
Strong experience with Kubernetes architecture, concepts, and operations (pods, services, deployments, namespaces, ConfigMaps, Secrets)
Hands-on experience with Amazon EMR on EKS (Kubernetes) for running Apache Spark workloads
Experience with Kubernetes resource management, scheduling, and auto-scaling
Knowledge of Helm charts for deploying and managing applications on Kubernetes
Understanding of Kubernetes networking, storage (PVs, PVCs), and security best practices
Experience with kubectl and Kubernetes YAML manifests
Ability to troubleshoot Kubernetes cluster issues, pod failures, and resource constraints
Experience integrating Spark with Kubernetes operators and dynamic allocation

AI Skills:
Prompt Engineering: Proficiency in crafting effective prompts for AI coding assistants and analysis tools
AI Workflow Design: Experience redesigning development processes to leverage AI capabilities
Data Analysis: Ability to interpret AI-generated insights and translate them into actionable team improvements
Change Management: Experience leading teams through AI adoption and workflow transformation

Apache Spark (Development, Internals & Tuning):
Deep understanding of Spark's core architecture - executors, tasks, stages, DAG
Expertise in Spark performance tuning techniques: partitioning, caching, broadcast joins, etc.
Experience troubleshooting slow-running or stuck jobs and resource issues in Spark
Proven ability to optimize Spark jobs for large-scale datasets
Experience running Spark on Kubernetes and understanding Spark-on-K8s architecture

Cloud Technologies:
Experience with AWS services like S3, EMR, EMR on EKS, Glue, Lambda, Athena, etc.
Hands-on experience using S3 with Spark (e.g., dealing with file formats, consistency issues)
Strong experience with Amazon EKS (Elastic Kubernetes Service) architecture and best practices
Experience with AWS IAM roles for service accounts (IRSA) for Kubernetes workloads
Knowledge of AWS networking for EKS (VPC, subnets, security groups)
Experience with AWS monitoring and logging tools (CloudWatch, CloudTrail) for Kubernetes workloads
Serverless knowledge (Lambda, Fargate)

Programming - Python or Scala:
Ability to write clean, modular, and performant code
Experience with functional programming concepts (e.g., immutability, higher-order functions)
Real-world use cases where scalable data processing code was implemented
Strong understanding of collections, concurrency, and memory management

SQL Skills (Window Functions, Joins, Complex Queries):
Proficiency with SQL window functions, multi-table joins, and aggregations
Ability to write and optimize complex SQL queries
Experience handling edge cases like NULLs, duplicates, and ordering

Good to have:
Experience with managing production data pipelines/ETL systems
Experience with CI/CD pipelines (Jenkins, GitLab CI, GitHub Actions, ArgoCD)
Experience with Infrastructure as Code (Terraform, CloudFormation) for provisioning EKS clusters and EMR on EKS
Experience writing comprehensive test cases and test automation
Experience with Docker and container image optimization
Knowledge of service mesh technologies (Istio, Linkerd)
Experience with monitoring and observability tools (Prometheus, Grafana, ELK stack)
AWS certifications (AI Practitioner, Solutions Architect, Big Data Specialty) or Kubernetes certifications such as CKA/CKAD
Experience with GitOps practices for Kubernetes deployments

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

Dice Id: 91171995
Position Id: 8976342

Company Info

About NGTalentTech Group LLC

NGTalentTech Group LLC is a one-stop hub for software development expert support and advisory.

We are an IT consulting and services company that develops solutions for different platforms and provides the best resourcing at all times.

Our core team finds suitable candidates with strong technical skills and a passion for their work.

Our well-qualified training team offers corporate training in technologies such as Front End, Java UI, Big Data, .NET, Java, iOS/Android, Informatica, WebSphere Admin, WebLogic Admin, Selenium Automation QA, Salesforce, etc.

Contact the job poster

Sudhir Moota

Recruiter @ NGTalentTech Group LLC
