Cloud Data SRE

Remote • Posted 2 hours ago • Updated 2 hours ago
Contract (W2) • No Travel Required • Compensation: Depends on Experience

Job Details

Skills

  • SRE
  • production support
  • Hadoop
  • Spark
  • Kubernetes
  • Data platform

Summary

Job Title/Role

Cloud Data SRE

Location

Remote

Mandatory Skills

  • 6–8 years of experience in Data SRE / Production Support roles.
  • Strong knowledge of:
      • Spark job execution & tuning
      • Hadoop ecosystem (HDFS, YARN)
      • Kubernetes basics
      • Serverless Spark environments
  • Hands-on experience with monitoring, troubleshooting, alerting, and incident response.
  • Comfort with shell scripting / Python for automation (nice to have; the role is not coding-heavy).

Job Description

Cloud Data SRE (Spark / Data Platform) – 6–8 Years Experience
Role Overview
We are looking for an experienced Cloud Data SRE with 6–8 years of relevant experience to support, manage, and optimize Spark-based data workloads in production. This role is not development-focused; instead, it emphasizes production support, troubleshooting, system reliability, platform migration, and operational excellence across Spark and data ecosystem components.

Key Responsibilities

Production Support & Incident Management

  • Provide on-call support for production alerts and critical issues.
  • Perform log analysis, debug application failures, and drive quick resolution.
  • Handle incident management, root-cause analysis, and permanent remediation.
  • Conduct alert retrospectives, reduce noise, and fine-tune alert thresholds.
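As an illustrative sketch only (not part of the posting), routine log analysis like the triage described above is often scripted in Python; the log-line format and the error keywords below are assumptions for the example:

```python
import re
from collections import Counter

# Assumed log-line format: "<timestamp> <LEVEL> <component>: <message>"
LOG_LINE = re.compile(r"^\S+ (?P<level>ERROR|WARN|INFO) (?P<component>\S+): (?P<message>.*)$")

def summarize_errors(lines):
    """Count ERROR lines per component to spot where failures cluster."""
    errors = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("level") == "ERROR":
            errors[match.group("component")] += 1
    return errors

sample = [
    "2024-01-01T00:00:01 INFO driver: job started",
    "2024-01-01T00:00:02 ERROR executor-3: OutOfMemoryError",
    "2024-01-01T00:00:03 ERROR executor-3: task failed",
    "2024-01-01T00:00:04 ERROR shuffle: fetch failed",
]
print(summarize_errors(sample).most_common())  # [('executor-3', 2), ('shuffle', 1)]
```

In practice the counts would feed an incident summary or a dashboard rather than stdout.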

Monitoring & Operational Excellence

  • Monitor Spark jobs, data pipelines, and underlying infrastructure across Hadoop/Kubernetes/serverless platforms.
  • Manage server health, Hadoop cluster nodes, and disk utilization.
  • Configure resource parameters and optimize Spark job performance.
  • Support developers by helping diagnose and resolve job issues.
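A hedged sketch of the kind of resource-parameter arithmetic this work involves: on YARN, Spark reserves a per-executor memory overhead that defaults to max(384 MiB, 10% of `spark.executor.memory`), so container sizing has to account for both. The helper below is an illustration, not a tool from the posting:

```python
def yarn_container_mem_mb(executor_mem_mb, overhead_factor=0.10, min_overhead_mb=384):
    """Total memory YARN must allocate per executor:
    spark.executor.memory plus spark.executor.memoryOverhead,
    which defaults to max(384 MiB, 10% of executor memory)."""
    overhead = max(min_overhead_mb, int(executor_mem_mb * overhead_factor))
    return executor_mem_mb + overhead

# 8 GiB executors: overhead = max(384, 819) = 819 MiB
print(yarn_container_mem_mb(8192))  # 9011
# 2 GiB executors: the 384 MiB overhead floor kicks in
print(yarn_container_mem_mb(2048))  # 2432
```

Sizing executors against this total helps avoid containers being killed for exceeding their YARN allocation.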

Data & Platform Management

  • Manage data access, quotas, file permissions, and HDFS/Kube resources.
  • Handle data management operations including data copy, DR, retention planning, and utilization checks.
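As one hedged example of a quota/utilization check, the output of `hdfs dfs -count -q <path>` can be parsed into a usage fraction; the column order assumed below follows the Hadoop fs shell documentation (QUOTA, REM_QUOTA, SPACE_QUOTA, REM_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME), and the sample line is fabricated:

```python
def parse_hdfs_count_q(line):
    """Parse one line of `hdfs dfs -count -q <path>` output into a dict.
    'none'/'inf' in the quota columns mean no quota is set."""
    keys = ["quota", "rem_quota", "space_quota", "rem_space_quota",
            "dir_count", "file_count", "content_size", "path"]
    return dict(zip(keys, line.split()))

def quota_used_fraction(entry):
    """Fraction of the space quota consumed, or None if no quota is set."""
    if entry["space_quota"] in ("none", "inf"):
        return None
    quota = int(entry["space_quota"])
    remaining = int(entry["rem_space_quota"])
    return (quota - remaining) / quota

# Fabricated sample: 10 GiB space quota with 2 GiB remaining -> 80% used
sample = "1000 900 10737418240 2147483648 5 95 8589934592 /data/project"
entry = parse_hdfs_count_q(sample)
print(round(quota_used_fraction(entry), 2))  # 0.8
```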

Tooling & Automation

  • Build/maintain tools for automation, reporting, dashboarding, and incident analysis.
  • Improve operational efficiency through scripts, utilities, and internal platforms.
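A minimal sketch of the reporting utilities this covers, tying back to the alert-noise reduction mentioned under incident management; the alert names and threshold are invented for illustration:

```python
from collections import Counter

def noisy_alerts(alert_log, threshold=3):
    """Rank alert names by firing count and flag those at or above a
    noise threshold as candidates for re-tuning or suppression."""
    counts = Counter(name for name, _ in alert_log)
    return [(name, n) for name, n in counts.most_common() if n >= threshold]

# (alert_name, timestamp) pairs, e.g. exported from the alerting system
log = [
    ("disk_pct_high", "t1"), ("disk_pct_high", "t2"), ("disk_pct_high", "t3"),
    ("disk_pct_high", "t4"), ("spark_job_failed", "t5"), ("node_down", "t6"),
]
print(noisy_alerts(log))  # [('disk_pct_high', 4)]
```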

Migration Projects

  • Migrate projects from:
      • Legacy schedulers → Data Platform
      • Hadoop HDFS → ACOS
      • YARN / Kubernetes → Serverless Spark
  • Support data and compute migration initiatives end-to-end.


  • Dice Id: 10118140
  • Position Id: 8936756
