Cloud Data SRE

Remote • Posted 5 hours ago • Updated 5 hours ago

Full Time

Remote

Up to $130,000/yr

Job Details

Skills

HDFS
Hadoop
Kubernetes
SRE
Production Support

Summary

Mandatory Skills:

6-8 years of experience in Data SRE / Production Support roles.

Strong knowledge of:

Spark job execution and tuning

Hadoop ecosystem (HDFS, YARN)

Kubernetes basics

Serverless Spark environments

Hands-on experience with monitoring, troubleshooting, alerting, and incident response.

Comfort with shell scripting / Python for automation (nice to have, not mandatory; the role is not coding-heavy).

Job description:

Cloud Data SRE (Spark / Data Platform), 6-8 Years Experience

Role Overview

We are looking for an experienced Cloud Data SRE with 6-8 years of relevant experience to support, manage, and optimize Spark-based data workloads in production. This role is not development-focused; instead, it emphasizes production support, troubleshooting, system reliability, platform migration, and operational excellence across Spark and data ecosystem components.

Key Responsibilities

Production Support & Incident Management

Provide on-call support for production alerts and critical issues.

Perform log analysis, debug application failures, and drive quick resolution.

Handle incident management, root-cause analysis, and permanent remediation.

Conduct alert retrospectives, reduce noise, and fine-tune alert thresholds.

Monitoring & Operational Excellence

Monitor Spark jobs, data pipelines, and underlying infrastructure across Hadoop/Kubernetes/serverless platforms.

Manage server health, Hadoop cluster nodes, and disk utilization.

Configure resource parameters and optimize Spark job performance.

Support developers by helping diagnose and resolve job issues.

Data & Platform Management

Manage data access, quotas, file permissions, and HDFS/Kube resources.

Handle data management operations including data copy, DR, retention planning, and utilization checks.

Tooling & Automation

Build/maintain tools for automation, reporting, dashboarding, and incident analysis.

Improve operational efficiency through scripts, utilities, and internal platforms.

Migration Projects

Migrate projects from:

Legacy schedulers to Data Platform

Hadoop HDFS to ACOS

YARN / Kubernetes to Serverless Spark

Dice Id: 91081485
Position Id: 8938297