Cloud Data SRE

Cupertino, CA, US • Posted 7 hours ago • Updated 7 hours ago
Contract W2
Contract Corp To Corp
On-site
Depends on Experience

Job Details

Skills

  • Apache Hadoop
  • Apache Spark
  • Cloud Computing
  • Data Management
  • Data Processing
  • Disaster Recovery
  • HDFS

Summary

We are looking for a Cloud Data SRE for our client in Cupertino, CA.
Job Title: Cloud Data SRE
Job Location: Cupertino, CA
Job Type: Contract
Pay Range: $55/hr - $60/hr
Job Overview:

Responsibilities:

  • Provide on-call support for production alerts and critical incidents.
  • Perform log analysis, troubleshoot application failures, and ensure timely resolution.
  • Manage incident lifecycle including root cause analysis and permanent fixes.
  • Conduct alert reviews, reduce noise, and optimize alert thresholds.
  • Monitor Spark jobs, data pipelines, and infrastructure across Hadoop, Kubernetes, and serverless platforms.
  • Manage server health, cluster nodes, and disk utilization.
  • Optimize Spark job performance and configure resource parameters.
  • Support developers in diagnosing and resolving data processing issues.
  • Manage data access, permissions, quotas, and platform resources.
  • Perform data management activities including data copy, disaster recovery, and retention planning.
  • Build and maintain automation tools for reporting, monitoring, and incident analysis.
  • Improve operational efficiency through scripting and internal tooling.
  • Lead and support migration initiatives from legacy schedulers to modern data platforms.
  • Migrate workloads from Hadoop HDFS to ACOS and from YARN/Kubernetes to serverless Spark environments.
  • Support end-to-end data and compute migration activities.

Required Experience:

  • 6-8 years of experience in Data SRE or Production Support roles.
  • Strong experience with Spark job execution and performance tuning.
  • Hands-on knowledge of Hadoop ecosystem including HDFS and YARN.
  • Basic understanding of Kubernetes and serverless Spark environments.
  • Experience with monitoring, alerting, troubleshooting, and incident management processes.
  • Familiarity with shell scripting or Python for automation (preferred but not mandatory).
  • Dice Id: 10516350
  • Position Id: CA_CDSM_0409