Production Support Engineer (DevOps / Streaming Platform)

Hybrid in Atlanta, GA, US • Posted 2 days ago • Updated 2 days ago

Contract W2

6 Months

No Travel Required

Hybrid

Depends on Experience

Job Details

Skills

Video Streaming Platforms
OTT Services
Live Systems
CDN Architectures
DASH
HLS
Terraform
Kubernetes
CI/CD

Summary

Job Title: Production Support Engineer (DevOps / Streaming Platform)
Location: Atlanta, GA (Hybrid)
Hire Type: Contract (6 months+)

Summary
The client is building a dedicated Operational Support (L2) team responsible for the stability, availability, and operational excellence of its 24/7 live video streaming, ads, player, and real-time delivery platforms.
As an Operational Support Engineer (L2), you take end-to-end ownership of customer-impacting production incidents once they are triaged by Level 1 support. You operate directly on production systems, lead live incident resolution, and act as the operational bridge between Support, Engineering, DevOps, and customers, particularly during high-impact live events.
This is a hands-on, customer-facing role focused on incident ownership, production operations, automation, and operational scalability, not just reactive troubleshooting.

Key Responsibilities:

Incident & Operational Support
Take ownership of escalated customer issues from Level 1 Support and drive them to resolution
Troubleshoot and resolve complex, high-impact production incidents affecting live streams, VOD playback, ad insertion, DRM, and real-time WebRTC services
Operate directly on production environments, making configuration changes, CDN adjustments, and other corrective actions under established operational procedures; execute mitigations and emergency changes during live incidents when customer impact requires immediate action
Lead or actively contribute to live incident bridges involving customers, internal teams, and partners
Provide clear, timely communication during incidents, including status updates and customer-facing explanations

Infrastructure as Code & Production Operations
Work fluently with Infrastructure as Code (IaC) to understand, troubleshoot, and safely modify production environments
Leverage tools and frameworks such as:
o Terraform
o Helm
o Kubernetes manifests
o GitOps workflows
o CI/CD and deployment pipelines
Use IaC as the primary mechanism for safe, auditable, and repeatable operational changes
Collaborate with Engineering and DevOps to improve deployment reliability and operational safety
Validate and execute infrastructure or configuration changes through codified workflows

AI-Driven Operations & Automation
Leverage AI tools and automation to enhance operational efficiency and incident response
Contribute to and use:
o AI-assisted incident triage and classification
o Automated runbook execution
o AI-based pattern detection across incidents
o Intelligent alert correlation and noise reduction

Use AI to:
o Generate or improve incident communications
o Accelerate troubleshooting workflows
o Identify recurring patterns and systemic issues
Drive adoption of automation-first and AI-augmented operational practices

Pre-Event Planning & Operational Readiness
Participate in pre-event readiness planning for critical customer events
Validate system readiness through:
o Runbook checks
o Monitoring coverage validation
o Risk identification and mitigation planning
Define and rehearse incident response strategies for high-risk scenarios
Collaborate with customers and internal teams to ensure smooth event execution

On-Call & 24/7 Operations
Participate in a 24/7 on-call rotation, including nights, weekends, and holidays, as part of a global support model
Ensure smooth handovers between shifts and regions
Respond to critical alerts within defined SLAs for stream health, player errors, and delivery infrastructure

Root Cause & Continuous Improvement
Perform or contribute to root cause analysis (RCA) for production incidents
Document findings, corrective actions, and preventive measures
Identify recurring issues and work with Engineering and Product teams to eliminate them permanently
Contribute to and improve runbooks, operational playbooks, and knowledge bases for all products (player, ads, live, and real-time streaming)

Collaboration & Engineering Feedback Loop
Work closely with Engineering teams to escalate defects, validate fixes, and support production deployments
Provide feedback on system observability, tooling gaps, and operational risks
Act as the operational voice during post-incident reviews

Required Skills & Experience:
5+ years of relevant experience in operational, support, or similar customer‑facing roles
Proven ability to own complex problems end‑to‑end and operate with a high degree of autonomy
Strong experience supporting production video streaming platforms, OTT services, and live systems
Solid troubleshooting skills across distributed systems (APIs, microservices, cloud infrastructure)
Familiarity with HLS, DASH, CMAF, WebRTC, DRM, and CDN architectures
Experience working with monitoring, alerting, and logs to diagnose live incidents (Grafana, Kibana/ELK, Prometheus, Loki)
Ability to correlate backend streaming metrics, player telemetry, and CDN signals to diagnose live customer issues end-to-end
Comfort performing controlled changes in production environments
Working knowledge of incident management and on-call operations

Operational Mindset:
Proven ability to remain calm, structured, and decisive during high-pressure incidents
Strong sense of ownership and accountability for customer outcomes
Excellent written and verbal communication skills, including customer-facing communication during incidents

Dice Id: 10360587
Position Id: 8997776
Contact the job poster

Soumen Mondal

AVP - Client Services @ Kani Solutions
