Site Reliability Engineer

Charlotte, NC, US • Posted 3 days ago • Updated 3 days ago
Full Time
Travel Required
On-site
Depends on Experience

Job Details

Skills

  • Site Reliability & Operations
  • Ansible
  • AppDynamics

Summary

Job Title: Site Reliability Engineer (SRE)

Location: Charlotte, NC – Hybrid role (3 days on-site; in-person interview)

Visa: USC/EAD/L2 EAD

 

Please do not submit DevOps engineers who have some SRE experience. Per strict client instruction, candidates must have worked exclusively in SRE roles for the last 3 to 4 years.

Overview

This role supports mission‑critical platforms within a large, regulated enterprise environment. The Senior SRE will partner with engineering, product, and systems operations teams to drive reliability, scalability, automation, and operational excellence across complex, distributed systems.

This is not a traditional application support role. The ideal candidate has operated as part of a mature SRE team, owns reliability outcomes, and brings strong communication and consulting skills to influence senior stakeholders. The role blends hands‑on SRE engineering with production support responsibilities, with a long‑term goal of increasing SRE maturity across the organization.

 

Key Responsibilities

Site Reliability & Operations

  • Define, implement, and maintain Service Level Objectives (SLOs), Service Level Indicators (SLIs), and reliability metrics for supported platforms
  • Help introduce and mature error budget concepts as part of SRE best practices
  • Drive reliability, availability, scalability, and performance improvements for mission‑critical applications
  • Lead and execute production readiness activities, including:
    • Non‑Functional Requirements (NFRs)
    • Permit to Operate (PTO) and operational gating
  • Participate in and lead incident response, root cause analysis (RCA), and post‑incident remediation efforts

 

Automation & Observability

  • Identify and implement automation opportunities to reduce manual toil and operational risk
  • Use a centralized automation platform (Ansible, owned by a horizontal team)
  • Build, enhance, and maintain monitoring, telemetry, and observability solutions using tools such as:
    • AppDynamics (AppD)
    • ThousandEyes
    • Splunk
  • Improve alerting quality to drive faster detection and resolution

 

Collaboration & Consulting

  • Act as a trusted technical advisor to:
    • Senior engineering leaders
    • Product managers
    • Systems operations stakeholders
  • Communicate clearly and effectively through written documentation, status updates, and operational reviews
  • Translate complex technical risks into business‑relevant impact for non‑technical stakeholders
  • Support a platform where more than half of the applications are vendor‑provided, requiring strong operational oversight rather than direct application development

 

Operating Model & Expectations

  • Ideal target state: 80% SRE / 20% support
  • Realistic near‑term split: ~50% SRE work / ~50% operational support
  • Participation in rotating late‑shift coverage approximately every 6 weeks (e.g., 12pm–8pm or 12pm–9pm)
  • Hands‑on involvement in day‑to‑day operations, not purely advisory

 

Required Qualifications:

  • 5+ years of experience in:
    • Site Reliability Engineering
    • Systems Operations Engineering
    • Platform Engineering
    • Technology Architecture
  • Demonstrated experience operating within an SRE team, not just DevOps or technical support
  • Proven ownership of:
    • Reliability engineering initiatives
    • Production readiness and operational gating
    • SLO/SLI definition and service metrics
  • Strong communication, writing, and stakeholder management skills (Non‑negotiable)

 

Technical Skills & Experience:

Core Technologies

  • Kubernetes / OpenShift
  • Python scripting (for automation and operational tooling)
  • Enterprise application environments supporting Java / Oracle‑based stacks
  • Experience supporting vendor‑hosted applications

 

Monitoring & Observability (Strong Preference)

  • AppDynamics
  • ThousandEyes
  • Splunk or similar enterprise logging platforms

 

Nice to Have

  • Autosys
  • Oracle Cloud Platform (OCP)
  • Google Cloud Platform (GCP)
  • Security or risk‑focused operational experience in regulated environments

 

  • Dice Id: 91131645
  • Position Id: 8910766