Job Title: Site Reliability Engineer (SRE)
Location: Charlotte, NC – Hybrid Role (3 days onsite; in-person interview)
Visa: USCEAD/EAD/L2 EAD
Please do not submit DevOps engineers with SRE experience. Per strict instruction from the client, candidates should have only SRE experience for the last 3 to 4 years.
Overview
This role supports mission‑critical platforms within a large, regulated enterprise environment. The Senior SRE will partner with engineering, product, and systems operations teams to drive reliability, scalability, automation, and operational excellence across complex, distributed systems.
This is not a traditional application support role. The ideal candidate has operated as part of a mature SRE team, owns reliability outcomes, and brings strong communication and consulting skills to influence senior stakeholders. The role blends hands‑on SRE engineering with production support responsibilities, with a long‑term goal of increasing SRE maturity across the organization.
Key Responsibilities
Site Reliability & Operations
- Define, implement, and maintain Service Level Objectives (SLOs), Service Level Indicators (SLIs), and reliability metrics for supported platforms
- Help introduce and mature error budget concepts as part of SRE best practices
- Drive reliability, availability, scalability, and performance improvements for mission‑critical applications
- Lead and execute production readiness activities, including:
  - Non‑Functional Requirements (NFRs)
  - Permit to Operate (PTO) and operational gating
- Participate in and lead incident response, root cause analysis (RCA), and post‑incident remediation efforts
Automation & Observability
- Identify and implement automation opportunities to reduce manual toil and operational risk
- Utilize a centralized automation platform (Ansible, owned by a horizontal team)
- Build, enhance, and maintain monitoring, telemetry, and observability solutions using tools such as:
  - AppDynamics (AppD)
  - ThousandEyes
  - Splunk
- Improve alerting quality to drive faster detection and resolution
Collaboration & Consulting
- Act as a trusted technical advisor to:
  - Senior engineering leaders
  - Product managers
  - Systems operations stakeholders
- Communicate clearly and effectively through written documentation, status updates, and operational reviews
- Translate complex technical risks into business‑relevant impact for non‑technical stakeholders
- Support a platform where more than half of the applications are vendor‑provided, requiring strong operational oversight rather than direct application development
Operating Model & Expectations
- Ideal target state: 80% SRE / 20% support
- Realistic near‑term split: ~50% SRE work / ~50% operational support
- Participation in rotating late‑shift coverage approximately every 6 weeks (e.g., 12pm–8pm or 12pm–9pm)
- Hands‑on involvement in day‑to‑day operations, not purely advisory
Required Qualifications:
- 5+ years of experience in:
  - Site Reliability Engineering
  - Systems Operations Engineering
  - Platform Engineering
  - Technology Architecture
- Demonstrated experience operating within an SRE team, not just DevOps or technical support
- Proven ownership of:
  - Reliability engineering initiatives
  - Production readiness and operational gating
  - SLO/SLI definition and service metrics
- Strong communication, writing, and stakeholder management skills (Non‑negotiable)
Technical Skills & Experience:
Core Technologies
- Kubernetes / OpenShift
- Python scripting (for automation and operational tooling)
- Enterprise application environments supporting Java / Oracle‑based stacks
- Experience supporting vendor‑hosted applications
Monitoring & Observability (Strong Preference)
- AppDynamics
- ThousandEyes
- Splunk or similar enterprise logging platforms
Nice to Have
- Autosys
- Oracle Cloud Platform (OCP)
- Google Cloud Platform (GCP)
- Security or risk‑focused operational experience in regulated environments