Job Details
Job Title: SRE Production Support
Location: Bellevue, USA (Work from Office)
Duration: 4+ Years
Certification: Required in Splunk, AppDynamics, Grafana, or similar tools
Experience: 10+ Years (8+ Years in Relevant SRE/Production Support)
Contact: pranay at burgeonits dot com
Job Description:
We are hiring a skilled Site Reliability Engineer (SRE) with strong production support experience. You'll be responsible for proactive monitoring, incident management, and system reliability using leading observability tools.
Key Responsibilities:
Monitor systems using tools like Splunk, Grafana, AppDynamics, etc.
Lead incident triage calls and drive resolution efforts.
Correlate system metrics and logs to shorten detection and resolution times (MTTD/MTTR) and reduce downtime.
Collaborate with technical teams and communicate effectively with leadership.
Required Skills:
Hands-on experience with observability tools (Splunk, AppDynamics, Grafana)
Strong knowledge of Linux/Unix, APIs, Load Balancers, Containers (Docker, K8s), and Cloud (AWS, Google Cloud Platform, PCF)
Excellent communication and incident management skills
Nice to Have:
Familiarity with Java, Python, Go, or Node.js
Exposure to ServiceNow and automated playbooks