Overview
Full Time
Contract - Independent
Contract - W2
Contract - 12+Month(s)
50% Travel
Skills
SRE
Job Details
Hi,
I hope you are doing well. Please review the job description below and let me know if you are interested.
Job Title: Senior Site Reliability Engineering (SRE) Engineer
Location: NYC (3 days onsite, Hybrid)
Duration: 12+ Months
Contract
Job Summary:
We are seeking a highly skilled and experienced Senior Site Reliability Engineering (SRE) Engineer to lead our SRE team in ensuring the reliability, scalability, and performance of our production systems. The ideal candidate will have a strong background in cloud infrastructure, automation, and system monitoring, with excellent leadership and communication skills to collaborate across teams and foster a culture of operational excellence.
Key Responsibilities:
- Design and develop enterprise-grade APIs and configuration solutions.
- Contribute to enterprise and application architecture design.
- Lead observability initiatives including monitoring, alerting, and incident response.
- Build and maintain dashboards and alerting systems using Grafana, Prometheus, Splunk, etc.
- Create and maintain detailed runbooks for operational procedures and incident handling.
- Define and monitor SLAs, SLOs, and KPIs for critical services.
- Collaborate with architecture, development, and security teams to ensure system reliability.
- Evaluate and adopt new technologies to improve system performance and maintainability.
Required Skills:
- Strong background in IT infrastructure, cloud platforms (AWS, Azure, Google Cloud Platform), and SRE practices.
- Experience in enterprise and application architecture.
- Proven experience in building APIs and backend services.
- Hands-on experience with the following tools:
  - Monitoring & Observability: Grafana, Prometheus, Splunk
  - ITSM & Operations: ServiceNow, OpsRamp
  - Project & Incident Tracking: JIRA
- Experience in building alerts, dashboards, and operational runbooks.
- Experience managing distributed systems and large-scale production environments.
- Strong leadership, communication, and problem-solving skills.
- Ability to quickly learn and adapt to new technologies and environments.
Preferred:
- Exposure to OpenShift and Azure cloud platforms.
- Certifications: SRE Foundation, ITIL, or relevant cloud certifications.
Thanks & Regards,
Dipankar Singh
Technical Recruiter || USA || Canada || India
Email: