Site Reliability Engineer (W2 POSITION, NEED & LOCAL)

Overview

On Site

Depends on Experience

Contract - Independent

Contract - W2

Contract - 12 Month(s)

Skills

Mentorship

FOCUS

Grafana

Incident Management

Java

Kubernetes

Application Development

Communication

Debugging

DevOps

Docker

Dynatrace

Recruiting

Reference Data

Reliability Engineering

Root Cause Analysis

Software Engineering

Job Details

Role: Site Reliability Engineer

Location: Wilmington, DE

Duration: long term

Openings: 1

Years Experience: Must have at least 5 years of related experience

Work Authorization: USC/ (Cat09)

Important Note:

The manager is not looking for cloud/DevOps/infrastructure-skilled candidates. DO NOT send profiles with these experiences.
This is an SRE-focused role; the hiring manager is looking for someone who is highly skilled in RCA and postmortem documentation.
A software background is ideal (e.g., Java).

Job Description:

As part of the Site Reliability Engineering team within the Reference Data Engineering group, you'll help build a meaningful engineering discipline, combining software and systems to develop creative engineering solutions to runtime problems. In this environment, you'll take the lead on relevant projects, backed by an organization that provides the support and mentorship you need to learn and grow. As an SRE, you'll be part of the application development org, building more resilient, self-healing applications that require minimal production operations.

Key Responsibilities:

Lead and conduct detailed Root Cause Analysis (RCA) for incidents, identifying underlying issues and recommending corrective actions.
Document and communicate findings from RCA processes, ensuring transparency and knowledge sharing across the organization.
Develop and maintain incident postmortem reports, providing insights and actionable recommendations to stakeholders.
Monitor system performance and reliability metrics, proactively identifying potential issues before they escalate.
Contribute to the design and implementation of automated monitoring and alerting systems to improve incident detection and response times.
Continuously improve the incident management process, incorporating feedback and lessons learned from RCA activities.
Participate in incident response activities.

Qualifications:

Bachelor's degree or equivalent experience in a software engineering discipline
5+ years of Software Engineering experience
Excellent communication skills, with the ability to convey technical findings to both technical and non-technical audiences
Excellent debugging and troubleshooting skills
Experience in Site Reliability Engineering, DevOps, or a similar role, with a focus on incident management and RCA.
Experience with monitoring and alerting tools (e.g., Prometheus, Grafana, Dynatrace).
Familiarity with containerization technologies (e.g., Docker, Kubernetes).
