Site Reliability Engineer

Posted 2 days ago • Updated 1 hour ago

Contract W2

On-site

USD55 - USD60/hr


Job Details

Skills

Site Reliability Engineer

Summary

job summary:

Key Responsibilities

Observability & Monitoring

Own and maintain a single-pane-of-glass dashboard for application, platform, dependency, and client journey health.

Improve SLOs, SLIs, alerts, dashboards, and monitoring standards.

Ensure proactive detection of client-impacting issues using logs, metrics, traces, and synthetic monitoring.

Reliability & Incident Management

Improve MTTD, MTTR, and overall service reliability.

Maintain incident response playbooks and alerting standards.

Facilitate blameless postmortems, root cause analysis, and track corrective actions through closure.

Analyze trends and recurring failure patterns to prevent repeat incidents.

Resilience Engineering

Lead FMEA assessments for critical applications and journeys.

Identify single points of failure and partner with teams on remediation plans.

Conduct Game Days, chaos testing, failover testing, and recovery exercises.

Validate multi-region, multi-AZ, and disaster recovery capabilities.

Safe Change & Operational Excellence

Define reliability standards and operational guardrails.

Review production readiness of high-risk changes.

Drive adoption of safe deployment practices such as canary releases, feature flags, and automated rollback mechanisms.

Community of Practice & Reliability Leadership

Build and lead the Cash & Money Movement SRE Community of Practice.

Drive engagement, knowledge sharing, and reliability culture across the organization.

Identify and mentor application-level SRE champions/POCs.

Facilitate weekly reliability forums, office hours, and operational reviews.

Educate teams on SRE best practices, observability, incident management, resilience testing, and safe change principles.

Partner closely with Danlin Hibay's SRE and operational excellence organizations to stay aligned with enterprise standards, emerging tools, lessons learned, and engineering best practices.

Act as the liaison between Cash & Money Movement and enterprise SRE communities to bring recommendations, standards, and innovations back to product teams.

location: Malvern, Pennsylvania

job type: Contract

salary: $55 - 60 per hour

work hours: 8am to 5pm

education: Bachelors

qualifications:

Key Deliverables

Unified Cash & Money Movement Reliability Dashboard

Journey Health Dashboard (Add Bank, Transfers, Wires, ACH, Direct Deposit, Cash Plus, etc.)

SLO/SLI Framework and Alert Standards

FMEA Library and Resiliency Test Plans

Incident Playbooks and Postmortem Reviews

Reliability Community of Practice

Reliability Maturity Assessments and Executive Reporting

Success Measures

Reduced Sev 1/2/3 incidents

Reduced MTTD and MTTR

100% of critical applications with SLOs, dashboards, and actionable alerts

Completion of FMEA and resiliency testing for critical journeys

Timely closure of postmortem action items

Improved reliability, availability, and client experience across Cash & Money Movement

Active and engaged reliability community across Cash & Money Movement

Operating Model

This is a Hub-and-Spoke SRE model: SRE defines what "good" looks like and drives continuous improvement, while engineering teams remain accountable for execution and results.

SRE owns

Reliability standards and best practices

Observability and dashboards

Assessments, FMEA, and resilience testing

Incident reviews and postmortems

Community of Practice

Education, coaching, and governance

Product Teams own

Reliability backlog execution

Remediation and implementation

Operational outcomes

Service health and reliability improvements

Equal Opportunity Employer: Race, Color, Religion, Sex, Sexual Orientation, Gender Identity, National Origin, Age, Genetic Information, Disability, Protected Veteran Status, or any other legally protected group status.

At Randstad Digital, we welcome people of all abilities and want to ensure that our hiring and interview process meets the needs of all applicants. If you require a reasonable accommodation to make your application or interview experience a great one, please contact

Pay offered to a successful candidate will be based on several factors including the candidate's education, work experience, work location, specific job duties, certifications, etc. In addition, Randstad Digital offers a comprehensive benefits package, including: medical, prescription, dental, vision, AD&D, and life insurance offerings, short-term disability, and a 401K plan (all benefits are based on eligibility).

This posting is open for thirty (30) days.

Any consideration of a background check would be an individualized assessment based on the applicant or employee's specific record and the duties and requirements of the specific job.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

Dice Id: cxsapwma1
Position Id: 1338419

Similar Jobs

Site Reliability Engineer

Buffalo, New York • Today

Location: Buffalo, NY. Salary: $65.00 USD Hourly - $70.00 USD Hourly. Description: Title: Site Reliability Engineer. Duration of project: Contract 12+ Months. Open Location: Buffalo, NY, Remote EST. Overview: We are building a high-impact Site Reliability Engineering team to support 12+ mission-critical enterprise applications across a mix of legacy and modern environments. This role is part of a strategic initiative focused on: Application instrumentation; Observability adoption (OpenTelemetry,

Contract

USD 65.00 - 70.00 per hour

Staff Site Reliability Engineer

Remote • Today

About AlphaSense: The world's most sophisticated companies rely on AlphaSense to remove uncertainty from decision-making. With market intelligence and search built on proven AI, AlphaSense delivers insights that matter from content you can trust. Our universe of public and private content includes equity research, company filings, event transcripts, expert calls, news, trade journals, and clients' own research content. The acquisition of Tegus by AlphaSense in 2024 advances our shared mission to

Full-time

USD 150,000.00 - 225,000.00 per year

SRE Lead || Phoenix, AZ || LOCALS ONLY || Hybrid

Hybrid in Phoenix, Arizona • 23d ago

SRE Lead & Monitoring Consultant. Key Responsibilities. SRE Practice Development: Assess operational maturity and build SRE transformation roadmap. Establish SLOs, SLIs, and error budgets for critical services. Design incident management processes and on-call strategies. Implement chaos engineering and resilience testing. Mentor teams on SRE principles and best practices. Monitoring & Observability: Deploy and configure Datadog, Splunk, Grafana, and Prometheus. Implement metrics collection, log aggregatio

Contract

Depends on Experience

Site Reliability Engineer (SRE)

Berkeley Heights, New Jersey • Today

job summary: Fully onsite in Berkeley Heights, NJ. location: Berkeley Heights, New Jersey job type: Contract to Perm salary: $70 - 80 per hour work hours: 9am to 5pm education: Bachelors responsibilities: Automate operational tasks and health checks to create sustainable systems and services. Monitor the production environment to ensure system health using observability tools like Dynatrace and Splunk. Identify reliability gaps through process reengineering and analyze performance metrics.

Contract

USD70 - USD80/hr
