Senior Site Reliability Engineer

Overview

Remote
On Site
USD54 - USD64
Contract - W2

Job Details

job summary:

Story Behind the Need


Who is Resiliency Engineering Enablement?



  • Partner with application and infrastructure teams to define Disaster Recovery (DR) standards
  • Design, deploy, and manage Tier 1 DR capabilities
  • Standardize and evangelize DR implementation patterns
  • Define and evangelize observability and operational excellence standards related to DR
  • Define and maintain failover criteria
  • Define, maintain, and test Technical Recovery Guides (TRGs)

location: Saint Louis, Missouri

job type: Contract

salary: $54 - $64 per hour

work hours: 8am to 5pm

education: Bachelor's degree



responsibilities:

Typical Day in the Role



This resource will build and improve the disaster recovery (DR) capabilities of the client's Tier 1 applications. Common responsibilities include:
  • Builds, reviews, and maintains application design and architecture documents
  • Ensures DR capabilities are built into each system
  • Works with development teams to implement and maintain DR capabilities
  • Participates in DR testing exercises and evaluates the results for continuous improvement
  • Leads more complex projects focused on building and maintaining application observability and monitoring: tracking key performance indicators, maintaining alerting, and continuously improving visibility
  • Helps make decisions about periodic system validation and testing, service monitoring, and standing up new services and tools
  • Uses knowledge and experience to identify strategies that increase system reliability and performance through on-call rotation and process optimization
  • Identifies and implements the manual and automated procedures needed to improve real-time collaborative incident response
  • Leads lower-level engineers in stress, security, and performance testing
  • Resolves issues raised through support escalation
  • Keeps documentation and runbooks up to date so that new incidents can be handled effectively
  • Leads post-incident reviews and documents findings to inform future decisions
  • Reviews proposals to optimize the Software Development Life Cycle (SDLC) for service reliability and decides which proposals move forward
  • Communicates complex topics to development teams, works with them to investigate and document issues, and leads the internal team in developing solutions to mitigate them


qualifications:

Candidate Requirements



  • Required: A Bachelor's degree in a quantitative or business field (e.g., statistics, mathematics, engineering, computer science). Preferred:
  • Years of experience required: 4-6 years minimum
  • Disqualifiers: not meeting the required qualifications
  • Additional qualities to look for: Experience with Rancher and Axway API Gateway

skills: Top 3 must-have hard skills stack-ranked by importance



  • 1. AWS, Route 53, Lambda, MongoDB, Kafka, Kubernetes
  • 2. Load balancing, load redirecting, and load restricting strategies
  • 3. Monitoring and observability tools such as Prometheus, Grafana, Dynatrace, Splunk, and ELK
Equal Opportunity Employer: Race, Color, Religion, Sex, Sexual Orientation, Gender Identity, National Origin, Age, Genetic Information, Disability, Protected Veteran Status, or any other legally protected group status.

At Randstad Digital, we welcome people of all abilities and want to ensure that our hiring and interview process meets the needs of all applicants. If you require a reasonable accommodation to make your application or interview experience a great one, please contact

Pay offered to a successful candidate will be based on several factors, including the candidate's education, work experience, work location, specific job duties, certifications, etc. In addition, Randstad Digital offers a comprehensive benefits package, including medical, prescription, dental, vision, AD&D, and life insurance offerings, short-term disability, and a 401(k) plan (all benefits are based on eligibility).

This posting is open for thirty (30) days.


It is unlawful in Massachusetts to require or administer a lie detector test as a condition of employment or continued employment. An employer who violates this law shall be subject to criminal penalties and civil liability.



Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.