Senior Cloud Reliability Engineer

Overview

On Site
Depends on Experience
Accepts corp to corp applications
Contract - W2
Contract - 12 Month(s)

Skills

Accountability
Collaboration
Amazon Web Services
Computer Science
FRS

Job Details

We are looking for a Senior Cloud Reliability Engineer for our client in Richmond, VA.
Job Title: Senior Cloud Reliability Engineer
Job Location: Richmond, VA
Job Type: Contract
Job Description:
Qualifications:
  • 5-7 years of extensive experience across the end-to-end enterprise software development life cycle, including maintenance and support.
  • 3+ years of experience in Observability and SRE practices.
  • 3+ years of experience in the Cloud Networking domain (experience with routers, firewalls, load balancers, etc.).
  • Bachelor's degree in Computer Science, Information Systems, or an equivalent background or experience.
  • Extensive knowledge of and experience working in AWS environments.
  • Knowledge of Azure is a plus.
  • Strong cloud software development experience in at least one of the following languages: Python or Go.
  • Experience with observability, OpenTelemetry, and one or more tools such as Dynatrace, Prometheus, Grafana, AWS CloudWatch, AWS Canary, and AWS EventBridge.
  • Expertise in automating toil.
  • Working experience in Agile and Scaled Agile environments.
  • Experience supporting infrastructure for large multi-service applications.
  • Knowledge of secure coding standards and the banking environment is a plus.
  • AWS certifications (AWS Certified Solutions Architect and AWS Certified SysOps Administrator) are desirable.
Responsibilities:
  • As the Senior Cloud Reliability Engineer in the SRE Service, the candidate will be accountable for implementing reliability practices, with software as the means, for the cloud foundational product line in the Federal Reserve.
  • The SRE Service is part of the Cloud Solutions & Services department and has overall responsibility for the reliability of the numerous cloud foundational environments in the FRS.
  • Works as part of the cloud foundational platform squads focused on Cloud Networks to demonstrate and champion site reliability culture and practices, and exerts technical influence throughout the team.
  • Develops and maintains automation, scripts, and code that replace manual work and improve the reliability and stability of the cloud platform.
  • Develops, integrates, and maintains synthetics (canary) code to establish the health of the services.
  • Leads SLI, SLO, and error budget efforts in collaboration with the product team, instrumenting and visualizing them to proactively manage the stability of cloud platforms.
  • Implements observability (logs, metrics, traces) and monitoring for Cloud Network components such as VPCs, VPN tunnels, GWLB, and Transit Gateway, using tools like SevOne, Grafana, Dynatrace, AWS CloudWatch, and AWS Canary.
  • Responds to and resolves incidents in a timely manner.
  • Uses Infrastructure as Code (IaC) tools like Terraform to manage AWS resources.
  • Develops reusable artifacts and software utilities to industrialize SRE practices across FRS.
  • Other duties assigned as necessary.