Site Reliability Engineer

Overview

On Site

USD 36.01 - 73.61 per hour

Full Time

Skills

Creative Problem Solving

Finance

Real-time

Accountability

Professional Development

Software Deployment

Instrumentation

Production Support

Trading

System Monitoring

Collaboration

Reliability Engineering

System Administration

Application Support

Incident Management

Red Hat Enterprise Linux

Linux Administration

Microsoft Windows Server Administration

DevOps

AppDynamics

Splunk

Dynatrace

Cloud Computing

Atlassian

JIRA

Confluence

Bamboo

Research

Dashboard

Grafana

Google Cloud Platform

Google Cloud

Kubernetes

PaaS

IaaS

Pivotal

Cloud Foundry

Continuous Integration and Development

Continuous Integration

Continuous Delivery

High Availability

Offshoring

Professional Services

Communication

Job Details

Your Opportunity

At Schwab, you're empowered to make an impact on your career. Here, innovative thought meets creative problem solving, helping us "challenge the status quo" and transform the finance industry together.

The Client Trading Experience Technology team is essential in supporting the operational reliability of real-time trading applications that operate 24x7x365 in locations across the world. We partner with multiple support teams to provide guidance and drive adoption of key reliability engineering practices in support of large-scale, mission-critical trading services. We are looking for skilled candidates who are enthusiastic about learning new and existing technologies to deliver solutions for the resiliency of our production systems. The role requires a high level of responsibility and accountability, yet it has a foundational structure for professional development and career growth.

As a Site Reliability Engineer, you will be responsible for proactively preventing production incidents by supporting application releases in our software deployment pipeline. During blameless post-mortems, you will have the opportunity to recommend improvements to monitoring and other production processes, and to work with the respective teams to design and implement those recommendations. Other key responsibilities include return-to-service activities, on-call rotation, and proactive monitoring.

Responsibilities include, but are not limited to:

Practice a Site Reliability Engineering mindset and solve problems through automation, instrumentation, and simplicity
Partner with architects, development leads, business partners, and other SREs on the team to ensure implementations are architected and designed for resiliency
Identify application reliability and availability improvements, and establish and build solutions that continue to drive an improved experience
Perform production support and application deployments, and provide rapid response for critical trading applications
Proactively perform system monitoring and review SLO/SLI metrics and runbooks
Implement and collaborate on solutions that increase the monitoring and observability of systems at scale
Work with development teams to provide recommendations about system health upgrades and toil reduction
Advocate for Schwab's Reliability Engineering principles, guidelines, and standards
Foster a culture of learning through education and knowledge sharing around reliability practices, processes, and tools
Participate in on-call escalations during market hours and off-hours

What you have

Required Qualifications:

4+ years of experience with large-scale enterprise system administration, application support, or incident handling
4+ years of experience with RHEL Linux administration or Windows Server administration
4+ years of experience, with a proven track record, supporting enterprise production environments while adhering to DevOps and SRE frameworks
4+ years of experience building application dashboards for proactive monitoring and setting up alerts
4+ years of experience with logging and application monitoring tools (AppDynamics, Splunk, Dynatrace, ThousandEyes)
2+ years of experience supporting applications on cloud platforms such as Google Cloud Platform and Pivotal Cloud Foundry (PCF)
3+ years of experience using Atlassian tools (Jira, Confluence, Bamboo)

Preferred Qualifications:

Experience researching and building dashboards with Grafana and Prometheus
Experience with Google Cloud Anthos and Kubernetes
Strong understanding of, and experience with, Platform as a Service (PaaS) and Infrastructure as a Service (IaaS) offerings such as Pivotal Cloud Foundry (PCF)
Experience with Continuous Integration/Continuous Delivery (CI/CD) pipelines
Understanding of high-availability enterprise systems and of leveraging tools to automate proactive, and eventually predictive, availability solutions
Receptive, approachable teammate with the ability to interact positively with business partners, technology teams, offshore teams, and professional services
Strong advocate with excellent written and verbal communication skills
