Site Reliability Engineer

Overview

On Site
USD 36.01 - 73.61 per hour
Full Time

Skills

Creative Problem Solving
Finance
Real-time
Accountability
Professional Development
Software Deployment
Instrumentation
Production Support
Trading
System Monitoring
Collaboration
Reliability Engineering
Knowledge Sharing
System Administration
Application Support
Incident Management
Red Hat Enterprise Linux
Linux Administration
Microsoft Windows Server Administration
DevOps
AppDynamics
Splunk
Dynatrace
Cloud Computing
Atlassian
JIRA
Confluence
Bamboo
Research
Dashboard
Grafana
Google Cloud Platform
Google Cloud
Kubernetes
PaaS
IaaS
Pivotal
Cloud Foundry
Continuous Integration and Development
Continuous Integration
Continuous Delivery
High Availability
Offshoring
Professional Services
Communication

Job Details

Your Opportunity

At Schwab, you're empowered to make an impact on your career. Here, innovative thought meets creative problem solving, helping us "challenge the status quo" and transform the finance industry together.

The Client Trading Experience Technology team is essential in supporting the operational reliability of real-time trading applications that operate 24x7x365 in locations across the world. We partner with multiple support teams to provide guidance and drive adoption of key reliability engineering practices in support of large-scale, mission-critical trading services. We are looking for skilled candidates who are enthusiastic about learning new and existing technologies to deliver solutions that improve the resiliency of our production systems. The role requires a high level of responsibility and accountability, yet has a foundational structure for professional development and career growth.

As a Site Reliability Engineer, you will be responsible for proactively preventing production incidents by supporting application releases in our software deployment pipeline. During blameless post-mortems, you will have the opportunity to recommend improvements to monitoring and other production processes, and to work with the respective teams to design and implement those recommendations. Other key responsibilities include return-to-service activities, on-call rotation, and proactive monitoring.

Responsibilities include, but are not limited to:
  • Practice a Site Reliability Engineering mindset and solve problems through automation, instrumentation, and simplicity
  • Partner with architects, development leads, business partners, and other SREs on the team to ensure implementations are architected and designed for resiliency
  • Identify application reliability and availability improvements, and establish and build solutions that continue to drive an improved experience
  • Perform production support and application deployments, and provide rapid response for critical trading applications
  • Proactively monitor systems and review SLO/SLI metrics and runbooks
  • Implement and collaborate on solutions that increase the monitoring and observability of systems at scale
  • Work with development teams to provide recommendations about system health upgrades and toil reduction
  • Advocate for Schwab's Reliability Engineering principles, guidelines, and standards
  • Foster a culture of learning through education and knowledge sharing around reliability practices, processes, and tools
  • Participate in on-call escalations during market hours and off-hours

What you have

Required Qualifications:
  • 4+ years of experience with large-scale enterprise system administration, application support or incident handling
  • 4+ years of experience with Red Hat Enterprise Linux (RHEL) administration or Windows Server administration
  • 4+ years of experience, with a proven track record, supporting enterprise production environments while adhering to various DevOps and SRE frameworks
  • 4+ years of experience building application dashboards for proactive monitoring and setting up alerts
  • 4+ years of experience with logging and application monitoring tools (AppDynamics, Splunk, Dynatrace, ThousandEyes)
  • 2+ years of experience supporting applications on cloud platforms such as Google Cloud Platform and Pivotal Cloud Foundry (PCF)
  • 3+ years of experience using Atlassian tools (Jira, Confluence, Bamboo)

Preferred Qualifications:
  • Experience researching and building dashboards for Grafana and Prometheus
  • Experience with Google Cloud Anthos and Kubernetes
  • Strong understanding of, and experience with, Platform as a Service (PaaS) and Infrastructure as a Service (IaaS) offerings such as Pivotal Cloud Foundry (PCF)
  • Experience with Continuous Integration/Continuous Delivery pipelines (CI/CD)
  • Understanding of high-availability enterprise systems and of using tools to automate proactive, and eventually predictive, availability solutions
  • Receptive, approachable teammate with the ability to interact positively with business partners, technology teams, offshore teams, and professional services
  • Strong advocate with excellent written and verbal communication skills