Site Reliability Engineer - Senior

Overview

On Site
USD 65.00 - 70.00 per hour
Full Time

Skills

Problem Solving
Instrumentation
User Experience
Production Support
Trading
System Monitoring
Collaboration
Reliability Engineering
Knowledge Sharing
System Administration
Application Support
Incident Management
Red Hat Enterprise Linux
Linux Administration
Microsoft Windows Server Administration
DevOps
AppDynamics
Splunk
Dynatrace
Cloud Computing
Atlassian
JIRA
Confluence
Bamboo
Research
Dashboard
Grafana
Google Cloud
Google Cloud Platform
Kubernetes
PaaS
IaaS
Pivotal
Cloud Foundry
Continuous Integration and Development
Continuous Integration
Continuous Delivery
High Availability
Offshoring
Professional Services
Communication
Privacy
Marketing

Job Details

Location: Dallas, TX
Salary: $65.00 USD Hourly - $70.00 USD Hourly
Description: Our client is currently seeking a Senior Site Reliability Engineer.

We're looking for a Senior Site Reliability Engineer (SRE) to join our team and help us build highly reliable, scalable, and resilient systems. In this role, you'll apply an SRE mindset to solve complex problems through automation, instrumentation, and simplification. You'll partner closely with architects, development leads, and business partners to ensure our implementations are designed with resiliency in mind from the ground up.

If you're passionate about improving application reliability and availability, building robust solutions, and advocating for best practices, we encourage you to apply!

Responsibilities
  • Practice a Site Reliability Engineering mindset, solving problems through automation, instrumentation, and simplicity.
  • Partner with Architects, Development Leads, Business Partners, and other SREs to ensure implementations are architected and designed for resiliency.
  • Identify opportunities for application reliability and availability improvements, then establish and build solutions to enhance the user experience.
  • Perform production support for critical trading applications, including deployments and rapid incident response.
  • Proactively perform system monitoring, review SLO/SLI metrics, and maintain runbooks.
  • Implement and collaborate on solutions that increase the monitoring and observability of systems at scale.
  • Work with development teams to provide recommendations for system health upgrades and toil reduction.
  • Advocate for our Reliability Engineering principles, guidelines, and standards.
  • Foster a culture of learning through education and knowledge sharing around reliability practices, processes, and tools.
  • Participate in on-call escalations during market hours and off-hours.


Minimum Qualifications
  • 6+ years of experience with large-scale enterprise system administration, application support, or incident handling in an SRE role.
  • 6+ years of experience with Red Hat Enterprise Linux (RHEL) administration or Windows Server administration.
  • 6+ years of experience supporting enterprise production environments while adhering to DevOps & SRE frameworks.
  • 6+ years of experience building application dashboards for proactive monitoring and setting up alerts.
  • 6+ years of experience with logging/application monitoring tools (e.g., AppDynamics, Splunk, Dynatrace, ThousandEyes).
  • 4+ years of experience supporting applications on cloud platforms such as Google Cloud Platform and Pivotal Cloud Foundry (PCF).
  • 4+ years of experience using Atlassian tools like Jira, Confluence, and Bamboo.


Preferred Qualifications
  • Experience researching and building dashboards for Grafana and Prometheus.
  • Experience with Google Cloud Anthos and Kubernetes.
  • Strong understanding of, and experience with, Platform as a Service (PaaS) and Infrastructure as a Service (IaaS) offerings, such as Pivotal Cloud Foundry (PCF).
  • Experience with Continuous Integration/Continuous Delivery (CI/CD) pipelines.
  • Understanding of high-availability enterprise systems, and experience using tools to automate proactively and predict availability issues.
  • Receptive, approachable team player with the ability to interact positively with business partners, technology teams, offshore teams, and professional services.
  • Strong advocate with excellent written and verbal communication skills.

By providing your phone number, you consent to: (1) receive automated text messages and calls from the Judge Group, Inc. and its affiliates (collectively "Judge") to such phone number regarding job opportunities, your job application, and for other related purposes. Message & data rates apply and message frequency may vary. Consistent with Judge's Privacy Policy, information obtained from your consent will not be shared with third parties for marketing/promotional purposes. Reply STOP to opt out of receiving telephone calls and text messages from Judge and HELP for help.

This job and many more are available through The Judge Group. Please apply with us today!
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.