Sr Site Reliability Engineer

Overview

On Site

$70 - $75

Full Time

Skills

kubernetes

ecs

docker

ansible

terraform

cloudwatch

jira

Job Details

Contract to Hire | Hybrid McKinney, TX | U.S. Citizenship Required

We're seeking a Senior Site Reliability Engineer (SRE) to join our client's team and help design, implement, and maintain highly reliable, scalable, secure, and cost-effective infrastructure solutions. In this role, you'll play a critical part in improving system stability, observability, and overall performance across our platforms.

As a Senior SRE, you'll serve as a bridge between development and operations, applying a software engineering mindset to infrastructure and systems management. You'll proactively identify areas for optimization, build robust infrastructure, and foster a culture of operational excellence throughout the organization.

Key Responsibilities

System Administration: Install, configure, and maintain Linux environments and container orchestration platforms to ensure high availability and performance. Responsibilities include kernel tuning, user permission management, and troubleshooting of both hardware and software issues.
Network Administration: Design, monitor, and troubleshoot network systems and protocols (e.g., DNS, DHCP, VPN). Secure networks through segmentation and access control.
Monitoring & Observability: Implement comprehensive observability solutions using tools like Prometheus and Grafana. Set up alerting systems for proactive issue detection and resolution.
Automation: Leverage Infrastructure as Code (IaC) tools such as Terraform and Ansible to automate provisioning and configuration tasks, ensuring consistency across environments.
Security: Apply best practices to secure systems and networks, including firewalling, IDS, vulnerability management, and encryption protocols.
Incident Response: Participate in on-call rotations to troubleshoot and resolve production incidents quickly and effectively.
Documentation & Collaboration: Create clear documentation for systems, processes, and architectures while fostering strong cross-functional team relationships.

Non-Technical

Leads through influence, mentoring and empowering peers
Balances tactical and strategic needs to address both short- and long-term organizational priorities based on articulated team and company goals
Demonstrates intrinsic motivation
Writes clear, concise, and meaningful documentation
Develops and leverages collaborative, empathetic relationships across the organization
Makes and explains thoughtful decisions based on sound logical, analytical, data-driven reasoning

Technical

Expertise with container management (Kubernetes, ECS, Docker, Helm)
Expertise with configuration management (Ansible, Chef, Puppet)
Expertise with infrastructure as code (Terraform, OpenTofu, Pulumi)
Expertise with monitoring and alerting systems (CloudWatch, Datadog, New Relic, Site24x7, Dynatrace)
Expertise with Linux systems deployment, management, performance tuning, and debugging
Expertise with computer networking
Experience with version control systems and providers (Git, Mercurial, GitHub, SourceHut)
Experience with CI/CD systems (GitHub Actions, CircleCI, Argo)
Experience with ticket management systems (Jira, Shortcut, Azure DevOps)

