Lead SRE

Overview

On Site
BASED ON EXPERIENCE
Contract - W2
Contract - Independent

Skills

Employment Authorization
Strategic Leadership
High Availability
Systems Design
Service Level
DevOps
Regulatory Compliance
Leadership
Mentorship
Collaboration
Communication
Problem Solving
Conflict Resolution
Continuous Improvement
Operational Excellence
Software Development
Java
C#
SQL
Windows PowerShell
Root Cause Analysis
Problem Management
Amazon Web Services
Open Source
Computer Networking
Cloud Computing
Dragon NaturallySpeaking
DNS
Virtual Private Cloud
API
WAF
Terraform
Knowledge Sharing
Linux
Apache Spark
Microsoft Azure
Scripting
Bash
Perl
Ruby
Python
Docker
Machine Learning (ML)
Apache Mesos
Kubernetes
Orchestration
Continuous Integration
Continuous Delivery
Computer Science
FOCUS
Professional Services
Genetics
Law

Job Details

Title: Lead Site Reliability Engineer (SRE)
Rate: $80-$84/hour
Location: Hybrid - Onsite Tuesdays and Wednesdays in Arlington, TX; 3 days remote
Duration: 6-month contract to hire
Work Authorization: Must be authorized to work in the U.S.


About the Role

The Lead Site Reliability Engineer (SRE) will provide strategic leadership for building and running large-scale software systems. This role involves identifying and delivering automation solutions to ensure high availability and resiliency, leveraging expertise in software development, complexity analysis, and scalable system design. The Lead SRE will work closely with engineering teams to ensure systems are stable, performant, and meet business and user expectations.


Key Responsibilities

  • Lead architecture and development teams to ensure applications are highly available, reliable, and performant at scale.

  • Partner with architecture teams to integrate operability, measurability, and manageability into business features.

  • Collaborate with product owners to establish Service Level Objectives (SLOs) and define consequences for non-compliance.

  • Identify monitoring gaps, improve application performance, and assist with troubleshooting.

  • Drive Root Cause Analysis (RCA) of failures within software, pipelines, or DevOps processes.

  • Design, build, and advocate for automated solutions to optimize uptime with minimal human intervention.

  • Participate in on-call rotations as needed.

  • Create and implement standards and best practices across teams and vendors.

  • Perform other duties as assigned and ensure compliance with company policies.


What Makes You a Dream Candidate

  • Proven leadership skills with the ability to guide and mentor teams.

  • Strong collaboration and communication skills.

  • Proactive approach to problem-solving and continuous improvement.

  • Passion for automation and operational excellence.

  • Deep technical expertise in cloud technologies and software development.


Knowledge & Skills

  • Strong experience in C# or Java (C# preferred).

  • Proficiency in SQL and PowerShell.

  • Expertise in defining and evaluating SLOs/SLIs and associated consequences.

  • Experience performing RCA and Problem Management.

  • Hands-on experience with cloud-native applications on Azure and AWS, including monitoring, networking, and containerization.

  • Proficiency in containerization technologies such as Azure Kubernetes Service, Kubernetes (open source), and Docker.

  • Knowledge of monitoring tools like Azure Application Insights and Azure Monitor.

  • Familiarity with networking technologies relevant to cloud platforms (DNS, VPC, API Gateway, WAF/CDN, etc.).

  • Strong experience with Terraform for Infrastructure-as-Code.

  • Ability to foster a culture of learning and knowledge sharing across teams.


Education & Experience

  • 5-7 years hands-on experience supporting Linux production environments.

  • 5-7 years hands-on Spark administration experience.

  • Hands-on experience with Microsoft Azure required.

  • 3-5 years hands-on scripting experience (Bash, Perl, Ruby, Python).

  • 3-5 years experience with Docker Datacenter.

  • 2-4 years hands-on experience on machine learning platforms.

  • Minimum 1 year experience with Mesos, Kubernetes, OpenShift, or similar container orchestration platforms.

  • Minimum 1 year hands-on experience with CI/CD tools and technologies.

  • Minimum 1 year leading an SRE team.

  • Bachelor's degree in Computer Science, Engineering, or related field; Master's preferred.

Our benefits package includes:
  • Comprehensive medical benefits
  • Competitive pay
  • 401(k) retirement plan
  • and much more!

About INSPYR Solutions
Technology is our focus and quality is our commitment. As a national expert in delivering flexible technology and talent solutions, we strategically align industry and technical expertise with our clients' business objectives and cultural needs. Our solutions are tailored to each client and include a wide variety of professional services, project, and talent solutions. By always striving for excellence and focusing on the human aspect of our business, we work seamlessly with our talent and clients to match the right solutions to the right opportunities. Learn more about us at inspyrsolutions.com.

INSPYR Solutions provides Equal Employment Opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, national origin, age, disability, or genetics. In addition to federal law requirements, INSPYR Solutions complies with applicable state and local laws governing nondiscrimination in employment in every location in which the company has facilities.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
