Site Reliability Engineer

Austin, MI, US • Posted 7 days ago • Updated 6 hours ago

Contract W2

On-site

$57 - $101 hourly

Kforce Technology Staffing

Job Details

Skills

Scalability
Continuous Improvement
Software Engineering
Customer Experience
Incident Management
Apache Kafka
PostgreSQL
Apache Cassandra
Redis
Dashboard
Provisioning
Performance Metrics
Chaos Testing
Systems Design
Capacity Management
Documentation
Mentorship
DevOps
Communication
Collaboration
Software Development
FOCUS
Programming Languages
Golang
Java
Python
Amazon Web Services
Microsoft Azure
Google Cloud
Google Cloud Platform
Cloud Computing
Kubernetes
Orchestration
Storage
NFS
HDFS
Ceph
Amazon S3
Resource Management
Apache Mesos
Continuous Integration
Continuous Delivery
GitLab
Linux
Operating Systems
Performance Tuning
Debugging
Grafana
Splunk
Nagios
Root Cause Analysis
Business Analysis
Leadership
Machine Learning (ML)
Predictive Analytics
Artificial Intelligence
Messaging

Summary

RESPONSIBILITIES:
Kforce's client is seeking a Principal Site Reliability Engineer (SRE) to lead platform-first initiatives that improve scalability, reliability, and system performance. This role ensures the stability of large-scale distributed systems while driving continuous improvement and strengthening SRE practices across the organization. The Principal SRE blends software engineering with systems expertise to build resilient services, reduce downtime, and improve customer experience. They will lead incident response, automate repetitive work, enhance observability, and drive platform reliability at scale.

Responsibilities:
* Build and maintain infrastructure for distributed systems (Kubernetes, Kafka, Postgres, Cassandra, Redis)
* Improve monitoring, alerting, dashboards, SLIs/SLOs, and system visibility
* Partner with engineering to enhance reliability across critical services
* Advance CI/CD pipelines, deployment safety, and automated provisioning
* Analyze performance metrics, troubleshoot issues, and drive RCA
* Lead incident response and long-term remediation
* Implement automation, observability tooling, chaos testing, and predictive analytics
* Support system design reviews, capacity planning, and platform improvements
* Promote documentation, mentorship, and a blameless engineering culture

REQUIREMENTS:
* Bachelor's degree in Computer Science or related discipline (or equivalent experience)
* Cloud, DevOps, SRE, or Kubernetes certifications preferred
* CKA certification is a strong plus
* Strong engineering background; proficiency in Go, Java, or Python
* Cloud experience (AWS/Azure/Google Cloud Platform) and distributed systems expertise
* Advanced Kubernetes experience; CKA preferred
* Experience with modern observability tools (Prometheus, Grafana, Datadog, Splunk, ELK, Jaeger)
* CI/CD (GitLab), Linux OS expertise, troubleshooting, and RCA excellence
* Strong communication and cross-team collaboration skills
* Proven expertise in software development with a strong focus on building and supporting large-scale distributed systems
* Proficiency in one or more high-level programming languages commonly used in distributed environments, such as Go (Golang), Java, or Python
* Extensive experience working with cloud providers such as AWS, Azure, or Google Cloud Platform, including hands-on development of cloud-native services
* Advanced expertise with Kubernetes and container orchestration platforms
* Experience with distributed storage technologies including NFS, HDFS, Ceph, and Amazon S3, as well as resource management frameworks such as Mesos or YARN
* Deep understanding of CI/CD pipelines, automation, and deployment tools (e.g., GitLab)
* Expert-level knowledge of Linux operating systems, including performance tuning, debugging, and system internals
* Strong proficiency with modern monitoring and observability tools, such as Prometheus, Grafana, Datadog, Splunk, ELK, Jaeger, and Nagios Core
* Highly developed troubleshooting abilities, including advanced root cause analysis skills for complex production issues
* Solid business analysis skills with the ability to integrate cross-functional metrics and align engineering work with business outcomes
* Prior Principal SRE or reliability leadership experience
* Experience with AI/ML for predictive analytics and anomaly detection

The pay range is the lowest to highest compensation we reasonably in good faith believe we would pay at posting for this role. We may ultimately pay more or less than this range. Employee pay is based on factors like relevant education, qualifications, certifications, experience, skills, seniority, location, performance, union contract and business needs. This range may be modified in the future.

We offer comprehensive benefits including medical/dental/vision insurance, HSA, FSA, 401(k), and life, disability & AD&D insurance to eligible employees. Salaried personnel receive paid time off. Hourly employees are not eligible for paid time off unless required by law. Hourly employees on a Service Contract Act project are eligible for paid sick leave.

Note: Pay is not considered compensation until it is earned, vested and determinable. The amount and availability of any compensation remains in Kforce's sole discretion unless and until paid and may be modified in its discretion consistent with the law.

This job is not eligible for bonuses, incentives or commissions.

Kforce is an Equal Opportunity/Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, pregnancy, sexual orientation, gender identity, national origin, age, protected veteran status, or disability status.

By clicking “Apply Today” you agree to receive calls, AI-generated calls, text messages or emails from Kforce and its affiliates, and service providers. Note that if you choose to communicate with Kforce via text messaging the frequency may vary, and message and data rates may apply. Carriers are not liable for delayed or undelivered messages. You will always have the right to cease communicating via text by using key words such as STOP.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

Dice Id: kforcecx
Position Id: ITWQG2168138

Company Info

About Kforce Technology Staffing

Kforce is a solutions firm specializing in technology, finance and accounting, and professional staffing services. Our KNOWLEDGEforce® empowers industry-leading companies to achieve their digital transformation goals. We curate teams of technical experts who deliver solutions custom-tailored to each client’s needs. These scalable, flexible outcomes are shaped by deep market knowledge, thought leadership and our multi-industry expertise.

Our integrated approach is rooted in 60 years of proven success deploying highly skilled professionals on a temporary and direct-hire basis. Each year, approximately 18,000 talented experts work with the Fortune 500 and other leading companies. Together, we deliver Great Results Through Strategic Partnership and Knowledge Sharing®.

NYSE: KFRC
