Why UKG:
At UKG, the work you do matters. The code you ship, the decisions you make, and the care you show a customer all add up to real impact. Today, tens of millions of workers start and end their days with our workforce operating platform. Helping people get paid, grow in their careers, and shape the future of their industries. That's what we do.
We never stop learning. We never stop challenging the norm. We push for better, and we celebrate the wins along the way. Here, you'll get flexibility that's real, benefits you can count on, and a team that succeeds together. Because at UKG, your work matters, and so do you.
Site Reliability Engineer III
Site Reliability Engineers (SREs) at UKG are experienced individual contributors who apply software engineering principles to operational challenges across the full service lifecycle. In this role, you will proactively monitor system health, manage risk through SLOs and error budgets, lead incident response, and enable safe, rapid change, balancing reliability with delivery velocity.
SREs at UKG are passionate about learning and evolving with modern technologies. We strive to innovate and relentlessly improve the customer experience, with an "automate everything" mindset that enables services to be delivered with speed, consistency, and high availability.
About the Role and Job Responsibilities
- Engage in and improve the lifecycle of services from conception to end-of-life, including system design reviews, capacity planning, and production readiness.
- Contribute to standards and best practices for system architecture, service delivery, reliability, and automation, including the definition and monitoring of service health indicators (latency, traffic, error rates, and resource saturation), service level objectives (SLOs), and the use of error budgets to guide operational and delivery decisions.
- Support service, product, and engineering teams by leveraging common tooling and frameworks to increase availability and improve incident detection and response.
- Improve system performance, availability, and efficiency through automation, process refinement, post-incident reviews, and in-depth configuration analysis.
- Collaborate closely with engineering teams across the organization to deliver and operate reliable services.
- Increase operational efficiency, effectiveness, and service quality by treating operational challenges as software engineering problems (reducing toil).
- Share knowledge and contribute to a culture of Site Reliability Engineering best practices within the team.
- Mentor and guide junior engineers on SRE principles, reliability practices, and operational standards.
- Actively participate in incident response, including on-call rotations and post-incident reviews, collaborating with engineering teams to restore service and reduce recurrence.
Required Qualifications
- 5+ years of hands-on experience in software engineering, systems engineering, or cloud-based environments, with a demonstrated ability to work independently on complex, ambiguous problems.
- 3+ years of experience working with public cloud platforms (e.g., Google Cloud Platform (preferred), AWS, or Azure).
- 3+ years of experience configuring, operating, and maintaining applications and/or systems infrastructure in a large-scale, customer-facing environment.
- Demonstrated understanding of observability best practices, including metric generation and collection, log aggregation pipelines, time-series databases, and distributed tracing.
- Experience coding in one or more higher-level programming languages (e.g., Python, Java, or C++).
- Strong working knowledge of Linux systems, including troubleshooting, performance analysis, and scripting in production environments.
- Experience with GitHub Actions and modern CI/CD practices.
- Hands-on experience with containerization and container orchestration (Docker, Kubernetes) in production environments.
- Experience building operational dashboards and alerts using observability tools such as Splunk or Grafana.
- Ability to communicate technical risk and tradeoffs clearly to non-technical stakeholders.
Preferred Qualifications
- Experience with distributed system design and architecture.
- Experience with infrastructure-as-code and configuration management tools (e.g., Terraform, Ansible).
- Solid grounding in at least two of the following areas: Computer Science fundamentals, Cloud Architecture, Security, or Network Design.
Company Overview:
UKG is the Workforce Operating Platform that puts workforce understanding to work. With the world's largest collection of workforce insights and people-first AI, our ability to reveal unseen ways to build trust, amplify productivity, and empower talent is unmatched. It's this expertise that equips our customers with the intelligence to solve any challenge in any industry, because great organizations know their workforce is their competitive edge. Learn more at ukg.com.
Equal Opportunity Employer
UKG is an equal opportunity employer. We evaluate qualified applicants without regard to race, color, disability, religion, sex, age, national origin, veteran status, genetic information, and other legally protected categories.
UKG participates in E-Verify.
It is unlawful in Massachusetts to require or administer a lie detector test as a condition of employment or continued employment. An employer who violates this law shall be subject to criminal penalties and civil liability.
Disability Accommodation in the Application and Interview Process
For individuals with disabilities who need additional assistance at any point in the application and interview process, please email
The pay range for this position is $102,300.00 to $133,900.00 USD. The actual base pay offered may vary depending on skills, experience, job-related knowledge and work location. In addition to base pay, employees may be eligible to participate in a performance-based bonus plan and to receive restricted stock unit awards as part of total compensation. Learn more about UKG's benefits and rewards at
- Dice Id: RTL83235
- Position Id: b711ec8a897765a249aa0157e6a51dfd