Cloud Infrastructure Site Reliability Engineer (SRE)

Overview

On Site
Full Time

Skills

Java
SRE
AWS
Azure
Terraform
Python
Cloud
Automation
GCP
C
Ansible
Go
CloudFormation

Job Details


Role Location: Alpharetta, Berkeley (Onsite)
Skills: SRE, Cloud, Automation

Who We Are:

Born digital, UST transforms lives through the power of technology. We walk alongside our clients and partners, embedding innovation and agility into everything they do. We help them create transformative experiences and human-centered solutions for a better world.

UST is a mission-driven group of 29,000+ practical problem solvers and creative thinkers in more than 30 countries. Our entrepreneurial teams are empowered to innovate, act nimbly, and create a lasting and sustainable impact for our clients, their customers, and the communities in which we live.

With us, you'll create a boundless impact that transforms your career and the lives of people across the world.

Visit us at UST.com.

You Are:

As a Cloud Infrastructure Site Reliability Engineer (SRE) with expertise in multiple public cloud service provider platforms, you will be responsible for operating infrastructure solutions, following the principles and practices pioneered by Google's SRE model.

The opportunity:

Design, build, and maintain highly available, scalable, and secure cloud infrastructure on platforms such as AWS, Google Cloud Platform, or Azure.

Develop and implement automation for provisioning, monitoring, scaling, and incident response using Infrastructure-as-Code tools (e.g., Terraform, CloudFormation, Ansible).

Monitor system reliability, capacity, and performance; proactively detect and address issues before they impact users.

Respond to production incidents, participate in on-call rotations, and lead post-incident reviews to drive root cause analysis and reliability improvements.

Collaborate with software engineering and security teams to ensure new services and features are production-ready and meet reliability standards.

Build and maintain tools for deployment, monitoring, and operations; automate manual processes to reduce toil.

This position description identifies the responsibilities and tasks typically associated with the performance of the position. Other relevant essential functions may be required.

What you need:

Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.

3+ years of experience in software development with proficiency in at least one programming language (e.g., Python, Go, Java, C++).

Experience administering cloud platforms (AWS, Google Cloud Platform, Azure), including networking, security, containerization, storage, data management, and serverless technologies.

Solid understanding of Linux systems, networking fundamentals, virtualized and distributed systems, file systems, and system processes and configurations.

Deep understanding of observability (monitoring, alerting, and logging) tools in cloud environments. Ability to set up and maintain monitoring dashboards, alerts, and logs.

Familiarity with Continuous Integration/Continuous Deployment (CI/CD) tools for automated testing, deployments, provisioning, and observability.

Ability to manage and respond to incidents, perform root cause analysis, and conduct post-mortem reviews.

Understanding of setting, monitoring, and maintaining Service-Level Objectives (SLOs) and Service-Level Agreements (SLAs) for system reliability.

Additional Qualifications a Plus:

Experience working with enterprise-scale financial services or other regulated industries.

5+ years of experience in SRE, DevOps, infrastructure, or cloud engineering roles, preferably supporting large-scale, distributed systems.

Excellent problem-solving, troubleshooting, and communication skills.

Experience leading technical projects or mentoring junior engineers.

Certifications: Certified Engineer, DevOps, SRE, CSREF

Compensation can differ depending on factors including but not limited to the specific office location, role, skill set, education, and level of experience. UST provides a reasonable range of compensation for roles that may be hired in various U.S. markets as set forth below.
