Site Reliability Engineer (SRE)

Overview

On Site

Accepts corp to corp applications

Contract - W2

Contract - Independent

Contract - 28 day(s)

Skills

Azure

AWS

GCP

CI/CD

DevOps

Ansible

Kubernetes

Docker

cloud infrastructure

system reliability

Linux system administration

Job Details

Job Title: Site Reliability Engineer (SRE)

Location: Alpharetta, GA (local candidates only)

Job Description:

We are looking for an experienced Site Reliability Engineer (SRE) to join our team. The ideal candidate will have a strong background in DevOps, cloud infrastructure, automation, monitoring, and system reliability. You will be responsible for ensuring high availability, scalability, and performance of production systems while driving operational excellence through automation.

Key Responsibilities:

Design, build, and maintain scalable and reliable infrastructure on AWS / Azure / Google Cloud Platform.
Develop automation for deployment, monitoring, and incident response.
Implement CI/CD pipelines using tools like Jenkins, GitHub Actions, or GitLab CI.
Monitor system performance and optimize uptime, latency, and capacity.
Build and maintain infrastructure as code using Terraform, Ansible, or CloudFormation.
Collaborate with development teams to improve system reliability and deployment processes.
Implement robust monitoring, alerting, and logging using Prometheus, Grafana, ELK, or Datadog.
Participate in on-call rotations, incident response, and root cause analysis.

Required Skills:

10+ years of experience as an SRE, DevOps Engineer, or Cloud Engineer.
Hands-on experience with AWS, Azure, or Google Cloud Platform.
Strong scripting skills in Python, Bash, or Go.
Proficiency with Docker, Kubernetes, and Helm.
Experience with Terraform, Ansible, or other IaC tools.
Expertise in monitoring & observability tools (Prometheus, Grafana, Splunk, ELK, Datadog).
Solid understanding of Linux system administration and networking concepts.
Strong troubleshooting and problem-solving skills.

Preferred Skills:

Experience with microservices and service mesh (Istio/Linkerd).
Familiarity with security best practices and incident management.
Experience in performance tuning and capacity planning.
Exposure to SLA/SLO/SLI management and reliability metrics.

Education:

Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.


About Kanshe Infotech
