Site Reliability Engineer (SRE)

Overview

On Site
Full Time
Part Time
Accepts corp to corp applications
Contract - W2
Contract - Independent

Skills

High Availability
Game Development
Workflow
Incident Management
Management
IaaS
Migration
Data Centers
Root Cause Analysis
Repair
Continuous Integration
Continuous Delivery
Documentation
Virtualization
Cloud Computing
Amazon Web Services
VMware
Kubernetes
Docker
Elasticsearch
Apache Kafka
System Administration
Computer Networking
Orchestration
Progress Chef
Puppet
Terraform
Jenkins
Python
Golang
Java

Job Details

Job Role: Site Reliability Engineer (SRE)

Location: Austin, TX

******* W2 & C2C CONTRACT ONLY *******





Job Description:





Role Summary:

Support live services by ensuring high availability of infrastructure, primary services, and studio services.

Enable rapid game development through on-demand infrastructure services and cloud-based workflows.

Engage across the full product lifecycle, from architecture and delivery to production deployment and incident response.

Manage both on-premises and cloud resources with a strong grasp of cloud infrastructure fundamentals.

________________________________________



Key Responsibilities:

Design and architect distributed systems in the cloud; assist in migrating systems from on-prem data centers to the cloud.

Develop monitoring, alerting, and dashboarding solutions to enhance visibility into application performance and business metrics.

Troubleshoot and maintain large-scale distributed production systems across on-prem and cloud environments.

Conduct root cause analysis and post-mortems to prevent future incidents.

Leverage automation to reduce toil, improve detection and resolution times (MTTD & MTTR), and repair services.

Design and implement CI/CD pipelines.

Create documentation and support tooling for online support teams.

________________________________________



Qualifications & Skills:

Experience monitoring infrastructure and application availability against SLIs to meet SLO targets.

Proficiency in virtualization, containerization, and cloud computing (AWS preferred).

Familiarity with VMware ecosystems, Kubernetes, and Docker.

Knowledge of tools and technologies such as:

Elasticsearch

Prometheus

Graphite

Kafka

Strong systems administration skills, particularly in *nix environments.

Solid understanding of networking protocols and components.

Experience with automation and orchestration tools like:

Chef

Puppet

Terraform

Packer

Jenkins

Programming experience in Python, Golang, and/or Java.

Background in working with distributed systems.

NOTE: If you are interested, please send your updated resume to Madhurima at galaxyitech dot com, or phone Four Eight Zero - Four Zero Seven - Six Nine One Eight.
