Site Reliability Engineer (SRE)

Overview

On Site

Full Time

Part Time

Accepts corp to corp applications

Contract - W2

Contract - Independent

Skills

High Availability

Game Development

Workflow

Incident Management

Management

IaaS

Migration

Data Centers

Root Cause Analysis

Repair

Continuous Integration

Continuous Delivery

Documentation

Virtualization

Cloud Computing

Amazon Web Services

VMware

Kubernetes

Docker

Elasticsearch

Apache Kafka

System Administration

Computer Networking

Orchestration

Progress Chef

Puppet

Terraform

Jenkins

Python

Golang

Java

Job Details

Job Role: Site Reliability Engineer (SRE)

Location: Austin, TX

******* W2 & C2C CONTRACT ONLY *******

Job Description:

Role Summary:

Support live services by ensuring high availability of infrastructure, primary services, and studio services.

Enable rapid game development through on-demand infrastructure services and cloud-based workflows.

Engage across the full product lifecycle, from architecture and delivery to production deployment and incident response.

Manage both on-premises and cloud resources with a strong grasp of cloud infrastructure fundamentals.

________________________________________

Key Responsibilities:

Design and architect distributed systems in the cloud; assist in migrating systems from on-prem data centers to the cloud.

Develop monitoring, alerting, and dashboarding solutions to enhance visibility into application performance and business metrics.

Troubleshoot and maintain large-scale distributed production systems across on-prem and cloud environments.

Conduct root cause analysis and post-mortems to prevent future incidents.

Leverage automation to reduce toil, shorten mean time to detect and mean time to resolve (MTTD & MTTR), and repair services.

Design and implement CI/CD pipelines.

Create documentation and support tooling for online support teams.

________________________________________

Qualifications & Skills:

Experience monitoring infrastructure and application availability against SLIs to meet SLO targets.

Proficiency in virtualization, containerization, and cloud computing (AWS preferred).

Familiarity with VMware ecosystems, Kubernetes, and Docker.

Knowledge of tools and technologies such as:

Elasticsearch

Prometheus

Graphite

Kafka

Strong systems administration skills, particularly in *nix environments.

Solid understanding of networking protocols and components.

Experience with automation and orchestration tools like:

Chef

Puppet

Terraform

Packer

Jenkins

Programming experience in Python, Golang, and/or Java.

Background in working with distributed systems.

NOTE: If you are interested, please send your updated resume to Madhurima at galaxyitech dot com, or phone Four Eight Zero - Four Zero Seven - Six Nine One Eight.

