Site Reliability Engineer

Overview

On Site

Full Time

Part Time

Accepts corp to corp applications

Contract - Independent

Contract - W2

Skills

Employment Authorization

Management

Continuous Integration

Continuous Delivery

Incident Management

Root Cause Analysis

Collaboration

DevOps

Kubernetes

Orchestration

Linux

Unix

Operating Systems

Performance Analysis

NMON

Log Analysis

Database Administration

Performance Tuning

PL/SQL

Python

Java

Node.js

Artificial Intelligence

Machine Learning (ML)

Workflow

Cloud Computing

Amazon Web Services

Google Cloud

Google Cloud Platform

Microsoft Azure

Systems Architecture

Regulatory Compliance

Job Details

Hiring: W2 Candidates Only

Visa: Open to any visa type with valid work authorization in the USA

Key Responsibilities

Design, implement, and manage Kubernetes environments, covering deployment, configuration, monitoring, and troubleshooting

Build and maintain scalable and reliable infrastructure using infrastructure as code principles

Develop comprehensive monitoring solutions and implement alerting strategies

Analyze system performance bottlenecks and implement improvements

Implement and maintain CI/CD pipelines for seamless deployments

Conduct incident response and root cause analysis, and implement preventative measures

Create and enhance automation tools leveraging AI/ML where applicable

Collaborate with development teams to improve application reliability and performance

Required Qualifications

5-7 years of experience in SRE or DevOps roles

Strong expertise with the Kubernetes ecosystem and container orchestration

Deep understanding of Linux/Unix operating systems and performance analysis tools such as NMON

Experience with log analysis, monitoring systems, and observability tools

Proficiency in database administration and performance tuning (Oracle, SQL Server)

Strong programming skills in at least one of: Python, Go, Java, or Node.js

Experience developing automation tools and frameworks

Proven track record of proactive problem identification and resolution

Preferred Qualifications

Experience with AI/ML integration into operational workflows

Cloud platform experience (AWS, Google Cloud Platform, Azure)

Knowledge of service mesh technologies

Experience with distributed systems architecture

Familiarity with security best practices and compliance requirements
