Site Reliability Engineer

Overview

On Site
Full Time
Part Time
Accepts corp to corp applications
Contract - Independent
Contract - W2

Skills

Employment Authorization
Management
Continuous Integration
Continuous Delivery
Incident Management
Root Cause Analysis
Collaboration
DevOps
Kubernetes
Orchestration
Linux
Unix
Operating Systems
Performance Analysis
NMON
Log Analysis
Database Administration
Performance Tuning
PL/SQL
Python
Java
Node.js
Artificial Intelligence
Machine Learning (ML)
Workflow
Cloud Computing
Amazon Web Services
Google Cloud
Google Cloud Platform
Microsoft Azure
Systems Architecture
Regulatory Compliance

Job Details

Hiring: W2 Candidates Only

Visa: Open to any visa type with valid work authorization in the USA

Key Responsibilities


Design, implement, and manage Kubernetes environments, covering deployment, configuration, monitoring, and troubleshooting

Build and maintain scalable and reliable infrastructure using infrastructure as code principles

Develop comprehensive monitoring solutions and implement alerting strategies

Analyze system performance bottlenecks and implement improvements

Implement and maintain CI/CD pipelines for seamless deployments

Conduct incident response and root cause analysis, and implement preventive measures

Create and enhance automation tools, leveraging AI/ML where applicable

Collaborate with development teams to improve application reliability and performance


Required Qualifications

5-7 years of experience in SRE or DevOps roles

Strong expertise with the Kubernetes ecosystem and container orchestration

Deep understanding of Linux/Unix operating systems and performance analysis tools such as NMON

Experience with log analysis, monitoring systems, and observability tools

Proficiency in database administration and performance tuning (Oracle, SQL Server)

Strong programming skills in at least one of Python, Go, Java, or Node.js

Experience developing automation tools and frameworks

Proven track record of proactive problem identification and resolution


Preferred Qualifications

Experience with AI/ML integration into operational workflows

Cloud platform experience (AWS, Google Cloud Platform, Azure)

Knowledge of service mesh technologies

Experience with distributed systems architecture

Familiarity with security best practices and compliance requirements
