Site Reliability Engineer || Remote || Contract

Overview

Remote
Depends on Experience
Contract - W2
Contract - Independent

Skills

Agile
Amazon Web Services
AppDynamics
Build Automation
Cloud Computing
Collaboration
Communication
Conflict Resolution
Continuous Delivery
Continuous Integration
Debugging
DevOps
Good Clinical Practice
Google Cloud Platform
High Availability
Incident Management
Java
Kubernetes
Management
Microservices
Microsoft Azure
Operational Efficiency
Performance Monitoring
Problem Solving
Python
Reliability Engineering
Root Cause Analysis
Scalability
Software Architecture
Software Engineering
Splunk
Telecommunications
Terraform

Job Details

Role: Site Reliability Engineer

Experience: 8 Years

Location: Remote

Job Type: Contract

Key Responsibilities:

  • Develop and maintain reliable, scalable, and secure systems in Java, Go, and Python.
  • Design, implement, and manage Kubernetes clusters and associated microservices.
  • Build automation and monitoring tools to enhance system reliability and operational efficiency.
  • Use observability tools such as Splunk and AppDynamics to detect and resolve incidents proactively.
  • Collaborate with development and operations teams to ensure end-to-end system reliability.
  • Perform root cause analysis, contribute to postmortems, and implement long-term fixes.
  • Participate in on-call rotations and drive incident response and resolution.
  • Support application deployment and integration in a large-scale telecom environment.

Required Skills & Qualifications:

  • 5+ years of experience as a Site Reliability Engineer.
  • Strong proficiency in Java and Go (must-have); experience with Python is a plus.
  • Hands-on experience with Kubernetes, CI/CD, containerization, and service mesh.
  • Expertise in observability tools: Splunk, AppDynamics, or similar platforms.
  • Solid background in software engineering and application architecture.
  • Experience in application performance monitoring, scalability, and high availability.
  • Knowledge of telecom systems and domain-specific challenges is highly preferred.
  • Strong problem-solving and debugging skills across distributed systems.
  • Excellent communication and collaboration abilities in a remote work environment.

Nice to Have:

  • Experience with cloud platforms (AWS, Google Cloud Platform, or Azure).
  • Familiarity with Infrastructure as Code (Terraform, Helm).
  • Exposure to agile and DevOps best practices.
  • Experience with telecom service environments.
