Site Reliability Engineer

Overview

Full Time
Part Time
Accepts corp to corp applications
Contract - W2
Contract - Independent

Skills

Grafana
Terraform
Ansible
Management
Continuous Integration
Continuous Delivery
Operational Efficiency
Collaboration
Capacity Management
Performance Tuning
Root Cause Analysis
Regulatory Compliance
Disaster Recovery
Linux
Unix
Computer Networking
Cloud Computing
Amazon Web Services
Microsoft Azure
Google Cloud
Google Cloud Platform
Scripting
Python
Bash
Orchestration
Docker
Kubernetes
Incident Management

Job Details

  • Design, build, and maintain highly available, scalable, and reliable systems.
  • Implement monitoring, alerting, and observability using tools such as Prometheus, Grafana, the ELK Stack, and Datadog.
  • Automate infrastructure and deployments using Terraform, Ansible, or similar IaC tools.
  • Manage CI/CD pipelines for continuous delivery and operational efficiency.
  • Collaborate with development teams to improve system performance, reliability, and incident response.
  • Conduct capacity planning, performance tuning, and root cause analysis.
  • Develop and maintain security, compliance, and disaster recovery strategies.

Required Skills

  • Strong knowledge of Linux/Unix systems, networking, and cloud platforms (AWS, Azure, Google Cloud Platform).
  • Proficiency in scripting and programming languages (Python, Bash, Go).
  • Experience with containers and orchestration (Docker, Kubernetes).
  • Familiarity with monitoring tools and incident management practices.
  • Understanding of SRE principles (SLIs, SLOs, SLAs).

About Purple Drive Technologies LLC