Senior Systems Engineer - SRE - ARK Solutions Inc

Overview

Remote

$80 - $83

Contract - W2

Contract - 12 Month(s)

Skills

SRE

AWS

Cloud

storage

backup

scripting

IaC

PowerShell

Job Details

Position: Senior Systems Engineer - SRE

Location: 100% Remote

About the Role:

The Senior Systems Engineer - Site Reliability Engineering (SRE) is responsible for the reliability, scalability, and performance of mission-critical cloud and on-prem services that support millions of customers globally. The role oversees incident management, drives automation efforts, and works closely with cross-functional teams to align SRE strategy with business objectives. It partners closely with Product teams, Application teams, Infrastructure, and the broader Applications and Infrastructure Delivery teams to develop key metrics and KPIs that improve application stability, availability, and performance. The ideal candidate will bring strong communication skills and will collaborate with key stakeholders across the company to optimize cloud infrastructure and uphold the highest standards of operational excellence in a dynamic, fast-paced environment.

Job Responsibilities:

Ensure the reliability, availability, and performance of mission-critical cloud services, implementing best practices for monitoring, alerting, and incident management.

Oversee the management of high-severity incidents, driving quick resolution and post-incident analysis to identify root causes and prevent recurrence.

Drive the automation of operational processes and ensure systems can scale effectively to support growing user demand, optimizing cloud and on-prem infrastructure and resource usage.

Develop and execute an SRE strategy aligned with business goals, and communicate service health, reliability, and performance metrics to senior leadership and stakeholders.

Skills and Experience:

Experience in information technology process and/or technical project management, including:

Experience as a Site Reliability Engineer (SRE) building and managing highly available, mission-critical systems, with 3+ years of experience on public cloud, preferably AWS.

Expertise in enterprise storage platforms (e.g., NetApp, Dell EMC, Isilon, Unity, Pure Storage, Pure Cloud Block Store).

Expertise in cloud storage platforms (e.g., Amazon EBS, Amazon S3, Azure Blob Storage, Amazon FSx, Amazon FSx for NetApp ONTAP).

Deep knowledge of enterprise backup technologies (e.g., Commvault, Rubrik, Veeam, Veritas).

Deep knowledge of cloud-native backup services (e.g., AWS Backup, Azure Backup).

Strong scripting skills (Python, Shell, PowerShell).

Familiarity with Infrastructure as Code (IaC) tools such as Terraform and CloudFormation.

Monitoring and observability experience using Prometheus, Grafana, ELK Stack, or similar.

Proven automation and programming experience in one or more of the following languages: Java, Python, Go, Perl, Bash.

Deep understanding of SRE practices such as Service Level Objectives, Error Budgets, Toil Management, Observability & Monitoring, Blameless Postmortems, Incident Response Process, and Capacity Planning.

Exposure to cloud-native, relational, and NoSQL databases such as Amazon RDS, MySQL, PostgreSQL, Cassandra, or Couchbase is preferred.

Experience deploying, monitoring, and troubleshooting large-scale, distributed applications in cloud environments such as AWS.

Experience in vulnerability assessment, patching, and security compliance for infrastructure, storage, and backup systems.

Experience in setting up disaster recovery (DR) using approved storage and backup technologies.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
