Overview
On Site
135k - 160k
Full Time
Skills
Customer Facing
DevOps
Disaster Recovery
Startups
Lean Methodology
Amazon Web Services
Google Cloud Platform
Google Cloud
Terraform
Kubernetes
Orchestration
Service Level
Workflow
Communication
Incident Management
Reliability Engineering
Customer Support
Collaboration
Process Engineering
SAP BASIS
Recruiting
Access Control
Job Details
An automation-focused technology company supporting critical infrastructure industries is seeking its first U.S.-based Customer Reliability Engineer. This role represents a unique opportunity to be the first point of contact for reliability and incident response in North America, with ownership of uptime, disaster recovery, and customer-facing incident handling during hours when the global DevOps team is offline.
The position is designed for an engineer who will not only maintain system stability but also establish best practices for incident handling, reliability measurement, and customer trust. The work will involve AWS, Google Cloud Platform, Terraform, and Kubernetes in fast-moving, production-critical environments where decisive action and clear communication are essential. This role offers significant autonomy to shape reliability culture and processes from the ground up.
Required Skills & Experience
- Hands-on experience with incident response, disaster recovery, and production reliability in startup or lean engineering environments
- Proficiency with AWS or Google Cloud Platform, Infrastructure-as-Code (Terraform), and Kubernetes orchestration
- Proven ability to own end-to-end reliability initiatives and serve as the first point of contact during critical incidents
- Understanding of service-level concepts (SLIs, SLOs, SLAs) and how to apply them in practice
- Experience building automation to reduce manual intervention in reliability workflows
- Excellent communication skills, with comfort engaging directly with customers during incidents
Daily Responsibilities
- Hands-On Incident Response & Reliability Engineering: 60%
- Customer Support, Collaboration & Process Development: 40%
Applicants must be currently authorized to work in the United States on a full-time basis now and in the future.
Accommodation will be provided in all parts of the hiring process as required under Motion Recruitment's Employment Accommodation policy. Applicants need to make their needs known in advance.