Cloud Resiliency Engineer / SRE / DevOps / Application Architect

Overview

Remote
Hybrid
Contract - W2
Contract - 3+ Month(s)

Skills

Python
Operations
DevOps
Terraform
Best Practices
Metrics
APM
Database
Scripting
Kubernetes
Kafka
Continuous Integration/Delivery
Prometheus
Grafana
Shell Scripting
Disaster Recovery
Mentoring
Cloud Security
Provisioning
Regulatory Compliance
Problem-Solving
Incident Response
Identity and Access Management
AWS CloudWatch
AWS Certified
Reliability Engineering
Business Continuity
Infrastructure Engineering

Job Details

Job Title: Cloud Resiliency Engineer / SRE / DevOps / Application Architect

Location: Remote (US)
Duration: 3+ month contract, with possibility of extension

Workstream: Resiliency Operations

Job Summary:
We are seeking an experienced professional to join our Resiliency Operations team, focusing on designing, implementing, and maintaining highly available and resilient cloud infrastructure. The ideal candidate will have a strong background in Site Reliability Engineering (SRE), Resiliency Operations, and cloud-native technologies, with hands-on experience in Terraform, observability, security, and data protection.

Key Responsibilities:

  • Lead and execute resiliency and reliability initiatives across cloud environments.

  • Implement and maintain infrastructure as code using Terraform.

  • Configure and manage IAM, KMS, and secure vaulting mechanisms to enforce access and data security policies.

  • Implement observability controls (monitoring, logging, alerting, tracing) to ensure high system reliability.

  • Work with databases, Kafka, and other messaging systems to implement fault-tolerant and resilient architectures.

  • Collaborate with SRE, DevOps, and application teams to design solutions for disaster recovery, failover, and high availability.

  • Continuously improve operational processes for incident response, capacity planning, and resiliency testing.

  • Maintain detailed documentation of resiliency architecture, policies, and operational procedures.

Required Skills & Experience:

  • Proven experience in SRE, DevOps, or Cloud Resiliency Operations.

  • Hands-on experience with Terraform for cloud infrastructure provisioning.

  • Strong knowledge of IAM, KMS, Vaulting, and cloud security best practices.

  • Experience with observability tools (CloudWatch, Prometheus, Grafana, ELK stack, etc.).

  • Familiarity with databases (SQL/NoSQL), Kafka, and other distributed systems.

  • Expertise in designing highly available and fault-tolerant systems in the cloud.

  • Excellent troubleshooting, problem-solving, and collaboration skills.

Preferred Qualifications:

  • Cloud certifications such as AWS Certified Solutions Architect, AWS Certified DevOps Engineer - Professional, or Google Cloud Professional Cloud DevOps Engineer.

  • Experience in multi-region cloud deployments and resiliency testing.

  • Knowledge of CI/CD pipelines and automation frameworks.

What We Offer:

  • Opportunity to work on cutting-edge resiliency and reliability initiatives in cloud environments.

  • Exposure to diverse technologies including cloud security, observability, and distributed systems.

  • Collaborative and innovative work culture with flexible work arrangements.
