Job Details
Job Title: Cloud Resiliency Engineer / SRE / DevOps / Application Architect
Location: Remote (US)
Duration: 3+ month contract, with possibility of extension
Workstream: Resiliency Operations
Job Summary:
We are seeking an experienced professional to join our Resiliency Operations team, focusing on designing, implementing, and maintaining highly available and resilient cloud infrastructure. The ideal candidate will have a strong background in Site Reliability Engineering (SRE), Resiliency Operations, and cloud-native technologies, with hands-on experience in Terraform, observability, security, and data protection.
Key Responsibilities:
- Lead and execute resiliency and reliability initiatives across cloud environments.
- Implement and maintain infrastructure as code using Terraform.
- Configure and manage IAM, KMS, and secure vaulting mechanisms to enforce access and data security policies.
- Implement observability controls (monitoring, logging, alerting, tracing) to ensure high system reliability.
- Work with databases, Kafka, and other messaging systems to implement fault-tolerant and resilient architectures.
- Collaborate with SRE, DevOps, and application teams to design solutions for disaster recovery, failover, and high availability.
- Continuously improve operational processes for incident response, capacity planning, and resiliency testing.
- Maintain detailed documentation of resiliency architecture, policies, and operational procedures.
Required Skills & Experience:
- Proven experience in SRE, DevOps, or Cloud Resiliency Operations.
- Hands-on experience with Terraform for cloud infrastructure provisioning.
- Strong knowledge of IAM, KMS, Vaulting, and cloud security best practices.
- Experience with observability tools (CloudWatch, Prometheus, Grafana, ELK stack, etc.).
- Familiarity with databases (SQL/NoSQL), Kafka, and other distributed systems.
- Expertise in designing highly available and fault-tolerant systems in the cloud.
- Excellent troubleshooting, problem-solving, and collaboration skills.
Preferred Qualifications:
- Cloud certifications such as AWS Certified Solutions Architect, AWS Certified DevOps Engineer, or Google Cloud Professional Cloud DevOps Engineer.
- Experience in multi-region cloud deployments and resiliency testing.
- Knowledge of CI/CD pipelines and automation frameworks.
What We Offer:
- Opportunity to work on cutting-edge resiliency and reliability initiatives in cloud environments.
- Exposure to diverse technologies including cloud security, observability, and distributed systems.
- Collaborative and innovative work culture with flexible work arrangements.