Job Details
Key Responsibilities:
Architect and implement resilient cloud infrastructure using AWS services such as EC2, ECS/EKS, RDS, Lambda, S3, and CloudFormation.
Lead SRE practices such as monitoring, alerting, incident response, and post-mortems.
Design and enforce infrastructure-as-code (IaC) strategies using Terraform or CloudFormation.
Build and maintain CI/CD pipelines for automated deployment and testing.
Establish observability frameworks using tools like CloudWatch, Prometheus, Grafana, ELK, or Datadog.
Define and monitor SLOs/SLIs, conduct capacity planning, and tune performance.
Implement automated remediation and runbook automation.
Collaborate with security, development, and operations teams to ensure end-to-end system reliability and compliance.
Mentor junior engineers and drive DevOps and SRE best practices across teams.
Required Qualifications:
Bachelor's degree in Computer Science, Engineering, or a related field.
8+ years in IT operations or DevOps roles, with 3+ years in a cloud architect or SRE capacity.
Strong expertise with AWS cloud services and architecture.
Proficiency in Terraform, CloudFormation, or similar IaC tools.
Experience with Kubernetes (EKS preferred), containers, and microservices.
Deep understanding of CI/CD tools like Jenkins, GitLab CI, or AWS CodePipeline.
Solid experience in monitoring and alerting systems.
Strong scripting skills (Python, Bash, etc.).
Excellent problem-solving and incident management skills.
Preferred Qualifications:
AWS Certifications (e.g., AWS Certified Solutions Architect - Professional, AWS Certified DevOps Engineer - Professional).
Experience with multi-account AWS landing zones and service control policies (SCPs).
Knowledge of FinOps, cost optimization, and security best practices.
Familiarity with service mesh (Istio, Linkerd) and GitOps practices.