Job Details
Job Title: Site Reliability Engineer (SRE) with Ansible Expertise
Job Summary
We are looking for a Site Reliability Engineer (SRE) with strong Ansible automation skills to design, implement, and maintain scalable and reliable infrastructure. You will work closely with development and operations teams to ensure high availability, performance, and security of our systems while driving automation to reduce toil.
Key Responsibilities
Infrastructure Automation:
Develop, maintain, and optimize Ansible playbooks, roles, and modules for configuration management, provisioning, and orchestration.
Automate repetitive tasks to improve efficiency and reduce manual intervention.
Reliability & Performance:
Implement monitoring, logging, and alerting solutions (Prometheus, Grafana, ELK, etc.) to ensure system reliability.
Troubleshoot and resolve production incidents, perform root cause analysis (RCA), and implement preventive measures.
CI/CD & Deployment Pipelines:
Integrate Ansible with CI/CD tools (Jenkins, GitLab CI, GitHub Actions) for seamless deployments.
Ensure zero-downtime deployments using blue-green or canary strategies.
Cloud & On-Prem Infrastructure:
Manage and scale cloud (AWS, Google Cloud Platform, Azure) and/or on-prem infrastructure using Infrastructure as Code (IaC).
Work with containerization (Docker) and orchestration tools (Kubernetes, OpenShift).
Security & Compliance:
Implement security best practices in automation (vaults, secrets management, RBAC).
Ensure compliance with industry standards and regulations (SOC 2, ISO/IEC 27001, GDPR).
Collaboration & Documentation:
Work closely with Dev and Ops teams to improve system reliability.
Maintain clear documentation for automation workflows and runbooks.
Required Skills & Qualifications
Must-Have:
Strong experience with Ansible (playbooks, roles, static and dynamic inventories).
Proficiency in Linux/Unix systems and scripting (Bash, Python).
Experience with Infrastructure as Code (IaC) tools such as Terraform or CloudFormation.
Knowledge of monitoring tools (Prometheus, Grafana, Nagios, Datadog).
Familiarity with **cloud