Overview
Remote
On Site
USD 2-3
Full Time
Part Time
Accepts corp to corp applications
Contract - Independent
Contract - W2
Skills
AWS
Azure
Job Details
SRE/Cloud Engineer/AWS DevOps Engineer
100% Remote Role
Long Term Project
Short Version:
SRE/Cloud Engineer/AWS DevOps Engineer with expertise in the tech stack below.
- Cloud Services: AWS Cloud expertise (EC2, ECS, EKS, RDS, FIS, Resilience Hub)
- Monitoring Tool: Datadog
- CI Tool: GitHub Actions
- Programming Languages: TypeScript, Python
- Scripting: Bash, PowerShell
- Nice to have: Gremlin, Azure Chaos Studio
Long Version:
We are seeking a highly skilled and innovative SRE/Cloud Engineer/AWS DevOps Engineer to spearhead our chaos engineering initiatives. This role will involve implementing and optimizing chaos engineering practices using tools such as Gremlin, AWS Fault Injection Simulator (FIS), and AWS Resilience Hub to enhance system reliability and resilience.
Key Responsibilities:
- Design, develop, and implement chaos engineering experiments to proactively identify weaknesses in system architecture and improve overall resilience.
- Utilize AWS Cloud services, including EC2, ECS, EKS, and RDS, to deploy, manage, and monitor applications in a cloud environment.
- Lead the implementation of AWS FIS and Resilience Hub to simulate and assess failure scenarios and ensure robust system recovery processes.
- Collaborate with cross-functional teams to integrate chaos engineering practices into the CI/CD pipeline using GitHub Actions.
- Use Datadog for monitoring and observability to track system performance and response during chaos experiments.
- Develop scripts in Bash and PowerShell to automate chaos engineering workflows and streamline operations.
- Advocate for and implement best practices in system design and architecture to enhance reliability and minimize downtime.
- Stay abreast of emerging technologies and methodologies in chaos engineering and cloud services to continuously improve processes.
Preferred Qualifications:
- Proficiency in programming languages such as TypeScript and Python.
- Experience with Gremlin and Azure Chaos Studio is a plus.
- Strong analytical skills and ability to troubleshoot complex issues in a cloud environment.
- Excellent communication and leadership skills to drive initiatives and mentor team members.