Role – Resiliency Test Engineering Architect
Alternate Titles – SRE Architect, Chaos Engineering Architect
Location – Anywhere in the US (remote)
- Lead core activities in setting up a Resiliency Testing COE at the enterprise level.
- Develop the roadmap, policies, procedures, framework, reference architectures, and resiliency services (Test Scorecard, Failure Mode Analysis, Test Scenarios, etc.) related to resiliency testing and chaos engineering.
- Develop Site Reliability Engineering practices
- Work closely with all stakeholder groups (App Dev teams, IT Infra, etc.) to ensure end-to-end application resiliency while upholding ETE policy, procedures, and standards
- Improve and set the direction for the resiliency test automation framework, and publish reusable artifacts to the Developer Marketplace
- Capture technical requirements, assess capabilities, and map them to organizational resiliency principles to determine the resiliency characteristics of applications.
- Contribute to strategy discussions and decisions on overall application design and the best approach for implementing cloud and on-premises solutions.
- Focus on continuous improvement practices as the need arises to meet system resiliency imperatives.
- Define high availability and resilience standards and guidelines for adopting technologies from AWS and other service providers.
- Minimum of 12 years of total experience
- Minimum of 5 years of Site Reliability Engineering experience
- Minimum of 2 years' experience as a Chaos Engineering Architect
- Must have expertise with industry patterns, chaos engineering methodologies, and techniques across the disaster recovery subject areas
- Specialist in highly available architecture and solution implementation
- Experience in Enterprise IT Infrastructure and Solution Architecture
- Chaos Engineering / Resiliency Testing experience for distributed applications using tools such as Gremlin.
- Experience designing and implementing CI/CD tooling (Git, Maven / Gradle, Jenkins, Bamboo, etc.)
- Hands-on work experience with at least one public cloud: AWS, Google Cloud Platform, or Azure
- Proven knowledge of Containerization & Container Orchestration Solutions (Docker & Kubernetes)
- Hands-on work experience with configuration management tools (Chef / Puppet / Ansible / SaltStack / Terraform)
- Financial domain experience is an added advantage.
Tools & Technologies
Gremlin, Dynatrace / AppDynamics, HP LoadRunner, HP BSM, Kubernetes, Elastic ECE, Jaeger, Splunk, SolarWinds (for VMs), Prometheus Enterprise Edition (for RHOS), Hygieia (DevOps) and Grafana (Mon)