Overview
Remote
Depends on Experience
Full Time
Skills
Continuous Delivery
Continuous Integration
Reliability Engineering
Supply Chain Management
Spring Framework
Java
Terraform
Splunk
Artificial Intelligence
Amazon Web Services
Cloud Computing
Dynatrace
Kubernetes
GitHub
Job Details
Key Responsibilities:
- System Reliability & Performance: Lead end-to-end reliability, availability, and performance of Supply Chain applications.
- Monitoring & Alerting: Design, implement, and maintain robust monitoring using tools like Dynatrace, Splunk, and Grafana.
- Capacity Planning: Ensure systems can support current and future workload demands.
- Incident Management: Guide application teams in rapid incident resolution and lead post-incident reviews for P1/P2 incidents.
- Performance Tuning: Analyze metrics/logs to identify bottlenecks and optimize performance.
- Security & Compliance: Support best practices for managing certificates, secrets, and non-user IDs.
- Change Management: Ensure safe and reliable deployments through solid change processes.
- Peak Readiness: Prepare systems for peak season with a focus on resiliency and redundancy.
- Playbook Development: Create war room playbooks for incident response scenarios.
- Auto Scaling & Failover: Implement auto scaling and failover best practices to keep systems resilient.
- Dev Collaboration: Partner with developers to enhance reliability throughout the software development lifecycle (SDLC).
- Cloud Strategy: Lead cloud adoption strategy (AWS focus) aligned with enterprise IT goals.
Required Skills & Experience:
- 5+ years of experience leading SRE teams or initiatives.
- Proficiency in .NET, Java, Spring Boot, Angular, UNIX, C, C#.
- Strong experience with APM tools (Dynatrace Cloud, Splunk, Elastic APM, Grafana).
- Hands-on experience with AWS/Azure/Google Cloud Platform, Docker, and Kubernetes.
- Experience with GitHub Actions, CI/CD, and automation tools like Terraform.
- Exposure to MongoDB, MySQL, ServiceNow, Rally, and AI tools (e.g., GitHub Copilot).
- Excellent communication and documentation skills.
- Strong collaboration skills to work with cross-functional teams.