SRE Lead

Overview

Remote
$50 - $60
Contract - W2
Contract - 12 Month(s)
No Travel Required

Skills

.NET
Amazon Web Services
AngularJS
Application Development
C
C#
Capacity Management
Change Management
Cloud Computing
Collaboration
Communication
Continuous Delivery
Continuous Improvement
Continuous Integration
Database
Docker
Documentation
Dynatrace
Failover
GitHub
Good Clinical Practice
Google Cloud Platform
Grafana
Incident Management
Java
Management
Microservices
Microsoft Azure
MongoDB
Orchestration
PS
Performance Tuning
PostScript
ROOT
Rally
Reliability Engineering
Scalability
ServiceNow
Software Development
Software Performance Management
Splunk
Supply Chain Management
Terraform
Unix
WAR

Job Details

Only candidates with 12+ years of experience will be considered.
Responsibilities:
  • System Reliability and Performance: Lead and drive end-to-end (Supply Chain) reliability, availability, and performance of applications in Digital Experience.
  • Monitoring and Alerting: Help design, implement, and maintain robust monitoring and alerting systems to proactively identify and resolve issues.
  • Capacity Planning: Help in capacity planning, ensuring that systems can handle current and future workloads.
  • Incident Response: Guide org-level application teams in incident response efforts, ensuring quick and effective resolution of issues.
  • Performance Tuning: Help teams gather and analyze metrics from application monitoring logs to support performance tuning and identify bottlenecks.
  • Post-Incident Reviews: Help in post-incident (P1/P2) reviews to identify root causes and prevent future incidents.
  • Security: Help application teams adopt industry-standard best practices for managing security certificates, secrets, and non-user IDs to avoid issues and outages.
  • Change Management: Help application teams to implement robust change management processes to ensure that changes to the system are deployed safely and reliably.
  • Peak Season (PS) Readiness: Help application teams prepare for peak season by ensuring overall E2E system resiliency and redundancy to handle expected peak usage volumes.
  • War Room Playbooks: Help teams prepare playbooks covering war room scenarios.
  • Auto Failover & Auto Scaling: Help application teams adopt effective auto-failover and auto-scaling strategies to maintain overall system resiliency.
  • Collaboration with Developers: Work with application development teams to understand their needs, identify potential reliability issues, and improve the software development lifecycle.
  • Cloud: Define and develop Cloud strategy for the enterprise, focusing on AWS, aligned with IT requirements.

Requirements:

  • A solid understanding of SRE principles and at least 5 years of experience leading and guiding SRE engineers.
  • Experience in .NET, Java, microservices, Spring Boot, Angular, UNIX, C, and C#.
  • Experience leading SRE teams or projects.
  • Monitoring and observability/APM tools such as Dynatrace Cloud, Splunk, Elastic APM, Interlink, and Grafana.
  • Hands-on experience with cloud platforms (e.g., AWS, Azure, Google Cloud Platform) and their services.
  • Experience with containers (Docker) and container orchestration (Kubernetes).
  • AI tools such as GitHub Copilot and Chat Playground.
  • Incident management tools such as ServiceNow.
  • Rally.
  • Understanding of CI/CD and experience using GitHub Actions.
  • Strong communication and collaboration skills to work effectively with cross-functional teams.
  • Databases: MongoDB and MySQL.
  • Proficiency in automation technologies and tools like Terraform.
  • Good documentation skills.

Outcomes:

  • Increased system reliability and 99.999% availability.
  • E2E (Supply Chain) resiliency and redundancy.
  • Improved scalability.
  • Faster incident resolution or restoration.
  • Continuous Improvement.