High‑Level Requirements
5–7 years of relevant experience, primarily focused on operations support; administrative support experience is also considered highly desirable.
Strong expertise in Microservices architecture, with practical experience designing, deploying, and supporting distributed systems in production environments.
Deep hands‑on knowledge of Kubernetes, including deployment management, scaling, upgrades, troubleshooting, and cluster operations, with a strong focus on reliability, resilience, and performance.
Working proficiency with API Gateway platforms such as Azure API Management (APIM), Kong, and IBM API Connect (APIC) for traffic management, rate limiting, routing, and API observability.
Solid experience with observability and monitoring tools, including Splunk, AppDynamics, Instana, or similar platforms, covering log analytics, metrics, distributed tracing, dashboards, alerting, and SLO‑based monitoring.
Proven ability to diagnose and resolve complex production issues, perform root cause analysis (RCA), and implement preventative and corrective measures.
Familiarity with Site Reliability Engineering (SRE) best practices, including error budgets, SLIs/SLOs, incident response, post‑mortems, automation, and continuous improvement initiatives.