Job Details
Standard SRE JD:
8 to 10+ years of experience in Site Reliability / DevOps Engineering
Experience with Monitoring and Observability (Datadog, Splunk)
Expertise in AWS / Azure
Expertise in Kubernetes, kOps, & Helm 3
You won't deploy Kubernetes / Docker; our software engineers and release engineers do that. Instead, you'll ensure we have the Docker registry for them, and debug.
Experience with Infrastructure as Code (IaC); Terraform is mandatory
Fluency in at least one language required: Python, C#, or Java. Strong API experience is expected.
Strong leadership, initiative, and capacity for decision-making
Expert knowledge in any or all of these is a huge plus: Prometheus Operator, Grafana, Loki, ELK Stack, OpenTelemetry, Jaeger/OpenTracing (and yes, we use ALL of them)
Participate in the on-call rotation for Operations support
______________
Azure, Datadog or Splunk, PowerShell, Terraform
Good knowledge of CI/CD; preferred tools are Jenkins, Octopus, and GitHub Actions
Good knowledge of deploying Kubernetes resources the GitOps way; Argo CD is the preferred tool