Minimum Requirements:
Candidates that do not meet or exceed the minimum stated requirements (skills/experience) will be displayed to customers but may not be chosen for this opportunity.

| Years | Required/Preferred | Experience |
|---|---|---|
| 8 | Required | Experience in systems engineering, DevOps, or site reliability engineering roles |
| 8 | Required | Strong experience with Linux/Unix systems and system internals |
| 8 | Required | Proficiency in one or more programming/scripting languages (Python, Go, Java, Bash) |
| 8 | Required | Experience designing and operating highly available, distributed systems |
| 8 | Required | Strong knowledge of cloud platforms (AWS or Google Cloud Platform) and cloud-native services |
| 8 | Required | Experience with containerization and orchestration (Docker, Kubernetes) |
| 8 | Required | Strong understanding of monitoring, alerting, and logging concepts |
| 8 | Required | Experience defining and managing SLIs, SLOs, and error budgets |
| 8 | Required | Familiarity with incident management, root cause analysis (RCA), and postmortems |
| 8 | Required | Experience integrating security and compliance into operational workflows |
| 4 | Preferred | Familiarity with observability tools (Prometheus, Grafana, Application Insights, Datadog, Splunk) |
| 4 | Preferred | Experience operating 24x7 production environments with on-call rotations |
| 4 | Preferred | Experience with chaos engineering and resiliency testing |
| 4 | Preferred | Experience with feature flags, canary deployments, and progressive delivery |
| 4 | Preferred | Strong documentation skills for runbooks, dashboards, and operational standards |