Overview
On Site
Accepts corp to corp applications
Contract - Independent
Contract - W2
Contract - 6+ month(s)
Skills
Java
DevOps
Kubernetes
SRE
Prometheus
Job Details
Job Title: Senior Site Reliability Engineer / DevOps Engineer
Location: Bothell, WA
Duration: Contract
Term: 6+ months
Job Description:
Experience Desired: 7+ Years.
Key Responsibilities
Platform Reliability & Operations
- Own reliability, availability, scalability, and performance of API Gateway services running on Kubernetes
- Design and implement SRE best practices including SLIs, SLOs, SLAs, error budgets, and incident management
- Lead production readiness reviews, root cause analysis (RCA), and post-incident improvements
- Drive capacity planning, performance tuning, and resilience testing
Kubernetes & Cloud Engineering
- Manage and optimize Kubernetes clusters (EKS / AKS / GKE / On-prem)
- Develop and maintain Helm charts, manifests, and deployment strategies
- Implement rollout strategies such as blue-green, canary, and rolling deployments
- Collaborate with development teams to ensure services follow cloud-native design patterns
Observability & Monitoring (Strong Focus)
- Build and maintain enterprise-grade observability (O11y) solutions:
  - Prometheus & Grafana for metrics and dashboards
  - Splunk for centralized logging and alerting
  - OpenTelemetry for distributed tracing
- Define actionable alerts and dashboards for platform and application health
- Improve MTTR through better visibility and automation
CI/CD & Automation
- Design and maintain CI/CD pipelines (Jenkins, GitHub Actions, GitLab CI, etc.)
- Automate infrastructure using Infrastructure as Code (Terraform, CloudFormation, etc.)
- Develop automation scripts using Python, Bash, or Groovy
Security & Compliance
- Implement DevSecOps practices including secrets management, image scanning, and RBAC
- Work closely with security teams on vulnerability remediation and compliance controls
Innovation & POCs
- Actively contribute to POCs for AI Gateway / Intelligent API Gateway initiatives
- Evaluate and prototype integrations with AI/ML-driven routing, observability, and security features
- Stay current with emerging SRE, cloud, and AI gateway technologies
Soft Skills
- Strong troubleshooting and problem-solving skills
- Ability to work cross-functionally with developers, architects, and security teams
- Proactive mindset with a passion for automation and reliability
- Good documentation and communication skills
Key Skills:
SRE, DevOps, Java, Kubernetes, Observability