DevOps (Cloud-AI)
W2 Contract
Pay Rate: $55 - $65 per hour
Location: Cupertino, CA - Remote Role
Job Summary:
We are looking for a highly motivated DevOps / Site Reliability Engineer to support large-scale Kubernetes-based infrastructure and platform operations. This role is focused on building, automating, and operating highly reliable systems that power critical engineering platforms and services.
Duties and Responsibilities:
- Design, build, automate, and support scalable Kubernetes-based platforms and services
- Operate and troubleshoot production environments running at scale
- Develop automation and tooling to improve operational efficiency and reliability
- Monitor platform health, performance, and availability using observability tooling
- Troubleshoot infrastructure, application, and networking issues across distributed systems
- Work closely with engineering teams to improve deployment, reliability, and scalability practices
- Participate in operational support, incident response, and root cause analysis
- Improve CI/CD workflows and deployment automation
- Drive operational excellence through documentation, automation, and process improvements
- Take ownership of projects and independently drive deliverables to completion
Requirements and Qualifications:
- Strong hands-on experience with Kubernetes platforms such as EKS, GKE, AKS, or similar
- Experience running and supporting applications on Kubernetes at scale
- Strong understanding of containerized infrastructure and distributed systems
- Experience with monitoring and observability tools, preferably Grafana and Prometheus
- Experience with CI/CD pipelines and deployment automation
- Experience with Splunk logging, log analysis, and troubleshooting
- Strong scripting and automation experience using Python and/or Golang
- Experience troubleshooting production systems under pressure
- Strong communication and collaboration skills
- Self-starter mentality with strong ownership and accountability
Preferred Qualifications:
- Experience operating Ray clusters/services
- Strong networking and troubleshooting experience
- Experience with cloud infrastructure and platform services
- Experience with Infrastructure as Code and automation frameworks
- Experience supporting high-scale production systems
- Familiarity with SRE principles and operational best practices
Bayside Solutions, Inc. is not able to sponsor any candidates at this time. Additionally, candidates for this position must qualify as W2 candidates.
Bayside Solutions, Inc. may collect your personal information during the position application process. Please refer to Bayside Solutions, Inc.'s CCPA Privacy Policy.