Overview
On Site
Depends on Experience
Contract - W2
Contract - Independent
Contract - 12 Month(s)
Skills
SRE
Site Reliability Engineer
DevOps Engineer
Amazon Web Services
Ansible
Apache HTTP Server
Continuous Delivery
Continuous Improvement
DevOps
Docker
Fluency
Communication
Git
Java
Kubernetes
Linux
Risk Assessment
Reliability Engineering
Splunk
Terraform
Snow Flake Schema
Job Details
Title: Sr. SRE / DevOps Engineer
Location: Sunnyvale, CA (local candidates only)
Client interview: in person
Type: FTC
Job Description:
Job Summary:
- We are looking for a Sr. SRE / DevOps Engineer based in Sunnyvale, California.
- As a Site Reliability Engineer, the individual will work closely with cross-functional teams, automate operations, optimize infrastructure, implement security, and solve issues in a fast-paced environment.
- The individual will play a vital role in ensuring that the systems are reliable, scalable, and high-performing.
Responsibilities:
- Ensure system reliability and availability: monitor system issues, create strategies to detect and address them, design automated troubleshooting systems, and write and review post-mortems.
- Mitigate operational risks: collaborate with development teams and other stakeholders to identify potential risks, perform risk assessments, implement risk mitigation strategies, and continuously monitor and review the effectiveness of those strategies.
- Monitor system health.
- Minimize emergency response time (MTTR).
- Maintain CI/CD pipelines.
- Drive continuous improvement by collaborating with various teams.
- Automate processes.
Must have/required experience and skills:
- 8+ years of experience in DevOps and Site Reliability Engineering.
- Hands-on experience with containerization and orchestration: Docker, Kubernetes/EKS.
- Proficiency in infrastructure as code tools: Terraform, Ansible, or CloudFormation.
- Experience setting up and managing services running on Kubernetes.
- In-depth understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, and automation.
- In-depth knowledge of monitoring and observability tools: Splunk.
- Knowledge of Linux operating system principles, networking fundamentals, and systems management.
- Demonstrable fluency in at least one of the following languages: Java or Python.
- Ability to identify and communicate technical and architectural problems, and to work with partners and their teams to find solutions iteratively.
- Experience building and managing CI/CD pipelines that gate production deployments; developing and implementing Git branching strategies, branch protection rules, and network policies; and scaling workloads up and down on AWS.
- Strong problem-solving and analytical skills.
- Ability to solve performance and scalability issues in the system.
Technical Skills:
- DevOps and SRE
- AWS, Kubernetes/EKS, Docker
- Terraform, Ansible, or CloudFormation
- Splunk, Apache Flink
- Programming/Scripting using Java or Python
- CI/CD
- Databases: Vertica, Snowflake
Behavioral Skills:
- Excellent communication and collaboration skills
- Ability to propose and implement improvements in the system
- Ability to work with cross-functional stakeholders
- Adaptability and a willingness to learn new technologies and techniques.
- Proactive approach to issues, with the ability to provide prompt resolutions or workarounds.