Hi,
Job Title: Site Reliability Engineer
Location: San Jose, CA - 2 Days Onsite
Duration: 6+ Months Contract
Note: Candidates need strong skills in cloud technology, deployment, IaC, and scripting. They should know Terraform, Kubernetes (NHS), and Ansible.
Job Description:
About the Role:
· We seek a highly skilled and dynamic Site Reliability Engineer – Consultant. In this role you will:
· Maintain and improve the reliability, performance, and availability of software systems.
· Act as a bridge between traditional IT operations and software development, bringing a software engineering approach to system administration.
Job Responsibilities:
· Creating and supporting automation scripts (shell/Ansible/Python) for infrastructure deployments, validations, and monitoring to improve operational tasks
· Scheduling monitoring scripts using cron and Airflow
· Monitoring using tools including Dynatrace, Apica, Grafana, etc.
· Database handling
· Building CI/CD pipelines
· Incident handling and problem management
Mandatory Skills:
· Experience in Ansible/Python
· Monitoring Tools – Dynatrace/Apica/Grafana
Required Education:
Bachelor’s degree in computer science or a related field.
Required Experience:
· 14+ years of IT infrastructure experience
· Extensive experience working with Linux distributions such as RHEL/CentOS, shells, filesystems, and utilities
· Experience in programming and automation with Python and Ansible
· Knowledge of distributed computing, experience working with container orchestration frameworks including on-prem and Rancher Kubernetes, and good knowledge of Kubernetes objects
· Experience working with storage (ONTAP preferred): volumes, aggregates, backups, DR planning
· Experience scheduling monitoring scripts using cron and Airflow
· Experience with monitoring tools including Dynatrace, Apica, Grafana, etc.
· Database knowledge, including SQL and NoSQL databases
· Experience building CI/CD pipelines (preferred)
· Cloud platform knowledge (specifically AWS) is required