SRE Consultant (cloud data center experience mandatory)

Overview

Hybrid
$60 - $70
Accepts corp to corp applications
Contract - W2
Contract - Independent
Contract - 12 Month(s)

Skills

Data Center
On-prem infrastructure management
Jenkins
Python

Job Details

Job Title: SRE Consultant (cloud data center experience mandatory)
Location: Santa Clara, CA (onsite 5 days a week); local candidates preferred
Onsite Requirement: Yes
Number of days onsite: 5 days

Mandatory Areas
Must Have Skills
Skill 1 Manage Nvidia's on-prem infrastructure. Maintain uptime, reliability and readiness of the on-prem engineering cloud spread across multiple data centers.
Skill 2 Maintain KPI pipelines using Jenkins, Python and ELK.
Skill 3 Bare-metal data center machine management tools such as IPMI, Redfish and KVM.

Cloud data center experience is mandatory.

o Familiarity with Nvidia hardware such as GPUs and Tegra devices is a plus.

Good to Have Skills
Skill 1 Automation using Jenkins, Python, Go, Bash.


Requirements/Skills:
On-prem infrastructure management
o Manage Nvidia's on-prem infrastructure. Maintain uptime, reliability and readiness of the on-prem engineering cloud spread across multiple data centers.
Guard SLAs
o Guard service level agreements (SLAs) for critical engineering services. Implement monitoring, alerting, and incident response procedures to ensure adherence to defined performance targets. Perform root cause analysis and post-mortems for incidents that breach thresholds.
Observability
o Set up and manage monitoring and logging tools such as Prometheus, Grafana, or the ELK Stack to oversee system health and performance. Maintain KPI pipelines using Jenkins, Python and ELK.
o Improve monitoring systems by adding custom alerts based on business needs.
Automation & Optimization
o Assist with capacity planning, optimization, and efforts to improve utilization.
Day-to-Day Support
o Support user-reported issues. Monitor alerts and take necessary action.
o Actively participate in war rooms for critical issues.
Collaboration & Documentation
o Create and maintain documentation for operational procedures, configurations, and troubleshooting guides.
Tech stack
o Bare-metal data center machine management tools such as IPMI, Redfish, and KVM.
o Automation using Jenkins, Python, Go, Bash.
o Infrastructure tools like Kubernetes, MySQL, Prometheus, Grafana and ELK.

About American IT Systems