Job Details
Position: Senior Engineering Technician
Duration: Contract (12 months)
Location: Santa Clara, CA (100% onsite + travel to data centers in Santa Clara and Sunnyvale).
Job Description:
We are looking for a motivated Engineering Technician to support Client's on-premises private cloud infrastructure. In this role, you will provide and maintain a compute farm of systems, including Builders, Packagers, and Testers, that serves as a test bed for our developers worldwide to test hardware and software prior to release. The environment is huge, the scale massive, and the ask enormous!
What You'll Do:
- Collaborate closely with engineering teams (system architects, hardware/software engineers, QA, and more) to design, develop, debug, and release next-generation products.
- Manage and maintain a high-performing Compute Farm of builders, packagers, testers, and core infrastructure.
- Ensure availability targets are consistently met and lead system recovery efforts.
- Deploy and qualify systems while supporting exciting new technology bring-ups.
- Oversee inventory and lifecycle management for Client's assets across data centers and labs.
- Gather critical metrics and create Standard Operating Procedure (SOP) documentation.
- Maintain a world-class, safe, and well-organized environment in our data centers and labs.
- Troubleshoot Linux/Windows, hardware, and infrastructure issues alongside engineers and platform operations teams.
- Plan, deploy, and maintain on-premises private cloud infrastructure, collaborating with datacenter and network engineering teams.
- Implement efficiency improvements to maximize availability, throughput, and test accuracy while meeting SLAs and KPIs.
- Represent the team in meetings with internal stakeholders and contribute to global operations.
What We Need to See:
- Associate's or Bachelor's Degree in Engineering/Technical Major (or equivalent experience).
- 5+ years of experience in data centers or large engineering labs.
- Familiarity with SCM tools such as Git and Perforce.
- Proficiency in DCIM tools (e.g., Nautobot) and scripting (shell, Python, Ansible).
- Working knowledge of protocols and services such as TCP/IP, DNS, NFS, and SSL.
- Experience with Windows, Linux, and macOS.
- Hands-on experience with PCBs, GPUs, and system deployments.
- Exceptional communication skills, both written and verbal.
- Ability to explain technical concepts to non-technical audiences.
- Strong problem-solving skills and a collaborative spirit.
What Makes You Stand Out:
- Experience managing HPC clusters using tools like BCM and Slurm.
- Hands-on knowledge of OpenStack.
- Relevant certifications such as CCNA or equivalent.
- Strong background in Windows and Linux administration, with an understanding of dense datacenter design, including compute, storage, and networking.
- Experience with hypervisors and VM applications.
- Knowledge of DC infrastructure with an emphasis on liquid cooling.
- A track record of technical curiosity and innovation.
- Mechanical aptitude and comfort with tools and physical tasks.
- Energy, enthusiasm, and an understanding of what it takes to get the team to the finish line.
- Willingness to go the extra mile to get the job done!
- Note: this is an onsite contract position and requires local travel to data centers within Santa Clara.
Qualifications/Key Responsibilities:
- 5+ years of experience working in a data center/lab environment
- Associate's or Bachelor's Degree in Engineering/Technical Major (manager prefers bachelor's degree)
- Scripting/automation expertise
- Team focuses on early product development
- Strong coordination skills in the R&D space
- Experience managing HPC clusters (Slurm++)
- 2-3+ years of scripting experience (Linux-based)
- Working experience with process development and driving tasks to completion
- Scrum, Agile
Software:
- Linux, Python
- Jenkins ++
- Scripting (Bash, Ansible)
- DCIM Tools (Nautobot)