Role: Linux System Administration
Location: San Jose, CA (onsite)
Duration: 3-6 months
Job Description:
Actively focus on delighting customers by minimizing downtime, anticipating needs, and exceeding service expectations
Build trust with customers through professionalism, accountability, and consistent follow-through
Support and troubleshoot Linux-based cloud environments used for silicon design and verification workflows
Diagnose and resolve system-level issues across compute, storage, networking, and identity services
Monitor HPC cluster performance, job throughput, and queue health
Identify and remediate HPC job performance issues, including scheduler configuration, resource contention, I/O bottlenecks, and memory constraints
Troubleshoot and resolve license availability, utilization, and checkout issues impacting customer workloads
Support distributed resource managers (e.g., Slurm, LSF, SGE, or similar technologies)
Develop and maintain automation to streamline recurring operational tasks, including:
    System health, performance, and capacity monitoring
    User provisioning and de-provisioning
Use agentic AI, Python, shell scripting, Perl, or similar technologies to reduce manual effort and improve mean time to resolution (MTTR)
Operate and support systems containing ITAR-controlled and CUI data in compliance with regulatory and company requirements
Create and maintain runbooks, knowledge base articles, and customer-facing documentation
Strong hands-on experience with Linux system administration and troubleshooting
Experience supporting HPC or large-scale compute environments
Proficiency in Python, shell scripting, Perl, or other automation-focused programming languages
Experience with monitoring,
Ability to work with export-restricted data (ITAR/CUI)
Experience supporting EDA, semiconductor, or silicon design environments
Experience applying AI-assisted or autonomous automation in operations
Bachelor's degree in computer science, or equivalent practical experience