HPC/Linux Systems Engineer
CoreHive Computing is a technology consulting and solutions organization providing best-in-class IT consulting, technical support, service desk, and managed services to private and government clients. Since 2003, we have delivered innovative solutions that help our clients enhance the value of their relationships, improve organizational productivity, and increase the rate of return on technology investment.
We are seeking a full-time HPC/Linux Systems Engineer for a position in San Jose, California or Austin, Texas.
Required
Experience with IBM LSF RTM (Report, Track, and Monitor) and scripting in Python, shell, Perl, etc., in a farm environment; knowledge of LSF (Load Sharing Facility) spanning farm to cloud is highly desirable.
Solid understanding and proven operational experience with compute farms, job submission/management technologies, cloud, and associated management tools.
Proven experience working directly with R&D software development teams to collaboratively develop solutions to optimize their working environment (Direct EDA experience desired).
Primary Skills
8+ years of technical experience architecting, managing, and improving a compute farm environment running Linux.
At least 5 years of direct hands-on experience in a global or regional compute farm and/or hybrid cloud environment of 1,000 or more servers, including managing some remote direct reports.
At least 3 years working in a global group, coordinating support, strategies, projects, and operations across multiple geographies with a team-oriented approach.
Extensive technical experience managing IBM LSF and RTM, and scripting in Python, shell, Perl, etc., in a farm environment; knowledge of LSF spanning farm to cloud is highly desirable.
Proven experience in capacity and performance management: optimizing performance, ensuring adequate capacity, working with R&D to optimize their workloads, and developing and maintaining key performance indicators (KPIs).
A proven process focus, demonstrated through documentation, change management, incident management, and problem-resolution activities.
Primary Responsibilities
Supporting multiple geographic locations to serve user communities at sites across North America, Europe, and Asia.
Focusing on improving R&D productivity and committing to customer success.
Driving the overall operational strategy for internal High-Performance Computing (HPC) farms at all customer locations.
Developing and executing the three-year compute roadmap and planning annual capacity growth for the on-premises server farm in San Jose.
Education
BS/MS in computer science or a related field