HPC Data Center Operator

Laboratory, Performance, Networking, Linux, Unix, WAN, LAN, Perl, Python, Computer, Security, Test
Full Time

Job Description

Company Description
Join us and make YOUR mark on the World!

Are you interested in joining some of the brightest talent in the world to strengthen the United States' security? Come join Lawrence Livermore National Laboratory (LLNL) where our employees apply their expertise to create solutions for BIG ideas that make our world a better place.

We are committed to a diverse and equitable workforce with an inclusive culture that values and celebrates the diversity of our people, talents, ideas, experiences, and perspectives. This is essential to innovation and creativity for continued success of the Laboratory's mission.
Job Description
Do you love High Performance Computing (HPC)? Would you like to work with four of the fastest HPC systems in the world?

We have an opening for an HPC Data Center Operator to monitor, diagnose, and repair system faults on a large number of HPC systems, storage systems, and networks, working under minimal supervision. You will interact with other Livermore Computing (LC) staff to remediate problems and provide advanced technical support in a complex HPC and networking environment, working either the Swing Shift (4:00pm - 12:00am) or the Owl Shift (12:00am - 8:00am). This position is in the Livermore Computing Operations and Networking Group in the LC Division within the Computing Department under the Computation Directorate.

This position will be filled at either the 525.2 or 525.3 level depending on your qualifications. Additional job responsibilities (outlined below) will be assigned if you are selected at the higher level.

In this role you will
  • Provide general technical support and monitoring capabilities as part of the operations team for the HPC systems including Sierra, large Linux clusters, file systems, and storage systems.
  • Apply Unix system knowledge and use a variety of in-house and vendor-supplied diagnostic tools to monitor systems and effect basic repairs.
  • Troubleshoot and document general software, hardware, and network issues, and then apply corrective action/repairs to the problem or escalate as per defined procedures.
  • Utilize the Laboratory's trouble ticketing system, ServiceNow, for problem ticket tracking.
  • Receive, document, and accommodate all customer calls, particularly during off-hours, and resolve customer issues if possible, or escalate to the appropriate level.
  • Perform data center facilities monitoring, problem remediation, and emergency event response during normal daily operation and off-hours.
  • Participate in the decommissioning of older HPC systems and in system relocation activities.
  • Promote the use of inter-departmental resources for tools, metrics, and common solutions to team members via email and presentations.
  • Perform other duties as assigned.

Additional job responsibilities at the 525.3 level
  • Provide advanced technical support and monitoring capabilities for the HPC systems clusters, file systems, and storage systems.
  • Troubleshoot and document moderately complex software, hardware, and network issues, apply corrective action and repairs to the problem, or notify the appropriate on-call personnel.
  • Perform a variety of high-level technical tasks in the installation, diagnosis, repair and maintenance of clustered computer systems and related file systems and networks.

Qualifications
  • Ability to secure and maintain a U.S. DOE Q-level security clearance, which requires U.S. citizenship.
  • Associate's degree in a computer-related field or equivalent combination of technical training and experience.
  • Limited experience and knowledge of high-performance computer operating systems, networks (WAN/LAN), and/or networking protocols, such as InfiniBand.
  • Proficient verbal and written communication skills necessary to interact with customers and team members with the ability to work independently and as a member of a team.
  • General working knowledge and experience working with Linux system administration, commands and utilities.
  • General working knowledge of parallel systems, distributed systems and/or network protocols, along with RAID arrays.
  • Ability to understand and apply mechanical concepts and principles to solve problems with attention to detail.
  • Experience and knowledge of the skills needed for a customer support role to include a focus on listening, rapport-building, friendly and approachable nature, and courtesy and patience.
  • Ability to work all shifts, including weekends and holidays.

Additional qualifications at the 525.3 level
  • Advanced knowledge of HPC operating systems, local area networks, and/or networking protocols such as InfiniBand.
  • Advanced knowledge of and experience working with Linux system administration, commands and utilities, including scripting skills (e.g. shells, Perl, Python).
  • Significant experience with high performance systems, parallel systems, distributed systems and/or network protocols, along with RAID arrays.

Qualifications We Desire
  • Advanced training, certifications and experience in Linux system administration.
  • Advanced knowledge and experience working with electrical systems.
  • Background in computer support.

Additional Information
Why Lawrence Livermore National Laboratory?
  • Included in 2021 Best Places to Work by Glassdoor!
  • Work for a premier innovative national Laboratory
  • Comprehensive Benefits Package
  • Flexible schedules (*depending on project needs)
  • Collaborative, creative, inclusive, and fun team environment

Learn more about our company, selection process, position types and security clearances by visiting our careers site.

Security Clearance

LLNL is a Department of Energy (DOE) and National Nuclear Security Administration (NNSA) Laboratory. Some positions will require a DOE L or Q clearance (please reference Security Clearance requirement above). If you are selected and a clearance is required, we will initiate a Federal background investigation to determine if you meet eligibility requirements for access to classified information or matter. In addition, all L or Q cleared employees are subject to random drug testing. An L or Q clearance requires U.S. citizenship. For additional information please see DOE Order 472.2.

Pre-Employment Drug Test

External applicant(s) selected for this position will be required to pass a post-offer, pre-employment drug test. This includes testing for use of marijuana as Federal Law applies to us as a Federal Contractor.

Equal Employment Opportunity

LLNL is an affirmative action and equal opportunity employer that values and hires a diverse workforce. All qualified applicants will receive consideration for employment without regard to race, color, religion, marital status, national origin, ancestry, sex, sexual orientation, gender identity, disability, medical condition, pregnancy, protected veteran status, age, citizenship, or any other characteristic protected by applicable laws.

If you need assistance and/or a reasonable accommodation during the application or the recruiting process, please submit a request via our online form.

California Privacy Notice

The California Consumer Privacy Act (CCPA) grants privacy rights to all California residents. The law also entitles job applicants, employees, and non-employee workers to be notified of what personal information LLNL collects and for what purpose. The Employee Privacy Notice can be accessed here.
Dice Id : LLNL
Position Id : REF1139Y
Originally Posted : 2 months ago