We are seeking an experienced Senior Linux Systems Engineer to support and enhance a Linux-based High-Performance Computing (HPC) environment that underpins advanced statistical modeling and economic research across multiple business units. This platform enables data scientists, economists, and analysts to perform large-scale computations using modern analytical tools. The role focuses on ensuring platform stability, scalability, performance, and security while continuously evolving the environment to meet growing analytical demands.
Position Responsibilities
· Administer, maintain, and optimize Linux-based HPC infrastructure to ensure high availability, performance, and reliability.
· Perform system patching, upgrades, configuration management, and security hardening in alignment with enterprise standards.
· Monitor system health, troubleshoot complex issues, and implement performance tuning across compute, storage, and network resources.
· Provide Tier 3 support for the analytics platform, resolving advanced technical issues and minimizing operational disruptions.
· Design, implement, and manage automation solutions using Ansible and Ansible Automation Platform to streamline system operations.
· Support HPC workload schedulers and user access tools (e.g., SLURM, Open OnDemand) and ensure efficient job scheduling and resource utilization.
· Collaborate with data scientists, economists, and business stakeholders to translate analytical requirements into scalable technical solutions.
· Implement and maintain security controls, conduct vulnerability assessments, and ensure compliance with regulatory and organizational standards.
· Contribute to platform architecture, capacity planning, and continuous improvement initiatives, including system enhancements and new feature deployments.
· Develop and maintain technical documentation, operational procedures, and knowledge base artifacts.
· Participate in an on-call rotation to support critical systems and ensure uninterrupted platform operations.
Position Qualifications
· Strong expertise in Linux system administration (e.g., Red Hat, CentOS, or Ubuntu) and shell scripting.
· Hands-on experience with Ansible and Ansible Automation Platform for configuration management and automation.
· Proven experience supporting High-Performance Computing (HPC) environments, including workload schedulers such as SLURM and user access tools like Open OnDemand.
· Familiarity with statistical and analytical tools such as R, Python, MATLAB, Stata, and SAS within HPC environments.
· Solid understanding of system performance tuning, capacity planning, and troubleshooting in distributed computing environments.
· Experience implementing security best practices, system hardening, and vulnerability management in regulated environments.
· Strong analytical, problem-solving, and troubleshooting skills with a proactive, customer-focused mindset.
· Excellent communication skills with the ability to collaborate effectively across technical and non-technical teams.
___________________________________________________________________
No phone calls, please.
Please apply with your resume as a Word file, including all your contact details.