Job Details
Job description:
We are seeking an experienced infrastructure architect with an extensive background in high-performance computing (HPC), Linux systems, storage solutions, and automation frameworks. The ideal candidate will design, implement, and maintain secure, scalable, and resilient HPC infrastructure and storage systems; support scientific computing environments; manage cloud and on-premises systems; and apply best practices in disaster recovery and data security.
Responsibilities:
Design, implement, and maintain HPC infrastructure and security
Manage SAN/NAS storage systems, backups, and virtualization infrastructure
Support enterprise backup, disaster recovery (DR), and continuity plans
Configure and manage automation tools (Ansible, Puppet, Chef)
Maintain high-speed network storage systems (e.g., Mellanox switches, clustered NAS)
Support cloud infrastructure (compute engines, storage buckets)
Manage SQL and NoSQL databases (e.g., PostgreSQL, MySQL, Oracle)
Assist teams in utilizing computing and storage resources
Collaborate with labs and DTMB to manage computing infrastructure
Review system logs, monitor resource usage, and address anomalies
Participate in failover and DR planning/testing
Required skills & experience:
10+ years of experience in Linux system administration (Ubuntu, CLI, firewalls, memory, VM, etc.)
10+ years of experience with scripting languages (Bash, Python, R)
10+ years of experience with HPC environment setup and maintenance
10+ years of experience with workload managers (e.g., Slurm)
Deep knowledge of database setup and administration (PostgreSQL, MySQL, etc.)
Hands-on experience with network appliances and clustered storage
Experience with backup/recovery systems and disaster recovery planning
Familiarity with cloud infrastructure setup (AWS, Google Cloud Platform, or Azure preferred)
Experience in automation & configuration management (Ansible, Puppet, Nextflow)
Knowledge of containerization tools (Docker, Singularity)
Strong troubleshooting and log analysis skills (IIS, Dynatrace, etc.)
Experience reviewing config files (e.g., web.config)
Exposure to HL7 messaging, Cloudflare, Forcepoint (rule sets like c86), and junction configurations
Experience assisting with failover and DR implementation/testing
Familiarity with CDC-hosted applications is a plus