Infrastructure Operations Engineer remote

Remote • Posted 3 hours ago • Updated 3 hours ago
Full Time
On-site
$150-170K

Job Details

Skills

  • IaaS
  • High Performance Computing
  • System Monitoring
  • Data Centers
  • Firmware
  • Patch Management
  • RMA
  • Computer Hardware
  • Standard Operating Procedure
  • Scripting
  • Backup
  • Testing
  • Data Integrity
  • Change Management
  • Collaboration
  • Communication
  • Customer Support
  • Hardware Support
  • Leadership
  • Continuous Improvement
  • Process Improvement
  • Documentation
  • Knowledge Base
  • Process Optimization
  • Regulatory Compliance
  • Training
  • Computer Science
  • Information Technology
  • System Administration
  • Server Hardware
  • Network Interfaces
  • Linux Administration
  • Shell Scripting
  • Process Management
  • Log Analysis
  • Optimization
  • Ansible
  • Puppet
  • Progress Chef
  • Incident Management
  • Workflow
  • IPMI
  • BMC
  • Data Storage
  • Computer Networking
  • HPC
  • Conflict Resolution
  • Problem Solving
  • Organizational Skills
  • Management
  • Effective Communication

Summary

Company Overview:
We are a pioneering Infrastructure-as-a-Service (IaaS) company, focusing on delivering High-Performance Computing (HPC) solutions. Our cutting-edge data centers form the core of our operations, empowering us to offer unmatched computational resources to our global clientele. In line with our growth and the expansion of our services, we are on the lookout for an experienced and dedicated Infrastructure Operations Engineer to strengthen our team.

Position Summary:
The Infrastructure Operations Engineer plays a critical role in maintaining and optimizing the infrastructure that powers our high-performance computing environments. This role encompasses proactive system monitoring, maintenance operations, and rapid response to infrastructure incidents. The successful applicant will work closely with cross-functional teams, including network engineers, deployment teams, and customer support, to ensure maximum uptime and reliability of our infrastructure. Occasional travel to data centers within the US may be required to support critical maintenance, troubleshooting, or infrastructure upgrades.

Key Responsibilities:
Maintenance and Support:
  • Perform daily infrastructure health checks and monitoring to ensure optimal system performance
  • Execute firmware updates and patch management across server infrastructure
  • Handle RMA processes and coordinate hardware replacements with vendors and on-site personnel
  • Respond to customer support escalations requiring backend infrastructure access, ensuring timely resolution
  • Document and execute standard operating procedures to maintain operational consistency

System Administration:
  • Maintain server configurations and automation scripts to streamline operations
  • Perform routine backup verification and restoration testing to ensure data integrity
  • Execute change requests following established approval processes and change management protocols
  • Monitor system performance and resource utilization, identifying optimization opportunities

Collaboration and Communication:
  • Work closely with Network Engineers, deployment teams, and customer support to resolve complex issues
  • Coordinate with OEMs and vendors through external portals for hardware support and replacements
  • Escalate complex technical issues to senior leadership when necessary
  • Participate in post-incident reviews and contribute to continuous improvement initiatives

Documentation and Process Improvement:
  • Maintain thorough documentation of infrastructure configurations, procedures, and incident resolutions
  • Contribute to the development and refinement of operational runbooks and knowledge base articles
  • Identify opportunities for automation and process optimization

Safety and Compliance:
  • Adhere to strict data center safety protocols and operational standards
  • Follow security best practices and compliance requirements for infrastructure access and maintenance
  • Participate in regular safety training and briefings

Qualifications:
  • Bachelor's degree in Computer Science, Information Technology, or a related field preferred
  • 3-5 years of experience in infrastructure operations, system administration, or a similar role in enterprise or data center environments
  • Strong hands-on experience with server hardware components, including drives, RAM, power supplies, network interfaces, and server chassis
  • Proven ability to diagnose and troubleshoot complex infrastructure issues using systematic methodologies
  • Advanced proficiency with Linux system administration, including shell scripting, process management, log analysis, and system optimization
  • Experience with infrastructure monitoring tools and alerting systems
  • Familiarity with automation tools (Ansible, Puppet, Chef, or similar) preferred
  • Experience with ticket management systems and incident response workflows
  • Knowledge of IPMI, BMC, and out-of-band management tools
  • Understanding of storage systems, networking fundamentals, and HPC infrastructure preferred
  • Strong problem-solving abilities with a demonstrated capacity to work independently and make sound decisions under pressure
  • Excellent organizational skills with the ability to manage multiple priorities in a fast-paced environment
  • Effective communication skills, both written and verbal, with the ability to explain technical concepts to non-technical stakeholders
  • Availability to participate in an on-call rotation and travel occasionally to data center locations as required
  • Dice Id: cxbcsi
  • Position Id: Job44586
