Sr. DevOps Engineer (Compute Platform)

Remote • Posted 1 hour ago • Updated 1 hour ago
Contract W2
3 Months
No Travel Required
Depends on Experience

Job Details

Skills

  • DevOps
  • Infrastructure Lifecycle Management
  • Kubernetes
  • Linux Administration
  • Operating Systems
  • Python
  • Ubuntu

Summary

Role: Sr. DevOps Engineer (Compute Platform)

Scope:

We are seeking a highly experienced Sr. DevOps Engineer (Compute Platform) with strong operational expertise in enterprise compute platforms to implement and support Kubernetes on bare metal and hypervisor platforms in a private cloud environment. This role focuses on the deployment, automation, support, and continuous improvement of large-scale compute environments spanning bare metal infrastructure, virtualization, private cloud, and Kubernetes platforms using Infrastructure-as-Code and GitOps practices.

This is a deeply technical role requiring expert-level understanding of compute hardware management, Kubernetes, OpenStack, and hypervisors, along with extensive working knowledge of Linux operating systems. You will also collaborate with platform and SRE teams to maintain secure, performant, multi-tenant-isolated services that serve high-throughput, mission-critical applications.

 

Key Responsibilities:

  • Operate and support enterprise compute platforms across hardware, OS, virtualization, and container orchestration layers
  • Deploy and maintain bare metal server infrastructure running Ubuntu with Kubernetes and hypervisors, including OpenStack and Harvester
  • Implement and maintain PXE-based provisioning environments leveraging Redfish APIs for large-scale server deployments
  • Install, patch, and maintain operating systems including Ubuntu and Harvester
  • Operate and support virtualization and private cloud platforms, including KVM on Ubuntu, OpenStack environments and Harvester HCI
  • Develop Infrastructure-as-Code using Ansible, Terraform, Helm, and Git, with Python/Bash automation
  • Implement CI/CD pipelines for infrastructure updates, patching, upgrades, testing, and rollback
  • Perform firmware updates, patch management, and hardware health validation
  • Monitor system performance, capacity, and availability; proactively address reliability risks
  • Troubleshoot complex cross-stack issues spanning hardware, OS, virtualization, OpenStack, and Kubernetes
  • Participate in on-call escalation support for complex platform-related issues
  • Collaborate globally on change management, documentation, and operational best practices
  • Develop and maintain runbooks, operational procedures, and technical documentation

 

Must Have:

  • 6+ years of experience as a DevOps Engineer, Site Reliability Engineer, or Infrastructure Operations Engineer with a strong focus on compute
  • Strong hands-on experience operating bare metal compute environments at scale
  • Experience with PXE boot, automated OS provisioning, and server imaging systems
  • Practical experience supporting Bare Metal as a Service (BMaaS) platforms leveraging Redfish APIs
  • Strong Linux administration skills, especially with Ubuntu
  • Operational experience with virtualization and private cloud platforms, including KVM on Ubuntu, OpenStack operations and troubleshooting, and Harvester HCI
  • Experience deploying and operating production Kubernetes environments
  • Expertise with enterprise compute hardware, including Cisco UCS, Dell PowerEdge, HPE, and Supermicro systems
  • Proficiency with Infrastructure as Code tools (e.g., Terraform, Ansible, or similar)
  • Experience building or supporting CI/CD pipelines for infrastructure and platform automation
  • Strong scripting skills in Python, Bash, or similar languages
  • Proven troubleshooting and root cause analysis skills in complex distributed systems
  • Excellent written and verbal communication skills
  • Bachelor’s degree in computer science or equivalent professional experience

 

Nice to Have:

  • OpenStack and KVM-on-Ubuntu administration
  • Bare Metal as a Service (PXE, Redfish)
  • Kubernetes on bare metal
  • Understanding of CIS/NIST security and infrastructure lifecycle management.
  • ITIL Foundation/advanced certifications in support of ITSM standard methodology.
  • Background in telco, edge cloud, or large enterprise environments.
  • Ubuntu Certifications, CNCF Certified Kubernetes Administrator (CKA), Certified Kubernetes Security Specialist (CKS)
  • Master’s degree in computer science, IT, Engineering, or a related field preferred; equivalent experience and relevant industry certifications will also be considered.
  • Dice Id: 10309076
  • Position Id: 8986373
Contact the job poster
Sumit Gupta

Recruiter @ Nasscomm, Inc.