Sr. Site Reliability Engineer (Compute Platform)

Remote • Posted 1 hour ago • Updated 1 hour ago

Contract W2

Contract Independent

1 Month

No Travel Required

Remote

Depends on Experience

Job Details

Skills

Site Reliability
Compute Platform
BareMetal

Summary

Position: Sr. Site Reliability Engineer (Compute Platform)

Location: Remote

Duration: 500 hrs

Client: Ahead

Job ID: AHD

Job description

We are seeking a highly experienced Sr. Site Reliability Engineer – Compute Platforms to design, implement, and support Kubernetes on bare metal and hypervisor platforms in a private cloud environment. This role is responsible for the architecture, design, and standardization of enterprise compute and hypervisor environments spanning bare metal infrastructure, operating systems, hypervisors, private cloud orchestration, and Kubernetes, using Infrastructure-as-Code and GitOps practices.

This is a deeply technical role requiring expert-level understanding of compute hardware management, Kubernetes, OpenStack, and hypervisors, along with extensive working knowledge of Linux operating systems. You will also collaborate with platform and SRE teams to maintain secure, performant, and multi-tenant-isolated services that serve high-throughput, mission-critical applications.

KEY RESPONSIBILITIES

• Lead the architecture and design of enterprise compute and hypervisor platform solutions across hardware, OS, virtualization, cloud orchestration, and container orchestration layers

• Define standards and automation frameworks for bare metal provisioning and lifecycle management

• Design and implement Bare Metal as a Service (BMaaS) capabilities for scalable infrastructure consumption

• Architect and design Kubernetes platforms on bare metal with QoS and affinity (ArgoCD)

• Architect and validate automated deployments of operating systems and hypervisors including Ubuntu and Harvester

• Design and maintain PXE-based provisioning environments leveraging Redfish APIs for large-scale server deployments

• Develop Infrastructure-as-Code using Ansible, Terraform, Helm, and Git, with Python/Bash automation

• Implement CI/CD pipelines for infrastructure updates, patching, upgrades, testing, and rollback.

• Design automated workflows for server build, firmware lifecycle management, patching, and hardware validation

• Evaluate and standardize enterprise hardware platforms to meet performance, scalability, and reliability requirements

• Produce detailed high-level and low-level design documentation, build guides, and operational handoff materials

• Perform deep troubleshooting across storage, Kubernetes, hypervisors, networking, and Linux systems

• Partner with operations, network, storage, and platform teams to ensure designs are supportable and production-ready

• Participate in on-call escalation support for complex platform-related issues

• Collaborate globally on change management, documentation, and operational best practices.

Must Have

6+ years of experience in infrastructure engineering, platform engineering, or DevOps with a strong focus on compute system design

Proven experience designing and automating bare metal compute environments at scale

Strong hands-on experience with PXE boot, network-based OS provisioning, and automated server imaging

Experience implementing or supporting Bare Metal as a Service (BMaaS) platforms

Practical experience using Redfish APIs for hardware provisioning, power management, and remote lifecycle operations

Deep expertise with Ubuntu Linux in enterprise environments

Strong hands-on experience with KVM hypervisors (SUSE Harvester, OpenStack)

Experience designing and deploying production-grade Kubernetes clusters

Strong background with enterprise compute hardware platforms, including Cisco UCS, Dell PowerEdge, Supermicro, and HPE systems

Proficiency with Infrastructure as Code tools (e.g., Terraform, Ansible, or similar)

Experience building or supporting CI/CD pipelines for infrastructure and platform automation

Strong scripting skills in Python, Bash, or similar languages

Demonstrated ability to produce clear, structured technical design documentation

Excellent written and verbal communication skills

Bachelor’s degree in computer science or equivalent professional experience.

Nice to Have

OpenStack, Ubuntu KVM administration.

Bare Metal as a Service (PXE, Redfish).

Kubernetes on bare metal.

CIS/NIST security and infrastructure lifecycle management.

ITIL Foundation/advanced certifications in support of ITSM standard methodology.

Background in telco, edge cloud, or large enterprise environments.

Ubuntu Certifications, CNCF Certified Kubernetes Administrator (CKA), Certified Kubernetes Security Specialist (CKS)

Master’s degree in computer science, IT, Engineering, or a related field preferred; equivalent experience and relevant industry certifications will also be considered.

Dice Id: 10309076
Position Id: AHD2026666579

Contact the job poster

Sumit Gupta

Recruiter @ Nasscomm, Inc.
