Overview
On Site
Depends on Experience
Contract - W2
Contract - 12 Month(s)
No Travel Required
Unable to Provide Sponsorship
Skills
DevOps
High Availability
Kubernetes
Lifecycle Management
Docker
Documentation
Failover
Computer Hardware
Computer Networking
Debugging
ARM
Ansible
Backup
Collaboration
Hardening
Linux
Management
PaaS
Provisioning
Recovery
Scheduling
Software Management
Storage
Terraform
Workflow
x86
Job Details
Responsibilities
- Cluster Provisioning & Configuration
  - Deploy and configure K3s clusters across heterogeneous hardware (bare metal, ARM/x86 nodes, and accelerators).
  - Manage hybrid cluster topologies (single-node edge clusters, dual-node HA, and multi-node deployments).
  - Define and maintain consistent OS images, networking, and storage settings across nodes.
- Declarative Deployment Enablement
  - Implement GitOps workflows for declarative application management (e.g., ArgoCD, Flux).
  - Define, validate, and manage Kubernetes manifests, Helm charts, and CRDs.
  - Automate lifecycle management of applications and infrastructure through declarative pipelines.
- Operations & Reliability
  - Monitor and maintain cluster health, including networking, storage, and node availability.
  - Implement self-healing, scaling, and failover strategies for hybrid deployments.
  - Develop and maintain backup/restore, upgrade, and security hardening processes.
- Integration & Hybrid Architecture
  - Enable interoperability across ARM and x86 nodes in the same deployment.
  - Configure workloads to leverage specialized accelerators (e.g., GPUs, DPUs, FPGAs).
  - Ensure consistent declarative workflows regardless of underlying hardware architecture.
- Collaboration & Documentation
  - Work with DevOps, SRE, and PaaS teams to align K3s cluster deployments with platform goals.
  - Document cluster provisioning, deployment flows, and operational playbooks.
  - Train internal teams on hybrid K3s management and declarative deployment practices.
Deliverables
- Functional, reproducible K3s cluster deployments on hybrid architectures.
- Declarative manifests and GitOps pipelines for application deployment.
- Operational runbooks/playbooks for monitoring, upgrades, and incident recovery.
- Documentation of multi-node topologies, node roles, and cluster configuration.
Required Skills & Experience
- Hands-on experience with K3s/Kubernetes deployment and lifecycle management.
- Strong understanding of multi-node hybrid clusters that span x86 and ARM nodes and hardware accelerators.
- Proficiency with GitOps tools (ArgoCD, Flux) and declarative deployment workflows.
- Experience with container runtime configuration (containerd, CRI-O, Docker).
- Familiarity with Linux OS images, networking (CNI), and storage provisioning (CSI).
- Knowledge of infrastructure-as-code and packaging tools (Terraform, Ansible, Helm).
- Strong debugging skills for cluster bring-up, networking, and workload scheduling.