We are seeking a Site Reliability Engineer (SRE) / Cloud DevSecOps Engineer with a strong focus on Infrastructure as Code (IaC) and cloud-native platforms.
Role: Site Reliability Engineer / Cloud DevSecOps Engineer
Type: Full-time
Location & Onsite: Blue Ash, OH (Onsite 5 days/week)
Interview Process: Technical + Manager rounds
Top Skills: Terraform, Kubernetes (AKS/EKS/GKE), CI/CD pipelines (GitHub Actions preferred), Linux administration, Azure/Google Cloud Platform experience, shell scripting, high-volume web application support
Role Overview:
This hands-on role centers on building scalable, reliable infrastructure for development and application teams across cloud, on-prem, and store-based environments. You will partner with Software Engineers to automate infrastructure, improve deployment processes, and ensure platform stability. The emphasis is on Terraform, Kubernetes, and CI/CD pipelines rather than enterprise-level cloud administration.
Key Responsibilities:
- Design, build, and support scalable infrastructure for cloud-native and hybrid enterprise platforms
- Provision infrastructure using Terraform (IaC)
- Support Kubernetes platforms (AKS, EKS, GKE)
- Build and maintain CI/CD pipelines (GitHub Actions preferred)
- Collaborate with Software Engineers for efficient, secure deployments
- Automate infrastructure provisioning, deployment workflows, and operational processes
- Troubleshoot issues across development, test, and production environments
- Partner with Ops teams on monitoring (Grafana, logging, alerting)
- Maintain technical documentation, architecture diagrams, and operational playbooks
- Participate in on-call rotations and off-hours maintenance
- Promote a culture of collaboration, ownership, reliability, and continuous improvement
- Demonstrate Kroger’s core values: respect, honesty, integrity, diversity, inclusion, safety
Scope Clarification:
- No enterprise-level cloud administration (subscription management, capacity purchasing)
- Monitoring tools primarily owned by centralized Ops team
- Familiarity with Grafana is expected, but deep observability expertise is not required
Required Qualifications:
- Bachelor’s degree in Computer Science or equivalent (8+ years of relevant experience accepted in lieu of a degree)
- 4+ years of experience in SRE, DevOps, or cloud infrastructure roles
- Strong CS fundamentals: data structures, algorithms, concurrency, multi-threaded systems
- Experience supporting always-on, high-volume web applications in cloud environments
- Hands-on experience with Azure and/or Google Cloud Platform
- Strong Linux experience: system administration, security, performance tuning, production troubleshooting
- Experience with service-oriented and cloud-native architectures
- Networking fundamentals: TCP/IP, DNS, HTTP/S, VPNs, routing, subnets
- Shell scripting proficiency
- Experience provisioning and supporting managed cloud services, PaaS platforms, and CI/CD pipelines
- Proven experience with customer-facing or omnichannel digital platforms
Nice to Have / Preferred Qualifications:
- Master’s/PhD in CS, Information Systems, or related field
- CI/CD tools: Jenkins, Spinnaker, Azure DevOps, TeamCity
- Azure DevOps services: Pipelines, Test Plans, Artifacts
- Traffic/web tools: Nginx, HAProxy, Squid
- Observability: ELK, Datadog, New Relic, Azure Monitor, Grafana, PagerDuty
- Modern infrastructure/messaging: Kafka, RabbitMQ, SQS
- Automation/container tools: Ansible, Terraform, Docker, Kubernetes
- 2+ years managing cloud infrastructure on Azure, AWS, or Google Cloud Platform
- Experience with high-volume eCommerce/digital commerce platforms
- Microsoft Azure certification
- Industry experience in retail and/or healthcare
Benefits & Time Off:
- 30 days of paid time off annually:
  - 15 PTO days
  - 5 sick days
  - 4 personal days
  - 6 paid holidays
This is an excellent opportunity for a hands-on, engineering-focused SRE to contribute to cloud and hybrid infrastructure.
If interested, please share your updated resume and availability for a quick discussion.