Senior DevOps / SRE Engineer

• Posted 16 hours ago • Updated 16 hours ago
Full Time
On-site
USD $140,875.00 - 153,750.00 per year

Job Details

Skills

  • Business Strategy
  • FOCUS
  • Due Diligence
  • Private Equity
  • Decision-making
  • Innovation
  • Analytical Skill
  • Proprietary Software
  • Reliability Engineering
  • SAFE
  • Promotions
  • Computer Science
  • Information Systems
  • DevOps
  • Cloud Computing
  • Provisioning
  • Leadership
  • Security Controls
  • Collaboration
  • RBAC
  • Project Management
  • GitHub
  • Amazon S3
  • Amazon DynamoDB
  • Workflow
  • Grafana
  • Dashboard
  • Routing
  • PKI
  • Management
  • Auditing
  • Log Management
  • Supply Chain Security
  • Open Policy Agent (OPA)
  • Network
  • Scripting
  • Python
  • Bash
  • Generative Artificial Intelligence (AI)
  • Continuous Integration
  • Continuous Delivery
  • Code Review
  • Pull Request Workflows
  • Risk Assessment
  • Terraform
  • Kubernetes
  • Regulatory Compliance
  • Incident Management
  • Artificial Intelligence
  • Capacity Management
  • Modeling
  • Training
  • Insurance

Description & Requirements

WHAT MAKES US A GREAT PLACE TO WORK

We are proud to be consistently recognized as one of the world's best places to work. We are currently the top-ranked consulting firm on Glassdoor's Best Places to Work list and have earned the #1 overall spot a record seven times.

Extraordinary teams are at the heart of our business strategy, but these don't happen by chance. They require intentional focus on bringing together a broad set of backgrounds, cultures, experiences, perspectives, and skills in a supportive and inclusive work environment. We hire people with exceptional talent and create an environment in which every individual can thrive professionally and personally.

WHO YOU'LL WORK WITH

As the premier consulting partner for the private equity industry, Bain's Private Equity Group (PEG) is a global practice more than three times larger than any competitor's. Our network of over 1,000 professionals supports private equity and institutional investor clients through every stage of the investment life cycle, from deal generation and due diligence to portfolio value creation and exit planning.

Bain & Company is developing a suite of cutting-edge data and software solutions designed to revolutionize how the private equity industry uses data for investment insights and decision-making.

The PEG Innovation team's mission is to create analytical solutions for Bain clients, teams, and the broader institutional investor space using proprietary software and data products. This includes the development, commercialization, and daily management of Bain's proprietary datasets, data, and software businesses.

WHERE YOU'LL FIT WITHIN THE TEAM

Senior DevOps / SRE Engineers own the CI/CD pipelines, GitOps infrastructure, Kubernetes operations, and reliability engineering practices that keep the PE platform running at production quality. You make it safe to deploy frequently and easy to recover when things go wrong. You work closely with Platform Engineering, Data Platform, and Product squads to ensure every team can ship confidently and operate their services without heroics.

WHAT YOU'LL DO

Core Platform Reliability, Delivery, and Operations (80%)
  • Design, build, and maintain CI/CD pipelines across all repositories using reusable GitHub Actions workflows.
  • Own the ArgoCD GitOps configuration; manage application promotion from staging to production.
  • Operate and upgrade the EKS cluster; manage node groups, Karpenter provisioners, and cluster add-ons.
  • Maintain the Terraform estate across all environments; review and apply infrastructure changes via Atlantis.
  • Define and maintain SLOs, alerting rules, and Grafana dashboards for all platform services.
  • Operate and maintain HashiCorp Vault; manage auth backends, policies, and secret engine configuration.
  • Implement and maintain supply chain security controls: image scanning, signing, SBOM generation, and OPA policy enforcement.
  • Collaborate with the Security Engineer on network policy, egress controls, and compliance requirements.
  • Participate in the on-call rotation; lead incident response and the post-incident review process.

Other (20%)
  • Automate repeatable operational work; reduce manual fixes through tooling and runbook automation.
  • Document runbooks proactively and keep them current as systems evolve.
  • Use AI tooling to draft infrastructure code and runbook content, validating outputs against security and compliance standards before merging.
  • Partner with product and engineering teams to tune reliability practices (SLOs, alerting thresholds, deployment safety checks) and to remove friction from developer workflows.
  • Communicate clearly during incidents: calm, factual, and action-oriented.

ABOUT YOU
  • Bachelor's degree in Computer Science, Engineering, Information Systems, or a related field (or equivalent practical experience).
  • 6+ years of experience in DevOps, SRE, Platform Engineering, or Production Operations roles supporting cloud-hosted, multi-service platforms.
  • Demonstrated experience owning production CI/CD, GitOps, and Kubernetes operations for multi-service platforms.
  • Experience operating and upgrading Kubernetes clusters (EKS preferred) and managing autoscaling/provisioning (e.g., Karpenter) in production.
  • Experience managing infrastructure-as-code at scale (Terraform), including state management and PR-driven apply workflows (e.g., Atlantis).
  • Track record of implementing observability and reliability practices: SLO definition, alert tuning, dashboards, incident response leadership, and post-incident reviews.
  • Experience operating secrets management systems (HashiCorp Vault preferred) and implementing security controls in delivery pipelines.
  • Strong cross-functional collaboration skills; able to enable multiple squads to deploy safely and operate services without heroics.

SRE/Platform Engineering
  • Expert-level Kubernetes: cluster operations, upgrades, node group management (Karpenter), namespace isolation, RBAC, PodDisruptionBudgets, and topology spread.
  • GitOps: ArgoCD configuration, Application and Project management, sync policies, drift detection, and automated rollback patterns.
  • CI/CD: GitHub Actions (reusable workflows, matrix builds, secrets handling, environment protection rules, deployment gates).
  • Infrastructure as code: Terraform at production scale (module design, state management using S3 + DynamoDB locking, Atlantis PR-driven workflows).
  • Service mesh: Istio (traffic management, mTLS policy, AuthorizationPolicy, circuit breaking, observability integration).
  • Autoscaling and capacity: KEDA and Karpenter (event-driven autoscaling, Spot instance management, bin-packing, interruption handling).
  • Observability: Prometheus, Grafana (dashboard-as-code), Loki, Tempo, Alertmanager (routing, inhibition, grouping).
  • Secrets management: HashiCorp Vault (auth backends, dynamic secret engines, PKI management, audit log management).
  • Container and supply chain security: Trivy scanning, Cosign image signing, SBOM generation, OPA/Gatekeeper policy authoring, Cilium network policy.
  • Scripting: strong Python and Bash for automation, tooling, and runbook automation.

Generative AI and agentic systems
  • Integrates AI-powered quality gates into CI/CD pipelines (e.g., automated code review bots, LLM-assisted security scanning, agent-generated PR summaries for change risk assessment).
  • Uses AI agents to accelerate Terraform modules, Kubernetes manifests, and Helm chart scaffolding; reviews outputs against security and compliance standards before merging.
  • Familiar with AI-assisted incident response: using LLMs to correlate logs, suggest runbook steps, and draft post-incident reviews from structured incident data.
  • Contributes to Prompt Execution Sandbox and Agent Gateway infrastructure requirements from a reliability and security posture perspective.
  • Uses AI tooling to accelerate SLO analysis, alert rule tuning, and capacity planning modelling.

General
  • Automates anything done more than once; prioritizes reliability and repeatability over manual fixes.
  • Treats SLOs as commitments, not aspirations; raises reliability concerns before they become incidents.
  • Documents and maintains runbooks as part of delivery, not as an afterthought.
  • Communicates clearly during incidents and drives structured follow-up via post-incident reviews.
  • This role follows a hybrid model, requiring in-office presence at least 1 day per week.

U.S. COMPENSATION INFORMATION

Compensation for this role includes base salary, an annual discretionary performance bonus, a 401(k) plan with an annual employer contribution based on years of service, and Bain's best-in-class benefits package (details listed below).

Some local governments in the United States require a good-faith, reasonable salary range be included in job postings for open roles. The estimated annualized compensation for this role is as follows:

In Atlanta, the good-faith, reasonable annualized full-time salary range for this role is $140,875 to $153,750.

In Texas, the good-faith, reasonable annualized full-time salary range for this role is $147,625 to $161,250.

In Chicago, the good-faith, reasonable annualized full-time salary range for this role is $155,125 to $169,250.

Placement within these ranges will vary based on factors such as experience, education, training, and skill level.

Compensation also includes a discretionary annual performance bonus, a 401(k) plan with employer contribution, and Bain's best-in-class benefits, including full premium coverage for medical, dental, and vision, generous paid time off, and more.

  • Annual discretionary performance bonus
  • This role may also be eligible for other elements of discretionary compensation
  • 4.5% 401(k) company contribution, which increases after 3 years of service and is 100% vested upon start date

Bain & Company's comprehensive benefits and wellness program is designed to help employees achieve personal independence, protection, and stability in the areas most important to them and their families.

  • Bain pays 100% of individual employee premiums for medical, dental, and vision programs, offering one of the most comprehensive medical plans for employees without impacting your paycheck
  • Generous paid time off, including parental leave, sick leave, and paid holidays
  • Fully vested 401(k) company contribution
  • Paid Life and Long-Term Disability insurance
  • Annual fitness reimbursements

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
  • Dice Id: 90922487
  • Position Id: 24034705