Lead DevOps Software Engineer
Salary: Open + bonus
Location: Chicago, IL
Hybrid: 3 days onsite, 2 days remote
Qualifications
· Bachelor’s degree preferred, plus 7+ years of related experience
· AWS EC2, Kubernetes, Kafka, Jenkins, Terraform, Ansible, HashiCorp Vault
· Observability tooling such as Prometheus, Grafana, OpenTelemetry, Datadog, or equivalent
· Incident management platforms and on-call tooling (e.g., PagerDuty, OpsGenie)
· Microservices and streaming data-intensive application architecture
· Application architecture, networking, and security in the cloud
· Setting up platforms in AWS for high-performance requirements
· Broad experience in API-based development
· Git for source control and Artifactory for artifact management
· Multi-AZ, multi-region failover architecture
· Chaos engineering principles and tooling (e.g., Chaos Monkey, Gremlin, LitmusChaos)
· Fluency with data formats and structures such as JSON, Protobuf, and Avro
· SQL and NoSQL databases, in-memory data stores
· Java/Python/Scala/Golang software development
· Enterprise architecture frameworks such as TOGAF
Responsibilities
· Guides implementation using CI/CD pipelines in a Kubernetes environment
· Directs review, configuration, and execution of Terraform and Ansible automation pipelines delivered by product teams
· Guides the setup of common infrastructure platforms like multi-region Kubernetes and Kafka clusters
· Elicits requirements for application deployment and sizing to manage expected workloads
· Defines and enforces Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets in collaboration with product teams
· Leads blameless post-mortems and drives resolution of action items to reduce repeat incidents
· Designs and implements observability frameworks covering metrics, logs, and distributed tracing across all platform services
· Drives toil reduction initiatives by identifying and automating repetitive operational work
· Partners with product teams to embed reliability requirements and non-functional requirements (NFRs) early in the software development lifecycle
· Monitors application performance and tunes systems in collaboration with product teams
· Confers with product team leads and practitioners to create deployment and reliability plans
· Confers with Enterprise Architecture and Renaissance architecture teams to devise implementation architecture