Site Reliability Engineer


Integration Architects
Job Details
Skills
- Kubernetes
- SRE
- Python
- Amazon Web Services
- CI/CD
- Docker
Summary
We're looking for a seasoned SRE to join a fast-paced infrastructure team building and scaling a self-service compute platform. You'll own the platform end-to-end (control plane, reliability/uptime, runtime, and deploy pipelines), enabling engineering teams to provision Kubernetes clusters, workloads, and VMs on demand across public cloud and on-prem environments.
Responsibilities
- Design and operate a multi-cluster Kubernetes platform including controllers, CRDs, and ingress/DNS/TLS automation
- Build and harden platform microservices — CI/CD, SSO, RBAC, secret encryption, and real-time monitoring workflows
- Integrate AI tooling into workloads; build agents and tools to help SRE teams scale and operate efficiently
- Own the production release path: Helm-driven deployments, multi-arch container builds, staged rollouts, and rollback playbooks
- Instrument the platform with audit logging, usage analytics, and automation to support a large internal user base
Required
- 6+ years of DevOps/SRE experience operating production Kubernetes (on-prem or cloud), with depth in CRDs, operators, ingress, and cluster networking
- Experience integrating AI tools with operational workflows
- Strong Python or Go plus working knowledge of TypeScript/React — comfortable across backend services, frontend UX, and infrastructure-as-code
- Production experience with cloud provisioning (AWS or equivalent), identity federation (OIDC/SAML), and secret management
- Solid grounding in relational databases, caching layers, and async networking patterns (SSH tunnels, WebSockets, message queues)
- BS/MS in CS or equivalent, with a track record shipping internal developer platforms and CI/CD pipelines
Nice to Have
- Deep experience with agentic workflows, CLI tooling, and MCP integrations
- Comfort building AI-assisted tooling on platform telemetry — automated runbooks, anomaly detection, or LLM-driven ops workflows
- CI tools (Jenkins, GitLab CI), CD tools (Argo, Flux), monitoring (Prometheus, Grafana, VictoriaMetrics, Datadog, Splunk, Kibana)
- Linux, multi-tenant environments, VM administration, and Kubernetes orchestration
- Strong documentation habits — runbooks and docs that let the next engineer ship on day one
Dice Id: 501521822
Position Id: 8975309
Posted 3 days ago
Company Info
About Integration Architects
Integration Architects helps organizations move faster by aligning technology, data, and talent with their strategy.
We provide flexible access to experienced professionals across technology and revenue functions, so our clients can scale quickly without compromising quality or control.
Our team has deep experience in both public and private sectors, including state government, justice systems, and regulated industries such as finance, insurance, and manufacturing.
Core Services:
- Staff Augmentation (Technology & Sales)
- Data & Software Solutions
- System Integration & Modernization
From short-term project support to full delivery teams, we help clients close capacity gaps, execute confidently, and build the systems that power modern business.
