Job#: 3025629
Job Description:
Role: Cloud Infrastructure / SRE
Duration: Multi-year contract
Location: Hybrid - 4 days/week onsite in SE MI
Description: Our Platform Engineering team builds and operates shared infrastructure and paved paths that enable product teams to deliver software securely, reliably, and quickly. This role is focused on cloud infrastructure, DevOps, and Site Reliability Engineering (SRE), with a strong emphasis on software development and automation.
You will help design, build, and operate the core platform capabilities that power multiple teams, treating reliability, security, and developer experience as first-class features.
What You'll Do
- Design, build, and operate cloud infrastructure and platform capabilities, including networking, compute, Kubernetes, CI/CD, secrets, certificates, and identity
- Define and improve system reliability using service-level indicators (SLIs), service-level objectives (SLOs), and error budgets
- Implement observability (metrics, logs, traces) with actionable alerting focused on user and business impact
- Build self-service workflows and automation using infrastructure as code, GitOps, and modern build/release pipelines to reduce operational toil
- Improve security and compliance through least-privilege access, secure-by-default patterns, policy-as-code, and continuous hardening
- Participate in on-call rotation, incident response, and post-incident reviews; drive systemic fixes and improve runbook quality
- Partner with application teams to improve deployability, resilience, and cost efficiency through capacity planning, autoscaling, and graceful degradation
Required Qualifications
- Hands-on experience operating production cloud platforms (Google Cloud Platform, AWS, or Azure) with an SRE mindset
- Strong fundamentals in Linux, networking, distributed systems, and debugging complex production issues
- Proficiency with infrastructure as code and automation (e.g., Terraform, Helm/Kustomize, GitOps tooling)
- Experience with containers and orchestration (Docker, Kubernetes) and modern CI/CD pipelines
- Programming and scripting experience (e.g., Python, Go, Java, TypeScript) to build tools and automate workflows
- Strong communication skills, effective incident leadership, and a customer-focused approach to platform engineering
Preferred Qualifications
- Experience defining SLIs/SLOs and implementing SLO-based alerting and dashboards
- Observability platform experience (e.g., Prometheus, Grafana, OpenTelemetry, centralized logging)
- Experience with policy-as-code and supply chain security (e.g., OPA/Rego, SLSA concepts, SBOMs, artifact signing)
- Experience building golden paths (container images, templates, reference architectures, paved pipelines) adopted by multiple teams
- Cost optimization experience, including FinOps practices, capacity forecasting, and right-sizing
How We Work
- Automate first: Eliminate repeatable manual work and continuously measure and reduce toil
- Reliability is a feature: Design for failure using timeouts, retries with jitter, idempotency, and graceful degradation
- Small, safe changes: Incremental delivery with clear rollback strategies
- Engineering excellence: Design reviews, blameless postmortems, and strong documentation and runbooks
What Success Looks Like
- Platform capabilities are easy to adopt, well-documented, and measurably reduce lead time for change
- Reliability improves over time through better SLO attainment, reduced incident frequency/severity, and lower MTTR
- Security posture improves via secure-by-default patterns and automated controls
Experience & Education
- Experience Level: Engineer 3
- 6+ years overall IT experience
- 4+ years in software development
- Practical experience in at least two programming languages, or advanced expertise in one
Primary Skill Expectations (Expanded)
- Cloud Infrastructure: Proven experience designing and operating production-grade cloud infrastructure, including networking, IAM, compute, and managed services, with clear understanding of tradeoffs
- Python: Experience building maintainable, production-grade tooling or automation (testable, error-tolerant, and team-owned)
- Google Cloud Platform: Hands-on operation of Google Cloud Platform services in a platform context, including workload identity, policy enforcement, secret management, and security controls
- Platform Support: Experience supporting internal developer platforms, including on-call ownership, incident response, blameless postmortems, and preventative engineering improvements
- Kubernetes: Production experience operating Kubernetes clusters, including upgrades, RBAC, networking, autoscaling, and deep troubleshooting
EEO Employer
Apex Systems is an equal opportunity employer. We do not discriminate or allow discrimination on the basis of race, color, religion, creed, sex (including pregnancy, childbirth, breastfeeding, or related medical conditions), age, sexual orientation, gender identity, national origin, ancestry, citizenship, genetic information, registered domestic partner status, marital status, disability, status as a crime victim, protected veteran status, political affiliation, union membership, or any other characteristic protected by law. Apex will consider qualified applicants with criminal histories in a manner consistent with the requirements of applicable law. If you have visited our website in search of information on employment opportunities or to apply for a position, and you require an accommodation in using our website for a search or application, please contact our Employee Services Department at or .
Apex Systems is a world-class IT services company that serves thousands of clients across the globe. When you join Apex, you become part of a team that values innovation, collaboration, and continuous learning. We offer quality career resources, training, certifications, development opportunities, and a comprehensive benefits package. Our commitment to excellence is reflected in many awards, including ClearlyRated's Best of Staffing in Talent Satisfaction in the United States and Great Place to Work in the United Kingdom and Mexico. Apex uses a virtual recruiter as part of the application process.
Apex Benefits Overview: Apex offers a range of supplemental benefits, including medical, dental, vision, life, disability, and other insurance plans that offer an optional layer of financial protection. We offer an ESPP (employee stock purchase program) and a 401(k) program that allows you to contribute, typically within 30 days of starting, with a company match after 12 months of tenure. Apex also offers an HSA (Health Savings Account on the HDHP plan), a SupportLinc Employee Assistance Program (EAP) with up to 8 free counseling sessions, a corporate discount savings program, and other discounts. In terms of professional development, Apex hosts an on-demand training program, provides access to certification prep and a library of technical and leadership courses/books/seminars once you have 6+ months of tenure, and offers certification discounts and other perks with associations that include CompTIA and IIBA. Apex has a dedicated customer service team for our Consultants that can address questions around benefits and other resources, as well as a certified Career Coach. You can also find a full list of our benefits, programs, support teams, and resources in our 'Welcome Packet', which an Apex team member can provide.