Stefanini Group is hiring!
Stefanini is looking for a Platform Engineer in Dearborn, MI (Onsite)
For quick apply, please reach out to Adil Khan at /
We are looking for a Platform Engineer to help product teams deliver securely, reliably, and quickly. This role leans toward cloud infrastructure, DevOps, and Site Reliability Engineering (SRE), with strong software development skills.
Responsibilities
- Design and Operate Cloud Infrastructure: Build and manage cloud platforms, including networking, compute, Kubernetes, CI/CD, secrets, and identity.
- Define Reliability Metrics: Establish and enhance SLIs, SLOs, and error budgets.
- Implement Observability: Set up metrics, logs, and traces with actionable alerts.
- Automate Workflows: Develop self-service workflows (e.g., infrastructure as code, GitOps, CI/CD pipelines) to reduce manual effort.
- Enhance Security & Compliance: Drive least-privilege access, secure defaults, and policy-as-code.
- Incident Management: Participate in on-call rotations, handle incidents, lead postmortems, and deliver fixes.
- Collaborate with Teams: Partner with application teams to improve deployability, resilience, and cost efficiency.
Experience Required
- Managed production-grade infrastructure on major cloud platforms like Google Cloud Platform. Designed multi-region Google Cloud Platform networks using VPCs, subnets, firewalls, and NAT, managed with Terraform and GitOps.
- Strong understanding of networking, IAM boundaries, and tradeoffs between managed services and self-hosted solutions.
- Built production-grade Python tools or automation with structured, testable, and maintainable code. Automated tasks like querying Google Cloud Platform Asset Inventory, generating IAM reports, and creating tickets with retry/error handling.
- Operated Google Cloud Platform services like Cloud Run, Workload Identity, Secret Manager, and VPC Service Controls. Applied Google Cloud Platform-specific reliability and security patterns with hands-on experience.
- Supported internal developer teams by handling on-call rotations, resolving incidents, and delivering systemic fixes.
- Managed production Kubernetes clusters, performed upgrades, configured policies, and debugged issues. Configured HPA/VPA for autoscaling and troubleshot pod scheduling and service mesh connectivity. Strong understanding of Kubernetes control planes for debugging and management.
Experience Preferred
- Wrote Go for platform tooling or infrastructure automation. Developed Kubernetes admission webhooks to enforce security policies or CLI tools for secret management. Produced idiomatic Go with proper error handling, context propagation, and unit tests.
- Contributed to or led the design of multi-team or multi-service platform architectures. Designed shared service networks (hub-and-spoke models), CI/CD templates, and service mesh configurations. Documented architecture patterns adopted by teams and articulated tradeoffs in design reviews.
- Implemented SRE practices, including SLIs, SLOs, and error budgets. Configured SLO-based alerting in Prometheus/Grafana and used burn rate alerts for incident management.
Required Skills
- Cloud Platforms: Experience managing production-grade systems on Google Cloud Platform, AWS, or Azure with an SRE mindset.
- Linux & Networking: Strong fundamentals in Linux, distributed systems, and debugging production issues.
- Infrastructure as Code: Skilled in tools like Terraform, Helm, Kustomize, and GitOps practices.
- Containers & Orchestration: Proficient in Docker, Kubernetes, and modern CI/CD tools.
- Programming: Experience with languages like Python, Go, Java, or TypeScript for building tools and automation.
- Communication: Clear communicator with effective incident leadership and a customer-first approach.
Preferred Skills
- SLI/SLO Expertise: Experience defining SLIs/SLOs and implementing SLO-based alerting and dashboards.
- Observability Platforms: Familiarity with Prometheus/Grafana, OpenTelemetry, and centralized logging.
- Security Practices: Knowledge of policy-as-code, supply chain security, SBOMs, and artifact signing.
- Standardized Solutions: Experience creating reusable golden paths (e.g., container images, templates, pipelines).
- Cost Optimization: Skilled in FinOps practices, capacity planning, and multi-tenant platform controls.
- Go: Proficient in writing idiomatic Go for platform tooling or infrastructure automation.
- Cloud Architecture: Experience designing multi-service or multi-team platform architectures.
- Reliability Engineering: Practical implementation of SRE practices, including SLIs, SLOs, error budgets, and alerting.
Education Required
- Bachelor's Degree
**Listed salary ranges may vary based on experience, qualifications, and local market. Also, some positions may include bonuses or other incentives.**
Stefanini takes pride in hiring top talent and developing relationships with our future employees. Our talent acquisition teams will never make an offer of employment without having a phone conversation with you. Those conversations will include a description of the job for which you have applied. We will also speak with you about the process, including interviews and job offers.
About Stefanini Group
The Stefanini Group is a global provider of offshore, onshore, and nearshore outsourcing, IT digital consulting, systems integration, application, and strategic staffing services to Fortune 1000 enterprises around the world. We have a presence across the Americas, Europe, Africa, and Asia, and serve more than four hundred clients across a broad spectrum of markets, including financial services, manufacturing, telecommunications, chemical services, technology, public sector, and utilities. Stefanini is a CMM Level 5 IT consulting company with a global presence.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
- Dice Id: 10106616
- Position Id: 63135