Site Reliability Engineer


TrustMinds, Inc.
Job Details
Skills
- Site Reliability
Summary
Site Reliability Engineer
Remote: Yes
SUMMARY
Our client is seeking a Site Reliability Engineer (SRE) to strengthen the scalability, reliability, and performance of its cloud-based products and platforms. In this role, you will apply software engineering principles to operations problems: building automation, improving observability, and ensuring systems run reliably at scale. You will collaborate with development, cloud, and security teams to improve deployment efficiency, reduce downtime, and build self-healing systems. This is a hands-on, cross-functional role ideal for engineers who think in terms of systems, automation, and continuous improvement.
JOB DESCRIPTION
Reliability & Performance
• Design and implement monitoring, alerting, and reliability tooling using CloudWatch, Grafana, Prometheus, Datadog, or ELK.
• Analyze production performance, capacity, and error budgets to keep services within agreed SLOs, as measured by their SLIs.
• Implement automated health checks, scaling rules, and self-recovery mechanisms to minimize manual intervention.
• Drive root cause analysis (RCA) and post-incident reviews, ensuring that permanent fixes are implemented and documented.
Automation & Operations
• Build automation for deployment, configuration, and infrastructure management using Terraform, Ansible, or CloudFormation.
• Develop and maintain CI/CD pipelines with GitHub Actions, GitLab CI, or Jenkins.
• Manage and optimize containerized and serverless workloads (Kubernetes, ECS, EKS, Lambda).
• Implement automated rollbacks, blue/green deployments, and canary releases.
Incident Response & On-Call
• Participate in a 24/7 on-call rotation for critical systems and lead incident management for your domain.
• Reduce mean time to detection (MTTD) and mean time to recovery (MTTR) through proactive automation and observability.
• Develop runbooks and operational playbooks for global SRE teams.
Security & Compliance
• Embed security practices into automation and deployment processes.
• Ensure systems adhere to ISO 27001 and SOC 2 requirements through continuous compliance monitoring.
• Manage IAM policies, secrets, and network configurations securely and efficiently.
Collaboration & Continuous Improvement
• Partner with developers to design for operability, scalability, and resilience from day one.
• Contribute to cross-team reliability reviews and platform improvement initiatives.
• Champion a DevOps and reliability culture across the client's engineering organization.
QUALIFICATIONS
• 6+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering roles.
• Strong background in AWS (EC2, ECS/EKS, RDS, Lambda, S3, IAM, VPC).
• Proficiency with Infrastructure-as-Code and automation (Terraform, Ansible, CloudFormation).
• Experience with observability tools (Prometheus, Grafana, CloudWatch, ELK, or Datadog).
• Scripting and automation skills (Python, Bash, Go, or PowerShell).
• Solid understanding of networking, DNS, and load balancing.
• Strong troubleshooting, incident management, and root cause analysis skills.
• Excellent communication and collaboration abilities in a cross-functional, distributed environment.
PREFERRED QUALIFICATIONS
• Certifications such as AWS Certified SysOps Administrator, SRE Foundation, or Certified Kubernetes Administrator (CKA).
• Experience with chaos engineering or resilience testing tools.
• Familiarity with SLOs, SLIs, and error budget management.
• Exposure to multi-region, multi-account, or hybrid architectures.
• Background supporting SaaS platforms or regulated environments (SOC 2, HIPAA, GDPR).
- Dice Id: 10336025
- Position Id: 8971666
- Posted 3 hours ago
Company Info
About TrustMinds, Inc.
TrustMinds, Inc. provides information technology recruitment and staffing services to organizations worldwide. Headquartered in Michigan, TrustMinds believes in partnering with clients to build a strong sense of trust and confidence by working toward common goals and objectives. By applying the appropriate mix of people, process, and technology, TrustMinds helps its clients achieve their technology and business objectives. Our goal is to help our clients achieve growth and success and to discover opportunities to increase efficiency, improve quality, and reduce costs.
We handle both permanent and contract staffing requests. We are experts in recruiting top talent across different technologies and help organizations like yours find the right resources. We pride ourselves on discovering and recruiting the best-suited candidates from across the industry, from a technology, process, and domain perspective. Our focus has been on recruiting local talent, which keeps the cost of hire in check for our clients. All of our recruiters have over 10 years of industry experience and are based locally in the USA. Their understanding of the business domain, the technology, and organizational fit enables them to recruit the best professionals from across the board.
We work with a very select group of clients, primarily large organizations, for whom we have become an extension of their talent acquisition arm. This has allowed us to work very closely with them and give their requirements proper focus.
We do not carry a bench, so we will not push our own candidates onto your requirements. We are expert recruiters who take your requirements and recruit from the local market.
We look forward to discussing how we can help your organization fulfill its resourcing needs.