Site Reliability Engineer
This is a full-time remote position supporting a large healthcare organization. The position includes full benefits, PTO, and paid holidays.
Number of Available Positions: 3
Requirements: ship Work Status Authorization; 4+ years of professional experience with SRE/DevOps; expert knowledge of cloud services (Azure); experience with Docker and Kubernetes; experience with automated configuration management (Chef, Puppet, or Ansible); knowledge of encryption, PKI, and OWASP; experience working with RESTful web services
Salary Target: $130K-$165K/YR (salary is flexible based on relevant years of professional experience and background)
Job Description:
The Site Reliability Engineer will architect, develop, and maintain the Company's cloud environment in both the commercial and government clouds. The Site Reliability Engineer will work closely with software engineers, architects, and build/release engineers to architect and maintain a secure, resilient, and high-performance cloud infrastructure.
- Build, maintain, and operate the Azure hosted platform.
- Work closely with dev teams to identify and measure SLOs, SLAs, and SLIs
- Be a strong contributor to the development of platform services, including architecture, provisioning, configuration, deployment, and support
- Integrate with centralized logging, metrics dashboards, instrumentation, and incident monitoring and management
- Drive initiatives related to observability, automation, infrastructure as code, etc.
- Participate in an on-call rotation to resolve incidents affecting the platform and/or any dependent components
- Respond to production deficiencies by continuously adding automation, self-healing, and real-time monitoring to production systems
- Maintain and improve operational tooling and frameworks
- Perform root cause analysis and deliver resolutions for tooling and automation failures
- Build frameworks that test the performance and resiliency of our platform services/tools
- Build, integrate, and administer systems and tools (dashboards, APMs) that enable engineering teams to observe their applications in production autonomously
- Automate alerts for metrics on performance, cost, vulnerabilities, risk, and compliance violations
- Improve processes and runbooks, and champion automation of manual support tasks
- Conduct postmortems after production issues
Job Qualifications:
Required
- At least 4 years of experience working in an SRE/DevOps role
- Expert knowledge of a cloud service provider, Azure preferred
- Good knowledge of SRE principles
- Strong awareness of networking and internet protocols
- Understanding of identity and access management (IAM)
- Experience with Docker and Kubernetes (Azure Kubernetes Service preferred) in production
- Experience supporting infrastructure in production cloud environments
- Experience with automated configuration management tools such as Chef, Puppet, or Ansible
- Knowledge of encryption and Public Key Infrastructure (PKI); understanding of OWASP
- Experience working with RESTful services
- Some experience with monitoring tools and technologies (Splunk, Dynatrace, New Relic)
- Ability to deploy and operate cloud-native applications in a public cloud (Azure preferred)
- Ability to support software and/or cloud infrastructure on an on-call rotation basis, identifying and remediating technical problems at their root cause
- Strong track record of learning new tools and technologies
- Detail- and results-oriented, able to prioritize tasks
Preferred:
- Bachelor's degree in Computer Science, Information Technology, Software Engineering, or a related field