.NET-Based Site Reliability Engineering & Cloud Operations Platform
100% Remote
Long Term
Must Haves:
• Strong web development skills with a focus on C#/.NET
• Strong operations experience with the skills below, plus a deep understanding of AT LEAST one of the following: networking, a cloud platform (the client currently uses AWS), automation (Ansible, Puppet, Chef, Jenkins, etc.), performance engineering, or APM (Dynatrace, New Relic, Sumo Logic). Strong skills in another toolset commonly used by SREs will also be considered.
• Someone who worked as a developer and now works as a Site Reliability Engineer, Cloud Engineer, or DevOps Engineer is ideal
Nice to Have:
• Dynatrace or New Relic (installing these tools and using them for observability and production monitoring) – huge plus
• Splunk
Job Summary
This client has a team of 4-5 Software Developers/Site Reliability Engineers. Each needs at least a mid-to-senior web developer skillset plus strong general operations knowledge, so a broad range of experience is a MUST. Candidates should have strong general knowledge, as a Site Reliability Engineer, of some of the skillsets below:
• Monitoring/Observability
• APM
• Performance optimization
• Cloud/Infrastructure capabilities
• Automation
• Containerization
Responsibilities:
• Each of these SREs will be embedded in a separate Scrum team on a rotation. Most of their time will be dedicated to managing operations (cloud environment, DevOps tools, networking, infrastructure capabilities, root cause analysis, customer-facing work, on-call, knowledge sharing with each other, coordinating knowledge transfer sessions, etc.), with some operational development work in .NET.
• Development efforts will initially focus on PBIs that improve observability, dashboarding, performance, or other site reliability concerns. After clearing technical debt, however, the developers will be allowed to take on features or other work that keeps them up to date on the relevant technologies, methodologies, and products that the Scrum team is working on.
• The SREs will report to the Manager of Site Reliability, collaborate extensively with each other, and be part of an on-call rotation that may require 4-5 hours of after-hours work per week.
• On-call requirements: each SRE is expected to be on call 1 week out of every 4 or 5.
• Additionally, the SRE will lead the day-of-deployment monitoring and participate in Go/No-Go calls.