Overview
On Site
CAD $50/hr
Contract - W2
Contract - to 05/31/2026
Skills
Site Reliability Engineer
Job Details
Job Description:
- Provide hands-on SRE technical support at the squad level, including 24x7 support.
- Drive transformation by continuously looking for ways to automate existing processes.
- Track, audit, monitor, and implement technical work streams.
- Act as the portfolio SME (Subject Matter Expert): understand and document the common components, core functionalities, and infrastructure of supported applications.
- Be an escalation point in the on-call rotation, and support our maintenance, scheduled work, support and release deployment requirements.
- Help with incident and problem management for applications in scope, and own fulfillment of RCA action items.
- Focus on continuous improvement and technical standards; drive improvements in productivity, monitoring, tooling, and best practices.
- Manage technology currency (server patching, certificate renewal, compliance, etc.) with a keen eye for automation opportunities.
- Drive best-in-class technical solutions by closely tracking industry-leading solutions and applying them to the client's environment and needs.
- Leverage unit, department, and enterprise-wide teams to develop better solutions and build a cross-enterprise mindset.
- Contribute to the overall SRE strategy and own the roadmap build.
- 2-5 years of experience as an SRE.
- 4-5 years of experience in a related field.
- A Bachelor's degree in Computer Science or related technical field (Example: Mathematics/Engineering/Physics), or equivalent practical experience.
- Advanced knowledge of the following SRE practices and technologies:
  - Python, YAML, shell scripting.
  - Azure, Linux.
  - Dynatrace, Prometheus, PagerDuty, Moog, Client, Elastic, Azure Monitor.
  - Chaos Engineering.
  - MQ, Kafka.
- Perform production support role, including off-hours support.
- Ability to influence at the Senior and/or Principal level.
- In-depth hands-on experience in a variety of SRE tools (Ansible, Azure Automation, Catchpoint).
- Provide hands-on 24x7 SRE support, including incident management, problem management, root cause analysis, monitoring, alerting, infrastructure maintenance, and compliance.
- Lead incident and problem management for applications in scope, and own fulfillment of RCA action items.
- Develop SRE solutions (monitoring and alerting, machine learning anomaly detection, self-healing and reliability testing).
- Apply design-thinking and agile mindset in working with SREs, Scrum Masters and Incident Leads.
- Contribute to and leverage best practices in SRE.
- Simplify development by building repeatable solutions for manual tasks.
- Support the unit's goals to adopt automation solutions for applications in scope.
- Perform a production support role, including off-hours support and rotational on-call support, compensated with overtime pay, time in lieu, and an on-call allowance.
- Assist in incident management and problem management for applications in scope.
- Continuously evaluate what went well, what went wrong, and what can be done to improve and prevent recurrence in the future.
- Maintain technology currency (perform server patching, certificate renewal, etc.) with a keen eye for automation opportunities.
- Ensure availability and uptime of applications in scope, as per service level objectives.
- Ensure compliance of all systems and applications in scope, including maintaining segregation of duties.
- Support initiatives beyond application or squad-level scope, and consult other teams in RBPT and across the enterprise on product builds.
- Stay abreast of technology change and learn constantly, through official training assignments and self-assigned learning.
- Provide demos of new technology findings to the team at large.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.