Overview
On Site
Depends on Experience
Accepts corp to corp applications
Contract - W2
Contract - 6 Month(s)
Skills
Agile
Ansible
Apache Kafka
Auditing
Collaboration
Job Details
We are looking for a Site Reliability Engineer for our client in Toronto, ON.
Job Title: Site Reliability Engineer
Job Type: Contract
Job Description:
Responsibilities:
- Track, audit, monitor, and implement technical work streams.
- Act as portfolio SME, documenting common components, core functionalities, and infrastructure of supported applications.
- Serve as an escalation point in the on-call rotation, supporting maintenance, scheduled work, and release deployments.
- Lead incident management and problem management activities, owning RCA action items.
- Drive continuous improvement in productivity, monitoring, tooling, and technical standards.
- Manage technology currency (server patching, certificate renewals, compliance) with a focus on automation opportunities.
- Apply industry-leading technical solutions to meet organizational needs.
- Collaborate across units, departments, and enterprise-wide teams to deliver better solutions.
- Develop SRE solutions such as monitoring, alerting, machine learning anomaly detection, self-healing, and reliability testing.
- Apply design thinking and an agile mindset in collaboration with SREs, Scrum Masters, and Incident Leads.
- Contribute to and leverage best practices in SRE.
- Build repeatable automation solutions to simplify manual tasks.
- Support automation adoption for applications in scope.
- Perform production support, including off-hours support and rotational on-call responsibilities.
- Assist in incident and problem management for applications in scope.
- Continuously evaluate incidents to identify improvements and prevent recurrence.
- Maintain technology currency with a focus on automation.
- Ensure availability and uptime of applications in scope per service level objectives.
- Ensure compliance of systems and applications, maintaining segregation of duties.
- Support initiatives outside of application or squad-level scope.
- Provide consultation on product builds to other teams within the enterprise.
- Stay updated on technology changes and continuously learn through training and self-study.
- Provide demos of new technology findings to the team.
Qualifications:
- Bachelor's degree in Computer Science, Mathematics, Engineering, Physics, or related technical field, or equivalent practical experience.
- 4-5 years of experience in SRE or related field.
- Advanced knowledge of SRE practices and technologies.
Strong hands-on experience with:
- Python, YAML, Shell scripting.
- Azure, Linux.
- Dynatrace, Prometheus, PagerDuty, Moog, Splunk, Elastic, Azure Monitor.
- Chaos Engineering.
- MQ, Kafka.
- Ansible, Azure Automation, Catchpoint.
- Experience performing production support including off-hours support.
- Dynatrace: less than 1 year.
- Kafka: less than 1 year.
- Network programming (Perl, Python, Java, etc.): less than 1 year.
- Microsoft Azure: less than 1 year.