Senior Site Reliability Engineer

Overview

On Site
Depends on Experience
Accepts corp to corp applications
Contract - W2
Contract - 6 Month(s)

Skills

Amazon Web Services
Apache Kafka
Business Intelligence
Business Objects
Capital Market
Cloud Computing

Job Details

We are looking for a Senior Site Reliability Engineer for our client in Toronto, ON.
Job Title: Senior Site Reliability Engineer
Job Location: Toronto, ON
Job Type: Contract
Job Description:
  • The Site Reliability Engineer will be responsible for ensuring the health, reliability, and performance of enterprise applications and IT infrastructure.
  • This includes defining SRE vision, implementing monitoring and alerting strategies, driving automation, supporting incident and problem management, and collaborating with development teams to deliver reliable and scalable systems.
Requirement/Must Have:
  • Set the vision for the SRE product base: monitoring, alerting, self-healing, and reliability testing.
  • Lead cross-functional collaborations to define and implement best practices for monitoring, logging, and incident response.
  • Function as a portfolio Subject Matter Expert (SME) for supported applications, understanding core functionalities, components, and infrastructure.
  • Actively participate in deploying software applications, automation tools, and IT infrastructure.
  • Work closely with development teams to understand code changes and their impact on production environments, ensuring new releases meet reliability standards.
  • Drive transformation by automating existing SRE processes and increasing operational efficiency.
  • Guide the technical direction for future deployments, advocating for reliability and performance improvements.
  • Lead incident management, problem management, and fulfillment of RCA action items.
  • Debug production issues across services and layers of the stack and provide primary operational support.
  • Perform occasional off-hours support.
Experience:
  • Bachelor's degree in Computer Science, Electrical/Electronics Engineering, or related field, or equivalent experience.
  • 3+ years of IT experience in software development, maintenance, SRE, or DevOps engineering.
  • 1+ year of experience building Java Spring Boot applications and developing REST APIs.
  • Experience with relational databases (MSSQL Server, MySQL/MariaDB, SingleStore) and in-memory/distributed databases.
  • Experience with containerization platforms (Docker) and container orchestration tools (Kubernetes, OpenShift, Azure Kubernetes Service preferred).
  • Solid Git skills with experience using CI tools (Jenkins, UCD).
  • Experience working on Windows and Linux-based infrastructure.
  • 1+ year of experience developing cloud-native applications using Java or Python.
  • Proficiency writing SQL queries and optimizing database performance.
  • Experience with centralized logging solutions (Splunk, ELK preferred) and active monitoring systems (Dynatrace, etc.).
  • Experience deploying and operating cloud-native applications in Private OpenShift or public cloud (Azure/AWS preferred).
  • Strong communication skills for proactive status updates on projects and production issues.
  • Self-starter, motivated, resourceful, and able to work with cross-functional teams in large enterprises.
  • Financial Services domain knowledge, preferably Capital Markets or Wealth Management.
Should Have/Nice to Have:
  • Experience implementing dashboards for logs, instrumentation, and performance monitoring (Grafana preferred).
  • Exposure to data warehouses (Informatica, Snowflake, Databricks) and business intelligence tools (SAP BO or similar).
  • Experience creating runbooks, processes, and test plans around reliability and performance of infrastructure and applications.
  • Exposure to tools such as PagerDuty, Postman, ServiceNow, SonarQube, NexusIQ, and Vault.
  • Experience with event brokers (Kafka, IBM MQ), Mainframe environments, or disaster recovery exercises.
Skills:
  • Strong troubleshooting and problem-solving skills across distributed systems and applications.
  • Ability to automate tasks, streamline processes, and improve operational efficiency.
  • Expertise in monitoring, logging, incident response, and reliability engineering principles.
  • Ability to work independently and collaboratively in dynamic, high-performing teams.
Qualification And Education:
  • Bachelor's degree in a relevant field or equivalent experience.
  • A comprehensive Total Rewards Program including bonuses, flexible benefits, competitive compensation, and stock where applicable.
  • Leaders who support candidates' development through coaching and managing opportunities.
  • Opportunities to make a lasting impact and work in a dynamic, collaborative, and high-performing team.
  • Access to world-class training programs in financial services.
  • Challenging work in a progressive environment that values growth and teamwork.