System Reliability Engineer (SRE) – Kubernetes & Automation

Overview

Remote
Depends on Experience
Contract - W2
Contract - Independent

Skills

Production support
issue troubleshooting
root cause analysis
Kubernetes
container orchestration
Linux
Unix
performance analysis
NMON
log analysis
monitoring
observability tools
Prometheus
Grafana
ELK Stack
Python
Go
Java
Node.js
automation frameworks
scripting
database administration
Oracle
SQL Server
performance tuning
analytical skills
problem-solving
Terraform
Ansible
CI/CD
Jenkins
GitLab CI
CircleCI
AWS
Azure
GCP

Job Details

Job Title: System Reliability Engineer (SRE) – Kubernetes & Automation 
Location: Canada (Remote)
Duration: Long Term Contract


Job Description:

We are seeking a highly skilled and motivated System Reliability Engineer (SRE) to ensure the reliability, performance, and scalability of production systems. The ideal candidate will have strong troubleshooting skills, deep expertise in container orchestration (Kubernetes), and a proactive approach to problem-solving.


Key Responsibilities:

  • Troubleshoot and resolve complex production issues to maintain system uptime and performance.
  • Design, implement, and manage scalable Kubernetes-based infrastructure.
  • Analyze logs and metrics to monitor system health and proactively identify issues.
  • Collaborate with development and operations teams to enhance system reliability and automation.
  • Develop and maintain tools for deployment, monitoring, and operations.
  • Administer and tune database systems (Oracle, SQL Server) for optimal performance.
  • Drive continuous improvement in infrastructure and operations practices.


Qualifications:

Must Have:

  • 5–7 years of experience in Site Reliability Engineering (SRE) or DevOps roles.
  • Strong hands-on experience with production support, issue troubleshooting, and root cause analysis.
  • Expertise in Kubernetes and container orchestration tools.
  • In-depth knowledge of Linux/Unix systems and performance analysis tools (e.g., NMON).
  • Experience with observability and monitoring tools such as Prometheus, Grafana, the ELK Stack, or similar.
  • Proficiency in at least one programming language: Python, Go, Java, or Node.js.
  • Proven experience in developing automation frameworks and scripts.
  • Solid understanding of database administration and performance tuning (Oracle, SQL Server).
  • Strong analytical and problem-solving skills with a proactive mindset.

Nice to Have:

  • Experience with infrastructure-as-code (Terraform, Ansible).
  • Knowledge of CI/CD pipelines and tools (Jenkins, GitLab CI, CircleCI).
  • Exposure to cloud platforms (AWS, Azure, Google Cloud Platform).


Key Skills:

Site Reliability Engineering, Kubernetes, Linux/Unix, Prometheus, Grafana, ELK Stack, Automation, Python, Go, Java, Node.js, Terraform, Ansible, CI/CD, Oracle, SQL Server, Cloud Platforms


VDart Group, a global leader in technology, product, and talent management, empowers businesses with comprehensive solutions through our four distinct, industry-leading business units. With a diverse team of over 4,000 professionals across 13 countries, we deliver strong results across various industries, including Fortune 500 companies.

Leveraging our deep expertise as a global provider of resources and solutions, we serve a wide range of industry verticals, including BFSI, Automotive, Healthcare, Mobility, Energy, Life Sciences, Manufacturing, Consumer Industries, and Technology.

Committed to "People, Purpose, Planet," we prioritize social responsibility and sustainability, as evidenced by our EcoVadis Bronze Medal Certification and participation in the UN Global Compact.

Our dedication to delivering strong results has earned us recognition as a trusted advisor for businesses seeking to drive innovation and growth, including many Fortune 500 companies.

Join our network! Partner with VDart Group to leverage our global network, industry expertise, and proven track record with a diverse clientele.

 

