Lead Site Reliability Engineer

Overview

Full Time

Skills

Systems Engineering
FOCUS
Scalability
Business Intelligence
Messaging
Data Storage
Firmware
Disaster Recovery
Dashboard
Provisioning
Release Management
System Security
Technical Support
Incident Management
DevOps
Cloud Computing
Kubernetes
Docker
Linux
Grafana
.NET
ASP.NET
Cosmos DB
MongoDB
SQL
API
Microsoft SQL Server
Terraform
Continuous Integration
Continuous Delivery
Computer Networking
Communication
Management
IoT
Leadership Development
Soft Skills
Google Cloud
Google Cloud Platform
Microsoft Azure
Amazon Web Services
LinkedIn
English
Collaboration

Job Details

Join our dynamic team as a Lead Site Reliability Engineer! If you have a substantial background in software and systems engineering, with a focus on reliability and scalability in cloud environments, we need your expertise to manage and communicate with IoT devices through our platform. You will play a critical role in device registration and connection, bi-directional messaging between devices and the cloud, device state tracking and data storage, alerts and notifications for device state changes, and integration with other cloud services such as Device Registry and Firmware Upgrade.

To discover more about the Cloud practice at EPAM Georgia, visit this page.

This position offers a remote setup with the flexibility to work from any location in Georgia, whether it's your home, our well-equipped offices in Tbilisi and Batumi, or a coworking space in Kutaisi.

Responsibilities
  • Design, implement, and maintain highly scalable and available systems across Azure cloud architectures
  • Regularly test and implement disaster recovery (DR) plans
  • Configure and enhance monitoring and alerting processes using Prometheus, Grafana, and OpsGenie
  • Develop dashboards to visualize system performance and reliability metrics
  • Use Terraform for infrastructure provisioning and management
  • Support the development team in ongoing projects
  • Communicate with the customer's DevOps team to discuss requirements and collaborate on implementations
  • Enhance release management and CI/CD processes
  • Improve system security based on security team recommendations
  • Document system support processes, and design, write, and test runbooks for operational tasks and incident response

Requirements
  • Minimum 5 years of experience as a DevOps engineer or SRE
  • Proven experience with Azure cloud architectures
  • Proficiency in Kubernetes and Docker/Linux services
  • Familiarity with monitoring tools: Prometheus, Grafana, OpsGenie
  • Experience with .NET Core and ASP.NET Core applications
  • Strong knowledge of Cosmos DB (both Mongo API & SQL API) and MS SQL Server
  • Expertise in Terraform
  • Experience with CI/CD tools and Azure Networking concepts
  • Excellent communication skills and the ability to manage tasks and projects independently
  • Experience with Azure IoT Hub and Event Hubs is an added advantage

We offer
  • We connect like-minded people:
    • Delivering innovative solutions to industry leaders, making a global impact
    • Enjoyable working environment, whether it is the vibrant office or the comfort of your own home
    • Opportunity to work abroad for up to two months per year
    • Relocation opportunities within our offices in 55+ countries
    • Corporate and social events
  • We invest in your growth:
    • Leadership development, career advising, soft skills and well-being programs
    • Certifications, including Google Cloud Platform, Azure and AWS
    • Unlimited access to LinkedIn Learning and getAbstract
    • Free English classes with certified teachers
  • We cover it all:
    • Participation in the Employee Stock Purchase Plan
    • Monetary bonuses for engaging in the referral program
    • Comprehensive medical & family care package
    • Five trust days per year (sick leave without a medical certificate)
    • Benefits package (sports activities, a variety of stores and services)

EPAM Georgia is a team of innovators united by a passion for technology. Our dynamic and inclusive culture helps us make a positive impact on our communities, clients, and employees. Here you will collaborate with multinational teams, contribute to numerous cutting-edge projects, deliver creative solutions, and have the opportunity to learn. Our people are at the heart of our success, and we are proud to give talented professionals solid ground on which to develop and grow.