Site Reliability Engineer (SRE)

Overview

Hybrid
Depends on Experience
Full Time

Skills

Ansible
Azure DevOps
Collaborate
DBMS
Datadog
DevOps
IP networking
Java
Linux
NoSQL
Prometheus
SDLC
SLA
TCP/IP
TCP/IP networking
Terraform
communication skills
containerization
e-commerce
planning
problem solver
verbal communication
virtualization

Job Details

  • Responsibilities:
    We are looking for an experienced Site Reliability Engineer who will be responsible for the stability and reliability of the client platform:

    - Building reliable, observable, and predictable applications by implementing SRE practices at each stage of the SDLC
    - Maintaining production environments by troubleshooting failures and running postmortems to prevent future incidents
    - Establishing a monitoring solution to achieve infrastructure and application observability
    - Implementing an "everything-as-code" approach
    - Continuously automating routine operations
    - Collaborating with other teams and the client to find the best solutions
  • Mandatory Skills Description:
    Working in the office 2 days per week is mandatory.
    - Production support experience as a developer for an e-commerce platform;
    - Strong knowledge of and experience with Java;
    - SRE experience;
    - Scripting experience;
    - 5+ years of experience administering Linux and at least 2 years supporting production environments;
    - Experience designing large-scale distributed solutions, including their capacity planning;
    - Deep understanding of TCP/IP networking;
    - Familiar with SLA, SLO, and SLI terms;
    - Experience with monitoring and alerting tools such as Grafana, Datadog, and Prometheus;
    - Strong knowledge of virtualization and containerization principles, including orchestration tools;
    - Familiar with CaC and IaC tools (Ansible, Salt, Terraform, Packer);
    - Familiar with CI/CD tools (Jenkins, Azure DevOps);
    - Experience with relational and NoSQL DBMS;
    - A clear understanding of Agile and DevOps culture and the problems they are intended to solve;
    - Strong written and verbal communication skills;
    - Understanding of information security principles;
    - Understanding of popular deployment strategies (feature flags, blue/green, canary, dark launch, etc.);
    - Critical thinker and problem solver
  • Nice-to-Have Skills:
    - Experience working with Azure;
    - Previous experience working in SRE teams

About Luxoft USA Inc