Site Reliability Engineer

Overview

On Site
Depends on Experience
Full Time
No Travel Required

Skills

Infrastructure
Linux
Python
CI/CD
Chinese

Job Details

This role is also open to junior candidates (3+ years of experience) and SRE leads (7+ years of experience).

The Site Reliability Engineering (SRE) team combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. On our team, you'll have the opportunity to manage the complex challenges of scale while applying expertise in coding, algorithms, complexity analysis, and large-scale system design. We embrace a culture of diversity, intellectual curiosity, openness, and problem-solving. We encourage close collaboration while promoting self-direction.

Responsibilities

  • Engage in and improve the whole lifecycle of services from inception and design, throughout development, capacity planning, and launch reviews, to deployment, operation, and refinement
  • Design and implement software platforms and monitoring frameworks for efficient, automated, and intelligent service-oriented architecture (SOA) governance
  • Scale systems sustainably through mechanisms such as automation; improve system reliability, efficiency, and velocity by pushing for changes
  • Practice sustainable user support, incident response, and blameless postmortems

Qualifications
Bachelor's degree in Computer Science or a related technical field with 5+ years of experience

  • Experience programming in one of the following languages: C, C++, Java, Python, Go, or Rust
  • Familiar with Unix/Linux system internals, networking, and distributed systems
  • [Preferred] Experience with MySQL, Redis, Nginx, Kubernetes, Docker, OpenStack, Hadoop, Spark, Flink, etc.
  • [Preferred] Experience in designing and analyzing large-scale distributed systems
  • [Preferred] Strong skills in problem solving and communication