Principal Site Reliability Engineer

Full Time
$140k - $175k per year

Job Description

job summary:

Our platforms are built on AWS and GCP. We use technologies such as Kafka, Samza, HBase, MySQL, and Postgres. We build and manage our systems using Travis CI, Jenkins, Docker, Kubernetes, Terraform, and Chef, with a combination of managed and self-hosted approaches. This is a unique opportunity to lead the engineering organization in three areas: standardized, automated provisioning and orchestration of infrastructure and services; excellence in service-oriented architecture; and forward-looking planning and execution of large technical projects.



How you will make an impact




  • Define a roadmap for all engineering teams to adopt fully automated, self-service, highly scalable, cost-efficient, observable, auditable, and reliable infrastructure services as standard practice
  • Drive the execution of this roadmap across the engineering organization, collaborating with SREs and senior engineers while also tackling the most critical challenges hands-on
  • Provide expert technical guidance and ongoing engineering design review to teams planning and implementing large migrations, service-oriented architecture, broad architectural shifts, and capacity growth
  • Build a metrics-driven operational culture by standardizing our practices for SLO definition and review, logging, monitoring, alerting, and on-call
  • Iteratively improve blameless incident management, root cause analysis, outage prevention, and service recovery strategies across the engineering organization
  • Partner closely with Security, Quality, and Product teams to achieve high-priority security, privacy, compliance, reliability, and business-continuity objectives on our overall roadmap
  • Propose and drive large improvements to production systems that have a significant impact on our business and engineering teams
  • Mentor and coach engineers to be curious and effective at discovering and solving technical challenges

Qualifications


  • You have 10+ years of proven experience demonstrating hands-on technical leadership and business impact by combining software engineering and systems engineering skills to solve complex automation and reliability challenges
  • You have deep technical experience with various cloud providers, containerization technologies, automated deployment frameworks, orchestration frameworks, monitoring, logging, alerting, system internals, networking, databases, distributed systems, and service-oriented architecture
  • You have the skills to implement load, stress, performance, and reliability testing standards at scale to improve service, platform, and infrastructure resiliency
  • You promote openness, diversity of opinions, and inclusive discussions at all times to evaluate a wide variety of ideas and perspectives in solving challenging problems
  • You demonstrate clear decision-making and sound trade-offs in complex situations involving multiple opinions, needs, teams, technologies, cloud providers, and architectural settings
  • You communicate effectively with stakeholders ranging from executives to junior engineers across the breadth and depth of the engineering organization
  • You exemplify high accountability, integrity, and resilience, keeping focus on both big-picture goals and the milestones needed to reach them
  • You enable the engineering organization to innovate and deliver with greater speed and safety

 

location: Nashua, New Hampshire

job type: Permanent

salary: $140,000 - $175,000 per year

work hours: 8am to 5pm

education: Bachelor's

 


qualifications:


  • Experience level: Experienced
  • Minimum 10 years of experience
  • Education: Bachelor's
 

skills:
  • Reliability
  • SRE (6 years of experience is required)



Equal Opportunity Employer: Race, Color, Religion, Sex, Sexual Orientation, Gender Identity, National Origin, Age, Genetic Information, Disability, Protected Veteran Status, or any other legally protected group status.

Dice ID: cxsapwma1
Position ID: 854004
Originally posted: 2 months ago