Site Reliability Engineer

Full Time
$110k - $125k per year

Job Description

job summary:



As a Site Reliability Engineer (SRE), you'll help build and scale our production services for performance, reliability, and security. You'll collaborate with cross-functional product squads to identify, design, and implement patterns that allow our applications to be scalable and observable. You're passionate about distributed systems and you'll find ways to optimize the logs, metrics, and signals they can generate.


What you'll be doing:



    • Establish and maintain Service-Level Objectives (SLOs) for production services
    • Implement and tune alerts for critical production services
    • Lead incident response, troubleshooting, root-cause analysis, and postmortems
    • Ensure sufficient service capacity by conducting regular capacity planning reviews
    • Contribute to software and systems architecture design
    • Identify and automate manual tasks related to software development, deployment, and operations


We're looking for someone who has:



    • Proficiency with observability tools
    • Experience supporting Java-based distributed services in production
    • Experience running production workloads in AWS
    • Experience with building and deploying Docker containers
    • A basic understanding of network routing and load-balancing technologies
    • The ability to communicate and collaborate effectively with Product Owners, Developers, QA, Operations, and Security Engineers


Highly preferred candidates also have:



    • Experience developing and operating services written in Java, Go, PHP, or JavaScript
    • Experience running production workloads on Kubernetes at scale
    • Hands-on experience designing, deploying, and troubleshooting AWS services such as EC2, EKS, IAM, S3, ElastiCache, ALB/NLBs, or Route 53
    • Hands-on experience with using and operating observability tools: Grafana, Prometheus, Zipkin, Kibana, Elasticsearch, Fluentd
    • Experience with designing and conducting load, performance, or stress tests
    • Proficiency with scripting and automation (Bash, Ansible, Python)
    • Proficiency with Linux systems administration


 

location: Jersey City, New Jersey

job type: Permanent

salary: $110,000 - $125,000 per year

work hours: 8am to 5pm

education: Bachelor's

 




 

qualifications:


  • Experience level: Experienced
  • Minimum 5 years of experience
  • Education: Bachelor's
 

skills:
  • Reliability
  • Python (4 years of experience is required)
  • Bash (3 years of experience is preferred)
  • AWS services (4 years of experience is required)



For certain assignments, COVID-19 vaccination and/or testing may be required by Randstad's client or applicable federal mandate, subject to approved medical or religious accommodations. Carefully review the job posting for details on vaccine/testing requirements or ask your Randstad representative for more information.



Equal Opportunity Employer: Race, Color, Religion, Sex, Sexual Orientation, Gender Identity, National Origin, Age, Genetic Information, Disability, Protected Veteran Status, or any other legally protected group status.

Dice Id : cxsapwma1
Position Id : 870027
Originally Posted : 2 months ago