job summary:
We are seeking a Staff Site Reliability Engineer to lead the evolution of our platform infrastructure as we navigate rapid Series B growth. Operating as a senior individual contributor, you will partner closely with engineering and product teams to design, build, and operate systems that are reliable, observable, and built to scale.
As the team has scaled rapidly from 35 to over 125 employees, we need a senior engineering presence who can bring operational maturity and structure to a fast-moving, dynamic environment. You will own our infrastructure direction, shape how we ship software, and set the standard for operational excellence without becoming a gatekeeper.
location: New York, New York
job type: Permanent
salary: $180,000 - $200,000 per year
work hours: 8am to 5pm
education: No Degree Required
responsibilities:
What You'll Own & Do
- Platform Evolution: Lead the AWS infrastructure direction, including the migration from ECS/Fargate toward a more modern, scalable runtime.
- Developer Experience: Mature CI/CD systems with a strong emphasis on safety, speed, and self-serve automation.
- Observability & Reliability: Define reliability standards, error budgets, and SLOs; improve alerting and incident response processes.
- Technical Leadership: Act as a floating, senior technical presence to mentor a growing, relatively junior engineering team and instill SRE best practices.
- Incident Response: Lead high-severity incidents, drive clear postmortems, and foster a blameless engineering culture.
Key Skills & Core Competencies
- Cloud Infrastructure: Expert-level AWS ecosystem management (ECS, Fargate, EC2).
- Infrastructure as Code (IaC): Deep experience with modern IaC tools to migrate and mature runtimes.
- CI/CD & Automation: GitHub Actions, automated deployment pipelines, and preview/ephemeral environments.
- Observability: Metrics, logs, and tracing tools; setting up alert hygiene, dashboards, and SLO/SLI frameworks.
- Application Development: Strong software engineering capabilities in TypeScript and Python.
qualifications:
- 10+ years of experience in SRE, infrastructure, or backend engineering roles.
- Proven track record operating within a Series B-scale company (roughly 100-200 employees or customers). Must understand how to bring structure and maturity to a fast-growing, somewhat ambiguous environment.
- Deep, hands-on experience running and migrating distributed systems in production at scale on AWS.
- Proficiency in at least one modern language, with a strong ability to write code and automate away toil.
- Experience guiding and mentoring junior-to-mid-level engineers without acting as a direct people manager.
- Onsite work 5 days a week in NYC required.
Equal Opportunity Employer: Race, Color, Religion, Sex, Sexual Orientation, Gender Identity, National Origin, Age, Genetic Information, Disability, Protected Veteran Status, or any other legally protected group status.
At Randstad Digital, we welcome people of all abilities and want to ensure that our hiring and interview process meets the needs of all applicants. If you require a reasonable accommodation to make your application or interview experience a great one, please contact
Pay offered to a successful candidate will be based on several factors, including the candidate's education, work experience, work location, specific job duties, and certifications. In addition, Randstad Digital offers a comprehensive benefits package, including medical, prescription, dental, vision, AD&D, and life insurance offerings, short-term disability, and a 401(k) plan (all benefits are based on eligibility).
This posting is open for thirty (30) days.