Overview
On Site
175k - 225k
Full Time
Skills
Scalability
Software Design
Performance Metrics
Service Level
Root Cause Analysis
Innovation
Emerging Technologies
Boost
Mentorship
Software Engineering
DevOps
Reliability Engineering
JavaScript
TypeScript
Python
Database Performance Tuning
Optimization
Systems Design
Continuous Integration
Continuous Delivery
Configuration Management
Grafana
Orchestration
Kubernetes
Cloud Computing
Linux
Virtualization
Computer Networking
Firewall
Amazon Web Services
Provisioning
Command-line Interface
Computer Science
SAP BASIS
Job Details
Site Reliability Engineer
As a Senior or Staff SRE on the Platform Engineering team, you'll join at a foundational stage and play a key role in building and shaping a secure, resilient, and high-performance platform that powers engineering capabilities.
The company is located in New York and will remain 100% remote.
What You Will Be Doing:
- Drive Platform Excellence: Continuously improve the platform's reliability, scalability, and deployment efficiency through innovative solutions and resilient system design.
- Build Advanced Observability Solutions: Design, implement, and maintain comprehensive observability and monitoring frameworks to ensure system health, availability, and reliability.
- Establish and Track Key Performance Metrics: Develop and monitor Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to define and measure system performance benchmarks.
- Resolve Complex Issues and Perform Root Cause Analysis: Respond swiftly to critical incidents, troubleshoot sophisticated system and application problems, and conduct detailed root cause analyses to implement long-term solutions.
- Lead with Innovation: Stay current with industry trends and emerging technologies. Evolve best practices to boost development quality and delivery speed.
- Architect Scalable Systems: Take ownership of designing scalable, fault-tolerant, and distributed systems that meet high standards for performance and reliability.
- Mentor and Advocate: Promote the use of modern technologies and best practices, foster adoption of sound architectural patterns, and provide mentorship to engineering peers across the organization.
What You Will Need:
- 10-12+ years of experience in software engineering, DevOps, or Site Reliability Engineering (SRE)
- Proficiency in at least two of the following languages: JavaScript, TypeScript, Python, Go
- Strong expertise in diagnosing and resolving issues in complex distributed systems
- Deep understanding of database performance tuning and optimization best practices
- Proven ability to innovate and drive the adoption of new tools, processes, and standards
- Strong skills in system design and cloud-native architecture
- Expertise in CI/CD pipelines, configuration management, automation, and monitoring
- Advanced understanding of observability practices and tools such as ELK, Datadog, OpenTelemetry, Prometheus, and Grafana
- Experience with deployment and orchestration tools such as AWS ECS, Kubernetes, or Cloud Run
- Solid knowledge of Linux systems, virtualization, networking, VPCs, firewalls, and security configurations
- Hands-on experience with AWS services and infrastructure provisioning through CLI, APIs, or Infrastructure as Code (IaC)
- Bachelor's degree in Computer Science or a related technical field, or equivalent practical experience
This position doesn't provide sponsorship.