Overview
On Site
USD 104,000.00 - 156,000.00 per year
Full Time
Skills
Problem Solving
Scalability
Provisioning
Configuration Management
Real-time
Reliability Engineering
DevOps
Terraform
Ansible
Cloud Computing
Google Cloud Platform
Google Cloud
Amazon Web Services
Kubernetes
Programming Languages
Python
Java
C
C++
Ruby
JavaScript
Software Development
Change Management
Dynatrace
Management
IaaS
FOCUS
Operational Excellence
Communication
Collaboration
Mentorship
Continuous Integration
Continuous Delivery
Incident Management
Health Care
Life Insurance
Job Details
Job Description
ADT is in the process of transitioning to a hybrid in-office work model, which combines the best of in-office and remote work. New team members will work from home, but should plan to return to a hybrid in-office model at a later date. We will keep you well informed and supported throughout the transition. Once our hybrid work policy is in place, you will work from Boca Raton, FL, Irving, TX, or Blue Bell, PA and enjoy the benefits of a balanced work schedule.
ADT's Site Reliability Engineering (SRE) team is seeking talented individuals who want their code to positively influence our customers, the bottom line, and the industry. Our team of engineers works tirelessly to keep the ADT platform running smoothly and our customers protected 24/7.
What You'll Do:
As a member of ADT's SRE team, you will play a critical role in ensuring the reliability, scalability, and performance of our large-scale distributed systems. You will drive operational excellence by proactively identifying and solving problems, improving system performance, and ensuring our production environments remain resilient and efficient. Your expertise in orchestrating and automating complex systems, combined with a focus on improving software release processes and managing large cloud environments, will be key to our ongoing success.
Key Responsibilities:
- Ensure the reliability, availability, and scalability of large-scale distributed systems and applications.
- Provide engineering and operational support for multiple production environments, maximizing uptime.
- Identify performance bottlenecks, reliability issues, and areas for improvement, and implement solutions proactively.
- Develop and manage infrastructure as code using tools like Terraform and Ansible to automate cloud resource provisioning and configuration management.
- Manage cloud environments (AWS, Google Cloud Platform) and work with Kubernetes-based infrastructure.
- Implement and manage observability and monitoring solutions (e.g., Dynatrace, Prometheus) to provide real-time insights and identify issues.
- Contribute to and improve software release processes, ensuring smooth deployments and minimal disruption to production systems.
- Collaborate with cross-functional teams to enhance the operational health of our systems.
- Mentor junior SREs and help develop best practices.
What You'll Need:
- 5+ years in Site Reliability Engineering, DevOps, or related roles.
- Strong focus on tactical operations and experience managing large-scale distributed software applications.
- Solid experience with infrastructure as code (Terraform, Ansible).
- Proven experience with cloud environments such as Google Cloud Platform and AWS.
- Expertise in managing and optimizing Kubernetes clusters for large-scale deployments.
- Proficiency in one or more programming languages, such as Python, Java, C/C++, Ruby, or JavaScript.
- Strong understanding of software development and change management processes.
- Experience with monitoring and observability platforms like Dynatrace, Prometheus, or similar tools.
- In-depth experience managing dynamic, scalable cloud infrastructure and distributed systems.
- Ability to diagnose and resolve complex system issues with a focus on operational excellence.
- Strong communication skills with the ability to collaborate across teams and mentor junior engineers.
- Comfortable with ambiguity and complex systems, with the ability to handle challenges with confidence.
- Experience with CI/CD pipelines and automation tools.
- Familiarity with incident response processes and post-mortem analysis.
Compensation & Benefits:
The salary range for this role is $104,000.00 - $156,000.00 and is based on experience and qualifications.
Certain roles are eligible for annual bonus and may include equity. These awards are allocated based on company and individual performance.
We offer employees access to healthcare benefits, a 401(k) plan and company match, short-term and long-term disability coverage, life insurance, wellbeing benefits, and paid time off, among others. Employees accrue up to 120 hours of paid time off in their first year, and the accrual rate increases after the first year. We also offer 6 paid holidays.
The anticipated application end date is 6/25/2025.