Site Reliability Engineer

Blue Bell, PA, US • Posted 2 days ago • Updated 8 hours ago
Full Time
On-site
Fitment

Job Details

Skills

  • IT Security
  • Scalability
  • Operational Excellence
  • Problem Solving
  • Conflict Resolution
  • Provisioning
  • Dashboard
  • Production Support
  • DevOps
  • Software Engineering
  • Scripting
  • FOCUS
  • Terraform
  • Ansible
  • Cloud Computing
  • Amazon Web Services
  • Google Cloud
  • Google Cloud Platform
  • Kubernetes
  • Scripting Language
  • Python
  • Bash
  • Software Development
  • Change Management
  • Dynatrace
  • Continuous Integration
  • Continuous Delivery
  • Incident Management
  • Communication
  • Collaboration
  • Artificial Intelligence
  • Documentation
  • Audio Engineering
  • Apache Kafka
  • Java
  • Customer Facing
  • JIRA
  • Git
  • Workflow
  • Auditing

Summary

Job Description

Locations and Workstyle:

Blue Bell, PA: Primarily remote; candidates should be within commuting distance of the Blue Bell office and able to work onsite as needed. Candidates may come onsite more frequently if desired.
Irving, TX and Boca Raton, FL: Hybrid schedule - onsite a minimum of four days per week, with one remote day. Five days onsite may be required based on business needs.

What You'll Do:
  • Work closely with Infrastructure and Development teams to keep the ADT platform running and customers protected, while collaborating with cross-functional partners (IT, Security, DevOps, Engineering) to improve operational health and apply SRE best practices
  • Support the reliability, availability, scalability, and performance of large-scale distributed systems
  • Drive operational excellence through problem-solving, performance improvements, and resilient production environments
  • Use tools such as Terraform, Ansible, Kubernetes, and Dynatrace to support mission-critical applications
  • Work within cloud environments (AWS, Google Cloud Platform) and Kubernetes-based infrastructure, with guidance on complex design decisions
  • Identify performance bottlenecks and reliability gaps, and implement improvements
  • Build and maintain infrastructure as code (Terraform, Ansible) for provisioning, configuration, patching, and releases
  • Contribute to observability and monitoring (Dynatrace, Prometheus), including dashboards, alerts, runbooks, and tuning
  • Support software releases, including validation, rollback planning, and post-change verification across ADT+ and legacy platforms
  • Provide production support, including on-call participation, incident response, remediation follow-through, and support for customer-impacting issues during major incidents

What You'll Need:
  • 3+ years of experience in SRE, DevOps, platform engineering, software engineering, or related roles with production and on-call responsibility
  • Background in systems or operations with progression toward engineering work (automation, scripting, IaC, observability)
  • Focus on production operations and reliability for distributed applications
  • Experience with infrastructure as code (Terraform, Ansible), including building and maintaining environments
  • Experience working in cloud environments (AWS and/or Google Cloud Platform)
  • Familiarity with Kubernetes in production environments
  • Proficiency in at least one programming or scripting language (Python, Java, Bash, or similar), including working with existing codebases
  • Understanding of software development and change management practices
  • Experience with monitoring and observability tools (Dynatrace, Prometheus, or similar)
  • Ability to diagnose and resolve production issues with sound judgment around risk, rollback, and escalation
  • Experience with CI/CD pipelines and automation tools
  • Familiarity with incident response and post-incident follow-up
  • Strong communication skills and ability to collaborate across teams
  • Comfortable learning complex systems and seeking guidance when needed
  • Comfortable using AI tools to accelerate investigation, automation, and documentation while maintaining sound engineering judgment

Preferred Qualifications:
  • Experience with Kafka, Java/JVM ecosystems, or large customer-facing platforms
  • Experience with security remediation at scale (patch SLAs, CVE response, OS upgrades)
  • Experience working with Jira-driven workflows and cross-team escalation
  • Familiarity with Harness, enterprise Git workflows, and audit-driven change controls
  • Dice Id: 10507796
  • Position Id: 3094e4fc0902f6f1fa2057c281860d07