Support / SRE Lead – Digital Platforms (Web & Mobile)

Overview

Hybrid
Depends on Experience
Contract - W2

Job Details

Job Title: Support / Site Reliability Engineering (SRE) Lead – Digital Platforms (Web & Mobile)
Location: Irving, TX (Hybrid; on-site Monday through Wednesday)
Duration: Contract with option to hire

Position Summary:
We’re looking for a dynamic and experienced Support / SRE Lead to oversee the stability, performance, and operational excellence of our digital platforms, spanning both web and mobile applications. This role is ideal for a hands-on leader with a deep technical foundation, a passion for reliable systems, and a track record of building high-performing support teams. You’ll collaborate across engineering, product, and operations to ensure exceptional user experiences and robust, scalable platforms.


Key Responsibilities:

  • Lead day-to-day operations for support and reliability across web and mobile platforms.

  • Serve as the senior escalation point for major incidents, driving rapid and effective resolution.

  • Monitor application health, performance, and availability using modern observability practices.

  • Partner with development, QA, and product teams to support seamless deployments and feature rollouts.

  • Define and enforce best-in-class support practices, including incident response, problem management, and post-incident reviews.

  • Establish, track, and improve Service Level Objectives (SLOs) and Service Level Indicators (SLIs).

  • Design and implement scalable monitoring, alerting, and logging frameworks.

  • Develop and maintain automation scripts, CI/CD pipelines, and recovery tools to support system resilience.

  • Maintain a centralized knowledge base for troubleshooting, playbooks, and platform documentation.

  • Drive continuous improvement by analyzing recurring issues and implementing long-term solutions.

  • Collaborate with stakeholders to understand user needs and improve platform usability.

  • Stay up to date on technology trends and introduce innovative solutions to enhance system reliability.


Qualifications:

  • 8+ years of experience in application support, technical operations, or site reliability engineering.

  • Minimum 3 years in a leadership role managing support or SRE teams.

  • Deep understanding of web/mobile app architectures, APIs, cloud services (AWS, Azure, or GCP), and databases.

  • Strong experience with incident response, root cause analysis, and ITIL processes.

  • Proficiency with monitoring and alerting tools (e.g., Datadog, Cloudflare) and with log analysis.

  • Hands-on experience with ticketing platforms like Azure DevOps, ServiceNow, or Freshservice.

  • Solid grasp of CI/CD pipelines, DevOps practices, and automation tooling.

  • Excellent communication and stakeholder management skills.

  • Proven ability to lead and develop teams in high-pressure, fast-paced environments.

  • Experience working with cross-functional engineering and product teams.


Nice to Have:

  • Industry experience in e-commerce, fintech, healthcare, or media.

  • Exposure to mobile frameworks and languages (Flutter, React Native, Kotlin, Swift).

  • Experience with CMS platforms (WordPress, Drupal, Crownpeak, AEM).

  • Familiarity with front-end languages and frameworks (JavaScript/TypeScript, React).

  • Relevant certifications (e.g., ITIL, AWS, Azure, SRE).


About Optimize Search Group