Title: Engineering Program Management 3
Role: Technical Program Manager
Location: San Diego, CA (92127)
Job Description
We are seeking an experienced Technical Program Manager (TPM) to support large-scale, highly available commerce and payment systems in a cloud-based environment. This role focuses on driving Site Reliability Engineering (SRE) initiatives that ensure availability, resiliency, scalability, and performance across a complex portfolio of services.
As part of a reliability-focused engineering organization, you will work closely with service development, platform, security, and global operations teams to enable seamless delivery of new technical and customer-facing features. You will act as a technical leader, proactively identifying opportunities to improve processes, systems, and operational excellence in a fast-paced environment.
The ideal candidate brings a strong balance of technical depth, program management expertise, and interpersonal skills, and is comfortable leading large, cross-functional programs.
Responsibilities
- Drive delivery of SRE and reliability initiatives across 90+ commerce and payment-related services in a cloud (AWS) environment.
- Lead technical and operational programs in a fast-paced, collaborative, and distributed engineering organization.
- Manage delivery schedules, milestones, dependencies, and risks across multiple teams and products.
- Partner with engineering teams specializing in platform hosting, Kubernetes, CI/CD, and data services to improve application resiliency and performance.
- Advocate for and implement continuous improvement in processes, tooling, and automation.
- Simplify and clearly communicate complex technical problems to both technical and non-technical stakeholders, including leadership.
- Act as a bridge between engineers and decision-makers, enabling alignment and informed decision-making.
- Promote and enforce core SRE principles, including availability, resiliency, observability, capacity planning, elasticity, supportability, and automation.
- Participate in a rotational on-call program, responding to and resolving production incidents.
- Conduct, document, and present root cause analyses and post-incident reviews to drive learning and prevention.
Qualifications
- Bachelor's degree in Computer Science, Engineering, or a related technical field.
- 3+ years of experience in Technical Program Management within an agile software development environment.
- 3+ years of hands-on experience with AWS, including building, integrating, or managing applications.
- 3+ years of experience working in a highly visible or mission-critical production software environment.
- 3+ years of hands-on software engineering or systems support experience (e.g., Java and/or C++ services).
- Experience with containerization and orchestration technologies (Docker, Kubernetes, EKS).
- Experience with observability and monitoring tools (e.g., Datadog, CloudWatch, Splunk).
- Proven ability to manage cross-functional projects, dependencies, and release schedules.
- Strong analytical skills with a high sense of ownership and accountability.
- Excellent written and verbal communication skills.
- Strong organizational, problem-solving, and judgment abilities.
Skills
- Agile / Scrum methodologies
- Technical program and project management
- Excellent verbal and written communication
- Cloud, reliability, and infrastructure concepts (the technical skills listed under Qualifications are a plus)