Job Description
Assume a critical role in defining the future of a globally recognized firm and make a direct and significant impact in a space designed for top achievers in site reliability.
As a Lead Site Reliability Engineer at JPMorgan Chase within the Enterprise Technology team, you hold a leadership role on your team, demonstrate strong knowledge across multiple technical domains, and advise others on the technical and business issues they face. You lead resiliency design reviews, break complex problems into digestible work for other engineers, act as technical lead for medium to large products, and advise and mentor other engineers.
Job responsibilities
- Demonstrate and champion SRE culture and practices, with a strong emphasis on blameless learning and continuous improvement through structured review workflows.
- Lead the design and implementation of a minimum viable operational learning data model that links incident records, RCA outputs, problem records, support tickets, customer signals, and observability/telemetry into a coherent review dataset.
- Build and maintain robust pipelines and transformations that expose recurring patterns, operational toil themes, systemic issue categories, and avoidable demand signals across heterogeneous sources.
- Develop lightweight, workflow-supporting data products that translate operational events into actionable learning, clear ownership handoffs, and trackable prevention work, without creating reporting overhead.
- Partner with Support, SRE, and Engineering leaders to define required data fields, taxonomies, classifications, and handoff structures that make review outputs measurable and decision-useful.
- Design mechanisms to distinguish one-off events from recurring classes of failure, enabling recurrence detection, prioritization, and prevention roadmap focus.
- Establish practical data quality standards and lightweight governance (e.g., field definitions, lineage, stewardship, access patterns) for multi-source operational learning datasets.
- Safeguard blameless review practices by ensuring outputs promote learning and improvement rather than punitive reporting; embed "just culture" norms into data and workflow design.
- Translate loosely defined operational problems into structured datasets and decision-support views that are clearly useful to both operators and leadership.
- Document data models, assumptions, transformation logic, and operating procedures to support maintainability, transparency, and long-term scale.
- Build solutions that can start manual or semi-manual and progressively automate as process maturity grows, integrating over time with enterprise workflow systems such as ServiceNow and Jira.
Required qualifications, capabilities, and skills
- Formal training or certification with 5+ years in site reliability engineering in cloud-based environments.
- Strong data modeling instincts and hands-on delivery: proven ability to build durable datasets/products using SQL and at least one programming language such as Python (or equivalent).
- Operational-domain data engineering experience: comfortable working across heterogeneous and imperfect sources (incident/problem records, tickets, RCA artifacts, telemetry) and turning them into reliable, decision-useful outputs.
- Applied use of LLMs/agents, RAG, anomaly detection, and/or automated runbooks to accelerate evidence collection, summarization, and action routing within review workflows.
- Blameless workflow design judgment: ability to design artifacts and reporting that support learning/improvement rather than punitive reporting, and that resist "gaming" behaviors.
- Investigative rigor and evidence integrity: ability to reconstruct precise timelines across systems; produce reproducible, auditable outputs that clearly separate facts, interpretations, and hypotheses.
- RCA fluency: hands-on familiarity with multi-factor RCA techniques (e.g., 5 Whys, causal mapping / fault-tree style thinking) and how to operationalize learning into prevention work.
- Classification/taxonomy design: experience defining controlled vocabularies and classification schemes that enable consistent categorization and downstream actionability.
- Enterprise workflow integration: experience integrating with and/or shaping fields and workflows in platforms such as ServiceNow and Jira to enable action routing, ownership, and tracking.
- Observability literacy: ability to work with telemetry concepts and common tools (e.g., Grafana, Dynatrace, Prometheus, Datadog, Splunk) to incorporate operational signals into review datasets.
- Structured communication and documentation: produces clear model documentation, transformation logic, and operating procedures; can run structured (non-leading) SME/operator interviews and synthesize qualitative inputs into structured data.
Preferred qualifications, capabilities, and skills
- Direct experience with incident/problem management and post-incident reviews in SRE operating models.
- Familiarity with service health metrics and review-oriented measures (e.g., MTTD, MTTR) and how to use them responsibly (as signals, not blame).
- Familiarity with structured high-reliability investigation methods (e.g., Bowtie, AcciMap, STPA), peer review/checklists, cross-source corroboration, and cognitive bias mitigation.
- Experience operating in regulated, large-scale environments (e.g., financial services), including comfort with data privacy, retention, and access controls.
- Demonstrated ability to design feedback loops that measure corrective-action impact and reduce recurrence over time.
- Relevant certifications/training (e.g., SRE, reliability/safety, RCA-focused training, Lean/Six Sigma) or equivalent practical credentials.
About Us
JPMorganChase, one of the oldest financial institutions, offers innovative financial solutions to millions of consumers, small businesses and many of the world's most prominent corporate, institutional and government clients under the J.P. Morgan and Chase brands. Our history spans over 200 years and today we are a leader in investment banking, consumer and small business banking, commercial banking, financial transaction processing and asset management.
We offer a competitive total rewards package including base salary determined based on the role, experience, skill set and location. Those in eligible roles may receive commission-based pay and/or discretionary incentive compensation, paid in the form of cash and/or forfeitable equity, awarded in recognition of individual achievements and contributions. We also offer a range of benefits and programs to meet employee needs, based on eligibility. These benefits include comprehensive health care coverage, on-site health and wellness centers, a retirement savings plan, backup childcare, tuition reimbursement, mental health support, financial coaching and more. Additional details about total compensation and benefits will be provided during the hiring process.
We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. We also make reasonable accommodations for applicants' and employees' religious practices and beliefs, as well as mental health or physical disability needs. Visit our FAQs for more information about requesting an accommodation.
JPMorgan Chase & Co. is an Equal Opportunity Employer, including Disability/Veterans
About the Team
Our professionals in our Corporate Functions cover a diverse range of areas from finance and risk to human resources and marketing. Our corporate teams are an essential part of our company, ensuring that we're setting our businesses, clients, customers and employees up for success.