SRE+Dynatrace - Guadalajara, MX

Remote • Posted 3 days ago • Updated 3 days ago

Full Time

Remote

$60,000 - $80,000/yr

Job Details

Skills

SRE
Dynatrace

Summary

Site Reliability Engineers are responsible for ensuring the availability, reliability, scalability, and performance of the firm's most critical, customer-facing microservices that power all eCommerce channels. This role applies Google-inspired SRE principles to balance feature velocity and system reliability using Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets.

The role combines software engineering, cloud engineering, automation, and production operations, with a strong emphasis on building systems that are observable, resilient, and operable by default.

Primary Responsibilities:

Define, implement, and own SLIs, SLOs, and error budgets for critical microservices in collaboration with product and engineering teams.

Use error budgets to influence release decisions, prioritize reliability work, and manage operational risk.

Design and maintain observability platforms including metrics, logs, traces, and real-time telemetry.

Track, manage, and reduce operational toil by converting repetitive operational work into Jira stories and epics with clear ownership and measurable outcomes.

Design, implement, and validate resiliency mechanisms such as graceful degradation, redundancy, automated failover, and disaster recovery.

Lead incident response, act as an escalation point for high-severity incidents, and drive blameless postmortems.

Capture incident action items and reliability improvements in Jira, ensuring closure, accountability, and continuous improvement.

Partner with scrum teams to improve reliability through release readiness reviews, production change validation, and testing strategies.

Perform deep root cause analysis, debugging, and performance tuning across distributed systems.

Promote shift-left reliability by embedding operability, monitoring, and failure testing early in the SDLC.

Drive continuous improvement through automation, self-healing systems, chaos engineering, and capacity planning.

Maintain runbooks, playbooks, and knowledge repositories, linking documentation to Jira tasks to reduce MTTR.

Provide technical leadership and mentoring to junior SREs and engineers.

Collaborate with global, distributed teams, leveraging Jira for transparent planning, dependency tracking, and execution.

Core Competencies & Accomplishments:

4+ years of experience in SRE, software engineering, or production operations supporting large-scale eCommerce platforms.

Hands-on experience with Java/J2EE-based distributed systems. React experience is a plus.

Proven ability to design and operate systems using SLO-driven reliability models.

Experience defining and measuring SLIs (availability, latency, error rates, throughput, saturation).

Good understanding of NoSQL technologies and RDBMS, including the ability to write queries to fetch results from a database.

Experience deploying and operating services on cloud platforms (AWS, Azure, or Google Cloud).

Expertise with observability, APM, and caching tools (Dynatrace, Splunk, ELK, Akamai, Quantum Metric/Tealeaf, etc.).

Strong experience using Jira for backlog management, incident follow-ups, toil reduction tracking, and cross-team coordination.

Ability to independently own services and drive reliability initiatives end-to-end.

Strong communication skills and ability to influence engineering and product teams.

Experience serving on an on-call rotation and handling critical and high-severity incidents.

Desired Skills:

Experience building and operating microservices architectures using Spring Boot, Groovy, React, or similar.

Strong understanding of CI/CD pipelines, release automation, and progressive delivery.

Experience with eCommerce domains such as Catalog, Customer Data, and Order Management.

Familiarity with search platforms (Endeca, Solr, Lucene, Elasticsearch).

Proficiency in scripting and automation (Python, Bash, Ruby, Perl, PowerShell).

Experience with ITSM tools integrated with Jira workflows.

Exposure to capacity planning, load testing, and chaos engineering.

Dice Id: 10462843
Position Id: 8915930