Immediate need for a talented Product Reliability Engineering Lead. This is a 12+ month contract opportunity with long-term potential, located in the US (Remote, CST). Please review the job description below and contact me ASAP if you are interested.
Job ID: 26-15460
Pay Range: $85 - $95/hour. Employee benefits include, but are not limited to, health insurance (medical, dental, vision), 401(k) plan, and paid sick leave (depending on work location).
Key Responsibilities:
- Define and lead the reliability strategy for the Acquisition Platform, ensuring alignment with product, platform, and enterprise goals.
- Establish SLOs, SLIs, and error budgets that tie reliability targets to business outcomes and partner expectations.
- Shift reliability requirements into early design and development phases so resiliency, failover, and graceful degradation are architected in, not bolted on.
- Design reliability patterns across platform services, APIs, workflows, and dependent systems, both internal and external.
- Architect end-to-end observability across the platform, including metrics, structured logging, distributed tracing, and alerting.
- Establish monitoring standards and dashboards that provide real-time visibility into platform health, partner-facing services, and integration dependencies.
- Embed observability into platform services from design through deployment so teams can detect, diagnose, and resolve issues rapidly.
- Drive adoption of synthetic monitoring and canary deployments to validate production behavior proactively.
- Collaborate closely with the Acquisition delivery team and stakeholders to align outcomes with the reliability strategy.
- Partner with AMS, infrastructure, and other tech teams to ensure clear ownership boundaries and smooth operational handoffs.
- SRE principles: SLOs, SLIs, error budgets, toil reduction, blameless postmortems
- Observability design: distributed tracing, APM telemetry, structured logging, real-time alerting, synthetic monitoring
- Resilience and fault tolerance: circuit breakers, bulkheads, retry/backoff, graceful degradation, failover validation
- Chaos engineering and reliability testing: fault injection, load/stress testing, failure mode analysis
- CI/CD reliability integration: automated reliability gates, canary deployments, feature flags, progressive rollouts
- AI-assisted reliability techniques: anomaly detection, predictive alerting, prompt-driven runbook automation, agent-based remediation
- Responsible AI use, including consideration of security, data exposure, and operational risk
- Cloud-native operations: containerized platforms, event-driven architectures, infrastructure as code
- Growth-oriented mindset: ability to think beyond today's constraints and identify what is required to build the future
- Excellent communication skills: ability to translate reliability concerns between engineering, product, and business teams
Key Requirements and Technology Experience:
- Must-have skills: Site Reliability Engineering (SRE), AWS Cloud (EKS/ECS/Lambda), Observability & Monitoring (Prometheus/Grafana/Datadog/Splunk), Kubernetes & CI/CD Automation, Chaos Engineering & Reliability Testing, SLO/SLI/Error Budget Management
- 5+ years of experience in site reliability engineering, platform engineering, or production operations roles
- Experience defining and operating SLO/SLI frameworks tied to business outcomes
- Hands-on experience designing observability for distributed, API-driven platforms
- Experience with reliability and resiliency testing, including chaos engineering and fault injection
- Experience guiding and mentoring engineers on reliability practices
- Enterprise-scale delivery experience with both onshore and offshore cross-functional teams
- Direct experience applying Agile methodologies in product-centric delivery models
- AWS operational experience: CloudWatch, X-Ray, Fault Injection Simulator, ECS/EKS, Lambda, EventBridge
- Experience integrating reliability practices with DevSecOps and CI/CD pipelines
- Familiarity with AI/ML-driven operations tools and incident management platforms
Our client is a leader in the insurance industry, and we are currently interviewing to fill this and other similar contract positions. If you are interested in this position, please apply online for immediate consideration.
Pyramid Consulting, Inc. provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
By applying to our jobs, you agree to receive calls, AI-generated calls, text messages, or emails from Pyramid Consulting, Inc., its affiliates, and contracted partners. Frequency varies for text messages. Message and data rates may apply. Carriers are not liable for delayed or undelivered messages. You can reply STOP to cancel and HELP for help. You can access our privacy policy.