SRE with Strong Middleware Expertise

Overview

On Site
Depends on Experience
Accepts corp-to-corp applications
Contract - W2
Contract - Independent
Contract - 12 Month(s)
No Travel Required
Unable to Provide Sponsorship

Skills

Amazon Web Services
API
Terraform
Middleware
Kubernetes
Apache HTTP Server
Ansible
Dynatrace
WebLogic

Job Details

Position Title: SRE with Strong Middleware Expertise

Job Location: Plano, TX (Onsite)

Joining Mode: Long Term Contract

Shift 1: 8:00 AM – 5:00 PM

Shift 2: 4:00 PM – 1:00 AM

Shift 3: 12:00 AM – 9:00 AM

Job Summary

We are seeking a Site Reliability Engineer (SRE) with strong middleware expertise to design, operate, and continuously improve highly available, secure, and scalable enterprise platforms.

This role blends deep middleware operations (WebLogic, API gateways, Java platforms) with SRE principles such as automation, observability, SLIs/SLOs, error budgets, and incident reduction.

The ideal candidate will partner with application, infrastructure, security, and DevOps teams to ensure platform reliability while driving automation, standardization, and operational excellence.

Key Responsibilities

Reliability & SRE Practices

          Define, implement, and track SLIs, SLOs, and error budgets for middleware and platform services

          Drive MTTR reduction, availability improvements, and operational resilience

          Lead incident response, root cause analysis (RCA), and post-incident reviews

          Implement proactive monitoring and alerting to reduce noise and prevent outages

Middleware Platform Engineering

          Administer and support enterprise middleware platforms including:

o          Oracle WebLogic, Apache, NGINX

o          API gateways (Apigee Edge / Apigee X)

o          Java application servers and JVM-based services

          Perform patching, upgrades, configuration tuning, and capacity planning

          Manage certificates, keystores, trust stores, and TLS configurations

          Ensure platform security, compliance, and performance standards

Observability & Monitoring

          Design and maintain end-to-end observability using tools such as:

o          Dynatrace, ELK/Kibana, Splunk (or equivalent)

          Build executive and operational dashboards for real-time health visibility

          Reduce alert fatigue through well-tuned alert rules, thresholds, and suppression

          Monitor JVM metrics and behavior, thread utilization, and API performance

Automation & Infrastructure Efficiency

          Develop automation and self-healing solutions using:

o          Shell scripting, Python, Ansible, Terraform, or similar tools

          Automate routine operational tasks (restarts, validations, health checks)

          Enable CI/CD-friendly middleware deployments and configuration management

          Standardize environments across development, QA, and production

Cloud, Containers & Modern Platforms

          Support middleware workloads on:

o          Kubernetes / OpenShift

o          Public or hybrid cloud environments (AWS, Azure, Google Cloud Platform)

          Build platform reliability into containerized and microservices architectures

          Collaborate with DevOps teams on deployment pipelines and release strategies

Collaboration & Leadership

          Act as a reliability advisor to application and development teams

          Partner with Unix/Linux, Database, Network, and Security teams

          Provide mentoring, documentation, and best-practice guidance

          Participate in on-call rotations and provide production support leadership

Required Skills & Experience

Technical Skills

          5+ years of experience in middleware, platform operations, or SRE

          Strong expertise in WebLogic, Java middleware, and Apache/NGINX

          Hands-on experience with observability platforms (Dynatrace, ELK, Splunk)

          Solid understanding of Linux/Unix systems and networking fundamentals

          Experience with API platforms (Apigee preferred)

          Automation and scripting skills (Shell, Python, Ansible, Terraform)

          Experience with Kubernetes/OpenShift and containerized workloads

SRE & Operational Excellence

          Practical experience implementing SRE principles in production

          Strong troubleshooting skills (thread dumps, heap analysis, logs)

          Experience with incident management, RCA, and change management

          Ability to balance reliability against delivery velocity

Nice-to-Have

          Experience with cloud-native architectures and service meshes

          Knowledge of IAM/security integrations (OAuth, SAML, mTLS)

          Exposure to CI/CD tools (Jenkins, GitHub Actions, GitLab CI)

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

About Mango Analytics