Job Details
Position Title: SRE with Strong Middleware Expertise
Job Location: Plano, TX (Onsite)
Engagement Type: Long-Term Contract
Shift 1: 8:00 AM – 5:00 PM
Shift 2: 4:00 PM – 1:00 AM
Shift 3: 12:00 AM – 9:00 AM
Job Summary
We are seeking a Site Reliability Engineer (SRE) with strong middleware expertise to design, operate, and continuously improve highly available, secure, and scalable enterprise platforms.
This role blends deep middleware operations (WebLogic, API gateways, Java platforms) with SRE principles such as automation, observability, SLIs/SLOs, error budgets, and incident reduction.
The ideal candidate will partner with application, infrastructure, security, and DevOps teams to ensure platform reliability while driving automation, standardization, and operational excellence.
Key Responsibilities
Reliability & SRE Practices
• Define, implement, and track SLIs, SLOs, and error budgets for middleware and platform services
• Reduce mean time to recovery (MTTR), improve availability, and strengthen operational resilience
• Lead incident response, root cause analysis (RCA), and post-incident reviews
• Implement proactive monitoring and alerting to reduce noise and prevent outages
Middleware Platform Engineering
• Administer and support enterprise middleware platforms including:
o Oracle WebLogic, Apache, NGINX
o API gateways (Apigee Edge / Apigee X)
o Java application servers and JVM-based services
• Perform patching, upgrades, configuration tuning, and capacity planning
• Manage certificates, keystores, truststores, and TLS configurations
• Ensure platform security, compliance, and performance standards
Observability & Monitoring
• Design and maintain end-to-end observability using tools such as:
o Dynatrace, ELK/Kibana, Splunk (or equivalent)
• Build executive and operational dashboards for real-time health visibility
• Reduce alert fatigue through smart alerting, thresholds, and suppression
• Monitor JVM metrics and behavior, thread utilization, and API performance
Automation & Infrastructure Efficiency
• Develop automation and self-healing solutions using:
o Shell scripting, Python, Ansible, Terraform, or similar tools
• Automate routine operational tasks (restarts, validations, health checks)
• Enable CI/CD-friendly middleware deployments and configuration management
• Standardize environments across development, QA, and production
Cloud, Containers & Modern Platforms
• Support middleware workloads on:
o Kubernetes / OpenShift
o Public or hybrid cloud environments (AWS, Azure, Google Cloud Platform)
• Integrate platform reliability into containerized and microservices architectures
• Collaborate with DevOps teams on deployment pipelines and release strategies
Collaboration & Leadership
• Act as a reliability advisor to application and development teams
• Partner with Unix/Linux, Database, Network, and Security teams
• Provide mentoring, documentation, and best-practice guidance
• Participate in on-call rotations and lead production support
Required Skills & Experience
Technical Skills
• 5+ years of experience in Middleware / Platform Operations / SRE
• Strong expertise in WebLogic, Java middleware, Apache/NGINX
• Hands-on experience with observability platforms (Dynatrace, ELK, Splunk)
• Solid understanding of Linux/Unix systems and networking fundamentals
• Experience with API platforms (Apigee preferred)
• Automation and scripting skills (Shell, Python, Ansible, Terraform)
• Experience with Kubernetes/OpenShift and containerized workloads
SRE & Operational Excellence
• Practical experience implementing SRE principles in production
• Strong troubleshooting skills (thread dumps, heap analysis, logs)
• Experience with incident management, RCA, and change management
• Ability to balance reliability against delivery velocity
Nice-to-Have
• Experience with cloud-native architectures and service meshes
• Knowledge of IAM and security integrations (OAuth, SAML, mTLS)
• Exposure to CI/CD tools (Jenkins, GitHub Actions, GitLab CI)