Job Details
Job Title: SRE / Java Production Support Engineer (Mainframe + Distributed Systems)
Location: Phoenix, AZ (Onsite/Hybrid)
Interview: Face-to-face
Job Description:
We are seeking an experienced Site Reliability Engineer (SRE) with strong Java and Splunk expertise to support, troubleshoot, and optimize production environments across mainframe and distributed systems. The ideal candidate will diagnose complex issues, ensure system reliability, and improve observability and performance in a high-availability environment.
Key Responsibilities:
Provide production support for mainframe (MF) and distributed applications, identifying and resolving issues in real time.
Utilize Java expertise to troubleshoot application-level defects, performance bottlenecks, and service failures.
Leverage Splunk for log analysis, monitoring, alerting, and root-cause investigation.
Apply SRE principles to enhance system reliability, automation, scalability, and incident response.
Implement monitoring, dashboards, and alerting improvements to reduce MTTR.
Collaborate with engineering, QA, and infrastructure teams to ensure seamless system operations.
Participate in the on-call rotation and support critical production releases.
Required Skills:
Strong hands-on experience in Java, production debugging, and application diagnostics.
SRE experience is critical, including incident response, automation, and reliability engineering.
Expertise in Splunk (queries, dashboards, alerts, log deep-dive analysis).
Solid understanding of distributed systems and MF (mainframe) environments.
Strong analytical and problem-solving abilities.