Area(s) of responsibility
We are seeking a highly experienced and technically proficient Senior Production Support Engineer to join our dynamic technology team in the USA. The ideal candidate will have 7-10 years of hands-on experience in supporting high-volume, critical applications, with deep expertise across our core technology stack and a solid understanding of the financial services domain.
Key Responsibilities
Incident Management & Resolution: Act as the primary point of contact for high-priority production incidents. Drive timely resolution, perform root cause analysis (RCA), and implement preventive measures to minimize future occurrences.
Application Monitoring & Health: Proactively monitor the health, performance, and capacity of production applications using monitoring tools such as Splunk and New Relic. Develop and maintain dashboards, alerts, and runbooks.
Change Management: Evaluate, approve, and oversee production changes, adhering strictly to Change Management protocols to ensure stability and minimize risk. Participate in release and deployment activities.
Performance Optimization: Identify performance bottlenecks in application code and infrastructure (Java, Database, Cache) and collaborate with development teams to implement fixes and efficiency improvements.
System Maintenance: Perform regular system maintenance, health checks, and capacity planning for application infrastructure running on AWS and Pivotal Cloud Foundry (PCF).
Documentation & Knowledge Sharing: Create and maintain comprehensive support documentation, knowledge base articles, and troubleshooting guides.
On-Call Support: Participate in an on-call rotation to provide 24/7 support for critical production systems.
Required Technical Skills & Experience (7-10 Years)
Core Programming: Java (Deep proficiency): 7+ Years
Frameworks: Spring Boot (Microservices architecture): Extensive
Frontend: React (Understanding of application flow): Proficient
Cloud/PaaS: AWS, Pivotal Cloud Foundry (PCF): Strong
Database: MySQL (Querying, optimization, troubleshooting): Strong
Caching/Messaging: Redis, Cache Management principles: Expert
Monitoring/Logging: Splunk, New Relic (Developing queries, dashboards, alerts): Expert
Process: Incident Management, Change Management: Strong (ITIL framework knowledge is a plus)