Job Details
Pay rate: 70 - 80 per hour
We are seeking an experienced Technical Business Analyst to manage and enhance the stability, performance, and availability of our client-facing applications. This role requires a proactive leader who can guide a dedicated support team, collaborate with engineering teams, manage incidents effectively to minimize downtime and improve the user experience, and communicate clearly with stakeholders.
Key Responsibilities:
Incident Management and Resolution:
- Oversee the triage, investigation and resolution of production issues, ensuring timely communication and status updates
- Manage incident response efforts, including documentation, root cause analysis, and post-incident reviews to identify preventive actions
- Establish clear escalation protocols and ensure adherence to service level agreements (SLAs)
- Coordinate resolution and follow-ups with dependencies outside the immediate team
- Coordinate knowledge transfer (KT) sessions between development teams and L1/L2 triage teams to establish runbooks and a knowledge base
Team Leadership and Coordination:
- Coordinate with development, QA, and infrastructure teams to ensure seamless issue resolution and knowledge sharing
- Foster a strong ownership mindset within the team, ensuring accountability for system health and stability
Monitoring and Alerting:
- Define and maintain effective monitoring solutions in partnership with development teams to proactively identify and address potential issues
- Continuously improve observability by implementing dashboards, alerts and automated health checks in partnership with development teams
Process and Documentation:
- Develop and maintain detailed runbooks, SOPs and knowledge base articles to ensure consistent response procedures
- Establish best practices for incident response, including communication templates and decision frameworks
Stakeholder Communication:
- Serve as the primary point of contact for production issues affecting client experiences
- Provide clear, concise updates to leadership, internal teams and clients during incidents and post-incident reviews.
Continuous Improvement:
- Identify patterns in recurring incidents and partner with development teams to implement permanent fixes
- Drive initiatives to enhance system reliability, scalability, and performance.
Qualifications and Skills:
- Proven experience in a production support leadership role for client-facing applications
- Strong understanding of incident management frameworks
- Proficiency in troubleshooting application, database, and infrastructure issues
- Familiarity with monitoring tools such as Dynatrace, Datadog, and Splunk
- Familiarity with incident management platforms such as ServiceNow
- Ability to prioritize tasks effectively and communicate technical concepts to non-technical stakeholders
- Excellent problem-solving skills and a calm, solution-focused approach under pressure
- Experience working with AWS
- Familiarity with CI/CD pipelines and release management processes
Preferred:
- Background in software development or scripting for automation
- Previous experience in the financial services industry
Success Metrics:
- MTTA: Mean time to acknowledge
- MTTR: Mean time to resolve
- Stakeholder satisfaction with incident communication
- Knowledge base usage rate and coverage
- Number of issues handed over to L1/L2 and EMKT teams
- Number of system-identified vs. user-reported alerts, and trends over time
- Number of enhancements and alerts requested
- Reduction in the number of user-reported incidents
- Number of incidents resolved by L1/L2 without involving the application support team
- Reduction in resolution times due to documented processes
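MTTA and MTTR are averages over incident timestamps. The sketch below shows one common way to compute them, assuming each incident record carries hypothetical `opened`, `acknowledged`, and `resolved` timestamps and that MTTR is measured from when the incident was opened (some teams measure it from acknowledgment instead). The sample data is illustrative only.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records: (opened, acknowledged, resolved) timestamps.
incidents = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 5), datetime(2024, 1, 1, 10, 0)),
    (datetime(2024, 1, 2, 14, 0), datetime(2024, 1, 2, 14, 15), datetime(2024, 1, 2, 16, 0)),
]


def mean_minutes(deltas):
    """Average a sequence of timedeltas, returned in minutes."""
    return mean(d.total_seconds() / 60 for d in deltas)


def mtta(records):
    # Mean time to acknowledge: average of (acknowledged - opened).
    return mean_minutes(ack - opened for opened, ack, _ in records)


def mttr(records):
    # Mean time to resolve (assumed here to start at "opened"): average of (resolved - opened).
    return mean_minutes(res - opened for opened, _, res in records)


print(mtta(incidents))  # 10.0
print(mttr(incidents))  # 90.0
```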