Technical Business Analyst / Incident Manager

Overview

On Site
Hybrid
BASED ON EXPERIENCE
Contract - Independent
Contract - W2

Skills

User Experience
Management
Root Cause Analysis
UPS
Team Leadership
Development Testing
Issue Resolution
Knowledge Sharing
Accountability
Dashboard
Collaboration
Partnership
Documentation
Continuous Improvement
Reliability Engineering
Scalability
Production Support
Leadership
Customer Facing
Database
Dynatrace
Splunk
Incident Management
ServiceNow
Conflict Resolution
Problem Solving
Amazon Web Services
Continuous Integration
Continuous Delivery
Release Management
Software Development
Scripting
Financial Services
MEAN Stack
Communication
Knowledge Base
Physical Layer
Data Link Layer

Job Details

Location: Newark (Hybrid)

Pay rate: 70 - 80 per hour

We are seeking an experienced Technical Business Analyst to manage and enhance the stability, performance, and availability of our client-facing applications. This role requires a proactive leader who can guide a dedicated support team, collaborate with engineering teams, and manage incidents effectively to minimize downtime, improve the user experience, and keep stakeholders informed.

Key Responsibilities:
Incident Management and Resolution:
- Oversee the triage, investigation and resolution of production issues, ensuring timely communication and status updates
- Manage incident response efforts, including documentation, root cause analysis, and post-incident reviews, to identify preventative actions
- Establish clear escalation protocols and ensure adherence to service level agreements (SLAs)
- Coordinate resolution and follow-ups with dependencies outside the immediate team
- Coordinate knowledge transfer (KT) sessions between development teams and L1/L2 triage to establish runbooks and a knowledge base
Team Leadership and Coordination:
- Coordinate with development, QA, and infrastructure teams to ensure seamless issue resolution and knowledge sharing
- Foster a strong ownership mindset within the team, ensuring accountability for system health and stability
Monitoring and Alerting:
- Define and maintain effective monitoring solutions in partnership with development teams to proactively identify and address potential issues
- Continuously improve observability by implementing dashboards, alerts and automated health checks in partnership with development teams
Process and Documentation:
- Develop and maintain detailed runbooks, SOPs and knowledge base articles to ensure consistent response procedures
- Establish best practices for incident response, including communication templates and decision frameworks
Stakeholder Communication:
- Serve as the primary point of contact for production issues affecting client experiences
- Provide clear, concise updates to leadership, internal teams and clients during incidents and post-incident reviews
Continuous Improvement:
- Identify patterns in recurring incidents and partner with development teams to implement permanent fixes
- Drive initiatives to enhance system reliability, scalability, and performance
Qualifications and Skills:
- Proven experience in a production support leadership role for client-facing applications
- Strong understanding of incident management frameworks
- Proficiency in troubleshooting application, database, and infrastructure issues
- Familiarity with monitoring tools such as Dynatrace, Datadog, and Splunk
- Familiarity with incident management platforms such as ServiceNow
- Ability to prioritize tasks effectively and communicate technical concepts to non-technical stakeholders
- Excellent problem-solving skills and a calm, solution-focused approach under pressure
- Experience working in AWS
- Familiarity with CI/CD pipelines and release management processes
Preferred:
- Background in software development or scripting for automation
- Previous experience in the financial services industry
Success Metrics:
- MTTA: Mean time to acknowledge
- MTTR: Mean time to resolve
- Stakeholder satisfaction with incident communication
- Knowledge base usage rate and coverage
- Number of issues handed over to L1/L2, EMKT teams
- Number of system-identified vs. user-reported alerts, and trends over time
- Enhancements and alerts requested
- Reduction in the number of user-reported incidents
- Number of incidents resolved by L1/L2 without the app support team
- Reduction in resolution times due to documented processes
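
As a rough illustration of how the first two metrics and the system-identified share might be computed, here is a minimal Python sketch. The record fields (opened, acknowledged, resolved, source) and the sample data are hypothetical and do not reflect any specific ServiceNow export; MTTR is assumed to be measured from the time the incident was opened.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records; field names and values are illustrative only.
incidents = [
    {"opened": "2024-01-01T10:00", "acknowledged": "2024-01-01T10:05",
     "resolved": "2024-01-01T11:00", "source": "monitoring"},
    {"opened": "2024-01-02T09:00", "acknowledged": "2024-01-02T09:15",
     "resolved": "2024-01-02T09:45", "source": "user"},
]


def minutes_between(start, end):
    """Return the elapsed minutes between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 60


def incident_metrics(records):
    """Compute MTTA, MTTR (both in minutes) and the share of system-identified incidents."""
    mtta = mean(minutes_between(r["opened"], r["acknowledged"]) for r in records)
    # Assumption: resolution time is measured from when the incident was opened.
    mttr = mean(minutes_between(r["opened"], r["resolved"]) for r in records)
    system_share = sum(r["source"] == "monitoring" for r in records) / len(records)
    return {
        "mtta_minutes": mtta,
        "mttr_minutes": mttr,
        "system_identified_share": system_share,
    }


metrics = incident_metrics(incidents)
print(metrics)
```

Tracking these values per week or per month would give the trend over time that several of the metrics above refer to.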
