Site Reliability Engineer

Overview

On Site

Full Time

Skills

Finance

Internet

SAP BASIS

Linux

Cloud Computing

Application Support

Web Portals

Web Services

Bridging

Turnover

Requirements Elicitation

Estimating

Design Documentation

Change Management

FTS

Open Systems

Root Cause Analysis

Collaboration

Risk Management

Auditing

Management

IT Operations

Database

IT Risk

Disaster Recovery

Recovery

Job Details

Job Title: Site Reliability Engineer
Assignment Type: ~12-month contract to hire

Location: Berkeley Heights, NJ

Employment Type: W2 only, no C2C

Summary of Position:

The SRE candidate will assume a key role in supporting the day-to-day operations of the Digital group within the Card Services organization of Client. The candidate is responsible for providing on-call service continuity and escalated support for Web Portal, Web Services, Micro Services and other assigned production applications.
The candidate will work closely with development and the technology groups to monitor and triage problems identified by different departments of the organization. This is a high-performance culture, and the candidate must demonstrate the ability to work efficiently and quickly on financial Web Portals. You will be measured on your ability to meet individual and departmental objectives and on the reduction of service-affecting issues.

Responsibilities:
Provide 24x7 support of production Internet applications on a rotating basis.
Hands-on understanding of Linux systems
Good understanding of cloud concepts
Serve as the point of escalation for application support, diagnosing and resolving complex customer issues in accessing the Portal and Web Services environments
Drive Open Systems severity crisis technical bridges and/or management bridges as required, leveraging experience and organizational knowledge to reduce MTTR
Review turnover paperwork to ensure that it is complete prior to production installs
Participate in the requirements gathering process, representing the production environments, to ensure that all operational aspects are identified and documented. Provide all tasks and detailed estimates to project managers. Review and approve design documentation to ensure understanding of the business logic changes and the technical solution being implemented
Work with Change Management/Release Managers to review proposed change events for production
Work with FTS and Open Systems development to perform project code installations with assistance from the development and business groups. Validate successful implementations or fallbacks.
Document install defects and assign severity to the problems that occurred. After a fallback, perform a post-mortem root cause analysis (RCA)
Direct incident recovery and direct cross-functional teams to collaborate on identified issues
Identify and implement improvements to incident recovery, incident engagement, and incident communications
Perform trending and analysis of problems; anticipate problems and develop risk mitigation plans
Participate in internal and external audits, as required by management
Ensure monitoring alerts and system events are assessed, prioritized, and worked aggressively
Escalate issues to the technology and operations groups and/or vendor(s) where appropriate
Ensure database/application controls and procedures remain compliant with Corporate IT risk
Support Disaster Recovery tests and live recovery for all production environments
Work with Card Services architects to validate and design enterprise solutions and application monitoring tools

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
