Role: SRE - Application Support Engineer
Work Location & Reporting Address: Alpharetta, GA (Onsite)
Looking for W2
Job Details:
Position Overview:
The Wealth Management Production Management Site Reliability Engineer is a highly visible, critical role on a team of technical SMEs who manage the stability and optimization of the Wealth Management systems.
Scope includes, but is not limited to, day-to-day support of the organization's technology-related outages and collaboration on technology projects focused on stability, optimization, business impact analysis, and associated risk-related methodologies.
This role will be responsible for the overall stability of the Wealth Management Investment Management application platforms, participation in key optimization initiatives, and collaboration with multiple technical teams within the organization.
Additionally, the role partners with WM business units and various levels of management and staff to collect and analyze information and to make recommendations on optimizing the platform.
This position will mainly perform a DevOps/SRE role across Java, Unix, and SQL technologies.
Responsibilities include:
- Incident Management: create and manage the necessary processes involving incidents
- Partner with Ops Control to ensure IT and/or End User communications are handled appropriately
- Engage with the development team throughout the life cycle to help build applications for reliability
- Develop software to automate manual operational work
- Run, maintain and improve the service against established Service Level Objectives by applying software engineering principles
- Take responsibility for the availability, performance, change (CP) management, monitoring, and capacity management of the services supported
- Troubleshoot priority incidents, conduct blameless post-mortems and ensure permanent closure of the incidents
- Analyze patterns of production incidents, develop permanent remediation plans, and implement automation to prevent future incidents from occurring through software engineering
- Manage process-related functions around large-scale events such as disaster recovery. Communicate closely with impacted groups to ensure all events are properly managed.
Primary Skills / Must have:
- Site Reliability Engineer (SRE) experience; the role is 80% support [React/Protect] and 10% DevOps [Enable].
- Proven track record supporting large scale multi-tiered cloud-based applications.
- Analyze ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns
- Hands-on experience with Java, Angular, Spring, DB2, and Unix scripting, plus experience with scheduler tools such as TWS and AutoSys
- L2-L3 production support experience with strong debugging and problem-solving skills
- Experience working in an Agile Development environment
- Proven ability to understand and troubleshoot complex problems under pressure
- Excellent communication skills (both written and oral), listening skills, influencing and negotiation skills
- Experience with performance troubleshooting and remediation
- Experience with observability tools such as Splunk, Kibana, Grafana, Prometheus
- Support the application CI/CD pipeline for promoting software into higher environments through validation and operational gating, and lead in DevOps automation and best practices.
Secondary Skills / Desired skills:
- Strong expertise in Linux and shell scripting; must be very comfortable working in Linux
- Grafana/Kibana dashboarding experience
- Good problem-solving skills
- Good communicator
- Good understanding of the brokerage business
- Job scheduling experience (Control-M, CBSS, cron)
- Bachelor's/Master's Degree in Computer Science, Information Systems or related field
Best regards,
Julia T
Phone:
Email: