Job Title: SRE Engineer
Location: Alpharetta, GA (Local)
Contract
Interview type: L1 - Video and L2 - Face to Face
Position Overview:
The Wealth Management Production Management Site Reliability Engineer position is a highly visible, critical role. The engineer will join a team of technical SMEs managing the stability and optimization of the Wealth Management systems. Scope includes, but is not limited to, day-to-day support of the organization's technology-related outages and collaboration on technology projects focused on stability, optimization, business impact analysis, and associated risk-related methodologies. This role will be responsible for the overall stability of the Wealth Management Investment Management application platforms, participation in key optimization initiatives, and collaboration with multiple technical teams. Additionally, the engineer will partner with WM business units and various levels of management and staff to collect and analyze information and make recommendations on optimizing the platform. This position will mainly perform a DevOps/SRE role in Java, Unix, and SQL technologies.
Responsibilities include:
- Incident Management: create and manage the necessary processes for handling incidents
- Partner with Ops Control to ensure IT and/or End User communications are handled appropriately
- Engage with the development team throughout the life cycle to ensure applications are built for reliability
- Develop software to automate manual operational work
- Run, maintain and improve the service against established Service Level Objectives by applying software engineering principles
- Take responsibility for the availability, performance, change (CP) management, monitoring, and capacity management of the services supported
- Troubleshoot priority incidents, conduct blameless post-mortems and ensure permanent closure of the incidents
- Analyze patterns of production incidents, develop permanent remediation plans, and apply software engineering to implement automation that prevents future incidents
- Manage process-related functions around large-scale events such as disaster recovery. Communicate closely with impacted groups to ensure all events are properly managed.
Primary Skills / Must have
- Bachelor's/Master's Degree in Computer Science, Information Systems or related field
- Proven track record supporting large-scale, multi-tiered, cloud-based applications
- Hands-on experience with Java, Angular, Spring, DB2, and Unix scripting, plus experience with scheduler tools such as TWS and AutoSys
- Experience working in an Agile Development environment
- Proven ability to understand and troubleshoot complex problems under pressure
- Excellent written and oral communication skills, along with strong listening, influencing, and negotiation skills
- Experience with performance troubleshooting and remediation
- Experience with observability tools such as Splunk, Kibana, Grafana, and Prometheus
Secondary Skills / Desired skills
- Knowledge of the Azure platform is a plus
- Working knowledge of Python