Job Details
JPC - 3482
Level 4 (7 to 12 years of experience)
Linux / Unix Application Support, SRE Engineer, DevOps, Kubernetes, IaC, CI/CD, Python / Shell, NYC
Location: NYC (hybrid, 3 days a week onsite)
Duration: 12+ months, contract
Interview: In-person interview required; local candidates given preference.
Description:
The team supports strategic initiatives like modernization, containerization, observability, SRE, DevOps and automation.
The team also partners with central technology teams such as Infrastructure, Security, Network, Database and Procurement to design and deliver solutions. It uses a variety of tools for platform design and application troubleshooting, and provides elevated Level 4 production support to the application operations teams.
Responsibilities
The successful candidate will:
Be involved in application support, application server administration, and technical troubleshooting of infrastructure and user incidents
Incorporate Site Reliability Engineering practices into the day-to-day role by developing automated solutions to long-standing problems to ensure minimal downtime and reduce toil
Bring experience with web architecture implementation, including performance, availability, scalability, and disaster recovery planning
Bring experience with monitoring and alerting tools, including configuring application monitors with industry-standard monitoring tools and developing customized monitoring solutions
Revisit SRE metrics and confirm them against firm and department goals
Identify areas for improvement, including automation, toil reduction, resiliency and observability across the platforms, and help build up the team's knowledge and documentation
Partner with other teams such as enterprise infrastructure, networking, security, storage, database and data center to roll out application platforms successfully as per the design
Produce reusable infrastructure design patterns and periodically review and refresh them
Support vendor and vendor technology onboarding, following best practices and the security blueprint
Apply technical skills to automate daily support functions, improve system stability, support hygiene initiatives, and deliver innovation that creates efficiency and consistency
Be available for occasional weekend work and on-call duty on a rotation basis
Required Skills
Strong infrastructure knowledge in Linux / Unix, Databases, Storage and Networking technologies.
Hands-on experience with containers and container orchestration platforms (OpenShift / Kubernetes)
Experience with scripting in Python and Shell
Hands-on experience with web servers (Apache / Nginx), including application integration, configuration, and troubleshooting.
Clear understanding of load balancers, web proxies, and storage platforms such as NAS / SAN, from an implementation perspective only.
Familiarity with basic security practices to ensure secure hosting solutions, including single sign-on (SSO) and standard encryption protocols.
Prior experience managing large web-based n-tier applications in secure cloud environments
Strong knowledge of SRE principles, with a grasp of the tools and approaches used to apply them
Experience in troubleshooting application issues and managing incidents
Exposure to tools such as Prometheus, Grafana, and the OpenTelemetry framework
Excellent verbal and written communication skills.
Desired / Nice to have skills
Exposure to and experience with data pipeline technologies such as Kafka, Redis and Airflow
Exposure to Big Data platforms such as Hadoop / Cloudera and the ELK Stack
Experience with capacity planning and performance tuning exercises
Familiarity with identity management protocols such as OIDC / OAuth and SAML, and with LDAP integration
Cloud Application and infrastructure knowledge is a plus.
Experience with, or certification in, cloud / distributed computing technology is a plus.
Experience
7 to 12 years in a similar role as a hands-on application / middleware specialist.
Prior experience working in a global financial firm is helpful.