Site Reliability Engineer - Document Management

Engineer, Management, Java, Unix, Linux, SQL, Oracle, JavaScript, Python, Perl, J2EE, Version Control, Application, Networks
Full Time

Job Description

Our client, a top-tier US global investment bank, is looking to hire a Site Reliability Engineer (SRE) for a permanent, full-time role located in Delaware. The client is rebuilding its global Content Document Management applications using cutting-edge technologies.

THE DAY TO DAY RESPONSIBILITIES:

  • Design, code, test and deliver software to automate manual operational work.
  • Troubleshoot priority incidents, facilitate blameless post-incident evaluations and ensure permanent closure of incidents.
  • Engage with development teams throughout the life cycle to help develop software for reliability and scale, ensuring minimal refactoring or changes.
  • Identify application patterns and analytics in support of better service level objectives.
  • Design/develop self-healing and resiliency patterns.
  • Design/develop performance tests, identify bottlenecks, optimization opportunities and capacity demands, and present solutions for continuous improvement.
  • Implement best-in-class monitoring frameworks to accomplish end-to-end flow monitoring and noiseless alerting.
  • Develop automated software and product upgrades, change and release management solutions.
  • Influence developers/other teams globally to ensure resiliency and stability standards.
  • Effectively split time between operational work and engineering work.
  • Contribute to around-the-clock support coverage as needed.


THE SKILLS YOU NEED TO GET THE JOB:

  • 5+ years of Core Java and Unix Shell scripting (Java Developers who are interested in SRE will be considered).
  • Scripting with at least one technology stack - designing, coding, testing, delivering software.
  • Experience in software development, infrastructure development, or development and operations.
  • Linux infrastructures, CI/CD tools (Jenkins, Jules, Maven).
  • Understanding of SQL and Oracle, MS SQL Server, or NoSQL databases such as MarkLogic, Hadoop, MongoDB etc. is desired.
  • Excellent debugging and troubleshooting skills.
  • Scripting with one or more of the following: Java, JavaScript, Python, Perl or Ruby.
  • Scrum/Kanban/Agile methodologies.
  • Basic knowledge of development in J2EE, Spring Boot, MVC etc. is desired.
  • Have participated in Incident Management and Issue Resolution across multiple teams.
  • Have performed production releases and supported triage of issues during or after releases.
  • Working knowledge of centralized logging (Splunk) or Logging as a Service.
  • Knowledge of Support models - Incident Management, Problem Management.
  • Knowledge of version control repositories such as Git.
  • Application Monitoring tools - Apica, AppDynamics or Dynatrace are desired.
  • Good to have - Cloud Native technologies such as AWS, Kubernetes and Pivotal GAIA.
  • Working knowledge/understanding of infrastructure components - routers, load balancers, Cloud products, Containers, Compute, Storage and Networks.
  • Problem-solving attitude and energy.
  • Good interpersonal communication skills.
  • BS/BA degree or equivalent experience in software engineering.


Please Note:
  • This is a hybrid work environment requiring 2-3 days on-site.
  • Candidates must be able to work without visa transfer.
  • The client is open to relocation.


Dice Id : aegisoft
Position Id : 17822
Originally Posted : 2 months ago