Site Reliability Engineer - Document Management

Engineer, Management, Java, Unix, Linux, SQL, Oracle, JavaScript, Python, Perl, J2EE, Version Control, Application, Networks
Full Time

Job Description

Our client, a top-tier US global investment bank, is looking to hire a Site Reliability Engineer (SRE) for a permanent, full-time role located in Delaware. The client is rebuilding its global Content Document Management applications using cutting-edge technologies.


Responsibilities:
  • Design, code, test and deliver software to automate manual operational work.
  • Troubleshoot priority incidents, facilitate blameless post-incident evaluations and ensure permanent closure of incidents.
  • Engage with development teams throughout the life cycle to help develop software for reliability and scale, ensuring minimal refactoring or changes.
  • Identify application patterns and analytics in support of better service level objectives.
  • Design/develop self-healing and resiliency patterns.
  • Design/develop performance tests, identify bottlenecks and opportunities for optimization and capacity demands, and present solutions for continuous improvements.
  • Implement best-in-class monitoring frameworks to accomplish end-to-end flow monitoring and noiseless alerting.
  • Develop automation for software and product upgrades, and for change and release management.
  • Influence developers/other teams globally to ensure resiliency and stability standards.
  • Effectively split time between operational work and engineering work.
  • Contribute to around-the-clock support coverage as needed.


Qualifications:
  • 5+ years of Core Java and Unix shell scripting (Java developers who are interested in SRE will be considered).
  • Scripting with at least one technology stack - designing, coding, testing, delivering software.
  • Experience in software development, infrastructure development, or development and operations.
  • Linux infrastructures, CI/CD tools (Jenkins, Jules, Maven).
  • Understanding of SQL and Oracle, MS SQL Server, or NoSQL databases such as MarkLogic, Hadoop, MongoDB etc. is desired.
  • Excellent debugging and troubleshooting skills.
  • Scripting with one or more of the following: Java, JavaScript, Python, Perl or Ruby.
  • Scrum/Kanban/Agile methodologies.
  • Basic knowledge of development in J2EE, Spring Boot, MVC, etc. is desired.
  • Have participated in Incident Management and Issue Resolution across multiple teams.
  • Have performed production releases and supported triage of issues during or after releases.
  • Working knowledge of Centralized logging (Splunk) or Log As Service.
  • Knowledge of Support models - Incident Management, Problem Management.
  • Knowledge of version control repositories such as Git.
  • Application Monitoring tools - Apica, AppDynamics or Dynatrace are desired.
  • Good to have: cloud-native technologies such as AWS, Kubernetes and Pivotal GAIA.
  • Working knowledge/understanding of infrastructure components - routers, load balancers, Cloud products, Containers, Compute, Storage and Networks.
  • Problem-solving attitude and energy.
  • Good interpersonal communication skills.
  • BS/BA degree or equivalent experience in software engineering.

Please Note:
  • This is a hybrid work environment requiring 2-3 days on-site.
  • Candidates must be able to work without visa transfer.
  • The client is open to relocation.

Dice Id : aegisoft
Position Id : 17822
Originally Posted : 2 months ago