Job Details
Job Title: Site Reliability Engineer
Location: Florham Park, NJ (hybrid, 3 days onsite; onsite from day 1)
Due to client requirements, we need or candidates.
The interview will be virtual on July 11 AM or 11:30 AM EST
Project Details:
This individual will work with R&D teams under Product Development for Retirement Services.
They are building software to improve DevOps, ITOps, and support processes for a platform as a service.
Strong Windows and OpenStack experience is required (7 years or more).
The candidate should balance hands-on troubleshooting with an understanding of potential problems in the OS, network, security, and database layers.
Responsibilities:
Work with R&D teams to understand the standards of Product Development and recommend changes towards increased stability of the products and applications.
Build software to improve DevOps, ITOps, and support processes that support the everything-as-code model, such as infrastructure as code and platform as a service.
Perform safe, reliable deployments of all appropriate software artifacts into various systems, from Development and Staging to Production.
Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.
Create and maintain a disaster recovery plan for the staging and production environments.
Analyze system problems including root cause determination and manage any needed recovery process to ensure a quick restoration of service without loss of data.
Maintain broad knowledge of state-of-the-art technology, equipment, and/or systems.
Understand RESTful services and use APIs to advance automation goals.
Maintain network and system security; understand security protocols and certificate management.
Experience/Skills:
Strong Windows and OpenStack experience
Ability to analyze and resolve problems in systems, networks, software, and APIs, and to understand where relevant diagnostic information can come from.
Strong experience with Splunk and Dynatrace
Understanding of source/version control such as Git or Bitbucket.
DevOps processes and tools such as Azure DevOps or Jenkins
Involvement with containerization, such as Docker or Kubernetes
CI/CD implementation expertise
Experience with IT automation in general, using tools like Ansible and coding in languages such as Python, Groovy, PowerShell, or Bash.
Windows and Linux OS knowledge preferred.
Use of monitoring and logging tools such as Splunk, Dynatrace, or similar
Advanced English proficiency
Understanding of the Microsoft suite of development tools is a plus, including Visual Studio, IIS, MS SQL Server, and .NET
Must haves:
Windows, OpenStack, knowledge of .NET applications, experience writing Ansible scripts, Python scripting, and scripting for servers