Site Reliability Engineer

  • Florham Park, NJ
  • Posted 8 hours ago | Updated 7 hours ago

Overview

On Site
$50 - $60
Contract - W2
Contract - 24 Month(s)
Able to Provide Sponsorship

Skills

.NET
Ansible
Apache Velocity
Art
Bash
Bitbucket
Continuous Delivery
Database
DevOps
Disaster Recovery
Docker
Dynatrace
English
Git
Groovy
Linux
Microsoft Azure
Microsoft IIS
Microsoft Office
Microsoft SQL Server
Microsoft Visual Studio
Microsoft Windows
Network
Programming Languages
ROOT
RESTful
Recovery
Management
Network Security
Product Development
OpenStack
Python
Version Control
Research and Development
Writing
SAFE
Scripting
Servers
System Security
Splunk
Windows PowerShell

Job Details

Job Title: Site Reliability Engineer

Location: Florham Park, NJ (hybrid, 3 days onsite; onsite from day 1)

Due to client requirements, we need or candidates.

The interview will be virtual on July 11 AM or 11:30 AM EST

Project Details:

This individual will work with R&D teams under Product Development for Retirement Services.

They are building software to improve DevOps, ITOps, and support processes for a platform as a service.

Strong Windows and OpenStack experience is required (7 or more years).

The candidate should balance hands-on troubleshooting with an understanding of potential problems across the OS, network, security, and database layers.

Responsibilities:

Work with R&D teams to understand the standards of Product Development and recommend changes that increase the stability of the products and applications.

Build software to improve DevOps, ITOps, and support processes that follow the everything-as-code model, such as infrastructure as code and platform as a service.

Perform safe, reliable deployments of all appropriate software artifacts across environments, from Development and Staging to Production.

Maintain services once they are live by measuring and monitoring availability, latency and overall system health.

Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.

Create and maintain a disaster recovery plan for the staging and production environments.

Analyze system problems including root cause determination and manage any needed recovery process to ensure a quick restoration of service without loss of data.

Maintain a broad knowledge of state-of-the-art technology, equipment, and systems.

Understand RESTful services and use their APIs to advance automation goals.

Maintain network and system security, including an understanding of security protocols and certificate management.

Experience/Skills:

Strong Windows and OpenStack experience

Ability to analyze and resolve problems in systems, networks, software, and APIs, and to know where the relevant diagnostic information can be found.

Strong experience with Splunk and Dynatrace

Understanding of source/version control such as Git or Bitbucket.

DevOps processes and tools such as Azure DevOps or Jenkins

Involvement with containerization, such as Docker or Kubernetes

CI/CD implementation expertise

General experience with IT automation, using tools like Ansible and scripting languages such as Python, Groovy, PowerShell, or Bash.

Windows and Linux OS knowledge preferred.

Use of monitoring and logging tools such as Splunk, Dynatrace or similar

Advanced English proficiency

Understanding of the Microsoft suite of development tools is a plus, including Visual Studio, IIS, MS SQL Server, and .NET.

Must haves:

Windows, OpenStack, knowledge of .NET applications, experience writing scripts with Ansible, Python scripting, and scripting for servers.
