Site Reliability Engineer

Overview

On Site

Contract - W2

Contract - 22 Month(s)

Skills

Reliability Engineering

Systems Engineering

Fluency

Programming Languages

System Administration

Linux

Microsoft Windows

Software Performance Management

New Relic

Computer Networking

HTTP

TLS

Secure Shell

Dragon NaturallySpeaking

DNS

Continuous Integration and Development

GitLab

Continuous Integration

Kubernetes

Version Control

Management

Git

Cloud Computing

Hosting

Amazon Web Services

Google Cloud

Google Cloud Platform

Microsoft Azure

Web Servers

Apache HTTP Server

Node.js

Nginx

Apache Tomcat

Performance Monitoring

Clustering

Debugging

NoSQL

MySQL

Redis

System Monitoring

Instrumentation

Scripting

Java

Python

AppDynamics

Splunk

HP SiteScope

Jenkins

Akamai

Health Insurance

Insurance

Team Building

Collaboration

Wiki

Knowledge Base

Status Reports

Account Management

IT Consulting

Managed Services

Recruiting

Artificial Intelligence

Cyber Security

Enterprise Architecture

Training

FOCUS

Job Details

OP is seeking an eager Senior Site Reliability Engineer to play an integral role at a leading entertainment company, helping to elevate practices, promote and onboard new technologies, solve complex problems, and integrate next-generation digital platforms. Site Reliability Engineering (SRE) combines software and systems engineering disciplines to build and operationalize large-scale, massively distributed, fault-tolerant systems. SREs are talented engineers who improve the resiliency of production systems and reduce operational toil using a data-driven approach. The Senior SRE will help support business-critical systems for our guests and cast members within the Experiences and Products segment. This includes consulting on, architecting, developing, and operationalizing infrastructure, applications, and automation; creating telemetry for monitoring; engineering for high reliability; and reinforcing operational best practices.

Responsibilities:

Fluency in core scripting languages and advanced skills in programming languages (e.g., Python, Node, Java), with the ability to build test coverage for all software being developed.
Systems administration expertise with Linux and Windows platforms, including OS performance monitoring, setup, configuration, tuning, and troubleshooting.
Experience with a major Application Performance Monitoring (APM) tool (e.g., AppDynamics, New Relic).
Networking skills, including knowledge of core protocols (e.g., HTTP, TLS, SSH, DNS).
Continuous Integration (CI) pipeline knowledge (e.g., Jenkins, GitLab CI).
Experience with Distributed Systems and Container Platforms (e.g., Kubernetes/GKE, ECS, Fargate).
Experience with Source Control Management systems (e.g., Git).
Expertise in public and private cloud hosting services (AWS, Google Cloud, Azure).
Expert in web server technologies (e.g., Apache, Node.js, Nginx, Tomcat), including setup, configuration, performance monitoring, tuning, clustering, and debugging.
Proficient with data technologies (e.g., NoSQL, MySQL, Redis, Elastic), including being able to perform basic setup, configuration, and troubleshooting.
Able to implement existing base standards for new systems and/or applications for all of the following:
- Site/Systems monitoring and instrumentation.
- Application monitoring and instrumentation.
- System monitoring and instrumentation.
- Resilience, performance, and telemetry data.
Able to diagnose simple to complex system and process problems.
Demonstrate exceptional troubleshooting methodology, including the ability to author new methodologies and teach them to the SRE team.
Independently resolve moderately to highly complex system and application incidents.
Able to identify and propose system and application fixes for performance bottlenecks.
Able to evaluate new application requirements for capacity and run-time best practices.
Able to evaluate new systems and/or infrastructure solutions for technical feasibility against known requirements and standards.

Additional Qualifications for Reference:

Akamai Kona Site Defender.
Bot mitigation experience.
Scripting experience in Java, Python.
Observability experience.
Monitoring experience through Splunk or other tools.

Preferred Qualifications:

Akamai experience is a must.
Experience with Prometheus, AppDynamics, Splunk, SiteScope, Rundeck, or Jenkins is a plus.

Required Education:

Bachelor's or Equivalent.

Benefits:

401(k).
Dental Insurance.
Health insurance.
Vision insurance.
We are an equal opportunity employer and value diversity, equality, inclusion, and respect for people.
The salary will be determined based on several factors including, but not limited to, location, relevant education, qualifications, experience, technical skills, and business needs.

Additional Responsibilities:

Participate in OP monthly team meetings and team-building efforts.
Contribute to OP technical discussions, peer reviews, etc.
Contribute content and collaborate via the OP-Wiki/Knowledge Base.
Provide status reports to OP Account Management as requested.

About us:

OP is a technology consulting and solutions company, offering advisory and managed services, innovative platforms, and staffing solutions across a wide range of fields, including AI, cyber security, enterprise architecture, and beyond. Our most valuable asset is our people: dynamic, creative thinkers who are passionate about doing quality work. As a member of the OP team, you will have access to industry-leading consulting practices, strategies, and technologies, as well as innovative training and education. An ideal OP team member is a technology leader with a proven track record of technical excellence and a strong focus on process and methodology.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
