Systems Reliability Engineer

Overview

On Site
Depends on Experience
Contract - W2
Contract - 18 Month(s)
No Travel Required

Skills

Amazon Web Services
Apache Mesos
Apache HTTP Server
Cloud Computing
Google Cloud Platform
Dash Python
Docker
Apache Tomcat
Collaboration
Decision-making
Capacity Management
Continuous Integration and Development
Continuous Integration
Linux Administration
IT Management
Load Balancing
Microsoft Windows Server
Kubernetes
Jenkins
Agile
Python
Performance Tuning
Reliability Engineering
Unix
IT Operations
F5
Microsoft Azure
Microsoft IIS

Job Details

Senior System Engineer (no C2C)

Additional Information:

  • Must have: cloud experience with AWS or Azure.
  • Coming in to support internal products.
  • Must have the ability to collaborate in a team environment and articulate the work and progress being done.
  • Tech stack: AWS, Google Cloud, and Azure; AWS is preferred, as it carries the majority of the workload
  • Tools: Terraform, Chef
  • Operating systems: Unix/Linux for most products; Windows is used as well
  • Languages: Python and Dash are most commonly used at Disney
  • Must have cloud aptitude and know cloud well
  • Location of position: Seattle (4th and Madison), Glendale, or Orlando
  • Must have enterprise-level experience
  • On-call rotation: one week at a time, every 5 weeks (roughly once per month).

The Senior Systems Reliability Engineer is responsible for developing and implementing solutions using best practices in the maintenance and administration of enterprise systems, including software, platform, and infrastructure. Teams operate in a fast-paced, dynamic environment supporting a variety of complex systems and applications for the Walt Disney Company and affiliated businesses.

  • Code and deploy systems, new technologies, and best practices in the cloud using self-healing, infrastructure-as-code, security, and automation patterns
  • Develop useful telemetry, alerts, and responses to identify and address reliability risks
  • Participate in on-call rotation with other engineering teams
  • Identify, experiment with, and evangelize new technologies, ideas, and best practices across the broader engineering community
  • Collaborate and provide technical leadership within and across teams

What's your story?

  • Proficient, collaborative, & experienced in building reliable, scalable, micro-service-oriented systems
  • Passionate and curious about ways to leverage technology while continually learning
  • Ability to identify root-cause sources of instability in a high-traffic, large-scale distributed system
  • Configuration management and orchestration (e.g. Chef, Terraform, CloudFormation)
  • One or more languages in your skillset (e.g. Go, Python, Java, Ruby)

Basic Qualifications

  • Containerization (e.g. Docker, Kubernetes, Mesos, Elastic Container Service)
  • Skilled in Cloud/PaaS environments (e.g. AWS, Google Cloud Compute)
  • Thorough knowledge of continuous integration tools (e.g. Jenkins)
  • UNIX/Linux administration, troubleshooting, performance tuning, & security
  • 5 years of experience in technical operations or systems reliability engineering
  • 3+ years operating complex, large-scale enterprise guest-facing applications or websites
  • Experienced with Distributed Data Platforms
  • Experience with AWS, Google or similar cloud computing environments.
  • Experience working in an Agile development environment

Preferred Qualifications

  • Equivalent experience in technical operations or software engineering
  • Bachelor's degree in computer science or related field preferred
  • Prefer 5+ years operating complex, large-scale Enterprise guest-facing Applications or web sites
  • Experience working in a high-capacity, highly scalable, mission-critical web serving environment
  • Excellent judgment, problem resolution, team building, negotiation, and decision-making skills as well as the ability to work under continual deadline pressure
  • Experience with F5 load balancing helpful
  • UNIX/Linux and some Windows Server experience, including expertise in system installation, configuration, administration, troubleshooting, performance tuning, preventative maintenance, capacity planning, monitoring, and security procedures
  • Web (IIS, Apache) and Java application (Tomcat, JBoss, etc.) server expertise, including installation, administration, configuration, troubleshooting, performance tuning, preventative maintenance, capacity planning, monitoring, and security procedures

Job duties will include spending 50% of the time performing the following functions:

  • Code and deploy systems, new technologies, and best practices in the cloud using self-healing, infrastructure-as-code, security, and automation patterns
  • Develop useful telemetry, alerts, and responses to identify and address reliability risks

Required Education:

  • Bachelor's degree or equivalent work experience