Site Reliability Engineer

Overview

Hybrid
USD 80.00 - 87.83 per hour
Full Time

Skills

Durable Skills
Software Engineering
SIMD
Git
Continuous Integration
Continuous Delivery
Environment Management
Data Quality
Service Level
Root Cause Analysis
Continuous Improvement
Optimization
High Availability
Capacity Management
Documentation
Incident Management
IaaS
Management
Cloud Computing
Scalability
Testing
Terraform
Snowflake
Amazon Web Services
Software Development
Reliability Engineering
SLA
TypeScript
Object-Oriented Programming
Python
Continuous Integration and Development
Version Control
Communication
Articulate
Effective Communication
Collaboration
Clarity
Regulatory Compliance
Accountability
Problem Solving
Conflict Resolution
Critical Thinking
Cost Reduction

Job Details

Please note a few important items that are non-negotiable:
  • This role requires 2 days/week on-site in Orlando, FL and 3 days/week remote
  • This role is NOT open for C2C or referral fees

Essential Skills (REQUIRED):
  • Software Engineering: Experience with object-oriented development of scalable applications in Python or TypeScript. Proficiency in relevant libraries such as Boto3, Typed Arrays, SIMD, etc.
    • Version control such as Git
    • CI/CD experience
  • Site Reliability Engineering: Strong understanding of Site Reliability Engineering (SRE) practices and the ability to articulate their importance and implementation
    • Application monitoring and observability experience
  • AWS Experience: serverless, plus infrastructure as code (IaC) with tools such as CloudFormation, CDK, or Terraform

Description
TEKsystems is seeking a Site Reliability Engineer (SRE) with 5+ years of experience in object-oriented software development (Python or TypeScript), 2+ years of experience in Site Reliability Engineering, and 2+ years of experience in AWS Serverless engineering. The engineer will spend an estimated 60% of their time on software development and application monitoring, with the remaining 40% on AWS environment management and automation. This role gives a Site Reliability Engineer the opportunity to shape the future state of Disney's internal Data platform team by enhancing data quality, optimizing infrastructure, and improving service reliability, and to support a decision maker immediately.
Responsibilities:
Service Reliability: Establish key service-level indicators (SLIs) and continuously monitor them to ensure system reliability, availability, and performance. Proactively develop alerts and automated responses to prevent service degradation or outages.
Service Monitoring & Observability: Oversee observability and monitoring across the platform (AWS, serverless, containers, Snowflake, etc.), ensuring actionable insights are available for operational teams.
Root Cause Analysis & Incident Response: Conduct deep analysis of system issues, identify root causes, and define actionable strategies for remediation. Lead post-mortem analysis and continuous improvement efforts.
Infrastructure & System Optimization: Focus on the high availability, scalability, and performance of services in production environments, ensuring they meet business and customer needs.
Capacity Planning & Right-Sizing: Lead efforts to ensure that services are properly scaled for current and future workloads. Engage in capacity planning to optimize resource utilization.
Documentation & Runbooks: Maintain detailed documentation and create robust runbooks for incident management and troubleshooting, ensuring smooth responses to service disruptions.
Cloud Infrastructure & IaC: Utilize tools such as Terraform and AWS CDK to manage and automate infrastructure as code (IaC) in a cloud-native environment.
Collaboration with Development Teams: Work closely with developers to ensure that applications are designed for service reliability, scalability, and maintainability.
GitOps-Driven Environment: Drive infrastructure changes and service deployment using GitOps practices to ensure consistency and traceability in deployments.
Code Quality: Write clean, performant, and well-documented application code with a focus on reliability and service availability.
Automation & Tooling: Build and maintain automated deployment pipelines and tooling for monitoring and platform testing.
Skills
AWS, software development, site reliability engineering, SRE, SLA, SLO, SLI, serverless, Lambda, Python, TypeScript, object-oriented programming, Python programming, CI/CD, library components, version control, GitOps, application monitoring, observability, infrastructure as code, Datadog, Terraform, secrets vault, Snowflake
Additional Skills & Qualifications
Non-Technical Skills for SREs at Disney (absolutely must-have):
Effective Communication:
  • Excellent communication skills to articulate ideas clearly and persuasively to diverse audiences.
  • Ability to convey technical concepts to non-technical stakeholders effectively.
Collaborative Team Player:
  • Highly effective communication skills, both verbal and written, to facilitate collaboration and ensure clarity in team interactions.
  • Ability to work seamlessly with cross-functional teams, fostering a cooperative and inclusive work environment.
Growth Mindset:
  • Demonstrates a passion for continuous learning and self-improvement.
  • Embraces challenges and views failures as opportunities for growth and development.
Enterprise Process Understanding:
  • Deep understanding of enterprise processes, compliance requirements, and best practices.
  • Ability to discern the best compromise between timeline, value, and practicality, ensuring optimal project outcomes.
Ownership Mindset:
  • Takes full ownership of projects and tasks, ensuring accountability and driving them to successful completion.
  • Proactively identifies and addresses issues, demonstrating a strong sense of responsibility.
Critical Thinking and Problem Solving:
  • Strong critical thinking skills to analyze complex problems and develop innovative solutions.
  • Ability to approach challenges methodically and make data-driven decisions.
Value Demonstration:
  • Ability to demonstrate effectiveness, efficiencies, and cost savings based on the work being done.
  • Skilled in promoting and communicating the value of work to stakeholders, helping to tell the story of that value.
Experience Level
Expert Level
Pay and Benefits
The pay range for this position is $80.00 - $87.83/hr.
Eligibility requirements apply to some benefits and may depend on your job
classification and length of employment. Benefits are subject to change and may be
subject to specific elections, plan, or program terms. If eligible, the benefits
available for this temporary role may include the following:
Medical, dental & vision
Critical Illness, Accident, and Hospital
401(k) Retirement Plan - Pre-tax and Roth post-tax contributions available
Life Insurance (Voluntary Life & AD&D for the employee and dependents)
Short and long-term disability
Health Spending Account (HSA)
Transportation benefits
Employee Assistance Program
Time Off/Leave (PTO, Vacation or Sick Leave)
Workplace Type
This is a hybrid position in Orlando, FL.
Application Deadline
This position is anticipated to close on May 16, 2025.

About TEKsystems and TEKsystems Global Services

We're a leading provider of business and technology services. We accelerate business transformation for our customers. Our expertise in strategy, design, execution and operations unlocks business value through a range of solutions. We're a team of 80,000 strong, working with over 6,000 customers, including 80% of the Fortune 500 across North America, Europe and Asia, who partner with us for our scale, full-stack capabilities and speed. We're strategic thinkers, hands-on collaborators, helping customers capitalize on change and master the momentum of technology. We're building tomorrow by delivering business outcomes and making positive impacts in our global communities. TEKsystems and TEKsystems Global Services are Allegis Group companies. Learn more at TEKsystems.com.

The company is an equal opportunity employer and will consider all applications without regard to race, sex, age, color, religion, national origin, veteran status, disability, sexual orientation, gender identity, genetic information or any characteristic protected by law.
