Senior Site Reliability Engineer || Full-time || Carrollton, TX

Carrollton, TX, US • Posted 18 hours ago • Updated 18 hours ago

Full Time

On-site

Depends on Experience

Job Details

Skills

API
Amazon SQS
Data Link Layer
Bash
DevOps
Kubernetes
Docker
Documentation
Cloud Computing
PostgreSQL
Problem Management
Remote Desktop Services
Systems Engineering
Terraform
TypeScript
Root Cause Analysis
New Relic
Grafana
Node.js
Operational Excellence
Incident Management
Amazon Web Services
Analytical Skill
Conflict Resolution
Database
General Skills
Scripting Language
Product Engineering
Virtual Private Cloud
Scripting
NoSQL

Job Description

Senior Site Reliability Engineer

Must Have Technical/Functional Skills

- 5-7 years of professional experience in a Site Reliability, DevOps, or Systems Engineering role.

- 3-5 years of hands-on experience managing production workloads in a cloud environment, preferably AWS.

- Proven experience acting in an L2/L3 support capacity, with strong diagnostic and troubleshooting skills.

Technical Skills (Must Have):

- Cloud Expertise: Deep understanding of and hands-on experience with the AWS ecosystem (EC2, S3, RDS, Lambda, VPC, IAM, CloudWatch).

- Infrastructure as Code (IaC): Strong proficiency with tools like AWS CDK (preferred), Terraform, or CloudFormation.

- Scripting & Automation: Proficiency in at least one scripting language such as Python, Bash, or Node.js for automation and tooling.

- Monitoring & Observability: Hands-on experience with modern monitoring, logging, and tracing tools (e.g., New Relic (preferred), Datadog, Prometheus, Grafana, ELK Stack).

- Containerization: Experience with Docker and container orchestration systems (e.g., Kubernetes, ECS).

General Skills:

- Excellent analytical, troubleshooting, and complex problem-solving skills with a methodical approach.

- A calm and focused demeanor during high-pressure incidents.

- Strong verbal and written communication skills, with the ability to explain complex technical concepts to diverse audiences.

- Highly attentive to detail, organized, and capable of prioritizing effectively in a dynamic environment.

- A collaborative mindset and the ability to work effectively both independently and as part of a team.

Preferred Skills & Qualifications:

- Domain knowledge in FinTech or the mortgage industry.

- Experience with the AWS Serverless stack (Lambda, API Gateway, SQS, SNS, DynamoDB).

- Familiarity with application development environments (e.g., Node.js, TypeScript, Python) to facilitate effective troubleshooting and collaboration with development teams.

- Experience with relational databases (PostgreSQL) and NoSQL databases.

- Experience working within an Agile/Scrum development process using Jira.

Roles & Responsibilities

Incident Response & L2/L3 Support: Serve as a primary escalation point for complex production incidents. Lead troubleshooting efforts, perform deep-dive root cause analysis (RCA), and work with Product Engineering teams to implement permanent solutions to prevent recurrence.

Monitoring & Observability: Develop and manage comprehensive monitoring and alerting solutions using tools like Datadog, CloudWatch, or similar. Define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to ensure system health.

Collaboration & Architectural Input: Partner closely with backend development teams, conducting Production Readiness Reviews and influencing the design of new services to ensure they meet SLOs and are built for reliability, observability, and operational excellence from the start. Advocate for SRE best practices across the engineering organization.

Problem Management: Analyze incident trends and system metrics to identify underlying problems. Develop and execute long-term solutions, including automation that eliminates operational toil, software enhancements, and architectural improvements.

Runbook & Documentation: Create and maintain clear, concise documentation and runbooks to enable faster incident resolution and share operational knowledge across teams.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

Dice Id: 91137892
Position Id: 8919758