Overview
On Site
Contract - W2
Contract - 31 day(s)
Skills
DevOps
Terraform
Ansible
Jenkins
Docker
Kubernetes
Automation
Infrastructure as Code (IaC)
Architecture
Ruby
Cloud Computing
Quality Management
Metrics
Python (Programming Language)
Communication Skills
Microsoft Azure
Stakeholder Management
Team Working
Amazon Web Services
Analytical Thinking
Computer Programming
Java (Programming Language)
Software Engineering
Testing Skills
Problem Solving
Reliability
Continuous Integration
Administration of Computer Systems
Containerisation
Networking Skills
Perseverance
Self Motivation
Backlogs
C++ (Programming Language)
Incident Response
Instrumentation
Reliability Engineering
Software Requirements Analysis
Spinnaker
Job Details
Job Role: SRE Engineer / DevOps SRE Engineer
Location: Dallas, Texas (Hybrid)
Duration: FTE
Exp: 10+ Years
Job Description
Job Summary
The Site Reliability Engineer (SRE) role bridges software engineering and systems administration. Beyond ensuring the reliability and performance of platforms, the role also focuses on working with Development and Architecture teams to address:
- quality (gates and measurement criteria)
- foundational architecture and stack components
- metrics, trackers, and baselines
- automated operations
- Automation - Automate tasks (scripts, triggers, and workflow automations) for deployment, monitoring, and incident response to improve efficiency and reduce manual effort.
- Monitoring and Observability - Design instrumentation and identify KPIs/metrics and events/alerting to track system health and detect potential issues proactively.
- Incident Response - Respond to and resolve incidents that have exceeded L1/L2 thresholds. Work with L3 teams to ensure minimal downtime and a quick return to normal operations, and identify and follow up on problem backlogs and shift-left initiatives.
- Infrastructure as Code (IaC) - Use tools like Terraform or Ansible to manage infrastructure as code, enabling repeatable and scalable deployments.
- Collaboration - Work closely with architecture, development, QA and Testing, and Operations teams to understand system requirements and contribute to the overall resilience of the software/platform.
- Problem-Solving - Strong analytical and problem-solving skills to diagnose and resolve complex issues.
- Communication - Communicate effectively with both technical and non-technical stakeholders, translating technical details into actionable insights.
- Soft Skills - Ability to work in a team, manage time effectively, and be proactive in identifying and addressing potential problems.
- Programming - Experience with languages such as Python, Java, C/C++, or Ruby is beneficial, along with IaC tooling (Ansible, Terraform, and cloud-native tools).
- Cloud Platforms - Knowledge of cloud platforms like AWS, Azure, or Google Cloud Platform is highly valued.
- Containerization - Familiarity with container technologies like Docker and Kubernetes is essential.
- Networking and System Administration - Strong understanding of networking and system administration principles is crucial.
- CI/CD - Experience with CI/CD tools like Jenkins, Harness, or Spinnaker is valuable.