Site Reliability Engineer - Raleigh, NC - C2H

Overview

On Site
$65
Accepts corp to corp applications
Contract - Independent
Contract - W2
Contract - 6 Month(s)

Skills

Python
Bash
Kubernetes
IaC
IRM

Job Details

Face-to-face interview mandatory

Site Reliability Engineer

  • Ensuring System Reliability and Availability: Design and maintain fault-tolerant architectures by leveraging redundancy, load balancing, and automated failover mechanisms. These strategies help minimize downtime and provide seamless service availability, even during unexpected failures.
  • Incident Management and Response: Implement automated alerting and response systems that detect, analyze, and mitigate failures in real time. By reducing mean time to recovery (MTTR), SREs help minimize service disruptions and ensure smooth user experiences.
  • Observability and Performance Monitoring: Deploy real-time monitoring tools to track logs, metrics, and system traces. By proactively identifying performance bottlenecks, SREs can resolve issues before they escalate and impact end users.
  • Capacity Planning and Scalability: Analyze traffic patterns and infrastructure load to predict demand fluctuations. By optimizing resource allocation and implementing scalable solutions, SREs prevent system overloads and ensure high performance during peak traffic.
  • Blameless Postmortems: Conduct incident retrospectives to identify failure patterns and implement long-term improvements. By fostering a culture of learning rather than assigning blame, SREs improve system resilience and prevent recurring issues.
  • Developing and Maintaining Internal Tooling: Build and refine custom automation tools to enhance developer productivity, streamline deployments, and improve system health. These tools help teams reduce manual workload and improve overall functionality.
  • Security and Compliance Management: Collaborate with security teams to enforce best practices, conduct vulnerability assessments, and meet compliance standards. By integrating security into reliability efforts, SREs ensure infrastructure remains resilient to potential threats and satisfies regulatory requirements.

Key Competencies:

  • Scripting and Programming Languages: Proficiency in Python, Bash, or Go enables automation of repetitive tasks and infrastructure management.
  • Expertise in Kubernetes: Knowledge of Kubernetes and container orchestration is crucial for managing scalable deployments.
  • Understanding of CI/CD: Continuous integration and deployment practices streamline software updates and improve system reliability.
  • Incident Response Management: Familiarity with incident response tools and processes ensures quick recovery from system failures.
  • Infrastructure as Code (IaC): Mastery of Terraform, Ansible, or similar tools helps automate and standardize infrastructure deployment.

 

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.