Site Reliability Engineer

Plano, TX, US • Posted 4 hours ago • Updated 4 hours ago
Contract W2
6 Months
On-site
$60 - $66.17/hr

Job Details

Skills

  • Java
  • JavaScript
  • Cloud-based Microservices
  • Spring Boot
  • AWS
  • RESTful APIs
  • site reliability engineering (SRE)
  • incident response
  • root cause analysis
  • integration

Summary

Production Engineer

6 months, contract-to-hire (C-H)

Plano, TX
Onsite

Responsibilities:

    • Apply 3-4 years of experience in production engineering and site reliability engineering (SRE) to design, implement, and maintain highly available, scalable, and resilient systems.
    • Own end-to-end operational responsibilities, including monitoring, incident response, root cause analysis, capacity planning, and automation, to ensure optimal system performance and reliability in production environments.
    • Collaborate cross-functionally with development, QA, and infrastructure teams to streamline CI/CD pipelines, automate deployments, and enforce best practices for security, compliance, and disaster recovery.
    • Use a broad set of tools and technologies to proactively detect, troubleshoot, and resolve production issues, minimizing downtime and meeting service-level objectives (SLOs) and service-level agreements (SLAs).

Requirements:
Java, JavaScript, Cloud-based Microservices, Spring Boot, AWS

    • Build, deploy, and maintain cloud-native microservices using Java, Spring Boot, and JavaScript frameworks, ensuring high availability and scalability.
    • Design and implement RESTful APIs and event-driven architectures using AWS services such as Lambda, ECS/EKS, SQS, and SNS.
    • Develop and maintain CI/CD pipelines with Jenkins, GitLab CI, or AWS CodePipeline for automated testing and deployment.
    • Monitor application and infrastructure health using AWS CloudWatch, Prometheus, Grafana, and distributed tracing tools like Jaeger or AWS X-Ray.
    • Troubleshoot production issues, perform root cause analysis, and implement fixes to improve system reliability.
    • Implement security controls including IAM roles, OAuth2, JWT, and encryption for data in transit and at rest.
    • Collaborate with cross-functional teams to design fault-tolerant, resilient systems with automated failover and recovery.
    • Optimize cloud resource usage and cost through rightsizing and autoscaling configurations.
    • Automate operational tasks and incident response using scripting and infrastructure as code (Terraform, CloudFormation).
    • Maintain detailed documentation of system architecture, deployment processes, and operational runbooks.
  • Dice Id: 10121335
  • Position Id: 8974132