SRE Engineer

Overview

Hybrid
Depends on Experience
Contract - W2

Skills

SRE
Site Reliability Engineering
Java
Python
Golang
OpenShift
Kubernetes
Baremetal
Cloud
Ansible
Helm Charts
CI/CD
Continuous Delivery
Continuous Integration
Docker
Helm
Jenkins

Job Details

Visa - Citizen, Resident, GC EAD, L2 EAD
 
Only W2 candidates will be considered.
 
Candidates must be local and provide a copy of their driver's license.

 

Title - SRE Engineer
Location - Iselin, NJ or Plano, TX (hybrid, 3 days a week)
 

Candidates must not require sponsorship, now or in the future.

Interview Process:
Virtual - 30-minute round
Onsite - 1-2 hours

 

Needs:
OpenShift
Kubernetes
Development experience (Java, Python, Golang)
SRE skills

Nice to Haves:
Baremetal
Cloud

Job Description:

We are looking for a highly skilled Site Reliability and Operations Engineer (SRE) with extensive experience in Kubernetes-based distributed caching and compute grid solutions. This role requires a strong foundation in software development, infrastructure automation, and reliability engineering. You will be responsible for designing, implementing, and maintaining high-performance distributed systems, ensuring reliability, scalability, and efficiency.

 

Development & Implementation:

Design, develop, and optimize distributed caching and compute grid solutions on Kubernetes/OpenShift.

Demonstrate an understanding of microservices and containerized workloads using Kubernetes, Docker, and Helm.

Implement high-throughput compute grid solutions using IBM Spectrum Symphony, TIBCO GridServer, or similar technologies.

Optimize application performance by leveraging parallel compute strategies, load balancing, and efficient data distribution.

 

Site Reliability Engineering (SRE):

Ensure high availability, scalability, and reliability of distributed systems.

Implement observability, logging, and monitoring using tools like Prometheus, Grafana, ELK, or OpenTelemetry.

Automate infrastructure provisioning and deployments using Ansible and Helm charts.

Demonstrate an understanding of CI/CD pipelines for seamless software deployment.

Troubleshoot and resolve platform, infrastructure, and distributed compute incidents, ensuring minimal downtime.

 

Required Skills & Qualifications:

Strong experience in Kubernetes (OpenShift and on-prem/cloud clusters).

Understanding of programming languages such as Java, Go, or Python. This will be the deciding factor between the L4 and L5 levels.

Experience with containerization technologies (Docker, Helm, etc.).

Strong knowledge of CI/CD pipelines (Jenkins, ArgoCD, GitHub Actions).

Hands-on experience with observability tools (Prometheus, Grafana, Loki, Jaeger).

Understanding of networking, service meshes (Istio/Linkerd), and security best practices in Kubernetes.

Experience with multi-cluster and hybrid cloud Kubernetes deployments.

 
 