Job Details
Job Title: Senior DevOps / Site Reliability Engineer (SRE)
Location: New Jersey (Hybrid/On-site)
Type of Employment: Full-time
We are seeking an experienced Senior DevOps / Site Reliability Engineer (SRE) to support enterprise client projects in the New Jersey / New York region. This role focuses on designing, building, and maintaining scalable, secure, and highly available cloud-based platforms that support enterprise applications.
The ideal candidate has strong hands-on experience with cloud infrastructure, CI/CD automation, containerization, and Kubernetes, and is comfortable working on-site or in a hybrid arrangement as client projects require.
Key Responsibilities
Design, implement, and maintain CI/CD pipelines using Jenkins, GitHub Actions, or Azure DevOps, including Maven-based builds
Build and manage containerized applications using Docker
Orchestrate workloads using Kubernetes (EKS, AKS, GKE, or self-managed clusters)
Automate infrastructure provisioning using Terraform, CloudFormation, ARM templates, or Bicep
Manage and optimize AWS and/or Azure cloud infrastructure, including compute, networking, storage, and security
Apply SRE principles to ensure high availability, reliability, scalability, and resilience
Implement monitoring, logging, and alerting using tools such as Prometheus, Grafana, ELK, CloudWatch, or Azure Monitor
Collaborate with development teams to improve deployment processes and application performance
Troubleshoot infrastructure, networking, and CI/CD pipeline issues
Participate in on-call rotations, incident response, and root cause analysis (RCA)
Drive automation initiatives to reduce manual effort and operational risk
Required Qualifications
8+ years of hands-on experience as a DevOps Engineer, SRE, or Cloud Engineer
Strong experience with Docker and containerization best practices
Hands-on expertise with Kubernetes (EKS, AKS, GKE, or on-prem clusters)
Proven experience building CI/CD pipelines using Jenkins, Maven, and Git-based workflows
Solid experience with AWS or Azure cloud platforms
Strong background in Linux system administration, networking fundamentals, and shell scripting
Experience implementing Infrastructure as Code using Terraform, CloudFormation, ARM templates, or Bicep
Experience with observability tools such as Prometheus, Grafana, ELK, Splunk, CloudWatch, or Azure Monitor
Experience working in Agile / Scrum environments
Strong troubleshooting and performance tuning skills
Preferred Skills
Experience with service mesh technologies (Istio, Linkerd)
Knowledge of Kubernetes Operators, Helm charts, and GitOps tools (ArgoCD, Flux)
Familiarity with secrets management tools such as HashiCorp Vault or AWS Secrets Manager
Experience with incident management and SRE best practices (SLIs, SLOs, error budgets)
Knowledge of security best practices for CI/CD, cloud, and containerized environments
Education
Bachelor’s or Master’s degree in Computer Science, Information Systems, Engineering, or a related field (or equivalent practical experience)