AWS Cloud Ops SRE

New York City, NY, US • Posted 6 hours ago • Updated 6 hours ago
Full Time
On-site
USD $110,000.00 - 120,000.00 per year

Job Details

Skills

  • Production Support
  • Release Management
  • Regression Testing
  • Proxies
  • IaaS
  • Amazon EKS
  • Kubernetes
  • RBAC
  • Collaboration
  • Resource Allocation
  • Scheduling
  • Image Management
  • Publishing
  • Integration Testing
  • Patch Management
  • SLA
  • Management
  • Regulatory Compliance
  • Microsoft Windows
  • Analytics
  • Dashboard
  • Continuous Integration
  • Continuous Delivery
  • Workflow
  • Reliability Engineering
  • Incident Management
  • Root Cause Analysis
  • Documentation
  • Infrastructure Architecture
  • Release Notes
  • Capacity Management
  • Optimization
  • Cloud Computing
  • DevOps
  • Amazon EC2
  • Computer Networking
  • Virtual Private Cloud
  • Amazon Route 53
  • NLB
  • Storage
  • Amazon S3
  • EBS
  • Amazon EFS
  • Database
  • Remote Desktop Services
  • Amazon RDS
  • Amazon DynamoDB
  • Pipeline Management
  • Hardening
  • Terraform
  • Amazon Web Services
  • Privacy
  • Marketing

Summary

Location: New York City, NY
Salary: $110,000.00 USD Annually - $120,000.00 USD Annually
Description:
Job Description: AWS Cloud Operations / Site Reliability Engineer (SRE)

Location: New York, NY 10010
Employment: Full-time


Experience: 10+ Years

Role Overview

The AWS Cloud Operations / Site Reliability Engineer (SRE) will be responsible for building, maintaining, and optimizing secure, scalable, and highly available cloud infrastructure on AWS. This role involves end-to-end platform operations, AMI lifecycle ownership, infrastructure automation using Terraform, production support, observability, patching, and ensuring reliability of mission-critical workloads. Experience with Harness for DevOps pipelines is a strong plus.

Key Responsibilities

1. AWS Platform Operations & Release Management
  • Own AWS platform release cycles including validation, regression testing, and readiness reviews.
  • Operate and enhance AWS core services: VPC, IAM, KMS, Route 53, networking baselines, proxy layers, organizational guardrails.
  • Ensure environments follow governance, compliance, and security standards.

2. Infrastructure as Code (IaC) with Terraform
  • Build, deploy, and manage cloud infrastructure using Terraform as the primary IaC tool.
  • Develop reusable Terraform modules for networking, compute, storage, EKS, and security.
  • Ensure IaC is version-controlled, peer-reviewed, immutable, and automated through CI/CD.

3. Amazon EKS (Kubernetes) Operations
  • Deploy, upgrade, and maintain production-grade EKS clusters and add-ons.
  • Implement Kubernetes standards around RBAC, networking, namespaces, and secrets.
  • Collaborate with application teams to ensure secure and reliable container workloads.
  • Optimize cluster scaling, resource allocation, performance, and workload scheduling.

4. AMI Lifecycle & Image Management
  • Manage AMI lifecycle including creation, CIS hardening, scanning, tagging, publishing, and deprecation.
  • Build automated AMI pipelines using Image Builder / Packer.
  • Maintain golden images for EC2 fleets, EKS workloads, and hybrid environments.

5. VIT (Vulnerability / Integrity / Integration Testing) & Patch Management
  • Lead vulnerability assessments, remediation, compliance tracking, and SLA adherence.
  • Manage OS and image patching using AWS Systems Manager (SSM) Patch Manager.
  • Maintain baselines, compliance dashboards, and automated maintenance windows.

6. Observability & Application Layer Monitoring
  • Build and maintain monitoring and logging systems using CloudWatch, X-Ray, OpenTelemetry, and log analytics.
  • Provide deep visibility into application behavior, dependencies, performance, and failure patterns.
  • Implement golden-signal dashboards for latency, traffic, errors, and saturation.

7. CI/CD & DevOps Automation
  • Build and maintain CI/CD pipelines for infrastructure and application deployments.
  • Integrate Terraform, AMI pipelines, EKS updates, and patch automation into pipelines.
  • Harness experience is a plus, especially for canary, blue/green, and verification workflows.

8. Reliability Engineering & Incident Response
  • Participate in on-call rotation; lead incident response, triage, and RCA.
  • Implement runbooks, automation, and solutions to reduce operational toil.
  • Recommend and drive architectural improvements to enhance availability, resiliency, and performance.

9. Documentation & Architecture
  • Produce detailed Infrastructure Design Documents, runbooks, DR plans, release notes, and architecture diagrams.
  • Conduct capacity planning, cost optimization, and operational readiness reviews.


Required Qualifications
  • 10+ years in SRE, Cloud Operations, or DevOps with a strong AWS background.
  • Hands-on experience with:
    • Compute: EC2, ASG, EKS/ECS, Lambda
    • Networking: VPC, Route 53, Security Groups/NACLs, ALB/NLB
    • Storage: S3, EBS, EFS
    • Databases: RDS, Aurora, DynamoDB
  • Expertise in AMI pipeline management, CIS hardening, and OS-level security.
  • Strong proficiency with Terraform or CloudFormation.
  • Proven ability to troubleshoot AWS and application stack issues end-to-end.


By providing your phone number, you consent to: (1) receive automated text messages and calls from the Judge Group, Inc. and its affiliates (collectively "Judge") to such phone number regarding job opportunities, your job application, and for other related purposes. Message & data rates apply and message frequency may vary. Consistent with Judge's Privacy Policy, information obtained from your consent will not be shared with third parties for marketing/promotional purposes. Reply STOP to opt out of receiving telephone calls and text messages from Judge and HELP for help.

Contact:

This job and many more are available through The Judge Group. Please apply with us today!
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
  • Dice Id: cxjudgpa
  • Position Id: 1122131

Company Info

About Judge Group, Inc.

The Judge Group is a leading professional services firm specializing in talent, technology, and learning solutions. We consult, staff, train, and solve. Through our work we make people and organizations better.

Our services are successfully delivered through a network of more than 30 offices across the United States, Canada, and India. The Judge Group is proud to partner with the best and brightest companies in business today, including over 60 of the Fortune 100. We serve organizations in financial services, healthcare, life sciences, insurance, government (including aerospace and defense), manufacturing, and technology and telecommunications.

