Job Title: CloudOps RunOps Engineer
Location: New York, NY (Hybrid, 3 days onsite)
Job Overview:
We are seeking a proactive and detail-oriented CloudOps RunOps Engineer to join our growing infrastructure and operations team. This role will focus on enhancing the operational stability, automation, and resiliency of our cloud and hybrid environments. The ideal candidate will work closely with Cloud, Platform, and Application teams to support production environments through automation and observability tools.
Key Responsibilities:
Maintain and enhance Day 0/1 Infrastructure-as-Code (IaC) modules and cloud provisioning templates.
Support and automate Day 2 operations such as monitoring, patching, backups, and system recovery.
Provide Level 2/3 support for cloud infrastructure (primarily AWS), OS (Windows/Linux), and PaaS services.
Partner with Site Reliability Engineering (SRE) and application teams to improve system reliability and operational readiness.
Implement and support observability and incident response tools such as New Relic and PagerDuty.
Participate in incident triage, root cause analysis, and post-incident reviews.
Develop and maintain RunOps documentation, including playbooks and knowledge base articles.
Required Skills & Experience:
Hands-on experience with AWS (preferred), Azure, or Google Cloud Platform (GCP).
Proficiency with Infrastructure as Code (IaC) tools (Terraform, Ansible).
Strong scripting abilities in Python, Bash, or PowerShell.
Knowledge of CI/CD pipelines and DevOps best practices.
Familiarity with monitoring and ITSM tools such as New Relic, PagerDuty, and ServiceNow.
Exposure to IDP (Internal Developer Platform) tools and developer enablement frameworks.
Experience using GitHub Copilot or similar AI-based coding tools for automation.
Solid understanding of ITIL processes, incident/change management, and system operations.
Excellent problem-solving, troubleshooting, and communication skills.