IT Senior Cloud Operations Engineer

Overview

On Site
USD 97,388.00 - 155,567.00 per year
Full Time

Skills

Higher Education
Life Insurance
Insurance
Environment Management
High Availability
Optimization
Grafana
Cloud Security
Access Control
Documentation
Machine Learning (ML)
Workflow
Performance Monitoring
Artificial Intelligence
ITIL
Root Cause Analysis
Collaboration
Leadership
Disaster Recovery
Continuous Improvement
Reliability Engineering
Operational Efficiency
Mentorship
IT Management
CHAOS
Testing
Training
Innovation
Customer Focus
Accountability
IC
Integrated Circuit
Internal Communications
Computer Science
Microsoft Azure
Google Cloud Platform
Google Cloud
IaaS
Terraform
ARM
FOCUS
Management
Continuous Integration
Continuous Delivery
Jenkins
GitLab
Ansible
Scripting
Python
Bash
Windows PowerShell
Incident Management
Recovery
Regulatory Compliance
Amazon Web Services
Cloud Computing
Data Entry

Job Details

Division or Field Office: Office of the CIO
Department of Position: Enterprise Tech Office Dept
Work from: Corporate Office in Erie, PA
Salary Range: $97,388.00 - $155,567.00 *

Salary range is for this level and may vary based on the actual level of the role hired for.

*This range represents a national range and the actual salary will depend on several factors including the scope and complexity of the role and the skills, education, training, credentials, location, and experience of an applicant, as well as level of role for which the successful candidate is hired. Position may be eligible for an annual bonus payment.

At Erie Insurance, you're not just part of a Fortune 500 company; you're also a valued member of a diverse and inclusive team that includes more than 6,000 employees and over 13,000 independent agencies. Our employees work in the Home Office complex located in Erie, PA, and in our Field Offices that span 12 states and the District of Columbia.

Benefits That Go Beyond The Basics

We strive to be Above all in Service to our customers and to our employees. That's why Erie Insurance offers you an exceptional benefits package, including:
  • Premier health, prescription, dental, and vision benefits for you and your dependents. Coverage begins your first day of work.
  • Low contributions to medical and prescription premiums. We currently pay up to 97% of employees' monthly premium costs.
  • Pension. We are one of only 13 Fortune 500 companies to offer a traditional pension plan. Full-time employees are vested after five years of service.
  • 401(k) with up to 4% contribution match. The 401(k) is offered in addition to the pension.
  • Paid time off. Paid vacation, personal days, sick days, bereavement days and parental leave.
  • Career development. Including a tuition reimbursement program for higher education and industry designations.

Additional benefits include company-paid basic life insurance; short- and long-term disability insurance; orthodontic coverage for children and adults; adoption assistance; fertility and infertility coverage; well-being programs; paid volunteer hours for service to your community; and dollar-for-dollar matching of your charitable gifts each year.

Position Summary

Responsible for leading cloud operations tasks, including incident response, automation, and operational improvements. This role involves overseeing complex issues, mentoring junior engineers, and driving operational efficiency. Collaborates with cross-functional teams and plays a key role in ensuring cloud infrastructure, environments, and workloads are reliable, secure, and optimized for performance.

What You'll Do:
We are seeking a proactive and solutions-driven IT Cloud Operations Engineer or IT Sr Cloud Operations Engineer to support our AI Center of Excellence (CoE) within the IT organization. In this role, you will be responsible for maintaining and optimizing the platforms that enable enterprise-scale AI initiatives. You'll play a key part in ensuring the reliability, availability, and performance of cloud-based AI/ML environments.
  • Cloud Environment Management: Operate, monitor, and maintain cloud-based infrastructure supporting AI workloads across AWS, Azure, or Google Cloud Platform.
  • System Reliability & Uptime: Ensure high availability of AI platforms and services, proactively responding to performance or reliability issues.
  • Incident Response & Troubleshooting: Lead incident investigations and root cause analyses for infrastructure and platform-level issues.
  • Automation & Optimization: Automate cloud operations tasks using scripts or tools (e.g., Python, Bash, Terraform) to streamline deployment, monitoring, and scaling.
  • Monitoring & Alerts: Implement and manage monitoring solutions (e.g., Prometheus, Grafana, CloudWatch, Datadog) to support proactive alerting and system health visibility.
  • Security & Compliance: Support cloud security configurations, manage access controls, and assist with compliance processes and documentation.
  • Collaboration: Work closely with cloud engineers, data scientists, and ML engineers to ensure smooth and efficient AI/ML workflows from development to production.


What Makes You Stand Out:
  • Experience with model performance monitoring.
  • Knowledge of AI concepts and tools.
  • Experience with ITIL/incident management frameworks.


Duties and Responsibilities

  • Lead incident response and provide on-call support for escalated cloud incidents and deployments, conducting root cause analysis, ensuring timely resolution, and implementing improvements to prevent future incidents.
  • Develop and maintain automated operational procedures, such as system maintenance, monitoring, compliance, patching, and recovery, to enhance cloud service reliability and reduce manual intervention.
  • Collaborate with cross-functional teams and leadership, providing operational insights to improve cloud infrastructure reliability, availability, and performance.
  • Ensure cloud environments adhere to operational standards and to security and compliance requirements, managing operational readiness, disaster recovery, and performance processes and procedures.
  • Monitor and optimize cloud service performance, identifying issues proactively and leading continuous improvement efforts to enhance system reliability and operational efficiency.
  • Mentor and guide junior engineers, providing technical leadership and promoting best practices in cloud operations, automation, and incident response.
  • Lead operational improvement initiatives, including chaos engineering practices, operational drills, and testing activities to improve response, resiliency, and detection capabilities.


The first five duties listed are the functions identified as essential to the job. Essential functions are those job duties that must be performed in order for the job to be accomplished.


This position description in no way states or implies that these are the only duties to be performed by the incumbent. Employees are required to follow any other job-related instruction and to perform any other duties as requested by their supervisor, or as become clear.


Capabilities

  • Collaborates
  • Cultivates Innovation
  • Customer Focus
  • Decision Quality
  • Ensures Accountability
  • Instills Trust
  • Nimble Learning
  • Optimizes Work Processes (IC)
  • Self-Development
  • Values Diversity


Qualifications

Minimum Educational and Experience Requirements
  • Bachelor's degree in computer science, engineering, or equivalent industry experience in a related technical field; and five years of professional experience in a related technical field; or
  • Associate's degree in computer science, engineering, or equivalent industry experience in a related technical field; and seven years of professional experience in a related technical field; or
  • High school diploma and nine years of professional experience in a related technical field, required.


Additional Requirements
  • Cloud Platforms: Advanced knowledge of one or more cloud platforms (e.g., AWS, Azure, Google Cloud Platform), with deep expertise in managing cloud infrastructure.
  • Infrastructure as Code (IaC): Proficiency in IaC tools (e.g., Terraform, CloudFormation, CDK, ARM, Pulumi), with a focus on the configuration and management of cloud resources.
  • Automation & Scripting: Expertise in CI/CD pipelines and automation tools (e.g., Jenkins, GitLab, Ansible) and in scripting languages (e.g., Python, Bash, PowerShell) to streamline operational tasks.
  • Monitoring & Incident Response: Proficiency with monitoring and observability tools (e.g., CloudWatch, Prometheus), with experience in cloud incident response and troubleshooting.
  • Operational Procedures: Ability to execute and improve operational procedures, including system maintenance, patching, recovery, and monitoring.
  • Compliance & Standards: Understanding of operational standards, controls, and compliance requirements, ensuring that cloud environments meet necessary regulations.
  • Operational Strategy: Ability to identify operational inefficiencies and propose strategic solutions to optimize system performance and availability.


Designations and/or Licenses
  • Associate-level cloud certification (such as AWS Certified Solutions Architect - Associate) preferred, or willingness to obtain one within 6 months of hire.


Physical Requirements

  • Ability to move over 50 lbs using lifting aid equipment; Rarely
  • Climbing/accessing heights; Rarely
  • Driving; Occasional (<20%)
  • Lifting/Moving 0-20 lbs; Occasional (<20%)
  • Lifting/Moving 20-50 lbs; Rarely
  • Manual Keying/Data Entry/inputting information/computer use; Frequent (50-80%)
  • Pushing/Pulling/moving objects, equipment with wheels; Rarely


Nearest Major Market: Erie

About Erie Insurance Group