Required Skills:
Application Support & SRE: A minimum of 2-4 years of experience in Application Support for cloud-based applications. Experience in a Site Reliability Engineering (SRE) role focused on availability and performance.
AWS Services: Extensive hands-on experience with: EC2, S3, VPC, Route 53, RDS, CloudFormation, DynamoDB (NoSQL), Lambda, CloudWatch, IAM, ELB, EBS, ECS, SQS, SNS.
Databases: Expertise with relational databases, querying, and reporting. Snowflake experience is highly preferred.
Troubleshooting: Strong experience troubleshooting issues related to UI, API, and data flow.
Monitoring Tools: Experience with tools like AppDynamics, Grafana, or ThousandEyes.
Scripting & Automation: Proficiency in Python is required. Working experience with PowerShell, SQL, and JSON.
CI/CD & Data Ops: Familiarity with the Azure DevOps (ADO) pipeline framework. Background in data management, data engineering, or data operations.
Key Responsibilities:
Triage and resolve application support tickets within defined Service Level Agreements (SLAs).
Identify and resolve critical application and technical problems, including responding to off-shift and weekend support calls.
Proactively identify and manage issues, document their resolutions, and provide follow-up communication to requestors.
Participate in application availability and performance monitoring.
Develop scripts (Python/PowerShell) and automation tools to better detect, correct, and prevent application issues.
Develop and enhance monitoring and alerting capabilities.
Perform other job duties as assigned by Caterpillar management.
Soft Skills & Other Requirements:
Ability to work independently and manage issue resolution proactively.
Strong documentation and follow-up communication skills.
Availability for off-shift and weekend support calls as part of an on-call rotation.
Must be local to Chicago or Peoria, IL, or explicitly willing to relocate at own expense, and be onsite from day one.