Application Sustain & Operations Engineer

  • Plano, TX
  • Posted 11 days ago | Updated 12 hours ago

Overview

On Site
USD 89,000.00 - 149,000.00 per year
Full Time

Skills

Management
Problem Solving
Conflict Resolution
Process Automation
Problem Management
Incident Management
Root Cause Analysis
SAP BASIS
SAFE
Business Operations
Software Engineering
Scalability
Continuous Improvement
Data Link Layer
Productivity
Collaboration
Operational Excellence
Documentation
Technical Writing
Regulatory Compliance
Auditing
Security Management
Disaster Recovery
Business Continuity Planning
Insurance
Legal
Computer Science
Information Technology
Reliability Engineering
System Administration
Linux
Unix
Microsoft Operating Systems
Microsoft Windows Server
Grafana
Splunk
Nagios
AppDynamics
Scripting
Python
Bash
Windows PowerShell
Continuous Integration
Continuous Delivery
Configuration Management
Jenkins
Ansible
Puppet
Progress Chef
Database
MySQL
MongoDB
Apache Cassandra
Couchbase
Computer Networking
DNS
TCP/IP
Load Balancing
Firewall
Cloud Computing
Amazon Web Services
Microsoft Azure
Google Cloud Platform
ServiceNow
Reporting
LOS
Recruiting
Law

Job Details

Overview

This role is responsible for ensuring the overall stability of production applications, including the reliability, availability, scalability, and efficiency of our production systems and platforms. The Operations Engineer will collaborate with cross-functional teams, including Software Engineering, Service Reliability, Infrastructure, and Business Operations, to streamline processes, manage day-to-day operations, monitor system health, and quickly resolve incidents.

The ideal candidate is skilled in problem-solving, process automation, and root cause analysis, with a passion for operational excellence and continuous improvement.

Responsibilities

System Reliability & Availability:
  • Ensure production systems, applications, and infrastructure are reliable, performant, and available within agreed SLAs/OLAs.

Incident & Problem Management:
  • Lead troubleshooting of critical incidents and drive timely resolution as part of Incident Management. Ensure that Root Cause Analysis is performed and help coordinate the timely implementation of permanent fixes.
  • Analyze priority incidents to generate insights and identify gaps in the alerting mechanisms.
  • Analyze market-specific issues and conduct comparative studies to determine why certain problems occur only in specific markets.

Monitoring & Alerting:
  • Partner with the Service Reliability Engineering team to identify, develop and maintain proactive monitoring, alerting, and health checks to detect and prevent issues before business impact.
  • Assist the SRE team in identifying critical health checks for order flow, order journeys, and user journeys to enable dedicated notifications for key steps.

Deployment & Change Operations:
  • Partner with the Software Engineering team to support safe, efficient deployments and configuration changes, ensuring minimal disruption to business operations.
  • Provide insights on system performance and capacity trends, and recommend improvements to the Software Engineering team for scalability and efficiency.

Automation & Continuous Improvement:
  • Identify manual operational tasks and automate processes to increase efficiency, reduce errors, and improve response times.
  • Identify recurring data anomalies through analysis and assist in determining effective technical and process-related solutions.
  • Review the L2 team's manual processes to uncover automation opportunities and implement technology-specific solutions aimed at improving productivity.

Collaboration with Engineering & Product Teams:
  • Partner with development, infrastructure, and reliability engineering teams to design and deliver operable, scalable, and resilient solutions.

Operational Excellence & Documentation:
  • Maintain runbooks, SOPs, and technical documentation; uphold IT controls, compliance, and audit readiness.

Risk & Security Management:
  • Enforce operational security best practices, support vulnerability remediation, and contribute to disaster recovery and business continuity planning.

Compensation and Benefits:
  • The expected compensation range for this position is between $89,000 and $149,000.
  • Location, confirmed job-related skills, experience, and education will be considered in setting actual starting salary. Your recruiter can share more about the specific salary range during the hiring process.
  • Bonus based on performance and eligibility; target payout is 10% of annual salary, paid out annually.
  • Paid time off subject to eligibility, including paid parental leave, vacation, sick, and bereavement.
  • In addition to salary, PepsiCo offers a comprehensive benefits package to support our employees and their families, subject to elections and eligibility: Medical, Dental, Vision, Disability, Health, and Dependent Care Reimbursement Accounts, Employee Assistance Program (EAP), Insurance (Accident, Group Legal, Life), Defined Contribution Retirement Plan.

Qualifications

  • Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field (or equivalent experience).
  • 5+ years of experience in operations engineering, site reliability engineering, or systems administration.
  • Strong knowledge of Linux/Unix and/or Windows server environments.
  • Experience with monitoring and alerting tools (e.g., Prometheus, Grafana, Datadog, Splunk, Nagios, AppDynamics, Full Story, Ignio).
  • Proficiency in at least one scripting/programming language (e.g., Python, Bash, PowerShell).
  • Familiarity with CI/CD pipelines, deployment automation, and configuration management (e.g., Jenkins, Ansible, Puppet, Chef).
  • Experience with databases (MySQL, MongoDB, Cassandra, Couchbase).
  • Understanding of networking fundamentals (DNS, TCP/IP, load balancing, firewalls).
  • Hands-on experience with cloud platforms (AWS, Azure, Google Cloud Platform).
  • Experience working with ServiceNow.

Our Company will consider for employment qualified applicants with criminal histories in a manner consistent with the requirements of the Fair Credit Reporting Act, and all other applicable laws, including but not limited to, San Francisco Police Code Sections 4901-4919, commonly referred to as the San Francisco Fair Chance Ordinance; and Chapter XVII, Article 9 of the Los Angeles Municipal Code, commonly referred to as the Fair Chance Initiative for Hiring Ordinance.

All qualified applicants will receive consideration for employment without regard to age, race, color, religion, sex, sexual orientation, gender identity, national origin, protected veteran status, or disability status.

PepsiCo is an Equal Opportunity Employer: Female / Minority / Disability / Protected Veteran / Sexual Orientation / Gender Identity / Age.

If you'd like more information about your EEO rights as an applicant under the law, please download the available EEO is the Law & EEO is the Law Supplement documents. View PepsiCo EEO Policy.

Please view our Pay Transparency Statement.