Role Summary
As a Lead SRE Platform Engineer, you will drive reliability engineering strategy and execution across critical IT Business Solutions platforms at Wegmans. This role focuses on improving uptime, performance, and operational efficiency through software enhancements, observability, automation, and data-driven root cause analysis (RCA).
You will serve as the technical lead for SRE practices: establishing monitoring standards, improving the MELT (Metrics, Events, Logs, Traces) strategy, influencing tooling decisions, and partnering across infrastructure, development, operations, and vendor teams. This is a high-impact opportunity to build and mature reliability engineering capabilities from the ground up.
What You'll Do
Reliability & Observability Leadership
Define and mature SRE best practices across cloud and on-prem environments.
Design and implement comprehensive monitoring strategies using tools such as:
  - Dynatrace
  - Datadog
  - Microsoft SCOM
Develop dashboards, alerts, synthetic testing, and proactive monitoring capabilities.
Establish and evolve a MELT data strategy to improve service reliability.
Conduct data-driven RCA investigations and implement preventive solutions.
Platform & Application Reliability
Support and enhance reliability across:
Cloud & Infrastructure
  - Microsoft Azure (software, storage, Azure Local)
  - Hyper-V and legacy VMware environments
  - NetApp and Pure Storage platforms
  - Azure Log Analytics
  - Infrastructure as Code using Terraform
  - Migration from Azure DevOps to GitHub (strong GitHub experience required)
Order Management Systems
  - Azure-based, internally developed .NET/C# applications
  - Internal message queuing systems
  - Logging, analytics, and synthetic testing post-patching
  - API-based integrations
Workforce & Payroll Platforms
  - Workday (Payroll)
  - ADP Vantage (Timekeeping)
Warehouse & Distribution Systems
  - Blue Yonder Warehouse Management System (WMS)
  - Vocollect handheld voice picking devices
  - Network analytics for identifying dead zones and connectivity issues
  - Barcode scanners and device connectivity troubleshooting
DevSecOps & Automation
Lead CI/CD reliability improvements (the Azure DevOps-to-GitHub transition is critical).
Enhance pipeline automation with embedded security controls.
Advance Infrastructure-as-Code standards (Terraform).
Improve configuration management and change governance.
Drive automation to reduce manual intervention and operational risk.
ITSM & Incident Management
Work within the BMC ecosystem, including:
  - BMC Helix
  - BMC Remedy
  - BMC Server Automation
Optimize automated incident generation (SCOM-to-BMC workflows).
Improve triage, escalation, and impact modeling across services.
Monitor vendor performance and escalate appropriately.
Participate in off-hour escalation support when required.
Strategic Impact
Develop predictive reliability models using statistical techniques.
Identify systemic risk across production systems.
Guide tooling decisions (e.g., Dynatrace vs. Datadog or other observability platforms).
Ensure regulatory and operational compliance standards are met.
Facilitate cross-functional collaboration and document SRE procedures and planning artifacts.
Required Qualifications
5–7+ years of Software Engineering and Infrastructure/Database Engineering experience.
Deep expertise in:
  - DevSecOps practices
  - Observability platforms
  - API integrations
  - Performance management tools
  - ITIL principles
  - ITSM data analytics
  - MELT data collection and analysis
Experience in Azure cloud environments.
Strong analytical and problem-solving skills.
Demonstrated ability to influence technical direction.
Excellent communication and cross-team collaboration skills.
Continuous improvement mindset focused on reliability engineering.
Preferred Qualifications
Strong programming experience in:
  - .NET / C#
  - Python
  - SQL
Experience with MSSQL (primary) and Oracle (limited).
Experience with GitHub (critical for upcoming transition).
Agile/Scrum experience.
Knowledge of Reliability-Centered Engineering and maintenance strategies.
Experience with synthetic testing and proactive validation post-deployment.
Bachelor's degree in a related technical field.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
- Dice Id: cxbcsi
- Position Id: Job44318