SRE Platform Engineer

Remote • Posted 3 hours ago • Updated 3 hours ago
Contract W2
No Travel Required
Remote
$85 - $90/hr

Job Details

Skills

  • .NET
  • BMC
  • Cloud Computing
  • Communication
  • C#
  • DevSecOps
  • DevOps
  • Dynatrace
  • Microsoft SCOM
  • Agile
  • Microsoft Azure
  • Data Collection
  • Analytics

Summary

 

Eclaro's client is searching for a Lead SRE Platform Engineer to drive reliability engineering strategy and execution across critical IT Business Solutions platforms.

**This is a remote position working EST hours. NO 3RD PARTIES; UNABLE TO SUBCONTRACT.**


What You’ll Do

Reliability & Observability Leadership

  • Define and mature SRE best practices across cloud and on-prem environments.
  • Design and implement comprehensive monitoring strategies using tools such as:
    • Dynatrace
    • Datadog
    • Microsoft SCOM
  • Develop dashboards, alerts, synthetic testing, and proactive monitoring capabilities.
  • Establish and evolve a MELT (metrics, events, logs, traces) data strategy to improve service reliability.
  • Conduct data-driven root cause analysis (RCA) investigations and implement preventative solutions.

Platform & Application Reliability

Support and enhance reliability across:

  • Cloud & Infrastructure
    • Microsoft Azure (software, storage, Azure Local)
    • Hyper-V and legacy VMware environments
    • NetApp and Pure storage platforms
    • Azure Log Analytics
    • Infrastructure as Code using Terraform
    • Migration from Azure DevOps to GitHub (strong GitHub experience required)
  • Order Management Systems
    • Azure-based, internally developed .NET/C# applications
    • Internal message queuing systems
    • Logging, analytics, and synthetic testing post-patching
    • API-based integrations
  • Workforce & Payroll Platforms
    • Workday (Payroll)
    • ADP Vantage (Timekeeping)
  • Warehouse & Distribution Systems
    • Blue Yonder Warehouse Management System (WMS)
    • Vocollect handheld voice picking devices
    • Network analytics for identifying dead zones and connectivity issues
    • Barcode scanners and device connectivity troubleshooting

DevSecOps & Automation

  • Lead CI/CD reliability improvements (Azure DevOps → GitHub transition critical).
  • Enhance pipeline automation with embedded security controls.
  • Advance Infrastructure-as-Code standards (Terraform).
  • Improve configuration management and change governance.
  • Drive automation to reduce manual intervention and operational risk.

ITSM & Incident Management

  • Work within BMC ecosystem including:
    • BMC Helix
    • BMC Remedy
    • BMC Server Automation
  • Optimize automated incident generation (SCOM → BMC workflows).
  • Improve triage, escalation, and impact modeling across services.
  • Monitor vendor performance and escalate appropriately.
  • Participate in off-hour escalation support when required.

Strategic Impact

  • Develop predictive reliability models using statistical techniques.
  • Identify systemic risk across production systems.
  • Guide tooling decisions (e.g., Dynatrace vs. Datadog or other observability platforms).
  • Ensure regulatory and operational compliance standards are met.
  • Facilitate cross-functional collaboration and document SRE procedures and planning artifacts.

Required Qualifications

  • 5–7+ years of Software Engineering and Infrastructure/Database Engineering experience.
  • Deep expertise in:
    • DevSecOps practices
    • Observability platforms
    • API integrations
    • Performance management tools
    • ITIL principles
    • ITSM data analytics
    • MELT data collection and analysis
  • Experience in Azure cloud environments.
  • Strong analytical and problem-solving skills.
  • Demonstrated ability to influence technical direction.
  • Excellent communication and cross-team collaboration skills.
  • Continuous improvement mindset focused on reliability engineering.

Preferred Qualifications

  • Strong programming experience in:
    • .NET / C#
    • Python
    • SQL
  • Experience with MSSQL (primary) and Oracle (limited).
  • Experience with GitHub (critical for upcoming transition).
  • Agile/Scrum experience.
  • Knowledge of Reliability-Centered Engineering and maintenance strategies.
  • Experience with synthetic testing and proactive validation post-deployment.
  • Bachelor’s degree in a related technical field.

What Success Looks Like

  • Improved uptime and measurable reduction in production incidents.
  • Faster, data-driven RCA resolution and prevention of repeat issues.
  • Clearly defined observability strategy with actionable dashboards.
  • Automated, secure, and reliable deployment pipelines.
  • Enhanced visibility into warehouse, order management, and payroll systems.
  • A scalable SRE foundation built for long-term growth.

 

  • Dice Id: ndi
  • Position Id: 8949091