Work Schedule & On-Call:
Standard expectation: 8-hour workday
Monthly on-call rotation (supported by offshore team; extended late hours are unlikely)
Typical hours fall between 8:00 AM and 8:00 PM; time outside that window may be on-call
Generally, not expected to exceed 40 hours/week
Start Date:
ASAP
Duration:
12-month contract with potential for extension and/or conversion (not guaranteed at this time)
Interview Process:
Round 1: 60-minute MS Teams interview
Round 2: In-person interview
In this contingent resource assignment, you may:
Consult as an expert to develop or influence initiatives and resources for highly complex business and technical needs across Engineering
Consult on the strategy and resolution of highly complex and unique challenges requiring in-depth evaluation across multiple areas, delivering solutions that are long-term, large-scale and require vision, creativity, innovation, and advanced analytical and inductive thinking
Provide expertise to client senior leadership on innovative Engineering business solutions
Strategically engage with client personnel
Principal Engineer – Platform Engineering & Production Support
Team Overview
This role supports a critical Platform Engineering team responsible for stabilizing, scaling, and operating applications as they move closer to production release. The team plays a key role post-deployment, ensuring reliability, performance, and operational excellence across a portfolio of applications.
This is not traditional infrastructure support—it is application-focused production engineering, requiring deep technical expertise, proactive issue prevention, and strong ownership of application health in cloud environments.
Role Summary
We are seeking a Principal Engineer to backfill a key contractor position within our Platform Engineering team. This individual must be Day 1 ready, capable of operating in fast-paced, production-critical environments, and able to seamlessly balance multiple priorities.
The ideal candidate is a strong DevOps and Site Reliability Engineering (SRE) professional with hands-on expertise in observability, incident management, and cloud platforms (OpenShift). They will play a leading role in supporting production systems, preventing outages, and improving system reliability through automation and intelligent monitoring.
Key Responsibilities
Lead production support efforts across a portfolio of 20+ applications, ensuring stability, performance, and rapid issue resolution
Design and build advanced monitoring, alerting, and observability dashboards using tools such as Splunk, Grafana, AppDynamics, and Prometheus
Proactively identify risks through gap analysis, anomaly detection, and predictive alerting, preventing production incidents before they occur
Troubleshoot complex production issues across distributed microservices environments, reducing MTTR through deep technical expertise
Drive adoption of modern SRE practices, including automation, AIOps, and intelligent monitoring solutions
Support applications running on OpenShift and cloud-native platforms, with a focus on reliability and scalability
Collaborate closely with development teams during release cycles, providing production-readiness guidance and operational support
Participate in 24x7 on-call rotation, demonstrating urgency and ownership during incidents
Mentor and guide engineers, helping elevate team capabilities in SRE, DevOps, and platform engineering practices
Act as a trusted technical leader, able to quickly switch priorities and manage competing demands in a high-pressure environment
What We’re Looking For
A genuine, hands-on engineer who can operate across multiple roles (SRE, DevOps, Production Support)
Strong ability to shift priorities quickly and respond with urgency in critical situations
Deep understanding of application support in cloud environments, especially OpenShift
Experience in the financial services industry strongly preferred
Prior development experience is a plus, particularly in Java-based ecosystems
Required Qualifications:
• 10+ years of platform and production support experience
• 5 years of Red Hat Linux, OpenShift, Kubernetes, Java, microservices, Spring Boot, and Python experience
• 5 years of observability dashboard creation experience: Grafana, Splunk, SPLOC, AppDynamics
• 5 years of observability alerting and incident handling experience: AIOps, ServiceNow, BigPanda, etc.
• 4 years of React.js, Apache, Kafka, and relational databases experience
• 4 years of distributed systems, microservices architectures, and cloud-native platforms experience
• 7+ years of Engineering experience, or equivalent demonstrated through one or a combination of the following: work or consulting experience, training, military experience, education.