job summary:
The OMS Platform Reliability Lead is a highly technical role responsible for the health, stability, and automated evolution of our enterprise cloud Order Management System (OMS) ecosystem. Unlike a traditional operations role, this position leans heavily into Systems Engineering, requiring the ability to read and debug Java extensions, design complex GraphQL mutations, and build automated remediation tools for the "RUN" team.
You will manage the technical RUN support team and serve as the bridge between software engineering and IT operations. Your primary focus is to transition from manual support to "Self-Healing" operations by implementing automation for order replays, data deduplication, and predictive alerting.
location: Berkeley Heights, New Jersey
job type: Contract
salary: $60.00 - $61.25 per hour
work hours: 9am to 5pm
education: No Degree Required
responsibilities:
Key Responsibilities
Technical Automation & Self-Healing Operations
- Order Remediation Automation: Design and implement automated "Order Replay" mechanisms within the OMS to resolve synchronization failures between event-driven integrations without manual intervention.
- Enhanced Observability: Build advanced telemetry dashboards (using tools like Splunk, Datadog, or New Relic) to monitor GraphQL query performance, API latency, and webhook success rates.
- Smart Alerting: Design and tune threshold-based alerting for the RUN team to identify "Stuck Orders" or inventory mismatches before they impact the customer experience.
- Tooling Development: Script custom utilities using the OMS SDK or REST APIs to facilitate bulk updates and system cleanups.
Technical Incident Management & Platform Monitoring
- Deep-Dive Troubleshooting: Act as the ultimate technical escalation point for incidents requiring code-level analysis of Java custom extensions or complex GraphQL mutations.
- Root Cause Engineering: Lead technical Root Cause Analysis (RCA) by performing deep dives into application logs and the event-driven architecture to identify architectural bottlenecks.
- Performance Tuning: Analyze API response times and database interaction patterns to propose platform optimizations to the development team.
- ITSM Compliance: Oversee the incident management lifecycle, ensuring documentation includes code-level workarounds and technical "bug-fixes" for future reference.
Stakeholder & Vendor Engineering Collaboration
- Technical Liaison: Serve as the primary technical point of contact for E-commerce and architecture teams to ensure operational requirements are included in the dev roadmap.
- Vendor Management: Collaborate with SaaS OMS product engineers to align on platform upgrades and API versioning impacts.
- Team Leadership: Mentor the RUN support team in technical skills including GraphQL query optimization and Java debugging.
Change Management & Release Integrity
- Technical Oversight: Validate technical configurations and platform extensions during the release cycle to ensure deployment integrity and performance stability.
- CI/CD Awareness: Manage version control using Git, ensuring proper branching strategies for operational hotfixes and configuration changes.
qualifications:
Java: Proficiency in reading, debugging, and identifying performance issues in custom Java extensions and plugins.
GraphQL: Expert proficiency in query/mutation design, including the use of aliases, fragments, and variables for complex data manipulation.
Integration: Comprehensive understanding of RESTful architectures, JSON schemas, and event-driven patterns (Pub/Sub, Kafka, or Event Grid).
Observability: Experience with monitoring tools such as Datadog, Splunk, ELK Stack, or New Relic.
Git: Deep experience with repository management and deployment pipelines.
Process Knowledge: Strong mastery of ITIL with an SRE (Site Reliability Engineering) mindset, focusing on automation over manual "toil."
Analytical Skills: Ability to parse complex system logs and use data to drive proactive stability improvements.
Communication: Ability to explain a "race condition" or "API timeout" to a business stakeholder in terms of revenue and customer impact.
Equal Opportunity Employer: Race, Color, Religion, Sex, Sexual Orientation, Gender Identity, National Origin, Age, Genetic Information, Disability, Protected Veteran Status, or any other legally protected group status.
At Randstad Digital, we welcome people of all abilities and want to ensure that our hiring and interview process meets the needs of all applicants. If you require a reasonable accommodation to make your application or interview experience a great one, please contact
Pay offered to a successful candidate will be based on several factors including the candidate's education, work experience, work location, specific job duties, certifications, etc. In addition, Randstad Digital offers a comprehensive benefits package, including: medical, prescription, dental, vision, AD&D, and life insurance offerings, short-term disability, and a 401(k) plan (all benefits are based on eligibility).
This posting is open for thirty (30) days.