Remote • 17d ago
Design, develop, and troubleshoot large-scale, distributed, event-driven cloud systems to ensure high availability and performance. Coordinate and implement infrastructure and software improvements to meet resiliency and scalability goals. Maintain and enhance infrastructure and monitoring-as-code to ensure repeatability, traceability, and transparency in automation. Support on-call rotations, resolve operational issues, and drive long-term fixes to reduce alert fatigue. Collaborate with develop
Full-time
Depends on Experience