Job Overview
The NOC Lead, reporting to the NOC Manager, will provide strategic and technical leadership to a team of Principal Engineers and Engineers. This role is accountable for ensuring the stability and functionality of applications, batch processes, network, and infrastructure components. The Lead will drive operational excellence by maintaining maximum availability (99.9%-99.99%), overseeing incident management, and ensuring timely resolution of escalations to meet or exceed established SLAs. Additionally, this position will guide the team in implementing best practices, fostering collaboration, and delivering continuous improvements across the NOC environment.
The Network Operations Center (NOC), a key part of iCIMS Technical Operations, is dedicated to monitoring applications and infrastructure to deliver an exceptional customer experience. The team ensures optimal performance by validating availability, coordinating cross-functional event responses, and communicating any customer-impacting incidents. Additionally, the NOC analyzes key performance indicators (KPIs) to forecast future trends and provide initial recommendations to the engineering team.
The Lead, NOC, who reports to the NOC Manager, is responsible for maintaining the functionality of applications, batch processes, network, and infrastructure components. This role ensures maximum availability (99.9%-99.99%) and drives timely resolution of incidents and technical escalations to meet established SLAs.
About Us
When you join iCIMS, you join the team helping global companies transform business and the world through the power of talent. Our customers do amazing things: design rocket ships, create vaccines, deliver consumer goods globally, overnight, with a smile. As the Talent Cloud company, we empower these organizations to attract, engage, hire, and advance the right talent. We're passionate about helping companies build a diverse, winning workforce and about building our home team. We're dedicated to fostering an inclusive, purpose-driven, and innovative work environment where everyone belongs.
Responsibilities
- Ensure Production Stability: Monitor availability and performance across the entire production environment to maintain optimal operations, and provide off-hours support as needed.
- Leverage Monitoring Tools: Track cloud resource utilization and performance metrics to identify trends and potential issues proactively.
- Data-Driven Insights: Generate regular performance reports and recommend enhancements based on detailed analysis.
- Incident Management Excellence: Lead the restoration of normal service operations swiftly, including assessment, research, escalation, communication, and resolution management.
- Execute Production Changes: Implement necessary changes to support both internal and external customer needs.
- Operational Support: Provide effective triage and resolution for operational support requests.
- Documentation & Standards: Review and refine SOPs, policies, procedures, and system requirements to ensure accuracy and relevance.
- Automation Development: Create and maintain automation scripts using Python and Java to streamline processes and reduce manual effort.
- Infrastructure as Code (IaC): Apply IaC practices to improve deployment efficiency, consistency, and scalability.
- Comprehensive Documentation: Prepare detailed electronic documentation, including SLAs, performance metrics, installation guides, and implementation guides.
- Reduce Manual Work: Identify repetitive tasks and implement automation solutions to eliminate inefficiencies.
- Performance Reviews: Participate in monthly metric reviews to support uptime goals of 99.9%-99.99%.
- Drive Innovation: Demonstrate passion, initiative, and urgency in seeking innovative solutions and resolving issues effectively.
Qualifications
- 10+ years of strong cloud provider experience and demonstrated knowledge of relevant work
- 8+ years of administration and production support experience, including on-call responsibilities
- At least 6 years of lead-level experience
- One technical certification in any area
- Observability tooling experience
Education/Certifications/Licenses:
- Bachelor's degree in Computer Science, Engineering, Information Systems, or related technical field
- Equivalent combination of education and experience will be considered
Preferred
- Experience with AWS / AWS Certifications
- Exposure to other cloud technologies like Azure and Google Cloud Platform
EEO Statement
iCIMS is a place where everyone belongs. We celebrate diversity and are committed to creating an inclusive environment for all employees. Our approach helps us to build a winning team that represents a variety of backgrounds, perspectives, and abilities. So, regardless of how your diversity expresses itself, you can find a home here at iCIMS. We prohibit discrimination and harassment of any kind based on race, color, religion, national origin, sex (including pregnancy), sexual orientation, gender identity, gender expression, age, veteran status, genetic information, disability, or other applicable legally protected characteristics. If you'd like to request an accommodation due to a disability, please contact us at .
Compensation and Benefits
Competitive health and wellness benefits include medical insurance (employee and dependent family members), personal accident and group term life insurance, bonding and parental leave, lifestyle spending account reimbursements, wellness services offerings, sick and casual/emergency days, paid holidays, tuition reimbursement, retirals (PF - employer contribution) and gratuity. Benefits and eligibility may vary by location, role, and tenure. Learn more here: