Position :: Observability Engineer/Architect
Location :: Los Angeles, CA (Hybrid Role)
Duration :: 6+ months
Interview :: Video
Job Description:
Job Title: Senior Full Stack Observability Engineer/Architect – Catalyst Center, ThousandEyes, Grafana, Automation
Summary
We are seeking a senior engineer/architect to lead the design, implementation, and optimization of full stack observability solutions across large-scale enterprise environments. This role will drive architecture and automation for network, application, and infrastructure monitoring, leveraging platforms such as Catalyst Center, ThousandEyes, Grafana, and custom automation frameworks. The ideal candidate demonstrates deep, hands-on expertise in observability, automation, and performance analytics, with a proven ability to deliver scalable solutions and mentor engineering teams.
Key Responsibilities
- Own end-to-end observability architecture: Design and implement integrated monitoring solutions for network, application, and infrastructure domains, ensuring visibility, reliability, and actionable insights.
- Lead Catalyst Center–driven automation: Develop templates, workflows, and closed-loop operations for network assurance, leveraging Catalyst Center APIs and automation tools.
- ThousandEyes deployment and analytics: Architect and operationalize ThousandEyes for synthetic and real-user monitoring, path visualization, and outage detection across distributed environments.
- Grafana dashboarding and analytics: Build and maintain Grafana dashboards for real-time and historical performance analytics, integrating diverse data sources (SNMP, API, logs, metrics).
- Automation and integration: Develop and maintain automation scripts and frameworks (Python, Ansible, Terraform) for observability, alerting, and remediation workflows.
- Performance and reliability engineering: Define SLOs/SLIs, implement proactive monitoring, and drive root-cause analysis for critical incidents.
- Mentor and uplift engineering teams: Conduct design reviews, develop standards and runbooks, and deliver enablement sessions for operations and field engineers.
- Stakeholder leadership: Collaborate with security, cloud, application, and operations teams to translate business outcomes into technical architectures and measurable milestones.
- Documentation and governance: Produce high-level and low-level designs (HLD/LLD), as-builts, standards, compliance artifacts, and reusable templates for observability and automation.
Required Qualifications (Must-Have)
- 10+ years of experience in enterprise networking, systems, or cloud engineering, including 3–5+ years leading observability and automation initiatives at scale.
- Proven hands-on expertise with Catalyst Center, ThousandEyes, and Grafana for monitoring, analytics, and automation.
- Deep expertise in network and application performance monitoring, synthetic and real-user analytics, and incident response.
- Strong experience with automation frameworks (Python, Ansible, Terraform) and API integrations.
- Demonstrated success leading complex, multi-phase deployments and mentoring senior engineers.
Preferred Qualifications
- Certifications in observability, automation, or cloud platforms (e.g., Cisco Certified Specialist – Observability, AWS/Azure monitoring).
- Experience with cloud networking, hybrid connectivity, and integration of DNS/DHCP/IPAM data sources.
- Familiarity with Zero Trust, NAC posture, and security monitoring.
- Experience with data center and campus interconnect monitoring (ACI concepts beneficial but not required).
Work Style & Travel
- Must be able to work onsite at client locations as required.
- Off-hours change windows may be needed for critical migrations and incident response.