Title: Senior Staff Engineer - Product Software Architecture & Engineering
Location: Redwood City, CA (Hybrid)
Job Description:
Responsibilities
Architectural Strategy & Technical Vision
Core Stack Evolution: Architect and optimize our primary ingestion and storage engines utilizing Java and PostgreSQL, ensuring high availability and performance at scale.
Real-Time Data Orchestration: Lead the design of high-throughput messaging systems using Apache Kafka to handle trillions of telemetry points with sub-second latency.
Unified Visibility: Define the global standard for observability visualization in Grafana, building complex, high-performance dashboards that aggregate data from diverse telemetry sources.
High-Scale Engineering & Innovation
Stream Processing Mastery: Architect massively parallel processing pipelines and stateful stream processing frameworks (utilizing tools like Apache Flink) to enable real-time anomaly detection.
Advanced R&D: Evaluate and prototype emerging technologies such as Model-Driven Telemetry (MDT) and ClickHouse/Thanos for long-term metric storage and high-cardinality data analysis.
Technical Roadmap Ownership: Drive the engineering team toward key milestones, ensuring the code we ship aligns with the 3-5 year long-term NPE vision.
Reliability & Systemic Leadership
Service Standards: Define and monitor critical SLI/SLO metrics (e.g., P95 response times) to ensure the platform maintains world-class performance and global ITIL compliance.
Incident Authority: Serve as the senior point of contact for complex root-cause analysis, identifying architectural weaknesses in the Java/Kafka/Postgres stack to prevent future outages.
Stakeholder Synthesis: Translate complex product requirements into deep technical specifications, managing relationships with both internal software teams and external network vendors.
Required Qualifications & Experience
Tenure: 10+ years of professional experience in software engineering and distributed systems.
Domain Expertise: 5+ years of experience specifically in large-scale network engineering, telemetry, or observability platforms.
Java Expertise: Mastery of Java for building high-performance, scalable backend services.
Data & Messaging: Deep expertise in PostgreSQL (schema design and tuning) and Apache Kafka (cluster architecture and stream management).
Visualization: Expert-level proficiency in Grafana for creating enterprise-level observability dashboards.
Large-Scale Systems: Proven experience with Prometheus, Thanos, or ClickHouse, and with working in a structured Agile/Scrum environment.
Education: Bachelor's or Master's degree in Computer Science or a related technical field.