Job Details
A leading global consumer device company based in Cupertino, CA is looking for a Senior Data Engineer to join their team and help build the next generation of cellular analytics. You will work on production-grade ETL platforms that ingest, transform, and curate massive wireless telemetry datasets for near-real-time and batch use cases.
Role and Responsibilities:
Design, implement, and operate resilient batch and streaming ETL jobs in Spark that process terabytes of cellular network data daily, with clear KPIs for latency and availability
Build Airflow DAGs with strong observability, retries, SLAs, and automated remediation to keep production data flowing (a minimal DAG sketch follows this list)
Develop reusable libraries, testing harnesses, and CI/CD workflows that enable rapid, safe deployments and empower partner teams to self-serve
Partner with ML engineers to publish feature-ready datasets and model monitoring telemetry that align with medallion-architecture best practices
Implement automated validation, anomaly detection, and reconciliation frameworks that ensure trustworthy data at scale
Instrument data lineage, metadata cataloging, and documentation workflows to support discovery and compliance requirements
Collaborate with platform and product teams, systems engineers, researchers, and security teams
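To make the orchestration expectations above concrete, here is a minimal sketch of an Airflow DAG with retries, an SLA, and a failure callback. It assumes Airflow 2.x; the DAG id, schedule, and the load_cell_telemetry task are hypothetical placeholders, not the team's actual pipeline.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def load_cell_telemetry(**context):
    # Placeholder for an hourly telemetry load step (hypothetical).
    print(f"Loading cellular telemetry for {context['ds']}")


def alert_on_failure(context):
    # Hook for paging or automated remediation; wire to your alerting stack.
    print(f"Task {context['task_instance'].task_id} failed")


with DAG(
    dag_id="cell_telemetry_hourly",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
    default_args={
        "retries": 3,                              # automatic retries on failure
        "retry_delay": timedelta(minutes=5),
        "sla": timedelta(minutes=45),              # SLA misses are recorded and alerted
        "on_failure_callback": alert_on_failure,   # automated remediation hook
    },
) as dag:
    PythonOperator(
        task_id="load_cell_telemetry",
        python_callable=load_cell_telemetry,
    )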
Required Skills and Experience:
8+ years delivering production ETL platforms and lakehouse datasets for large-scale systems, including ownership of business-critical workloads
Proven experience architecting, operating, and continuously scaling petabyte-class ETL/ELT platforms that power mission-critical analytics and ML workloads across bronze/silver/gold data layers
Ability to craft multi-year data platform roadmaps, drive architectural decisions, and align stakeholders around standards for quality, performance, and cost efficiency
Deep hands-on proficiency with Apache Spark (batch and Structured Streaming) on on-premises or cloud stacks, including performance tuning, job observability, and production incident response (a minimal streaming ingestion sketch follows this list)
Production experience orchestrating complex pipelines with Apache Airflow (or equivalent), including DAG design, robust dependency modeling, SLA management, and operational excellence
Expertise with data lakehouse technologies (Apache Iceberg, Delta Lake, Hudi) and columnar storage formats (Parquet, ORC) for scalable, reliable data management
Practical knowledge of event streaming patterns and tooling such as Kafka, Kinesis, or Pulsar for ingesting high-volume network telemetry
Strong foundation in Python, Scala, or Java; disciplined CI/CD, automated testing, infrastructure-as-code, and Git-based workflows
Ability to design pragmatic schemas and semantic layers that serve ETL throughput, downstream analytics, and ML feature engineering
Experience delivering pipelines on AWS, Google Cloud Platform, or Azure using services like EMR, Databricks, Glue, Dataflow, BigQuery, or equivalent
Familiarity with Kubernetes, containerized deployments, and observability stacks (Prometheus, Grafana, ELK, OpenTelemetry) for proactive monitoring, rapid recovery, and continuous improvement
Experience working with large-scale telemetry data is a plus
Bachelor's degree or higher in Computer Science, Data Engineering, Electrical Engineering, or related technical field (or equivalent practical experience)
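To illustrate the streaming and lakehouse stack named above, the following is a minimal sketch, not a prescribed implementation, of a Spark Structured Streaming job that reads telemetry from Kafka and lands it as a bronze Parquet dataset. It assumes Spark 3.x with the spark-sql-kafka connector on the classpath; the broker address, topic name, and paths are hypothetical.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("bronze_cell_telemetry").getOrCreate()

# Read raw telemetry events from Kafka; key/value arrive as bytes.
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")   # hypothetical broker
    .option("subscribe", "cell-telemetry")             # hypothetical topic
    .option("startingOffsets", "latest")
    .load()
)

# Keep the payload and event timestamp; a real pipeline would parse the
# payload against a registered schema here.
bronze = raw.select(
    col("key").cast("string").alias("device_key"),
    col("value").cast("string").alias("payload"),
    col("timestamp").alias("event_time"),
)

# Append to a bronze landing path; the checkpoint enables restart and recovery.
query = (
    bronze.writeStream.format("parquet")
    .option("path", "/data/bronze/cell_telemetry")
    .option("checkpointLocation", "/data/checkpoints/cell_telemetry")
    .outputMode("append")
    .start()
)

query.awaitTermination()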
Type: Contract
Duration: 12 months with extension
Work Location: Sunnyvale, CA or San Diego, CA (100% on site)
Pay rate: $75.00 - $90.00 (DOE)
No 3rd party agencies or C2C