Data Architect/Engineer

  • Rosemead, CA
  • Posted 3 days ago | Updated 11 hours ago

Overview

On Site
Full Time

Skills

Data Integration
Data Engineering
Orchestration
Innovation
Data Processing
Automated Testing
Collaboration
Data Quality
Regulatory Compliance
Data Storage
Optimization
Visualization
Kubernetes
Extract, Transform, Load (ETL)
Apache Spark
Scripting
Python
Bash
Apache Kafka
Continuous Integration
Continuous Delivery
Workflow
Jenkins
GitHub
Apache HTTP Server
Data Lake
Management
Data Architecture
Change Data Capture
API
Streaming
Real-time
Analytics

Job Details

Location: Rosemead, California
Expected Start Date: Jul 14, 2025

Overview: We are seeking a skilled and versatile Data Architect / Data Engineer to design, build, and optimize data platforms and pipelines within a distributed environment. The ideal candidate will possess deep expertise in managing large-scale data systems, data integration, modern data engineering practices, and pipeline orchestration. You will play a key role in architecting and engineering scalable, high-performance data solutions that drive business insights and innovation.

Key Responsibilities:
  • Design, implement, and manage scalable data architectures on distributed platforms (e.g., MapR, HPE Unified Analytics and Data Fabric).
  • Develop, optimize, and maintain robust data pipelines using tools such as Spark, Airflow, and EzPresto.
  • Configure and maintain Kafka architecture, MapR Streams, and related technologies to support real-time and batch data processing.
  • Implement Change Data Capture (CDC) mechanisms and integrate data using APIs and streaming techniques.
  • Monitor, tune, and troubleshoot distributed data clusters including MapR and Kubernetes environments.
  • Develop and maintain CI/CD pipelines using Jenkins and integrate with GitHub for automated testing and deployment.
  • Collaborate with cross-functional teams to ensure data quality, governance, and compliance standards are met.
  • Leverage tools such as Iceberg and Superset for data storage optimization and visualization.

Required Skills & Experience:
  • Strong experience with distributed data platforms, including MapR and Kubernetes.
  • Proficient in data pipeline tools and frameworks: Spark, Airflow, EzPresto.
  • Solid programming and scripting skills: Python, Bash.
  • Expertise in Kafka architecture and operations.
  • Experience with CI/CD development workflows using Jenkins and GitHub.
  • Knowledge and use of Apache Iceberg for data lake management.
  • Familiarity with data architecture best practices, including CDC and API-based integrations.

Preferred / Nice to Have:
  • Experience with HPE Unified Analytics and Data Fabric.
  • Familiarity with MapR Streams and Superset for real-time analytics and dashboarding.
