Data Architect/Engineer

  • Rosemead, CA
  • Posted 3 days ago | Updated 11 hours ago

Overview

On Site
Full Time

Skills

Data Integration
Data Engineering
Orchestration
Innovation
Data Processing
Automated Testing
Collaboration
Data Quality
Regulatory Compliance
Data Storage
Optimization
Visualization
Kubernetes
Extract, Transform, Load (ETL)
Apache Spark
Scripting
Python
Bash
Apache Kafka
Continuous Integration
Continuous Delivery
Workflow
Jenkins
GitHub
Apache HTTP Server
Data Lake
Management
Data Architecture
Change Data Capture
API
Streaming
Real-time
Analytics

Job Details

Location: Rosemead, California
Expected Start Date: Jul 14, 2025

Overview: We are seeking a skilled and versatile Data Architect / Data Engineer to design, build, and optimize data platforms and pipelines within a distributed environment. The ideal candidate will possess deep expertise in managing large-scale data systems, data integration, modern data engineering practices, and pipeline orchestration. You will play a key role in architecting and engineering scalable, high-performance data solutions that drive business insights and innovation.

Key Responsibilities:
  • Design, implement, and manage scalable data architectures on distributed platforms (e.g., MapR, HPE Unified Analytics and Data Fabric).
  • Develop, optimize, and maintain robust data pipelines using tools such as Spark, Airflow, and EzPresto.
  • Configure and maintain Kafka architecture, MapR Streams, and related technologies to support real-time and batch data processing.
  • Implement Change Data Capture (CDC) mechanisms and integrate data using APIs and streaming techniques.
  • Monitor, tune, and troubleshoot distributed data clusters including MapR and Kubernetes environments.
  • Develop and maintain CI/CD pipelines using Jenkins and integrate with GitHub for automated testing and deployment.
  • Collaborate with cross-functional teams to ensure data quality, governance, and compliance standards are met.
  • Leverage tools such as Iceberg and Superset for data storage optimization and visualization.

Required Skills & Experience:
  • Strong experience with distributed data platforms, including MapR and Kubernetes.
  • Proficient in data pipeline tools and frameworks: Spark, Airflow, EzPresto.
  • Solid programming and scripting skills: Python, Bash.
  • Expertise in Kafka architecture and operations.
  • Experience with CI/CD development workflows using Jenkins and GitHub.
  • Knowledge and use of Apache Iceberg for data lake management.
  • Familiarity with data architecture best practices, including CDC and API-based integrations.

Preferred / Nice to Have:
  • Experience with HPE Unified Analytics and Data Fabric.
  • Familiarity with MapR Streams and Superset for real-time analytics and dashboarding.
