Principal Hadoop Architect

Overview

$DOE
Full Time

Skills

Hadoop

Job Details

HMG America LLC is a Business Solutions-focused Information Technology company providing IT consulting and services, software and web development, staff augmentation, and other professional services. One of our direct clients is looking for a Principal Hadoop Architect for a remote position. Below is the detailed job description.

Job Title: Principal Hadoop Architect

Location: Remote / Hybrid (PST hours)

FTE only

Focus: Hadoop Ecosystem Optimization, Design & Code Frameworks and Design Standards

Role Objective

We are seeking a Principal Hadoop Architect with 10-15 years of experience to serve as the central authority for a large-scale Big Data ecosystem. You will define the "Golden Standards" for data ingestion, storage, and processing, ensuring our on-premises environment is highly optimized and architecturally aligned for an eventual cloud evolution.

Core Responsibilities

1. Technical Design Authority

  • Standardization: Define and enforce "Blueprints" for Hive schemas, Spark configurations, and Kafka topics to be used across all engineering and analyst teams (see the illustrative sketch after this list).
  • Reference Architecture: Maintain the official "Big Data Playbook," detailing approved design patterns for batch vs. real-time processing.
  • Review Board: Lead the Architecture & Design Review Board (ADRB) to vet new data projects, ensuring they don't introduce technical debt or inefficient resource patterns.
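
Purely as an illustrative sketch of what such a configuration "Blueprint" might standardize (all property names and values below are assumptions, not the client's actual standards), a shared PySpark session factory could pin approved defaults in one place:

```python
# Illustrative sketch only: a shared session factory that applies a team-wide set of
# approved Spark defaults, so individual jobs do not override them ad hoc.
# All property values here are assumptions, not the client's actual standards.
from pyspark.sql import SparkSession

APPROVED_DEFAULTS = {
    "spark.sql.shuffle.partitions": "400",          # sized for the shared cluster
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.sql.sources.partitionOverwriteMode": "dynamic",
    "spark.dynamicAllocation.enabled": "true",
}

def build_session(app_name: str) -> SparkSession:
    """Create a SparkSession with the organization's approved defaults applied."""
    builder = SparkSession.builder.appName(app_name)
    for key, value in APPROVED_DEFAULTS.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```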

2. Ecosystem Optimization & "Fitness"

  • Performance Tuning: Identify "heavy-hitter" queries and inefficient YARN resource allocations. Implement mandatory partitioning and bucketing standards to reduce HDFS overhead.
  • Storage Rationalization: Implement tiered storage (Hot/Warm/Cold) policies. Enforce standard file formats (Parquet/Avro) to optimize compression and predicate pushdown (see the illustrative sketch after this list).
  • Lifecycle Management: Establish data retention and archival standards to prevent "Data Swamp" growth, ensuring we only store what provides value.
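
As a hedged illustration of the partitioning and file-format standards referenced above (the source path, output path, and column names are hypothetical), a standards-compliant PySpark write might look like this:

```python
# Illustrative sketch: writing a dataset partitioned by a date column in Parquet,
# the kind of standard that enables partition pruning and predicate pushdown.
# The paths and column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("standards-compliant-write").getOrCreate()

events = spark.read.json("/data/raw/events")          # hypothetical source path

(events
    .repartition("event_date")                        # avoid many small files per partition
    .write
    .mode("overwrite")
    .partitionBy("event_date")                        # directory-level partitioning for pruning
    .parquet("/data/curated/events"))                 # columnar format: compression + pushdown
```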

3. Cloud-Ready Engineering (The "Clean Room" Approach)

  • Decoupling Strategy: Lead the effort to decouple storage (HDFS) from compute (YARN) through architectural standards, making future cloud migration a "plug-and-play" exercise.
  • API-First Standards: Encourage the use of abstraction layers and APIs so that downstream applications aren't hard-coded to specific Hadoop versions (see the illustrative sketch after this list).
  • Containerization Strategy: Provide guidance on moving localized workloads toward Kubernetes/Docker-friendly designs.
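
As a minimal sketch of the decoupling and "API-first" ideas above (the environment variable, function name, and paths are hypothetical), downstream jobs might resolve storage locations through a single configurable indirection rather than hard-coding hdfs:// paths:

```python
# Illustrative sketch: downstream jobs read through a configurable root URI instead of
# hard-coding hdfs:// paths, so retargeting storage (e.g., to s3a:// or abfs://) becomes
# a configuration change rather than a code change. Names and keys are hypothetical.
import os
from pyspark.sql import SparkSession, DataFrame

# Resolved from the environment (or a config service) rather than baked into code.
STORAGE_ROOT = os.environ.get("DATA_LAKE_ROOT", "hdfs:///data")

def read_curated(spark: SparkSession, dataset: str) -> DataFrame:
    """Read a curated dataset by logical name, independent of the physical filesystem."""
    return spark.read.parquet(f"{STORAGE_ROOT}/curated/{dataset}")
```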

4. Security & Multi-Tenancy

  • Multi-Tenant Governance: Design a robust "Quotas and Queues" system to ensure a single team's rogue Spark job doesn't crash the cluster for everyone else (see the illustrative sketch after this list).
  • Unified Security: Standardize Apache Ranger policies and Kerberos implementation across all nodes.
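
As a minimal illustration of the multi-tenant guardrails described above (the queue name and resource limits are assumptions, not the client's actual policy), a team's Spark job could be required to target a capacity-capped YARN queue:

```python
# Illustrative sketch: submitting work to a named, capacity-capped YARN queue so one
# team's job cannot consume the whole cluster. The queue name and limits are assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("team-analytics-job")
    .config("spark.yarn.queue", "analytics")                # team-specific queue with its own quota
    .config("spark.dynamicAllocation.maxExecutors", "50")   # hard ceiling on executors
    .config("spark.executor.memory", "8g")
    .config("spark.executor.cores", "4")
    .getOrCreate()
)
```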

Technical Requirements

  • Expert-Level Hadoop: Mastery of the Cloudera/Hortonworks stack, specifically Hive LLAP, YARN, and HDFS.
  • Standardization Experience: Proven track record of creating Enterprise Design Standards used by multiple engineering teams.
  • Processing Frameworks: Deep knowledge of Spark (Core/SQL) optimization and Kafka event-driven architecture.
  • Tooling Mastery: Experience with Apache foundation services such as Apache Atlas for lineage and Apache Ranger for centralized security.
  • Soft Skills: Ability to influence senior leadership and guide diverse engineering teams without direct reporting authority.
