Role: Data Architect with Google Cloud Platform Experience
Location: Dallas, TX or Charlotte, NC (Onsite)
Duration: 12 Months
Identity & Access Management (IAM) Data Modernization
Migration of an on-premises SQL data warehouse to a modern enterprise data lake platform, enabling analytics and GenAI use cases. The platform uses PySpark-based processing, CI/CD pipelines, and containerised deployments on OpenShift (OCP), with Google Cloud Platform as the preferred cloud, to deliver scalable, secure, and high-performance data solutions.
About the Program
The IAM Data Modernization program focuses on transforming legacy data platforms into a scalable and cloud-compatible architecture.
Key Highlights:
Integration Scope: 30+ source systems with multiple downstream integrations
Capabilities: Metrics, reporting, advanced analytics, and GenAI use cases (natural-language querying, summarisation, cross-domain insights)
Benefits:
Scalable and resilient data platform
High-performance semantic and analytics layer
Single source of truth for enterprise-wide reporting and analytics
Role Summary
We are looking for a Data Architect with strong expertise in OpenShift (OCP), PySpark, and CI/CD pipelines to design and govern scalable data platforms.
The role requires defining end-to-end data architecture, containerised deployment patterns, orchestration strategies (Airflow/Autosys), and platform standards, along with hands-on involvement in implementation.
Key Responsibilities
Data Architecture & Platform Design
Define the enterprise data architecture for the IAM data lake and analytics platform
Design scalable, modular, and containerised data pipeline architectures on OCP
Establish data models, schema governance, and data lifecycle strategies
Define best practices for data partitioning, performance optimisation, and cost efficiency
OpenShift (OCP) & Platform Engineering
Architect and govern containerised data workloads on OpenShift (OCP)
Define standards for deployment, scaling, and workload isolation
Collaborate with DevOps teams on platform engineering and infrastructure alignment
Big Data & Processing (PySpark Focus)
Define architecture for PySpark-based batch and near real-time processing pipelines
Provide guidance on distributed processing design, optimisation, and performance tuning
Establish reusable frameworks for ETL/ELT processing
Data Ingestion & Orchestration
Architect data ingestion frameworks (batch, streaming, CDC)
Define orchestration strategies using Airflow / Autosys
Implement standards for retries, backfills, dependency management, and error handling
DevOps / CI-CD
Define and oversee CI/CD strategy for data and platform deployments
Enable automation of build, test, and deployment processes
Ensure integration of CI/CD pipelines with OCP-based environments
Cloud & Data Platforms (Preferred)
Provide architecture guidance for Google Cloud Platform-based data platforms (preferred, not mandatory)
Define integration patterns for hybrid environments spanning cloud-native and on-premises systems
Guide teams on cloud migration strategies and modern data platform adoption
Data Governance, Quality & Observability
Define frameworks for:
Data quality, validation, and lineage
Metadata management and cataloguing
Establish monitoring, logging, alerting, and SLOs for platform reliability
Ensure compliance with data security and audit requirements
Stakeholder Collaboration
Work closely with client architects, IAM teams, and business stakeholders
Translate business requirements into scalable technical architecture
Provide architectural guidance and mentorship to engineering teams
Required Skills:
Core Skills (Must Have)
Strong experience in:
OpenShift (OCP) / Kubernetes-based platforms
PySpark / Spark ecosystem
CI/CD implementation for data platforms
Airflow / Autosys orchestration tools
Solid understanding of:
Data lake architectures (layered models)
ETL/ELT design patterns
Distributed data processing concepts
Data Engineering & Storage
Expertise in:
Data formats: Parquet, ORC, Avro
Partitioning and performance tuning
Large-scale data modelling for analytics
Cloud (Preferred, Not Mandatory)
Experience with Google Cloud Platform (GCP) (preferred)
Exposure to services such as BigQuery, Dataproc, Dataflow, and Cloud Storage (GCS) is a plus
Observability & Reliability
Experience defining:
Monitoring, logging, alerting frameworks
Dashboards, SLOs, and operational runbooks
Good to Have
Experience with IAM domain / cybersecurity data
Understanding of data security and access control frameworks
Exposure to GenAI-enabled data platforms
Experience in Agile delivery and team leadership
Qualifications
Experience:
10–14+ years in Data Architecture / Data Engineering
Strong experience in OCP, PySpark, CI/CD, and orchestration frameworks
Prior experience in data modernisation / migration programs
Education:
Bachelor's/Master's degree in Computer Science, Information Systems, or equivalent
Certifications (Preferred):
OpenShift / Kubernetes certifications
Google Cloud Platform certifications (preferred, not mandatory)