Role: Google Cloud Data Architect - IAM Data Modernization
Location: Dallas, TX / Charlotte, NC (hybrid, 3 days in office)
OpenShift Container Platform (OCP) experience highly preferred
Project/Program
Identity & Access Management (IAM) Data Modernization: migration of an on-premises SQL data warehouse to a target-state Data Lake on Google Cloud Platform (GCP), enabling metrics and reporting, advanced analytics, and GenAI use cases (natural language querying, accelerated summarization, cross-domain trend analysis). The program leverages PySpark-based processing, cloud-native DevOps CI/CD pipelines, and containerized deployments on OpenShift Container Platform (OCP) to deliver scalable, secure, and high-performance data solutions.
About Program/Project
The IAM Data Modernization project migrates an on-premises SQL data warehouse to a target-state Data Lake in a Google Cloud Platform (GCP) environment. Key highlights include:
- Integration Scope: 30+ source system data ingestions and multiple downstream integrations
- Capabilities: Metrics, reporting, and GenAI use cases with natural language querying, advanced pattern/trend analysis, faster summarization, and cross-domain metric monitoring
Benefits:
- Scalability and access to advanced cloud functionality
- Highly available and performant semantic layer with historical data support
- Unified data strategy for executive reporting, analytics, and GenAI across cyber domains
This modernization establishes a single source of truth for enterprise-wide data-driven decision-making.
Required Skills
DevOps / CI-CD
- Experience implementing CI/CD pipelines for data and analytics workloads
- Familiarity with Git-based source control, build automation, and deployment strategies
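For flavor, a minimal sketch of the kind of check such a pipeline runs on every commit before a deployment is promoted; the transform and test are hypothetical examples, not project code:

```python
# A pytest-style unit test of a small transform, run in CI on every commit.
# `normalize_emails` is a hypothetical example transform.
def normalize_emails(rows):
    """Lowercase and strip email addresses; drop rows without one."""
    return [
        {**r, "email": r["email"].strip().lower()}
        for r in rows
        if r.get("email")
    ]

def test_normalize_emails_lowercases_and_drops_empties():
    rows = [{"id": 1, "email": " Alice@Example.COM "}, {"id": 2, "email": ""}]
    assert normalize_emails(rows) == [{"id": 1, "email": "alice@example.com"}]
```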
Containers & Platform
- Experience with OpenShift Container Platform (OCP) for deploying data workloads and services
- Understanding of containerized architecture, scaling, and environment management
- Proven ability to build CI/CD pipelines for data and infrastructure workloads
- Experience managing secrets securely using Google Cloud Platform Secret Manager
- Ownership of observability, SLOs, dashboards, alerts, and runbooks
- Proficiency in logging, monitoring, and alerting for data pipelines and platform reliability
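One concrete shape the secrets requirement takes is reading credentials from Secret Manager at startup instead of baking them into images or OCP deployment configs. A minimal sketch using the google-cloud-secret-manager client; the project and secret IDs are placeholders:

```python
# Fetch a secret payload from GCP Secret Manager at startup.
# "my-project" and "warehouse-db-password" are placeholder identifiers.
from google.cloud import secretmanager

def get_secret(project_id: str, secret_id: str, version: str = "latest") -> str:
    client = secretmanager.SecretManagerServiceClient()
    name = f"projects/{project_id}/secrets/{secret_id}/versions/{version}"
    response = client.access_secret_version(request={"name": name})
    return response.payload.data.decode("utf-8")

db_password = get_secret("my-project", "warehouse-db-password")
```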
Big Data & Processing
- Hands-on experience with PySpark for ETL/ELT, data transformation, and performance optimization
- Solid understanding of distributed data processing concepts
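By way of illustration, a minimal PySpark sketch of the kind of transform this covers: deduplicate raw events on a business key and write partitioned Parquet. Paths, columns, and the dedup key are illustrative assumptions:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("iam-etl-sketch").getOrCreate()

raw = spark.read.json("gs://raw-bucket/iam_events/")  # placeholder path

# Keep only the most recent record per (user_id, event_type).
w = Window.partitionBy("user_id", "event_type").orderBy(F.col("event_ts").desc())
deduped = (
    raw.withColumn("rn", F.row_number().over(w))
       .filter(F.col("rn") == 1)
       .drop("rn")
       .withColumn("event_date", F.to_date("event_ts"))
)

# Partition by date so downstream queries can prune at the storage layer.
deduped.write.mode("overwrite").partitionBy("event_date").parquet(
    "gs://curated-bucket/iam_events/"
)
```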
Data & Cloud Architecture
- Strong experience designing data platforms on Google Cloud Platform (GCP)
- Experience with Data Lakes, data warehousing, and large-scale migration programs
Data Lake Architecture & Storage
- Proven experience designing and implementing data lake architectures (e.g., Bronze/Silver/Gold or layered models)
- Strong knowledge of Cloud Storage (GCS) design, including bucket layout, naming conventions, lifecycle policies, and access controls
- Experience with Hadoop/HDFS architecture, distributed file systems, and data locality principles
- Hands-on experience with columnar data formats (Parquet, Avro, ORC) and compression techniques
- Expertise in partitioning strategies, backfills, and large-scale data organization
- Ability to design data models optimized for analytics and BI consumption
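As a sketch of how a layered lake and lifecycle policy might be encoded with the google-cloud-storage client; bucket names and retention ages are assumptions, not the program's actual layout:

```python
from google.cloud import storage

# Bronze/Silver/Gold convention: one bucket per layer (placeholder names).
LAYERS = {
    "bronze": "iam-lake-bronze",  # raw, immutable ingests
    "silver": "iam-lake-silver",  # cleansed, deduplicated Parquet
    "gold": "iam-lake-gold",      # curated, analytics-ready models
}

client = storage.Client()
bucket = client.bucket(LAYERS["bronze"])
# Tier raw data to Coldline after 90 days and delete it after 2 years.
bucket.add_lifecycle_set_storage_class_rule("COLDLINE", age=90)
bucket.add_lifecycle_delete_rule(age=730)
bucket.patch()
```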
Data Ingestion & Orchestration
- Experience building batch and streaming ingestion pipelines using GCP-native services
- Knowledge of Pub/Sub-based streaming architectures, event schema design, and versioning
- Strong understanding of incremental ingestion and CDC patterns, including idempotency and deduplication
- Hands-on experience with workflow orchestration tools (Cloud Composer / Airflow)
- Ability to design robust error handling, replay, and backfill mechanisms
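A minimal Cloud Composer / Airflow sketch of how these pieces typically wire together: an incremental ingest task feeding a transform, with retries for basic error handling and catchup enabled for backfills. The task bodies, schedule, and DAG id are illustrative placeholders:

```python
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest_increment(**context):
    ...  # pull rows changed since the last watermark (CDC-style)

def transform_to_silver(**context):
    ...  # trigger the PySpark job that cleanses and deduplicates

with DAG(
    dag_id="iam_daily_ingest",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=True,  # lets Airflow backfill missed intervals
    default_args={"retries": 3, "retry_delay": timedelta(minutes=10)},
) as dag:
    ingest = PythonOperator(task_id="ingest_increment",
                            python_callable=ingest_increment)
    transform = PythonOperator(task_id="transform_to_silver",
                               python_callable=transform_to_silver)
    ingest >> transform
```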
Data Processing & Transformation
- Experience developing scalable batch and streaming pipelines using Dataflow (Apache Beam) and/or Spark (Dataproc)
- Strong proficiency in BigQuery SQL, including query optimization, partitioning, clustering, and cost control
- Hands-on experience with Hadoop MapReduce and ecosystem tools (Hive, Pig, Sqoop)
- Advanced Python programming skills for data engineering, including testing and maintainable code design
- Experience managing schema evolution while minimizing downstream impact
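For flavor, a minimal Apache Beam (Dataflow-runnable) batch sketch that parses JSON events and loads them into a BigQuery table; the project, bucket, table, and schema are assumptions, and a real run would also need a GCS temp location in the pipeline options:

```python
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def parse_event(line: str) -> dict:
    e = json.loads(line)
    return {"user_id": e["user_id"],
            "event_type": e["event_type"],
            "event_ts": e["event_ts"]}

with beam.Pipeline(options=PipelineOptions()) as p:
    (
        p
        | "Read" >> beam.io.ReadFromText("gs://raw-bucket/iam_events/*.json")
        | "Parse" >> beam.Map(parse_event)
        | "Write" >> beam.io.WriteToBigQuery(
            "my-project:iam_lake.events",
            schema="user_id:STRING,event_type:STRING,event_ts:TIMESTAMP",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
        )
    )
```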
Analytics & Data Serving
- Expertise in BigQuery performance optimization and data serving patterns
- Experience building semantic layers and governed metrics for consistent analytics
- Familiarity with BI integration, access controls, and dashboard standards
- Understanding of data exposure patterns via views, APIs, or curated datasets
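One common serving pattern is exposing curated logic as a governed BigQuery view so every dashboard computes the metric the same way. A sketch with the google-cloud-bigquery client; dataset, table, and metric names are placeholders:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

# A governed metric exposed as a view in a "semantic" dataset.
view = bigquery.Table("my-project.semantic.iam_daily_active_users")
view.view_query = """
    SELECT event_date, COUNT(DISTINCT user_id) AS daily_active_users
    FROM `my-project.iam_lake.events`
    GROUP BY event_date
"""
client.create_table(view, exists_ok=True)
```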
Data Governance, Quality & Metadata
- Experience implementing data catalogs, metadata management, and ownership models
- Understanding of data lineage for auditability and troubleshooting
- Strong focus on data quality frameworks, including validation, freshness checks, and alerting
- Experience defining and enforcing data contracts, schemas, and SLAs
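As an example of what a freshness check might look like in practice: query the newest timestamp in a curated table and fail when it falls outside the agreed SLA. The table name and 24-hour SLA are assumptions; the failure hook is a placeholder for paging/alerting:

```python
from datetime import datetime, timedelta, timezone
from google.cloud import bigquery

SLA = timedelta(hours=24)  # assumed freshness contract
client = bigquery.Client()

row = next(iter(client.query(
    "SELECT MAX(event_ts) AS latest FROM `my-project.iam_lake.events`"
).result()))

if row.latest is None or datetime.now(timezone.utc) - row.latest > SLA:
    # Placeholder: wire this into the alerting/paging system.
    raise RuntimeError(f"Freshness SLA breached: latest={row.latest}")
```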
Good to have
- Security, Privacy & Compliance: hands-on experience implementing fine-grained access controls for BigQuery and GCS (see the sketch after this list)
- Experience with sprint planning and providing technical guidance to the team
- Strong stakeholder communication and solution-architecture skills
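The access-control sketch referenced above: grant a reader group access to the curated dataset only, leaving the raw layers locked down. Group and dataset names are placeholders; row- and column-level policies would layer on top:

```python
from google.cloud import bigquery

client = bigquery.Client()
dataset = client.get_dataset("my-project.semantic")

# Append a dataset-level READER grant for a BI analyst group.
entries = list(dataset.access_entries)
entries.append(bigquery.AccessEntry(
    role="READER",
    entity_type="groupByEmail",
    entity_id="bi-analysts@example.com",
))
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])
```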
Qualifications
- Experience: 10-14+ years in DevOps and Data Architecture, including 5+ years designing on PySpark/GCP/OCP at scale; prior on-premises-to-cloud migration experience is a must.
- Education: Bachelor's/Master's in Computer Science, Information Systems, or equivalent experience.
- Certifications: Google Cloud Professional Cloud Architect, Professional Cloud DevOps Engineer, or an OCP certification (required at hire or within 3 months). Plus: Professional Data Engineer, Professional Cloud Security Engineer.