Data Architect
Plano, TX
Hybrid
We are seeking a hands-on Data Architect to design and evolve an AWS-based data platform spanning streaming ingestion (Kafka), API/enterprise integration (MuleSoft), containerized data services (EKS), a data lake on S3, interactive query with Athena, and analytics/reporting on Snowflake and Tableau.
You will set data architecture standards, lead solution design, and guide engineering teams to deliver a scalable, secure, and cost-efficient platform that accelerates product and analytics use cases.
Key Responsibilities
Architecture & Design
- Own the end-to-end data architecture across ingestion, storage, processing, serving, and visualization layers.
- Define canonical data models and domain data contracts; lead conceptual/logical/physical data modelling and schema design for batch and streaming use cases.
- Establish reference architectures and patterns for event-driven and API-led data integration (Kafka, MuleSoft).
- Design secure, multi-account AWS topologies (VPC, IAM, KMS) for data workloads; enforce governance, lineage, and cataloguing.
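To make the data-contract idea concrete, here is a minimal sketch of a versioned domain event model in Python; the event, its fields, and the version scheme are illustrative assumptions, not an existing contract.

```python
from dataclasses import dataclass
from datetime import datetime
from uuid import UUID

# Hypothetical data contract for an "order placed" domain event.
# The schema is versioned explicitly so consumers can distinguish
# additive changes from breaking ones (mirroring Avro/Protobuf
# schema-evolution rules).
SCHEMA_VERSION = "1.0.0"

@dataclass(frozen=True)
class OrderPlaced:
    event_id: UUID          # globally unique; enables idempotent consumers
    order_id: str           # stable business key across systems
    customer_id: str
    amount_cents: int       # integer cents avoid float rounding drift
    currency: str           # ISO 4217 code, e.g. "USD"
    occurred_at: datetime   # event time in UTC, not processing time
    schema_version: str = SCHEMA_VERSION
```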
Platform Enablement (New Platform Buildout)
- Lead the blueprint and incremental rollout of a new AWS data platform, including landing, raw, and curated zones on S3, Athena for ad-hoc/interactive SQL, and Snowflake for governed analytics and reporting.
- Define platform SLAs/SLOs, cost guardrails, and chargeback/showback models; optimize storage and compute footprints.
- Partner with DevOps to run containerized data services on EKS (e.g., stream processors, microservices, connectors) and automate with CI/CD.
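As one illustration of zone layout and cost guardrails, the sketch below applies S3 lifecycle rules to landing/raw/curated prefixes with boto3; the bucket name, prefixes, and retention windows are placeholders, not this platform's actual policy.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical single-bucket layout for the landing -> raw -> curated zones.
# Lifecycle rules expire the transient landing zone and tier raw data to
# cheaper storage; curated data stays on S3 Standard for Athena queries.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-data-lake",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-landing",
                "Filter": {"Prefix": "landing/"},
                "Status": "Enabled",
                "Expiration": {"Days": 7},  # landing files are transient
            },
            {
                "ID": "tier-raw",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 90, "StorageClass": "STANDARD_IA"},
                    {"Days": 365, "StorageClass": "GLACIER"},
                ],
            },
        ]
    },
)
```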
Data Integration & Processing
- Guide ingestion patterns: Kafka topics/partitions, retention, compaction, schema evolution (Avro/Protobuf), and DLQ strategies (see the first sketch after this list).
- Architect MuleSoft APIs/flows for system-to-system data exchange and orchestration; standardize API contracts and security.
- Define Athena query strategies, partitioning, file formats (Parquet/ORC), and table metadata practices for performance and cost (see the second sketch after this list).
- Set patterns for CDC, bulk/batch ETL/ELT, and stream processing; select fit-for-purpose transformation engines.
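Two brief sketches of the patterns above, both under assumed names and endpoints. First, a typical ingestion topic created with the confluent-kafka admin API: partitioned for consumer-group parallelism, time-based retention for replay, and a companion dead-letter topic for poison messages.

```python
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})  # placeholder broker

# Main ingestion topic plus a DLQ companion topic.
orders = NewTopic(
    "orders.v1",                     # hypothetical topic name
    num_partitions=12,               # sized to consumer-group parallelism
    replication_factor=3,
    config={
        "retention.ms": str(7 * 24 * 60 * 60 * 1000),  # 7-day replay window
        "cleanup.policy": "delete",
    },
)
orders_dlq = NewTopic("orders.v1.dlq", num_partitions=3, replication_factor=3)

for topic, future in admin.create_topics([orders, orders_dlq]).items():
    future.result()  # raises if the broker rejected the topic
```

Second, a partitioned Parquet table registered through Athena, where partition pruning keeps scan volume (and therefore cost) proportional to the data actually read; the database, table, and S3 paths are hypothetical.

```python
import boto3

athena = boto3.client("athena")

# Date-partitioned external table over Parquet files in the curated zone.
ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS curated.orders (
    order_id     string,
    customer_id  string,
    amount_cents bigint,
    currency     string
)
PARTITIONED BY (dt date)
STORED AS PARQUET
LOCATION 's3://example-data-lake/curated/orders/'
"""

athena.start_query_execution(
    QueryString=ddl,
    QueryExecutionContext={"Database": "curated"},
    ResultConfiguration={"OutputLocation": "s3://example-data-lake/athena-results/"},
)
```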
Analytics, Reporting & Self-Service
- Shape a semantic layer and governed Snowflake models (data vault/star schemas) to serve BI and data science.
- Enable business teams with Tableau dashboards, certified data sources, and governance for KPI definitions and refresh cadences.
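A minimal star-schema sketch issued through the Snowflake Python connector; the connection parameters, table names, and columns are illustrative only, not the governed models this role would define.

```python
import snowflake.connector

# Placeholder connection parameters.
conn = snowflake.connector.connect(
    account="example_account", user="example_user", password="example",
    warehouse="ANALYTICS_WH", database="ANALYTICS", schema="MART",
)
cur = conn.cursor()

# One conformed dimension and one fact table: the smallest star schema.
cur.execute("""
    CREATE TABLE IF NOT EXISTS dim_customer (
        customer_sk  NUMBER AUTOINCREMENT PRIMARY KEY,
        customer_id  STRING NOT NULL,   -- business key
        segment      STRING,
        valid_from   TIMESTAMP_NTZ      -- supports slowly changing dimensions
    )
""")
cur.execute("""
    CREATE TABLE IF NOT EXISTS fct_orders (
        order_id     STRING NOT NULL,
        customer_sk  NUMBER REFERENCES dim_customer (customer_sk),
        amount_cents NUMBER,
        order_date   DATE               -- a natural Tableau filter column
    )
""")
```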
Security, Governance & Quality
- Implement data classification, encryption, access controls (RBAC/ABAC), masking/tokenization, and audit trails.
- Establish data quality standards, SLOs, observability (freshness, completeness, accuracy), and automated validation.
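The sketch below shows what automated freshness and completeness checks against such SLOs might look like; the thresholds and inputs are assumptions for illustration, with real values coming from table metadata in practice.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical data-quality SLOs for a curated table, evaluated on each
# pipeline run by the orchestrator.
FRESHNESS_SLO = timedelta(hours=2)     # newest data must be < 2h old
COMPLETENESS_SLO = 0.995               # >= 99.5% of expected rows present

def check_freshness(last_loaded_at: datetime) -> bool:
    """Fail if the newest partition is older than the freshness SLO."""
    return datetime.now(timezone.utc) - last_loaded_at <= FRESHNESS_SLO

def check_completeness(actual_rows: int, expected_rows: int) -> bool:
    """Fail if row counts fall below the completeness threshold."""
    return expected_rows > 0 and actual_rows / expected_rows >= COMPLETENESS_SLO

# Example evaluation with made-up inputs.
assert check_freshness(datetime.now(timezone.utc) - timedelta(minutes=30))
assert check_completeness(actual_rows=996, expected_rows=1000)
```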
Leadership & Collaboration
- Provide architecture runway, backlog guidance, and technical mentorship for data engineers, API/streaming engineers, and BI developers.
- Partner with Product, Security, and Compliance to align roadmaps, standards, and delivery milestones.
- Produce decision records, diagrams, and guidance that make complex designs easy to adopt.
Required Qualifications
- 8+ years in data architecture/engineering with 3+ years architecting on AWS.
- Proven design of S3-based data lakes with robust partitioning, lifecycle policies, and metadata/catalogue strategy.
- Hands-on experience with Kafka (topic design, schema evolution, consumer groups, throughput/latency tuning).
- Practical MuleSoft integration design (API-led connectivity, RAML/OAS, policies, governance).
- Production experience with Amazon EKS for data/streaming microservices and connectors.
- Strong SQL and performance tuning with Athena; expertise in selecting file formats and partitioning for cost/performance.
- Data warehousing on Snowflake (ELT, clustering, resource monitors, security) and delivering analytics via Tableau.
- Mastery of data modelling (3NF, dimensional/star, data vault), data contracts, and event modelling.
- Solid foundations in security, IAM/KMS, networking for data platforms, and cost management.
Preferred Qualifications
- Experience with schema registries, stream-processing frameworks, and change data capture.
- Background in data governance (catalogue/lineage), metadata automation, and compliance frameworks.
- Familiarity with DevOps practices for data (pipeline CI/CD, environment promotion, GitOps).
- Prior work enabling self-service analytics and establishing an enterprise semantic layer.
Tools & Technologies (Environment)
AWS: S3, EKS, Athena, IAM, KMS, CloudWatch, Glue/Lake Formation (as applicable).
Streaming & Integration: Kafka (+ Schema Registry), MuleSoft.
Warehouse & BI: Snowflake, Tableau.
Data Formats: Parquet/ORC/Avro/Protobuf; partitioning/bucketing best practices.
Observability & Quality: Metrics, lineage, DQ checks, and alerting (tooling per org standard).