Job Details
Job Title: Data Architect / Lead Data Engineer
Location: Hybrid (preferred in the Maryland/DC/VA area)
Employment Type: Full-time / Contract
Client: Federal
Job Summary:
We are seeking a highly experienced Data Architect / Lead Data Engineer to design and implement scalable data platforms, data integration frameworks, and analytics solutions. The ideal candidate should have a proven track record in data engineering, cloud migration, data governance, and leading end-to-end data pipeline development using modern data technologies.
Key Responsibilities:
Architect, design, and lead development of scalable data pipelines and cloud-based data lakes and lakehouses.
Lead the adoption and setup of Databricks and modern cloud-based data platforms (AWS, OCI).
Oversee PII data classification, governance, and data quality assurance using tools like Informatica Cloud, Alation, and Informatica CLAIRE.
Collaborate with cross-functional teams including data scientists, BI analysts, and business stakeholders to understand data needs and deliver solutions.
Optimize performance of large-scale data ingestion and transformation frameworks using Spark, PySpark, and Airflow.
Lead the migration of legacy data warehouses (e.g., Netezza, Oracle) to cloud platforms (AWS, OCI).
Build and manage reporting solutions using tools like Tableau, Power BI, MicroStrategy, and Superset.
Mentor junior data engineers and analysts, conduct code reviews, and enforce data engineering best practices.
Implement metadata models, monitoring dashboards, and orchestration frameworks for robust job execution and data delivery.
Ensure data governance, profiling, and secure handling of PII data in compliance with enterprise and regulatory standards.
Required Qualifications:
Bachelor's or Master's degree in Computer Science, Engineering, or a related field (IIT or similar preferred).
10+ years of experience in data engineering and architecture roles.
Strong hands-on experience with:
Cloud: AWS (S3, EMR, Lambda, RDS, Glue, Athena), OCI
Big Data Tools: Spark, PySpark, Hive, Airflow
ETL/ELT: Informatica Cloud (IICS), Glue, Sqoop
Databases: Oracle, PostgreSQL, Redshift, Netezza
Data Modeling: ERwin, SAP PowerDesigner
Reporting: MicroStrategy, SAP Business Objects, Tableau, Superset
Expertise in data governance, profiling, quality checks, and cataloging tools such as Alation and Informatica CLAIRE.
Experience leading enterprise data migration and modernization initiatives.
Strong skills in Python, SQL, Scala, and Bash scripting.
Certifications Preferred:
AWS Certified Data Engineer Associate
AWS Certified Solutions Architect Associate
Databricks Certified Data Engineer Professional
Databricks Certified Associate Developer for Apache Spark 3.0