ROLE SUMMARY
We are looking for a Senior Data Platform / Migration Engineer to lead the modernization of an enterprise data ecosystem, including the migration from Cloudera (DataIQ, DSS, CDP) to MapR. The role requires deep expertise in large-scale distributed data systems, migration strategy, and performance optimization, with a strong focus on zero data loss, minimal downtime, and production stability.
KEY RESPONSIBILITIES
- Lead the end-to-end migration of the enterprise data lake from Cloudera (DataIQ, DSS, CDP) to MapR
- Define and execute migration strategy ensuring data integrity, minimal downtime, and rollback readiness
- Design and build scalable, production-grade data pipelines post-migration
- Optimize cluster performance, including compute, storage, and resource utilization
- Partner with BI/reporting teams to ensure schema consistency and data availability
- Implement data validation frameworks to verify data accuracy and completeness after migration
- Document architecture, runbooks, lineage, and operational procedures
- Collaborate with governance teams on data quality, lineage, and compliance requirements
REQUIRED SKILLS AND EXPERIENCE
- 8+ years of experience in data engineering or data platform engineering
- Strong hands-on experience with Cloudera (CDP, DSS, DataIQ) and/or MapR
- Strong hands-on experience with Apache Spark, Hive, Hadoop, and HDFS
- Proven experience executing large-scale data lake migrations
- Strong programming skills in Python, Scala, or SQL
- Deep understanding of distributed data processing and storage systems
- Experience with ETL/ELT frameworks (Informatica, Talend, dbt, or similar)
PREFERRED QUALIFICATIONS
- Prior MapR implementation or certification
- Experience with streaming platforms (Kafka, Pulsar)
- Exposure to cloud-native data platforms and storage (AWS S3, Azure Data Lake, Google Cloud Platform)
- Familiarity with data governance, lineage, and catalog tools
- Experience working in high-scale enterprise environments (multi-terabyte to petabyte scale)
CORE TECHNOLOGY STACK
Cloudera DSS / DataIQ / CDP, MapR, Apache Spark, Hive, Hadoop, HDFS, Kafka, Python, SQL, dbt, Informatica / Talend