Cloud DevOps Administrator Expert
Important Details
Position Type: Remote
Client:
Position Overview
The Cloudera Data Engineer will play a key role in supporting the migration of a Medicaid Data Warehouse within an AWS cloud environment.
This role focuses on ensuring the seamless migration and continued operation of an existing Cloudera/Hive/Scala-based data pipeline from one AWS account to another while maintaining data integrity, system performance, and operational stability.
The selected consultant will collaborate closely with the AWS infrastructure team (VPC, IAM, S3, EC2, and networking) to replicate, configure, and optimize the Cloudera ecosystem for the new environment.
Key Responsibilities
Migration & Configuration
• Replicate and configure existing Cloudera clusters (HDFS, YARN, Hive, Spark) within the new AWS account.
• Coordinate with the infrastructure team to ensure proper provisioning of EC2, IAM roles, security groups, and networking.
• Reconfigure cluster connectivity and job dependencies for the migrated environment.
• Migrate and validate metadata stores, including Hive Metastore, job configurations, and dependencies.
• Validate data integrity and ensure job outputs match the source environment.
Post-Migration Operations
• Deploy, test, and operate existing Hive and Spark (Scala) jobs post-migration.
• Maintain and manage job schedules, dependencies, and runtime configurations.
• Monitor job performance and identify optimization opportunities.
• Troubleshoot pipeline and cluster issues, and implement automated recovery and alerting mechanisms.
Cluster Management
• Monitor Cloudera Manager dashboards to ensure cluster health and efficient resource utilization.
• Manage user roles, permissions, and access within the Cloudera ecosystem.
• Implement data cleanup, archiving, and housekeeping tasks to maintain system efficiency.
• Create and maintain detailed migration documentation and operational runbooks.
Required Skills & Experience
Education:
• Bachelor’s degree in Computer Science, Information Systems, or a related discipline
Experience:
• 7+ years of experience in data engineering or big data development
• 4+ years working with the Cloudera platform (HDFS, YARN, Hive, Spark, Oozie)
• Proven experience deploying and operating Cloudera workloads on AWS (EC2, S3, IAM, CloudWatch)
• Strong proficiency in Scala, Java, and HiveQL; scripting skills in Python or Bash preferred
• Advanced skills in Apache Spark & Scala programming for data processing and transformation
• Hands-on experience with Cloudera Hadoop distributions and Drools-based business rules processing
• Ability to collaborate effectively with infrastructure, DevOps, and data governance teams in complex enterprise environments
Preferred Qualifications
• Cloudera Certification: CDP Data Engineer or Cloudera Administrator
• Experience performing Cloudera version upgrades or AWS-to-AWS migrations
• Experience in public sector or large enterprise data warehouse environments
Ideal Candidate Profile
The ideal candidate is a technically strong, detail-oriented data engineer with proven expertise in Cloudera, Hadoop, and AWS-based ecosystems.
They should have a track record of managing complex data migrations, optimizing Spark/Scala workflows, and ensuring data consistency and operational reliability in production environments.