L3 – Cloudera Public Cloud Platform Engineer

Remote • Posted 2 hours ago • Updated 2 hours ago
Contract Corp To Corp
Contract Independent
Contract W2
No Travel Required
Remote
Depends on Experience

Job Details

Skills

  • Cloudera
  • NiFi
  • CDP

Summary

Role: L3 Cloudera Public Cloud Platform Engineer

Work location: Remote

Type: Contract  

  • 12+ years of experience in Big Data Platform Engineering / Cloud Platform Operations / Infrastructure roles
  • 6+ years of hands-on experience with the Cloudera ecosystem (CDH / CDP / Cloudera Public Cloud)
  • Demonstrated ability to quickly learn and adapt to new technologies and evolving platform capabilities, beyond the currently defined CDP stack
  • Strong expertise in:
    • End-to-end CDP platform operations (CDE, CDW, CDF, CDL, CAI)
    • Advanced troubleshooting across multi-cluster, multi-environment deployments
    • Kubernetes-based runtime environments (troubleshooting and diagnostics)
    • Observability frameworks, including SLIs/SLOs, alerting, and performance tuning
  • Proven experience in:
    • Leading P1/P2 incident response, triage, and resolution
    • Managing platform upgrades, patching, and lifecycle events
    • Supporting large-scale environments (TB/PB scale, high-concurrency workloads)
  • Strong understanding of:
    • Cloud infrastructure (IAM, VPC, networking, storage)
    • Security and governance (Ranger, Kerberos, TLS/SSL, SDX)
  • Expected to:
    • Lead complex troubleshooting and drive root cause resolution across platform layers
    • Mentor and guide L2 engineers
    • Coordinate with Cloudera support and infrastructure teams for critical issues
  • Hands-on experience in developing and troubleshooting NiFi (CDF) data flows, including:
    • Flow design and configuration
    • Processor-level debugging and performance tuning
    • Handling backpressure, throughput optimization, and failure recovery

Required Skills

  • Strong experience with Cloudera CDP Public Cloud
  • Expertise in:
    • Cloud platforms (AWS/Azure/Google Cloud Platform)
    • Kubernetes concepts (troubleshooting-focused)
  • Hands-on with:
    • CDE, CDW, CDF (NiFi), CAI
  • Knowledge of:
    • IAM, networking, observability tools
  • Experience with platforms operating at multi-terabyte to petabyte scale with high-concurrency workloads
  • Hands-on experience with:
    • Kafka (or similar streaming platforms) including monitoring, troubleshooting, and performance tuning
  • Experience with Cloudera CDP CLI (Command Line Interface) for:
    • Platform operations and administration
    • Job execution and service management (CDE/CDW/CDL)
    • Automation of routine operational tasks
  • Strong working knowledge of:
    • Cloud IAM (AWS IAM / Azure AD) including roles, policies, and cross-service access
    • User and group mapping across CDP, cloud IAM, and Ranger policies
    • Troubleshooting access issues across storage (S3/ADLS), CDP services, and data access layers

Preferred Skills

  • Experience with:
    • Modernization of legacy data platforms/applications to Cloudera CDP Public Cloud
    • Migration and onboarding of workloads to CDE, CDW, and CAI environments
    • Supporting hybrid or multi-environment transitions (on-prem → cloud)
  • Familiarity with:
    • Cloud platforms (AWS, Azure, Google Cloud Platform) including storage, IAM, and networking concepts
    • Kubernetes-based runtime environments (troubleshooting-focused)
  • Strong scripting and automation skills (Python, Shell, Terraform) for platform operations

What You’ll Work On

  • Enterprise-scale Cloudera CDP platform supporting data engineering, analytics, and AI workloads across multiple applications
  • Modernization of legacy platforms and applications into cloud-native CDP services
  • Operational support and scaling of:
    • Data services (CDE, CDW, CDF, CDL)
    • AI/ML platforms (CAI, inference, workbenches)
  • Platform performance optimization, observability, and reliability engineering for mission-critical workloads

Why This Role Matters

  • Ensures availability, stability, and performance of the CDP platform supporting all data and AI workloads
  • Enables successful modernization of legacy applications into scalable, cloud-native services
  • Maintains high availability, observability, and operational excellence across enterprise platforms
  • Acts as the backbone for data engineering, analytics, and AI initiatives
  • This role focuses on platform reliability and infrastructure operations and does not include data-layer ownership (e.g., Iceberg table management or data validation).

Job Summary

  • We are seeking a highly skilled Cloudera Public Cloud Platform Engineer to operate and manage the end-to-end CDP platform ecosystem, including data services, NiFi, Kafka, AI/ML platforms, and enterprise observability.
  • This role is responsible for ensuring availability, scalability, security, and performance of all platform services supporting data, analytics, and AI workloads across environments.
  • The ideal candidate brings strong expertise in CDP on-premises and public cloud services, cloud infrastructure, Kubernetes-based runtime environments, and platform observability, supporting high-concurrency, mission-critical workloads at multi-terabyte to petabyte scale.
  • This role is critical to ensuring uninterrupted operation of data, analytics, and AI platforms: any degradation directly impacts downstream business reporting, data pipelines, and model execution.

Key Responsibilities

CDP Platform & Multi-Service Operations

  • Own end-to-end operational responsibility for Cloudera Public Cloud services across Dev / Stage / UAT / Prod:
    • CDE, CDW, COD, CDL, CDF (NiFi), CDV, CAI, Kafka
  • Ensure multi-cluster stability, workload isolation, and SLA adherence
  • Support onboarding and operations of multiple applications across environments
  • Manage and support multi-environment, multi-cluster deployments with strict isolation, governance, and release coordination across Dev/UAT/Prod

AI/ML Platform Operations

  • Operate and support Cloudera AI (CAI) environments:
    • AI Workbenches, AI Studios
    • Model training and development environments
    • AI inference endpoints and model serving
  • Troubleshoot:
    • Resource contention (CPU/GPU)
    • Model deployment/runtime failures

CDP Runtime & Kubernetes-Aware Operations

  • Operate CDP services running on Cloudera-managed Kubernetes infrastructure
  • Apply strong understanding of containerized workloads and Kubernetes concepts for troubleshooting
  • Diagnose and resolve:
    • Pod failures, restarts, and resource contention
    • Spark job failures in containerized environments (CDE)
    • Service-to-service communication issues
  • Analyze logs and metrics to identify runtime failures and performance issues
  • Collaborate with Cloudera support for managed service-level issues

Data Integration & Platform Services

  • Operate and support:
    • CDF (NiFi) for ingestion pipelines
    • CDV (Data Visualization) for reporting workloads
    • Octopai for data lineage and catalog integration
  • Ensure reliability and performance of data pipelines and integrations
  • Monitor and troubleshoot Kafka environments:
    • Topic configurations, partitions, and replication
    • Consumer lag and throughput issues
    • Broker connectivity and performance bottlenecks
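Consumer lag, one of the Kafka signals listed above, is the gap between a partition's log-end offset and the offset the consumer group has committed. The Python sketch below shows only that arithmetic on plain dictionaries; it deliberately avoids any Kafka client library, so the input dictionaries, the function names `consumer_lag` and `lagging_partitions`, and the alert threshold are all hypothetical. In practice the same per-partition values are reported by Kafka's `kafka-consumer-groups.sh --describe` tool.

```python
from typing import Dict, List, Optional


def consumer_lag(end_offsets: Dict[int, int],
                 committed: Dict[int, Optional[int]]) -> Dict[int, int]:
    """Return per-partition lag: log-end offset minus committed offset.

    Assumption: a partition with no committed offset (missing or None) is
    counted from offset 0. That is only accurate while the log still starts
    at 0, i.e. before retention has deleted old segments.
    """
    lag = {}
    for partition, end in end_offsets.items():
        done = committed.get(partition)
        # Clamp at zero so a stale end-offset snapshot never yields negative lag.
        lag[partition] = max(0, end - (done if done is not None else 0))
    return lag


def lagging_partitions(lag: Dict[int, int], threshold: int) -> List[int]:
    """Return partitions whose lag exceeds a hypothetical alert threshold."""
    return sorted(p for p, value in lag.items() if value > threshold)


if __name__ == "__main__":
    # Illustrative numbers only, not taken from any real cluster.
    end = {0: 1500, 1: 2000, 2: 900}
    committed = {0: 1450, 1: 1200, 2: None}
    lag = consumer_lag(end, committed)
    print(lag)                                      # {0: 50, 1: 800, 2: 900}
    print(sum(lag.values()))                        # 1750
    print(lagging_partitions(lag, threshold=500))   # [1, 2]
```

Alerting per partition rather than only on total lag helps separate a single stuck partition (for example, a hot key or a failed consumer instance) from a group that is simply under-provisioned.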

Security, Governance & SDX Administration

  • Implement and manage:
    • Kerberos, TLS/SSL, Ranger policies
  • Administer SDX for:
    • Centralized security
    • Metadata and policy enforcement
  • Support Atlas and Octopai integration
  • Manage and troubleshoot user access and identity mapping across layers, including:
    • Cloud IAM roles and permissions
    • CDP users/groups and identity providers
    • Ranger policies for fine-grained data access
  • Resolve access-related issues impacting:
    • Data access (S3/ADLS)
    • Query execution (CDW/CDE)
    • Application and service-level permissions

Cloud Infrastructure & Networking

  • Troubleshoot:
    • S3 / ADLS storage issues
    • IAM roles and permissions
    • VPC, subnets, routing, security groups
    • Bastion host access and connectivity
  • Ensure secure and reliable connectivity across services
  • Understand and troubleshoot S3-based data lake patterns, including:
    • Bucket structure, prefix design, and access patterns
    • Performance issues related to small files, request rates, and throughput limits
    • Encryption (SSE-S3, SSE-KMS) and access policies
  • Manage and troubleshoot cross-account IAM roles and access patterns for CDP environments
  • Ensure secure access between:
    • CDP environments and cloud resources
    • Multiple AWS accounts (dev/prod separation)

Disaster Recovery & Resiliency

  • Support and validate disaster recovery and failover strategies across CDP environments
  • Ensure backup, recovery, and environment resiliency for critical workloads
  • Participate in DR drills and recovery validation

Observability, Monitoring & Alerting (Critical)

  • Implement and manage end-to-end observability:
    • Metrics, logs, and alerting
  • Use:
    • Cloudera Observability, Cloudera Manager, Prometheus, Grafana
  • Monitor:
    • Cluster health
    • Workload performance
    • AI inference endpoints
  • Enable proactive issue detection and prevention
  • Define and implement SLIs/SLOs and alerting thresholds to ensure platform reliability and performance
  • Support high-severity (P1/P2) incident response, triage, and resolution within defined SLAs
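The SLI/SLO and alerting-threshold items above usually reduce to error-budget arithmetic. For example, a 99.9% availability SLO over a 30-day window leaves (1 − 0.999) × 30 × 24 × 60 = 43.2 minutes of budget, and the burn rate compares the observed error rate to the 0.1% the SLO allows. The Python sketch below encodes that arithmetic; the function names, the 30-day default window, and the alert threshold are illustrative assumptions, not values defined for this role or by any Cloudera tool.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of full unavailability an availability SLO tolerates per window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction between 0 and 1, e.g. 0.999")
    return (1.0 - slo) * window_days * 24 * 60


def burn_rate(bad_events: int, total_events: int, slo: float) -> float:
    """Observed error rate divided by the error rate the SLO allows.

    1.0 means the budget would be used up exactly at the end of the window;
    values above 1.0 mean it will run out early.
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction between 0 and 1, e.g. 0.999")
    observed_error_rate = bad_events / total_events
    return observed_error_rate / (1.0 - slo)


def should_alert(rate: float, threshold: float) -> bool:
    """Fire when the burn rate exceeds a hypothetical, team-chosen threshold."""
    return rate > threshold


if __name__ == "__main__":
    # Example: a 99.9% availability SLO over a 30-day window.
    print(round(error_budget_minutes(0.999), 1))   # 43.2
    rate = burn_rate(bad_events=20, total_events=10_000, slo=0.999)
    print(round(rate, 2))                          # 2.0
    print(should_alert(rate, threshold=1.0))       # True
```

Alerting on burn rate rather than on raw error counts ties the threshold directly to the SLO, so the same rule scales across services with very different traffic volumes.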

Operational Support & On-Call

  • Participate in on-call rotation to support 24/7 platform operations
  • Respond to production incidents, alerts, and service disruptions within defined SLAs
  • Handle P1/P2 incidents, including triage, troubleshooting, and resolution
  • Perform root cause analysis (RCA) and implement preventive measures

Upgrades, Patching & Platform Lifecycle

  • Execute:
    • CDP upgrades and version management
    • Security patches and hotfixes
  • Perform:
    • Rolling upgrades
    • Validation and rollback strategies

Performance Optimization & Cost Efficiency

  • Optimize:
    • Platform-level performance (Spark, Hive, Impala workloads)
    • Cluster utilization and workload distribution
  • Drive:
    • Autoscaling strategies
    • Cost optimization (FinOps practices)

Automation & Operational Excellence

  • Utilize and support existing automation frameworks for:
    • Platform provisioning
    • Monitoring and alerting
    • Routine operational tasks
  • Work with infrastructure teams that manage Infrastructure-as-Code (Terraform) for environment setup and changes
  • Leverage scripting (Python / Shell) for:
    • Operational support
    • Task automation
    • Troubleshooting and diagnostics
  • Maintain and follow runbooks, SOPs, and operational procedures to ensure consistent platform operations

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
  • Dice Id: 10330808
  • Position Id: 96581-5195-

Company Info

About VDart, Inc.

VDart, headquartered in Atlanta, GA, is a global leader in digital talent solutions and IT staffing, delivering top technology professionals to businesses worldwide. With a strong presence across North America, Europe and Asia, we specialize in helping organizations navigate complex technology landscapes with the right expertise.

Through a strategic, client-focused approach, we have placed over 20,000 professionals across key industries and advanced technology solutions. Whether placing top talent in cutting-edge roles or providing strategic digital workforce solutions, our network of 4,000 specialists across 13 countries is committed to excellence, agility and impact.

Backed by 18 years of industry experience, we go beyond staffing to build long-term partnerships that accelerate digital transformation and drive sustained growth. Whether you need a technology partner to fuel innovation or specialized workforce solutions to maintain a competitive edge, VDart delivers the right people, skills and mindset to create a lasting impact in a digital-first world.

Similar Jobs

Remote • Yesterday • Third Party, Contract • Depends on Experience
Remote or California • 3d ago • Contract, Third Party • $DOE
Remote or Cleveland, Ohio • Today • Contract • USD 42.75 - 49.50 per hour
Remote • 8d ago • Full-time, Third Party • $60 - $70