Position: Lead Cloudera Consultant (Solution Architect)
Location: 100% remote
Duration: 12+ months
Lead Cloudera Streaming Architect (CDP | NiFi | Kafka | Flink | Kudu | SSB)
About the Role
We are seeking a Lead Cloudera Streaming Architect with deep, hands-on experience across the Cloudera CDP streaming stack, including NiFi, Kafka, Flink, Kudu/Impala, and SQL Stream Builder (SSB). This is a highly technical, architecture-plus-implementation role responsible for designing, delivering, and optimizing mission-critical real-time data pipelines at enterprise scale.
If you have personally built end-to-end CDP/CDF streaming pipelines and can execute complex ingestion, transformation, CDC, and Kudu write-path use cases on day one, this role is for you.
What You'll Do
Streaming Architecture & Implementation
- Architect and build real-time data pipelines using the full Cloudera Data Platform (CDP) streaming suite: NiFi → Kafka → Flink → Kudu/Impala → SSB.
- Own architectural decisions, patterns, and best practices for streaming, CDC, state management, schema evolution, and exactly-once delivery.
- Develop complex NiFi flows involving controller services (DBCP/JDBC), stateful processors, record processors, schema registry integrations, batch-to-stream conversions, and high-volume ingestion patterns.
- Build and optimize Flink SQL or DataStream API jobs (see the sketch after this list) with:
  - Kafka sources/sinks
  - event-time windows
  - watermarks
  - state management
  - checkpointing / savepoints
  - exactly-once guarantees
- Design and tune Kudu tables (PKs, partitioning, distribution, upserts, deletes, merges).
- Build and deploy streaming SQL jobs using Cloudera SQL Stream Builder (SSB).
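To make the Flink SQL expectations concrete, here is a minimal sketch of an event-time windowed job; the topic, broker address, schema, and interval values are illustrative assumptions, not a prescribed design.

    -- Checkpointing must be enabled for fault tolerance (interval is illustrative).
    SET 'execution.checkpointing.interval' = '30s';

    -- Hypothetical Kafka source with an event-time column and a 5-second watermark.
    CREATE TABLE orders_src (
      order_id STRING,
      amount   DECIMAL(10, 2),
      event_ts TIMESTAMP(3),
      WATERMARK FOR event_ts AS event_ts - INTERVAL '5' SECOND
    ) WITH (
      'connector' = 'kafka',
      'topic' = 'orders',
      'properties.bootstrap.servers' = 'broker:9092',
      'scan.startup.mode' = 'earliest-offset',
      'format' = 'json'
    );

    -- One-minute tumbling event-time windows over the source.
    SELECT window_start,
           COUNT(*)    AS order_count,
           SUM(amount) AS total_amount
    FROM TABLE(TUMBLE(TABLE orders_src, DESCRIPTOR(event_ts), INTERVAL '1' MINUTE))
    GROUP BY window_start, window_end;

The watermark delay is the knob that trades latency against tolerance for out-of-order events; state backends and savepoint handling are configured at the job/cluster level rather than in the SQL itself.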
Use Case Delivery
You must be able to deliver the following four core use cases immediately:
- NiFi → Snowflake → Impala/Kudu ingestion pipeline
- Kafka → Flink streaming (real-time processing)
- Flink → Kafka sink with exactly-once semantics (sketched below)
- CDC ingestion via NiFi, Flink CDC, or SSB (incremental keys, late events, deletes)
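For the exactly-once use case specifically, a minimal Flink SQL sketch of a transactional Kafka sink; the topic, fields, and id prefix are assumptions, and the hypothetical orders_src table from the earlier sketch is reused.

    -- Exactly-once rides on Kafka transactions: checkpointing must be enabled,
    -- and a transactional id prefix is required.
    CREATE TABLE orders_sink (
      order_id STRING,
      amount   DECIMAL(10, 2)
    ) WITH (
      'connector' = 'kafka',
      'topic' = 'orders-out',
      'properties.bootstrap.servers' = 'broker:9092',
      'format' = 'json',
      'sink.delivery-guarantee' = 'exactly-once',
      'sink.transactional-id-prefix' = 'orders-out-job'
    );

    INSERT INTO orders_sink
    SELECT order_id, amount
    FROM orders_src;

Note that downstream consumers must read with isolation.level=read_committed, or they will see uncommitted (and potentially aborted) writes.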
Optimization, Monitoring & Governance
- Tune NiFi, Kafka, and Flink clusters for performance, throughput, and stability.
- Implement schema governance, error handling, back-pressure strategies, and replay mechanisms.
- Work closely with platform engineers to optimize CDP components and CDF deployments.
- Provide architectural guidance, documentation, and mentorship to engineering teams.
Required Experience
You must have hands-on, production-grade experience with ALL of the following:
Cloudera CDP / CDF
- CDP Public Cloud or Private Cloud Base
- Cloudera Flow Management (NiFi + NiFi Registry)
- Cloudera Streams Messaging (Kafka, SMM)
- Cloudera Stream Processing (Flink, SSB)
- Kudu / Impala ecosystem
Apache NiFi (Advanced)
- Building complex flows (not just admin/ops)
- QueryDatabaseTable / GenerateTableFetch / MergeRecord
- Record-based processors & schema registry
- JDBC / DBCP controller services
- Stateful processors & incremental ingestion
- NiFi → Snowflake integration
- NiFi → Kudu ingestion patterns
Apache Kafka
- Kafka brokers, partitions, retention, replication, consumer groups
- Schema registry (Avro/JSON)
- Designing topics for high-throughput streaming
Apache Flink
- Flink SQL + DataStream API
- Event-time processing, watermarks, windows
- Checkpointing, savepoints, state backends
- Kafka source/sink connectors
- Exactly-once semantics
- Flink CDC (a plus)
Apache Kudu
- Table design (PKs, partition strategies)
- Upserts, deletes, merge semantics
- Integration with Impala
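As a concrete reference for the Kudu items above, a minimal Impala sketch; the table name, columns, and bucket count are assumptions.

    -- Kudu primary key columns must be listed first and are implicitly NOT NULL.
    CREATE TABLE orders_kudu (
      order_id STRING,
      event_ts TIMESTAMP,
      amount   DECIMAL(10, 2),
      status   STRING,
      PRIMARY KEY (order_id, event_ts)
    )
    PARTITION BY HASH (order_id) PARTITIONS 8
    STORED AS KUDU;

    -- UPSERT inserts new rows and replaces existing ones with the same key.
    UPSERT INTO orders_kudu VALUES ('o-1001', now(), 12.50, 'NEW');

Hash partitioning on the leading key column spreads write load across tablets; range partitioning (often on a time column) can be layered on top when scans are time-bounded.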
SQL Stream Builder (SSB)
- Creating jobs, connectors, materialized views
- Deploying and monitoring Flink SQL jobs in CDP
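SSB jobs are authored as Flink SQL, so the SQL itself resembles the earlier sketches; what SSB adds is job lifecycle management and, for instance, exposing a query's result as a REST-queryable materialized view. A minimal continuous query of the kind you would run there, with names assumed from the earlier sketches:

    -- A continuous per-key aggregation; in SSB this can be run as a job and
    -- its output exposed as a materialized view.
    SELECT order_id,
           COUNT(*)    AS events_seen,
           SUM(amount) AS running_total
    FROM orders_src
    GROUP BY order_id;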
CDC (Change Data Capture)
- CDC via NiFi or Flink CDC or SSB
- Handling late-arriving events
- Handling deletes, updates, schema evolution
- Incremental key tracking
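To make the CDC expectations concrete, here is a Flink SQL deduplication sketch that keeps only the latest change per key and drops keys whose latest state is a delete; order_changes and the op/change_ts metadata columns are hypothetical stand-ins for whatever the CDC source actually emits.

    -- Keep the most recent change event per primary key.
    SELECT order_id, amount, op, change_ts
    FROM (
      SELECT *,
             ROW_NUMBER() OVER (
               PARTITION BY order_id
               ORDER BY change_ts DESC   -- change_ts should be a time attribute
             ) AS rn
      FROM order_changes                 -- hypothetical CDC change stream
    )
    WHERE rn = 1
      AND op <> 'D';                     -- drop keys whose latest event is a delete

Late events are bounded by the watermark on the change stream; incremental-key tracking (e.g. the maximum-value column state kept by NiFi's QueryDatabaseTable) solves the same problem on the ingestion side.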
General Requirements
- 8+ years in data engineering / streaming
- 3–5+ years specifically with CDP/CDF streaming
- Strong SQL and distributed system fundamentals
- Experience in financial services, healthcare, telecom, or other high-volume industries preferred
Nice to Have
- Kubernetes experience running NiFi/Kafka/Flink operators
- Snowflake ingestion patterns (staging, COPY INTO; see the sketch after this list)
- Experience with Debezium
- CI/CD for data pipelines
- Security (Kerberos, Ranger, Atlas)
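For the Snowflake item in this list, the staging pattern referred to is roughly the following; the stage, table, and file format are assumptions, and this is a minimal sketch rather than a tuned pipeline.

    -- Files are landed in a stage (by NiFi or another tool), then bulk-loaded.
    COPY INTO orders_raw
    FROM @orders_stage
    FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY = '"')
    ON_ERROR = 'CONTINUE';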
What Success Looks Like
In the first 90 days, you will:
- Deliver at least two of the four required streaming use cases end-to-end
- Establish architectural patterns for NiFi, Flink, and Kudu pipelines
- Optimize one existing pipeline for throughput, latency, and reliability
- Become the subject-matter expert for Data in Motion on CDP
Apply If You Can Demonstrate
- You have personally built NiFi → Kafka → Flink → Kudu pipelines
- You understand event-time processing and exactly-once delivery
- You have designed Kudu tables and worked with Impala
- You have authored and deployed SSB SQL streaming jobs
- You can speak to real-world CDC implementations