asked 2018-03-21 02:13:01 -0500

Good day, this is Cazen. Could I ask a question about StreamSets when integrating with Kafka?

We have 10 nodes with Apache Kafka (v1.0.0) and a Hadoop cluster. I want to connect Kafka to HDFS with StreamSets (saving data from a specific Kafka topic to an HDFS location). However, I couldn't find a way to do this in cluster execution mode with Apache Kafka 1.0.0. According to the documentation, cluster execution is only available with the CDH cluster Kafka version (it says CDH or HDP is required).

I'm new to this site, so I couldn't attach a file, but the error message looks like this:

VALIDATION_0071 - Stage 'Kafka Consumer' from 'Apache Kafka 1.0.0' library does not support 'Cluster Yarn Streaming' execution mode

Could I ask for a solution or a different way to solve this problem? Currently, we are not considering CDH and will only use StreamSets.

