
Questions about StreamSets-Kafka integration

asked 2018-03-21 02:13:01 -0500

Cazen

Good day, this is Cazen. Could I ask a question about integrating StreamSets with Kafka?

We have 10 nodes running Apache Kafka (v1.0.0) and a Hadoop cluster. I want to connect Kafka to HDFS with StreamSets, saving data from a specific Kafka topic to an HDFS location. However, I couldn't find a way to do this in cluster execution mode with Apache Kafka 1.0.0. According to the documentation, cluster execution is only available with the Kafka version from a CDH cluster; it says CDH or HDP is required.

I'm new to this site, so I couldn't attach a file, but the error message looks like this:

VALIDATION_0071 - Stage 'Kafka Consumer' from 'Apache Kafka 1.0.0' library does not support 'Cluster Yarn Streaming' execution mode

Is there a solution, or a different way to solve this problem? We are not currently considering CDH and will use only StreamSets.


1 Answer


answered 2018-08-29 16:35:28 -0500

supahcraig

I've run StreamSets as a Docker container without any associated CDH install and had no problem consuming from Kafka and writing to the Hadoop FS destination. The link you provided was about Cluster Pipelines (of which I know nothing), but I know that what you're describing works without CDH.
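To make the data flow concrete, here is a rough Python sketch of what a single-node (non-cluster) "consume from a topic, write batched files" loop amounts to. This is not StreamSets code and not how the Kafka Consumer or Hadoop FS stages are implemented; it is only an illustration under stated assumptions. The names `write_batches`, `local_sink`, and `kafka_records` are hypothetical, the local directory stands in for an HDFS location, and the Kafka part assumes the third-party kafka-python package is installed.

```python
import os
import tempfile
from typing import Callable, Iterable, Iterator, List


def write_batches(records: Iterable[bytes], batch_size: int,
                  sink: Callable[[str, bytes], None],
                  prefix: str = "part") -> List[str]:
    """Group records into newline-delimited batches and pass each to `sink`.

    Returns the names of the files written, in order.
    """
    names: List[str] = []
    batch: List[bytes] = []

    def flush() -> None:
        if not batch:
            return
        name = f"{prefix}-{len(names):05d}.txt"
        sink(name, b"\n".join(batch) + b"\n")
        names.append(name)
        batch.clear()

    for record in records:
        batch.append(record)
        if len(batch) >= batch_size:
            flush()
    flush()  # write the final, possibly partial, batch
    return names


def local_sink(directory: str) -> Callable[[str, bytes], None]:
    """Write files to a local directory (a stand-in for an HDFS client)."""
    def _write(name: str, payload: bytes) -> None:
        with open(os.path.join(directory, name), "wb") as fh:
            fh.write(payload)
    return _write


def kafka_records(topic: str, servers: str) -> Iterator[bytes]:
    """Yield raw message values from a topic (assumes kafka-python)."""
    from kafka import KafkaConsumer  # imported lazily; unused in the local demo
    consumer = KafkaConsumer(topic, bootstrap_servers=servers,
                             auto_offset_reset="earliest",
                             consumer_timeout_ms=10000)
    for message in consumer:
        yield message.value


if __name__ == "__main__":
    # Local demo with in-memory records instead of a live Kafka topic.
    out_dir = tempfile.mkdtemp()
    print(write_batches([b"a", b"b", b"c"], 2, local_sink(out_dir)))
```

Against a real broker, the in-memory list would be replaced by something like `kafka_records("my-topic", "broker1:9092")`, where both the topic name and the broker address are placeholders, and `local_sink` would be swapped for whatever HDFS client you use.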



Asked: 2018-03-21 02:02:40 -0500

Seen: 117 times

Last updated: Aug 29 '18