Questions about StreamSets-Kafka integration

asked 2018-03-21 02:13:01 -0600

Cazen

Good day. This is Cazen. Could I ask a question about integrating StreamSets with Kafka?

We have 10 nodes running Apache Kafka (v1.0.0) and a Hadoop cluster. I want to use StreamSets to move data from Kafka to HDFS (saving data from a specific Kafka topic to an HDFS location). However, I could not find a way to run this in cluster execution mode with Apache Kafka 1.0.0. According to the documentation at https://streamsets.com/documentation/..., cluster execution is only available with the CDH cluster Kafka version; it says CDH or HDP is required.

I'm new to this site, so I couldn't attach a file, but the error message looks like this:

VALIDATION_0071 - Stage 'Kafka Consumer' from 'Apache Kafka 1.0.0' library does not support 'Cluster Yarn Streaming' execution mode

Is there a solution or a different way to solve this problem? We are not currently considering CDH and want to use only StreamSets.


1 Answer

answered 2018-08-29 16:35:28 -0600

supahcraig

I've run StreamSets as a Docker container without any associated CDH install and had no problem consuming from Kafka and writing to the Hadoop FS destination. The link you provided is about cluster pipelines (which I know nothing about), but I know that what you're describing works without CDH.
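
To illustrate why the error in the question appears in one mode but not the other, here is a minimal sketch in Python. It is not StreamSets code: the `SUPPORTED_MODES` table and the `validate` function are hypothetical, and the assumption that the 'Apache Kafka 1.0.0' library supports a 'Standalone' mode comes only from the experience described in this thread.

```python
# Hypothetical model, not StreamSets code: the execution modes each stage
# library is ASSUMED to support, based only on what this thread reports.
SUPPORTED_MODES = {
    "Apache Kafka 1.0.0": {"Standalone"},
}


def validate(stage, library, mode):
    """Return an error string shaped like the one in the question, or None."""
    if mode in SUPPORTED_MODES.get(library, set()):
        return None
    return (
        f"VALIDATION_0071 - Stage '{stage}' from '{library}' library "
        f"does not support '{mode}' execution mode"
    )


# Same stage and library, two different pipeline execution modes.
cluster_error = validate("Kafka Consumer", "Apache Kafka 1.0.0", "Cluster Yarn Streaming")
standalone_error = validate("Kafka Consumer", "Apache Kafka 1.0.0", "Standalone")

print(cluster_error)
print(standalone_error)
```

In this model the check depends on the combination of stage library and pipeline execution mode, so the same Kafka Consumer stage passes once the pipeline is not set to a cluster mode.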
