Spark Evaluator on YARN streaming hangs on first batch (Spark 2)

asked 2018-10-18 04:26:33 -0600

Chang

updated 2018-10-29 00:46:20 -0600

Hi, guys. I have made a simple pipeline like this.

[kafka consumer] - [spark evaluator] - [trash]

The Spark Evaluator does a simple transform: it just adds a field to each record with record.set. It works fine in preview (standalone), but it does not work in YARN streaming mode.

The transform code is:

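override def transform(recordRDD: JavaRDD[Record]): TransformResult = {
  // Unwrap the JavaRDD to get the underlying Scala RDD
  val rdd = recordRDD.rdd
  // emptyRDD is defined elsewhere in the transformer class (not shown here)
  val errors = emptyRDD

  // Set a string field at /a on each incoming record
  val result = rdd.map(record => {
    record.set("/a", Field.create("A"))
    record
  })

  // Return the transformed records along with the (empty) error RDD
  new TransformResult(result.toJavaRDD(), new JavaPairRDD[Record, String](errors))
}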

However, if I remove the line record.set("/a", Field.create("A")), it works fine. Is this a bug or something?

I tested on CDH 6.0.0 with StreamSets 3.5.0 and StreamSets 3.4.2. I also tested on CDH 5.11.0 with StreamSets 3.5.0.

The stage library used was 'spark 2.1.0 release 1'.

P.S.

I've also tested the same transformer on CDH 5.11.2 + StreamSets 3.2.2.0 + Spark 1.6.0-cdh5.11.2. It works fine there.


Comments

Hi! Have you checked sdc.log for any errors/exceptions?

iamontheinet ( 2018-10-22 14:46:40 -0600 )

Hi! sdc.log is fine. The pipeline logs (err, out) are clean, too. In the Spark jobs UI, the active stage hangs at 'count at Driver.scala:141'.

Chang ( 2018-10-23 00:58:18 -0600 )

Please create a JIRA at https://issues.streamsets.com and specify all the details, including the environments and versions where you've confirmed it works and those where it fails. Thanks!

iamontheinet ( 2018-10-24 09:53:09 -0600 )

Does this work on anything other than CDH6, with the same SDC version? We have seen multiple issues with CDH6, not directly related to our code, but due to incompatible changes in Spark itself.

hshreedharan ( 2018-10-25 10:29:13 -0600 )

I've tested several combinations:

CDH 6 + StreamSets 3.5.0 + spark 2.1.0 release 1 = does not work
CDH 6 + StreamSets 3.4.2 + spark 2.1.0 release 1 = does not work
CDH 5.11.0 + StreamSets 3.5.0 + spark 2.1.0 release 1 = does not work
CDH 5.11.2 + StreamSets 3.2.2.0 + spark 1.6.0-cdh5.11.2 = works

Chang ( 2018-10-26 00:23:53 -0600 )