Does a batch Custom StreamSets Processor example exist?

asked 2020-08-14 10:59:20 -0500

AlvinMurphy

I'm working on an open source geocoding processor, and a template or an explanation of how custom Java processors handle whole batches would be extremely helpful. https://github.com/streamsets/tutoria... and https://github.com/metadaddy/transfor... are both excellent resources, but neither addresses custom batch processors (both extend SingleLaneRecordProcessor).

Looking at the source code for SingleLaneRecordProcessor, I also can't seem to find an inherited type that takes this into account.

Geocoding services like geocod.io perform better with batch operations, so having the stage set a minimum batch size would be incredibly helpful to this OSS project.

Am I missing an archetype or template somewhere for BatchRecordProcessor?


Comments

If it helps set folks' mental model: we're converting a Jython processor that currently does a `Batch by Batch` flow. We iterate over the records, building our batch call, before writing out the records. Its functionality is pretty settled, so we want to move it to a full-fledged processor for performance reasons.

AlvinMurphy ( 2020-08-17 14:19:19 -0500 )
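
To make the `Batch by Batch` flow described in the comment concrete, here is a minimal sketch in plain Java. It deliberately does not use the StreamSets API: records are modeled as `Map<String, Object>`, and the `address` and `location` field names, the `processBatch` method, and the geocoder function are all hypothetical stand-ins. In a real stage the same logic would presumably live in a batch-level hook. `SingleLaneRecordProcessor` itself extends `SingleLaneProcessor`, whose `process(Batch, SingleLaneBatchMaker)` method receives the whole batch, but that is worth verifying against the API version in use.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class BatchGeocodeSketch {

    /**
     * Processes one batch with a single geocoding call.
     *
     * @param batch    records in the batch (hypothetical model: one map per record)
     * @param geocoder hypothetical batch client: takes every address at once and
     *                 returns one result per address, in the same order
     */
    static List<Map<String, Object>> processBatch(
            List<Map<String, Object>> batch,
            Function<List<String>, List<String>> geocoder) {

        // Pass 1: walk the whole batch and collect the addresses.
        // A record without an "address" field contributes the string "null".
        List<String> addresses = new ArrayList<>();
        for (Map<String, Object> record : batch) {
            addresses.add(String.valueOf(record.get("address")));
        }

        // One request for the whole batch instead of one per record.
        List<String> results = geocoder.apply(addresses);
        if (results.size() != batch.size()) {
            throw new IllegalStateException(
                "Expected " + batch.size() + " results, got " + results.size());
        }

        // Pass 2: attach each result to a copy of its record and emit the
        // records in their original order.
        List<Map<String, Object>> out = new ArrayList<>();
        for (int i = 0; i < batch.size(); i++) {
            Map<String, Object> enriched = new HashMap<>(batch.get(i));
            enriched.put("location", results.get(i));
            out.add(enriched);
        }
        return out;
    }

    public static void main(String[] args) {
        // Fake geocoder for demonstration only; a real one would call a service.
        Function<List<String>, List<String>> fake = addrs -> {
            List<String> res = new ArrayList<>();
            for (String a : addrs) {
                res.add("geocoded:" + a);
            }
            return res;
        };

        List<Map<String, Object>> batch = new ArrayList<>();
        Map<String, Object> r1 = new HashMap<>();
        r1.put("address", "1 Main St");
        batch.add(r1);
        Map<String, Object> r2 = new HashMap<>();
        r2.put("address", "2 Oak Ave");
        batch.add(r2);

        System.out.println(processBatch(batch, fake));
    }
}
```

The point of the two passes is that the geocoder is invoked once per batch, however many records it holds. Note that this only controls how a batch is consumed; the batch size itself is normally determined upstream by the origin, so a minimum batch size would be a separate concern.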