
Can you change batch size in a custom processor?

asked 2019-06-03 10:48:38 -0500

Eric Richardson

I have a custom processor that consumes a whole file (similar to Fun with FileRefs – Manipulating Whole File Data). The problem is that after the origin, the batches are fixed: every record I add to the batch maker ends up in the same batch sent to the next stage. The number of records in the input batch is 1; the number of records in the output batch is ALL OF THEM.

I am using this custom processor to process data from the Library of Congress. When the input file is large, it blows out the memory, with no way to incrementally process the records coming out of the parser. I am adding records to the outbound batch here.

I need to send records downstream in batches.


1 Answer


answered 2019-06-03 10:53:00 -0500

metadaddy

You can't manipulate the batch size in a custom processor. Data Collector is designed to store no data locally, to minimize the chance of data loss in case of failure. You should look at doing this in a custom origin, caching results in the origin for subsequent batches and storing enough state in the offset so that you can resume ingest after a failure.
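The origin pattern described above can be sketched in plain Java. This is only an illustration of the idea, not the actual Data Collector SDK: the class name, the stand-in record type, and the driver loop are all hypothetical, though the `produce(lastSourceOffset, maxBatchSize, batchMaker)` shape mirrors how a custom origin is called, with the returned offset persisted by the framework so ingest can resume after a failure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a batching origin: each produce() call emits at
// most maxBatchSize records, starting from the offset saved after the
// previous batch, so a large parsed file becomes many small batches.
public class BatchingOriginSketch {
    private final List<String> parsedRecords; // stands in for the parser output

    public BatchingOriginSketch(List<String> parsedRecords) {
        this.parsedRecords = parsedRecords;
    }

    // Returns the new offset string; the framework would persist it
    // between batches and hand it back on the next call (or on restart).
    public String produce(String lastSourceOffset, int maxBatchSize, List<String> batchMaker) {
        int start = (lastSourceOffset == null) ? 0 : Integer.parseInt(lastSourceOffset);
        int end = Math.min(start + maxBatchSize, parsedRecords.size());
        for (int i = start; i < end; i++) {
            batchMaker.add(parsedRecords.get(i)); // capped per batch, unlike the processor case
        }
        return String.valueOf(end);
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 10; i++) records.add("rec" + i);
        BatchingOriginSketch origin = new BatchingOriginSketch(records);

        // Driver loop standing in for the pipeline runner.
        String offset = null;
        while (offset == null || Integer.parseInt(offset) < records.size()) {
            List<String> batch = new ArrayList<>();
            offset = origin.produce(offset, 4, batch);
            System.out.println("batch of " + batch.size() + " records, offset=" + offset);
        }
    }
}
```

With 10 records and a batch size of 4, this yields batches of 4, 4, and 2 records; because the offset is externalized rather than held in stage memory, a restart from the last saved offset re-emits nothing and drops nothing.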




Thanks, I'll start by looking for an example of a custom origin based on the directory spooler. Any suggestions on a good example?

Eric Richardson ( 2019-06-03 11:11:07 -0500 )