Can't set batch size to more than 1000 records

asked 2017-03-22 23:02:58 -0500

metadaddy

updated 2017-04-27 15:58:02 -0500

No matter which origin I'm using, if I set the batch size to > 1000 records, it has no effect - all I see are batches of 1000. How do I get SDC to process bigger batches?

2 Answers

answered 2017-03-22 23:06:28 -0500

metadaddy

The maximum batch size set in $SDC_CONF/ overrides the value in the origin. Increase the production.maxBatchSize property there and you will be able to configure bigger batch sizes.
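
As a rough sketch, the relevant line in the SDC configuration file under $SDC_CONF/ might look like the fragment below. The value 10000 is only an illustration, not a recommendation, and I'm assuming SDC needs a restart to pick up the change. The origin's own batch size setting still has to be raised as well, since this property only sets the ceiling.

```properties
# Upper limit on records per batch for running pipelines.
# 10000 is an illustrative value; pick one that fits your heap.
production.maxBatchSize=10000
```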

Note - there are trade-offs in configuring large batch sizes. For example, a batch of records must be able to fit into memory.

answered 2017-05-17 19:49:42 -0500

Tuple

You'll want to keep an eye on your JVM heap size if you increase the batch size significantly. I bumped it up to 10,000 records at one point and found that StreamSets would occasionally crash on the machine and I would have to hard restart it. You might also need to increase the JVM heap size itself.

It's a bit risky to increase this a lot if you have other important pipelines on the same box, because they all share the JVM heap.
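
To get a feel for whether a bigger batch size is safe, a back-of-the-envelope estimate can help. The Python sketch below is not anything SDC provides; the function names, the overhead factor of 2, and the 50% headroom are all assumptions chosen for illustration. It multiplies batch size by an average record size, scales by the number of pipelines sharing the heap, and compares the result to a fraction of the heap.

```python
def estimated_batch_bytes(batch_size, avg_record_bytes, overhead_factor=2.0):
    """Rough in-memory footprint of one batch, in bytes.

    overhead_factor is a guessed multiplier for object overhead and
    intermediate copies; it is an assumption, not a measured SDC value.
    """
    return int(batch_size * avg_record_bytes * overhead_factor)


def fits_in_heap(batch_size, avg_record_bytes, heap_bytes,
                 pipelines=1, headroom=0.5):
    """Return True if the batches of all pipelines fit in a share of the heap.

    Assumes each pipeline holds one batch at a time and that only
    `headroom` (here 50%) of the heap should be spent on batch data.
    """
    needed = pipelines * estimated_batch_bytes(batch_size, avg_record_bytes)
    return needed <= heap_bytes * headroom


if __name__ == "__main__":
    one_gib = 1024 ** 3
    # 10,000 records of about 1 KiB each, four pipelines, 1 GiB heap.
    print(fits_in_heap(10_000, 1024, one_gib, pipelines=4))
    # Same batch size with records of about 1 MiB each.
    print(fits_in_heap(10_000, 1024 * 1024, one_gib))
```

With small records a batch of 10,000 is a small slice of the heap, but the same batch size with large records can exceed it on its own, so measure your actual record sizes and watch heap usage before raising the limit.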


Last updated: May 17 '17