How to deal with gaps in PK values in the JDBC Multitable Consumer

asked 2019-08-28 21:09:45 -0600

stevef

Using the JDBC Multitable Consumer in partitioned (parallelized) mode fails if there are large gaps in the values of the PK field used for partitioning. SDC either terminates processing because it incorrectly perceives a "no more data" state, or slows to the point of being unusable while it tests partition boundaries that fall within the gap.

A nice-to-have feature would be a mode that has no hard dependency on PK values being contiguous. In this mode, the consumer would pre-calculate the min/max values for each partition using LIMIT/OFFSET, done efficiently (e.g. on MySQL, instead of using the OFFSET keyword, seek to PK value > previously derived value), and would then invoke the partition copies using these pre-calculated boundaries.
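To illustrate the boundary pre-calculation I have in mind, here is a rough sketch in Python. It is not how SDC works today, only the idea being proposed. It uses an in-memory SQLite database as a stand-in for MySQL, and the function name `partition_boundaries`, the table `t`, and the column `id` are all made up for the example. Each step seeks past the previous partition's max PK rather than using OFFSET, so gaps of any size cost nothing extra.

```python
import sqlite3


def partition_boundaries(conn, table, pk, partition_size):
    """Return inclusive (min_pk, max_pk) ranges, each covering up to
    partition_size rows, found by seeking past the previous max PK.

    Hypothetical helper: table and pk are interpolated into the SQL,
    so they must be trusted identifiers, never user input.
    """
    if partition_size < 1:
        raise ValueError("partition_size must be at least 1")

    boundaries = []
    last_max = None
    while True:
        if last_max is None:
            # First partition: start from the lowest PK in the table.
            inner = f"SELECT {pk} FROM {table} ORDER BY {pk} LIMIT ?"
            params = (partition_size,)
        else:
            # Later partitions: seek to PK > previously derived max.
            inner = (
                f"SELECT {pk} FROM {table} "
                f"WHERE {pk} > ? ORDER BY {pk} LIMIT ?"
            )
            params = (last_max, partition_size)

        # Only the chunk's min and max are returned, not every PK in it.
        lo, hi = conn.execute(
            f"SELECT MIN({pk}), MAX({pk}) FROM ({inner}) AS chunk", params
        ).fetchone()
        if lo is None:
            # No rows beyond last_max: the table is exhausted.
            break
        boundaries.append((lo, hi))
        last_max = hi
    return boundaries


# Small demo with deliberately large gaps between PK values.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
demo.executemany(
    "INSERT INTO t (id) VALUES (?)",
    [(i,) for i in (1, 2, 3, 1000001, 1000002, 5000000, 5000001)],
)
print(partition_boundaries(demo, "t", "id", 3))
```

Each resulting pair could then drive one partition copy of the form `WHERE id BETWEEN min AND max`, so every partition holds roughly the same number of rows regardless of how sparse the PK values are.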


Comments

Thanks, Steve. Since this isn't really a question, could you write it up as an issue at https://issues.streamsets.com/ ? Thanks again!

metadaddy ( 2019-08-28 21:17:52 -0600 )