Why can't I fetch more MySQL records than the batch size in full mode?

asked 2020-06-05 03:54:31 -0500

azurewater

updated 2020-06-06 09:08:03 -0500

The origin stage is a MySQL JDBC Query Consumer. The query is "select * from ${dbName}.${tableName}" and the table has more records than the batch size. The destination is a Kudu table. I also added a Pipeline Finisher that runs once all the MySQL records have been read.

But when I run the pipeline, it can't fetch more MySQL records than the configured batch size. On each iteration, MySQL fetches one batch of records, then the pool closes the connection before the records are written to Kudu. Because no offset is saved, the next execution runs the query again and fetches the same batch-size records from the beginning. Records beyond the batch size are never fetched, and the loop never ends.
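The restart-from-row-one behavior described above can be sketched outside StreamSets. This is just an illustration using Python's sqlite3 module (not StreamSets code, and not MySQL): the cursor plays the role of the JDBC ResultSet, and closing it after every batch loses the read position.

```python
import sqlite3

# Illustration only (assumed table name "t"): simulate an origin that
# discards its cursor after every batch and re-runs the query with no offset.
BATCH_SIZE = 10

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t (id) VALUES (?)", [(i,) for i in range(25)])

def read_batch():
    # A fresh cursor (ResultSet) is created each call, so iteration
    # restarts at the first row: rows beyond BATCH_SIZE are never reached.
    cur = conn.execute("SELECT id FROM t ORDER BY id")
    rows = cur.fetchmany(BATCH_SIZE)
    cur.close()  # closing here throws away the read position
    return [r[0] for r in rows]

first = read_batch()
second = read_batch()
assert first == second == list(range(10))  # same 10 rows every time
```

Both calls return rows 0-9; rows 10-24 are unreachable, which matches the endless loop over the first batch seen in the pipeline.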

Part of the debug log follows:

2020-06-05 16:47:48,822 DEBUG Using query fetch size: -2147483648 JdbcSource *admin
2020-06-05 16:47:48,822 DEBUG Executing query: a5a4bb2e9ba499384729c777fadfa1501b8124f96b40e4d9c79f6376b8fe6f65 JdbcSource *admin
2020-06-05 16:47:48,824 DEBUG Processed rows: 10 JdbcSource *admin
2020-06-05 16:47:48,824 DEBUG HikariPool-31 - Executed rollback on connection com.mysql.cj.jdbc.ConnectionImpl@4974595 due to dirty commit state on close(). ProxyConnection *admin
2020-06-05 16:47:48,824 DEBUG Expression 'impala::sdec_kudu_ods.ods_${str:toLower(tableName)}' is evaluated to 'impala::sdec_kudu_ods.ods_desktop_document' : ELUtils *admin 0
... 2020-06-05 16:47:48,824 DEBUG flushing buffer: Buffer{operations=10, flusherTask=null, flushNotification=Deferred@577903148(state=PENDING, result=null, callback=<none>, errback=<none>)} AsyncKuduSession *admin 0
2020-06-05 16:47:48,824 DEBUG callback=org.apache.kudu.client.AsyncKuduSession$1@62fc0ddf@1660685791 returned Deferred@450692771(state=PENDING, result=null, callback=ConvertBatchToListOfResponsesCB -> (continuation of Deferred@1959496707 after org.apache.kudu.client.AsyncKuduSession$1@62fc0ddf@1660685791), errback=passthrough -> (continuation of Deferred@1959496707 after org.apache.kudu.client.AsyncKuduSession$1@62fc0ddf@1660685791)), so the following Deferred is getting paused: Deferred@1959496707(state=PAUSED, result=Deferred@450692771, callback=<none>, errback=<none>) Deferred *admin 0
2020-06-05 16:47:48,827 DEBUG Saving offset {$com.streamsets.datacollector.pollsource.offset$=} for pipeline ME718c0784-f7f7-4199-a086-3181e826c162 ProductionSourceOffsetTracker *admin
2020-06-05 16:47:48,827 DEBUG Using query fetch size: -2147483648 JdbcSource *admin
2020-06-05 16:47:48,827 DEBUG Executing query: a5a4bb2e9ba499384729c777fadfa1501b8124f96b40e4d9c79f6376b8fe6f65 JdbcSource *admin
2020-06-05 16:47:48,828 DEBUG Processed rows: 10 JdbcSource *admin
2020-06-05 16:47:48,829 DEBUG HikariPool-31 - Executed rollback on connection com.mysql.cj.jdbc.ConnectionImpl@4974595 due to dirty commit state on close().

I guess the problem is that the MySQL connection is closed prematurely. How can I solve it?

I found a related issue in jdbc-protolib/src/main/java/com/streamsets/pipeline/stage/origin/jdbc/JdbcSource.java:

SDC-14882. JDBC Query Consumer closing the connection after each batch

In SDC-13540 we changed the way the stage handles the database connection. The connection is closed after a batch is generated and that implicitly closes the ResultSet. Any new batch is built again with the ResultSet pointing to the first record in the table.

This is an issue when working in non-incremental mode. We must keep the ResultSet open across batches, to iterate over new records when the queried table is longer than the max batch ... (more)
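The fix the issue describes, keeping the ResultSet open across batches, can be sketched with the same sqlite3 illustration (again an assumption-laden analogy, not the actual JdbcSource code): open one cursor, fetch batch after batch from it, and only close it when it is exhausted.

```python
import sqlite3

# Illustration only: one long-lived cursor (standing in for the JDBC
# ResultSet) is kept open across batches, so each fetchmany() continues
# from where the previous batch stopped.
BATCH_SIZE = 10

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t (id) VALUES (?)", [(i,) for i in range(25)])

cur = conn.execute("SELECT id FROM t ORDER BY id")  # opened once
batches = []
while True:
    rows = cur.fetchmany(BATCH_SIZE)
    if not rows:
        break  # all rows consumed; this is where a Pipeline Finisher fires
    batches.append([r[0] for r in rows])
cur.close()  # closed only after the full table has been read

assert [len(b) for b in batches] == [10, 10, 5]
```

With 25 rows and a batch size of 10, this yields three batches (10, 10, 5) and then terminates, instead of looping forever over the first 10 rows.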


1 Answer


answered 2020-06-06 09:12:06 -0500

azurewater

I saw that hprop committed a fix for SDC-14882 two days ago. So he saw my question and solved it? Thanks a lot.

