Azure - StreamSets Data Collector with HDInsight HBase Performance

asked 2020-02-03 07:13:35 -0600

jonas_souza

updated 2020-02-03 11:36:10 -0600

metadaddy


I would like to confirm whether anyone has experienced performance issues when inserting data into HBase (HDInsight).

In my tests I got only about 11 records/second, with a simple table and randomly generated data.


I've tried a few different libraries, batch sizes, etc., without success.

One thing that might be related: I tested with the unsecured ZooKeeper Parent Node and did not try Kerberos authentication.

I got better results inserting data via the HTTP Client. It still wasn't good, but I could insert a few million records. I believe I am missing something, because performance in Azure should be much better.
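For scale, here is a rough back-of-the-envelope sketch of what the roughly 11 records/second rate implies. The volume of 1,000,000 records is only an assumed, illustrative figure (the post says "a few million"), and the helper name `hours_to_insert` is hypothetical.

```python
def hours_to_insert(records: int, records_per_second: float) -> float:
    """Return the wall-clock hours needed to insert `records` at a steady rate."""
    return records / records_per_second / 3600


# 1,000,000 is an assumed, illustrative volume;
# 11 records/second is the rate observed with the HBase destination.
print(f"{hours_to_insert(1_000_000, 11):.1f} hours")  # prints "25.3 hours"
```

At that rate a single million records would take more than a day, which is why the HBase destination was not practical for this volume.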

I found an old improvement task about the same issue, but I couldn't find much detail in it.

Has anybody faced something similar? Were you able to improve the performance?

Thank you in advance



Hi Jonas, can you share a screenshot of the stage batch processing timer in the metrics pane (the donut chart)?

marko ( 2020-02-10 17:36:18 -0600 )