Sqoop and StreamSets not integrating

I have billions of rows in my table. I cannot use multithreaded partitioning because my primary key is a varchar hex value. StreamSets is very slow to ingest all the records, so we used Sqoop (with --as-avrodatafile) to load the historical data into Hive. For incremental loads we use StreamSets, but the records show up as null in Hive, even though the Avro data files that StreamSets generated do contain data. I think this is because the schema files generated by Sqoop and by StreamSets are different. Is there a way to overcome this issue?
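One way to test the schema-mismatch theory is to compare the two writer schemas field by field. In Avro schema resolution, record fields are matched by name, and a reader field with no same-named writer field falls back to its default (null for the nullable union fields in this example). If the Hive table reads every file through one table-level schema, that would surface as null columns. The sketch below is a minimal, stdlib-only illustration under those assumptions: the two schemas, the field names (ID, AMOUNT) and the helpers field_types, compare_schemas and resolve are all hypothetical, and resolve deliberately ignores the type-promotion rules a real Avro reader applies. In practice you would paste in the schema embedded in one Sqoop file and one StreamSets file (for example, as printed by avro-tools getschema).

```python
import json

# Hypothetical schemas for illustration only; these are NOT the actual
# schemas Sqoop or StreamSets produced for this table.
SQOOP_SCHEMA = """{
  "type": "record", "name": "my_table",
  "fields": [
    {"name": "ID", "type": ["null", "string"], "default": null},
    {"name": "AMOUNT", "type": ["null", "double"], "default": null}
  ]
}"""

STREAMSETS_SCHEMA = """{
  "type": "record", "name": "my_table",
  "fields": [
    {"name": "id", "type": ["null", "string"], "default": null},
    {"name": "AMOUNT", "type": ["null", "string"], "default": null}
  ]
}"""


def field_types(schema_json):
    """Map each top-level record field name to its declared Avro type."""
    schema = json.loads(schema_json)
    return {f["name"]: f["type"] for f in schema["fields"]}


def compare_schemas(reader_json, writer_json):
    """List reader fields that name-based resolution cannot fill from the writer."""
    reader = field_types(reader_json)
    writer = field_types(writer_json)
    # Avro field names are case-sensitive, so "ID" and "id" do not match.
    missing = sorted(name for name in reader if name not in writer)
    type_mismatch = sorted(
        name for name in reader if name in writer and reader[name] != writer[name]
    )
    return {"missing_in_writer": missing, "type_mismatch": type_mismatch}


def resolve(record, reader_json):
    """Simplified name-based resolution: absent fields become None (null).

    Types are ignored here; a real Avro reader would also promote or reject
    mismatched types.
    """
    return {name: record.get(name) for name in field_types(reader_json)}


if __name__ == "__main__":
    print(compare_schemas(SQOOP_SCHEMA, STREAMSETS_SCHEMA))
    # {'missing_in_writer': ['ID'], 'type_mismatch': ['AMOUNT']}
    print(resolve({"id": "0a1f", "AMOUNT": "12.5"}, SQOOP_SCHEMA))
    # {'ID': None, 'AMOUNT': '12.5'}
```

If a comparison like this shows differing field names (including case) or types, aligning the StreamSets output schema with the one the Hive table uses would be the first thing to try.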