Sqoop and StreamSets not integrating

asked 2018-10-29 21:57:03 -0600

anonymous user

updated 2018-10-29 22:24:05 -0600

I have billions of rows in my table. I cannot use multithreaded partitioning because my primary key is a varchar hex value. StreamSets is very slow to ingest all the records, so we used Sqoop (--as-avrodatafile) to load the historical data into Hive. For incremental loading we use StreamSets, but the records show up as null in Hive, even though the Avro data file generated by StreamSets does contain data. I think this is because the schema files generated by Sqoop and StreamSets are different. Is there a way to overcome this issue?
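
One way to check the schema-mismatch suspicion is to diff the two Avro schemas field by field. Below is a minimal sketch, assuming both schemas are available as JSON text (for example, extracted from a data file with `avro-tools getschema`). The record name, field names, and types in the two sample schemas are hypothetical and only illustrate the kinds of difference (name case, type) that could plausibly lead to nulls when Hive reads a file with a different reader schema.

```python
import json


def field_map(schema_json: str) -> dict:
    """Map field name -> declared type for an Avro record schema given as JSON text."""
    schema = json.loads(schema_json)
    return {f["name"]: f["type"] for f in schema.get("fields", [])}


def diff_schemas(a_json: str, b_json: str) -> dict:
    """Report fields that differ between two Avro record schemas."""
    a, b = field_map(a_json), field_map(b_json)
    only_a = set(a) - set(b)
    only_b = set(b) - set(a)
    # Names that match only when case is ignored (Avro matches names exactly).
    lower_b = {name.lower(): name for name in only_b}
    case_only = sorted(
        (name, lower_b[name.lower()]) for name in only_a if name.lower() in lower_b
    )
    return {
        "only_in_a": sorted(only_a),
        "only_in_b": sorted(only_b),
        "case_only_mismatch": case_only,
        "type_mismatch": sorted(n for n in set(a) & set(b) if a[n] != b[n]),
    }


# Hypothetical schemas, NOT taken from the actual tables in this question.
sqoop_avsc = """
{"type": "record", "name": "orders", "fields": [
  {"name": "ID", "type": ["null", "string"], "default": null},
  {"name": "AMOUNT", "type": ["null", "double"], "default": null}
]}
"""
streamsets_avsc = """
{"type": "record", "name": "orders", "fields": [
  {"name": "id", "type": "string"},
  {"name": "AMOUNT", "type": ["null", "string"], "default": null}
]}
"""

report = diff_schemas(sqoop_avsc, streamsets_avsc)
for key, value in report.items():
    print(key, value)
```

If the report is non-empty for the real schemas, aligning the StreamSets output schema with the one the Hive table uses would be the first thing to try; if it is empty, the cause is probably elsewhere.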
