How do I create whole file content in Jython?

asked 2019-08-28 15:17:28 -0600

bcd2682

updated 2019-08-28 23:14:13 -0600

We have an SDC pipeline that moves data from a delimited CSV file to Kudu. It is working fine, but they also want to archive the files in HDFS. When we try to build the archive file name from a timestamp and the file info, the pipeline crashes because the data comes from a delimited file rather than a whole file. That is why we are asking the question below. I have a delimited file that I want to archive in HDFS as a whole file (while its data is stored in Kudu). Can I write a Jython script to transform the delimited file into a whole file?

Aside from the answer I gave below, could you edit your question with a bit more detail on your use case? Why do you need whole files, rather than writing record-by-record to HDFS, or directly to Kudu?

metadaddy ( 2019-08-28 22:55:29 -0600 )

1 Answer

answered 2019-08-28 22:54:28 -0600

metadaddy

You can easily read whole file content from Jython, but writing it will be a bit more tricky. I wrote a blog entry on a related topic a while ago: Fun with FileRefs – Manipulating Whole File Data; looking at Jython and Java Integration, you may be able to do something similar from Jython, extending FileRef with your own class defined in the Jython script.
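On the naming part of the question (building the archive file name from a timestamp and the file info), here is a minimal sketch in plain Python that should also run under Jython 2.7. It does not touch any StreamSets API. The helper name `build_archive_name`, the timestamp layout, and the `.csv` fallback extension are all assumptions made for illustration, and where the source file name and timestamp come from in your pipeline is left to you.

```python
import posixpath
from datetime import datetime


def build_archive_name(source_path, when, default_ext='.csv'):
    """Build an archive file name from a source path and a timestamp.

    Hypothetical helper: the '<stem>_<YYYYMMDDTHHMMSS><ext>' layout is an
    assumption, not a StreamSets convention.
    """
    # Keep only the file name; HDFS-style paths use forward slashes.
    file_name = posixpath.basename(source_path)
    stem, ext = posixpath.splitext(file_name)
    if not ext:
        # Source name had no extension, so fall back to the assumed default.
        ext = default_ext
    return '%s_%s%s' % (stem, when.strftime('%Y%m%dT%H%M%S'), ext)


if __name__ == '__main__':
    # Example call with a made-up path and a fixed timestamp.
    print(build_archive_name('/data/in/orders.csv',
                             datetime(2019, 8, 28, 15, 17, 28)))
```

Keeping the name logic in a small pure function like this makes it easy to check outside the pipeline before wiring it into a script.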

I have already read a lot of your help material, and I am able to read the FileRef from a whole file, get the records from it, create new ones, and then move those records to a Kudu table. But I am trying to modify the Hadoop portion, because it is easy to modify and it won't affect the current pipeline logic.

bcd2682 ( 2019-08-28 23:09:00 -0600 )

If I could get some help with the StreamSets object model, maybe I could create it using Jython. This is my fifth day using StreamSets.

bcd2682 ( 2019-08-28 23:15:31 -0600 )
