PDF Processing

asked 2020-06-02 05:23:31 -0500

sashish

updated 2020-06-02 05:25:18 -0500

Hi,

I want to read a PDF file as a whole from HDFS and insert it into MongoDB as a single file, so that inside MongoDB it appears as one file stored in binary format.

When I read the PDF file and insert it into a MongoDB collection, the pipeline reads the content of the PDF and writes it to the collection as text data, which is the problem. I need the file stored inside MongoDB as a single file in binary format. Is this possible with StreamSets?




Hi. My source is HDFS and my destination is MongoDB. I have PDF files inside an HDFS directory. At the source I am using the Whole File data format because I want to read each file as one complete unit, and at the destination I want to insert that complete file into a collection as binary.

sashish ( 2020-06-02 09:30:49 -0500 )

1 Answer


answered 2020-06-03 12:04:45 -0500

iamontheinet


The MongoDB destination does not support the binary data format. To see which origins, processors, and destinations support which data formats, refer to our documentation.

Cheers, Dash
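For readers who only need the end result (one PDF stored as one binary value in a collection), here is a minimal sketch of doing it outside a StreamSets pipeline. It is an illustration under stated assumptions, not part of the answer above: it assumes the PDF has already been copied from HDFS to a local path, that `pymongo` is installed, and that a MongoDB server is reachable. The names `build_pdf_document` and `insert_pdf`, along with the URI, database, and collection names, are hypothetical.

```python
import hashlib
from pathlib import Path

# MongoDB limits a single BSON document to 16 MiB; larger files need GridFS.
MAX_BSON_BYTES = 16 * 1024 * 1024


def build_pdf_document(path):
    """Read a file as raw bytes and wrap it in a dict ready for insertion."""
    data = Path(path).read_bytes()
    # Rough guard only: the other fields also count toward the document size.
    if len(data) >= MAX_BSON_BYTES:
        raise ValueError("file too large for a single document; consider GridFS")
    return {
        "filename": Path(path).name,
        "length": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "data": data,  # pymongo stores Python bytes as a BSON binary value
    }


def insert_pdf(path, uri="mongodb://localhost:27017", db="files", coll="pdfs"):
    """Insert the file as one document (hypothetical URI, db, and collection)."""
    # Imported lazily so build_pdf_document works without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri)
    try:
        result = client[db][coll].insert_one(build_pdf_document(path))
        return result.inserted_id
    finally:
        client.close()
```

Keeping the bytes untouched (no text decoding) is what makes the stored value a binary field rather than extracted text. For files at or above the 16 MiB document limit, MongoDB's GridFS would be the usual alternative.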



The Kafka destination and the HTTP Client do support the binary data format, but the pipeline throws an exception when it runs: BINARY_GENERATOR_00 - Record '1896434571_4175_935E1982-8062-3802-E068-7E4073546D06_L-1a_20200210182041.PDF' cannot convert field path '/' value to byte[]

sashish ( 2020-06-04 01:56:00 -0500 )

