Treating a file's content as a single stream

asked 2018-05-04 09:49:39 -0500

chrisbird

Here's the whole problem - from a total newbie, so please be gentle.

In production, I will have a source system (outside my control) that will send large, chunky XML documents. In the processing stream they will be parsed and handled.

During testing, I have some sample files that have been "pretty printed" for human readability. So they will have CR/LF characters and tab characters throughout. I am less worried about the tabs, but the CR/LF is causing me difficulty. I would like to be able to hook my processing pipeline up to a file reader in early testing (i.e. mocking the connection to the real input stream). I can imagine that one way to approach this would be to edit the files externally - stripping the CR/LF characters. This feels very unsatisfactory to me.

Is there a way to gather up the records in a single file and output them as one stream from some stage, or would I be better off writing such a stage myself? That is assuming I do have some programming ability.

Thanks for helping. Chris


Perhaps I'm not understanding clearly, but it sounds like you should be able to use a file origin (directory or file tail) with the XML data format. Did you try that yet?

jeff (2018-05-04 17:18:28 -0500)

Thanks, I did try that. All that seems to happen is that I get each line, one at a time, and not the full document. Essentially I want to concatenate all the lines into a single string so that the XML processor has a valid document to work on. I suspect that it is a simple setting somewhere.

chrisbird (2018-05-05 06:34:38 -0500)

Would you mind sharing your pipeline?

jeff (2018-05-07 13:51:22 -0500)

1 Answer

answered 2018-05-08 08:34:41 -0500

chrisbird

It is confession time. I was rebuilding the pipeline, getting ready to send it to Jeff for help (thanks, Jeff), and I realized that I had not checked the "ignore special characters" option in the input directory's data format. Once I specified that, the data were sent on to the XML parser as expected. So, thank you to Jeff for making me redo the pipeline.
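
For checking sample files outside the pipeline, the idea from this thread (treating the whole file as one document rather than one record per line) can be sketched in Python. This is only an illustration under assumptions: the function names `join_lines` and `parse_whole_document` and the sample content are hypothetical, and it relies on Python's standard `xml.etree.ElementTree` module, not on anything from the pipeline tool itself.

```python
import xml.etree.ElementTree as ET


def join_lines(text: str) -> str:
    """Concatenate all lines into one string, dropping CR/LF and indentation.

    Caveat: stripping each line would also alter element text that
    legitimately spans several lines, so this suits test fixtures only.
    """
    return "".join(line.strip() for line in text.splitlines())


def parse_whole_document(text: str) -> ET.Element:
    """Parse the full text as a single XML document instead of line by line."""
    return ET.fromstring(text)


# Hypothetical pretty-printed sample with CR/LF line endings and tabs.
sample = "<order>\r\n\t<id>42</id>\r\n\t<item>widget</item>\r\n</order>\r\n"

single_line = join_lines(sample)     # one string with no line breaks
root = parse_whole_document(sample)  # parsing the untouched text also works
```

The second call suggests why editing the files externally should not be necessary: an XML parser that receives the complete document accepts the line breaks and tabs between elements, so the difficulty comes from splitting the input into one record per line.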
