How do we know whether all data has been loaded into the destination?

asked 2017-12-14 18:57:49 -0500

anonymous user


updated 2017-12-15 15:38:36 -0500

metadaddy

Here is my use case: on the 1st of every month, I have to read files from a local folder and load all of them into Hadoop. Once the data is loaded into Hadoop, how do we stop the pipeline automatically, and how do we know whether all the files were loaded?

1 Answer

answered 2017-12-22 13:06:11 -0500

Roh

updated 2017-12-29 10:41:37 -0500

The Local FS destination can generate events that you can use in an event stream. When you enable event generation, the destination generates event records each time the destination closes a file or completes streaming a whole file.

Connect the event stream to the Pipeline Finisher executor. In the executor, add a precondition so that only a no-more-data event enters the stage and triggers it. Note that the no-more-data event is generated by the origin (here, the Directory origin), not by the destination, so the executor must be attached to the origin's event stream. You can use the following expression:

${record:eventType() == 'no-more-data'}

Tip: Records dropped because of a precondition are handled based on the stage error handling configuration. So to avoid racking up error records, you might also configure the Pipeline Finisher executor to discard error records.

Use this method when pipeline logic allows you to discard other event types generated by the origin.
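To make the precondition's effect concrete, here is a minimal Python sketch that models it outside of StreamSets. It is only an illustration, not StreamSets code: the function names are hypothetical, and the event names other than no-more-data (`new-file`, `finished-file`, `file-closed`) are assumed examples of what an origin or destination might emit.

```python
# Illustrative model only: this is not StreamSets code.
NO_MORE_DATA = "no-more-data"


def passes_precondition(event_type: str) -> bool:
    """Mirror of the precondition ${record:eventType() == 'no-more-data'}."""
    return event_type == NO_MORE_DATA


def run_until_finished(event_types):
    """Walk an event stream in order.

    Returns (finished, discarded): whether a no-more-data event was seen,
    and the event types that were dropped by the precondition before it.
    """
    discarded = []
    for event_type in event_types:
        if passes_precondition(event_type):
            return True, discarded
        discarded.append(event_type)
    return False, discarded


# Assumed origin-side event stream: the finisher is eventually triggered.
print(run_until_finished(["new-file", "finished-file", "no-more-data"]))

# Assumed destination-side event stream: no no-more-data event ever arrives,
# so the finisher is never triggered.
print(run_until_finished(["file-closed", "file-closed"]))
```

The second call shows why a finisher that only sees destination events would leave the pipeline running: every event is dropped by the precondition.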

How can we know whether all the files were loaded?

I'm not completely sure how you can monitor this without checking the file count or going to the UI, but you can trigger an email if the pipeline fails or hits an error while writing to HDFS. That way you will at least be notified of issues.

Condition to add in the Email executor:

`${record:eventType() == 'ERROR'}` 

This can be the email body: Pipeline ${pipeline:title()} encountered an error.

At ${time:millisecondsToDateTime(record:eventCreation() * 1000)}, Writing to HDFS failed: ${record:value('/id')}

You can always customize the email body based on the information you need. See [this] for more on the Email executor.
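For the file-count check mentioned above, one option is a small script run after the pipeline stops. The sketch below is a minimal example under stated assumptions: `missing_files` is a hypothetical helper, the destination is assumed to keep the source file names, and the list of loaded names is assumed to come from somewhere outside the script (for example, parsed from `hdfs dfs -ls` output). The demo uses a temporary directory in place of the real local folder.

```python
import tempfile
from pathlib import Path


def missing_files(source_dir, loaded_names):
    """Return the names of regular files in source_dir that are absent
    from loaded_names (sorted, so the result is stable)."""
    source_names = {p.name for p in Path(source_dir).iterdir() if p.is_file()}
    return sorted(source_names - set(loaded_names))


# Demo with a temporary directory standing in for the local source folder.
with tempfile.TemporaryDirectory() as source_dir:
    for name in ("jan.csv", "feb.csv", "mar.csv"):
        (Path(source_dir) / name).write_text("sample")

    # Assumed listing of what reached the destination.
    loaded = ["jan.csv", "mar.csv"]

    not_loaded = missing_files(source_dir, loaded)
    if not_loaded:
        print("Not yet loaded:", not_loaded)
    else:
        print("All files loaded")
```

If the destination renames files (for example, by adding a prefix), the comparison would need to map names first; that mapping is not covered here.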

More on how the pipeline finisher works

Your end product should look something like this: [image]


@Roh: Thank you for the response. I built the pipeline you showed in your answer, but it never stops after reading all the files from the directory. I waited almost an hour and it is still in the running state.

praneeth (2017-12-22 14:14:29 -0500)

@Roh: But when I added it to the Directory, it works fine!

praneeth (2017-12-22 14:15:06 -0500)

@praneeth When you say you added it to the directory, do you mean the destination is a directory?

Roh (2017-12-29 10:39:22 -0500)

@Roh No, the Pipeline Finisher executor was added to the Directory origin.

praneeth (2017-12-29 11:35:02 -0500)

Destinations never produce a no-more-data event.

trank (2019-04-12 13:27:06 -0500)