Ask Your Question

Getting whole files from SFTP server and storing them to HDFS crashes sometimes

asked 2018-04-10 13:58:11 -0500

pwel gravatar image

updated 2018-04-10 14:00:27 -0500

My pipeline consists of just an SFTP origin and a Hadoop FS destination. Both components are configured to handle "whole files".

The write on Hadoop FS produces an error (Pipeline Status: RUN_ERROR: com.streamsets.pipeline.api.StageException: HADOOPFS_13 - Error while writing to HDFS: com.streamsets.pipeline.api.StageException: HADOOPFS_14 - Cannot write record: org.apache.commons.vfs2.FileSystemException: Unknown message with code "Failure".)

This happens whenever I copy a large new file (e.g. 100 MB) to the SFTP server while the pipeline is running. However, it works fine when I stop the pipeline, copy the file, and then start the pipeline again.

Any idea what caused this? Could it be the incomplete file? But why does the origin read the file from SFTP while it is still incomplete?

Thanks and Regards, Peter


1 Answer


answered 2018-04-16 22:50:40 -0500

hshreedharan gravatar image

The SFTP origin expects files to be moved into the directory atomically. Are you using mv, or cp, from the same file system to move the file into that directory?
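To illustrate the distinction the answer draws: a copy writes the destination incrementally, so a poller can observe a half-written file, while a rename within the same file system is a single atomic `rename(2)` call. A minimal sketch (paths and names are illustrative, not from the original post):

```python
import os
import shutil
import tempfile

# Illustrative directories; in practice watch_dir is the directory the
# SFTP origin polls, and stage_dir must be on the SAME file system.
watch_dir = tempfile.mkdtemp()
stage_dir = tempfile.mkdtemp()

# Write the file in the staging area first (like "cp" or a direct
# SFTP "put"): during this step, a partial file exists only in staging.
staged = os.path.join(stage_dir, "large-file.dat")
with open(staged, "wb") as f:
    f.write(b"payload")

# Then move it into the watched directory in one step (like "mv"):
# os.rename is atomic on the same file system, so the pipeline either
# sees no file or the complete file, never a partial one.
final = os.path.join(watch_dir, "large-file.dat")
os.rename(staged, final)

print(open(final, "rb").read())

shutil.rmtree(watch_dir)
shutil.rmtree(stage_dir)
```

Uploading under a temporary name (e.g. a dotfile or a suffix the origin's file-name pattern ignores) and renaming when the transfer finishes achieves the same effect on a remote SFTP server.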



Thanks. In my test environment it works when the files are moved. Unfortunately, I do not have control over the SFTP server, so I cannot enforce atomic moves. What I am missing is a function that checks the modified date and file size over a configurable period and only starts gathering the file once they have stopped changing.
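The stability check the commenter describes is not a built-in feature here, but its logic can be sketched with the standard library. `wait_until_stable` is a hypothetical helper, not part of any StreamSets API: it polls a file's size and mtime and returns only after they have stayed unchanged for a configured number of consecutive checks.

```python
import os
import time

def wait_until_stable(path, interval=1.0, checks=3):
    """Block until the file's size and mtime have been identical for
    `checks` consecutive polls taken `interval` seconds apart, then
    return True. A file still being uploaded keeps resetting the count."""
    last = None
    stable = 0
    while stable < checks:
        st = os.stat(path)
        current = (st.st_size, st.st_mtime)
        if current == last:
            stable += 1      # unchanged since the previous poll
        else:
            stable = 0       # file grew or was touched: start over
            last = current
        time.sleep(interval)
    return True
```

A real implementation would also want a timeout so a file that never settles does not block the poller forever; this sketch only shows the "unchanged for N polls" condition.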

pwel gravatar imagepwel ( 2018-04-17 00:16:00 -0500 )edit