Files randomly skipped in SFTP -> HDFS pipeline

Hi,

I'm using StreamSets Data Collector 3.5.0 to transfer CSV files from SFTP to HDFS, with multiple directories processed at the same time. This generally works, but some files are occasionally skipped and never transferred to HDFS. I set the log level to DEBUG but could not find any entries related to these files. The delivery guarantee is set to At Least Once. Is this a known issue?
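To pin down exactly which files were skipped, one option is to compare a listing of the SFTP source directories with a listing of the HDFS target directories. The sketch below assumes both listings have already been exported as relative paths (one per file) and that the pipeline keeps the same relative path and file name on HDFS. If the pipeline renames output files, this comparison will not work as-is. `find_skipped` is a hypothetical helper, not part of StreamSets.

```python
from pathlib import PurePosixPath


def _normalise(path):
    """Strip whitespace and collapse redundant separators ("a//b.csv" -> "a/b.csv")."""
    return str(PurePosixPath(path.strip()))


def find_skipped(sftp_listing, hdfs_listing):
    """Return source paths with no matching path on HDFS (hypothetical helper).

    Assumes both listings are relative to the pipeline's root directories and
    that file names are preserved between SFTP and HDFS.
    """
    # Set of paths that made it to HDFS, ignoring blank lines.
    transferred = {_normalise(p) for p in hdfs_listing if p.strip()}
    # Every non-blank source path that is missing on HDFS, in sorted order.
    return sorted(
        _normalise(p)
        for p in sftp_listing
        if p.strip() and _normalise(p) not in transferred
    )


if __name__ == "__main__":
    source = ["dir1/a.csv", "dir1/b.csv", "dir2/c.csv"]
    target = ["dir1/a.csv", "dir2/c.csv"]
    print(find_skipped(source, target))  # prints ['dir1/b.csv']
```

Running it periodically against both listings would show whether the skipped files share a pattern (same directory, similar timestamps) that could be matched against the logs.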

This sounds similar to https://ask.streamsets.com/question/4552/sftpftp-cant-get-all-the-files-when-there-are-many-files-at-the-same-time/

I found many warnings like the following one (about one per hour), but I can't tell whether they are related to my issue.

2019-06-30 16:19:17,146 [user:*?] [pipeline:-] [runner:] [thread:managerExecutor-pool-3-thread-1] WARN  StandaloneAndClusterPipelineManager - Cannot remove runner for pipeline: 'FTPWhateverAlld5b26cfa-8c5d-4cbf-ab59-667de772e2d1::0' due to 'com.streamsets.datacollector.store.PipelineStoreException: CONTAINER_0209 - Pipeline state file '/opt/streamsets-env/data/runInfo/FTPWhateverAlld5b26cfa-8c5d-4cbf-ab59-667de772e2d1/0/pipelineState.json' doesn't exist' com.streamsets.datacollector.store.PipelineStoreException: CONTAINER_0209 - Pipeline state file '/opt/streamsets-env/data/runInfo/FTPWhateverd5b26cfa-8c5d-4cbf-ab59-667de772e2d1/0/pipelineState.json' doesn't exist
        at com.streamsets.datacollector.execution.store.FilePipelineStateStore.loadState(FilePipelineStateStore.java:155)

Thanks, Mathieu