Files randomly skipped in SFTP -> HDFS pipeline

asked 2019-07-02 15:49:49 -0500

I'm using StreamSets Data Collector 3.5.0 to transfer CSV files from SFTP to HDFS. Multiple directories are processed at the same time. This generally works, but some files are occasionally skipped and never transferred to HDFS. I set the log level to DEBUG but could not find anything related to these files. Is this a known issue? The Delivery Guarantee is set to At Least Once.

I did find many warnings like this one (about one per hour), but I can't tell whether they are related to my issue.

2019-06-30 16:19:17,146 [user:*?] [pipeline:-] [runner:] [thread:managerExecutor-pool-3-thread-1] WARN  StandaloneAndClusterPipelineManager - Cannot remove runner for pipeline: 'FTPWhateverAlld5b26cfa-8c5d-4cbf-ab59-667de772e2d1::0' due to ' CONTAINER_0209 - Pipeline state file '/opt/streamsets-env/data/runInfo/FTPWhateverAlld5b26cfa-8c5d-4cbf-ab59-667de772e2d1/0/pipelineState.json' doesn't exist' CONTAINER_0209 - Pipeline state file '/opt/streamsets-env/data/runInfo/FTPWhateverd5b26cfa-8c5d-4cbf-ab59-667de772e2d1/0/pipelineState.json' doesn't exist
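
To pin down exactly which files are skipped, one option is to diff the source and target listings. Below is a minimal sketch in Python, assuming both listings have already been dumped to plain lists of paths (for example from an SFTP client and from `hdfs dfs -ls -R`) and that the pipeline keeps the original file names and directory layout on HDFS. The function name `find_skipped` and all paths are hypothetical and not part of StreamSets.

```python
from pathlib import PurePosixPath


def find_skipped(source_paths, target_paths, source_root, target_root):
    """Return source files (relative to source_root) that are missing under target_root."""
    # Normalize both listings to paths relative to their roots,
    # so the same file compares equal on both sides.
    source_rel = {PurePosixPath(p).relative_to(source_root).as_posix() for p in source_paths}
    target_rel = {PurePosixPath(p).relative_to(target_root).as_posix() for p in target_paths}
    return sorted(source_rel - target_rel)


# Hypothetical listings; replace with the real SFTP and HDFS listings.
sftp_files = ["/upload/a/1.csv", "/upload/a/2.csv", "/upload/b/3.csv"]
hdfs_files = ["/data/in/a/1.csv", "/data/in/b/3.csv"]

skipped = find_skipped(sftp_files, hdfs_files, "/upload", "/data/in")
print(skipped)
```

Running this periodically and correlating the timestamps of the missing files with the log may help show whether the skips coincide with the warning above.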

Thanks, Mathieu

