Files randomly skipped in SFTP -> HDFS pipeline

asked 2019-07-02 15:49:49 -0600

MathieuH


I'm using StreamSets Data Collector 3.5.0 to transfer CSV files from SFTP to HDFS. Multiple directories are processed at the same time. This generally works, but some files are occasionally skipped and never transferred to HDFS. I turned on debug-level logging but could not find anything related to those files. Is this a known issue? The Delivery Guarantee is set to At Least Once.
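One way to pin down exactly which files were skipped is to compare a listing of the SFTP side with a listing of the HDFS side. Below is a minimal sketch, assuming the pipeline keeps the original file names on HDFS and that both listings have already been collected (for example with an SFTP client and `hdfs dfs -ls`). The function name `find_skipped` and the sample paths are hypothetical.

```python
from pathlib import PurePosixPath


def find_skipped(source_paths, target_paths):
    """Return source paths whose base file name is missing on the target side."""
    # Compare by base name only, since directory prefixes differ between SFTP and HDFS.
    target_names = {PurePosixPath(p).name for p in target_paths}
    return sorted(p for p in source_paths if PurePosixPath(p).name not in target_names)


# Hypothetical listings; in practice these would come from the SFTP server and HDFS.
sftp_listing = ["/in/a/file1.csv", "/in/a/file2.csv", "/in/b/file3.csv"]
hdfs_listing = ["/data/a/file1.csv", "/data/b/file3.csv"]

print(find_skipped(sftp_listing, hdfs_listing))  # ['/in/a/file2.csv']
```

Because the comparison uses base names only, it would miss a skipped file if another directory contains a file with the same name. Comparing paths relative to each root would be stricter.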

That sounds a bit like

I did find many log entries like this one (about 1 per hour), but I can't tell whether they are related to my issue.

2019-06-30 16:19:17,146 [user:*?] [pipeline:-] [runner:] [thread:managerExecutor-pool-3-thread-1] WARN  StandaloneAndClusterPipelineManager - Cannot remove runner for pipeline: 'FTPWhateverAlld5b26cfa-8c5d-4cbf-ab59-667de772e2d1::0' due to ' CONTAINER_0209 - Pipeline state file '/opt/streamsets-env/data/runInfo/FTPWhateverAlld5b26cfa-8c5d-4cbf-ab59-667de772e2d1/0/pipelineState.json' doesn't exist' CONTAINER_0209 - Pipeline state file '/opt/streamsets-env/data/runInfo/FTPWhateverd5b26cfa-8c5d-4cbf-ab59-667de772e2d1/0/pipelineState.json' doesn't exist
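To check whether these warnings line up with the times at which files go missing, the entries could be counted per hour. This is a rough sketch that assumes every log line starts with a timestamp in the same format as the entry above. The function name `warnings_per_hour` and the shortened sample lines are hypothetical.

```python
from collections import Counter
from datetime import datetime


def warnings_per_hour(lines, code="CONTAINER_0209"):
    """Count log lines containing `code`, grouped by the hour of their timestamp."""
    counts = Counter()
    for line in lines:
        if code not in line:
            continue
        try:
            # The first 23 characters look like "2019-06-30 16:19:17,146".
            ts = datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        counts[ts.strftime("%Y-%m-%d %H:00")] += 1
    return counts


# Shortened, hypothetical sample lines in the same layout as the real log.
sample = [
    "2019-06-30 16:19:17,146 [user:*?] WARN CONTAINER_0209 - Pipeline state file ...",
    "2019-06-30 16:45:02,001 [user:*?] DEBUG something unrelated",
    "2019-06-30 17:19:18,020 [user:*?] WARN CONTAINER_0209 - Pipeline state file ...",
]

for hour, count in sorted(warnings_per_hour(sample).items()):
    print(hour, count)
```

The hourly counts can then be set against the modification times of the skipped files to see whether the two are correlated.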

Thanks, Mathieu
