Issue with closing and renaming of output files in HDFS destination

Hello,

We use StreamSets 3.4.0 to run data ingestion pipelines from Kafka to HDFS. In these pipelines, the HDFS destination's Max File Size (MB) is set to 0 and its Idle Timeout is set to 10 minutes. In several cases, closing and renaming the output file fails. The StreamSets logs show two types of exceptions, reproduced below. Is there a way to resolve this issue?

Error closing writer RecordWriter[path='/file_path/_tmp_file_name-fed85533-9be0-11e8-8d69-0179c1a8dd01_0'] : java.io.IOException: Could not rename '/file_path/_tmp_file_name-fed85533-9be0-11e8-8d69-0179c1a8dd01_0' to '/file_path/file_name-fed85533-9be0-11e8-8d69-0179c1a8dd01_3458d8e5-b936-47a3-b73e-bd136adb9f29'
    java.io.IOException: Could not rename '/file_path/_tmp_file_name-fed85533-9be0-11e8-8d69-0179c1a8dd01_0' to '/file_path/file_name-fed85533-9be0-11e8-8d69-0179c1a8dd01_74c04d16-8612-4658-ad88-d6bd5f6b4bb3'
    at com.streamsets.pipeline.stage.destination.hdfs.writer.DefaultFsHelper.renameAndGetPath(DefaultFsHelper.java:114)
    at com.streamsets.pipeline.stage.destination.hdfs.writer.RecordWriterManager.renameToFinalName(RecordWriterManager.java:200)
    at com.streamsets.pipeline.stage.destination.hdfs.writer.RecordWriterManager.commitWriter(RecordWriterManager.java:362)
    at com.streamsets.pipeline.stage.destination.hdfs.writer.ActiveRecordWriters.release(ActiveRecordWriters.java:164)
    at com.streamsets.pipeline.stage.destination.hdfs.writer.RecordWriter.close(RecordWriter.java:233)
    at com.streamsets.pipeline.stage.destination.hdfs.writer.RecordWriter.access$200(RecordWriter.java:48)
    at com.streamsets.pipeline.stage.destination.hdfs.writer.RecordWriter$IdleCloseCallable.call(RecordWriter.java:327)
    at com.streamsets.pipeline.stage.destination.hdfs.writer.RecordWriter$IdleCloseCallable.call(RecordWriter.java:315)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

ERROR RecordWriter - Error while attempting to close /file_path/_tmp_file_name-fed85533-9be0-11e8-8d69-0179c1a8dd01_0
    java.io.IOException: Unable to close file because the last block does not have enough number of replicas.
        at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2680)
        at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:2642)
        at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2606)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
        at org.apache.hadoop.crypto.CryptoOutputStream.close(CryptoOutputStream.java:241)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
        at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
        at org.apache.commons.io.output.ProxyOutputStream.close(ProxyOutputStream.java:117)
        at org.apache.commons.io.output.ProxyOutputStream.close(ProxyOutputStream.java:117)
        at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:320)
        at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:149)
        at java.io.OutputStreamWriter.close(OutputStreamWriter.java:233)
        at com.streamsets.pipeline.lib.generator.text.TextCharDataGenerator.close(TextCharDataGenerator.java:103)
        at com.streamsets.pipeline.stage.destination.hdfs.writer.RecordWriter.close(RecordWriter.java:226)
        at com.streamsets.pipeline.stage.destination.hdfs.writer.RecordWriter.access$200(RecordWriter.java:48)
        at com.streamsets.pipeline.stage.destination.hdfs.writer.RecordWriter$IdleCloseCallable.call(RecordWriter.java:327)
        at com.streamsets.pipeline.stage.destination.hdfs.writer.RecordWriter$IdleCloseCallable.call(RecordWriter.java:315)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)