Issue with closing and renaming of output files in HDFS destination

asked 2019-06-21 02:23:23 -0500 by George

Hello,

We are using StreamSets 3.4.0 to implement data ingestion pipelines from Kafka to HDFS. In these pipelines, the HDFS destination's Max File Size (MB) is set to 0 and its Idle Timeout is set to 10 minutes. We have observed that in several cases the closing and renaming of the output file fails, and the StreamSets logs contain the two types of exceptions shown below. Is there a way to resolve this issue?

Error closing writer RecordWriter[path='/file_path/_tmp_file_name-fed85533-9be0-11e8-8d69-0179c1a8dd01_0'] : java.io.IOException: Could not rename '/file_path/_tmp_file_name-fed85533-9be0-11e8-8d69-0179c1a8dd01_0' to '/file_path/file_name-fed85533-9be0-11e8-8d69-0179c1a8dd01_3458d8e5-b936-47a3-b73e-bd136adb9f29'
    java.io.IOException: Could not rename '/file_path/_tmp_file_name-fed85533-9be0-11e8-8d69-0179c1a8dd01_0' to '/file_path/file_name-fed85533-9be0-11e8-8d69-0179c1a8dd01_74c04d16-8612-4658-ad88-d6bd5f6b4bb3'
    at com.streamsets.pipeline.stage.destination.hdfs.writer.DefaultFsHelper.renameAndGetPath(DefaultFsHelper.java:114)
    at com.streamsets.pipeline.stage.destination.hdfs.writer.RecordWriterManager.renameToFinalName(RecordWriterManager.java:200)
    at com.streamsets.pipeline.stage.destination.hdfs.writer.RecordWriterManager.commitWriter(RecordWriterManager.java:362)
    at com.streamsets.pipeline.stage.destination.hdfs.writer.ActiveRecordWriters.release(ActiveRecordWriters.java:164)
    at com.streamsets.pipeline.stage.destination.hdfs.writer.RecordWriter.close(RecordWriter.java:233)
    at com.streamsets.pipeline.stage.destination.hdfs.writer.RecordWriter.access$200(RecordWriter.java:48)
    at com.streamsets.pipeline.stage.destination.hdfs.writer.RecordWriter$IdleCloseCallable.call(RecordWriter.java:327)
    at com.streamsets.pipeline.stage.destination.hdfs.writer.RecordWriter$IdleCloseCallable.call(RecordWriter.java:315)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

ERROR RecordWriter - Error while attempting to close /file_path/_tmp_file_name-fed85533-9be0-11e8-8d69-0179c1a8dd01_0
    java.io.IOException: Unable to close file because the last block does not have enough number of replicas.
        at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2680)
        at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:2642)
        at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2606)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
        at org.apache.hadoop.crypto.CryptoOutputStream.close(CryptoOutputStream.java:241)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
        at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
        at org.apache.commons.io.output.ProxyOutputStream.close(ProxyOutputStream.java:117)
        at org.apache.commons.io.output.ProxyOutputStream.close(ProxyOutputStream.java:117)
        at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:320)
        at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:149)
        at java.io.OutputStreamWriter.close(OutputStreamWriter.java:233)
        at com.streamsets.pipeline.lib.generator.text.TextCharDataGenerator.close(TextCharDataGenerator.java:103)
        at com.streamsets.pipeline.stage.destination.hdfs.writer.RecordWriter.close(RecordWriter.java:226)
        at com.streamsets.pipeline.stage.destination.hdfs.writer.RecordWriter.access$200(RecordWriter.java:48)
        at com.streamsets.pipeline.stage.destination.hdfs.writer.RecordWriter$IdleCloseCallable.call(RecordWriter.java:327)
        at com.streamsets.pipeline.stage.destination.hdfs.writer.RecordWriter$IdleCloseCallable.call(RecordWriter.java:315)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask ...

1 Answer


answered 2019-06-21 13:29:11 -0500 by metadaddy

The "Unable to close file because the last block does not have enough number of replicas" error comes from Hadoop itself, not from StreamSets. It looks like a Hadoop DataNode died during the write and the cluster does not have enough redundancy left to complete it; because the temporary file can never be closed, the subsequent rename to the final name fails as well. You need to check your cluster logs and reconfigure your cluster for more redundancy.
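
As a starting point for the "check your cluster logs" advice, the stock HDFS command-line tools can show whether DataNodes dropped out and whether blocks under the output directory are under-replicated. This is a minimal sketch, assuming shell access to a node whose hdfs client points at the same cluster; /file_path is the placeholder directory from the logs above, not a real path.

    # Live/dead DataNode counts and per-node capacity; dead or decommissioning
    # nodes here are the usual cause of "not enough replicas" on close.
    hdfs dfsadmin -report

    # Block-level health of the output directory: lists under-replicated,
    # missing, and corrupt blocks along with the DataNodes that hold them.
    hdfs fsck /file_path -files -blocks -locations

    # Replication factor of an already-finalized output file (placeholder name),
    # to compare against the number of live DataNodes reported above.
    hdfs dfs -stat %r /file_path/some_finalized_file

If the report shows fewer live DataNodes than the files' replication factor, restoring those nodes (or adding new ones) is the kind of extra redundancy referred to above.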

