
How to check the integrity of the files between the source and destination

asked 2019-06-20 04:22:35 -0500

Jeyakumar

updated 2019-06-24 02:07:18 -0500

My source is SFTP and my destination is Hadoop. I am using StreamSets to ingest the data from the source to the destination. The files are moved from the source to the destination successfully. However, I would like a way to check the integrity of the files, to confirm that the same file content has arrived at the destination.

On the SFTP side I use the `!cksum` command, and on the Hadoop side I use `hdfs dfs -Ddfs.checksum.combine.mode=COMPOSITE_CRC -checksum "location of the file"`.

In SFTP, this is the command I am using:

sftp> !cksum filename

(666126820,14)

The output is the hash value along with the size of the file.

In Hadoop

hdfs dfs -Ddfs.checksum.combine.mode=COMPOSITE_CRC -checksum filename

This is the output I am getting:

MD5-of-0MD5-of-512CRC32C 000002000000000000000000b377153cc30d105a8fa55ca462836dea

I am running this pipeline on CDH 5.13. As requested, I have updated the details; please let me know your comments or suggestions.

The two commands give me different results. How do I fix this? Please point me in the right direction.

Thanks in advance.




Please paste the full command you are using to calculate the checksum on the FTP side. Also edit your question to include information about your Hadoop cluster environment. Also, does a simple `diff` command show a difference between these files?

jeff ( 2019-06-21 12:33:02 -0500 )

My money is on the hdfs checksum using a different algorithm from the SFTP one

metadaddy ( 2019-06-21 13:21:55 -0500 )

1 Answer


answered 2019-06-21 13:43:14 -0500

metadaddy

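The `cksum` command and Hadoop's `-checksum` use different algorithms, so their outputs cannot be compared directly. `cksum` prints a POSIX CRC and a byte count, while the HDFS output in the question is labeled MD5-of-0MD5-of-512CRC32C, which is an MD5 over per-block MD5s of CRC32C chunk checksums. You must use the same algorithm on both sides when comparing checksums.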

This page gives one method for comparing checksums of files in HDFS against local files, using the crc32 command: Comparing checksums in HDFS.
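If reading the whole file back out of HDFS is acceptable, another option (not the method from the linked page) is to stream the HDFS copy through the same hashing tool you run on the source copy. Below is a minimal sketch, assuming `md5sum` is installed and that you have a local copy of the source file (in OpenSSH's sftp client, a `!` prefix runs the command in your local shell, so `!cksum` is also hashing a local file). The function names and the `HDFS_CAT` variable are hypothetical, introduced only for this illustration.

```shell
#!/usr/bin/env bash
# Sketch: compare a local file with its HDFS copy by hashing both with the SAME tool.
# Assumptions (not from the original thread): md5sum is available, and HDFS_CAT is a
# hypothetical hook naming the command that streams an HDFS file to stdout.
set -o pipefail

HDFS_CAT="${HDFS_CAT:-hdfs dfs -cat}"

# Print the MD5 digest of a local file.
local_md5() {
    md5sum < "$1" | awk '{print $1}'
}

# Print the MD5 digest of an HDFS file by streaming its bytes through md5sum.
# $HDFS_CAT is intentionally unquoted so "hdfs dfs -cat" splits into words.
hdfs_md5() {
    $HDFS_CAT "$1" | md5sum | awk '{print $1}'
}

# Return 0 if both digests are equal, 1 otherwise.
same_content() {
    local a b
    a="$(local_md5 "$1")"
    b="$(hdfs_md5 "$2")"
    [ "$a" = "$b" ]
}

# When called with two arguments: <local file> <HDFS path>
if [ "$#" -eq 2 ]; then
    if same_content "$1" "$2"; then
        echo "MATCH"
    else
        echo "MISMATCH"
        exit 1
    fi
fi
```

Saved under a name of your choice, it would be called with the local path first and the HDFS path second. The trade-off is that this transfers the entire file from HDFS, whereas `hdfs dfs -checksum` works from checksums the cluster already stores.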
