How to check the integrity of the files between the source and destination

I have SFTP as the source and Hadoop as the destination. I am using StreamSets to ingest the data from the source to the destination. The files are moved from the source to the destination successfully. However, I would like to find a way to check the integrity of the files, to confirm that the same file has been moved from the source to the destination.

I have the !cksum command on the SFTP side, and on the Hadoop side: hdfs dfs -Ddfs.checksum.combine.mode=COMPOSITE_CRC -checksum "location of the file"

In SFTP, here is the command I am using:

sftp> !cksum filename

(666126820, 14) - the output is the checksum value along with the size of the file.

In Hadoop:

hdfs dfs -Ddfs.checksum.combine.mode=COMPOSITE_CRC -checksum filename

MD5-of-0MD5-of-512CRC32C 000002000000000000000000b377153cc30d105a8fa55ca462836dea - this is the output I am getting.

I am running this pipeline on CDH 5.13. As requested, I have updated the details; please let me know your comments/suggestions.

But I am getting two different results; how do I fix this? Please share your comments/suggestions and point me in the right direction.
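
One idea I had (untested) is to compare the same hash on both sides instead of mixing cksum with the HDFS composite CRC, for example by running MD5 on the local copy and on the file streamed out of HDFS. The paths below are placeholders:

md5sum /local/path/filename

hdfs dfs -cat /hdfs/path/filename | md5sum

If the two MD5 values match, the files should be identical. Would this be a reasonable approach, or is there a better way within CDH/StreamSets?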

Thanks in advance.