Metadata Drift Solution for File-Based Ingestion

asked 2018-08-24 14:29:04 -0600

jay1988

Is it possible to use the metadata drift solution with file-based ingestion? We are using it for database-based ingestion, but not for file-based ingestion. Can we do that?

answered 2018-08-24 20:21:46 -0600

metadaddy

updated 2018-08-26 20:38:48 -0600

You can use the Hive Metadata processor and Hive Metastore destination with any origin. The processor just examines the incoming record's structure and metadata, such as field names.
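To illustrate what "examining the record's structure" means, here is a rough Python sketch of the idea. It is not the processor's actual code or API; the function name `detect_new_columns` and the sample field names are made up for the example. It simply compares a record's field names against the columns already known for the table, regardless of where the record came from.

```python
def detect_new_columns(known_columns, record):
    """Return field names in the record that the known schema lacks.

    known_columns: set of column names already in the (hypothetical) table.
    record: dict mapping field name to value, from any origin.
    """
    return [name for name in record if name not in known_columns]


# Hypothetical table schema and incoming record.
known = {"id", "name"}
record = {"id": 1, "name": "Ada", "email": "ada@example.com"}

new_cols = detect_new_columns(known, record)
print(new_cols)  # ['email']
```

Nothing in the check depends on the origin, which is why a file origin works as well as a database one.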

If you're reading CSV data from a file, that should work just fine - you can use the headers from the file, or assign them in the pipeline using the Field Renamer processor. If you're reading something with more structure, such as Avro or JSON, you'll need to flatten the record before it hits the processor.
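In case "flatten" is unclear, here is a minimal Python sketch of the transformation, assuming nested fields are maps and that joining names with an underscore is acceptable. The `flatten` function, the separator, and the sample record are all illustrative assumptions; in a pipeline you would do this with a processor (Data Collector has a Field Flattener processor) rather than custom code.

```python
def flatten(record, parent_key="", sep="_"):
    """Collapse nested dicts into one level, joining names with sep."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested maps, carrying the prefix along.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Hypothetical nested record, as you might get from JSON or Avro.
nested = {"id": 1, "address": {"city": "Denver", "zip": "80202"}}
print(flatten(nested))
# {'id': 1, 'address_city': 'Denver', 'address_zip': '80202'}
```

After flattening, every field sits at the top level of the record, so each one can map to a single table column.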

In that case, does the file need to contain header information? Also, if header information is present, can the file be processed as a whole file, or does it have to be record-based processing?

jay1988 ( 2018-08-24 22:28:14 -0600 )

Edited my answer to include info on headers. You will need to use record-based processing.

metadaddy ( 2018-08-26 20:39:21 -0600 )
