Multiple data collector for a job without duplicating records

asked 2018-07-17 02:41:48 -0600

tamizh

I have a directory containing multiple files, and it is shared across multiple data collectors. I have a job that processes those files and writes them to the destination. Because the number of records is huge, I want to run the job on multiple data collectors, but when I tried, I got duplicate entries in my destination. Is there a way to do this without duplicating the records? Thanks


1 Answer

answered 2018-07-17 20:40:31 -0600

metadaddy

updated 2018-07-18 09:35:16 -0600

At present there is no way to automatically partition directory contents across multiple data collectors.

You could run similar pipelines on multiple data collectors and manually partition the data in the origin using different character ranges in the File Name Pattern configurations. For example, if you had two data collectors, and your file names were distributed across the alphabet, the first instance might process [a-m]* and the second [n-z]*.
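Before deploying patterns like these, it may help to check that they really split the directory without overlap. The following is a minimal sketch in Python, assuming glob-style matching similar to `fnmatch`; the `PATTERNS` list, the helper names, and the sample file names are hypothetical and not part of Data Collector itself.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns, one per Data Collector instance.
PATTERNS = ["[a-m]*", "[n-z]*"]


def assign(file_names, patterns=PATTERNS):
    """Map each file name to the indexes of the patterns that match it."""
    return {
        name: [i for i, p in enumerate(patterns) if fnmatchcase(name, p)]
        for name in file_names
    }


def check_partition(file_names, patterns=PATTERNS):
    """Return (unmatched, duplicated) file names for the given patterns."""
    owners = assign(file_names, patterns)
    # A file no pattern matches would never be processed.
    unmatched = sorted(n for n, o in owners.items() if len(o) == 0)
    # A file matched by several patterns would be processed more than once.
    duplicated = sorted(n for n, o in owners.items() if len(o) > 1)
    return unmatched, duplicated
```

In this sketch, a file name starting with a digit or an uppercase letter matches neither range and would be reported as unmatched, so the ranges need to cover every leading character that actually occurs in the directory.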

One way to do this would be by setting File Name Pattern to a runtime parameter - for example ${FileNamePattern}. You would then set the value for the pattern in the pipeline's parameters tab, or when starting the pipeline via the CLI, API, UI or Control Hub.
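As a rough illustration, the per-instance values could be generated as one parameter map per data collector. This is only a sketch: the parameter name `FileNamePattern` comes from the example above, while the `runtime_parameters` helper and the JSON layout are assumptions for illustration, not a documented Data Collector format.

```python
import json


def runtime_parameters(patterns, name="FileNamePattern"):
    """Build one runtime-parameter map per Data Collector instance."""
    return [{name: pattern} for pattern in patterns]


# Hypothetical usage: print the value each instance would be started with.
for i, params in enumerate(runtime_parameters(["[a-m]*", "[n-z]*"])):
    print(f"instance {i}: {json.dumps(params)}")
```

Each map would then be supplied to its own instance by whichever start method you use, so the pipeline definition stays identical and only the parameter value differs.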


Comments

How should I differentiate the **file name pattern** configuration of a pipeline across the data collectors? If the same pipeline runs on every data collector, then the same configuration applies too, which means the duplicate issue still happens. Right?

tamizh (2018-07-18 02:14:31 -0600)

You would run *similar* pipelines on the different data collector instances, so each one has a different file name pattern. I'll update the answer with more detail.

metadaddy (2018-07-18 09:32:34 -0600)