Aggregate data in CSV or RabbitMQ source

asked 2018-01-27 05:58:37 -0500

Busbar

Hi, I am looking into the options here. I was using Flink to aggregate data from RabbitMQ over a 5-minute window. We ingest millions of events every 5 minutes and would like to group by a couple of fields and sum one field. This is easy with Flink, but I was also looking into StreamSets, where it was easy to build things by drag and drop.

Yet I am struggling with the SUM/Aggregation processor. I assumed that I could read the data and sum it, but I read that StreamSets works on a per-record basis, so how do I do it then?

1 Answer

answered 2018-01-31 08:00:51 -0500

tmcgrath

You're correct, the aggregation(s) will not be added to the stream of records. To obtain and use the results of the aggregation, configure the Aggregation Processor to "Produce Events". This is useful if, for example, you'd like to store the results of the pipeline aggregation(s) in some kind of durable storage or write them to a Kafka topic.

Some stages in StreamSets are able to produce events, and the Aggregation Processor is one of them.
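To make the distinction concrete, here is a minimal plain-Python sketch of the idea. It is not StreamSets code and does not use any StreamSets API. It only illustrates records passing through unchanged while the per-window sums go to a separate list of events. The field names (`ts`, `region`, `device`, `bytes`), the function name, and the assumption that records arrive in timestamp order are all hypothetical.

```python
from collections import defaultdict

WINDOW_SECONDS = 300  # 5-minute tumbling window, as in the question


def aggregate_with_events(records, events, window_seconds=WINDOW_SECONDS):
    """Yield every record unchanged; append one event per group when a window closes.

    Assumes records are dicts with hypothetical fields ts/region/device/bytes
    and that they arrive ordered by ts (epoch seconds).
    """
    current_window = None
    sums = defaultdict(int)

    def flush():
        # Emit the sums of the window that just closed as separate events.
        for (region, device), total in sorted(sums.items()):
            events.append({
                "window_start": current_window,
                "region": region,
                "device": device,
                "sum_bytes": total,
            })
        sums.clear()

    for rec in records:
        window = rec["ts"] - rec["ts"] % window_seconds
        if current_window is not None and window != current_window:
            flush()
        current_window = window
        sums[(rec["region"], rec["device"])] += rec["bytes"]
        yield rec  # the record stream itself is not modified

    if sums:
        flush()  # close the last, partially filled window


records = [
    {"ts": 0, "region": "eu", "device": "a", "bytes": 10},
    {"ts": 120, "region": "eu", "device": "a", "bytes": 5},
    {"ts": 299, "region": "us", "device": "b", "bytes": 7},
    {"ts": 300, "region": "eu", "device": "a", "bytes": 1},
]
events = []
passed_through = list(aggregate_with_events(records, events))
```

In this sketch `passed_through` holds the same four records that went in, while `events` holds the grouped sums per window. In a pipeline, that second stream is the one you would route to storage or a Kafka topic.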

I'd suggest looking here to learn more about the events the Aggregation processor generates.

From that section, there are links to event handling examples such as



Seen: 35 times

Last updated: Jan 31