HTTP Client origin with Pagination and "Keep all fields" just generates multiple copies of the last record

asked 2020-08-26 07:58:33 -0500 by Pieter (updated 2020-08-28 03:41:39 -0500)

I have a problem with the HTTP Client origin when Pagination and "Keep all fields" are both enabled. When my HTTP service responds with multiple records, the origin correctly generates the right number of output records (one per record in the response), and previewing the pipeline also shows the individual records correctly. However, when I attach a destination (any destination), only the last record in the HTTP response is written to it, repeated as many times as there were records in the response. (When "Keep all fields" is not enabled I obviously lose the "wrapper" fields, but the destination records are then written correctly.)

I'll show the example using a Kafka destination, but exactly the same happens with the JDBC Producer or even just a Local File System destination.

My REST service responds to a GET: "http://<url>?eventsAfter=<timestamp>" (timestamp in the example format "2020-08-26T13:30:50.000").

My HTTP Client origin is set up in "Polling" mode, with Pagination "Link in Response Field" (configured to "/lastReturned") and a stop condition of "${record:value('/stop')}". The Result Field Path is configured as "/events". The pipeline behaves perfectly correctly with paging / polling / stopping - no issues there.
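
For reference, the origin configuration summarised (property labels as I recall them from the 3.x UI):

HTTP Client origin:
    Mode:                  Polling
    Resource URL:          http://<url>?eventsAfter=<timestamp>
    Pagination Mode:       Link in Response Field
    Next Page Link Field:  /lastReturned
    Stop Condition:        ${record:value('/stop')}
    Result Field Path:     /events
    Keep All Fields:       enabled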

Sample HTTP response from my REST service:

{
    "lastReturned": "2020-08-26T13:31:56.957113000",
    "stop": false,
    "events": [
        {
            "uuid": "aaecb34c-929b-4960-beda-b0c341138fde",
            "eventType": "type55",
            "timestamp": "2020-08-21T11:05:46.490463",
            "sequence": 69
        },
        {
            "uuid": "5b83184c-b863-4444-a81d-bee47a7a50f8",
            "eventType": "type20",
            "timestamp": "2020-08-21T11:05:46.490463",
            "sequence": 70
        }
    ]
}
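
For anyone wanting to reproduce: here is a minimal stand-in for my REST service that always returns the page above. It is only a sketch using the Python standard library, not the real service (the real one pages by "eventsAfter"; this one sets "stop" to true so the stop condition fires after one page):

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = {
    "lastReturned": "2020-08-26T13:31:56.957113000",
    "stop": True,  # true so the pipeline's stop condition fires after one page
    "events": [
        {"uuid": "aaecb34c-929b-4960-beda-b0c341138fde",
         "eventType": "type55",
         "timestamp": "2020-08-21T11:05:46.490463",
         "sequence": 69},
        {"uuid": "5b83184c-b863-4444-a81d-bee47a7a50f8",
         "eventType": "type20",
         "timestamp": "2020-08-21T11:05:46.490463",
         "sequence": 70},
    ],
}

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the same JSON page for every GET, ignoring query parameters
        body = json.dumps(PAGE).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("", 8000), Handler).serve_forever()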

When I have "Keep all fields" enabled in the HTTP Client origin and preview the pipeline, the following two records correctly flow through it:

{
    "lastReturned": "2020-08-26T13:31:56.957113000",
    "stop": false,
    "events": [
        {
            "uuid": "aaecb34c-929b-4960-beda-b0c341138fde",
            "eventType": "type55",
            "timestamp": "2020-08-21T11:05:46.490463",
            "sequence": 69
        }
    ]
}

and

{
    "lastReturned": "2020-08-26T13:31:56.957113000",
    "stop": false,
    "events": [
        {
            "uuid": "5b83184c-b863-4444-a81d-bee47a7a50f8",
            "eventType": "type20",
            "timestamp": "2020-08-21T11:05:46.490463",
            "sequence": 70
        }
    ]
}

However, when the pipeline actually runs, only the last record (sequence: 70) is written to the destination - twice. If the REST service responds with 7 records, the pipeline again writes just the last record, 7 times. So the number of records written is always "right", but the content is always just the last record.
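
I don't know SDC's internals, but the symptom (preview correct, destination seeing N copies of the last record) is what you would see if all the records in a batch ended up referencing one shared mutable object. A toy Python illustration of that failure mode (pure speculation on my part, not SDC code):

wrapper = {"lastReturned": "2020-08-26T13:31:56.957113000", "stop": False}
batch = []
for event in [{"sequence": 69}, {"sequence": 70}]:
    wrapper["events"] = [event]   # mutates the one shared wrapper in place
    batch.append(wrapper)         # every batch entry aliases the same object

print(batch[0] is batch[1])                          # True
print([r["events"][0]["sequence"] for r in batch])   # [70, 70] -> last record, twice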

Is this a known issue in Data Collector, or am I doing something wrong? It happens in both versions 3.17.1 and 3.13.0, and I've also tried batch sizes from 1 to thousands and different batch wait times.

When I disable "Keep all fields", the two records are written to the destination correctly (minus the wrapper fields):

{
    "uuid": "aaecb34c-929b-4960-beda-b0c341138fde",
    "eventType": "type55",
    "timestamp": "2020-08-21T11:05:46.490463",
    "sequence": 69
}

and

{
    "uuid": "5b83184c-b863-4444-a81d-bee47a7a50f8",
    "eventType": "type20",
    "timestamp": "2020-08-21T11:05:46.490463",
    "sequence": 70
}
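
For completeness, this is how I'm checking what actually lands in Kafka - a sketch assuming the kafka-python client, with "events" as a hypothetical topic name:

import json
from kafka import KafkaConsumer   # pip install kafka-python

consumer = KafkaConsumer(
    "events",                     # hypothetical topic name
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,     # stop iterating once the topic is drained
)
uuids = [json.loads(msg.value)["events"][0]["uuid"] for msg in consumer]
print(len(uuids), "records,", len(set(uuids)), "distinct uuids")
# With "Keep all fields" enabled I get e.g. "2 records, 1 distinct uuids".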