Can we use StreamSets to insert/update (upsert) in Solr?

asked 2018-03-25 22:28:57 -0600

devpa

updated 2018-03-27 10:34:58 -0600

Hi,

Is there any way to update existing documents in Solr through StreamSets?

Below are sample records; taskInstanceId is the unique identifier. The first time the pipeline sees a taskInstanceId with eventKind Start, it should insert a document into Solr. When it later sees records with the same taskInstanceId and eventKind Active or Completed, it should update that same document in Solr.

For a given taskInstanceId, each eventKind (Start, Active, Completed) arrives only once, so there are no duplicates.

{
    "id": 1,
    "eventKind": "Start",
    "taskTime": "2018-03-26T13:15:30Z",
    "taskInstanceId": "1234"
}

{
    "id": 2,
    "eventKind": "Active",
    "taskTime": "2018-03-26T14:15:30Z",
    "taskInstanceId": "1234"
}

{
    "id": 3,
    "eventKind": "Completed",
    "taskTime": "2018-03-26T15:15:30Z",
    "taskInstanceId": "1234"
}

Expected records in Solr after StreamSets processing:

Task started - first insert

{
    "id": 4,
    "taskStatus": "start",
    "taskStartTime": "2018-03-26T13:15:30Z",
    "taskClaimTime": "",
    "taskCompletedTime": "",
    "taskInstanceId": "1234"
}

Task active - update in the same document

{
    "id": 4,
    "taskStatus": "Active",
    "taskStartTime": "2018-03-26T13:15:30Z",
    "taskClaimTime": "2018-03-26T14:15:30Z",
    "taskCompletedTime": "",
    "taskInstanceId": "1234"
}

Task completed - another update in the same document

{
    "id": 4,
    "taskStatus": "Completed",
    "taskStartTime": "2018-03-26T13:15:30Z",
    "taskClaimTime": "2018-03-26T14:15:30Z",
    "taskCompletedTime": "2018-03-26T15:15:30Z",
    "taskInstanceId": "1234"
}
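
To make the intended behaviour concrete, here is a minimal Python sketch of the mapping described above. It turns each incoming event into a document shaped like a Solr atomic update (fields wrapped in a "set" modifier) and then simulates the merge locally. Several things here are assumptions rather than part of the original setup: taskInstanceId is treated as the collection's uniqueKey (the samples above use a separate id field), the names TIME_FIELD, to_atomic_update and apply_update are hypothetical, and the in-memory dict only stands in for Solr. Whether the StreamSets Solr destination can send this kind of payload is not established by the sketch.

```python
# Hypothetical mapping from eventKind to the timestamp field it fills in.
TIME_FIELD = {
    "Start": "taskStartTime",
    "Active": "taskClaimTime",
    "Completed": "taskCompletedTime",
}


def to_atomic_update(event):
    """Build a Solr-style atomic-update document for one event.

    Assumption: taskInstanceId is the uniqueKey, so every event for the
    same task targets the same Solr document.
    """
    kind = event["eventKind"]
    return {
        "taskInstanceId": event["taskInstanceId"],
        "taskStatus": {"set": kind},
        TIME_FIELD[kind]: {"set": event["taskTime"]},
    }


def apply_update(index, update):
    """Simulate a 'set' update against an in-memory dict (illustration only)."""
    key = update["taskInstanceId"]
    doc = index.setdefault(key, {"taskInstanceId": key})
    for field, value in update.items():
        if isinstance(value, dict) and "set" in value:
            doc[field] = value["set"]
    return doc


events = [
    {"id": 1, "eventKind": "Start", "taskTime": "2018-03-26T13:15:30Z", "taskInstanceId": "1234"},
    {"id": 2, "eventKind": "Active", "taskTime": "2018-03-26T14:15:30Z", "taskInstanceId": "1234"},
    {"id": 3, "eventKind": "Completed", "taskTime": "2018-03-26T15:15:30Z", "taskInstanceId": "1234"},
]

index = {}  # stand-in for the Solr collection
for event in events:
    apply_update(index, to_atomic_update(event))

print(index["1234"])
```

After the three events, the simulated document carries taskStatus "Completed" and all three timestamps, which matches the last expected record above apart from the id field. Note that Solr's atomic updates generally require the affected fields to be stored or to have docValues, so the schema would need checking before relying on this approach.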