HTTP Client Origin/Processor Stage Discrepancy

asked 2019-08-08 09:58:33 -0500

daveh

There appears to be a significant discrepancy in how the HTTP Client origin and processor stages return response data from an external API call to the pipeline: the processor stage returns only the very first line of the response.

I'll try to describe it as follows.

Below are the data returned from the API call for:

  • Postman call returning the raw data as is
  • StreamSets origin stage HTTP Client and how it returns the data to the pipeline
  • StreamSets processor stage HTTP Client and how it returns the data to the pipeline

Raw Data

(as returned using Postman; note that lines are separated by \n)

["Email Address","First Name","Last Name","Address","Phone Number","Website","Marketing Permissions","MEMBER_RATING","OPTIN_TIME","OPTIN_IP","CONFIRM_TIME","CONFIRM_IP","LATITUDE","LONGITUDE","GMTOFF","DSTOFF","TIMEZONE","CC","REGION","LAST_CHANGED","LEID","EUID","NOTES"]
["user1@gmail.com","John","Doe","","","",null,2,"2019-08-08 00:54:15","198.177.197.105","2019-08-08 00:54:32","198.177.197.105",null,null,null,null,null,null,null,"2019-08-08 00:54:32","222029885","126dad7900",null]
["user2@gmail.com","Jane","Doe","","","",null,2,"2019-08-08 00:49:30","199.177.197.105","2019-08-08 00:50:02","199.177.197.105",null,null,null,null,null,null,null,"2019-08-08 00:50:02","222029877","b209b6dda0",null]

Notice: Three lines. The first is a header line. Next two are data. Granted this is a weird response body structure, but it is what it is.

========================================

Origin Stage

Stage Name HTTP Client

Stage Instance Name HTTPClient_01

Stage Description Uses an HTTP client to read records from an URL.

Stage Type Origin

Stage Library Basic

Record1 : LIST_MAP
0 0 : STRING  ["Email Address","First Name","Last Name","Address","Phone Number","Website","Marketing Permissions","MEMBER_RATING","OPTIN_TIME","OPTIN_IP","CONFIRM_TIME","CONFIRM_IP","LATITUDE","LONGITUDE","GMTOFF","DSTOFF","TIMEZONE","CC","REGION","LAST_CHANGED","LEID","EUID","NOTES"]
Record2 : LIST_MAP
0 0 : STRING ["user1@gmail.com","John","Doe","","","",null,2,"2019-08-08 00:54:15","198.177.197.105","2019-08-08 00:54:32","132.177.197.105",null,null,null,null,null,null,null,"2019-08-08 00:54:32","222029885","126dad7900",null]
Record3 : LIST_MAP
0 0 : STRING ["user2@gmail.com","Jane","Doe","","","",null,2,"2019-08-08 00:49:30","199.177.197.105","2019-08-08 00:50:02","199.177.197.105",null,null,null,null,null,null,null,"2019-08-08 00:50:02","222029877","b209b6dda0",null]

Processor Stage

Stage Name HTTP Client

Stage Instance Name HTTPClient_01

Stage Description Uses an HTTP client to make arbitrary requests.

Stage Type Processor

Stage Library Basic


Calling the API from the processor stage ONLY RETURNS THE FIRST LINE OF THE RESPONSE!


Record1-Output Record1 : MAP
response : STRING  ["Email Address","First Name","Last Name","Address","Phone Number","Website","Marketing Permissions","MEMBER_RATING","OPTIN_TIME","OPTIN_IP","CONFIRM_TIME","CONFIRM_IP","LATITUDE","LONGITUDE","GMTOFF","DSTOFF","TIMEZONE","CC","REGION","LAST_CHANGED","LEID","EUID","NOTES"]


So…

  1. Why would the processor stage behave so drastically differently from the origin stage?

  2. Can anyone think of a workaround for the processor stage? (I am able to manipulate the records in the origin stage to suit my needs, but as stated ...


1 Answer


answered 2019-08-09 15:16:15 -0500

jeff

updated 2019-08-09 15:17:57 -0500

By the nature of origins, we expect them to generate multiple records. In your case, the origin is returning one record per line, because you have configured the data format as TEXT and left the rest of the configuration values to their defaults. The origin is seeing each new line, parsing it into a text record, and moving on to the next line/record.

The processor only "sees" the first line because you have configured its data format identically (TEXT with default settings, which means the record separator is a newline). This means that there are actually multiple "records" associated with a single input record. Some of the Data Collector processors can handle multiple records in a processor result, but unfortunately the HTTP Client processor is not one of them. See the in-progress Jira issue SDC-10968 for updates on that being supported.
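To make the difference concrete, here is a toy model in plain Python. It is only a sketch of the behaviour described above, not Data Collector's actual implementation; the function names `split_records`, `origin_records`, and `processor_field` are made up for illustration, and the sample body is a shortened version of the response in the question.

```python
def split_records(body, delimiter="\n"):
    """Split a response body into text records on the given delimiter."""
    return [chunk for chunk in body.split(delimiter) if chunk]


def origin_records(body, delimiter="\n"):
    # Toy model of the origin: every parsed record enters the pipeline.
    return split_records(body, delimiter)


def processor_field(body, delimiter="\n"):
    # Toy model of the processor: only the first parsed record is kept.
    records = split_records(body, delimiter)
    return records[0] if records else None


# Shortened, hypothetical version of the three-line response body.
body = (
    '["Email Address","First Name"]\n'
    '["user1@gmail.com","John"]\n'
    '["user2@gmail.com","Jane"]'
)

print(len(origin_records(body)))               # 3
print(processor_field(body))                   # ["Email Address","First Name"]
print(processor_field(body, "<|^|>") == body)  # True
```

With the newline delimiter the "processor" keeps only the header line; with a delimiter that never occurs in the data, the first (and only) record is the whole body.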

In the meantime, I recommend you configure the TEXT format for your processor differently. Use the _Use Custom Delimiter_ config option to specify a custom delimiter other than newline. Provide a sequence of characters that will never appear in your source data (to make a random suggestion, something like <|^|>), so that the entire response body is treated as a single record. Then you can add one or more subsequent stages to parse that data out of the result field into a different format as needed.
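As a rough illustration of that downstream parsing step, the sketch below takes the full response text (as it would arrive in a single field once the custom delimiter is set) and turns it into one dictionary per data row, keyed by the header line. It is plain Python, not a Data Collector stage; the function name `rows_to_dicts` and the trimmed sample data are assumptions for the example, and the logic would need adapting to whatever stage you use.

```python
import json


def rows_to_dicts(response_text):
    """Parse a body of newline-separated JSON arrays: header first, then rows."""
    lines = [line for line in response_text.splitlines() if line.strip()]
    if not lines:
        return []
    header = json.loads(lines[0])  # first line holds the column names
    # Pair each remaining row's values with the header names.
    return [dict(zip(header, json.loads(line))) for line in lines[1:]]


# Trimmed, hypothetical sample in the same shape as the response above.
sample = "\n".join([
    '["Email Address","First Name","MEMBER_RATING","NOTES"]',
    '["user1@gmail.com","John",2,null]',
    '["user2@gmail.com","Jane",2,null]',
])

records = rows_to_dicts(sample)
print(len(records))                 # 2
print(records[0]["Email Address"])  # user1@gmail.com
```

Each line is parsed as its own JSON array because the body as a whole is not a single valid JSON document.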


Comments

Jeff, that's the ticket. Thanks for the explanation and the suggestion. Much obliged.

daveh (2019-08-12 07:29:09 -0500)