HTTP Client Origin/Processor Stage Discrepancy

There appears to be a significant discrepancy in how the HTTP Client Origin and Processor stages return response data from an external API call to the pipeline: the Processor stage returns only the very first line of the response.

I'll try to describe it as follows.

Below are the data returned from the API call for:

  • Postman call returning the raw data as is
  • StreamSets Origin Stage HTTP Client and how it returns the data to the pipeline
  • StreamSets Processor Stage HTTP Client and how it returns the data to the pipeline

Raw Data

(as returned by Postman; note that lines are separated by \n)

["Email Address","First Name","Last Name","Address","Phone Number","Website","Marketing Permissions","MEMBER_RATING","OPTIN_TIME","OPTIN_IP","CONFIRM_TIME","CONFIRM_IP","LATITUDE","LONGITUDE","GMTOFF","DSTOFF","TIMEZONE","CC","REGION","LAST_CHANGED","LEID","EUID","NOTES"]
["user1@gmail.com","John","Doe","","","",null,2,"2019-08-08 00:54:15","198.177.197.105","2019-08-08 00:54:32","198.177.197.105",null,null,null,null,null,null,null,"2019-08-08 00:54:32","222029885","126dad7900",null]
["user2@gmail.com","Jane","Doe","","","",null,2,"2019-08-08 00:49:30","199.177.197.105","2019-08-08 00:50:02","199.177.197.105",null,null,null,null,null,null,null,"2019-08-08 00:50:02","222029877","b209b6dda0",null]

Notice: Three lines. The first is a header line. Next two are data. Granted this is a weird response body structure, but it is what it is.
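
For illustration only, here is a minimal Python sketch of how a body with this shape could be split into one record per data line, using the first line as the header. It is not StreamSets code and says nothing about how either stage works internally; the function name `parse_header_and_rows` and the trimmed three-column sample are hypothetical.

```python
import json
from typing import Any, Dict, List


def parse_header_and_rows(body: str) -> List[Dict[str, Any]]:
    """Turn a newline-separated body of JSON arrays into a list of dicts.

    Assumption: the first non-empty line is the header array and every
    following non-empty line is a data array of the same length.
    """
    lines = [line for line in body.splitlines() if line.strip()]
    if not lines:
        return []
    header = json.loads(lines[0])
    records = []
    for line in lines[1:]:
        values = json.loads(line)
        # Pair each header name with the value in the same position.
        records.append(dict(zip(header, values)))
    return records


# Trimmed, hypothetical sample with the same structure as the raw data above.
sample_body = "\n".join([
    '["Email Address","First Name","Last Name"]',
    '["user1@gmail.com","John","Doe"]',
    '["user2@gmail.com","Jane","Doe"]',
])

records = parse_header_and_rows(sample_body)
for record in records:
    print(record)
```

With the sample above, this yields two records (one per data line), each keyed by the header names.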

========================================

Origin Stage

Stage Name HTTP Client

Stage Instance Name HTTPClient_01

Stage Description Uses an HTTP client to read records from an URL.

Stage Type Origin

Stage Library Basic

Record1 : LIST_MAP
0 0 : STRING  ["Email Address","First Name","Last Name","Address","Phone Number","Website","Marketing Permissions","MEMBER_RATING","OPTIN_TIME","OPTIN_IP","CONFIRM_TIME","CONFIRM_IP","LATITUDE","LONGITUDE","GMTOFF","DSTOFF","TIMEZONE","CC","REGION","LAST_CHANGED","LEID","EUID","NOTES"]
 Record2 : LIST_MAP
0 0 : STRING ["user1@gmail.com","John","Doe","","","",null,2,"2019-08-08 00:54:15","198.177.197.105","2019-08-08 00:54:32","132.177.197.105",null,null,null,null,null,null,null,"2019-08-08 00:54:32","222029885","126dad7900",null]
 Record3 : LIST_MAP
0 0 : STRING ["user2@gmail.com","Jane","Doe","","","",null,2,"2019-08-08 00:49:30","199.177.197.105","2019-08-08 00:50:02","199.177.197.105",null,null,null,null,null,null,null,"2019-08-08 00:50:02","222029877","b209b6dda0",null]

Processor Stage

Stage Name HTTP Client

Stage Instance Name HTTPClient_01

Stage Description Uses an HTTP client to make arbitrary requests.

Stage Type Processor

Stage Library Basic


Calling the endpoint from the Processor Stage ONLY RETURNS THE FIRST LINE OF THE RESPONSE!


Record1-Output Record1 : MAP
response : STRING  ["Email Address","First Name","Last Name","Address","Phone Number","Website","Marketing Permissions","MEMBER_RATING","OPTIN_TIME","OPTIN_IP","CONFIRM_TIME","CONFIRM_IP","LATITUDE","LONGITUDE","GMTOFF","DSTOFF","TIMEZONE","CC","REGION","LAST_CHANGED","LEID","EUID","NOTES"]


So…

  1. Why would the Processor stage behave so drastically differently from the Origin stage?

  2. Can anyone think of a workaround for the Processor stage? I am able to manipulate the records in the Origin stage to suit my needs, but the requirements call for the endpoint to be called from the Processor stage. I know I could probably break this up into separate pipelines that call one another, but that introduces more potential points of failure. I really think the response returned to the pipeline from the endpoint should be the same for both stages, given that the Data Format parameters are identical.

Thanks in advance for any insight/help.