How do I call an API using StreamSets Data Collector? Please provide a use case with detailed screenshots

asked 2019-09-10

mspatil

updated 2019-09-10

metadaddy

I am trying to build a pipeline that calls an API using StreamSets Data Collector. I can log in successfully using the StreamSets HTTP Client, but when I try to call another API from the same pipeline I get an "Unauthorized" error, and I cannot call any other API after login. My assumption is that Data Collector does not capture and save the cookie sent by the API server, so the session is lost. Please correct me if my understanding is wrong. Please suggest how to proceed with this use case and, if possible, attach an example where StreamSets interacts with an API.

Should I use the HTTP Client origin or the HTTP Client processor to call the API?

REST APIs for standalone SDC do not use/require auth cookies. Is your SDC registered with StreamSets Control Hub?

iamontheinet ( 2019-09-10 )

What API are you trying to use? It's very unusual for API clients to have to deal with cookies. Usually, the authentication API returns a token for use in future calls. You should examine the API documentation to confirm the authentication mechanism.

metadaddy ( 2019-09-10 )

@iamontheinet I am using a standalone SDC, and it is not registered with Control Hub.

mspatil ( 2019-09-11 )

Got it. See Pat's answer below.

iamontheinet ( 2019-09-11 )

answered 2019-09-10

metadaddy

APIs vary widely, but, as I mentioned in my comment above, it's very rare for them to use cookies to carry session data. Typically, the authorization call returns a token in the HTTP response body for the client to use in subsequent calls. Again, the mechanism can vary, but StreamSets has built-in support for OAuth 2.0, one of the most common mechanisms. "Extract Data from Google Analytics using StreamSets Data Collector" gives an example of how to call the Google Analytics API from a pipeline.
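
To make the token flow concrete, here is a minimal Python sketch of the pattern described above, outside of Data Collector. It is only an illustration under assumptions: the `access_token` field name, the `Bearer` header scheme, and the `https://api.example.com` URL are hypothetical, and your API's documentation decides the real names. No network call is made; the login response body is simulated.

```python
import json
import urllib.request


def extract_token(login_response_body: str, field: str = "access_token") -> str:
    """Read the token from a JSON login response body.

    The field name is an assumption; check the API documentation.
    """
    payload = json.loads(login_response_body)
    return payload[field]


def build_authorized_request(url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request that carries the token in a header.

    The "Bearer" scheme is the usual OAuth 2.0 convention; other APIs may
    expect a different header or scheme.
    """
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


# Simulated body of a successful login response (hypothetical values).
login_body = '{"access_token": "abc123", "expires_in": 3600}'

token = extract_token(login_body)
request = build_authorized_request("https://api.example.com/v1/items", token)

print(request.get_header("Authorization"))  # prints "Bearer abc123"
```

In a pipeline, the same idea would mean capturing the token from the login response and sending it in a request header on the later HTTP Client call, rather than relying on a cookie.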

