Social media data extraction using StreamSets

asked 2017-10-01

sujeet

updated 2017-10-02

metadaddy

I am trying to pull data from social media for sentiment analysis, using Facebook data as an example. My test case: I have a URL that returns a list of posts. In a second pipeline, I want to iterate over this list, extract each post_id, and use that post_id to fetch the comments on that specific post. Can this be done iteratively using StreamSets? The comments URL is different for each post_id, so I want to pass the id as a variable in the comments URL. Below is a Python snippet showing what I am trying to achieve.

The code below is only for reference, to show what I am trying to achieve. Is this possible with a StreamSets pipeline?

for post in posts_id:
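The snippet above stops after its first line. As a rough sketch of the two-step loop the question describes (not the asker's actual code), the function below fetches a list of posts, then builds a comments URL for each post id and fetches it. The URLs, the PAGE_ID placeholder, and the assumption that each response wraps its items in a "data" array are illustrative only; real Graph API calls also need a version and an access token, which the question does not give.

```python
import json
from typing import Callable, Dict, List

# Hypothetical URL templates; the real API version, path and token are not given.
POSTS_URL = "https://graph.facebook.com/PAGE_ID/posts"
COMMENTS_URL = "https://graph.facebook.com/{post_id}/comments"


def collect_comments(fetch: Callable[[str], str]) -> Dict[str, List[dict]]:
    """Fetch the post list, then fetch the comments for each post id in turn.

    `fetch` takes a URL and returns the response body as a JSON string, so it
    can wrap urllib.request, the requests library, or an offline test stub.
    """
    # Assumption: list responses look like {"data": [{"id": ...}, ...]}.
    posts = json.loads(fetch(POSTS_URL)).get("data", [])
    comments_by_post: Dict[str, List[dict]] = {}
    for post in posts:
        post_id = post["id"]
        # Substitute this post's id into the comments URL for this iteration.
        url = COMMENTS_URL.format(post_id=post_id)
        comments_by_post[post_id] = json.loads(fetch(url)).get("data", [])
    return comments_by_post
```

Passing a dictionary lookup such as `responses.__getitem__` as `fetch` lets the loop run offline; in a pipeline, the same per-post step is what the answer below moves into StreamSets processors.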
1 Answer

answered 2017-10-02

metadaddy

You don't need to use two pipelines. The basic approach here would be to use either the HTTP Client origin or processor to get your list of posts, and the Field Pivoter processor to generate a record for each post. Each record would have the post id as one of its fields.
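To illustrate the record-per-item idea, the following Python sketch approximates what a pivot does on plain dictionaries. It is not StreamSets code: the field names data, id, and paging are assumptions modeled on a typical list response, and keeping every other field in each output record corresponds only roughly to the processor's option for copying all fields into each generated record.

```python
import copy
from typing import Any, Dict, List


def pivot(record: Dict[str, Any], list_field: str) -> List[Dict[str, Any]]:
    """Emit one record per item in record[list_field].

    Each output record is a deep copy of the input with the list field
    replaced by a single item, so sibling fields (e.g. paging) are kept.
    """
    out = []
    for item in record.get(list_field, []):
        new_record = copy.deepcopy(record)
        new_record[list_field] = item  # the list becomes one post per record
        out.append(new_record)
    return out
```

For example, `pivot({"data": [{"id": "1_2"}, {"id": "1_3"}]}, "data")` yields two records, each holding one post, so each downstream record carries exactly one post id.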

You can then use an HTTP Client processor to fetch the comments for each of those records, substituting the post id into the request URL with the Expression Language (field paths in record:value() start with a slash), something like: ${record:value('/post-id')}/comments
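To show what that per-record substitution amounts to, here is a toy Python re-implementation of just the ${record:value('/path')} pattern. It is not the StreamSets expression language engine; the graph.facebook.com base URL and the post-id field name are illustrative assumptions.

```python
import re
from typing import Any, Dict

# Matches ${record:value('/some/path')} and captures the field path.
_EL_VALUE = re.compile(r"\$\{record:value\('([^']*)'\)\}")


def field_value(record: Dict[str, Any], path: str) -> Any:
    """Resolve a slash-separated field path such as '/post-id' or '/data/id'."""
    value: Any = record
    for part in path.strip("/").split("/"):
        value = value[part]
    return value


def resolve_url(template: str, record: Dict[str, Any]) -> str:
    """Replace each ${record:value('/path')} in template with that field's value."""
    return _EL_VALUE.sub(lambda m: str(field_value(record, m.group(1))), template)
```

With the record `{"post-id": "1_2"}`, the template `https://graph.facebook.com/${record:value('/post-id')}/comments` resolves to `https://graph.facebook.com/1_2/comments`, which is why one HTTP Client processor can request a different URL for every record.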


Thank you so much. It really helped me figure out the issue and fix it.

sujeet ( 2017-10-03 )

Great - please vote up the answer and mark it as correct :-)

metadaddy ( 2017-10-03 )
