Social media data extraction using StreamSets

asked 2017-10-01 05:27:08 -0500

sujeet

updated 2017-10-02 11:31:59 -0500

metadaddy

I am trying to pull data from social media for sentiment analysis, using Facebook data as an example. My test case: I have a URL that returns a list of posts. In the next pipeline, I want to iterate over this list, extract each post_id, and use that post_id to fetch the comments on that specific post. Can this be done iteratively in StreamSets? The comment URL is different for each post_id, so I want to pass the ID as a variable in the comment URL.

The Python snippet below is just for reference, to show what I am trying to achieve. Is this possible with a StreamSets pipeline?

for post in posts_id:

1 Answer

answered 2017-10-02 14:18:02 -0500

metadaddy

You don't need to use two pipelines. The basic approach here would be to use either the HTTP Client origin or processor to get your list of posts, and the Field Pivoter processor to generate a record for each post. Each record would have the post id as one of its fields.

You can then use an HTTP Client processor to fetch the comments for each of those records, substituting the post id into the query URL using Expression Language, something like: ${record:value('post-id')}/comments
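
To relate this back to the Python loop in the question, here is a rough sketch of what the single pipeline does conceptually. It is plain Python, not StreamSets code, and the base URL, the `data`/`id` response shape, and all function names are assumptions made up for illustration. The HTTP call is passed in as a function so the sketch runs without network access.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical base URL: the thread does not give the real endpoint.
BASE_URL = "https://example.com/api"


def pivot_posts(response: Dict) -> Iterator[Dict]:
    """Emit one record per post, roughly what the Field Pivoter does for a list field.

    Assumes the posts arrive as {"data": [{"id": ...}, ...]}.
    """
    for post in response.get("data", []):
        yield {"post-id": post["id"]}


def comments_url(record: Dict) -> str:
    """Substitute the post id into the URL, like ${record:value('post-id')}/comments."""
    return f"{BASE_URL}/{record['post-id']}/comments"


def fetch_all_comments(
    posts_response: Dict, fetch: Callable[[str], List[str]]
) -> Dict[str, List[str]]:
    """Fetch comments once per record, as the HTTP Client processor would."""
    result = {}
    for record in pivot_posts(posts_response):
        result[record["post-id"]] = fetch(comments_url(record))
    return result


def fake_fetch(url: str) -> List[str]:
    """Stand-in for a real HTTP GET so the example needs no network."""
    return [f"comment from {url}"]


posts = {"data": [{"id": "101"}, {"id": "102"}]}
comments = fetch_all_comments(posts, fake_fetch)
print(comments)
```

In the pipeline itself there is no explicit loop: each post becomes its own record after the pivot, and the processor evaluates the URL expression once per record.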


Thank you so much. It really helped me figure out the issue and fix it.

sujeet (2017-10-03 03:55:31 -0500)

Great - please vote up the answer and mark it as correct :-)

metadaddy (2017-10-03 09:56:13 -0500)