What do 'Field to Parse' and 'Target Field' mean in the JSON Parser?

I need to extract three specific fields (created_at, text, and the user's name) from Twitter JSON data. How can I achieve this using the JSON Parser? What should I provide for 'Field to Parse' and 'Target Field' on the Parse tab?

Also, is it possible to use File Tail as the origin to read Twitter hashtags from a file? The hashtags are dynamic and stored as comma-separated keywords, and I want the HTTP Client to fetch data for whichever keywords are currently in the file.

Please find the details below. The Resource URL provided (for the HTTP Client) is:

The sample JSON data obtained is:

{"statuses":[{"**created_at**":"Tue Jun 19 11:40:48 +0000 2018","id":1009038283123412993,"id_str":"1009038283123412993","**text**":"What does #Donald Trump appear to have in common with #Vladimir Putin - deflecting criticism by blaming others! No\u2026 https:\/\/\/Bl34HfeD6U","truncated":true,"entities":{"hashtags":[{"text":"Donald","indices":[10,17]},{"text":"Vladimir","indices":[54,63]}],"symbols":[],"user_mentions":[],"urls":[{"url":"https:\/\/\/Bl34HfeD6U","expanded_url":"https:\/\/\/i\/web\/status\/1009038283123412993","display_url":"\/i\/web\/status\/1\u2026","indices":[116,139]}]},"metadata":{"iso_language_code":"en","result_type":"recent"},"source":"\u003ca href=\"http:\/\/\/#!\/download\/ipad\" rel=\"nofollow\"\u003eTwitter for iPad\u003c\/a\u003e","in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":241247248,"id_str":"241247248","**name**":"Mel Selwood","screen
  1. From the JSON data, I need to extract the three highlighted fields and save them to HDFS.
  2. How can I take the input hashtags from a file (such as the Vladimir Putin and Facebook keywords given in the URL) so that they change dynamically each time I write to the file?

Can you add a sample of the JSON data from the file?

metadaddy

Please see the edited description in the question itself.

Edward

Hi Edward,

Follow these steps in the specified order:

1) Use 'File Tail' origin to read keywords, hashtags, etc. from text file

  • Under 'Files' tab enter 'File To Tail' information including 'Path' and 'Naming' (Note: in my case I entered absolute path to the file and 'Active File with Alphabetical Files')

2) Use 'HTTP Client' processor to invoke HTTP request to Twitter

  • Under 'HTTP' tab:

    --- Enter '/text' for 'Output Field'

    --- Enter '${record:value('/text')}' for 'Resource URL' (Note: this will dynamically insert hashtags, keywords, etc. as they are read from the text file)

    --- Select 'Get' for 'HTTP Method'

    --- Select 'OAuth' for 'Authentication Type'

  • Under 'Credentials' tab enter your Twitter app creds -- consumer key, secret, token and token secret

  • Under 'Data Format' tab select 'JSON' for 'Data Format' and 32768000 for 'Max Object Length (chars)'

3) Use 'Field Pivoter' processor to extract tweets from 'statuses' list in the HTTP response

  • Under 'Field Pivot' tab enter '/text/statuses' for 'Field To Pivot' and uncheck 'Copy All Fields' checkbox

4) Use 'Field Remover' processor to select the desired fields

  • Under 'Remove/Keep' tab select 'Keep Listed Fields' and enter/select '/text' '/created_at' and '/user/name' one at a time for 'Fields' (Note: click on 'Select Fields From Preview Data' to select fields)

5) Use 'Field Flattener' processor to flatten the output because user info is nested in the HTTP response (Note: this will convert the nested output fields into '' format at the top record level)

  • Keep all default settings

6) Use 'Field Renamer' processor to rename fields (Note: this will rename '' to 'user_name', for example)

  • Under 'Rename' tab enter/select '/' for 'Source Field Expression' and '/user_name' for 'Target Field Expression'

7) Use your choice of destination to store records with the selected fields (Note: in my case I selected Local File System)
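
To make the data flow of steps 3 to 6 easier to follow, here is a rough Python sketch of the same transformations applied to a trimmed-down copy of the sample response. It is not StreamSets code. The helper names (`pivot`, `keep_fields`, `flatten`, `rename`) are hypothetical, the '.' separator used when flattening is an assumption, and the sketch starts from the response itself rather than from the '/text' output field used in the pipeline.

```python
import json

# Hypothetical, trimmed-down response shaped like the sample in the question.
response = {
    "statuses": [
        {
            "created_at": "Tue Jun 19 11:40:48 +0000 2018",
            "id": 1009038283123412993,
            "text": "What does #Donald Trump appear to have in common with #Vladimir Putin",
            "user": {"id": 241247248, "name": "Mel Selwood"},
        }
    ]
}


def pivot(record, list_field):
    """Step 3: emit one record per list item, without copying other fields."""
    return list(record[list_field])


def keep_fields(record, paths):
    """Step 4: keep only the listed '/a/b' style paths, preserving nesting."""
    kept = {}
    for path in paths:
        keys = path.strip("/").split("/")
        src, dst = record, kept
        for key in keys[:-1]:
            src = src[key]
            dst = dst.setdefault(key, {})
        dst[keys[-1]] = src[keys[-1]]
    return kept


def flatten(record, sep=".", prefix=""):
    """Step 5: lift nested fields to the top level (separator is an assumption)."""
    flat = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, sep, name + sep))
        else:
            flat[name] = value
    return flat


def rename(record, mapping):
    """Step 6: rename top-level fields according to the mapping."""
    return {mapping.get(key, key): value for key, value in record.items()}


def extract(resp):
    out = []
    for tweet in pivot(resp, "statuses"):
        rec = keep_fields(tweet, ["/text", "/created_at", "/user/name"])
        rec = flatten(rec)
        rec = rename(rec, {"user.name": "user_name"})
        out.append(rec)
    return out


records = extract(response)
print(json.dumps(records, indent=2))
```

Each output record ends up with only three top-level fields: text, created_at and user_name.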


For reference, here's what the pipeline looks like:

[image: pipeline screenshot]

And here's the final output record:

[image: final output record]


Hope this helps.

Cheers, Dash (iamontheinet)

Hi Dash, I got two validation errors when I tried to preview the pipeline:
1) HADOOPFS_44 - Could not verify the base on local exception: java.nio.channels.ClosedByInterruptException
2) HTTP client initialization error: ServiceLocatorImpl has been shut down

Edward

Can you share your Hadoop FS config/settings?

iamontheinet

Hadoop FS URI: hdfs://technocrat:8020/
Hadoop FS Configuration Directory: /etc/hadoop/conf/

Edward (2018-06-26 02:20:49 -0500)

Hi Edward, since you've started a new thread for this specific issue, let's continue the conversation over there to avoid duplication. Thanks!

iamontheinet (2018-06-26 11:02:52 -0500)

Hi @iamontheinet! As far as I can tell, the tailed file must contain URL-encoded content such as %28%23facebook%20OR%20%23FIFA, which is substituted for /text in the HTTP Client URL; otherwise it throws an error. How can I provide comma-separated values such as FIFA,facebook in the text file instead? Thanks in advance!

Edward
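
One possible way to keep the file human-readable (FIFA,facebook) is to URL-encode the keywords before they are placed in the Resource URL. The Python sketch below only illustrates that transformation. `build_query` is a hypothetical helper, the `#keyword OR #keyword` query shape is inferred from the encoded example above, and where the encoding would run (for example, in a script that writes the tailed file) is an assumption.

```python
from urllib.parse import quote


def build_query(line):
    """Turn a comma-separated line such as 'FIFA,facebook' into an encoded query."""
    # Split on commas and drop empty entries and surrounding whitespace.
    keywords = [k.strip() for k in line.split(",") if k.strip()]
    # Prefix each keyword with '#', tolerating keywords that already have one.
    hashtags = ["#" + k.lstrip("#") for k in keywords]
    # Percent-encode everything, including '#' and spaces.
    return quote(" OR ".join(hashtags), safe="")


encoded = build_query("FIFA,facebook")
print(encoded)  # prints %23FIFA%20OR%20%23facebook
```

The encoded string could then be written to the tailed file in place of the raw keywords, so that the value substituted for /text is already safe to use in a URL.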

Hello Edward. Please refer to the following for a description of what those sections mean in the stage.

Hi Mufy! I've gone through the documentation, but I couldn't figure out what to provide for those fields given my sample JSON data. It would be great if you could let me know how to proceed with the JSON data.

Edward (2018-06-22 09:26:30 -0500)
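
As a rough mental model, and as far as I understand the stage, 'Field to Parse' names a string field that holds JSON text, and 'Target Field' names the field that receives the parsed result. The Python sketch below imitates that behaviour on a plain dictionary. `json_parser` and the field names 'text' and 'parsed' are hypothetical, chosen only for illustration.

```python
import json


def json_parser(record, field_to_parse, target_field):
    """Parse the JSON string held in one field and store the result in another."""
    parsed = json.loads(record[field_to_parse])
    result = dict(record)          # leave the incoming record untouched
    result[target_field] = parsed  # created if missing, overwritten if present
    return result


# A record whose 'text' field holds raw JSON as a string.
record = {
    "text": '{"created_at": "Tue Jun 19 11:40:48 +0000 2018", '
            '"user": {"name": "Mel Selwood"}}'
}

out = json_parser(record, "text", "parsed")
print(out["parsed"]["user"]["name"])  # prints Mel Selwood
```

If the data has already been parsed into fields by the time it reaches the processor, as with the 'JSON' data format selected in the HTTP Client above, a separate parsing step may not be needed.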
