Parse XML from StackOverflow

asked 2018-01-02 17:08:15 -0500

I want to parse a Stack Overflow XML dump into, say, a .csv file. The format is very simple: each row element carries all of its values in attributes.

<posts>
  <row Id="4" PostTypeId="1" AcceptedAnswerId="7" CreationDate="2008-07-31T21:42:52.667" Score="506" ViewCount="32399" Body="&lt;p&gt;I want to use a track-bar to change a form's opacity.&lt;/p&gt;&#xA;&#xA;&lt;p&gt;...  CommunityOwnedDate="2012-10-31T16:42:47.213" />
...

However, I have been unable to work out how to extract the attributes using the XML parser. One caveat is that not all <row> elements have the same attributes, and I believe some row elements contain nested sub-elements. Maybe I am going about this the wrong way and should just use Jython. I have parsed the dump using Spark, but would prefer to use SS. Does anybody have any pointers?

Thank you!

Comments

If my answer below doesn't point you in the right direction, please add a reference to the actual posts.xml you're working with and I can take a closer look.

metadaddy (2018-01-02 22:00:27 -0500)

1 Answer

answered 2018-01-02 21:58:44 -0500

metadaddy

I created a sample Posts.xml file using the data in this Meta StackExchange answer:

<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" Tags="&lt;discussion&gt;&lt;scope&gt;&lt;homebrew&gt;" AnswerCount="3"/>
  <row Id="2" PostTypeId="2" ParentId="1"  />
  <row Id="3" PostTypeId="1" Tags="&lt;discussion&gt;&lt;scope&gt;" AnswerCount="2" CommentCount="0" />
  <row Id="4" PostTypeId="1" AcceptedAnswerId="7" Tags="&lt;discussion&gt;" AnswerCount="2" CommentCount="0" />
  <row Id="6" PostTypeId="2" ParentId="4" />
  <row Id="7" PostTypeId="2" ParentId="4" />
</posts>

I used the Directory origin, with Delimiter Element set to row:

[screenshot: Directory origin configuration]

It parses the data just fine. Here's the preview:

[screenshot: preview of the parsed records]
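
If you want to sanity-check the same conversion outside a pipeline, here is a minimal sketch in plain Python (standard library only). It is not how the Directory origin works internally; it just mimics the idea of treating each row element as one record and copes with rows that carry different attributes. The function names `iter_rows` and `rows_to_csv` are made up for this example, and it assumes all values live in attributes, so any nested sub-elements would be ignored.

```python
import csv
import io
import xml.etree.ElementTree as ET

# The sample Posts.xml from above, inlined so the sketch is self-contained.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" Tags="&lt;discussion&gt;&lt;scope&gt;&lt;homebrew&gt;" AnswerCount="3"/>
  <row Id="2" PostTypeId="2" ParentId="1" />
  <row Id="3" PostTypeId="1" Tags="&lt;discussion&gt;&lt;scope&gt;" AnswerCount="2" CommentCount="0" />
  <row Id="4" PostTypeId="1" AcceptedAnswerId="7" Tags="&lt;discussion&gt;" AnswerCount="2" CommentCount="0" />
  <row Id="6" PostTypeId="2" ParentId="4" />
  <row Id="7" PostTypeId="2" ParentId="4" />
</posts>
"""


def iter_rows(source, tag="row"):
    """Yield the attributes of each <row> element as a dict, one at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield dict(elem.attrib)
            elem.clear()  # drop the element's contents once it has been read


def rows_to_csv(rows, out):
    """Write rows to CSV, using the union of all attribute names as columns."""
    rows = list(rows)  # held in memory here; fine for a sample, not a full dump
    fieldnames = []
    for row in rows:
        for name in row:
            if name not in fieldnames:
                fieldnames.append(name)
    # restval fills in the columns that a given row does not have.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return fieldnames


if __name__ == "__main__":
    buffer = io.StringIO()
    rows_to_csv(iter_rows(io.BytesIO(SAMPLE.encode("utf-8"))), buffer)
    print(buffer.getvalue())
```

For a real dump you would pass a file path to `iter_rows` instead of the in-memory sample. Because `rows_to_csv` collects every row before writing, a large posts.xml would probably need either a fixed column list or two passes over the file (one to collect attribute names, one to write).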

