Ask Your Question
1

How do I parse row and column data in XML?

asked 2018-04-18 07:59:55 -0500

juju gravatar image

updated 2018-06-13 20:49:54 -0500

metadaddy gravatar image

I am looking for help to parse the XML below in StreamSets. The structure of the XML is throwing me off a bit. The cname nodes are column headers, with the corresponding r and c nodes providing the value for each row of the data. Also, each row has a timestamp that is calculated based on the variables in the data node. I haven't been able to figure out how to parse this to assign the row data to the column headers and also create the datestamp column for each row.

<group serial="0x757c6ce4">
    <data columns="13" time_stamp="0x582e5290" time_delta="3600" epoch="0x58110bb8">
    <cname t="P">Grid</cname>
    <cname t="S">Grid*</cname>
    <cname t="P">Gen Set</cname>
    <cname t="S">Gen Set*</cname>
    <cname t="I">I CT1</cname>
    <cname t="I">I CT2</cname>
    <cname t="I">I CT3</cname>
    <cname t="I">I CT4</cname>
    <cname t="I">I CT5</cname>
    <cname t="I">I CT6</cname>
    <cname t="V">V L1</cname>
    <cname t="V">V L2</cname>
    <cname t="V">V L3</cname>
    <r>
    <c>5767921882</c>
    <c>9191660779</c>
    <c>-10086911</c>
    <c>74160204</c>
    <c>6797464335</c>
    <c>13344163875</c>
    <c>22469766027</c>
    <c>434394781</c>
    <c>370902463</c>
    <c>584404781</c>
    <c>363009180326</c>
    <c>358965953018</c>
    <c>302435738676</c>
    </r>
    </data>
</group>
edit retag flag offensive close merge delete

1 Answer

Sort by ยป oldest newest most voted
0

answered 2018-06-13 20:48:37 -0500

metadaddy gravatar image

Assuming that your <data> element is repeated, you'll need to set Delimiter Element in the Data Format tab to data, then you can use a script to manipulate the data. Here it is in Jython:

for record in records:
  try:
    # Create a new ListMap to hold the data
    value = sdcFunctions.createMap(True)

    # Loop through the columns, creating a field for each one and assigning the row value
    for i in range(int(record.value['attr|columns'])):
      value[record.value['cname'][i]['value']] = record.value['r'][0]['c'][i]['value']

    # Assign the timestamp
    value['timestamp'] = record.value['attr|time_stamp']

    # Just keep the calculated rows and columns
    record.value = value

    # Write record to processor output
    output.write(record)

  except Exception as e:
    # Send record to error
    error.write(record, str(e))

This is how it looks in preview. Note that the quotes are there because of the special characters (*, ) in some of the field names - they are not part of the field name.

image description

edit flag offensive delete link more
Login/Signup to Answer

Question Tools

2 followers

Stats

Asked: 2018-04-18 07:59:55 -0500

Seen: 27 times

Last updated: Jun 13