Ask Your Question

How do I parse row and column data in XML?

asked 2018-04-18 07:59:55 -0600

juju gravatar image

updated 2018-06-13 20:49:54 -0600

metadaddy gravatar image

I am looking for help to parse the XML below in StreamSets. The structure of the XML is throwing me off a bit. The cname nodes are column headers, with the corresponding r and c nodes providing the value for each row of the data. Also, each row has a timestamp that is calculated based on the variables in the data node. I haven't been able to figure out how to parse this to assign the row data to the column headers and also create the datestamp column for each row.

<group serial="0x757c6ce4">
    <data columns="13" time_stamp="0x582e5290" time_delta="3600" epoch="0x58110bb8">
    <cname t="P">Grid</cname>
    <cname t="S">Grid*</cname>
    <cname t="P">Gen Set</cname>
    <cname t="S">Gen Set*</cname>
    <cname t="I">I CT1</cname>
    <cname t="I">I CT2</cname>
    <cname t="I">I CT3</cname>
    <cname t="I">I CT4</cname>
    <cname t="I">I CT5</cname>
    <cname t="I">I CT6</cname>
    <cname t="V">V L1</cname>
    <cname t="V">V L2</cname>
    <cname t="V">V L3</cname>
edit retag flag offensive close merge delete

1 Answer

Sort by ยป oldest newest most voted

answered 2018-06-13 20:48:37 -0600

metadaddy gravatar image

Assuming that your <data> element is repeated, you'll need to set Delimiter Element in the Data Format tab to data, then you can use a script to manipulate the data. Here it is in Jython:

for record in records:
    # Create a new ListMap to hold the data
    value = sdcFunctions.createMap(True)

    # Loop through the columns, creating a field for each one and assigning the row value
    for i in range(int(record.value['attr|columns'])):
      value[record.value['cname'][i]['value']] = record.value['r'][0]['c'][i]['value']

    # Assign the timestamp
    value['timestamp'] = record.value['attr|time_stamp']

    # Just keep the calculated rows and columns
    record.value = value

    # Write record to processor output

  except Exception as e:
    # Send record to error
    error.write(record, str(e))

This is how it looks in preview. Note that the quotes are there because of the special characters (*, ) in some of the field names - they are not part of the field name.

image description

edit flag offensive delete link more
Login/Signup to Answer

Question Tools



Asked: 2018-04-18 07:59:55 -0600

Seen: 320 times

Last updated: Jun 13 '18