
XML Transform Question

asked 2019-06-13 23:50:09 -0500

DataAnalyst1029

updated 2019-06-14 11:25:23 -0500

I have an XML project that I am tasked with transforming.

I am retrieving the following XML document structure from an HTTP Origin source:

<LIST>
    <OBJECT_TYPE_A>
        <OBJECT_TYPE_A_PROPERTY>value</OBJECT_TYPE_A_PROPERTY>
        ...
        ...
    </OBJECT_TYPE_A>
    <OBJECT_TYPE_B>
        <OBJECT_TYPE_B_PROPERTY>value</OBJECT_TYPE_B_PROPERTY>
        ...
        ...
    </OBJECT_TYPE_B>
</LIST>

My deliverable is a single list of both Object A and Object B properties: a list of OBJECT_TYPE_A_PROPERTY records appended to a list of OBJECT_TYPE_B_PROPERTY records. In the future there may also be an OBJECT_TYPE_C, in the expected position.

<OBJECT_TYPE_A_PROPERTY>value</OBJECT_TYPE_A_PROPERTY>
<OBJECT_TYPE_A_PROPERTY>value</OBJECT_TYPE_A_PROPERTY>
<OBJECT_TYPE_B_PROPERTY>value</OBJECT_TYPE_B_PROPERTY>
...
...

I would like to set up a pipeline that handles two or more object types and eventually consolidates them into one list. My current HTTP client is set to delimit by /LIST/OBJECT_TYPE_A/OBJECT_TYPE_A_PROPERTY, but I realize that will only retrieve Object A properties. I am having trouble specifying two outputs for this stage. Is there another approach I should use to retrieve all records without loss?


Comments

How large are these payloads (ex: the full XML wrapped by the LIST tags)?

jeff (2019-06-14 10:50:14 -0500)

So the XML file itself is less than 2MB currently, and each OBJECT_TYPE_X can contain anywhere from 10,000 to 2 million OBJECT_TYPE_X_PROPERTY records.

DataAnalyst1029 (2019-06-14 11:09:54 -0500)

1 Answer


answered 2019-06-14 11:43:26 -0500

jeff

I would recommend configuring the parser to operate on the second-level elements, regardless of name, by setting the Delimiter Element to /LIST/*. That will give you the elements you need, and you can then perform further manipulations (such as unwrapping the single value from the list).

You can read more about the options for delimiting XML here.
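
To make the effect of a /LIST/* delimiter concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. It is only an illustration of the idea, not how Data Collector itself parses XML; the sample values (a1, a2, b1) and the function names second_level_records and flatten are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the document in the question.
SAMPLE = """<LIST>
    <OBJECT_TYPE_A>
        <OBJECT_TYPE_A_PROPERTY>a1</OBJECT_TYPE_A_PROPERTY>
        <OBJECT_TYPE_A_PROPERTY>a2</OBJECT_TYPE_A_PROPERTY>
    </OBJECT_TYPE_A>
    <OBJECT_TYPE_B>
        <OBJECT_TYPE_B_PROPERTY>b1</OBJECT_TYPE_B_PROPERTY>
    </OBJECT_TYPE_B>
</LIST>"""


def second_level_records(xml_text):
    """Return one record per second-level element (the /LIST/* idea).

    Each record is (element name, list of (child name, child text)).
    """
    root = ET.fromstring(xml_text)
    return [(child.tag, [(p.tag, p.text) for p in child]) for child in root]


def flatten(records):
    """Unwrap every record into a single list of (property name, value)."""
    return [prop for _, props in records for prop in props]


if __name__ == "__main__":
    for name, value in flatten(second_level_records(SAMPLE)):
        print(name, value)
```

Because the match is on position rather than name, a future OBJECT_TYPE_C at the same level would be picked up without any change to the delimiter.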


Comments

Ran into a few issues here: SDC crashes when I delimit by /LIST/*. Note that there are 70,000+ property records under the <OBJECT_TYPE_A> record being returned. Is it possible to use a /LIST/*/* delimiter to end up with just the <OBJECT_TYPE_X_PROPERTY> records?

DataAnalyst1029 (2019-06-14 12:31:29 -0500)

Yes, that will work as well. I was under the impression you needed to retain the element names surrounding the innermost values. But if not, then proceed as you indicate. The number of items should not be an issue w.r.t. memory at least since the XML parser is streaming.

jeff (2019-06-14 12:47:15 -0500)
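
As a rough illustration of why a /LIST/*/* delimiter combined with a streaming parser keeps memory flat, the sketch below uses Python's xml.etree.ElementTree.iterparse rather than Data Collector itself. It emits one record per third-level element and discards each element once it has been handed over. The function name iter_depth2 and the sample values are hypothetical; the parent element name is carried along explicitly, since a record cut at this depth would otherwise lose it.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the document in the question.
SAMPLE = b"""<LIST>
    <OBJECT_TYPE_A>
        <OBJECT_TYPE_A_PROPERTY>a1</OBJECT_TYPE_A_PROPERTY>
        <OBJECT_TYPE_A_PROPERTY>a2</OBJECT_TYPE_A_PROPERTY>
    </OBJECT_TYPE_A>
    <OBJECT_TYPE_B>
        <OBJECT_TYPE_B_PROPERTY>b1</OBJECT_TYPE_B_PROPERTY>
    </OBJECT_TYPE_B>
</LIST>"""


def iter_depth2(stream):
    """Yield (parent name, element name, text) for each /LIST/*/* element."""
    stack = []  # currently open elements, root first
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start":
            stack.append(elem)
            continue
        stack.pop()  # elem has just closed; stack now holds its ancestors
        if len(stack) == 2:  # ancestors are LIST and OBJECT_TYPE_X
            parent = stack[-1]
            yield parent.tag, elem.tag, elem.text
            # Detach the processed record so the tree never accumulates them.
            parent.remove(elem)


if __name__ == "__main__":
    for parent, name, value in iter_depth2(io.BytesIO(SAMPLE)):
        print(parent, name, value)
```

Only one property element is held at a time, so the number of records under each OBJECT_TYPE_X should not drive memory use in this sketch.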

I was not aware of the loss of the element names, but thankfully I believe I can work around that. Your suggestion has definitely put me on the right track to complete this assignment.

DataAnalyst1029 (2019-06-14 13:33:30 -0500)