Flatten XML array values and explode them into a list of records

asked 2018-07-12 11:19:07 -0500

anonymous user

updated 2018-07-16 10:04:10 -0500

I need help flattening array values in an XML field. Here are the details of my pipeline:

1. Hadoop FS Standalone origin: reads the source XML file, data format XML, with A1 as the delimiter element.
2. Field Flattener: flattens all fields with a '.' separator.
3. Field Renamer: renames the XML field names.
4. Hive Metadata, Hive Metastore, and Hadoop FS: write the data to Hive.

When there are no array values, the pipeline above works perfectly. I now have a new XML file with array values, and I am not sure how to flatten only that column. In the sample XML below, customertype is the array field. I need a solution that flattens it and explodes the array values into a list of records.

<ROOT>
    <A1 action="A" id="1234567">
        <Account id="1234" />
        <Subaccount id="123" />
        <Qty>20</Qty>
        <Type id="10456" />
        <Post id="1" />
        <customertype>
            <custType id="212"/>
            <custType id="270"/>
        </customertype>
        <Num><![CDATA[NR]]></Num>
    </A1>
    <A1 action="A" id="23456789">
        <Account id="141004" />
        <Subaccount id="20" />
        <Qty>30</Qty>
        <Type id="10456" />
        <Post id="1" />
        <customertype>
            <custType id="122"/>
            <custType id="130"/>
        </customertype>
        <Num><![CDATA[NR]]></Num>
    </A1>
    <Footer>
        <RecordCount>2</RecordCount>
    </Footer>
</ROOT>

Does anybody have a solution for this?

Comments

Hi! I'd probably need to see your exact desired output, but I'd try adding a Stream Selector to check whether the record has 'customertype' (${record:exists('/customertype')}) and, if it does, passing it through a Field Pivoter that pivots on /customertype. This should create an individual record per customer type.

iamontheinet ( 2018-07-12 12:11:46 -0500 )
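
For readers unfamiliar with pivoting, the check-then-pivot idea suggested above can be sketched outside StreamSets in plain Python. This is only an illustration under assumptions: records are modeled as dictionaries, and `has_field` and `pivot` are hypothetical helper names, not StreamSets APIs.

```python
from copy import deepcopy


def has_field(record, name):
    """Rough analogue of an existence check on a top-level field (hypothetical helper)."""
    return name in record


def pivot(record, list_field, out_field):
    """Return one copy of `record` per item in record[list_field].

    Each copy drops the list and stores a single item under `out_field`.
    """
    rows = []
    for item in record[list_field]:
        row = deepcopy(record)
        del row[list_field]
        row[out_field] = item
        rows.append(row)
    return rows


# Assumed in-memory form of the first A1 record from the question.
record = {"action": "A", "id": "1234567", "customertype": ["212", "270"]}

# Records without the list field pass through unchanged.
if has_field(record, "customertype"):
    rows = pivot(record, "customertype", "customertypeid")
else:
    rows = [record]

for row in rows:
    print(row)
```

The input record is copied rather than modified, so every scalar field is repeated on each output row, which is what "explode" means in the question.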

Expected output:

ACTION  id        account_id  subaccount_id  Qty  type_id  post_id  customertypeid  num
A       1234567   1234        123            20   10456    1        212             NR
A       1234567   1234        123            20   10456    1        270             NR
A       23456789  141004      20             30   10456    1        122             NR
A       23456789  141004      20             30   10456    1        130

Robot ( 2018-07-12 15:01:05 -0500 )
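
As a way to pin down the expected output above, here is a minimal Python sketch that produces rows of that shape from the first A1 record of the sample XML. It uses only the standard library and is not a StreamSets pipeline; the function name `explode` and the handling of records without custType children are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# First A1 record from the question, embedded so the sketch is self-contained.
SAMPLE = """<ROOT>
    <A1 action="A" id="1234567">
        <Account id="1234" />
        <Subaccount id="123" />
        <Qty>20</Qty>
        <Type id="10456" />
        <Post id="1" />
        <customertype>
            <custType id="212"/>
            <custType id="270"/>
        </customertype>
        <Num><![CDATA[NR]]></Num>
    </A1>
</ROOT>"""


def child_id(parent, tag):
    """Return the id attribute of a child element, or None if the child is missing."""
    child = parent.find(tag)
    return child.get("id") if child is not None else None


def explode(xml_text):
    """Flatten each A1 element and emit one row per custType child (hypothetical helper)."""
    rows = []
    for a1 in ET.fromstring(xml_text).iter("A1"):
        base = {
            "ACTION": a1.get("action"),
            "id": a1.get("id"),
            "account_id": child_id(a1, "Account"),
            "subaccount_id": child_id(a1, "Subaccount"),
            "Qty": a1.findtext("Qty"),
            "type_id": child_id(a1, "Type"),
            "post_id": child_id(a1, "Post"),
            "num": a1.findtext("Num"),
        }
        cust_ids = [c.get("id") for c in a1.findall("customertype/custType")]
        # Assumption: a record with no custType children still yields one row.
        for cust_id in cust_ids or [None]:
            rows.append({**base, "customertypeid": cust_id})
    return rows


rows = explode(SAMPLE)
for row in rows:
    print(row)
```

The scalar fields are flattened once per A1 element and then repeated for each custType id, which mirrors the flatten-then-pivot order described in the question.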

Did you try what I suggested?

iamontheinet ( 2018-07-12 15:03:48 -0500 )

Note: I fixed the formatting of the XML blob and added an opening <root> element, because there was a closing one so I assumed it was supposed to be present.

jeff ( 2018-07-12 15:15:47 -0500 )

Thank you. Yes, I tried the condition ${record:exists('/customertype')}, but the Field Pivoter is not picking up anything.

Robot ( 2018-07-12 15:40:56 -0500 )