Remove XML namespace prefixes before flattening
I'd like to remove all XML namespace prefixes so that I don't have to reference them in my processors. However, the Field Renamer wants me to flatten the XML first, which would require me to reference the namespaces, defeating the purpose. Can I generically (using a regular expression) remove XML namespace prefixes from the entire nested field structure of the XML data?
Here's some raw XML:
<ODM ODMVersion="1.3" FileType="Snapshot" FileOID="ad998378" CreationDateTime="2016-06-03T14:59:52" xmlns="http://www.cdisc.org/ns/odm/v1.3" >
<AdminData studyOID="mystudyoid">
<User OID="someid" UserType="Other">
<LoginName>somelogin</LoginName>
<DisplayName>Tracy R</DisplayName>
<FullName>Tracy R</FullName>
<FirstName>Tracy</FirstName>
<LastName>R</LastName>
<Address>xyz</Address>
<Email>t.r@company.com</Email>
<Fax>999-999-0000</Fax>
<Phone>999-999-4011</Phone>
<LocationRef LocationOID="10001" />
</User>
</AdminData>
</ODM>
What I want is for that raw XML to be parsed and for the field names to look like this...
/ODM/AdminData/User/OID..... etc.
By default it looks something like below...
/ns1:ODM/ns1:AdminData/.... etc
I don't want those namespace prefixes in the field names and, more importantly, I don't want to have to reference them, even in the removal process. "ns1" is something StreamSets adds; for all I know, that hard-coded string could change to "NSONE" in the next StreamSets version, and I can't build a pipeline that depends on an arbitrary hard-coded string.
So I want them removed generically. I wouldn't mind writing a regex, but that means I need to flatten the structure first, and I can't do that. I need this removal to happen while the data is still hierarchical.
FWIW - "Not possible" is a valid answer to this question.
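To make the goal concrete, here is a rough sketch of the transformation I'm after, written as plain Python over a nested dict/list rather than against any StreamSets API. The function name, the sample record, and the idea that the parsed record looks like nested maps and lists with prefixed keys are all assumptions for illustration. The point is that the regex matches any "prefix:" and never mentions "ns1".

```python
import re

# Matches a leading "anything:" on a field name without naming the prefix.
PREFIX = re.compile(r"^[^:/]+:")

def strip_prefixes(node):
    """Recursively drop namespace prefixes from every key, keeping the hierarchy."""
    if isinstance(node, dict):
        return {PREFIX.sub("", key): strip_prefixes(value)
                for key, value in node.items()}
    if isinstance(node, list):
        return [strip_prefixes(item) for item in node]
    return node  # leaf values are left untouched

# Hypothetical record shape, assumed for the example only.
record = {"ns1:ODM": {"ns1:AdminData": [{"ns1:User": {"ns1:LoginName": "somelogin"}}]}}
cleaned = strip_prefixes(record)
```

One caveat with this approach: if two sibling keys differ only by prefix, they would collide after stripping, so it only suits data like mine where a single namespace is in play.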
You should be able to flatten the entire record. See https://streamsets.com/documentation/datacollector/latest/help/index.html#datacollector/UserGuide/Processors/FieldFlattener.html#concept_k4x_rz1_hx
That doesn't remove the namespace prefixes.
Right, but once the hierarchy is flat, you should be able to remove the namespace prefixes with the Field Renamer, using a regex.
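As a rough sketch of the kind of regex I mean, here it is in plain Python rather than in the Field Renamer's own expression syntax (which may differ). The function name and the sample paths are assumptions for illustration. The pattern matches any prefix that follows a "/" in a flattened field path, so it does not depend on the literal string "ns1".

```python
import re

# Any "prefix:" immediately after a path separator, whatever the prefix is.
PREFIX_IN_PATH = re.compile(r"(?<=/)[^/:]+:")

def strip_path_prefixes(path):
    """Remove namespace prefixes from each segment of a flattened field path."""
    return PREFIX_IN_PATH.sub("", path)

# Sample path taken from the question's default output.
flat_path = strip_path_prefixes("/ns1:ODM/ns1:AdminData")
```

Because the prefix is matched by shape rather than by name, the same pattern would still apply if the generated prefix changed in a later version.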
If you can edit your question and add a small sample of the XML and the result you're looking for, I'll see if I can create a sample pipeline.
I updated the question with some XML. It shows what I'm talking about with the namespaces. Note the important part: I don't want to have to reference the namespace prefix "ns1". That string is not part of my XML; it's something StreamSets comes up with.