replace special characters

asked 2019-09-11 08:05:10 -0500

Vss@2019

updated 2019-09-12 12:34:04 -0500

metadaddy

When we read files from the origin, any header value that contains a space comes through as a quoted field name, as in the format below. In the current input the space is the only special character, but we are not sure what we will get in the future; there might be many others. Which stages are required to achieve the final output? Please suggest.

List Map output from StreamSets:
no:1
'e name':aaa
addr:bbb

For example, my input files come in different patterns.

Pattern 1 :

no,e name,addr 
1,aaa,bbb
2,ccc,ddd

How do we achieve the output below for Pattern 1?

no,e_name,addr
1,aaa,bbb
2,ccc,ddd

Pattern 2 :

"no","e name","addr" 
"1","aaa","bbb"
"2","ccc","ddd"

How do we achieve the output below for Pattern 2?

no,e_name,addr
1,aaa,bbb
2,ccc,ddd

1 Answer


answered 2019-09-11 09:55:06 -0500

jeff

updated 2019-09-12 11:38:27 -0500

Are you simply talking about replacing the space with an underscore, in the field names? I'm having a bit of a hard time understanding "Pattern 1" vs "Pattern 2". In any case, see here for information about replacing characters in field names.
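
To make the target behaviour concrete, here is a minimal sketch of the renaming in plain Python, outside of StreamSets. It assumes that every run of characters other than letters, digits, and underscores in a header should become a single underscore; that rule and the names `normalize_header` and `normalize_csv` are illustrative assumptions, not something the question specifies.

```python
import csv
import io
import re

def normalize_header(name):
    """Replace each run of non-word characters with one underscore (assumed rule)."""
    return re.sub(r"[^0-9A-Za-z_]+", "_", name.strip())

def normalize_csv(text):
    """Parse CSV text (quoted or unquoted) and rewrite only the header row."""
    rows = list(csv.reader(io.StringIO(text)))
    rows[0] = [normalize_header(h) for h in rows[0]]
    out = io.StringIO()
    # Default minimal quoting: plain values are written without quotes.
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()

# Sample inputs taken from Pattern 1 and Pattern 2 in the question.
pattern_1 = "no,e name,addr\n1,aaa,bbb\n2,ccc,ddd\n"
pattern_2 = '"no","e name","addr"\n"1","aaa","bbb"\n"2","ccc","ddd"\n'

print(normalize_csv(pattern_1))
print(normalize_csv(pattern_2))
```

Because the CSV parser removes the quotes before the headers are renamed, both patterns end up with the same header row under this sketch.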

Update: given your requirements as stated in the comments.

The first scenario can be handled by a Field Mapper Processor with the following configs (this will match any two digits at the beginning of a field name; tweak as needed).

  • Operate On: Field Names
  • Conditional Expression: ${str:matches(f:name(), "^[0-9]{2}.*")}
  • Mapping Expression: ${str:replaceAll(f:name(), "^([0-9]{2})(.*)", "_$1_$2")}
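
The regex in the mapping expression above can be tried outside the pipeline. The following is a rough Python equivalent, not StreamSets code: Python writes group references as `\1` and `\2` where the expression uses `$1` and `$2`, and the function names are made up for illustration. The second function is a hypothetical tweak that also drops a space or underscore directly after the digits, which may be closer to the `12 ss` to `_12_ss` example from the comments.

```python
import re

def prefix_digits(name):
    # Same pattern as the mapping expression; unmatched names pass through unchanged.
    return re.sub(r"^([0-9]{2})(.*)", r"_\1_\2", name)

def prefix_digits_dropping_separator(name):
    # Hypothetical variant: swallow spaces/underscores right after the two digits.
    return re.sub(r"^([0-9]{2})[ _]*(.*)", r"_\1_\2", name)

print(prefix_digits("12ss"))                      # _12_ss
print(prefix_digits("12 ss"))                     # _12_ ss (the space is kept)
print(prefix_digits_dropping_separator("12 ss"))  # _12_ss
print(prefix_digits("addr"))                      # addr
```

With the original pattern a space after the digits survives the replacement, so it would still have to be handled by whatever step replaces spaces with underscores.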

The second scenario can also be handled by the Field Mapper Processor. This one is simpler (it only looks for a trailing _), but it can be tweaked as well if you need more complexity (e.g. using a regex instead).

  • Operate On: Field Names
  • Conditional Expression: ${str:endsWith(f:name(), "_")}
  • Mapping Expression: ${str:substring(f:name(), 0, str:length(f:name()) - 1)}
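
The same logic, sketched in Python for a quick check outside StreamSets: the slice mirrors the `str:substring(..., 0, length - 1)` call, and `replace_spaces` stands in for whichever earlier step turns spaces into underscores. Both function names are illustrative assumptions.

```python
def replace_spaces(name):
    # Stand-in for the earlier space-to-underscore replacement.
    return name.replace(" ", "_")

def strip_trailing_underscore(name):
    # Mirrors the condition (endsWith "_") and the mapping (drop the last character).
    if name.endswith("_"):
        return name[:-1]
    return name

header = "a b c "                 # header with a trailing space
replaced = replace_spaces(header)  # a_b_c_
print(strip_trailing_underscore(replaced))  # a_b_c
```

Only one trailing underscore is removed per pass, which matches the expression above; a header ending in several underscores would need a regex or repeated application.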

Comments

Thanks Jeff, but I have several scenarios; for example, if I want to capture a group, that doesn't work. 1) If a header name starts with a number, like 12 ss, it has to be replaced with _12_ss.

Vss@2019 (2019-09-12 07:32:52 -0500)

The second scenario: after replacing with the _ character, if a header name ends with _, that trailing _ should be sliced off. For example, my input is a b c. It gets replaced with a_b_c_, but the result I expect is a_b_c instead of a_b_c_.

Vss@2019 (2019-09-12 07:34:17 -0500)

Can all these scenarios be handled via a Groovy / Field Replacer processor?

Vss@2019 (2019-09-12 07:35:05 -0500)

Updated the answer in response.

jeff (2019-09-12 11:38:34 -0500)

Thanks Jeff. Can all of these be handled in a single Field Mapper stage, or should we add multiple stages based on the conditions?

Vss@2019 (2019-09-12 11:59:23 -0500)