Removing special characters from field names in a Data Collector pipeline
Hi,
We are running StreamSets Data Collector 3.11.0 and using the JDBC Multitable Consumer origin to read some tables from an MS SQL database. SDC is running on CentOS 7 with around 12 GB of Java heap space.
The source tables have column names containing German umlauts (ä, ö, ü) and brackets. We want to remove those characters in StreamSets because they cause problems downstream.
I have chained 8 Field Renamer processors, each configured like this, for example:
Source Field Expression: /'(.*)[Ä](.*)'
Target Field Expression: /$1AE$2
This approach produces the correct result but kills performance. At times throughput drops below 1 record/second. Without the Field Renamers, throughput is around 1000 records/second.
Is there a more efficient way to do this?
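To make the goal concrete, here is a minimal sketch in plain Python (not SDC code) of the combined renaming we are after, done in a single pass per field name instead of eight chained steps. Only the Ä to AE rule comes from our actual configuration; the other mappings, the choice to drop round and square brackets, and the names REPLACEMENTS, clean_field_name and clean_record are assumptions for illustration.

```python
# Assumed replacement table: only "Ä" -> "AE" is taken from the post;
# the remaining umlaut mappings and the bracket removal are illustrative.
REPLACEMENTS = {
    "Ä": "AE", "Ö": "OE", "Ü": "UE",
    "ä": "ae", "ö": "oe", "ü": "ue",
    "(": "", ")": "", "[": "", "]": "",
}

# Build the translation table once so each field name is scanned only once.
TABLE = str.maketrans(REPLACEMENTS)


def clean_field_name(name):
    """Return the field name with umlauts transliterated and brackets removed."""
    return name.translate(TABLE)


def clean_record(record):
    """Return a copy of a dict-like record with all top-level keys cleaned."""
    return {clean_field_name(key): value for key, value in record.items()}


if __name__ == "__main__":
    sample = {"Länge [m]": 1.5, "GRÖSSE(cm)": 180}  # hypothetical column names
    print(clean_record(sample))
```

Something equivalent inside a single processor stage is what we are hoping for.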
Thank you and best regards