Data Masking with StreamSets

asked 2018-08-21 13:35:23 -0600

Teo

updated 2018-08-21 13:42:57 -0600

metadaddy

Our company will start using StreamSets.

We are wondering whether masking databases (Oracle, SQL Server, Db2 LUW, Db2 z/OS, IMS) is doable. The requirement is that the masked tables be written back to the source database. We need to apply custom functions, lookups, and hash lookups.

If so, is it fast? For example, say we mask a field in a table of 20 million rows (250 MB as CSV) using a lookup table of 20 million rows (400 MB as CSV). Our current tool takes hours; if we rewrote the job as a stored procedure, it might take about 10 minutes.
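To make the "hash lookup" requirement concrete, here is a minimal, tool-agnostic sketch of one common interpretation: hash the original value and use the hash to pick a substitute from a lookup list, so the same input always maps to the same masked output. This is not StreamSets code or a StreamSets API. The function names `hash_lookup_mask` and `mask_rows`, the `salt` parameter, and the sample data are hypothetical and exist only for illustration.

```python
from __future__ import annotations

import hashlib


def hash_lookup_mask(value: str, lookup: list[str], salt: str = "") -> str:
    """Deterministically replace `value` with an entry from `lookup` (hypothetical helper).

    The same value and salt always yield the same substitute, which keeps masked
    keys consistent across tables. Different inputs can collide on one substitute.
    """
    if not lookup:
        raise ValueError("lookup table must not be empty")
    # Use hashlib, not built-in hash(): hash() of str is randomized per process.
    digest = hashlib.sha256((salt + value).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(lookup)
    return lookup[index]


def mask_rows(rows: list[dict], column: str, lookup: list[str], salt: str = "") -> list[dict]:
    """Return copies of `rows` with `column` masked; NULLs (None) are left as-is."""
    masked = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        if new_row.get(column) is not None:
            new_row[column] = hash_lookup_mask(str(new_row[column]), lookup, salt)
        masked.append(new_row)
    return masked


if __name__ == "__main__":
    surnames = ["Adams", "Baker", "Clark", "Davis", "Evans"]  # sample lookup table
    rows = [
        {"id": 1, "last_name": "Smith"},
        {"id": 2, "last_name": "Jones"},
        {"id": 3, "last_name": "Smith"},
    ]
    for r in mask_rows(rows, "last_name", surnames, salt="s3cret"):
        print(r)
```

Because the mapping depends only on the value and the salt, rows 1 and 3 receive the same masked surname. For a 20-million-row lookup table the list would be loaded once into memory, so each row needs only one hash and one index operation rather than a join.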
