For example, a column containing customer last names could be masked by implementing a Substitution Rule on it using the Random Last Names data set. When the Substitution Rule executes as part of a masking set run, a random last name is generated and substituted for each real customer last name. The customer's true last name is thus hidden (preserving privacy and security), while the remaining data still looks realistic and is usable in a test system.
Once the rule begins executing, substitution continues until every row in the table (or a subset, if a WHERE clause or Sampling options are specified) has been updated with the new data. Commits occur at user-configurable intervals; the default is every 5000 rows.
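The row-by-row substitution with periodic commits described above can be illustrated with a minimal sketch. It is not the Data Masker implementation: the `customers` table, the `LAST_NAMES` list (a stand-in for the Random Last Names data set), and the function names are all assumptions made for illustration, and an in-memory SQLite database stands in for the target system.

```python
import random
import sqlite3

# Hypothetical stand-in for the Random Last Names data set.
LAST_NAMES = ["Archer", "Bennett", "Castillo", "Dawson", "Ellis", "Farrow"]


def substitute_last_names(conn, where=None, commit_interval=5000, seed=None):
    """Replace customers.last_name with random names, committing in batches.

    `where` is an optional SQL filter written by the rule author (trusted
    input in this sketch). Returns the number of rows updated.
    """
    rng = random.Random(seed)
    query = "SELECT id FROM customers"
    if where:
        query += " WHERE " + where
    # Collect the target rows first so the updates do not disturb the scan.
    ids = [row[0] for row in conn.execute(query).fetchall()]

    updated = 0
    for row_id in ids:
        conn.execute(
            "UPDATE customers SET last_name = ? WHERE id = ?",
            (rng.choice(LAST_NAMES), row_id),
        )
        updated += 1
        if updated % commit_interval == 0:
            conn.commit()  # periodic commit, every `commit_interval` rows
    conn.commit()  # commit the final partial batch
    return updated


def build_demo_db():
    """Create a small in-memory table to run the sketch against."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, last_name TEXT, region TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers (last_name, region) VALUES (?, ?)",
        [("Smith", "EU"), ("Jones", "US"), ("Nguyen", "EU"), ("Okafor", "US")],
    )
    conn.commit()
    return conn
```

Calling `substitute_last_names(build_demo_db(), where="region = 'EU'")` would mask only the rows matching the filter and leave the others untouched, mirroring the effect of a WHERE clause on the rule.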
Any number of Substitution Rules can be defined on any columns of any table in a database. If you apply a Substitution Rule to a column that is part of a primary key or unique index, the index might have to be dropped while the Substitution Rule is executing. Whether the substituted data is unique depends heavily on the data set chosen: some data sets have options to guarantee uniqueness and some do not.
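Why uniqueness depends on the data set can be seen in a small sketch. Both functions below are hypothetical and do not reflect how any particular Data Masker data set works internally; they only contrast drawing values with replacement (duplicates possible) against drawing without replacement (duplicates impossible, but limited by the size of the pool).

```python
import random


def random_values(pool, count, rng):
    """Draw with replacement: the same value may be returned more than once."""
    return [rng.choice(pool) for _ in range(count)]


def unique_values(pool, count, rng):
    """Draw without replacement: every returned value is distinct.

    Raises ValueError when the pool has too few distinct values to cover
    the requested number of rows.
    """
    distinct = list(dict.fromkeys(pool))  # drop duplicates, keep order
    if count > len(distinct):
        raise ValueError("data set too small to guarantee uniqueness")
    return rng.sample(distinct, count)
```

A column under a unique index would need something like the second approach, which is one reason a data set without a uniqueness option may force the index to be dropped while the rule runs.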
Data sets for just about every purpose are included with the Data Masker software, and you can create your own if needed. The choice of data set for a particular column is entirely up to the implementer of the Substitution Rule. It is quite possible to choose an inappropriate data set, for example by putting telephone numbers into a last name field. The Data Masker software performs no checks on the appropriateness of the data set for the field contents.
The Data Masker software does, however, perform a number of other checks to prevent errors at rule execution time. When building a Substitution Rule, the data sets available are restricted by the datatype of the target column. For example, it is not possible to substitute last names into an INTEGER column.
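The datatype restriction can be pictured as a simple filter over a catalog of data sets. The sketch below is an assumption about the general idea, not the product's logic: apart from Random Last Names, the data set names are invented, and the mapping of column types to "families" is hypothetical.

```python
# Hypothetical catalog: data set name -> kind of value it produces.
DATA_SETS = {
    "Random Last Names": "text",
    "Random Telephone Numbers": "text",
    "Random Integers": "integer",
}

# Hypothetical mapping of column datatypes to value families.
COLUMN_FAMILY = {"VARCHAR": "text", "CHAR": "text", "INTEGER": "integer"}


def available_data_sets(column_type):
    """Return the data sets whose output matches the column's datatype."""
    base = column_type.split("(")[0].strip().upper()  # "varchar(20)" -> "VARCHAR"
    family = COLUMN_FAMILY.get(base)
    return sorted(name for name, produces in DATA_SETS.items() if produces == family)
```

In this sketch `available_data_sets("INTEGER")` offers only the integer data set, while a VARCHAR column is offered both last names and telephone numbers: the filter checks the datatype, not whether the data set suits the field's meaning.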
For textual fields such as VARCHAR columns, the size of the data supplied by the data set is restricted to the width of the column. A Substitution Rule will never, for example, attempt to update a VARCHAR(20) column with 25 characters of substitution data.
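One way to honor a column width is to cut each substitution value down to the declared size before the update. The text above does not say whether Data Masker truncates values or selects only values that already fit, so the truncation shown here is an assumption, and both function names are hypothetical.

```python
import re


def column_width(column_type):
    """Return the declared width of a CHAR/VARCHAR type, or None if there is none."""
    match = re.fullmatch(
        r"\s*(?:VAR)?CHAR\s*\(\s*(\d+)\s*\)\s*", column_type, re.IGNORECASE
    )
    return int(match.group(1)) if match else None


def fit_to_column(value, column_type):
    """Truncate `value` so it never exceeds the column's declared width."""
    width = column_width(column_type)
    return value if width is None else value[:width]
```

With this sketch, a 25-character value destined for a VARCHAR(20) column is reduced to its first 20 characters, and shorter values pass through unchanged.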
