
For example, if a Substitution rule were to be applied to the FIRST_NAME column of the EMPLOYEE table the relevance of the column could be preserved by substituting female and male names where appropriate. All female first names in the table could be substituted by choosing the Names, First Names, Female dataset and using a WHERE Clause of WHERE EMP_GENDER='F'. This would cause only the female employee records to be selected and masked. A separate rule using the Names, First Names, Male dataset and a second WHERE clause would have to be used to perform a similar substitution on the male entries.
Important Note: In the example above two rules with two WHERE Clauses were applied to the target table. It is important to realize that if there are EMP_GENDER values in the target table which are not equal to either 'M' or 'F' then these values will NOT be masked. This effect is called a WHERE Clause skip and is discussed in more detail in the Data Scrambling Issues white paper. In such cases, a Substitution rule with a default value is usually applied to the columns in the target table before the update with the WHERE Clause takes place. In the above example a Substitution rule using the Names, First Names, Female dataset could be used to mask all of the FIRST_NAME fields with female first names (irregardless of gender) then a second rule with a WHERE Clause of WHERE EMP_GENDER='M' and the Names, First Names, Male dataset could be used to mask just the male names. This two step method ensures that all rows of sensitive data in the target table get some form of masking and avoids the need to specifically target each case with a WHERE Clause. The execution order of the rules is critical in the two-step process just described. It is not valid to have both rules run simultaneously. To control the execution order use Rule Blocks and Dependencies.
The sampling option sifts the set of rows to be operated on. If no WHERE Clause is set, the sampling will apply to all rows in the table. If a WHERE Clause option has been configured, then the sampling will apply only to the rows returned by the WHERE Clause. For example, if the table contains 1000 rows and the sample percentage is sent to 10 percent then 100 rows (at random) will have the substitution operation applied to them. If a WHERE Clause was applied which reduced the selected rows from the full 1000 to 500, then the sample percentage of 10% would cause only 50 of the selected rows to have the substitution operation applied to them.