[Net 2000 Ltd. Home][Data Masker Home][Data Masker Manual][Data Masker FAQ]
Scrambling Data in the PUBS Database
The Data Masker software contains a set of masking rules for the sample PUBS database supplied with SQL Server. You are encouraged to add your own masking rules to this masking set. The first section of this page discusses the existing rules (as supplied with the installation) and the second part provides some hints of additional rules which might be added.

Masking Rules in the Sample PUBS Masking Set
The Existing Rules
- If you wish to explore each rule further load the masking set into the Data Masker software and double click on the rule with the mouse to launch it in editing mode. You can then explore the rules configuration in detail.
- Rule 0001
- This is a Rule Controller. Notice how the other rules are dependent on it. A Rule Controller defines the database on which the dependent rules will act. If you change the login information in the Rule Controller its rules will operate on the new database. It is up to you to ensure that the Rule Controller points at an appropriate database. All other types of masking rule must have a parent Rule Controller and every masking set must contain at least one Rule Controller.
- Rule 0002
- A Substitution Rule on the STORES table. This rule uses the Random Company Names data set to provide the substitution data.
- Rules 0003-0006
- More Substitution Rules on fields in the STORES table. These rules use data sets appropriate to the column content. Particularly note that the state column is substituted using uppercase two letter abbreviations of the state name and the zip column receives the 5 digit zipcode only. Other options are possible - try editing rule 0006 to implement a Zip+4 substitution value.
- Rule 0007
- A Substitution Rule on the TITLES table on the price column. This rule assumes that although the prices need to be masked, it is not acceptable to use random numbers as this would distort the data inappropriately. The Number Variance data set is used to modify each existing value within a range of plus or minus 10%. Note that this column is of a number datatype - so the available data sets are different than for text fields. It might be a good idea to double click on the rule to have a look at the available data sets.
- Rule 0008
- A Substitution Rule on the TITLES table on the pubdate column. This rule assumes that like the price column, it is not acceptable to just use random dates. The Date Variance data set is used to modify the date within the bounds of an acceptable variation. It would be a good idea to double click on this rule to look at the data set options.
- Rules 0009-0011
- These are a more complicated set of rules. The assumption here is that the title column requires contents containing two random words. This is done by substituting the title column with random words from the Random Dictionary Words dataset (rule 0009) and using the notes column as a temporary data area. The notes column is also substituted using the Random Dictionary Words data set (rule 0010) and rule 0011 concatenates the two columns within each row and places it back in the title column. This is an example of Row-Internal Synchronization - commonly seen in many masking requirements. You should take care to observe the before and after effects of the rule and double click on rule 0011 to see how it is configured.
Note: The dependency relationship illustrated is purely arbitrary and is only done to collect the rules as a group to show they are related. Rules are always executed in the order in which they are visible on the screen. Rules 0009-0011 could be placed one after another on the same level and would achieve the same effect.
- Rules 0012-0013
- This group of rules just tidies up the notes column which was used in the previous rule group. Rule 0012 nulls all of the rows in the notes column and rule 0013 stuffs some random words from the Random Paragraphs of Gibberish data set into 30% of the rows. You should examine rule 0013 to see how the sampling is configured. As with rules 0009-0011, the dependency relationship is optional and is only used to show that the collection forms a group which together achieve the result.
- Rules 0014
- This is a Command Rule which truncates all rows in the ROYSCHED table. It is assumed, for the purposes of the masked database, that the ROYSCHED data is not necessary and too sensitive to leave in place. Simply removing the data is sometimes an option in cases such as that.
Additional Rules
- Below are some suggestions for additional rules which can be added to the Pubs database sample masking set. An ER diagram for Pubs is available to assist you in your analysis of the database structure. If you do add new masking rules it is a good idea to save the modified masking set under a new name - that way if you upgrade the Data Masker software you will not overwrite your enhancements.
- TITLES:advance
- Use a substitution rule and the User Specified Numbers data set to set each value in the advance column to a fixed sum of 1024.00. Place a WHERE Clause on the rule so that values that are already NULL are not affected.
- AUTHORS:...
- Examine the columns in the Authors table and make decisions about which ones contain sensitive information and implement suitable Substitution Rules for them.
- AUTHORS:emp_id
- Use the Text, Alpha-Numeric data set to generate new values for the emp_id column. Note there are two formats in this column AAANNNNNA and A-ANNNNNA. Use the first for all rows or use two rules with WHERE Clauses to mask each in its original format.
- PUB_INFO:logo
- Use the Null Values data set to remove all values from the logo field.
- ROYSCHED:...
- Add a Command Rule to remove all values from the DISCOUNTS table where the stor_id is not null. Hint: just implement a statement like: delete from DISCOUNTS where stor_id is not null in the Command Rule.
[Net 2000 Ltd. Home][Data Masker Home][Data Masker Manual][Data Masker FAQ]