NIST 800-53 REV 5 • SYSTEM AND INFORMATION INTEGRITY
SI-19(4) — Removal, Masking, Encryption, Hashing, or Replacement of Direct Identifiers
Remove, mask, encrypt, hash, or replace direct identifiers in a dataset.
Supplemental Guidance
There are many possible processes for removing direct identifiers from a dataset. Columns in a dataset that contain a direct identifier can be removed. In masking, the direct identifier is transformed into a repeating character, such as XXXXXX or 999999. Identifiers can be encrypted or hashed so that the linked records remain linked. In the case of encryption or hashing, algorithms are employed that require the use of a key, including the Advanced Encryption Standard or a Hash-based Message Authentication Code. Implementations may use the same key for all identifiers or use a different key for each identifier. Using a different key for each identifier provides a higher degree of security and privacy. Identifiers can alternatively be replaced with a keyword, including transforming "George Washington" to "PATIENT" or replacing it with a surrogate value, such as transforming "George Washington" to "Abraham Polk."
Practitioner Notes
Remove, mask, encrypt, hash, or replace direct identifiers (names, SSNs, account numbers) so that records in the dataset no longer directly identify individuals.
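The simpler options (removal, masking, and keyword replacement) can be sketched as follows. This is a minimal illustration, not a reference implementation: the record layout, the field names, and the helper functions are all hypothetical.

```python
# Hypothetical set of columns treated as direct identifiers (an assumption).
DIRECT_IDENTIFIERS = {"name", "ssn"}


def remove_identifiers(record, fields=DIRECT_IDENTIFIERS):
    """Return a copy of the record without the direct-identifier columns."""
    return {k: v for k, v in record.items() if k not in fields}


def mask_value(value, mask_char="X", length=None):
    """Replace a value with a repeating character, e.g. XXXXXX or 999999.

    By default the mask has the same length as the original value;
    pass `length` to emit a fixed-width mask instead.
    """
    n = len(str(value)) if length is None else length
    return mask_char * n


def replace_with_keyword(record, field, keyword="PATIENT"):
    """Return a copy of the record with one field replaced by a keyword."""
    out = dict(record)
    out[field] = keyword
    return out


# Made-up sample data for illustration only.
record = {"name": "George Washington", "ssn": "123456789", "diagnosis": "J45"}

removed = remove_identifiers(record)
masked_ssn = mask_value(record["ssn"])
replaced = replace_with_keyword(record, "name")
```

Passing a fixed `length` to `mask_value` keeps the mask from revealing how long the original value was, which can matter for names.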
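Example 1: Use a keyed one-way hash (for example, HMAC-SHA-256) to replace direct identifiers with hash values. Because the same input always produces the same output under a given key, records belonging to the same individual stay linked without revealing who they are. Avoid plain, unkeyed SHA-256 for identifiers such as SSNs: there are at most one billion nine-digit values, so an attacker can hash every candidate and match the results. A salt defeats only precomputed rainbow tables. To preserve linkage it must be the same for every record, and to resist brute force it must be kept secret, which makes it a key in practice.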
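A keyed version of Example 1, using the HMAC option named in the Supplemental Guidance, might look like the sketch below. It uses only the Python standard library. The `pseudonymize` helper, its normalization step, and the in-memory key are assumptions for illustration; a real deployment would load the key from a key management system instead of generating it inline.

```python
import hashlib
import hmac
import secrets


def pseudonymize(identifier: str, key: bytes) -> str:
    """Return a hex HMAC-SHA-256 pseudonym for a direct identifier.

    The identifier is normalized (trimmed, lowercased) so that trivially
    different spellings of the same value map to the same pseudonym.
    """
    normalized = identifier.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Assumption: a 32-byte secret key generated here only for demonstration.
key = secrets.token_bytes(32)
other_key = secrets.token_bytes(32)

p1 = pseudonymize("123-45-6789", key)
p2 = pseudonymize(" 123-45-6789 ", key)      # same person, stray whitespace
p3 = pseudonymize("123-45-6789", other_key)  # same value, different key
```

Here `p1` and `p2` are equal, so records remain linkable under one key, while `p3` differs, so datasets pseudonymized under different keys cannot be joined on the hashed column.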
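Example 2: Use format-preserving encryption (FPE), such as the FF1 mode specified in NIST SP 800-38G, to encrypt SSNs and account numbers while maintaining their format. The encrypted value has the same length and character set as a real SSN (nine digits), so existing schemas and validation rules keep working, but it is meaningless without the decryption key. Unlike hashing, FPE is reversible: authorized users who hold the key can decrypt, and everyone else sees only the encrypted value.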