NIST 800-53 REV 5 • SYSTEM AND INFORMATION INTEGRITY
SI-19(6) — Differential Privacy
Prevent disclosure of personally identifiable information by adding non-deterministic noise to the results of mathematical operations before the results are reported.
Supplemental Guidance
The mathematical definition for differential privacy holds that the result of a dataset analysis should be approximately the same before and after the addition or removal of a single data record (which is assumed to be the data from a single individual). In its most basic form, differential privacy applies only to online query systems. However, it can also be used to produce machine-learning statistical classifiers and synthetic data. Differential privacy comes at the cost of decreased accuracy of results, forcing organizations to quantify the trade-off between privacy protection and the overall accuracy, usefulness, and utility of the de-identified dataset. Non-deterministic noise can include adding small, random values to the results of mathematical operations in dataset analysis.
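The "small, random values" described above are typically drawn from a Laplace distribution whose scale is calibrated to the query's sensitivity (how much one record can change the result) divided by the privacy parameter epsilon. The following is a minimal illustrative sketch of the Laplace mechanism for a count query, not a production implementation; real deployments should use a vetted library.

```python
import math
import random

def dp_count(values, predicate, epsilon, rng=random):
    """Differentially private count via the Laplace mechanism.

    A count query has sensitivity 1 (adding or removing one record
    changes the true count by at most 1), so the noise scale is
    1 / epsilon. Smaller epsilon means more noise and more privacy.
    """
    true_count = sum(1 for v in values if predicate(v))
    scale = 1.0 / epsilon  # sensitivity / epsilon

    # Sample Laplace(0, scale) noise by inverse-CDF sampling:
    # u is uniform on [-0.5, 0.5).
    u = rng.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

    return true_count + noise
```

With a large epsilon the noise is negligible and the reported count is close to the true count; as epsilon shrinks, individual records are hidden but accuracy degrades, which is the trade-off the guidance describes.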
Practitioner Notes
Use differential privacy techniques to provide mathematical bounds on how much any single individual's record can influence published data analysis results, limiting what an observer can infer about that individual.
Example 1: When running analytics on employee data, use a differential privacy library (like Google's DP library or OpenDP) that adds calibrated noise to query results. Analysts get useful aggregate statistics while individual privacy is mathematically guaranteed.
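As a hedged sketch of what such a library does under the hood (the actual APIs of Google's DP library and OpenDP differ), a differentially private mean must first clamp each value to a known range so the query's sensitivity is bounded before noise is calibrated:

```python
import math
import random

def dp_mean(values, lo, hi, epsilon, rng=random):
    """Differentially private mean of n records via the Laplace mechanism.

    Values are clamped to [lo, hi] so that replacing one record changes
    the sum by at most (hi - lo), giving the mean a sensitivity of
    (hi - lo) / n. This is an illustrative sketch; production systems
    should use a vetted library such as OpenDP.
    """
    n = len(values)
    clamped = [min(max(v, lo), hi) for v in values]
    sensitivity = (hi - lo) / n
    scale = sensitivity / epsilon

    # Inverse-CDF sampling of Laplace(0, scale) noise.
    u = rng.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

    return sum(clamped) / n + noise
```

An analyst querying, say, average salary would supply a plausible range (lo, hi) up front; the clamping step is what makes the mathematical guarantee hold even for outlier records.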
Example 2: Configure data sharing platforms to enforce a privacy budget — a limit on the cumulative privacy loss (epsilon) permitted across all queries against a dataset. Each query consumes part of the budget, and once it is exhausted, no further queries are allowed. This prevents attackers from combining the results of many queries to isolate individuals.
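The budget-enforcement logic above can be sketched as a simple accountant that applies basic (additive) composition; the class name and interface here are illustrative, not from any specific platform:

```python
class PrivacyBudget:
    """Tracks cumulative epsilon spent across queries.

    Uses basic composition: the total privacy loss of a sequence of
    queries is at most the sum of their individual epsilons. Once the
    total is reached, further queries are refused.
    """

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon):
        """Reserve epsilon for one query, or refuse if it would
        exceed the budget."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon

    def remaining(self):
        return self.total - self.spent
```

A platform would call `charge(epsilon)` before executing each query and only release noised results when the charge succeeds, so the mathematical guarantee holds over the dataset's entire lifetime rather than per query.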