Handling sensitive datasets
Problems
One person must be granted permission to view ALL samples
Frequency analysis attacks if data is weakly encrypted
Degraded performance of ML models on encrypted data
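To illustrate the frequency problem, here is a minimal sketch. It assumes the "encryption" is deterministic (an unsalted hash stands in for it here) and that the attacker knows the public ranking of how common each value is. The column values and helper names are hypothetical.

```python
import hashlib
from collections import Counter

def deterministic_token(value: str) -> str:
    """Unsalted SHA-256 (assumed stand-in for weak, deterministic encryption):
    the same input always yields the same token."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]

def rank_by_frequency(items):
    """Return distinct items ordered from most to least frequent."""
    return [item for item, _ in Counter(items).most_common()]

# Hypothetical column with a skewed value distribution.
diagnoses = ["flu"] * 6 + ["asthma"] * 3 + ["rare_condition"] * 1
tokens = [deterministic_token(d) for d in diagnoses]

# The attacker never sees the plaintext, only the tokens plus
# public knowledge of which values are most common.
public_ranking = ["flu", "asthma", "rare_condition"]
guessed = dict(zip(rank_by_frequency(tokens), public_ranking))

# Matching by frequency rank recovers every value.
recovered = [guessed[t] for t in tokens]
```

Because equal inputs map to equal tokens, the frequency profile survives obfuscation; a per-record random salt or nonce would break this match, at the cost of making the column unusable for joins or grouping.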
Notes
Find balance between security and utility
PII - personally identifiable information
Goals
Identify sensitive data
columns
identify -> secure -> document
text-based
tackled with regexes (credit card numbers, etc.)
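A minimal sketch of the regex approach for credit card numbers, assuming candidates are 13 to 19 digits optionally separated by spaces or hyphens. The Luhn checksum is added here to cut false positives; the function names and the mask string are hypothetical.

```python
import re

# 13-19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")

def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def redact_card_numbers(text: str, mask: str = "[REDACTED_CARD]") -> str:
    """Replace digit runs that pass the Luhn check with a mask."""
    def _replace(match):
        digits = re.sub(r"[ -]", "", match.group())
        return mask if luhn_valid(digits) else match.group()
    return CARD_CANDIDATE.sub(_replace, text)

print(redact_card_numbers("Card: 4111 1111 1111 1111 thanks"))
```

A regex alone flags any long digit run (order IDs, phone numbers); the checksum step keeps those intact while still masking well-formed card numbers.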
unstructured free-form
audio, image, video, etc.
special tooling to detect and obfuscate information
unstructured content
anything else that provides substantial context for identification
combination of fields
patterns in obfuscated column data that can still uniquely identify a person
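The combination-of-fields risk can be checked with a short sketch like the one below. It assumes rows are plain dictionaries and flags any row whose combination of quasi-identifier columns appears only once. The column names and sample rows are hypothetical.

```python
from collections import Counter

def unique_combinations(rows, quasi_identifiers):
    """Return rows whose quasi-identifier combination appears exactly once."""
    def key(row):
        return tuple(row[col] for col in quasi_identifiers)
    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] == 1]

# Hypothetical records with names already removed.
rows = [
    {"zip": "02139", "birth_year": 1980, "gender": "F"},
    {"zip": "02139", "birth_year": 1980, "gender": "F"},
    {"zip": "02139", "birth_year": 1951, "gender": "M"},
]

# The third row is the only one with its combination, so it is re-identifiable.
at_risk = unique_combinations(rows, ["zip", "birth_year", "gender"])
```

No single column here identifies anyone, yet one record is still unique; generalizing values (for example, birth decade instead of birth year) is one common way to enlarge the groups.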
Create governance plan
Protect data (without hurting ML part)
Embeddings
Public
Third-party
Home-grown
Security
Access control
who can access
audit logging
who accessed, when, and from where
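An audit log entry capturing who, when, and where might look like the following sketch. It assumes structured JSON lines written through Python's standard `logging` module; the logger name, field names, and the `log_access` helper are hypothetical.

```python
import json
import logging
from datetime import datetime, timezone

audit_logger = logging.getLogger("audit")  # hypothetical logger name

def log_access(user: str, dataset: str, source_ip: str) -> dict:
    """Record one access event and return the logged record."""
    record = {
        "who": user,
        "when": datetime.now(timezone.utc).isoformat(),  # UTC timestamp
        "where": source_ip,
        "dataset": dataset,
    }
    audit_logger.info(json.dumps(record))
    return record

entry = log_access("alice", "patients_v2", "10.0.0.5")
```

In practice the handler behind this logger would write to append-only storage that the audited users cannot modify.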
physical security
servers in a private network
Data Encryption
in transit
at rest
Data retention policy