Model fingerprinting
A fingerprint is a piece of information extracted from a model that has identification power (ideally, it is robust to some variations and uniquely identifies the model instance). It can be useful to think of a fingerprint as two functions: fp = extract(model) and verify(fp1, fp2)
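A minimal sketch of the extract/verify view, assuming the simplest possible functional fingerprint: the labels a model assigns to a fixed query set, compared by agreement rate. The query set, the 0.9 threshold and the toy linear models are all assumptions for illustration, not a specific published scheme.

```python
import numpy as np

def extract(model, queries):
    """fp = extract(model): the labels the model assigns to a fixed query set."""
    return np.asarray(model(queries))

def verify(fp1, fp2, threshold=0.9):
    """verify(fp1, fp2): declare a match if the fingerprints agree on enough queries."""
    agreement = float(np.mean(np.asarray(fp1) == np.asarray(fp2)))
    return agreement >= threshold, agreement

def linear_model(weights):
    # toy classifier: label = argmax of a linear score (assumption)
    return lambda x: np.argmax(x @ weights, axis=1)

rng = np.random.default_rng(0)
queries = rng.normal(size=(200, 5))                  # hypothetical query set
w_victim = rng.normal(size=(5, 3))
w_copy = w_victim + 1e-3 * rng.normal(size=(5, 3))   # lightly modified copy
w_other = rng.normal(size=(5, 3))                    # independently drawn model

fp_victim = extract(linear_model(w_victim), queries)
fp_copy = extract(linear_model(w_copy), queries)
fp_other = extract(linear_model(w_other), queries)
print(verify(fp_victim, fp_copy))
print(verify(fp_victim, fp_other))
```

Most schemes listed below differ mainly in how the query set is built and in what verify compares.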
[Queries, Representation & Detection: The Next 100 Model Fingerprinting Schemes]
Good survey (with poor English) on IP protection for DNNs in general: watermarking, fingerprinting, active defenses [Intellectual property protection of DNN models]
Related problems
Watermarking
Proactive solutions: the goal is to be able to prove ownership with a sort of "hidden key" that leaves a trace in a model or its output.
Watermarking survey [A Systematic Review on Model Watermarking for Neural Networks]
Model watermarking
A backdoored data point can serve as a model watermark
[Adversarial frontier stitching for remote neural network watermarking]
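A minimal sketch of how ownership could be checked with backdoor-style watermarks, assuming the owner keeps a secret trigger set with owner-chosen labels. The statistical test treats an unrelated model as a uniform random guesser on triggers, which is a strong simplifying assumption; the trigger set and alpha are hypothetical.

```python
from math import comb

def chance_p_value(matches, n, num_classes):
    """P[an unrelated model gets >= `matches` of n trigger labels right],
    assuming it behaves like a uniform random guesser on trigger inputs."""
    p = 1.0 / num_classes
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(matches, n + 1))

def verify_watermark(predicted, trigger_labels, num_classes, alpha=1e-6):
    """Claim ownership if the suspect reproduces the secret trigger labels
    far more often than chance would allow."""
    matches = sum(int(a == b) for a, b in zip(predicted, trigger_labels))
    p_value = chance_p_value(matches, len(trigger_labels), num_classes)
    return p_value < alpha, matches, p_value

# hypothetical trigger set: 50 backdoored inputs with owner-chosen labels
trigger_labels = [i % 10 for i in range(50)]
stolen_preds = trigger_labels[:45] + [(l + 1) % 10 for l in trigger_labels[45:]]
unrelated_preds = [(l + 3) % 10 for l in trigger_labels[:45]] + trigger_labels[45:]
print(verify_watermark(stolen_preds, trigger_labels, num_classes=10))
print(verify_watermark(unrelated_preds, trigger_labels, num_classes=10))
```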
[Protecting artificial intelligence IPs: a survey of watermarking and fingerprinting for machine learning]
Proof-of-learning
Like a proof of work, but for model training. Another way to do model authentication (if I have a proof of training, the model belongs to me) [Proof-of-Learning: Definitions and Practice]
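A toy sketch of the proof-of-learning idea, assuming a linear least-squares model trained with minibatch SGD: the prover logs the batch indices of every step and a checkpoint every k steps, and the verifier replays one segment and checks it lands on the next logged checkpoint. The tolerance stands in for the hardware nondeterminism real systems must absorb; all names and sizes are hypothetical.

```python
import numpy as np

def sgd_step(w, X, y, lr=0.1):
    # one gradient step on the mean squared error of a linear model
    grad = 2.0 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def train_with_proof(X, y, steps=20, batch=8, k=5, seed=0):
    """Train, logging the batch indices of every step and a checkpoint every k steps."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    checkpoints, batches = [w.copy()], []
    for t in range(steps):
        idx = rng.choice(len(y), size=batch, replace=False)
        batches.append(idx)
        w = sgd_step(w, X[idx], y[idx])
        if (t + 1) % k == 0:
            checkpoints.append(w.copy())
    return w, checkpoints, batches

def verify_segment(X, y, checkpoints, batches, segment, k=5, tol=1e-8):
    """Verifier: replay the k logged updates of one segment from its starting
    checkpoint and check that they land on the next logged checkpoint."""
    w = checkpoints[segment].copy()
    for idx in batches[segment * k:(segment + 1) * k]:
        w = sgd_step(w, X[idx], y[idx])
    return bool(np.linalg.norm(w - checkpoints[segment + 1]) <= tol)

rng = np.random.default_rng(1)
X = rng.normal(size=(64, 3))
y = X @ np.array([1.0, -2.0, 0.5])
w, checkpoints, batches = train_with_proof(X, y)
print([verify_segment(X, y, checkpoints, batches, s) for s in range(len(checkpoints) - 1)])

forged = [c.copy() for c in checkpoints]
forged[2] += 0.01          # a checkpoint that training did not actually produce
print(verify_segment(X, y, forged, batches, segment=1))
```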
Model tampering
Here m should equal m', and we want to make sure that nobody has made any modification to the model.
[Sensitive-Sample Fingerprinting of Deep Neural Networks]
[BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain]
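A simplified sketch of the sensitive-sample idea on a toy logistic model (assumption): keep the inputs whose output is most sensitive to the weights, record the outputs, and flag tampering if a black-box query on those inputs drifts. SSF optimizes such inputs; here they are merely selected from a random candidate pool, which is a deliberate simplification.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def weight_sensitivity(w, X):
    # norm of d sigmoid(w.x) / dw, which equals sigmoid'(w.x) * ||x|| for this toy model
    s = sigmoid(X @ w)
    return s * (1.0 - s) * np.linalg.norm(X, axis=1)

def pick_sensitive_samples(w, candidates, k):
    """Keep the k candidates whose output reacts most to a change of w."""
    order = np.argsort(-weight_sensitivity(w, candidates))
    return candidates[order[:k]]

def is_tampered(model_fn, samples, reference_outputs, tol=1e-6):
    """Black-box check: any output drift on the sensitive samples flags tampering."""
    return bool(np.any(np.abs(model_fn(samples) - reference_outputs) > tol))

rng = np.random.default_rng(0)
w = rng.normal(size=8)
samples = pick_sensitive_samples(w, rng.normal(size=(1000, 8)), k=10)
reference = sigmoid(samples @ w)             # recorded by the owner at release time

w_tampered = w.copy()
w_tampered[0] += 1e-3                        # a tiny, hypothetical modification
print(is_tampered(lambda X: sigmoid(X @ w), samples, reference))
print(is_tampered(lambda X: sigmoid(X @ w_tampered), samples, reference))
```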
Backdoor prevention
The goal is to make sure nobody has installed a backdoor. This problem is trivial with weight access to both m and m', so at least one of them is assumed to be available without weight access.
Funny backdoor example: I install a backdoor in ArcFace, used as a biometric system, such that my face is recognized as that of someone who has access to all the doors in my biology company
[Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks]
A targeted attack aims at controlling the output (the error is a specific class chosen by the attacker); an untargeted attack is easier and only aims at making the model err (any class except the correct one).
[Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks]
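To make the targeted/untargeted distinction concrete, here is a sketch with FGSM (a standard one-step attack, not discussed in these notes) on a toy linear softmax classifier (assumption). The untargeted step climbs the loss of the true class, the targeted step descends the loss of the attacker's chosen class.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def loss_and_grad(W, x, label):
    """Cross-entropy of a linear classifier (logits = W @ x) and its gradient w.r.t. x."""
    p = softmax(W @ x)
    onehot = np.eye(W.shape[0])[label]
    return -np.log(p[label]), W.T @ (p - onehot)

def fgsm_untargeted(W, x, true_label, eps):
    # push the loss of the correct class *up*: any wrong class will do
    _, g = loss_and_grad(W, x, true_label)
    return x + eps * np.sign(g)

def fgsm_targeted(W, x, target_label, eps):
    # push the loss of the chosen class *down*: the error must be that class
    _, g = loss_and_grad(W, x, target_label)
    return x - eps * np.sign(g)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 10))          # toy 4-class linear classifier
x = rng.normal(size=10)
y = int(np.argmax(W @ x))             # treat the model's own label as ground truth
target = (y + 1) % 4                  # arbitrary class chosen by the attacker

x_untargeted = fgsm_untargeted(W, x, y, eps=0.5)
x_targeted = fgsm_targeted(W, x, target, eps=0.5)
print(y, np.argmax(W @ x_untargeted), np.argmax(W @ x_targeted))
```

One step may not be enough to flip the label; iterating the step (as PGD does) is the usual fix.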
User/copy level fingerprinting
A slightly different definition, closer to watermarking, where a base model is used to generate many copies, each with its own fingerprint. This is the case when a company gives the same model to many employees but wants to know who leaked it if the model is found in the wild.
Not-so-good survey, with a definition of fingerprinting at the copy/user level [Protecting artificial intelligence IPs: a survey of watermarking and fingerprinting for machine learning]
[DeepMarks: A Secure Fingerprinting Framework for Digital Rights Management of Deep Learning Models] fine-tunes a base model with a user-specific regularization term that encodes the user ID in the model's weights.
[DeepSigns: An End-to-End Watermarking Framework for Ownership Protection of Deep Neural Networks] encodes the unique identifier in the probability density function of the activation maps of selected layers
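A simplified sketch of regularizer-based ID embedding, assuming an Uchida-style secret projection matrix and a binary cross-entropy term (not DeepMarks' exact construction). The ||w - w0||^2 term stands in for the task loss, and all sizes and the user ID are hypothetical.

```python
import numpy as np

def embed_id(w0, user_bits, key, lam=20.0, lr=0.05, steps=500):
    """Fine-tune w0 so that the signs of key @ w spell out user_bits.
    The ||w - w0||^2 term is a stand-in for the task loss (assumption)."""
    b = np.asarray(user_bits, dtype=float)
    w = w0.copy()
    for _ in range(steps):
        s = 1.0 / (1.0 + np.exp(-(key @ w)))
        # gradient of lam * BCE(s, b) + ||w - w0||^2
        grad = lam * key.T @ (s - b) + 2.0 * (w - w0)
        w -= lr * grad
    return w

def extract_id(w, key):
    return (key @ w > 0).astype(int)

rng = np.random.default_rng(0)
d, n_bits = 256, 16
w0 = 0.1 * rng.normal(size=d)                    # flattened weights of one layer
key = rng.normal(size=(n_bits, d)) / np.sqrt(d)  # secret projection, kept by the owner
user_bits = rng.integers(0, 2, size=n_bits)      # hypothetical user ID

w_user = embed_id(w0, user_bits, key)
print(extract_id(w_user, key), user_bits)
print(np.linalg.norm(w_user - w0) / np.linalg.norm(w0))  # relative weight change
```

Leak tracing then amounts to reading extract_id from a model found in the wild and looking the bits up in the user table.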
Active defenses
These branches of research aim to prevent unauthorized use of a model by making model stealing harder.
[Intellectual property protection of DNN models]
Model authentication
Step 1: take a model M and encode it into M_e
Step 2: create a predict function that takes M_e, an input x and a key k, and returns M_e(x, k) = M(x) iff k is a valid key. An invalid key renders the model useless.
[Intellectual property protection of DNN models]
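A toy sketch of the two steps, assuming M is a linear classifier and the encoding simply adds a key-derived pseudo-random mask to the weights. This is obfuscation for illustration, not vetted cryptography and not the survey's scheme; the key strings and mask scale are hypothetical.

```python
import hashlib
import numpy as np

def key_mask(key: str, shape):
    # deterministic pseudo-random mask derived from the key (illustration only)
    seed = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "little")
    return np.random.default_rng(seed).normal(scale=10.0, size=shape)

def encode(W, key):
    """Step 1: M -> M_e, hide the weights under a key-derived mask."""
    return W + key_mask(key, W.shape)

def predict(W_e, x, key):
    """Step 2: M_e(x, k) behaves like M(x) only when k is the valid key."""
    W = W_e - key_mask(key, W_e.shape)
    return np.argmax(x @ W, axis=1)

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))              # toy linear model M (assumption)
X = rng.normal(size=(500, 5))
W_e = encode(W, "valid-key")             # what gets distributed

reference = np.argmax(X @ W, axis=1)
print(np.mean(predict(W_e, X, "valid-key") == reference))
print(np.mean(predict(W_e, X, "wrong-key") == reference))
```

Anyone holding both M_e and a valid key can recover M, so this only protects against leaks of M_e alone.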
Inference perturbation
A defense against query-based model extraction, where the outputs of the model are perturbed. The goal is to detect whether input queries are fraudulent and, if so, to perturb the output.
[Intellectual property protection of DNN models]
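A hypothetical sketch of the detect-then-perturb loop: a naive detector flags queries far from all training points, and flagged queries get noisy probabilities whose top-1 label is preserved, so label-only users are unaffected while soft-label extraction gets misleading targets. Both the detector and the perturbation are assumptions, not a published defense.

```python
import numpy as np

def is_suspicious(x, train_X, radius):
    """Hypothetical detector: flag queries far from every training point."""
    return bool(np.min(np.linalg.norm(train_X - x, axis=1)) > radius)

def perturb_probs(p, rng, strength=0.5):
    """Blend p with random noise, then restore the original top-1 label."""
    q = (1 - strength) * p + strength * rng.dirichlet(np.ones_like(p))
    top, new_top = int(np.argmax(p)), int(np.argmax(q))
    q[top], q[new_top] = q[new_top], q[top]   # no-op when top == new_top
    return q

def serve(x, predict_proba, train_X, radius, rng):
    p = predict_proba(x)
    return perturb_probs(p, rng) if is_suspicious(x, train_X, radius) else p

rng = np.random.default_rng(0)
train_X = rng.normal(size=(100, 2))
predict_proba = lambda x: np.array([0.1, 0.7, 0.2])   # stand-in for the real model
clean = serve(np.zeros(2), predict_proba, train_X, radius=1.0, rng=rng)
flagged = serve(np.full(2, 8.0), predict_proba, train_X, radius=1.0, rng=rng)
print(clean, flagged)
```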
Approaches
Functional fingerprinting
[Queries, Representation & Detection: The Next 100 Model Fingerprinting Schemes]
Classification boundary
[IPGuard: Protecting Intellectual Property of Deep Neural Networks via Fingerprinting the Classification Boundary]
Adversarial fingerprinting
Dominant paradigm. Adversarial sampling exploits the intuition that models are largely characterized by their decision boundary. It requires computing gradients of the victim model m.
[AFA: Adversarial fingerprinting authentication for deep neural networks]
[TAFA: A Task-Agnostic Fingerprinting Algorithm for Neural Networks]
Distance between functions using pairs of (x, x_adversarial) [ModelDiff: Testing-Based DNN Similarity Comparison for Model Reuse Detection]
Using Universal Adversarial Perturbations [Fingerprinting Deep Neural Networks Globally via Universal Adversarial Perturbations]
Using conferrable adversarial examples; robust against model extraction [Deep Neural Network Fingerprinting by Conferrable Adversarial Examples]
Basic strategy using DeepFool [Fingerprinting Deep Neural Networks - a DeepFool Approach]
DeepJudge [Copy, Right? A Testing Framework for Copyright Protection of Deep Learning Models]
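A minimal sketch of boundary-based fingerprinting in the spirit of the DeepFool and IPGuard entries, on a toy linear classifier (assumption). For a linear model one DeepFool step is exact, so each seed is pushed just past its nearest decision boundary and the new label is recorded; a suspect is scored by how many of these fragile labels it reproduces.

```python
import numpy as np

def deepfool_linear(W, b, x, overshoot=0.02):
    """One exact DeepFool step for f(x) = W @ x + b: cross the nearest boundary."""
    f = W @ x + b
    k = int(np.argmax(f))
    best_r, best_dist = None, np.inf
    for j in range(len(f)):
        if j == k:
            continue
        w_diff = W[j] - W[k]
        f_diff = f[j] - f[k]               # <= 0 because k is the predicted class
        dist = abs(f_diff) / np.linalg.norm(w_diff)
        if dist < best_dist:
            best_dist = dist
            best_r = abs(f_diff) / np.dot(w_diff, w_diff) * w_diff
    return x + (1 + overshoot) * best_r

def make_fingerprint(W, b, seeds):
    points = np.array([deepfool_linear(W, b, x) for x in seeds])
    labels = np.argmax(points @ W.T + b, axis=1)
    return points, labels

def match_rate(W, b, points, labels):
    return float(np.mean(np.argmax(points @ W.T + b, axis=1) == labels))

rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 6)), rng.normal(size=4)
seeds = rng.normal(size=(100, 6))
points, labels = make_fingerprint(W, b, seeds)

W_copy = W + 0.01 * rng.normal(size=W.shape)  # perturbed copy standing in for a fine-tuned one
W_other = rng.normal(size=W.shape)            # independent model
for name, Ws in [("victim", W), ("copy", W_copy), ("other", W_other)]:
    print(name, match_rate(Ws, b, points, labels))
```

The small overshoot is the design knob: larger values make the fingerprint survive more modifications but also make unrelated models agree more often.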
Sensitivity at random points of the train set
SSF [Sensitive-Sample Fingerprinting of Deep Neural Networks]
[ModelGiF: Gradient Fields for Model Functional Distance]
Explainability-inspired
:red_flag: [A Zest of LIME: Towards Architecture-Independent Model Distances] seems to be a strong approach
Boolean explainer, global-scale comparison of models [What Changed? Interpretable Model Comparison]
Classifier-based
A classifier is trained on fingerprints of m and m' and predicts if m' is suspicious or not.
[MetaV: A Meta-Verifier Approach to Task-Agnostic Model Fingerprinting]
Negative fingerprinting
Tolstoy’s Anna Karenina principle states that “All happy families are alike; each unhappy family is unhappy in its own way” = there are many ways to be wrong and only one way to be right = two models that make the same mistakes are suspicious.
However, this principle can fail: for confusable classes such as tigers vs. lions or chairs vs. stools, errors would probably be very similar across different models. So this principle is dataset-dependent.
AKH baseline [Queries, Representation & Detection: The Next 100 Model Fingerprinting Schemes]
[Are You Stealing My Model? Sample Correlation for Fingerprinting Deep Neural Networks]
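A minimal sketch of a negative-fingerprint score, assuming we have both models' predictions and the ground truth on a shared dataset: how much the error sets overlap, and how often shared errors pick the same wrong label. The two statistics are illustrative choices, not the AKH baseline's exact definition.

```python
import numpy as np

def error_agreement(pred_a, pred_b, y):
    """How alike are two models' mistakes?
    Returns (Jaccard overlap of error sets, same-wrong-label rate on shared errors)."""
    pred_a, pred_b, y = map(np.asarray, (pred_a, pred_b, y))
    err_a, err_b = pred_a != y, pred_b != y
    union = np.sum(err_a | err_b)
    both = err_a & err_b
    jaccard = np.sum(both) / union if union else 0.0
    same_wrong = np.mean(pred_a[both] == pred_b[both]) if both.any() else 0.0
    return float(jaccard), float(same_wrong)

y      = [0, 1, 2, 3, 0, 1]
pred_a = [0, 2, 2, 0, 1, 1]
pred_b = [0, 2, 1, 0, 0, 1]
print(error_agreement(pred_a, pred_b, y))
```

A high same-wrong-label rate is the suspicious signal; per the caveat above, it should be calibrated against pairs of independently trained models on the same dataset.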
Challenges
- Pointwise behavior
- Query set construction: a choice has to be made
Upsides
-
Task agnostic
Methods to compare two models, even if the task is not the same
[MetaV: A Meta-Verifier Approach to Task-Agnostic Model Fingerprinting]
[ModelDiff: Testing-Based DNN Similarity Comparison for Model Reuse Detection]
Weight based fingerprint
Symmetries play a role
Goal: know, given weights if the two models are suspiciously related
Why it is interesting: it does not rely on a dataset, so it avoids having to choose one
Challenges
- Symmetries (architecture, weight, instance)
- Black-box access -> KO (weights are required)
- Model heterogeneity
- Task/architecture difference
[ModelDiff: Testing-Based DNN Similarity Comparison for Model Reuse Detection]
[A Zest of LIME: Towards Architecture-Independent Model Distances] lists limitations and builds the case against comparing model weights
Upsides
- More complete approach: no query bias
Connection with (linear) mode connectivity
There seems to be a way to connect local minima of the loss landscape through low-loss paths: mode connectivity [Taxonomizing local versus global structure in neural network loss landscapes]
Once permutation symmetries are accounted for with a heuristic, these paths become linear: linear mode connectivity [Git Re-Basin: Merging Models modulo Permutation Symmetries]
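A minimal sketch of weight matching for a one-hidden-layer ReLU MLP, in the spirit of Git Re-Basin but restricted to a single layer (assumption): hidden units of B are permuted to best match A by solving a linear assignment problem. A permuted copy computes the same function, looks far away in weight space, and collapses to distance 0 after alignment. Requires SciPy.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def mlp(params, x):
    W1, b1, W2 = params
    return W2 @ np.maximum(W1 @ x + b1, 0.0)

def align_hidden_units(A, B):
    """Permute the hidden units of MLP B = (W1, b1, W2) to best match MLP A."""
    W1a, b1a, W2a = A
    W1b, b1b, W2b = B
    # similarity of unit i of A and unit j of B: incoming weights, bias, outgoing weights
    sim = W1a @ W1b.T + np.outer(b1a, b1b) + W2a.T @ W2b
    _, perm = linear_sum_assignment(sim, maximize=True)
    return W1b[perm], b1b[perm], W2b[:, perm]

def weight_distance(A, B):
    return float(np.sqrt(sum(np.sum((a - b) ** 2) for a, b in zip(A, B))))

rng = np.random.default_rng(0)
A = (rng.normal(size=(8, 4)), rng.normal(size=8), rng.normal(size=(3, 8)))
p = rng.permutation(8)
B = (A[0][p], A[1][p], A[2][:, p])      # same function, permuted hidden units

x = rng.normal(size=4)
print(np.allclose(mlp(A, x), mlp(B, x)))             # functionally identical
print(weight_distance(A, B))                          # yet far apart in weight space
print(weight_distance(A, align_hidden_units(A, B)))   # distance after alignment
```

For weight-based fingerprinting this is the first symmetry to quotient out; other symmetries (e.g. ReLU positive rescaling) are not handled here.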
:bulb: Functional mode connectivity
If we push the idea further and quotient out all symmetries, it should further enhance the comparability between models (e.g., how many "real functional modes" are there after a training procedure, marking qualitative differences between the implemented functions?)
Threat model: a diverse problem setup
The threat model explains the framework involving the attacker and user/owner/defender and what information is available to each.
Formally, the model owner designs and trains a deep neural network model m(x, θ) = y. Given a suspect model m′(x, θ′) = y′ with similar prediction performance on the same task, the problem is to verify whether it is a plagiarized model stolen from the model owner.
[AFA: Adversarial fingerprinting authentication for deep neural networks] directly tackles this problem
Model stealing
Leak, hack, copy, as in the Miqu release
In this case, it is a cybersecurity/human responsibility
Model extraction
Can be probability-based or label-based. Model extraction is non-consenting model distillation, and it is harder, e.g. because intermediate features cannot be used. Vanilla model extraction uses a dataset labeled by the black-box victim model, while adversarial model extraction is more sophisticated and seeks to maximize the information gained about the target model, for example to reduce the number of queries.
Nomenclature of model extraction; equivalence-class extraction for 2-layer ReLU networks [High Accuracy and High Fidelity Extraction of Neural Networks]
Without a query dataset, using a GAN [Data-Free Model Extraction]
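A toy sketch of probability-based extraction, assuming the victim is a linear softmax model (a much easier case than the ReLU networks studied in the paper). Log-probability differences are exactly linear in the input, so least squares recovers the model up to its symmetry (a shared shift of all logits), i.e. a member of the same equivalence class with perfect fidelity.

```python
import numpy as np

def victim_probs(x, W, b):
    z = x @ W.T + b
    z -= z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def extract_linear_softmax(query_fn, d, n_queries, rng):
    """Recover a linear softmax model up to a shared shift of all logits."""
    X = rng.normal(size=(n_queries, d))
    logp = np.log(query_fn(X))
    target = logp - logp[:, :1]                   # logit differences w.r.t. class 0
    X1 = np.hstack([X, np.ones((n_queries, 1))])  # extra column for the bias
    coef, *_ = np.linalg.lstsq(X1, target, rcond=None)
    return coef[:-1].T, coef[-1]                  # W_hat (k, d), b_hat (k,)

rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 5)), rng.normal(size=3)   # hidden victim parameters
query = lambda X: victim_probs(X, W, b)              # black-box API returning probabilities
W_hat, b_hat = extract_linear_softmax(query, d=5, n_queries=50, rng=rng)

X_test = rng.normal(size=(1000, 5))
print(np.max(np.abs(victim_probs(X_test, W_hat, b_hat) - query(X_test))))  # fidelity gap
print(np.allclose(W_hat, W - W[0]))   # recovered up to the shared-shift symmetry
```

With label-only access this trick no longer works, which is one reason label-based extraction is harder.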
[Deep Neural Network Fingerprinting by Conferrable Adversarial Examples] is a fingerprinting method robust against extraction
Defender information
On what grounds must the defender decide whether the suspected model is a case of plagiarism?
- Architecture and parameters
- API access only
Benchmarks
A benchmark stores positive and negative model pairs (m, m'), the goal being to discriminate between them.
- Model Reuse [ModelDiff: Testing-Based DNN Similarity Comparison for Model Reuse Detection]
- SAC Bench: might lack diversity and be tailored to the method introduced in the same paper [Are You Stealing My Model? Sample Correlation for Fingerprinting Deep Neural Networks]
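One common way to score a method on such a benchmark (an assumption; each benchmark may define its own metric) is ROC AUC over pairs: the probability that a random positive pair (m, m') receives a higher suspicion score than a random negative pair. A minimal sketch:

```python
def pair_auc(pos_scores, neg_scores):
    """ROC AUC = P[positive pair scores higher than negative pair], ties count 1/2.
    Higher score = 'more likely stolen'."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))

# hypothetical fingerprint similarity scores from some method
print(pair_auc([0.9, 0.8], [0.1, 0.2]))   # perfect separation
print(pair_auc([0.3, 0.9], [0.5]))        # one positive pair ranked too low
```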
Ancillary branches
:<3:Model comparison, (interpretable) distances between models
- Parameter-based (but what if the architectures are not the same?)
- Dataset-based
Understand how two models differ from the weights and architecture
[Dynamic Interpretability for Model Comparison via Decision Rules]
Rough A/B comparison using a boolean explainer as an interpretable surrogate model [What Changed? Interpretable Model Comparison]
:bulb: Find a meaningful way to compare small ReLU networks from their parameter space, across different numbers of layers and widths
Classification boundary and adversarial examples
Many methods use these concepts because they are highly sensitive to model parameters.
:grey_question: Do a model m and a reparametrization m' yield the same adversarial examples?
:bulb: Leverage classification boundary multipoints: there should be points that sit at the intersection of more than 2 classes (D+1 classes in dimension D?). Are these points particularly informative?
:bulb: Classification boundary randomization
Design an attacker technique that randomizes the decision boundary, using the concept of dataset-dependent symmetry. This would mess up adversarial fingerprinting schemes such as:
- [Fingerprinting Deep Neural Networks - a DeepFool Approach]
- [Deep Neural Network Fingerprinting by Conferrable Adversarial Examples]
- [IPGuard: Protecting Intellectual Property of Deep Neural Networks via Fingerprinting the Classification Boundary]
Also, classification boundary smoothing could help against adversarial examples.
Adversarial attacks transferability
This branch studies how adversarial examples crafted on one model can work on other (more or less related) models. It matters because if a model m (from a company or an administration) is served online and attackers have run reconnaissance queries on it to build a white-box surrogate, they can generate adversarial examples on the surrogate that might transfer to the base model m.
[Deep Neural Network Fingerprinting by Conferrable Adversarial Examples]
Unlearning verification
Remove the effect of some points of the training set. This is the case when a user exercises their right to be forgotten.
[A Zest of LIME: Towards Architecture-Independent Model Distances]
:bulb:Ideas
Backdoors
Backdoor installation
Easier scenario for the attacker: the attacker provides m, and the user must not be able to tell whether there is a backdoor. Additionally, the backdoor must survive any action taken by the user, such as fine-tuning, pruning, fine-pruning or distillation (this last one seems crazy hard)
[Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks]
Hacking - how to hide a backdoor
Find a way to place a backdoor using the dataset-dependent symmetry concept:

Here D is a dataset used as a validation set by a user. The goal of the attack is to have a strong effect on some points x while making sure the tampering of the network goes unnoticed on any validation dataset. This requires understanding how to manipulate parameter space so that a function behaves normally almost everywhere except on some specific inputs.
:bulb: Freeze almost everything and finetune
Select the weights whose modification has (almost) the least effect on the function. Then install the backdoor by fine-tuning only those weights, freezing all the others.
Most general case for the attacker: replace m_1 with m_2, which is functionally equivalent to m_1 on any dataset except for a backdoor
If the user (A) has access to both sets of weights, the problem is trivial; (B) if they only have API access, it seems impossible, even though [Sensitive-Sample Fingerprinting of Deep Neural Networks] argues it has solved the problem.
:bulb: Show that (B) is impossible by showing that one can run a pre-classifier that chooses between the original model and the corrupted model, and that, if it is well designed, it is impossible to make the API tap into the corrupted model. This is all about hiding and picking up a cue.
Theorem 1 in [High Accuracy and High Fidelity Extraction of Neural Networks] might help showing that a backdoor can easily be undetectable
Backdoor detection
Detect any backdoor by weight analysis: be able to tell, from the weights alone, whether 2 models behave very differently on a specific part of the input space.
Uncover the culprit: be able to generate samples from the installed backdoor -> for face recognition, this would generate the face of the culprit
Methodology:
- quantify a notion of acceptable distance between 2 models and look for regions with extreme differences
Research questions:
- Given a model, can I know if it contains a backdoor? If yes, can I understand what the backdoor is (generate the face of the culprit)?
- Given two classification models, at least one of them accessible only through API predictions (no logits), can I know whether they are the same model?
Detect if a dataset is poisoned [Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering]
Computing a "class targetability" [Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks]
Distances between models using a dataset
OR
distances between datasets using models
Understand what makes a dataset/model specific: its most striking contrast with another one
Deep learning testing [A Zest of LIME: Towards Architecture-Independent Model Distances]
A class of methods that aim to identify areas of the input space where supposedly similar models disagree on the output.
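The crudest version of this idea, sketched under the assumption that we can sample the input space at random (real testing methods search much more cleverly): sample inputs, keep those where the two models disagree, and report the disagreement rate. The two threshold "models" are toy stand-ins.

```python
import numpy as np

def disagreement_points(model_a, model_b, sampler, n=10_000):
    """Sample the input space and keep the inputs where the two models disagree."""
    X = sampler(n)
    mask = model_a(X) != model_b(X)
    return X[mask], float(mask.mean())

rng = np.random.default_rng(0)
sampler = lambda n: rng.uniform(-1.0, 1.0, size=(n, 1))
model_a = lambda X: (X[:, 0] > 0.0).astype(int)   # toy thresholds standing in for two models
model_b = lambda X: (X[:, 0] > 0.5).astype(int)

points, rate = disagreement_points(model_a, model_b, sampler)
print(rate, points.min(), points.max())   # region where the models disagree
```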
:<3:Cancel all symmetries at the instance (model with weights) level to compare two functions by looking just at the weights
Distinguish 3 levels of symmetries: general(architectural), weight-dependent, data-dependent
:bulb: Public fingerprinting release
The model m implements a function f.
m produces c based on complete information (weight, architecture...).
Knowing c does not compromise m, nor does it help extract m.
There is a check(f, c) function that checks if c and f are compatible.
Then every m can release its c to show it is honest.
Hard parts: 1) designing c so that it has identification power, is robust and does not disclose too much; 2) designing the check(f, c) function