Adversarial Attacks and Defenses on Deep Reinforcement Learning
Attacks on DRL
Target the Policy
Involving an Adversarial Agent
Adversarial Policies: Attacking Deep Reinforcement Learning (2019)
Overview
The adversary controls an adversarial agent acting in the same environment as the legitimate agent
The adversary cannot directly manipulate the observations of the legitimate agent, but can create natural observations that act as adversarial inputs and drive the agent toward the adversary's desired policy
This leads to a zero-sum game between the adversarial agent and the legitimate agent
Perturbing the States
Adversarial Attacks on Neural Network Policies (2017)
Overview
Show the effect of adversarial attacks on neural network policies in DRL
Use the FGSM attack to introduce perturbations into the raw inputs (observations) of the DRL policy (see the sketch below)
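A minimal sketch of what such an observation-level FGSM attack could look like; `policy_net`, the loss choice, and `epsilon` are illustrative assumptions, not the paper's exact setup:

```python
import torch
import torch.nn.functional as F

def fgsm_observation(policy_net, obs, epsilon=0.01):
    # obs: batched observation tensor whose shape matches policy_net's input.
    obs = obs.clone().detach().requires_grad_(True)
    logits = policy_net(obs)
    # Use the policy's own greedy action as the label and increase its loss,
    # pushing the observation toward a different (worse) action.
    loss = F.cross_entropy(logits, logits.argmax(dim=-1))
    loss.backward()
    return (obs + epsilon * obs.grad.sign()).detach()
```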
Tactics of Adversarial Attack on Deep Reinforcement Learning Agents (2017)
Overview
Propose two adversarial attack techniques on DRL schemes, namely, the strategically-timed attack and the enchanting attack
Method
Strategically-timed attack: minimize the reward of the DRL agent by applying adversarial examples at only a subset of time steps in an episode
Enchanting attack: lure the DRL agent toward a predefined target state by combining a generative model with a planning algorithm
Advantage
Perturbing only 25% of the inputs with the proposed method produces the same results as previously proposed FGSM-based attacks that perturb every input (see the sketch below)
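A hedged sketch of the timing criterion behind a strategically-timed attack: perturb only when the policy strongly prefers one action. Function names and the threshold are assumptions rather than the authors' exact formulation:

```python
import torch
import torch.nn.functional as F

def should_attack(policy_net, obs, threshold=0.8):
    # Attack only at "critical" steps, where the policy strongly prefers
    # one action over the others.
    with torch.no_grad():
        probs = F.softmax(policy_net(obs), dim=-1)
    preference_gap = (probs.max() - probs.min()).item()
    return preference_gap > threshold
```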
Delving into adversarial attacks on deep policies (2017)
Overview
Test the effects of adversarial examples and random noise on the DRL policies
Argue that the FGSM-based adversarial examples perform better than random noise
Use the value function to guide when adversarial perturbations are injected, which reduces the number of perturbations needed to cause a malfunction in DRL policies (a minimal sketch of value-guided injection follows the method list)
Method
(1) The addition of noise at a fixed frequency
(2) The addition of specially designed perturbed inputs after every N samples
(3) The recalculation of the perturbation after every N samples, applying the previously calculated perturbation at the intermediate steps
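A minimal sketch of value-guided perturbation injection, assuming a separate value network `value_net` and FGSM as the perturbation generator; the threshold and names are illustrative only:

```python
import torch
import torch.nn.functional as F

def value_guided_inject(value_net, policy_net, obs, epsilon=0.01,
                        value_threshold=1.0):
    # Only craft a perturbation when the critic rates the state as valuable,
    # so far fewer frames need to be perturbed overall.
    with torch.no_grad():
        if value_net(obs).item() < value_threshold:
            return obs  # low-value state: leave it untouched
    obs_adv = obs.clone().detach().requires_grad_(True)
    logits = policy_net(obs_adv)
    loss = F.cross_entropy(logits, logits.argmax(dim=-1))
    loss.backward()
    return (obs_adv + epsilon * obs_adv.grad.sign()).detach()
```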
Sequential Attacks on Agents for Long-Term Adversarial Goals (2018)
Overview
Use an adversarial transformer network (ATN) to impose an adversarial reward on the policy network of the DRL agent
The ATN makes the agent maximize the adversarial reward through a sequence of adversarial inputs
TrojDRL: Trojan Attacks on Deep Reinforcement Learning Agents (2019)
Overview
Show the vulnerability of DRL models to Trojan attacks when the adversary has access to the training phase of the model
Advantage
By modifying only 0.025% of the training data, an adversary can embed hidden behaviors in the policy such that the model performs perfectly well until the Trojan is triggered
The proposed attack is shown to be resistant against current defense techniques for Trojans
Sequential Triggers for Watermarking of Deep Reinforcement Learning Policies (2019)
Overview
Watermark DRL policies to protect them from model extraction attacks
This involves integrating a unique response to a specific sequence of states while keeping the impact on performance minimal, hence guarding against unauthorized replication of policies
Unwatermarked policies are unable to follow the identifying trajectory specified during training
Advantage
Can be used by adversaries to hide specific patterns in the policy and use them to their benefit later
Perturbing the Environment
CopyCAT: Taking Control of Neural Policies with Constant Attacks (2019)
Overview
Propose two types of adversarial attacks to make a DRL agent follow a desired policy
These attacks are discussed in both the targeted and non-targeted settings
Method
Per-observation attack: craft an adversarial perturbation for every observation of the agent and add that perturbation to the environment
Constant attack: add a single universal perturbation, created once at the start of the attack, to all observations (see the sketch below)
Advantage
The proposed attacks are more successful when FGSM is used to generate the perturbations in untargeted attack settings
In the case of targeted attacks, FGSM is not able to generate imperceptible adversarial samples
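A hedged sketch of how a constant (universal) perturbation could be optimised over a batch of pre-collected observations and then reused on every future frame; the optimiser, budget, and names are assumptions:

```python
import torch
import torch.nn.functional as F

def universal_perturbation(policy_net, observations, target_action,
                           epsilon=0.02, steps=100, lr=0.01):
    # observations: tensor of shape (N, ...) collected beforehand.
    # A single delta is optimised once and then added to every future frame.
    delta = torch.zeros_like(observations[0], requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)
    target = torch.full((observations.shape[0],), target_action, dtype=torch.long)
    for _ in range(steps):
        optimizer.zero_grad()
        logits = policy_net(observations + delta)
        loss = F.cross_entropy(logits, target)  # make the target action likely
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            delta.clamp_(-epsilon, epsilon)     # keep the perturbation small
    return delta.detach()
```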
Model Extraction
Adversarial Exploitation of Policy Imitation (2019)
Overview
Perform a model extraction attack by using imitation learning while querying the original model iteratively
Adversarial examples generated for the extracted model transfer successfully to the original model, degrading its performance in a black-box setting
Use FGSM to generate adversarial examples for the imitated model (see the sketch below)
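A rough sketch of the extract-then-attack pipeline under strong assumptions: the victim is queried for actions only, the student is trained by behavioural cloning over a Gymnasium-style environment, and FGSM examples crafted on the student are expected to transfer:

```python
import torch
import torch.nn.functional as F

def extract_policy(victim_policy, student_net, env, episodes=50, lr=1e-3):
    # Behavioural cloning: query the black-box victim for actions and fit
    # the white-box student to reproduce them.
    optimizer = torch.optim.Adam(student_net.parameters(), lr=lr)
    for _ in range(episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            obs_t = torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0)
            action = victim_policy(obs_t)              # query only, no gradients
            loss = F.cross_entropy(student_net(obs_t), torch.tensor([action]))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            obs, _, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
    return student_net

def transfer_fgsm(student_net, obs_t, epsilon=0.01):
    # FGSM on the imitated (white-box) model; by transferability the example
    # is expected to also degrade the original black-box policy.
    obs_t = obs_t.clone().detach().requires_grad_(True)
    logits = student_net(obs_t)
    loss = F.cross_entropy(logits, logits.argmax(dim=-1))
    loss.backward()
    return (obs_t + epsilon * obs_t.grad.sign()).detach()
```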
Target the Observations
Perturbing the States
Vulnerability of Deep Reinforcement Learning to Policy Induction Attacks (2017)
Overview
Show that the DQN is vulnerable to adversarial attacks and verify the transferability of adversarial examples across different DQN models
The attack procedure is divided into two phases, initialization and exploitation
Propose an attack method to manipulate the policy of the DQN by exploiting the transferability of adversarial samples
Method
Initialization phase: train a DQN on an adversarial reward function to generate an adversarial policy
Exploitation phase: generate adversarial inputs so that the target DQN can be made to follow actions governed by the adversarial policy
Advantage
Use a black-box setting and show a success rate of 70% when adversarial examples are transferred from one model to another
A Malicious Attack on the Machine Learning Policy of a Robotic System (2018)
Overview
Evaluate a white-box adversarial attack on the DRL policy of an autonomous robot in a dynamic environment
The adversary generates false routes by tampering with the sensory data sent to the robot, making the robot see what the adversary desires
Target the Reward
Perturbing the States
Robust Deep Reinforcement Learning with Adversarial Attacks (2017)
Overview
Propose three types of gradient-based adversarial attacks on DQN and DDPG for reducing the expected reward by adding perturbations to the observations
Advantage
The proposed attacks outperform the plain FGSM attack in degrading the performance of DRL schemes
Method
First attack: a naive approach that adds random noise to the DRL states to mislead the DRL agent into selecting a sub-optimal action, degrading the performance of the DRL scheme
Second attack: a gradient-based (GB) attack that introduces a novel cost function for creating adversarial actions and outperforms FGSM at finding the worst possible discrete action to limit the performance of DRL schemes
Third attack: an improved version of the second attack; instead of a simple gradient-based approach, stochastic gradient descent (SGD) is used to generate the adversarial perturbation, ultimately misleading the DRL agent into a pre-defined adversarial state (see the sketch below)
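A hedged sketch of a gradient-based attack toward the agent's worst action, using SGD on the perturbation as in the third attack; the cost function, budget, and step count are assumptions rather than the authors' exact formulation:

```python
import torch
import torch.nn.functional as F

def worst_action_perturbation(q_net, obs, epsilon=0.01, steps=20, lr=0.005):
    with torch.no_grad():
        worst_action = q_net(obs).argmin(dim=-1)  # lowest-value action
    delta = torch.zeros_like(obs, requires_grad=True)
    optimizer = torch.optim.SGD([delta], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        logits = q_net(obs + delta)
        # Minimising cross-entropy toward the worst action raises its rank.
        loss = F.cross_entropy(logits, worst_action)
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            delta.clamp_(-epsilon, epsilon)  # stay within the attack budget
    return (obs + delta).detach()
```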
Perturbing the Action Space
Spatiotemporally Constrained Action Space Attacks on Deep Reinforcement Learning Agents (2019)
Overview
Propose two attacks on the action space of the DRL algorithms
Method
First attack: an optimization problem with decoupled constraints that minimizes the cumulative reward of the DRL agent, called the myopic action space (MAS) attack (a minimal sketch follows this entry)
Second attack: the same objective but with temporally coupled constraints, called the look-ahead action space (LAS) attack; it is more damaging to the DRL algorithm's performance because it can attack the dynamic information of the agent
Advantage
Perform well in the case of limited resources
Can be used to gain insights into the potential vulnerabilities of the DRL model
Cannot be defended as the action space is independent of the policy
Disadvantage
Can be detected by monitoring the decay in the reward
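A minimal sketch of a myopic (MAS-style) action-space perturbation: nudge a continuous action against the return gradient and project the perturbation back onto a per-step budget. The gradient source, step size, and budget are assumptions:

```python
import numpy as np

def mas_perturb_action(action, grad_wrt_action, budget=0.1, step_size=0.05):
    # Move the continuous action against the gradient of the agent's
    # expected return (supplied by the attacker), then project the
    # perturbation back onto an L2 ball of radius `budget`.
    delta = -step_size * np.asarray(grad_wrt_action)
    norm = np.linalg.norm(delta)
    if norm > budget:
        delta *= budget / norm
    return np.asarray(action) + delta
```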
Reward Falsification
Reinforcement Learning for Autonomous Defense in Software-Defined Networking (2018)
Overview
Discuss the reaction of the DRL agent in software-defined networking to different adversarial attacks
Method
Flipping reward signals: the adversary manipulates the reward signal of the model by flipping it a limited number of times (see the sketch below)
Manipulating states: the attacker makes two changes in the first few steps of training, i.e., the adversary can insert one false positive and one false negative into the states
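A minimal sketch of reward-signal flipping under an attack budget; the flip probability is an assumption:

```python
import random

def flipped_reward(reward, flips_remaining, flip_probability=0.1):
    # With some probability, and while the attack budget lasts, hand the
    # learner the negated reward instead of the true one.
    if flips_remaining > 0 and random.random() < flip_probability:
        return -reward, flips_remaining - 1
    return reward, flips_remaining
```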
Deceptive Reinforcement Learning Under Adversarial Manipulations on Cost Signals (2019)
Overview
Discuss the effect of malicious falsification of the reward (cost) signal on the agent, leading it to take decisions targeted by the adversary
Characterize a robust region for the policy: as long as the falsified cost stays within this region, the adversary can never achieve its desired policy
Use four terms to specify different types of attackers. All of these attackers can mislead the agent into learning a policy desired by the adversary.
Method
(1) Omniscient attacker: has all the information before a certain time t
(2) Peer attacker: does not know the transition probabilities but has access to the knowledge the agent has before time t
(3) Ignorant attacker: knows only the cost signals before time t
(4) Blind attacker: has no information at time t
Target the Environment
Gradient Band-based Adversarial Training for Generalized Attack Immunity of A3C Path Finding (2018)
Overview
Propose a common dominant adversarial example generation (CDG) method for crafting adversarial examples with high confidence for the environment of DRL
The core idea of the attack is to add confusing obstacles to the original clean map, confusing the robot by disturbing its local information
Adversarial Examples Construction Towards White-Box Q Table Variation in DQN Pathfinding Training (2018)
Overview
Propose a method of finding adversarial examples for DQNs trained for automatic pathfinding
The attack first lets a DQN learn to solve the pathfinding problem and then analyzes the trained model
Based on this analysis, weaknesses in the Q-value curves are identified
The attack involves the addition of adversarial examples generated from these weaknesses to the environment
Characterizing Attacks on Deep Reinforcement Learning (2019)
Overview
Introduce online sequential attacks on the environment of the DRL agent by exploiting the temporal consistency of the states
Provide two attack methods, namely the adaptive dimension sampling-based finite difference method (SFD) and the optimal frame selection method (a finite-difference sketch follows this entry)
Advantage
Faster than the FGSM algorithm as no back-propagation is needed
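A hedged sketch of a dimension-sampling finite-difference gradient estimate, the kind of backprop-free estimator the SFD idea relies on; the loss interface and sample size are assumptions:

```python
import numpy as np

def sampled_finite_difference_grad(loss_fn, obs, num_dims=32, h=1e-3):
    # Probe only a random subset of coordinates, estimating the gradient
    # with forward passes alone (no back-propagation through the policy).
    grad = np.zeros_like(obs, dtype=np.float64)
    flat_grad = grad.reshape(-1)
    flat_obs = obs.reshape(-1)
    dims = np.random.choice(flat_obs.size, size=min(num_dims, flat_obs.size),
                            replace=False)
    base = loss_fn(obs)
    for d in dims:
        probe = flat_obs.copy()
        probe[d] += h
        flat_grad[d] = (loss_fn(probe.reshape(obs.shape)) - base) / h
    return grad
```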
Defenses for DRL
Adversarial Detection
Detecting Adversarial Attacks on Neural Network Policies with Visual Foresight (2017)
Overview
Propose a method of protecting the DRL algorithms from adversarial attacks by leveraging an action-conditioned frame prediction module
Detect the presence of adversarial attacks and make the model robust by using the predicted frame instead of the adversarial frame (see the sketch below)
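A hedged sketch of the detect-and-substitute logic, assuming an action-conditioned `frame_predictor` and a simple distance between the policy's action distributions; all names and the threshold are illustrative:

```python
import torch

def foresight_defended_action(policy_net, frame_predictor, prev_frames,
                              prev_action, current_frame, threshold=0.5):
    with torch.no_grad():
        predicted_frame = frame_predictor(prev_frames, prev_action)
        p_obs = torch.softmax(policy_net(current_frame), dim=-1)
        p_pred = torch.softmax(policy_net(predicted_frame), dim=-1)
        # Total-variation-style distance between the two action distributions.
        divergence = 0.5 * (p_obs - p_pred).abs().sum().item()
    if divergence > threshold:
        return p_pred.argmax(dim=-1)  # frame likely attacked: act on the prediction
    return p_obs.argmax(dim=-1)       # frame looks clean: act on the observation
```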
Online Robust Policy Learning in the Presence of Unknown Adversaries (2018)
Overview
Introduce a technique for making online learning algorithms robust to adversarial attacks
Detect the presence of adversarial attacks via a supervisory agent by learning separate sub-policies using the Meta-learned Advantage Hierarchy (MLAH) framework
A PCA-Based Model to Predict Adversarial Examples on Q-Learning of Path Finding (2018)
Overview
Propose an advanced Q-learning algorithm for automatic path-finding in robots that is robust to adversarial attacks because it detects adversarial inputs
Propose a model to predict the adversarial inputs based on a calculation determined by 5 factors:
energy point gravitation, key point gravitation, path gravitation, included angle, and the placid point
The weights for these 5 factors are calculated using principal component analysis (PCA)
Reinforcement Learning with Perturbed Rewards (2018)
Overview
Propose a reward confusion matrix to generate surrogate rewards that help the RL agent learn in cases of perturbed/noisy reward signals (see the sketch below)
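A minimal sketch of turning a known reward confusion matrix into unbiased surrogate reward values; estimating the matrix itself is out of scope, and the interface is an assumption:

```python
import numpy as np

def surrogate_rewards(confusion_matrix, reward_values):
    # confusion_matrix[i, j]: probability that true reward level i is
    # observed as level j; reward_values[i]: reward of level i.
    # Solve C @ r_hat = r so that the surrogate reward assigned to the
    # observed level is unbiased: E[r_hat | true level i] == reward_values[i].
    C = np.asarray(confusion_matrix, dtype=float)
    r = np.asarray(reward_values, dtype=float)
    return np.linalg.solve(C, r)

# Example: binary rewards {-1, +1}, each flipped with probability e = 0.2.
e = 0.2
r_hat = surrogate_rewards([[1 - e, e], [e, 1 - e]], [-1.0, 1.0])
```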
Reinforcement Learning under Threats (2018)
Overview
Introduce threatened Markov decision processes (TMDPs), a variant of MDP
Support the decision-making process in the DRL setting against adversaries that affect the reward-generating process
Defensive Distillation
Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks (2016)
Overview
Propose the idea of using defensive distillation to deal with adversarial attacks on ML schemes
Defensive Distillation is Not Robust to Adversarial Examples (2016)
Overview
Show that defensive distillation gives a false sense of robustness against adversarial examples
Policy Distillation (2016)
Overview
Present a method of extracting the policy of a dense network to train another, comparatively smaller network (see the sketch below)
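A hedged sketch of a policy-distillation loss, reading the method as KL divergence between the teacher's temperature-softened Q distribution and the student's policy; tensor shapes and the temperature are assumptions:

```python
import torch
import torch.nn.functional as F

def policy_distillation_loss(teacher_q, student_logits, temperature=0.01):
    # teacher_q, student_logits: tensors of shape (batch, num_actions).
    teacher_probs = F.softmax(teacher_q / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    # KL(teacher || student), averaged over the batch.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
```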
Distilling Policy Distillation (2019)
Overview
Propose expected entropy regularized distillation which makes the training much faster while guaranteeing convergence
Robust Learning
Mitigation of Policy Manipulation Attacks on Deep Q-Networks with Parameter-Space Noise (2018)
Overview
Propose adding noise to the parameter space of the network while training (see the sketch below)
Use FGSM for crafting adversarial samples
Show that the performance of normally trained agents deteriorates to almost nothing under attack, while agents retrained with parameter-space noise perform well even in the presence of adversarial inputs
Advantage
Very effective in mitigating the effects of both training-time and test-time attacks in both black-box and white-box settings
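A minimal sketch of parameter-space noise: perturb a copy of the network's weights with Gaussian noise before collecting a rollout. A fixed noise scale is assumed; the adaptive scaling used in practice is omitted:

```python
import copy
import torch

def perturb_parameters(policy_net, stddev=0.05):
    # Perturb a copy of the network's weights with zero-mean Gaussian noise
    # before collecting the next rollout; the original network is untouched.
    noisy_net = copy.deepcopy(policy_net)
    with torch.no_grad():
        for param in noisy_net.parameters():
            param.add_(torch.randn_like(param) * stddev)
    return noisy_net
```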
Adversarially Robust Policy Learning: Active Construction of Physically-Plausible Perturbations (2017)
Overview
Show superior resilience to adversarial attacks by introducing an adversarially robust policy learning (ARPL) algorithm
Involve the use of adversarial examples during training to enable robust policy learning
Disadvantage
An agent trained with the ARPL algorithm does not perform as well as a normally trained agent when no perturbations are present
Robust Adversarial Reinforcement Learning (2017)
Overview
Propose robust adversarial reinforcement learning (RARL) as a method of robust policy learning in the presence of an adversary
Formulate policy learning as a zero-sum minimax objective function to ensure robustness to differences between train and test conditions, even in the presence of an adversary
Wasserstein Robust Reinforcement Learning (2019)
Overview
Propose robust reinforcement learning using a novel min-max game with a Wasserstein constraint, together with a correct and convergent solver
Advantage
Show a significant increase in robustness in the case of both low and high-dimensional control tasks
Distributionally Robust Reinforcement Learning (2019)
Overview
Propose a distributionally robust policy iteration scheme that keeps the agent from learning a sub-optimal policy while exploring in high-dimensional state/action spaces
The scheme is based on robust Bellman operators, which provide a lower-bound guarantee on the policy/state values
Present a distributionally robust soft actor-critic based on mixed exploration, acting conservatively in the short term and exploring optimistically in the long run, leading to an optimal policy
Action Robust Reinforcement Learning and Applications in Continuous Control (2019)
Overview
Propose the probabilistic action robust MDP (PR-MDP) and the noisy action robust MDP (NR-MDP) as two new criteria for robustness
Enhancing performance of reinforcement learning models in the presence of noisy rewards (2019)
Overview
Present a technique to make the DRL algorithm learn in the presence of noisy rewards
The proposed scheme uses a noise filter built on a non-linear approximator to filter out the noise and estimate the true reward
Game theoretic approach
On the robustness of learning in games with stochastically perturbed payoff observations (2014)
Overview
Examine a game approach where the players adjust their actions based on past payoff observations that are subject to adversarial perturbations
Minimax Iterative Dynamic Game: Application to Nonlinear Robot Control Tasks (2017)
Overview
Propose an iterative minimax dynamic game framework that helps in designing robust policies in the presence of adversarial inputs
Propose a method of quantifying the robustness capacity of a policy
Adversarial Training
Delving into adversarial attacks on deep policies (2017)
Overview
Retrain their agent on perturbations generated using FGSM and random noise
Robust Deep Reinforcement Learning with Adversarial Attacks (2017)
Overview
Train the DRL model using the adversarial samples generated from the gradient-based attacks
Show that the addition of noise to the training samples while training the model also increases the resilience of the DRL models against adversarial attacks
Advantage
Helps the algorithm model uncertainties in the system, making it robust to similar adversarial attacks (see the sketch below)
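A rough sketch of an adversarial training step mixing clean and FGSM-perturbed observations into a value-regression update; the split ratio, epsilon, and loss are assumptions, and `target_q` is assumed to hold precomputed TD targets:

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(q_net, optimizer, obs, target_q,
                              epsilon=0.01, adv_fraction=0.5):
    # Replace a fraction of the batch with FGSM-perturbed observations, then
    # run the usual regression toward the precomputed targets.
    num_adv = int(adv_fraction * obs.shape[0])
    if num_adv > 0:
        adv = obs[:num_adv].clone().detach().requires_grad_(True)
        F.mse_loss(q_net(adv), target_q[:num_adv]).backward()
        obs = obs.clone()
        obs[:num_adv] = (adv + epsilon * adv.grad.sign()).detach()
    optimizer.zero_grad()
    loss = F.mse_loss(q_net(obs), target_q)
    loss.backward()
    optimizer.step()
    return loss.item()
```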
Reinforcement Learning for Autonomous Defense in Software-Defined Networking (2018)
Overview
Propose adversarial training as a method of robustifying the DRL algorithms against adversarial attacks
Whatever Does Not Kill Deep Reinforcement Learning, Makes It Stronger (2017)
Overview
Investigate the robustness of the DRL algorithms to both training and test time attacks
Show that under training-time attacks the DQN can adapt its policy and become robust
Show that adversarially trained policies are more robust to test-time attacks
Mitigation of Policy Manipulation Attacks on Deep Q-Networks with Parameter-Space Noise (2018)
Overview
Compare the resilience to adversarial attacks of two DQNs: one based on ε-greedy policy learning and another employing NoisyNets, a parameter-space noise exploration technique
Results show NoisyNets to be more resilient to training-time attacks than the ε-greedy policy
Argue that this resilience of NoisyNets is due to enhanced generalizability and reduced transferability
Gradient Band-based Adversarial Training for Generalized Attack Immunity of A3C Path Finding (2018)
Overview
Propose a gradient-based adversarial training technique
Use adversarial perturbations generated using their proposed attacking algorithm, i.e., CDG, for re-training the RL agent
Analysis and Improvement of Adversarial Training in DQN Agents With Adversarially-Guided Exploration (AGE) (2019)
Overview
Propose adversarially guided exploration (AGE) to address the sample inefficiency of current adversarial training techniques
Based on a modified hybrid of the ε-greedy algorithm and Boltzmann exploration
Compare its efficiency with the ε-greedy and parameter-space noise exploration algorithms and demonstrate its feasibility
Paper: Challenges and Countermeasures for Adversarial Attacks on Deep Reinforcement Learning (2020)