Adversarial Attacks and Defenses on Deep Reinforcement Learning
Attacks on DRL
Target the Policy
Involving an Adversarial Agent
Adversarial Policies: Attacking Deep Reinforcement Learning (2019)
Overview
The adversary controls an adversarial agent acting in the same environment as the legitimate agent
The adversary cannot directly manipulate the observations of the legitimate agent, but can create natural observations that act as adversarial inputs and drive the agent toward the adversary's desired policy
This leads to a zero-sum game between the adversarial agent and the legitimate agent
Perturbing the States
Adversarial Attacks on Neural Network Policies (2017)
Overview
Show the effect of adversarial attacks on neural network policies in DRL
Use the FGSM attack to introduce perturbations into the raw inputs (observations) of the DRL policy (see the sketch below)
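A minimal sketch of what such an observation-level FGSM attack could look like; `policy_net`, the loss choice, and `epsilon` are illustrative assumptions, not the paper's exact setup:

```python
import torch
import torch.nn.functional as F

def fgsm_observation(policy_net, obs, epsilon=0.01):
    # obs: batched observation tensor whose shape matches policy_net's input.
    obs = obs.clone().detach().requires_grad_(True)
    logits = policy_net(obs)
    # Use the policy's own greedy action as the label and increase its loss,
    # pushing the observation toward a different (worse) action.
    loss = F.cross_entropy(logits, logits.argmax(dim=-1))
    loss.backward()
    return (obs + epsilon * obs.grad.sign()).detach()
```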
Tactics of Adversarial Attack on Deep Reinforcement Learning Agents (2017)
Overview
Propose two adversarial attack techniques on DRL schemes, namely, the strategically-timed attack and the enchanting attack
Method
Strategically-timed attack: minimize the reward of the DRL agent by applying adversarial examples at only a subset of time steps in an episode
Enchanting attack: lure the DRL agent toward a predefined target state by combining a generative model with a planning algorithm
Advantage
Perturbing only 25% of the inputs with the proposed method produces the same results as previously proposed FGSM-based attacks that perturb every input (see the sketch below)
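A hedged sketch of the timing criterion behind a strategically-timed attack: perturb only when the policy strongly prefers one action. Function names and the threshold are assumptions rather than the authors' exact formulation:

```python
import torch
import torch.nn.functional as F

def should_attack(policy_net, obs, threshold=0.8):
    # Attack only at "critical" steps, where the policy strongly prefers
    # one action over the others.
    with torch.no_grad():
        probs = F.softmax(policy_net(obs), dim=-1)
    preference_gap = (probs.max() - probs.min()).item()
    return preference_gap > threshold
```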
Delving into adversarial attacks on deep policies (2017)
Overview
Test the effects of adversarial examples and random noise on the DRL policies
Argue that the FGSM-based adversarial examples perform better than random noise
Use the value function to guide when adversarial perturbations are injected, which reduces the number of perturbations needed to cause a malfunction in DRL policies (a minimal sketch of value-guided injection follows the method list)
Method
(1) The addition of noise at a fixed frequency
(2) The addition of specially designed perturbed inputs after every N samples
(3) The recalculation of the perturbation after every N samples, applying the previously calculated perturbation at the intermediate steps
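A minimal sketch of value-guided perturbation injection, assuming a separate value network `value_net` and FGSM as the perturbation generator; the threshold and names are illustrative only:

```python
import torch
import torch.nn.functional as F

def value_guided_inject(value_net, policy_net, obs, epsilon=0.01,
                        value_threshold=1.0):
    # Only craft a perturbation when the critic rates the state as valuable,
    # so far fewer frames need to be perturbed overall.
    with torch.no_grad():
        if value_net(obs).item() < value_threshold:
            return obs  # low-value state: leave it untouched
    obs_adv = obs.clone().detach().requires_grad_(True)
    logits = policy_net(obs_adv)
    loss = F.cross_entropy(logits, logits.argmax(dim=-1))
    loss.backward()
    return (obs_adv + epsilon * obs_adv.grad.sign()).detach()
```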
Sequential Attacks on Agents for Long-Term Adversarial Goals (2018)
Overview
Use an adversarial transformer network (ATN) to impose an adversarial reward on the policy network of the DRL agent
The ATN makes the agent maximize the adversarial reward through a sequence of adversarial inputs
TrojDRL: Trojan Attacks on Deep Reinforcement Learning Agents (2019)
Overview
Show the vulnerability of DRL models to Trojan attacks when the adversary has access to the training phase of the model
Advantage
By modifying only 0.025% of the training data, an adversary can embed hidden behaviors in the policy such that the model performs perfectly well until the Trojan is triggered
The proposed attack is shown to be resistant against current defense techniques for Trojans
Sequential Triggers for Watermarking of Deep Reinforcement Learning Policies (2019)
Overview
Watermark DRL policies to protect them from model extraction attacks
This involves integrating a unique response to a specific sequence of states while keeping the impact on performance minimal, hence guarding against unauthorized replication of policies
Unwatermarked policies are unable to follow the identifying trajectory specified during training
Advantage
Can be used by adversaries to hide specific patterns in the policy and use them to their benefit later
Perturbing the Environment
CopyCAT: Taking Control of Neural Policies with Constant Attacks (2019)
Overview
Propose two types of adversarial attacks to make a DRL agent follow a desired policy
These attacks are discussed in both the targeted and non-targeted settings
Method
Per-observation attack: craft an adversarial perturbation for every observation of the agent and add that perturbation to the environment
Constant attack: add a single universal perturbation, created once at the start of the attack, to all observations (see the sketch below)
Advantage
The proposed attacks are more successful when FGSM is used to generate the perturbations in untargeted attack settings
In the case of targeted attacks, FGSM is not able to generate imperceptible adversarial samples
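A hedged sketch of how a constant (universal) perturbation could be optimised over a batch of pre-collected observations and then reused on every future frame; the optimiser, budget, and names are assumptions:

```python
import torch
import torch.nn.functional as F

def universal_perturbation(policy_net, observations, target_action,
                           epsilon=0.02, steps=100, lr=0.01):
    # observations: tensor of shape (N, ...) collected beforehand.
    # A single delta is optimised once and then added to every future frame.
    delta = torch.zeros_like(observations[0], requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)
    target = torch.full((observations.shape[0],), target_action, dtype=torch.long)
    for _ in range(steps):
        optimizer.zero_grad()
        logits = policy_net(observations + delta)
        loss = F.cross_entropy(logits, target)  # make the target action likely
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            delta.clamp_(-epsilon, epsilon)     # keep the perturbation small
    return delta.detach()
```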
Model Extraction
Adversarial Exploitation of Policy Imitation (2019)
Overview
Perform a model extraction attack by using imitation learning while querying the original model iteratively
Adversarial examples generated for the extracted model transfer successfully to the original model, degrading its performance in a black-box setting
Use FGSM to generate adversarial examples for the imitated model (see the sketch below)
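A rough sketch of the extract-then-attack pipeline under strong assumptions: the victim is queried for actions only, the student is trained by behavioural cloning over a Gymnasium-style environment, and FGSM examples crafted on the student are expected to transfer:

```python
import torch
import torch.nn.functional as F

def extract_policy(victim_policy, student_net, env, episodes=50, lr=1e-3):
    # Behavioural cloning: query the black-box victim for actions and fit
    # the white-box student to reproduce them.
    optimizer = torch.optim.Adam(student_net.parameters(), lr=lr)
    for _ in range(episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            obs_t = torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0)
            action = victim_policy(obs_t)              # query only, no gradients
            loss = F.cross_entropy(student_net(obs_t), torch.tensor([action]))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            obs, _, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
    return student_net

def transfer_fgsm(student_net, obs_t, epsilon=0.01):
    # FGSM on the imitated (white-box) model; by transferability the example
    # is expected to also degrade the original black-box policy.
    obs_t = obs_t.clone().detach().requires_grad_(True)
    logits = student_net(obs_t)
    loss = F.cross_entropy(logits, logits.argmax(dim=-1))
    loss.backward()
    return (obs_t + epsilon * obs_t.grad.sign()).detach()
```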
Target the Observations
Perturbing the States
Vulnerability of Deep Reinforcement Learning to Policy Induction Attacks (2017)
Overview
Show that the DQN is vulnerable to adversarial attacks and verify the transferability of adversarial examples across different DQN models
The attack procedure is divided into two phases, initialization and exploitation
Propose an attack method to manipulate the policy of the DQN by exploiting the transferability of adversarial samples
Method
Initialization phase: train a DQN on an adversarial reward function to generate an adversarial policy
Exploitation phase: generate adversarial inputs so that the target DQN can be made to follow actions governed by the adversarial policy
Advantage
Use a black-box setting and show a success rate of 70% when adversarial examples are transferred from one model to another
A Malicious Attack on the Machine Learning Policy of a Robotic System (2018)
Overview
Evaluate a white-box adversarial attack on the DRL policy of an autonomous robot in a dynamic environment
The adversary generates false routes by tampering with the sensory data sent to the robot, making the robot see what the adversary desires
Target the Reward
Perturbing the States
Robust Deep Reinforcement Learning with Adversarial Attacks (2017)
Overview
Propose three types of gradient-based adversarial attacks on DQN and DDPG for reducing the expected reward by adding perturbations to the observations
Advantage
The proposed attacks outperform the plain FGSM attack in degrading the performance of DRL schemes
Method
First attack: a naive approach that adds random noise to the DRL states to mislead the DRL agent into selecting a sub-optimal action, degrading the performance of the DRL scheme
Second attack: a gradient-based (GB) attack that introduces a novel cost function for creating adversarial actions and outperforms FGSM at finding the worst possible discrete action to limit the performance of DRL schemes
Third attack: an improved version of the second attack; instead of a simple gradient-based approach, stochastic gradient descent (SGD) is used to generate the adversarial perturbation, ultimately misleading the DRL agent into a pre-defined adversarial state (see the sketch below)
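A hedged sketch of a gradient-based attack toward the agent's worst action, using SGD on the perturbation as in the third attack; the cost function, budget, and step count are assumptions rather than the authors' exact formulation:

```python
import torch
import torch.nn.functional as F

def worst_action_perturbation(q_net, obs, epsilon=0.01, steps=20, lr=0.005):
    with torch.no_grad():
        worst_action = q_net(obs).argmin(dim=-1)  # lowest-value action
    delta = torch.zeros_like(obs, requires_grad=True)
    optimizer = torch.optim.SGD([delta], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        logits = q_net(obs + delta)
        # Minimising cross-entropy toward the worst action raises its rank.
        loss = F.cross_entropy(logits, worst_action)
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            delta.clamp_(-epsilon, epsilon)  # stay within the attack budget
    return (obs + delta).detach()
```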
Perturbing the Action Space
Spatiotemporally Constrained Action Space Attacks on Deep Reinforcement Learning Agents (2019)
Overview
Propose two attacks on the action space of the DRL algorithms
Method
First attack: an optimization problem with decoupled constraints that minimizes the cumulative reward of the DRL agent, called the myopic action space (MAS) attack (a minimal sketch follows this entry)
Second attack: the same objective but with temporally coupled constraints, called the look-ahead action space (LAS) attack; it is more damaging to the DRL algorithm's performance because it can attack the dynamic information of the agent
Advantage
Perform well in the case of limited resources
Can be used to gain insights into the potential vulnerabilities of the DRL model
Cannot be defended as the action space is independent of the policy
Disadvantage
Can be detected by monitoring the decay in the reward
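A minimal sketch of a myopic (MAS-style) action-space perturbation: nudge a continuous action against the return gradient and project the perturbation back onto a per-step budget. The gradient source, step size, and budget are assumptions:

```python
import numpy as np

def mas_perturb_action(action, grad_wrt_action, budget=0.1, step_size=0.05):
    # Move the continuous action against the gradient of the agent's
    # expected return (supplied by the attacker), then project the
    # perturbation back onto an L2 ball of radius `budget`.
    delta = -step_size * np.asarray(grad_wrt_action)
    norm = np.linalg.norm(delta)
    if norm > budget:
        delta *= budget / norm
    return np.asarray(action) + delta
```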
Reward Falsification
Reinforcement Learning for Autonomous Defense in Software-Defined Networking (2018)
Overview
Discuss the reaction of the DRL agent in software-defined networking to different adversarial attacks
Method
Flipping reward signals: the adversary manipulates the reward signal of the model by flipping it a limited number of times (see the sketch below)
Manipulating states: the attacker makes two changes in the first few steps of training, i.e., the adversary can insert one false positive and one false negative into the states
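A minimal sketch of reward-signal flipping under an attack budget; the flip probability is an assumption:

```python
import random

def flipped_reward(reward, flips_remaining, flip_probability=0.1):
    # With some probability, and while the attack budget lasts, hand the
    # learner the negated reward instead of the true one.
    if flips_remaining > 0 and random.random() < flip_probability:
        return -reward, flips_remaining - 1
    return reward, flips_remaining
```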
Deceptive Reinforcement Learning Under Adversarial Manipulations on Cost Signals (2019)
Overview
Discuss the effect of malicious falsification of the reward (cost) signal on the agent, leading it to take decisions targeted by the adversary
Characterize a robust region for the policy: as long as the falsified cost stays within this region, the adversary can never achieve its desired policy
Use four terms to specify different types of attackers. All of these attackers can mislead the agent into learning a policy desired by the adversary.
Method
(1) Omniscient attacker: has all the information before a certain time t
(2) Peer attacker: does not know the transition probabilities but has access to the knowledge the agent has before time t
(3) Ignorant attacker: knows only the cost signals before time t
(4) Blind attacker: has no information at time t
Target the Environment
Gradient Band-based Adversarial Training for Generalized Attack Immunity of A3C Path Finding (2018)
Overview
Propose a common dominant adversarial example generation (CDG) method for crafting adversarial examples with high confidence for the environment of DRL
The core idea of the attack is to add confusing obstacles to the original clean map, confusing the robot by disturbing its local information
Adversarial Examples Construction Towards White-Box Q Table Variation in DQN Pathfinding Training (2018)
Overview
Propose a method of finding adversarial examples for DQNs trained for automatic pathfinding
The attack first lets a DQN learn to solve the pathfinding problem and then analyzes the trained model
Based on this analysis, weaknesses in the Q-value curves are identified
The attack involves the addition of adversarial examples generated from these weaknesses to the environment
Characterizing Attacks on Deep Reinforcement Learning (2019)
Overview
Introduce online sequential attacks on the environment of the DRL agent by exploiting the temporal consistency of the states
Provide two attack methods, namely the adaptive dimension sampling-based finite difference method (SFD) and the optimal frame selection method (a finite-difference sketch follows this entry)
Advantage
Faster than the FGSM algorithm as no back-propagation is needed
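A hedged sketch of a dimension-sampling finite-difference gradient estimate, the kind of backprop-free estimator the SFD idea relies on; the loss interface and sample size are assumptions:

```python
import numpy as np

def sampled_finite_difference_grad(loss_fn, obs, num_dims=32, h=1e-3):
    # Probe only a random subset of coordinates, estimating the gradient
    # with forward passes alone (no back-propagation through the policy).
    grad = np.zeros_like(obs, dtype=np.float64)
    flat_grad = grad.reshape(-1)
    flat_obs = obs.reshape(-1)
    dims = np.random.choice(flat_obs.size, size=min(num_dims, flat_obs.size),
                            replace=False)
    base = loss_fn(obs)
    for d in dims:
        probe = flat_obs.copy()
        probe[d] += h
        flat_grad[d] = (loss_fn(probe.reshape(obs.shape)) - base) / h
    return grad
```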
Defenses for DRL
Adversarial Detection
Detecting Adversarial Attacks on Neural Network Policies with Visual Foresight (2017)
Overview
Propose a method of protecting the DRL algorithms from adversarial attacks by leveraging an action-conditioned frame prediction module
Detect the presence of adversarial attacks and make the model robust by using the predicted frame instead of the adversarial frame (see the sketch below)
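A hedged sketch of the detect-and-substitute logic, assuming an action-conditioned `frame_predictor` and a simple distance between the policy's action distributions; all names and the threshold are illustrative:

```python
import torch

def foresight_defended_action(policy_net, frame_predictor, prev_frames,
                              prev_action, current_frame, threshold=0.5):
    with torch.no_grad():
        predicted_frame = frame_predictor(prev_frames, prev_action)
        p_obs = torch.softmax(policy_net(current_frame), dim=-1)
        p_pred = torch.softmax(policy_net(predicted_frame), dim=-1)
        # Total-variation-style distance between the two action distributions.
        divergence = 0.5 * (p_obs - p_pred).abs().sum().item()
    if divergence > threshold:
        return p_pred.argmax(dim=-1)  # frame likely attacked: act on the prediction
    return p_obs.argmax(dim=-1)       # frame looks clean: act on the observation
```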
Online Robust Policy Learning in the Presence of Unknown Adversaries (2018)
Overview
Introduce a technique for making online learning algorithms robust to adversarial attacks
Detect the presence of adversarial attacks via a supervisory agent by learning separate sub-policies using the Meta-learned Advantage Hierarchy (MLAH) framework
A PCA-Based Model to Predict Adversarial Examples on Q-Learning of Path Finding (2018)
Overview
Propose an advanced Q-learning algorithm for automatic path-finding in robots that is robust to adversarial attacks because it detects adversarial inputs
Propose a model to predict the adversarial inputs based on a calculation determined by 5 factors:
energy point gravitation, key point gravitation, path gravitation, included angle, and the placid point
The weights for these 5 factors are calculated using principal component analysis (PCA)
Reinforcement Learning with Perturbed Rewards (2018)
Overview
Propose a reward confusion matrix to generate surrogate rewards that help the RL agent learn in cases of perturbed/noisy reward signals (see the sketch below)
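A minimal sketch of turning a known reward confusion matrix into unbiased surrogate reward values; estimating the matrix itself is out of scope, and the interface is an assumption:

```python
import numpy as np

def surrogate_rewards(confusion_matrix, reward_values):
    # confusion_matrix[i, j]: probability that true reward level i is
    # observed as level j; reward_values[i]: reward of level i.
    # Solve C @ r_hat = r so that the surrogate reward assigned to the
    # observed level is unbiased: E[r_hat | true level i] == reward_values[i].
    C = np.asarray(confusion_matrix, dtype=float)
    r = np.asarray(reward_values, dtype=float)
    return np.linalg.solve(C, r)

# Example: binary rewards {-1, +1}, each flipped with probability e = 0.2.
e = 0.2
r_hat = surrogate_rewards([[1 - e, e], [e, 1 - e]], [-1.0, 1.0])
```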
Reinforcement Learning under Threats (2018)
Overview
Introduce threatened Markov decision processes (TMDPs), a variant of MDP
Support the decision-making process in the DRL setting against adversaries that affect the reward-generating process
Defensive Distillation
Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks (2016)
Overview
Propose the idea of using defensive distillation to deal with adversarial attacks on ML schemes
Defensive Distillation is Not Robust to Adversarial Examples (2016)
Overview
Show that defensive distillation gives a false sense of robustness against adversarial examples
Policy Distillation (2016)
Overview
Present a method of extracting the policy of a dense network to train another, comparatively smaller network (see the sketch below)
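A hedged sketch of a policy-distillation loss, reading the method as KL divergence between the teacher's temperature-softened Q distribution and the student's policy; tensor shapes and the temperature are assumptions:

```python
import torch
import torch.nn.functional as F

def policy_distillation_loss(teacher_q, student_logits, temperature=0.01):
    # teacher_q, student_logits: tensors of shape (batch, num_actions).
    teacher_probs = F.softmax(teacher_q / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    # KL(teacher || student), averaged over the batch.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
```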
Distilling Policy Distillation (2019)
Overview
Propose expected entropy regularized distillation which makes the training much faster while guaranteeing convergence
Robust Learning
Mitigation of Policy Manipulation Attacks on Deep Q-Networks with Parameter-Space Noise (2018)
Overview
Propose adding noise to the parameter space of the network while training (see the sketch below)
Use FGSM for crafting adversarial samples
Show that the performance of normally trained agents deteriorates to almost nothing under attack, while agents retrained with parameter-space noise perform well even in the presence of adversarial inputs
Advantage
Very effective in mitigating the effects of both training-time and test-time attacks in both black-box and white-box settings
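A minimal sketch of parameter-space noise: perturb a copy of the network's weights with Gaussian noise before collecting a rollout. A fixed noise scale is assumed; the adaptive scaling used in practice is omitted:

```python
import copy
import torch

def perturb_parameters(policy_net, stddev=0.05):
    # Perturb a copy of the network's weights with zero-mean Gaussian noise
    # before collecting the next rollout; the original network is untouched.
    noisy_net = copy.deepcopy(policy_net)
    with torch.no_grad():
        for param in noisy_net.parameters():
            param.add_(torch.randn_like(param) * stddev)
    return noisy_net
```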
Adversarially Robust Policy Learning: Active Construction of Physically-Plausible Perturbations (2017)
Overview
Show superior resilience to adversarial attacks by introducing an adversarially robust policy learning (ARPL) algorithm
Involve the use of adversarial examples during training to enable robust policy learning
Disadvantage
An agent trained with the ARPL algorithm does not perform as well as a normally trained agent when no perturbations are present
Robust Adversarial Reinforcement Learning (2017)
Overview
Propose robust adversarial reinforcement learning (RARL) as a method of robust policy learning in the presence of an adversary
Formulate policy learning as a zero-sum minimax objective function to ensure robustness to differences between train and test conditions, even in the presence of an adversary
Wasserstein Robust Reinforcement Learning (2019)
Overview
Propose robust reinforcement learning using a novel min-max game with a Wasserstein constraint, together with a correct and convergent solver
Advantage
Show a significant increase in robustness in the case of both low and high-dimensional control tasks
Distributionally Robust Reinforcement Learning (2019)
Overview
Propose a distributionally robust policy iteration scheme that keeps the agent from learning a sub-optimal policy while exploring in high-dimensional state/action spaces
The scheme is based on robust Bellman operators, which provide a lower-bound guarantee on the policy/state values
Present a distributionally robust soft actor-critic based on mixed exploration, acting conservatively in the short term and exploring optimistically in the long run, leading to an optimal policy
Action Robust Reinforcement Learning and Applications in Continuous Control (2019)
Overview
Propose the probabilistic action robust MDP (PR-MDP) and the noisy action robust MDP (NR-MDP) as two new criteria for robustness
Enhancing performance of reinforcement learning models in the presence of noisy rewards (2019)
Overview
Present a technique to make the DRL algorithm learn in the presence of noisy rewards
The proposed scheme uses a noise filter built on a non-linear approximator to filter out the noise and estimate the true reward
Game theoretic approach
On the robustness of learning in games with stochastically perturbed payoff observations (2014)
Overview
Examine a game approach where the players adjust their actions based on past payoff observations that are subject to adversarial perturbations
Minimax Iterative Dynamic Game: Application to Nonlinear Robot Control Tasks (2017)
Overview
Propose an iterative minimax dynamic game framework that helps in designing robust policies in the presence of adversarial inputs
Propose a method of quantifying the robustness capacity of a policy
Adversarial Training
Delving into adversarial attacks on deep policies (2017)
Overview
Retrain their agent on perturbations generated using FGSM and random noise
Robust Deep Reinforcement Learning with Adversarial Attacks (2017)
Overview
Train the DRL model using the adversarial samples generated from the gradient-based attacks
Show that the addition of noise to the training samples while training the model also increases the resilience of the DRL models against adversarial attacks
Advantage
Helps the algorithm model uncertainties in the system, making it robust to similar adversarial attacks (see the sketch below)
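A rough sketch of an adversarial training step mixing clean and FGSM-perturbed observations into a value-regression update; the split ratio, epsilon, and loss are assumptions, and `target_q` is assumed to hold precomputed TD targets:

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(q_net, optimizer, obs, target_q,
                              epsilon=0.01, adv_fraction=0.5):
    # Replace a fraction of the batch with FGSM-perturbed observations, then
    # run the usual regression toward the precomputed targets.
    num_adv = int(adv_fraction * obs.shape[0])
    if num_adv > 0:
        adv = obs[:num_adv].clone().detach().requires_grad_(True)
        F.mse_loss(q_net(adv), target_q[:num_adv]).backward()
        obs = obs.clone()
        obs[:num_adv] = (adv + epsilon * adv.grad.sign()).detach()
    optimizer.zero_grad()
    loss = F.mse_loss(q_net(obs), target_q)
    loss.backward()
    optimizer.step()
    return loss.item()
```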
Reinforcement Learning for Autonomous Defense in Software-Defined Networking (2018)
Overview
Propose adversarial training as a method of robustifying the DRL algorithms against adversarial attacks
Whatever Does Not Kill Deep Reinforcement Learning, Makes It Stronger (2017)
Overview
Investigate the robustness of the DRL algorithms to both training and test time attacks
Show that under training-time attacks the DQN can adapt its policy and become robust
Show that adversarially trained policies are more robust to test-time attacks
Mitigation of Policy Manipulation Attacks on Deep Q-Networks with Parameter-Space Noise (2018)
Overview
Compare the resilience to adversarial attacks of two DQNs: one based on ε-greedy policy learning and another employing NoisyNets, a parameter-space noise exploration technique
Results show NoisyNets to be more resilient to training-time attacks than the ε-greedy policy
Argue that this resilience of NoisyNets is due to enhanced generalizability and reduced transferability
Gradient Band-based Adversarial Training for Generalized Attack Immunity of A3C Path Finding (2018)
Overview
Propose a gradient-based adversarial training technique
Use adversarial perturbations generated using their proposed attacking algorithm, i.e., CDG, for re-training the RL agent
Analysis and Improvement of Adversarial Training in DQN Agents With Adversarially-Guided Exploration (AGE) (2019)
Overview
Propose adversarially guided exploration (AGE) to address the sample inefficiency of current adversarial training techniques
Based on a modified hybrid of the ε-greedy algorithm and Boltzmann exploration
Compare its efficiency with the ε-greedy and parameter-space noise exploration algorithms and demonstrate its feasibility
Paper: Challenges and Countermeasures for Adversarial Attacks on Deep Reinforcement Learning (2020)