Hard Questions
John Lime (1 year ago): I'm glad this talk addresses the issue of computation cost, which is basically the main reason it's difficult to put DRL into application. You could use something like SAC so that you don't have to throw away your trajectories every epoch, but the implementation can be tricky: most baseline RL results are reported with PPO, and implementing SAC requires an understanding of maximum-entropy RL and the tradeoff between the entropy of the action distribution emitted by the policy and the reward, plus a lot of training time.

But even if you can overcome the cost via imitation learning: (1) how are you going to get the large number of sample trajectories in the first place, and (2) how do you overcome the unpredictability of the policy in situations or environments unseen during training? I've heard that many control engineers prefer analytical control like LQR for this reason, and unpredictability in your NPC AI would just look like "bad AI" to average gamers (case study: Total War: Rome II, which, if I remember correctly, used a form of Q-learning?). Gamers trying to exploit game mechanics would probably rather see a predictable failure in the AI than something unpredictable that ends up with the agent aimlessly wandering around, which is exactly what tends to happen in unexpected situations. https://www.youtube.com/watch?v=Q5RAE73zCKQ
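To make the "throwing away your trajectory every epoch" point concrete, here is a minimal sketch (not from the talk, and not a real SAC or PPO implementation) of the data-flow difference between off-policy and on-policy training. The `env`, `policy`, and `update_fn` objects are hypothetical placeholders with a made-up interface.

```python
# Minimal sketch of why off-policy methods (SAC-style) can be more sample-efficient
# than on-policy methods (PPO-style): transitions are stored in a replay buffer and
# reused for many gradient steps instead of being discarded after each update.
# env, policy, and update_fn are placeholders, not a real library API.
import random
from collections import deque


class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling: transitions collected under earlier policies are reused,
        # which is only valid for off-policy algorithms such as SAC or DQN.
        indices = random.sample(range(len(self.storage)), batch_size)
        return [self.storage[i] for i in indices]


def off_policy_loop(env, policy, update_fn, steps=10_000, batch_size=256):
    """SAC-style data flow: every environment step is stored and reused."""
    buffer = ReplayBuffer()
    state = env.reset()
    for _ in range(steps):
        action = policy(state)
        next_state, reward, done = env.step(action)  # hypothetical (s', r, done) interface
        buffer.add(state, action, reward, next_state, done)
        state = env.reset() if done else next_state
        if len(buffer.storage) >= batch_size:
            update_fn(buffer.sample(batch_size))  # gradient step on replayed data


def on_policy_loop(env, policy, update_fn, iterations=100, horizon=2048):
    """PPO-style data flow: collect a fresh batch, update on it, then discard it."""
    for _ in range(iterations):
        rollout = []
        state = env.reset()
        for _ in range(horizon):
            action = policy(state)
            next_state, reward, done = env.step(action)
            rollout.append((state, action, reward, next_state, done))
            state = env.reset() if done else next_state
        update_fn(rollout)  # a few epochs over this batch only
        # the rollout is discarded here; the next iteration collects from scratch
```

The off-policy loop keeps reusing old transitions for updates, which is where SAC's sample-efficiency advantage over PPO comes from; the on-policy loop pays the full environment-interaction cost for every update batch.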
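For reference on the "maximum entropy tradeoff" mentioned above, the standard maximum-entropy RL objective that SAC optimizes (the textbook formulation, not something specific to this talk) is:

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
         \Big[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big]
```

The temperature \(\alpha\) controls the tradeoff: a larger \(\alpha\) rewards higher-entropy (more random, more exploratory) action distributions, while \(\alpha \to 0\) recovers the usual expected-return objective.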