Corrigibility (Toon)
Proposed solutions
moral uncertainty
fully updated deference
badly specified utility function space
When obedience? :star:
Some learning methods work, some don't
explicitly penalizing adversity
adversity badly specified
AI in a box
Will (want to) outsmart us
Subproblems
Shutdown problem
Correctly specifying shutdown