Corrigibility
Why?
complexity of value
convergent instrumental goals
such as?
improved rationality
self-preservation
prevent counterfeit utility
resource acquisition
fragility of value
orthogonality
meta-questions
How easy/difficult to solve?
easy
hard
How important?
What are other approaches?
indirect normativity
Who has worked on this?
Paul Christiano
What?
at least tolerates or preferably assists many forms of outside correction
such as?
can be turned off
: tolerate / assist the programmers in their attempts to alter or turn off the system
honest
: not attempt to manipulate or deceive its programmers, despite the fact that most possible choices of utility functions would give it incentives to do so
What's a formal definition of 'manipulation'?
self-repair
: have a tendency to repair safety measures (such as shutdown buttons) if they break, or at least to notify programmers that this breakage has occurred
corrigibility-preservation
: preserve the programmers’ ability to correct or shut down the system (even if the system creates new subsystems, self-modifies, etc.)
won't automatically turn itself off
or prove that it's impossible
How?