A Markov decision process \( \{ X_t, B_t | t \in T\} \) is defined by its decision epochs \( t \), state space \( S \), decision space \( D \) (with actions \( a \)), expected immediate rewards \( r(i,a) \), and transition probabilities \( p(j|i,a) \).
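For concreteness, here is one way these ingredients might be collected in code; a minimal sketch, where the class name `MDP`, the field layout, and all numeric values are assumptions for illustration only, not notation from these notes.

```python
from dataclasses import dataclass

@dataclass
class MDP:
    states: list   # state space S
    actions: list  # decision space D (actions a)
    r: dict        # expected immediate rewards: (i, a) -> r(i, a)
    p: dict        # transition probabilities: (i, a) -> {j: p(j|i, a)}

# A tiny two-state, two-action instance (all values made up):
mdp = MDP(
    states=[0, 1],
    actions=["a", "b"],
    r={(0, "a"): 5.0, (0, "b"): 1.0, (1, "a"): 0.0, (1, "b"): 2.0},
    p={(0, "a"): {0: 0.9, 1: 0.1}, (0, "b"): {0: 0.4, 1: 0.6},
       (1, "a"): {0: 0.2, 1: 0.8}, (1, "b"): {0: 0.5, 1: 0.5}},
)
```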
Furthermore, a map \( \delta_t : S \rightarrow D \) is called a decision rule (at each epoch \( t \), it assigns an action to every state), and a sequence of decision rules \( \pi = (\delta_1, \delta_2, \cdots) \) is called a policy.
If all the decision rules in a policy are identical, i.e., independent of time, the policy is called stationary. In other words, \( \pi = (\delta, \delta, \cdots) \).
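A stationary policy is thus fully determined by a single decision rule; a minimal sketch, assuming hypothetical states and actions as plain Python values:

```python
# One decision rule delta: S -> D (hypothetical two-state example).
delta = {0: "a", 1: "b"}

# A stationary policy applies the same rule at every epoch, so
# pi = (delta, delta, ...) never has to be stored as a sequence:
def pi(t, state):
    """Action prescribed at epoch t; independent of t by stationarity."""
    return delta[state]

assert pi(0, 0) == pi(99, 0) == "a"
```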
This generates a "normal" Markov chain, with a transition matrix
-
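To make this last step concrete, the sketch below assembles the induced chain's transition matrix row by row, taking row \( i \) to be \( p(\cdot | i, \delta(i)) \); the two-state example and all names (`delta`, `p`, `P_delta`) are illustrative assumptions.

```python
import numpy as np

states = [0, 1]             # hypothetical state space S
delta = {0: "a", 1: "b"}    # stationary decision rule: state -> action

# Transition probabilities p(j|i, a), stored as p[(i, a)] = row over j.
p = {
    (0, "a"): [0.9, 0.1], (0, "b"): [0.4, 0.6],
    (1, "a"): [0.2, 0.8], (1, "b"): [0.5, 0.5],
}

# Row i of the induced chain's matrix is p(. | i, delta(i)).
P_delta = np.array([p[(i, delta[i])] for i in states])
print(P_delta)  # an ordinary stochastic matrix: each row sums to 1
```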