Regularization
– L(theta) = data_loss + lambda * R(theta), where R(theta) is a regularization term, e.g. L2: R(theta) = theta^T * theta (the sum of squared weights), and lambda sets how strongly it is weighted
theta_1 = [1.5, 0, 0]
theta_2 = [0.25, 0.5, 0.25]
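theta_2 is "better" under L2 because its penalty is smaller (0.375 vs 2.25 for theta_1): it spreads the weight over all features, while theta_1 puts everything on the 1st feature and ignores the 2nd and 3rd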
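
A minimal sketch to check the comparison numerically. The helper names l2_penalty and regularized_loss, and the data_loss and lam values in the usage line, are assumptions made up for illustration; only the two theta vectors come from the notes.

```python
def l2_penalty(theta):
    """L2 regularization term R(theta) = theta^T * theta (sum of squared weights)."""
    return sum(t * t for t in theta)


def regularized_loss(data_loss, theta, lam):
    """Total loss L(theta) = data_loss + lam * R(theta).

    data_loss and lam are placeholders: the notes do not specify them.
    """
    return data_loss + lam * l2_penalty(theta)


theta_1 = [1.5, 0.0, 0.0]
theta_2 = [0.25, 0.5, 0.25]

print(l2_penalty(theta_1))  # 2.25
print(l2_penalty(theta_2))  # 0.375

# With the same (hypothetical) data loss and lambda, theta_2 gets the lower total loss.
print(regularized_loss(1.0, theta_1, 0.1) > regularized_loss(1.0, theta_2, 0.1))  # True
```

The penalty alone only ranks the two vectors; which one wins overall still depends on the data-loss term, which is held equal here by assumption.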