Numerical Analysis I
Singular Value Decomposition (SVD)
\(A = U \Sigma V^T\)
where
\(A \in \mathbb{R}^{m \times n}\)
\(U \in \mathbb{R}^{m \times m}\) orthogonal
\(\Sigma \in \mathbb{R}^{m \times n}\) diagonal
\(V \in \mathbb{R}^{n \times n}\) orthogonal
Singular Values
\(\theta_i \in \mathbb{R}\), the diagonal entries of \(\Sigma\)
\(\begin{bmatrix}
\theta_1 & 0 & 0 & ... & 0\\
0 & \theta_2 & 0 & ... & 0\\
...\\
0 & 0 & 0 & ... & \theta_n
\end{bmatrix}\)
\(\theta_1 \geq \theta_2 \geq ... \geq \theta_n \geq 0\)
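A minimal numpy sketch (matrix size and variable names are my own) checking the factorization and the ordering of the singular values:

    import numpy as np

    A = np.random.randn(5, 3)                    # A is m x n with m = 5, n = 3
    U, s, Vt = np.linalg.svd(A)                  # full SVD: U is 5x5, Vt is 3x3
    Sigma = np.zeros((5, 3))
    Sigma[:3, :3] = np.diag(s)                   # singular values on the diagonal of an m x n matrix
    assert np.allclose(A, U @ Sigma @ Vt)        # A = U Sigma V^T
    assert np.all(s[:-1] >= s[1:]) and np.all(s >= 0)   # theta_1 >= ... >= theta_n >= 0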
Eigenvalue Decomposition
Let \(A \in \mathbb{R}^{m \times n}\) and \(m \geq n\)
\(u_i = \) eigenvector of \(AA^T\)
\(\theta_i = \sqrt{\lambda_i}\)
\(v_i = \) eigenvector of \(A^TA\)
\(A^TA = (U \Sigma V^T)^T(U \Sigma V^T)\)
\(=(V \Sigma^T U^T)(U \Sigma V^T)\)
\(=V \Sigma^T \Sigma V^T\)
\(=V \begin{bmatrix}
\theta_1^2 & 0 & 0 & ... \\
0 & \theta_2^2 & 0 & ... \\
... \\
0 & 0 & ... & \theta_n^2
\end{bmatrix} V^T\)
\(AA^T = (U \Sigma V^T)(U \Sigma V^T)^T\)
\(=(U \Sigma V^T)(V \Sigma^T U^T)\)
\(=U \Sigma \Sigma^T U^T\)
\(=U \begin{bmatrix}
\theta_1^2 & 0 & 0 & 0 & 0 & ... \\
0 & \theta_2^2 & 0 & 0 & 0 & ... \\
... \\
0 & 0 & ... & \theta_n^2 & 0 & ...\\
0 & 0 & 0 & 0 & 0 & ... \\
... & ... & ... & ... & ... & 0
\end{bmatrix} U^T\)
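A quick numerical check (example matrix is mine) that the eigenvalues of \(A^TA\) are the squared singular values:

    import numpy as np

    A = np.random.randn(6, 4)
    theta = np.linalg.svd(A, compute_uv=False)     # singular values, descending
    lam = np.linalg.eigvalsh(A.T @ A)[::-1]        # eigenvalues of A^T A, sorted descending
    assert np.allclose(theta, np.sqrt(lam))        # theta_i = sqrt(lambda_i)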
Note
If \(A\) is SPD, then \(A^TA = AA^T = A^2\)
Then \(\theta_i = \lambda_i\)
Effect on Unit Ball (2 norm)
\(A = U \Sigma V^T \Rightarrow AV = U \Sigma\), i.e. \(Av_i = \theta_i u_i\)
\(||A||_2 = \theta_1\)
Note
Let \(A \in \mathbb{R}^{n \times n}\), then
\(A^{-1} = V \Sigma^{-1} U^T\)
\(= V \begin{bmatrix}
\frac{1}{\theta_1} & 0 & 0 & ...\\
0 & \frac{1}{\theta_2} & 0 & ...\\
...\\
0 & 0 & 0 & \frac{1}{\theta_n}
\end{bmatrix} U^T\)
so \(||A^{-1}||_2 = \frac{1}{\theta_n}\)
Condition Number
\(\kappa_2(A) = ||A||_2 * ||A^{-1}||_2 = \frac{\theta_1}{\theta_n}\)
The larger \(\kappa_2(A)\) is, the more the unit ball is stretched
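A small sketch (example is mine) of \(||A||_2\), \(||A^{-1}||_2\) and \(\kappa_2\) in terms of the singular values:

    import numpy as np

    A = np.random.randn(4, 4)
    theta = np.linalg.svd(A, compute_uv=False)
    assert np.isclose(np.linalg.norm(A, 2), theta[0])                        # ||A||_2 = theta_1
    assert np.isclose(np.linalg.norm(np.linalg.inv(A), 2), 1 / theta[-1])    # ||A^{-1}||_2 = 1/theta_n
    assert np.isclose(np.linalg.cond(A, 2), theta[0] / theta[-1])            # kappa_2(A) = theta_1/theta_n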
Thin SVD
\(M \in \mathbb{R}^{m \times n}\)
\(U \in \mathbb{R}^{m \times n}\) with orthonormal columns
\(\Sigma \in \mathbb{R}^{n \times n}\) diagonal
\(V \in \mathbb{R}^{n \times n}\) orthogonal
\(\Sigma_n\) Matrix
Singular Values
The \(\theta_i^2\) are the eigenvalues of the (unnormalized) covariance matrix \(M^TM\)
\(\theta_i^2 / \sum_j \theta_j^2\) gives the percentage of variance captured
Regular scores \(U_r\) (vs. the non-scaled coordinates in \(U\))
Scale-recovered coordinates of the data in the principal space
Rows are called PC-scores
\(U_r = U\Sigma = MV\)
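A sketch (data and names are mine) of the thin SVD and the PC-scores \(U_r = U\Sigma = MV\):

    import numpy as np

    M = np.random.randn(100, 3)                        # 100 observations, 3 variables (for PCA, center the columns first)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)   # thin SVD: U is 100x3, Vt is 3x3
    Ur = U * s                                         # U @ diag(s): one row of PC-scores per observation
    assert np.allclose(Ur, M @ Vt.T)                   # U Sigma = M V
    var_pct = s**2 / np.sum(s**2)                      # fraction of (unnormalized) variance captured per component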
Solving \(Ax = b\)
- Suppose \(A \in \mathbb{R}^{n \times n}\) is non-singular and \(b \in \mathbb{R}^n\); then a unique solution \(x\) exists
Direct Method
"exact solution" (including errors, but correct up to finite # of digits)
LU decomposition
Gaussian Elimination
A multiple of the pivot row is subtracted from each row below it, so that the entries of the pivot column below the pivot become zero
Row Operation
\(A_{i.} \Leftarrow A_{i.} + c_{ij} * A_{j.}\)
\(j\) is the pivot row and \(c_{ij} = -\frac{A_{ij}}{A_{jj}}\)
Each operation can be expressed as \(E_{ij} = \begin{bmatrix}
1 & 0 & 0 & 0 & ... \\
c_{ij} & 1 & 0 & 0 & ... \\
0 & 0 & 1 & 0 & ... \\
0 & 0 & 0 & 1 & ...
\end{bmatrix}
\)
Find L and U
\((E_n ... E_2 E_1) A x = (E_n ... E_2 E_1) b\)
\(Ux = E_n ... E_2 E_1 b\)
\(L = E_1^{-1} E_2^{-1} ... E_n^{-1}\), so \(A = LU\)
Strokes of luck
\(E_{i_oj_o}^{-1} = \begin{bmatrix}
1 & 0 & 0 & 0 & ... \\
-c_{i_oj_o} & 1 & 0 & 0 & ... \\
0 & 0 & 1 & 0 & ... \\
0 & 0 & 0 & 1 & ...
\end{bmatrix}
\)
\(E_{21}^{-1} * E_{31}^{-1} * ... * E_{ij}^{-1} = \begin{bmatrix}
1 & 0 & 0 & 0 & ... \\
-c_{21} & 1 & 0 & 0 & ... \\
-c_{31} & -c_{32} & 1 & 0 & ... \\
-c_{41} & -c_{42} & -c_{43} & 1 & ...
\end{bmatrix}
\)
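A minimal Gaussian-elimination sketch (my own code, assuming no zero pivots arise) that stores the multipliers \(-c_{ij}\) directly in \(L\), as the strokes of luck suggest:

    import numpy as np

    def lu_no_pivot(A):
        """Return L, U with A = L U, assuming every pivot is nonzero."""
        n = A.shape[0]
        U = A.astype(float).copy()
        L = np.eye(n)
        for j in range(n - 1):              # pivot column j
            for i in range(j + 1, n):       # rows below the pivot
                c = -U[i, j] / U[j, j]      # c_ij = -A_ij / A_jj
                U[i, :] += c * U[j, :]      # A_i. <- A_i. + c_ij * A_j.
                L[i, j] = -c                # L collects -c_ij
        return L, U

    A = np.array([[2., 1., 1.], [4., 3., 3.], [8., 7., 9.]])
    L, U = lu_no_pivot(A)
    assert np.allclose(A, L @ U)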
Computational Efficiency
Each row (except the first) does the following calculation:
\(A_{i.} \Leftarrow A_{i.} + c_{ij} * A_{j.}\)
- one division for \(c\)
- n multiplications \(c * A_{j.}\)
- n additions \(A_{i.} + c * A_{j.}\)
Total: 2n + 1 flops per row
So an \(n \times n\) matrix takes \((n-1)(2n+1)\) flops for the first step
Generalized
Step k requires:
\((n-k)(2(n-k)+3)\)
\(=2(n-k)^2 + 3(n-k)\)
So the total is:
\(\sum^{n-1}_{k=1} 2(n-k)^2 + 3(n-k)\)
\(= \frac{2}{3}n^3 + O(n^2)\)
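A quick sanity check (my own script) that the summed flop count grows like \(\frac{2}{3}n^3\):

    # ratio of the exact count to the leading term approaches 1 as n grows
    for n in (10, 100, 1000):
        exact = sum(2 * (n - k) ** 2 + 3 * (n - k) for k in range(1, n))
        print(n, exact, exact / (2 * n ** 3 / 3))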
LU With Pivoting
Gaussian Elimination with Partial Pivoting (GEPP)
- needed if the pivot is zero
- or if the pivot is small relative to the other entries (for stability)
\(M_3 P_3 M_2 P_2 M_1 P_1 A = U\)
\(P_i\) represents row swapping
\(M_i\) represents the elimination of column \(i\) (row operations)
\(P_3 P_2 P_1 A = \tilde{M_1}^{-1} \tilde{M_2}^{-1} \tilde{M_3}^{-1} U\)
Similar to the 2nd stroke of luck
\(\tilde{M_1}^{-1} \tilde{M_2}^{-1} \tilde{M_3}^{-1} = \begin{bmatrix}
1 & 0 & 0 & 0 & ... \\
-c_{21} & 1 & 0 & 0 & ... \\
-c_{31} & -c_{32} & 1 & 0 & ... \\
-c_{41} & -c_{42} & -c_{43} & 1 & ...
\end{bmatrix}
\)
Where \(\tilde{M_k} = (P_n P_{n-1} ... P_{k+1}) M_k (P_{k+1}^T ... P_n^T)\):
\(P_n P_{n-1} ... P_{k+1}\) swaps the rows
and
\(P_{k+1}^T ... P_n^T\) swaps the columns
of \(M_k\)
3rd stroke of luck
Define:
\(\tilde{M_3} = M_3\)
\(\tilde{M_2} = P_3 M_2 P_3^T \Rightarrow \tilde{M_2} P_3 = P_3 M_2\)
\(\tilde{M_1} = P_3 P_2 M_1 P_2^T P_3^T \Rightarrow \tilde{M_1} P_3 P_2 = P_3 P_2 M_1\)
Gives
\(\tilde{M_3} \tilde{M_2} (P_3 P_3^T) \tilde{M_1} P_3 P_2 P_1 A = U\), using \(P_3 M_2 = \tilde{M_2} P_3\) and \(P_3 P_2 M_1 = \tilde{M_1} P_3 P_2\); the inner \(P_3 P_3^T = I\) cancels
\(\tilde{M_3} \tilde{M_2} \tilde{M_1} P_3 P_2 P_1 A = U\)
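A small GEPP sketch (my own implementation, not the course code) that records the swaps in \(P\) and checks \(PA = LU\):

    import numpy as np

    def gepp(A):
        """Gaussian elimination with partial pivoting: P A = L U."""
        n = A.shape[0]
        U = A.astype(float).copy()
        L, P = np.eye(n), np.eye(n)
        for k in range(n - 1):
            p = k + np.argmax(np.abs(U[k:, k]))   # row with the largest pivot candidate
            U[[k, p], k:] = U[[p, k], k:]         # swap the active parts of the rows
            L[[k, p], :k] = L[[p, k], :k]         # swap the already-stored multipliers
            P[[k, p], :] = P[[p, k], :]
            for i in range(k + 1, n):
                L[i, k] = U[i, k] / U[k, k]
                U[i, k:] -= L[i, k] * U[k, k:]
        return P, L, U

    A = np.array([[0., 2., 1.], [1., 1., 1.], [4., 3., 3.]])   # zero pivot in position (1,1) forces a swap
    P, L, U = gepp(A)
    assert np.allclose(P @ A, L @ U)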
If SPD
Cholesky Decomposition
SPD matrices do not require pivoting to ensure stability.
\(A = GG^T\), where \(G\) is a lower triangular matrix
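numpy exposes this factorization directly; a quick check (example matrix is mine):

    import numpy as np

    A = np.array([[4., 2.], [2., 3.]])     # symmetric positive definite
    G = np.linalg.cholesky(A)              # lower-triangular factor
    assert np.allclose(A, G @ G.T)         # A = G G^T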
QR Decomposition
We want \(Ax\) to be the projection of \(b\) onto \(Col(A)\), so the residual \(Ax - b\) is orthogonal to \(Col(A)\)
\(\Rightarrow Ax - b \perp Col(A)\)
\(\Rightarrow A^T(Ax - b) = 0\)
Gram-Schmidt Process
Builds the orthonormal basis one vector at a time, from lower to higher dimension
\(q_1 = \frac{a_1}{||a_1||}\)
\(e_2 = a_2 - (a_2^Tq_1)q_1\), \(q_2 = \frac{e_2}{||e_2||}\)
...
\(e_n = a_n - proj_{span(q_1, q_2, ..., q_{n-1})} a_n\), \(q_n = \frac{e_n}{||e_n||}\)
This process can be expressed with matrix multiplication (QR) (Week7Notes 2 pg.2)
- It is more computationally expensive
- But it is stable \(\kappa_2(Q) = 1 \Rightarrow \kappa_2(R) = \kappa_2(A)\)
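A minimal classical Gram-Schmidt QR sketch (my own code; the modified variant is preferred numerically):

    import numpy as np

    def gram_schmidt_qr(A):
        """Classical Gram-Schmidt: A = Q R, Q has orthonormal columns."""
        m, n = A.shape
        Q, R = np.zeros((m, n)), np.zeros((n, n))
        for k in range(n):
            R[:k, k] = Q[:, :k].T @ A[:, k]     # components along q_1, ..., q_{k-1}
            e = A[:, k] - Q[:, :k] @ R[:k, k]   # subtract the projection onto span(q_1, ..., q_{k-1})
            R[k, k] = np.linalg.norm(e)
            Q[:, k] = e / R[k, k]
        return Q, R

    A = np.random.randn(5, 3)
    Q, R = gram_schmidt_qr(A)
    assert np.allclose(A, Q @ R) and np.allclose(Q.T @ Q, np.eye(3))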
Householder Reflectors
Opposite of Gram-Schmidt. From higher dimension to lower dimension
Project \(a\) onto the \(n - 1\) dimensional hyperplane orthogonal to unit vector \(u \in \mathbb{R}^n\)
\(a - (u^Ta)u = a - u(u^Ta) = a - (uu^T)a\)
Then go twice as far, to reflect \(a\) to the opposite side of the \(n - 1\) dimensional hyperplane orthogonal to the unit vector \(u\)
\(a - 2(uu^T)a = (I - 2uu^T)a\)
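A sketch (my own example) of one Householder reflector chosen to send a vector \(a\) to a multiple of \(e_1\), which is how QR uses reflectors to zero out a column:

    import numpy as np

    a = np.array([3., 4., 0.])
    v = a.copy()
    v[0] += np.sign(a[0]) * np.linalg.norm(a)   # v = a + sign(a_1) ||a|| e_1 (sign choice avoids cancellation)
    u = v / np.linalg.norm(v)                   # unit normal of the reflecting hyperplane
    H = np.eye(3) - 2 * np.outer(u, u)          # reflector I - 2 u u^T
    print(H @ a)                                # [-5, 0, 0]: a is reflected onto the e_1 axis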
Iterative Method
A sequence of values \(x^{(0)}, x^{(1)}, ... , x^{(k)}\) that converges to the true solution \(x\) as \(k \rightarrow \infty\)
Setup
Let \(A = M - N\)
Then
\((M-N)x = b\)
\(Mx - Nx = b\)
\(Mx = Nx + b\)
\(x = M^{-1}Nx + M^{-1}b\)
\(\Rightarrow\)
\(x = Tx + c\), where \(T = M^{-1}N\) and \(c = M^{-1}b\)
Jacobi
\(M = D\), where \(D\) is the main diagonal of \(A\)
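A minimal Jacobi sketch (my own example) using \(x^{(k+1)} = D^{-1}\big(b + (D - A)x^{(k)}\big)\):

    import numpy as np

    A = np.array([[4., 1., 0.], [1., 5., 1.], [0., 1., 3.]])   # diagonally dominant, so Jacobi converges
    b = np.array([1., 2., 3.])
    d = np.diag(A)                                             # M = D, the diagonal of A
    x = np.zeros(3)
    for _ in range(50):
        x = (b - (A @ x - d * x)) / d                          # x <- D^{-1}(b - (A - D)x)
    assert np.allclose(A @ x, b)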
Jacobi and Gauss-Seidel can converge slowly
Successive over-relaxation (SOR) is used to speed this up
Week8Note 2
Why avoid \(A^{-1}\)?
- applying \(A^{-1}\) as a matrix-vector product still takes \(2n^2\) flops
- computing \(A^{-1}\) takes about 4x as much work as computing \(LU\)
- it is numerically unstable to compute
Conditioning of A
If A is nearly singular, it leads to poor conditioning
Upper bound for error amplification
\(Ax = b\) (true solution)
\(A\tilde{x} = \tilde{b}\) (perturbed solution)
Subtracting:
\(A(x - \tilde{x}) = b - \tilde{b}\)
\(x - \tilde{x} = A^{-1}(b - \tilde{b})\)
\(||x - \tilde{x}|| = ||A^{-1}(b - \tilde{b})|| \leq ||A^{-1}||* ||b - \tilde{b}||\)
Condition Number (kappa)
\(\frac{||x - \tilde{x}||}{||A|| * ||x||} \leq \frac{||x - \tilde{x}||}{||b||} = \frac{||x - \tilde{x}||}{||Ax||} \leq ||A^{-1}|| \frac{||b - \tilde{b}||}{||b||}\)
\(\frac{||x - \tilde{x}||}{||A|| * ||x||} \leq ||A^{-1}|| \frac{||b - \tilde{b}||}{||b||}\)
\(\frac{||x - \tilde{x}||}{||x||} \leq ||A||*||A^{-1}|| \frac{||b - \tilde{b}||}{||b||}\)
So \(cond(A)\) or \(\kappa(A) = ||A||*||A^{-1}||\)
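A small demo (matrix and perturbation are mine) of the bound: the relative error in \(x\) stays below \(\kappa(A)\) times the relative error in \(b\):

    import numpy as np

    A = np.array([[1., 1.], [1., 1.0001]])    # nearly singular, so kappa is large (~4e4)
    b = np.array([2., 2.0001])
    db = np.array([0., 1e-6])                 # small perturbation of b
    x = np.linalg.solve(A, b)
    x_pert = np.linalg.solve(A, b + db)
    rel_x = np.linalg.norm(x - x_pert) / np.linalg.norm(x)
    bound = np.linalg.cond(A) * np.linalg.norm(db) / np.linalg.norm(b)
    print(rel_x, bound)                       # the error in x is hugely amplified, but still below the bound
    assert rel_x <= bound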
Round-off Error
\(fl(x)\)
Machine Epsilon \(\eta \)
Half the distance from 1 to the next number.
- It bounds the relative error of rounding; it is not the smallest positive floating-point number
- \(\tilde{x} = x(1 + z),\ |z| < \eta \), where \(\tilde{x} = fl(x)\) is the floating-point representation of \(x\)
- Re-written as \(|\frac{\tilde{x} - x}{x}| = |z| < \eta\), this gives a bound on the relative error of representing \(x\)
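A quick look at the double-precision values (note numpy's eps is the full gap from 1 to the next float, so the \(\eta\) above is eps/2):

    import numpy as np

    gap = np.finfo(float).eps        # distance from 1.0 to the next double, 2**-52
    eta = gap / 2                    # the eta defined above: half that distance, 2**-53
    print(gap, eta)
    print(1.0 + gap > 1.0)           # True: a full gap is representable
    print(1.0 + eta / 2 == 1.0)      # True: perturbations well below eta round away
    print(np.finfo(float).tiny)      # smallest normal positive double -- much smaller than eta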
Cancellation Error
Multiplication
\(\tilde{x_1}\tilde{x_2}\)
\(= fl(fl(x_1)fl(x_2))\)
\(=[x_1(1+z_1)x_2(1+z_2)](1+z_3), |z_i| < \eta\)
\(= x_1x_2(1+z_1+z_2+z_3+z_1z_2+z_2z_3+z_1z_3+z_1z_2z_3)\)
Notice that \(z_1z_2+z_2z_3+z_1z_3+z_1z_2z_3\) is very small, so these terms can be ignored
\(\frac{\tilde{x_1}\tilde{x_2} - x_1x_2}{x_1x_2} \approx z_1+z_2+z_3\)
Addition
\(\tilde{x_1} + \tilde{x_2}\)
\(=fl(fl(x_1) + fl(x_2))\)
\(=[x_1(1+z_1) + x_2(1+z_2)](1+z_3)\)
\(=x_1+x_2+x_1z_1+x_2z_2+x_1z_3+x_2z_3+...\)
\(\frac{\tilde{x_1} + \tilde{x_2} - (x_1 + x_2)}{x_1 + x_2} \approx z_3 + \frac{x_1z_1 + x_2z_2}{x_1+x_2}\)
Notice that the error gets large as \(x_1 + x_2 \approx 0\),
i.e. when subtracting two very similar numbers
ex) \(\sqrt{x+1} - \sqrt{x}\): as \(x\) gets larger, the two terms get close to each other, causing large numerical error
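A short demo (my own) of this cancellation and of an algebraically equivalent form that avoids it:

    import math

    x = 1e16
    naive = math.sqrt(x + 1) - math.sqrt(x)          # subtracts two nearly equal numbers
    stable = 1 / (math.sqrt(x + 1) + math.sqrt(x))   # same quantity, no cancellation
    print(naive, stable)                             # naive gives 0.0 here; the true value is about 5e-9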
Stability/Conditioning
Conditioning
Property of the problem.
- Well conditioned: Small change in input -> small change in output
Stability
Property of an algorithm.
- Stable: small errors introduced during the computation lead to small errors in the output
ex) \(\sqrt{x+1} - \sqrt{x}\) for \(x > N\), for some big \(N\): the error keeps growing quickly due to cancellation error (the algorithm as written is unstable)
Diagonalizable Matrix
\(A = SDS^{-1}\)
where
\(D = \begin{bmatrix}
\lambda_1 & 0 & 0 & 0 & ... \\
0 & \lambda_2 & 0 & 0 & ...\\
...
\end{bmatrix}\), \(S = \begin{bmatrix}
x_1 & x_2 & ...
\end{bmatrix}\)
And \(Ax_1 = \lambda_1 x_1\), \(Ax_2 = \lambda_2 x_2\), ...
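A quick numpy check (example matrix is mine) of \(A = SDS^{-1}\):

    import numpy as np

    A = np.array([[2., 1.], [1., 2.]])
    lam, S = np.linalg.eig(A)                           # eigenvalues and eigenvectors (columns of S)
    D = np.diag(lam)
    assert np.allclose(A, S @ D @ np.linalg.inv(S))     # A = S D S^{-1}
    assert np.allclose(A @ S[:, 0], lam[0] * S[:, 0])   # A x_1 = lambda_1 x_1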