线性代数|机器学习-P14随机矩阵乘法
文章目录
- 1. 概述
- 2. 样本均值与方差计算
- 3. 随机抽样AB
- 4. 拉格朗日乘子法
1. 概述
- 单个样本均值和方差
- 重复n次同一实验的总样本和总方差
- 拉格朗日乘子法求最大概率
- AB矩阵通过概率采样得到CR矩阵
A
S
S
T
B
=
C
R
,
A
B
≈
C
R
ASS^TB=CR,AB\approx CR
2. 样本均值与方差计算
- 假设我们有一个实验盒,里面放着a,b两个球,我们一次只能取一个球,取完后放回原地后重复取第二次,单次中只有50%概率取到
[
a
,
0
]
[a,0]
[
0
,
b
]
[0,b]
E
1
(
x
)
,
D
1
(
x
)
E_1(x),D_1(x)
E
1
(
x
)
=
1
2
[
a
,
0
]
+
1
2
[
0
,
b
]
=
[
1
2
a
,
1
2
b
]
\begin{equation} E_1(x)=\frac{1}{2}[a,0]+\frac{1}{2}[0,b]=[\frac{1}{2}a,\frac{1}{2}b] \end{equation}
D
1
(
x
)
=
1
2
[
[
a
,
0
]
−
[
1
2
a
,
1
2
b
]
]
2
+
1
2
[
[
0
,
b
]
−
[
1
2
a
,
1
2
b
]
]
2
=
[
a
2
4
,
b
2
4
]
\begin{equation} D_1(x)=\frac{1}{2}[[a,0]-[\frac{1}{2}a,\frac{1}{2}b]]^2+\frac{1}{2}[[0,b]-[\frac{1}{2}a,\frac{1}{2}b]]^2=[\frac{a^2}{4},\frac{b^2}{4}] \end{equation}
- 那么重复上面两次的实验的期望,方差
E
2
(
x
)
,
D
2
(
x
)
E_2(x),D_2(x)
E
2
(
x
)
=
2
E
1
(
x
)
=
[
a
,
b
]
;
D
2
(
x
)
=
2
D
1
(
x
)
=
[
a
2
2
,
b
2
2
]
\begin{equation} E_2(x)=2E_1(x)=[a,b];D_2(x)=2D_1(x)=[\frac{a^2}{2},\frac{b^2}{2}] \end{equation}
- 期望与方差的公式,需要记住,后面会用,后面公式前提是单独重复实验
D
(
x
)
=
E
(
x
2
)
−
E
2
(
x
)
,
D
n
(
x
)
=
n
D
(
x
)
,
E
n
(
x
)
=
n
E
(
x
)
\begin{equation} D(x)=E(x^2)-E^2(x),D_n(x)=nD(x),E_n(x)=nE(x) \end{equation}
3. 随机抽样AB
假设我们有一个1000000长度的矩阵AB,这样我们对AB单独进行计算是不显示的,我们希望通过概率的方式对矩阵AB进行采样形成新的矩阵CR,具体如下:
A
S
=
[
a
1
a
2
a
3
]
[
s
11
0
0
0
0
s
32
]
=
[
s
11
a
1
s
32
a
3
]
\begin{equation} AS=\begin{bmatrix}a_1&a_2&a_3\end{bmatrix}\begin{bmatrix}s_{11}&0\\0&0\\0&s_{32}\end{bmatrix}=\begin{bmatrix}s_{11}a_1&s_{32}a_3\end{bmatrix} \end{equation}
AS=[a1a2a3]
s110000s32
=[s11a1s32a3]
S
T
B
=
[
s
11
0
0
0
0
s
32
]
[
b
1
T
b
2
T
b
3
T
]
=
[
s
11
b
1
T
s
32
b
3
T
]
\begin{equation} S^TB=\begin{bmatrix}s_{11}&0&0\\0&0&s_{32}\end{bmatrix}\begin{bmatrix}b_1^T\\b_2^T\\b_3^T\end{bmatrix}=\begin{bmatrix}s_{11}b_1^T\\s_{32}b_3^T\end{bmatrix} \end{equation}
STB=
s110000s32
b1Tb2Tb3T
=
s11b1Ts32b3T
A
S
S
T
B
=
[
a
1
a
2
a
3
]
[
s
11
0
0
0
0
s
32
]
[
s
11
0
0
0
0
s
32
]
[
b
1
T
b
2
T
b
3
T
]
=
s
11
2
a
1
b
1
T
+
s
32
2
a
3
b
3
T
\begin{equation} ASS^TB=\begin{bmatrix}a_1&a_2&a_3\end{bmatrix}\begin{bmatrix}s_{11}&0\\0&0\\0&s_{32}\end{bmatrix}\begin{bmatrix}s_{11}&0&0\\0&0&s_{32}\end{bmatrix}\begin{bmatrix}b_1^T\\b_2^T\\b_3^T\end{bmatrix}=s_{11}^2a_1b_1^T+s_{32}^2a_3b_3^T \end{equation}
ASSTB=[a1a2a3]
s110000s32
s110000s32
b1Tb2Tb3T
=s112a1b1T+s322a3b3T
- 我们定义从矩阵A的列向量中随机抽取到第 j 列的概率为
q
j
q_j
p
=
1
s
q
j
p=\frac{1}{\sqrt{sq_j}}
- 单次实验的期望值
E
1
(
x
)
E_1(x)
E
1
(
x
)
=
∑
j
=
1
n
q
j
1
s
q
j
a
j
⋅
1
s
q
j
b
j
T
=
∑
j
=
1
n
1
s
a
j
b
j
T
=
∑
j
=
1
n
a
j
b
j
T
s
=
A
B
s
\begin{equation} E_1(x)=\sum_{j=1}^nq_j\frac{1}{\sqrt{sq_j}}a_j \cdot\frac{1}{\sqrt{sq_j}}b^T_j=\sum_{j=1}^n\frac{1}{s}a_jb_j^T=\frac{\sum_{j=1}^n a_jb_j^T}{s}=\frac{AB}{s} \end{equation}
- s 次重复实验后的期望
E
s
(
x
)
E_s(x)
E
s
(
x
)
=
s
E
1
(
x
)
=
s
⋅
A
B
s
=
A
B
\begin{equation} E_s(x)=sE_1(x)=s\cdot \frac{AB}{s}=AB \end{equation}
是不是很完美,这样的话,我们就能够在随机采样中,虽然得到的每个矩阵都不是对的,但是组合起来的矩阵CR的期望值居然是AB - 单次实验的方差
D
1
(
x
)
D_1(x)
E
(
x
)
E(x)
E
(
x
2
)
E(x^2)
D
1
(
x
)
=
E
(
x
2
)
−
E
(
x
)
2
\begin{equation} D_1(x)=E(x^2)-E(x)^2 \end{equation}
- 我们知道单个列采样表示的秩为1的矩阵如下
X
j
=
[
a
j
s
q
j
,
b
j
T
s
q
j
]
→
X
j
2
=
[
a
j
2
s
q
j
,
(
b
j
T
)
2
s
q
j
]
\begin{equation} X_j=[\frac{a_j}{\sqrt{sq_j}},\frac{b_j^T}{\sqrt{sq_j}}]\rightarrow X_j^2=[\frac{a_j^2}{sq_j},\frac{{(b_j^T)}^2}{sq_j}] \end{equation}
- 那么可得
E
(
x
2
)
E(x^2)
E
(
x
2
)
=
∑
j
=
1
n
q
j
a
j
2
s
q
j
⋅
(
b
j
T
)
2
s
q
j
\begin{equation} E(x^2)=\sum_{j=1}^nq_j\frac{a_j^2}{sq_j}\cdot\frac{{(b_j^T)}^2}{sq_j} \end{equation}
- 根据
D
(
x
)
=
E
(
x
2
)
−
E
2
(
x
)
D(x)=E(x^2)-E^2(x)
D
(
x
)
=
∑
j
=
1
n
q
j
a
j
2
(
b
j
T
)
2
s
2
q
j
2
−
∣
∣
A
B
∣
∣
F
2
s
2
\begin{equation} D(x)=\sum_{j=1}^nq_j\frac{a_j^2{(b_j^T)}^2}{s^2q_j^2}-\frac{||AB||_F^2}{s^2} \end{equation}
- 那么
s
次重复实验的D
s
(
x
)
D_s(x)
D
s
(
x
)
=
s
D
(
x
)
=
s
∑
j
=
1
n
q
j
a
j
2
(
b
j
T
)
2
s
2
q
j
2
−
s
∣
∣
A
B
∣
∣
F
2
s
2
\begin{equation} D_s(x)=sD(x)=s\sum_{j=1}^nq_j\frac{a_j^2{(b_j^T)}^2}{s^2q_j^2}-s\frac{||AB||_F^2}{s^2} \end{equation}
D
s
(
x
)
=
∑
j
=
1
n
a
j
2
(
b
j
T
)
2
s
q
j
−
∣
∣
A
B
∣
∣
F
2
s
\begin{equation} D_s(x)=\sum_{j=1}^n\frac{a_j^2{(b_j^T)}^2}{sq_j}-\frac{||AB||_F^2}{s} \end{equation}
- 那当
q
j
q_j
∑
j
=
1
n
q
j
=
1
\sum_{j=1}^nq_j=1
D
s
(
x
)
D_s(x)
对于约束条件下求最小值,我们一般会用到拉格朗日乘子法!!!
4. 拉格朗日乘子法
根据条件
∑
j
=
1
n
q
j
=
1
\sum_{j=1}^nq_j=1
∑j=1nqj=1,求
arg
m
i
n
D
s
(
x
)
\arg\limits_{min}D_s(x)
minargDs(x),构建拉格朗日乘子方程如下
F
(
x
,
q
j
,
λ
)
=
∑
j
=
1
n
a
j
2
(
b
j
T
)
2
s
q
j
−
∣
∣
A
B
∣
∣
F
2
s
+
λ
(
∑
j
=
1
n
q
j
−
1
)
\begin{equation} F(x,q_j,\lambda)=\sum_{j=1}^n\frac{a_j^2{(b_j^T)}^2}{sq_j}-\frac{||AB||_F^2}{s}+\lambda(\sum_{j=1}^nq_j-1) \end{equation}
F(x,qj,λ)=j=1∑nsqjaj2(bjT)2−s∣∣AB∣∣F2+λ(j=1∑nqj−1)
- 求偏导可得
∂
F
(
x
,
q
j
,
λ
)
∂
q
j
=
−
a
j
2
(
b
j
T
)
2
s
q
j
2
+
λ
=
0
\begin{equation} \frac{\partial F(x,q_j,\lambda)}{\partial q_j}=-\frac{a_j^2{(b_j^T)}^2}{sq_j^2}+\lambda=0 \end{equation}
- 整理可得:
a
j
2
(
b
j
T
)
2
s
q
j
2
=
λ
→
q
j
=
a
j
b
j
T
s
λ
,
∑
j
=
1
n
q
j
=
1
\begin{equation} \frac{a_j^2{(b_j^T)}^2}{sq_j^2}=\lambda\rightarrow q_j=\frac{a_jb_j^T}{\sqrt{s\lambda}},\sum_{j=1}^nq_j=1 \end{equation}
- 代入可得:
∑
j
=
1
n
q
j
=
∑
j
=
1
n
a
j
b
j
T
s
λ
=
1
\begin{equation} \sum_{j=1}^nq_j=\frac{\sum_{j=1}^na_jb_j^T}{\sqrt{s\lambda}}=1 \end{equation}
- 整理可得:
s
λ
=
∑
j
=
1
n
a
j
b
j
T
=
A
B
\begin{equation} \sqrt{s\lambda}=\sum_{j=1}^na_jb_j^T=AB \end{equation}
- 代入公式18,可得:
q
j
=
a
j
b
j
T
s
λ
=
a
j
b
j
T
∑
j
=
1
n
a
j
b
j
T
\begin{equation} q_j=\frac{a_jb_j^T}{\sqrt{s\lambda}}=\frac{a_jb_j^T}{\sum_{j=1}^na_jb_j^T} \end{equation}
- 小结:
当我们的概率按照q
j
=
a
j
b
j
T
∑
j
=
1
n
a
j
b
j
T
q_j=\frac{a_jb_j^T}{\sum_{j=1}^na_jb_j^T}