Getting the principal components of a matrix with svd()
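X_centered = X - X.mean(axis=0)      # center: subtract each column's mean
U, s, V = np.linalg.svd(X_centered)  # V here is V^T: one component per row
c1 = V.T[:, 0]                       # first principal component
c2 = V.T[:, 1]                       # second principal component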
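Before extracting the principal components, center the training set by subtracting each feature's mean (the first line of code).
np.linalg.svd returns the transposed matrix V^T as its third value, so the V obtained here holds one principal component per row; c1 and c2 are the first and second principal components.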
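One way to confirm that the third return value is already transposed is to rebuild the centered matrix from the three factors. The sketch below is only an illustration: the fixed seed and the name Vt are my own choices, not part of the original code.

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed, chosen only for repeatability
X = rng.random((6, 6))                # stand-in for the training set
X_centered = X - X.mean(axis=0)

# Third return value named Vt here to stress that it is V^T
U, s, Vt = np.linalg.svd(X_centered)

# SVD factorization: X_centered = U @ diag(s) @ V^T
reconstructed = (U * s) @ Vt          # U * s scales column j of U by s[j]
print(np.allclose(X_centered, reconstructed))
```

If the product only matches when Vt is used as returned, without a further transpose, the rows of that matrix are the right singular vectors, i.e. the principal components.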
Question: why transpose the principal-component matrix with .T?
import numpy as np
X = np.random.random([6, 6])
X_centered = X - X.mean(axis=0)
U, s, V = np.linalg.svd(X_centered)
c1 = V.T[:, 0]
c2 = V.T[:, 1]
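print("Original matrix (training set)")
print(X)
print("="*8 + "Centered training set")
print(X_centered)
print("="*8 + "Principal-component matrix V")
print(V)
print("="*8 + "Principal component 1 without transpose")
print(V[:, 0])
print("="*8 + "Principal component 1 with transpose")
print(c1)
print("="*8 + "Principal component 2 with transpose")
print(c2)
The code above prints:
Original matrix (training set)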
[[0.48925597 0.10007899 0.24868701 0.24582956 0.01189903 0.92280551]
[0.58452584 0.08522259 0.4727072 0.16183505 0.37137143 0.28484798]
[0.83314739 0.6253953 0.5853588 0.943809 0.39430325 0.73108571]
[0.26039373 0.52392961 0.29923633 0.99798253 0.39681654 0.76359518]
[0.34860457 0.61275214 0.89554728 0.03253871 0.41150234 0.92642534]
[0.61715827 0.18374784 0.02390839 0.69360349 0.40653224 0.86420085]]
========Centered training set
[[-0.03292499 -0.25510875 -0.17222049 -0.26677016 -0.32017178 0.17397875]
[ 0.06234488 -0.26996515 0.0517997 -0.35076467 0.03930063 -0.46397878]
[ 0.31096643 0.27020755 0.1644513 0.43120928 0.06223245 -0.01774105]
[-0.26178723 0.16874187 -0.12167117 0.48538281 0.06474574 0.01476842]
[-0.17357639 0.2575644 0.47463978 -0.48006101 0.07943153 0.17759858]
[ 0.09497731 -0.17143991 -0.39699912 0.18100377 0.07446144 0.11537409]]
========Principal-component matrix V
[[ 0.10367429 0.17534132 -0.33069463 0.91277858 0.09734234 0.08067125]
[-0.07961523 0.66591982 0.68886836 0.09517681 0.21607987 0.14115045]
[-0.42666386 0.08883363 -0.20928781 -0.08933653 -0.26604371 0.82915914]
[ 0.87535833 0.0054328 0.10825683 -0.06744275 -0.26173744 0.38593229]
[-0.15981178 0.00365394 0.27606895 0.23488069 -0.87780784 -0.26929025]
[ 0.09528378 0.71963319 -0.53326021 -0.30013023 -0.18439526 -0.25417092]]
========Principal component 1 without transpose
[ 0.10367429 -0.07961523 -0.42666386 0.87535833 -0.15981178 0.09528378]
========Principal component 1 with transpose
[ 0.10367429 0.17534132 -0.33069463 0.91277858 0.09734234 0.08067125]
========Principal component 2 with transpose
[-0.07961523 0.66591982 0.68886836 0.09517681 0.21607987 0.14115045]
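As the output shows, each row of V is one principal component. Without the transpose, V[:, 0] picks out a column of V, which is not a principal component.
In fact, c1 = V.T[:, 0] is equivalent to c1 = V[0, :].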
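The difference between a row and a column of V can also be checked numerically. The sketch below relies on the SVD identity X_centered @ v1 = s[0] * u1, which holds only when v1 is the first right singular vector. The seed and the helper names row_proj and col_proj are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)        # arbitrary fixed seed
X = rng.random((6, 6))
X_centered = X - X.mean(axis=0)
U, s, V = np.linalg.svd(X_centered)   # V is V^T, as in the code above

row_proj = X_centered @ V[0, :]       # project onto the first row of V
col_proj = X_centered @ V[:, 0]       # project onto the first column of V

# Both spellings select the same vector
same_vector = np.array_equal(V.T[:, 0], V[0, :])
# Only the row satisfies X_centered @ v1 == s[0] * u1
row_is_pc = np.allclose(row_proj, s[0] * U[:, 0])

print(same_vector, row_is_pc)
print(np.linalg.norm(row_proj), np.linalg.norm(col_proj), s[0])
```

The projection onto the row has length s[0], the largest achievable for a unit vector. The projection onto the column can never be longer than that and is generally shorter.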
Verification: a matrix slicing experiment
import numpy as np
x = np.random.random([3, 3])
print(x)
print("="*8 + "\nx[:, 0]:")
print(x[:, 0])
print("="*8 + "\nx[0, :]:")
print(x[0, :])
print("="*8 + "\nx.T[:, 0]:")
print(x.T[:, 0])
Output:
[[0.75288091 0.75186556 0.70652022]
[0.71672155 0.36091306 0.1308125 ]
[0.71698569 0.95277093 0.25093763]]
========
x[:, 0]:
[0.75288091 0.71672155 0.71698569]
========
x[0, :]:
[0.75288091 0.75186556 0.70652022]
========
x.T[:, 0]:
[0.75288091 0.75186556 0.70652022]