Machine Learning: Data Standardization with sklearn.preprocessing

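To remove the effect of differing units among indicators, the data should be standardized.
Different evaluation indicators often have different dimensions and units. When the magnitudes of the indicators differ greatly, analyzing the raw values directly overemphasizes the indicators with large values and relatively weakens the contribution of those with small values.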

Formulas involved:

Mean: x_mean=(x1+x2+...+xn)/n

Variance: s^2=((x1-x_mean)^2+(x2-x_mean)^2+...+(xn-x_mean)^2)/n

p-norm: ||x||_p=(|x1|^p+|x2|^p+...+|xn|^p)^(1/p)
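As a quick check of these three formulas, here is a minimal NumPy sketch. The sample vector `x` is made up for illustration and is not part of the original article.

```python
import numpy as np

x = np.array([3.0, 4.0])  # illustrative data, chosen so the results are easy to verify

x_mean = x.sum() / x.size                      # (x1 + ... + xn) / n
variance = ((x - x_mean) ** 2).sum() / x.size  # population variance (divide by n)
p = 2
p_norm = (np.abs(x) ** p).sum() ** (1 / p)     # p-norm with p = 2

# The hand-written formulas agree with NumPy's built-ins.
assert np.isclose(x_mean, np.mean(x))
assert np.isclose(variance, np.var(x))               # np.var uses ddof=0 by default
assert np.isclose(p_norm, np.linalg.norm(x, ord=p))

print(x_mean, variance, p_norm)
```

Note that the variance here divides by n, not n-1, which is also what the scalers in sklearn.preprocessing use.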

The main preprocessing methods are standardization (z-score), min-max scaling (MinMaxScaler), normalization, and binarization.

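(1) Standardization (z-score)
After standardization, each feature's values are centered at 0 with variance 1. It is suited to cases where the feature's maximum and minimum are unknown, or where the data contain outliers. Z-score standardization is a good choice for classification and clustering algorithms that rely on distance measures, or when PCA is used for dimensionality reduction.
Formula: (x-x_mean)/x_std
Method 1: preprocessing.scale()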

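from sklearn import preprocessing
x_scaled = preprocessing.scale(x, axis=0, with_mean=True, with_std=True, copy=True)
# x: array or matrix to scale
# axis: axis along which the mean and std are computed; 0 standardizes each feature (column), 1 each sample (row)
# with_mean: boolean, default True; center the data to zero mean
# with_std: boolean, default True; scale the data to unit variance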
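To make the effect of the parameters concrete, the following sketch applies scale() to a small made-up array (the data and variable names are illustrative only) along each axis, and once with with_std=False.

```python
import numpy as np
from sklearn import preprocessing

# Illustrative data: 2 samples (rows), 3 features (columns).
x = np.array([[1.0, 2.0, 3.0],
              [4.0, 6.0, 8.0]])

by_column = preprocessing.scale(x, axis=0)              # standardize each feature
by_row = preprocessing.scale(x, axis=1)                 # standardize each sample
centered_only = preprocessing.scale(x, with_std=False)  # subtract the column mean only

print(by_column)      # every column has mean 0 and std 1
print(by_row)         # every row has mean 0 and std 1
print(centered_only)  # column means are 0, spread is unchanged
```

In most modeling workflows axis=0 is the setting that matches "standardize each feature"; axis=1 is shown only to illustrate what the parameter changes.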

Method 2: preprocessing.StandardScaler()

from sklearn import preprocessing
scaler=preprocessing.StandardScaler()
x_scaled=scaler.fit_transform(x)
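One practical difference between the two methods is that a StandardScaler object remembers the mean and standard deviation it learned, so the same transformation can be applied to data it has not seen. The sketch below assumes a made-up training array `x_train` and a new sample `x_new`; both names and values are illustrative.

```python
import numpy as np
from sklearn import preprocessing

# Illustrative data: two features on very different scales.
x_train = np.array([[1.0, 100.0],
                    [2.0, 200.0],
                    [3.0, 300.0]])
x_new = np.array([[2.0, 400.0]])

scaler = preprocessing.StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)  # learn per-column mean/std, then scale
x_new_scaled = scaler.transform(x_new)          # reuse the statistics learned above

print(scaler.mean_)                   # per-feature mean learned from x_train
print(x_train_scaled.mean(axis=0))    # approximately 0 for each feature
print(x_train_scaled.std(axis=0))     # approximately 1 for each feature
print(x_new_scaled)

# On the data it was fitted on, the scaler matches preprocessing.scale().
assert np.allclose(x_train_scaled, preprocessing.scale(x_train))
```

Fitting on the training data and only calling transform() on later data keeps the two sets on the same scale.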

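(2) Min-max scaling (MinMaxScaler)
Also called range (deviation) normalization; the result is mapped to the interval [0, 1]. The motivation is robustness for features with very small variance and preserving zero entries in sparse matrices.
Formula: (x-x_min)/(x_max-x_min)
The variant (x-x_mean)/(x_max-x_min) centers the data at 0, so its values fall within [-1, 1] but generally do not reach both ends. To map exactly onto [-1, 1], use 2*(x-x_min)/(x_max-x_min)-1, or equivalently MinMaxScaler(feature_range=(-1, 1)).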

from sklearn import preprocessing
min_max_scaler=preprocessing.MinMaxScaler()
x_minmax=min_max_scaler.fit_transform(x)
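The following sketch compares MinMaxScaler with the min-max formula written out by hand, and shows the feature_range parameter for a target interval other than [0, 1]. The array `x` is made up for illustration.

```python
import numpy as np
from sklearn import preprocessing

# Illustrative data: 3 samples, 2 features.
x = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [4.0, 40.0]])

# Default range [0, 1].
x_01 = preprocessing.MinMaxScaler().fit_transform(x)

# Same scaler, but mapped onto [-1, 1] via feature_range.
x_pm1 = preprocessing.MinMaxScaler(feature_range=(-1, 1)).fit_transform(x)

# The formula (x - x_min) / (x_max - x_min), applied per column.
manual = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))

assert np.allclose(x_01, manual)
assert np.allclose(x_pm1, 2 * manual - 1)

print(x_01)
print(x_pm1)
```

Because the scaling depends only on the minimum and maximum of each column, a single extreme value can compress all other values into a narrow band.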

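(3) Normalization
Scales each sample (row) to unit norm, so that every sample has a norm of 1. This is useful when a quadratic form such as the dot product, or another kernel, is used to measure the similarity between two samples. It is also the basis of the Vector Space Model commonly used in text classification and clustering.
Method 1: the preprocessing.normalize() function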

from sklearn import preprocessing
x_normalized=preprocessing.normalize(x,norm='l2')

Method 2: the preprocessing.Normalizer class

normalizer = preprocessing.Normalizer().fit(x) 
#Normalizer(copy=True, norm='l2')
normalizer.transform(x)
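To see what "unit norm per sample" means in practice, the sketch below normalizes a small made-up array with both the l2 and l1 norms and checks the row norms. The data and variable names are illustrative.

```python
import numpy as np
from sklearn import preprocessing

# Illustrative data: each row is one sample.
x = np.array([[3.0, 4.0],
              [1.0, 0.0]])

x_l2 = preprocessing.normalize(x, norm='l2')  # each row divided by its Euclidean length
x_l1 = preprocessing.normalize(x, norm='l1')  # each row divided by the sum of absolute values

row_norms = np.linalg.norm(x_l2, axis=1)      # all equal to 1 after l2 normalization

# After l2 normalization, the dot product of two rows is their cosine similarity.
cosine = x_l2 @ x_l2.T

# The Normalizer class gives the same result as the function.
x_cls = preprocessing.Normalizer(norm='l2').fit_transform(x)
assert np.allclose(x_cls, x_l2)

print(x_l2)
print(row_norms)
print(cosine[0, 1])
```

Unlike the scalers in (1) and (2), this operation works row by row, so it does not learn anything from the data; fit() on a Normalizer only validates its input.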

(4) Binarization
Overview: converts numerical features into boolean (0/1) values.

binarizer=preprocessing.Binarizer().fit(x)
#Binarizer(copy=True, threshold=0.0)
binarizer.transform(x)
# threshold: the cutoff value; values greater than the threshold become 1, values less than or equal to it become 0.
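A small sketch of how the threshold behaves, including a value that sits exactly on the threshold. The array and the threshold of 1.0 are made up for illustration.

```python
import numpy as np
from sklearn import preprocessing

# Illustrative data; the first entry equals the threshold exactly.
x = np.array([[1.0, -1.0, 2.0],
              [2.0, 0.0, 0.5]])

binarizer = preprocessing.Binarizer(threshold=1.0)
x_bin = binarizer.fit_transform(x)

# Only values strictly greater than 1.0 map to 1.
print(x_bin)
# [[0. 0. 1.]
#  [1. 0. 0.]]
```

The comparison is strict, so a value equal to the threshold is mapped to 0.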