# Machine Learning with Python (with Code and Learning Resources)

"Can't we just plot the data and read the conclusion straight off the chart?"

http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression

https://towardsdatascience.com/simple-and-multiple-linear-regression-in-python-c928425168f9?gi=69160943145f

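```python
# Split the data into training and test sets
# (sklearn.cross_validation was removed; the function now lives in model_selection)
from sklearn.model_selection import train_test_split

# Automatically generate polynomial features
from sklearn.preprocessing import PolynomialFeatures

# Linear regression and a regularized regression (LASSO with cross-validation)
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LassoCV

# Chain preprocessing steps and a model into a single estimator
from sklearn.pipeline import make_pipeline
```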

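Scikit-learn provides a way to generate polynomial features from a set of linear features. All you need to do is pass in the linear features and specify the maximum degree of the polynomial terms you want. Through the `interaction_only` option it also lets you choose between generating the full set of polynomial terms (pure powers plus cross terms) and generating only the cross (interaction) terms. The scikit-learn example linked below demonstrates this in Python.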

http://scikit-learn.org/stable/auto_examples/linear_model/plot_polynomial_interpolation.html#sphx-glr-auto-examples-linear-model-plot-polynomial-interpolation-py
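As a minimal sketch of the polynomial feature generation described above, the snippet below expands two linear features to degree 2, once with all terms and once with `interaction_only=True`. The array `X` and its values are made up for illustration and do not come from the article.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two linear features, a and b, for two samples (made-up values).
X = np.array([[2.0, 3.0],
              [4.0, 5.0]])

# All degree-2 terms, in the order: 1, a, b, a^2, a*b, b^2
poly_full = PolynomialFeatures(degree=2, interaction_only=False)
X_full = poly_full.fit_transform(X)

# interaction_only=True drops the pure powers and keeps: 1, a, b, a*b
poly_inter = PolynomialFeatures(degree=2, interaction_only=True)
X_inter = poly_inter.fit_transform(X)

print(X_full.shape)   # 6 columns per sample
print(X_inter.shape)  # 4 columns per sample
print(X_full[0])      # first sample expanded: 1, 2, 3, 4, 6, 9
```

Note that the number of generated columns grows quickly with both the degree and the number of input features, which is one reason a regularized model is often paired with this step.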

https://www.analyticsvidhya.com/blog/2016/01/complete-tutorial-ridge-lasso-regression-python/

https://www.quora.com/What-is-a-pipeline-and-baseline-in-machine-learning-algorithms

https://medium.com/@yanhann10/a-brief-view-of-machine-learning-pipeline-in-python-5f50b941fca8

https://www.oreilly.com/ideas/building-and-deploying-large-scale-machine-learning-pipelines

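Scikit-learn provides a pipeline utility that chains several data-preprocessing classes and a model into a single object, taking you from raw data to a usable model.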

http://scikit-learn.org/stable/tutorial/statistical_inference/putting_together.html
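To make the pipeline idea concrete, here is a small sketch that chains `PolynomialFeatures` and `LinearRegression` with `make_pipeline`. The data is synthetic and noise-free (an assumed quadratic, chosen only so the result is easy to check); it is not the article's dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic, noise-free data (illustrative only): y = 1 + 2x + 3x^2
x = np.linspace(-1.0, 1.0, 20).reshape(-1, 1)
y = 1.0 + 2.0 * x.ravel() + 3.0 * x.ravel() ** 2

# The pipeline first expands x into [1, x, x^2], then fits a linear model on it.
pipe = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
pipe.fit(x, y)

# predict() reapplies the same feature expansion before calling the regressor.
y_new = pipe.predict(np.array([[0.5]]))
r2 = pipe.score(x, y)
print(y_new, r2)
```

Because the preprocessing lives inside the estimator, the same object can be passed to `fit`, `predict`, and `score` without transforming the inputs by hand each time.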

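```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# LASSO regression settings
lasso_eps = 0.0001
lasso_nalpha = 20
lasso_iter = 5000

# Minimum and maximum degree of the polynomial features
degree_min = 2
degree_max = 8

# Train/test split. `df` (a DataFrame with columns 'X' and 'y') and
# `test_set_fraction` are assumed to be defined earlier.
# df[['X']] keeps X two-dimensional, as scikit-learn estimators require.
X_train, X_test, y_train, y_test = train_test_split(
    df[['X']], df['y'], test_size=test_set_fraction)

# Build and evaluate one pipeline model per polynomial degree
for degree in range(degree_min, degree_max + 1):
    # The original passed normalize=True to LassoCV; that parameter was removed
    # in scikit-learn 1.2, so a StandardScaler step is used as a close substitute.
    model = make_pipeline(
        PolynomialFeatures(degree, interaction_only=False),
        StandardScaler(),
        LassoCV(eps=lasso_eps, n_alphas=lasso_nalpha,
                max_iter=lasso_iter, cv=5))
    model.fit(X_train, y_train)
    test_pred = np.array(model.predict(X_test))
    # Root-mean-square error: mean (not sum) of the squared residuals
    RMSE = np.sqrt(np.mean(np.square(test_pred - y_test)))
    test_score = model.score(X_test, y_test)  # R^2 on the test set
```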
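The loop above relies on a `df` and a `test_set_fraction` defined elsewhere. As a self-contained sketch of the same idea, the version below runs the degree sweep on synthetic data and keeps the degree with the lowest test RMSE. Everything about the data is an assumption for illustration: the cubic ground truth, the noise level, the 0.3 test fraction, and the `results` list are not from the article. `StandardScaler` stands in for the `normalize=True` option, and `n_alphas` is left at its default.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data (assumed for illustration): a noisy cubic in one feature.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(200, 1))
y = 0.5 * X[:, 0] ** 3 - X[:, 0] + rng.normal(0.0, 0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

results = []  # one (degree, rmse, r2) tuple per polynomial degree
for degree in range(2, 9):
    model = make_pipeline(
        PolynomialFeatures(degree, include_bias=False),
        StandardScaler(),
        LassoCV(eps=0.0001, max_iter=5000, cv=5),
    )
    model.fit(X_train, y_train)
    test_pred = model.predict(X_test)
    rmse = np.sqrt(np.mean(np.square(test_pred - y_test)))
    results.append((degree, rmse, model.score(X_test, y_test)))

# Keep the degree with the lowest test RMSE.
best_degree, best_rmse, best_r2 = min(results, key=lambda r: r[1])
for degree, rmse, r2 in results:
    print(f"degree={degree}  RMSE={rmse:.3f}  R^2={r2:.3f}")
print("best degree:", best_degree)
```

Selecting the degree on the test set, as done here for brevity, makes the reported test error optimistic; a separate validation split would be the more careful choice.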

https://dataorigami.net/blogs/napkin-folding/79033923-least-squares-regression-with-l1-penalty

(The author refers to his own code at this point; as before, it is omitted because the GitHub link no longer works.)