Predicting University Admissions with Python


A university education is becoming a key pillar of social and economic life in the twenty-first century. It is crucial not only for the educational experience itself but also because it helps secure two important things: a good job and financial stability. On the other hand, predicting university admissions can be extremely challenging, because students generally do not know the admission criteria.

In this tutorial, we will therefore build our own university admission prediction model using the Python programming language.


Dataset Introduction

When applying for a master's degree abroad, there are several variables to consider. You need a decent GRE score, an SOP (statement of purpose), a letter of recommendation, and more. If you are not from an English-speaking country, you also need to submit a TOEFL score.

You can access the dataset here. The dataset includes the following attributes:

  1. GRE Score (out of 340)
  2. TOEFL Score (out of 120)
  3. University Rating (out of 5)
  4. Statement of Purpose and Letter of Recommendation strength (out of 5)
  5. Undergraduate GPA (out of 10)
  6. Research Experience (0 or 1)
  7. Chance of Admit (from 0 to 1)

Implementing University Admission Prediction in Python

We will break the implementation down into the following steps.

Step 1: Importing the Necessary Modules/Libraries

import numpy as np                                       # numerical arrays
import pandas as pd                                      # data loading and manipulation
from sklearn.model_selection import train_test_split    # train/test splitting
from keras.models import Sequential                      # sequential model container
from keras.layers import Dense                           # fully connected layers
from keras.wrappers.scikit_learn import KerasRegressor   # scikit-learn-style wrapper

Step 2: Loading the Dataset into the Program

df = pd.read_csv('Admission_Predict.csv')  # load the admissions data
df.head()                                  # preview the first five rows

First five rows of the University Admission Prediction dataset
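If you want to verify the column names and value ranges before preprocessing, here is a quick optional check using standard pandas calls:

print(df.columns.tolist())  # confirm the column names, including "Serial No."
print(df.describe())        # summary statistics to sanity-check the score ranges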

Step 3: Data Preprocessing and Data Splitting

Before building our main model, we need to do some preprocessing, which includes dropping any columns the model does not need.

Here, the "Serial No." column is not needed for admission prediction, so we remove it from the data.

df = df.drop("Serial No.", axis=1)  # drop the index-like serial number column

After that, we split the dataset into two parts, X and Y, where X holds all the input features and Y holds the final admission probability.

Y = np.array(df[df.columns[-1]])               # target: the Chance of Admit column
X = np.array(df.drop(df.columns[-1], axis=1))  # features: all remaining columns

Now, the next step is to split the dataset into training and testing sets using the 80:20 rule, where 80% of the data is used for training and the remaining 20% for testing.

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
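A quick shape check confirms the 80:20 split. The exact row counts depend on which version of the CSV you downloaded; the 400-row version would give something like the values in the comments:

print(X_train.shape, X_test.shape)  # e.g. (320, 7) and (80, 7)
print(y_train.shape, y_test.shape)  # e.g. (320,) and (80,)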

Preprocessing also involves scaling the features, which is done with the code below. Note that the scaler is fitted on the training set only and then reused to transform the test set, so no information leaks from the test data.

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)  # fit the scaler on the training data only
X_test = scaler.transform(X_test)        # apply the same scaling to the test data

Step 4: Building the Model

The code below defines the main function that describes the whole model, declaring the model and adding layers to it.

The function also compiles the model and specifies the loss that will be minimized.

def baseline_model():
    model = Sequential()
    # input layer: 7 features in, 16 units out
    model.add(Dense(16, input_dim=7, activation='relu'))
    # three more hidden layers; input_dim is only needed on the first layer
    model.add(Dense(16, activation='relu'))
    model.add(Dense(16, activation='relu'))
    model.add(Dense(16, activation='relu'))
    # a single linear output: the predicted chance of admission
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model
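As an optional sanity check before training, Keras models expose a summary() method that prints the layer stack:

baseline_model().summary()  # four hidden Dense(16) layers plus one linear output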

Step 5: Training the Model

The next step is to create the model object and train it on the training dataset, as shown in the code below. You can set the number of epochs to whatever you prefer.

estimator = KerasRegressor(build_fn=baseline_model, epochs=50, batch_size=3, verbose=1)
estimator.fit(X_train, y_train)
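Note that keras.wrappers.scikit_learn has been removed from recent Keras releases. If the import from Step 1 fails in your environment, the standalone scikeras package provides an equivalent wrapper; a minimal sketch, assuming scikeras is installed:

from scikeras.wrappers import KerasRegressor

# scikeras takes `model=` instead of `build_fn=`
estimator = KerasRegressor(model=baseline_model, epochs=50, batch_size=3, verbose=1)
estimator.fit(X_train, y_train)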

The training output is as follows:

Epoch 1/50
107/107 [==============================] - 1s 3ms/step - loss: 0.1087
Epoch 2/50
107/107 [==============================] - 0s 4ms/step - loss: 0.0065
Epoch 3/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0057
Epoch 4/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0052
Epoch 5/50
107/107 [==============================] - 0s 4ms/step - loss: 0.0049
Epoch 6/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0050
Epoch 7/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0047
Epoch 8/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0049
Epoch 9/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0044
Epoch 10/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0043
Epoch 11/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0044
Epoch 12/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0044
Epoch 13/50
107/107 [==============================] - 0s 4ms/step - loss: 0.0043
Epoch 14/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0041
Epoch 15/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0043
Epoch 16/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0042
Epoch 17/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0040
Epoch 18/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0043
Epoch 19/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0039
Epoch 20/50
107/107 [==============================] - 0s 4ms/step - loss: 0.0040
Epoch 21/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0039
Epoch 22/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0042
Epoch 23/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0040
Epoch 24/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0038
Epoch 25/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0042
Epoch 26/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0038
Epoch 27/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0040
Epoch 28/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0042
Epoch 29/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0039
Epoch 30/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0037
Epoch 31/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0038
Epoch 32/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0043
Epoch 33/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0040
Epoch 34/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0037
Epoch 35/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0039
Epoch 36/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0037
Epoch 37/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0038
Epoch 38/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0036
Epoch 39/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0036
Epoch 40/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0036
Epoch 41/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0037
Epoch 42/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0037
Epoch 43/50
107/107 [==============================] - 0s 4ms/step - loss: 0.0036
Epoch 44/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0037
Epoch 45/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0037
Epoch 46/50
107/107 [==============================] - 0s 4ms/step - loss: 0.0038
Epoch 47/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0036
Epoch 48/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0037
Epoch 49/50
107/107 [==============================] - 0s 4ms/step - loss: 0.0037
Epoch 50/50
107/107 [==============================] - 0s 3ms/step - loss: 0.0034
<keras.callbacks.History at 0x7f10c0173e10>
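The History object shown above is the return value of fit and records the loss for every epoch. If you capture it, you can plot the learning curve; a short sketch, assuming matplotlib is installed:

import matplotlib.pyplot as plt

history = estimator.fit(X_train, y_train)  # fit returns a Keras History object
plt.plot(history.history['loss'])          # training loss per epoch
plt.xlabel('Epoch')
plt.ylabel('Mean squared error')
plt.show()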

Step 6: Testing the Model

Now, let's try to predict values for the test dataset and compare them with the original values.

prediction = estimator.predict(X_test)
print("ORIGINAL DATA")
print(y_test)
print()
print("PREDICTED DATA")
print(prediction)

The output looks something like this:

ORIGINAL DATA
[0.71 0.7  0.79 0.73 0.72 0.48 0.77 0.71 0.9  0.94 0.58 0.89 0.72 0.57
 0.78 0.42 0.64 0.84 0.63 0.72 0.9  0.83 0.57 0.47 0.85 0.67 0.44 0.54
 0.92 0.62 0.68 0.73 0.73 0.61 0.55 0.74 0.64 0.89 0.73 0.95 0.71 0.72
 0.75 0.76 0.86 0.7  0.39 0.79 0.61 0.64 0.71 0.8  0.61 0.89 0.68 0.79
 0.78 0.52 0.76 0.88 0.74 0.49 0.65 0.59 0.87 0.89 0.81 0.9  0.8  0.76
 0.68 0.87 0.68 0.64 0.91 0.61 0.69 0.62 0.93 0.43]

PREDICTED DATA
[0.64663166 0.6811929  0.77187485 0.59903866 0.70518774 0.5707331
 0.6844891  0.6232987  0.8559068  0.9225058  0.50917023 0.9055291
 0.6913604  0.40199894 0.8595592  0.6155516  0.5891675  0.793468
 0.5415057  0.7054745  0.8786436  0.8063141  0.55548865 0.3587063
 0.77944946 0.5391258  0.43374807 0.62050253 0.90883577 0.6109837
 0.64160395 0.7341113  0.73316455 0.5032365  0.7664028  0.76009744
 0.59858805 0.86267006 0.60282356 0.94984144 0.7196544  0.63529354
 0.7032968  0.8164513  0.8044792  0.6359613  0.54865533 0.6914524
 0.589018   0.55952907 0.6446153  0.77345765 0.6449453  0.8998446
 0.68746895 0.74362046 0.71107167 0.73258513 0.7594558  0.8374823
 0.7504637  0.4027493  0.61975926 0.46762955 0.8579673  0.814696
 0.7111042  0.8707262  0.7539967  0.7515583  0.5506843  0.8436626
 0.8139006  0.5593421  0.933276   0.61958474 0.6084135  0.63294107
 0.9234169  0.44476634]

You can see that the predicted values match the originals fairly well. But to be sure, let's also compute the mean error.

Step 7: Computing the Mean Error

test_error = np.abs(y_test - prediction)  # absolute error for each test sample
mean_error = np.mean(test_error)          # mean absolute error

print("Mean Error: ", mean_error)

The mean error comes out to 0.0577927375137806, which is low enough to say that our results are quite accurate.
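Before wrapping up, here is how you might score a single new applicant. The feature values below are purely hypothetical, and the input must go through the same scaler fitted on the training data:

# Hypothetical applicant: GRE, TOEFL, University Rating, SOP, LOR, CGPA, Research
new_applicant = np.array([[320, 110, 4, 4.5, 4.0, 9.1, 1]])
new_applicant = scaler.transform(new_applicant)  # reuse the fitted scaler
print(estimator.predict(new_applicant))          # predicted chance of admission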


Conclusion

Congratulations! You have just learned how to build your own university admission predictor. Hope you enjoyed it! 😇

Thanks for your time, and I hope you learned something new!! 😄