Industrial Steam Prediction with PaddlePaddle


I. Industrial Steam Volume Prediction

1. Competition Overview

The Tianchi beginner competitions are practice events for newcomers to data science: classic competition problems serve as the learning scenario, and detailed introductory tutorials walk you through data mining step by step. Tianchi hopes these beginner competitions will become popular hands-on data courses at universities and help more students acquire practical data skills.


2. Background

The basic principle of thermal power generation is: burning fuel heats water into steam, the steam's pressure drives a turbine, and the turbine drives a generator to produce electricity. In this chain of energy conversions, the key factor for generation efficiency is the boiler's combustion efficiency, i.e., how effectively burning fuel turns water into high-temperature, high-pressure steam. Combustion efficiency depends on many factors, including the boiler's adjustable parameters (fuel feed rate, primary and secondary air, induced draft, return-material air, feedwater volume) and its operating conditions (bed temperature and pressure, furnace temperature and pressure, superheater temperature, and so on).

3. Problem Description

Given desensitized boiler sensor data (collected at minute-level frequency), predict the amount of steam produced from the boiler's operating conditions.

Data description

The data is split into a training set (train.txt) and a test set (test.txt). The 38 fields "V0"-"V37" are the feature variables and "target" is the target variable. Competitors train a model on the training data and predict the target variable for the test data; rankings are based on the MSE (mean squared error) of the predictions.
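For reference, the ranking metric is just the mean of the squared residuals. A minimal sketch with scikit-learn (the arrays here are made-up placeholders, not competition data):

import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([0.175, 0.676, 0.633])   # placeholder ground-truth steam values
y_pred = np.array([0.180, 0.640, 0.650])   # placeholder model predictions
print(mean_squared_error(y_true, y_pred))  # mean((y_true - y_pred)^2)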

II. Data Processing

1. Reading the Data

import pandas as pd

# The competition files are tab-separated
df = pd.read_csv("data/data178496/zhengqi_train.txt", sep='\t')
df.head()
V0 V1 V2 V3 V4 V5 V6 V7 V8 V9 ... V29 V30 V31 V32 V33 V34 V35 V36 V37 target
0 0.566 0.016 -0.143 0.407 0.452 -0.901 -1.812 -2.360 -0.436 -2.114 ... 0.136 0.109 -0.615 0.327 -4.627 -4.789 -5.101 -2.608 -3.508 0.175
1 0.968 0.437 0.066 0.566 0.194 -0.893 -1.566 -2.360 0.332 -2.114 ... -0.128 0.124 0.032 0.600 -0.843 0.160 0.364 -0.335 -0.730 0.676
2 1.013 0.568 0.235 0.370 0.112 -0.797 -1.367 -2.360 0.396 -2.114 ... -0.009 0.361 0.277 -0.116 -0.843 0.160 0.364 0.765 -0.589 0.633
3 0.733 0.368 0.283 0.165 0.599 -0.679 -1.200 -2.086 0.403 -2.114 ... 0.015 0.417 0.279 0.603 -0.843 -0.065 0.364 0.333 -0.112 0.206
4 0.684 0.638 0.260 0.209 0.337 -0.454 -1.073 -2.086 0.314 -2.114 ... 0.183 1.078 0.328 0.418 -0.843 -0.215 0.364 -0.280 -0.028 0.384

5 rows × 39 columns

df.isnull().sum()  # check for missing values: every count below is zero
V0        0
V1        0
V2        0
V3        0
V4        0
V5        0
V6        0
V7        0
V8        0
V9        0
V10       0
V11       0
V12       0
V13       0
V14       0
V15       0
V16       0
V17       0
V18       0
V19       0
V20       0
V21       0
V22       0
V23       0
V24       0
V25       0
V26       0
V27       0
V28       0
V29       0
V30       0
V31       0
V32       0
V33       0
V34       0
V35       0
V36       0
V37       0
target    0
dtype: int64
df_test = pd.read_csv("data/data178496/zhengqi_test.txt", sep='\t')
# Stack train and test so the normalization statistics cover both sets;
# the test file has no target column, so those rows get NaN targets
df_merge = pd.concat([df, df_test])
df_merge.head()
V0 V1 V2 V3 V4 V5 V6 V7 V8 V9 ... V29 V30 V31 V32 V33 V34 V35 V36 V37 target
0 0.566 0.016 -0.143 0.407 0.452 -0.901 -1.812 -2.360 -0.436 -2.114 ... 0.136 0.109 -0.615 0.327 -4.627 -4.789 -5.101 -2.608 -3.508 0.175
1 0.968 0.437 0.066 0.566 0.194 -0.893 -1.566 -2.360 0.332 -2.114 ... -0.128 0.124 0.032 0.600 -0.843 0.160 0.364 -0.335 -0.730 0.676
2 1.013 0.568 0.235 0.370 0.112 -0.797 -1.367 -2.360 0.396 -2.114 ... -0.009 0.361 0.277 -0.116 -0.843 0.160 0.364 0.765 -0.589 0.633
3 0.733 0.368 0.283 0.165 0.599 -0.679 -1.200 -2.086 0.403 -2.114 ... 0.015 0.417 0.279 0.603 -0.843 -0.065 0.364 0.333 -0.112 0.206
4 0.684 0.638 0.260 0.209 0.337 -0.454 -1.073 -2.086 0.314 -2.114 ... 0.183 1.078 0.328 0.418 -0.843 -0.215 0.364 -0.280 -0.028 0.384

5 rows × 39 columns

2. Data Normalization

Min-max scale every column over the merged train and test rows, so both sets share the same [0, 1] feature ranges.

columns = df_merge.columns
print(columns)
# Record the target's original range before normalizing, so predictions can
# be mapped back to the original scale later (see section V)
target_min = df_merge['target'].min()
target_max = df_merge['target'].max()
# Min-max normalize each column to [0, 1]
for column in columns:
    col = df_merge[column]
    col_min = col.min()
    col_max = col.max()
    normalized = (col - col_min) / (col_max - col_min)
    df_merge[column] = normalized
Index(['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7', 'V8', 'V9', 'V10',
       'V11', 'V12', 'V13', 'V14', 'V15', 'V16', 'V17', 'V18', 'V19', 'V20',
       'V21', 'V22', 'V23', 'V24', 'V25', 'V26', 'V27', 'V28', 'V29', 'V30',
       'V31', 'V32', 'V33', 'V34', 'V35', 'V36', 'V37', 'target'],
      dtype='object')
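The loop above applies the standard min-max transform x' = (x - min) / (max - min) to every column. For comparison, scikit-learn's MinMaxScaler does the same thing column-wise; a sketch, not part of the original notebook:

from sklearn.preprocessing import MinMaxScaler
import pandas as pd

# Column-wise min-max scaling; MinMaxScaler disregards NaN when fitting,
# so the NaN targets of the appended test rows are preserved
scaler = MinMaxScaler()
scaled = pd.DataFrame(scaler.fit_transform(df_merge),
                      columns=df_merge.columns, index=df_merge.index)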
# Dropping NaN rows removes the appended test rows (their target is missing),
# leaving only the normalized training rows
df_merge.dropna(axis=0, inplace=True)
df_merge.shape
(2888, 39)

3. Correlation Analysis

df_merge.corr()
V0 V1 V2 V3 V4 V5 V6 V7 V8 V9 ... V29 V30 V31 V32 V33 V34 V35 V36 V37 target
V0 1.000000 0.908607 0.463643 0.409576 0.781212 -0.327028 0.189267 0.141294 0.794013 0.077888 ... 0.302145 0.156968 0.675003 0.050951 0.056439 -0.019342 0.138933 0.231417 -0.494076 0.873212
V1 0.908607 1.000000 0.506514 0.383924 0.657790 -0.227289 0.276805 0.205023 0.874650 0.138849 ... 0.147096 0.175997 0.769745 0.085604 0.035129 -0.029115 0.146329 0.235299 -0.494043 0.871846
V2 0.463643 0.506514 1.000000 0.410148 0.057697 -0.322417 0.615938 0.477114 0.703431 0.047874 ... -0.275764 0.175943 0.653764 0.033942 0.050309 -0.025620 0.043648 0.316462 -0.734956 0.638878
V3 0.409576 0.383924 0.410148 1.000000 0.315046 -0.206307 0.233896 0.197836 0.411946 -0.063717 ... 0.117610 0.043966 0.421954 -0.092423 -0.007159 -0.031898 0.080034 0.324475 -0.229613 0.512074
V4 0.781212 0.657790 0.057697 0.315046 1.000000 -0.233959 -0.117529 -0.052370 0.449542 -0.031816 ... 0.659093 0.022807 0.447016 -0.026186 0.062367 0.028659 0.100010 0.113609 -0.031054 0.603984
V5 -0.327028 -0.227289 -0.322417 -0.206307 -0.233959 1.000000 -0.028995 0.081069 -0.182281 0.038810 ... -0.175836 -0.074214 -0.121290 -0.061886 -0.132727 -0.105801 -0.075191 0.026596 0.404799 -0.314676
V6 0.189267 0.276805 0.615938 0.233896 -0.117529 -0.028995 1.000000 0.917502 0.468233 0.450096 ... -0.467980 0.188907 0.546535 0.144550 0.054210 -0.002914 0.044992 0.433804 -0.404817 0.370037
V7 0.141294 0.205023 0.477114 0.197836 -0.052370 0.081069 0.917502 1.000000 0.389987 0.446611 ... -0.311363 0.170113 0.475254 0.122707 0.034508 -0.019103 0.111166 0.340479 -0.292285 0.287815
V8 0.794013 0.874650 0.703431 0.411946 0.449542 -0.182281 0.468233 0.389987 1.000000 0.100672 ... -0.011091 0.150258 0.878072 0.038430 0.026843 -0.036297 0.179167 0.326586 -0.553121 0.831904
V9 0.077888 0.138849 0.047874 -0.063717 -0.031816 0.038810 0.450096 0.446611 0.100672 1.000000 ... -0.221623 0.293026 0.121712 0.289891 0.115655 0.094856 0.141703 0.129542 -0.112503 0.139704
V10 0.298443 0.310120 0.346006 0.321262 0.141129 0.054060 0.415660 0.310982 0.419703 0.120208 ... -0.105042 -0.036705 0.560213 -0.093213 0.016739 -0.026994 0.026846 0.922190 -0.045851 0.394767
V11 -0.295420 -0.197317 -0.256407 -0.100489 -0.162507 0.863890 -0.147990 -0.064402 -0.146689 -0.114374 ... -0.084938 -0.153304 -0.084298 -0.153126 -0.095359 -0.053865 -0.032951 0.003413 0.459867 -0.263988
V12 0.751830 0.656186 0.059941 0.306397 0.927685 -0.306672 -0.087312 -0.036791 0.420557 -0.011889 ... 0.666775 0.028866 0.441963 -0.007658 0.046674 0.010122 0.081963 0.112150 -0.054827 0.594189
V13 0.185144 0.157518 0.204762 -0.003636 0.075993 -0.414517 0.138367 0.110973 0.153299 -0.040705 ... 0.008235 0.027328 0.113743 0.130598 0.157513 0.116944 0.219906 -0.024751 -0.379714 0.203373
V14 -0.004144 -0.006268 -0.106282 -0.232677 0.023853 -0.015671 0.072911 0.163931 0.008138 0.118176 ... 0.056814 -0.004057 0.010989 0.106581 0.073535 0.043218 0.233523 -0.086217 0.010553 0.008424
V15 0.314520 0.164702 -0.224573 0.143457 0.615704 -0.195037 -0.431542 -0.291272 0.018366 -0.199159 ... 0.951314 -0.111311 0.011768 -0.104618 0.050254 0.048602 0.100817 -0.051861 0.245635 0.154020
V16 0.347357 0.435606 0.782474 0.394517 0.023818 -0.044543 0.847119 0.752683 0.680031 0.193681 ... -0.342210 0.154794 0.778538 0.041474 0.028878 -0.054775 0.082293 0.551880 -0.420053 0.536748
V17 0.044722 0.072619 -0.019008 0.123900 0.044803 0.348211 0.134715 0.239448 0.112053 0.167310 ... 0.004855 -0.010787 0.150118 -0.051377 -0.055996 -0.064533 0.072320 0.312751 0.045842 0.104605
V18 0.148622 0.123862 0.132105 0.022868 0.136022 -0.190197 0.110570 0.098691 0.093682 0.260079 ... 0.053958 0.470341 0.079718 0.411967 0.512139 0.365410 0.152088 0.019603 -0.181937 0.170721
V19 -0.100294 -0.092673 -0.161802 -0.246008 -0.205729 0.171611 0.215290 0.158371 -0.144693 0.358149 ... -0.205409 0.100133 -0.131542 0.144018 -0.021517 -0.079753 -0.220737 0.087605 0.012115 -0.114976
V20 0.462493 0.459795 0.298385 0.289594 0.291309 -0.073232 0.136091 0.089399 0.412868 0.116111 ... 0.016233 0.086165 0.326863 0.050699 0.009358 -0.000979 0.048981 0.161315 -0.322006 0.444965
V21 -0.029285 -0.012911 -0.030932 0.114373 0.174025 0.115553 -0.051806 -0.065300 -0.047839 -0.018681 ... 0.157097 -0.077945 0.053025 -0.159128 -0.087561 -0.053707 -0.199398 0.047340 0.315470 -0.010063
V22 -0.105643 -0.102421 -0.212023 -0.291236 -0.028534 0.146545 -0.068158 0.077358 -0.097908 0.098401 ... 0.053349 -0.039953 -0.108088 0.057179 -0.019107 -0.002095 0.205423 -0.130607 0.099282 -0.107813
V23 0.231136 0.222574 0.065509 0.081374 0.196530 -0.158441 0.069901 0.125180 0.174124 0.380050 ... 0.116122 0.363963 0.129783 0.367086 0.183666 0.196681 0.635252 -0.035949 -0.187582 0.226331
V24 -0.324959 -0.233556 0.010225 -0.237326 -0.529866 0.275480 0.072418 -0.030292 -0.136898 -0.008549 ... -0.642370 0.033532 -0.202097 0.060608 -0.134320 -0.095588 -0.243738 -0.041325 -0.137614 -0.264815
V25 -0.200706 -0.070627 0.481785 -0.100569 -0.444375 0.045551 0.438610 0.316744 0.173320 0.078928 ... -0.575154 0.088238 0.201243 0.065501 -0.013312 -0.030747 -0.093948 0.069302 -0.246742 -0.019373
V26 -0.125140 -0.043012 0.035370 -0.027685 -0.080487 0.294934 0.106055 0.160566 0.015724 0.128494 ... -0.133694 -0.057247 0.062879 -0.004545 -0.034596 0.051294 0.085576 0.064963 0.010880 -0.046724
V27 0.733198 0.824198 0.726250 0.392006 0.412083 -0.218495 0.474441 0.424185 0.901100 0.114315 ... -0.032772 0.208074 0.790239 0.095127 0.030135 -0.036123 0.159884 0.226713 -0.617771 0.812585
V28 0.035119 0.077346 0.229575 0.159039 -0.044620 -0.042210 0.093427 0.058800 0.122050 -0.064595 ... -0.154572 0.054546 0.123403 0.013142 -0.024866 -0.058462 -0.080237 0.061601 -0.149326 0.100080
V29 0.302145 0.147096 -0.275764 0.117610 0.659093 -0.175836 -0.467980 -0.311363 -0.011091 -0.221623 ... 1.000000 -0.122817 -0.004364 -0.110699 0.035272 0.035392 0.078588 -0.099309 0.285581 0.123329
V30 0.156968 0.175997 0.175943 0.043966 0.022807 -0.074214 0.188907 0.170113 0.150258 0.293026 ... -0.122817 1.000000 0.114318 0.695725 0.083693 -0.028573 -0.027987 0.006961 -0.256814 0.187311
V31 0.675003 0.769745 0.653764 0.421954 0.447016 -0.121290 0.546535 0.475254 0.878072 0.121712 ... -0.004364 0.114318 1.000000 0.016782 0.016733 -0.047273 0.152314 0.510851 -0.357785 0.750297
V32 0.050951 0.085604 0.033942 -0.092423 -0.026186 -0.061886 0.144550 0.122707 0.038430 0.289891 ... -0.110699 0.695725 0.016782 1.000000 0.105255 0.069300 0.016901 -0.054411 -0.162417 0.066606
V33 0.056439 0.035129 0.050309 -0.007159 0.062367 -0.132727 0.054210 0.034508 0.026843 0.115655 ... 0.035272 0.083693 0.016733 0.105255 1.000000 0.719126 0.167597 0.031586 -0.062715 0.077273
V34 -0.019342 -0.029115 -0.025620 -0.031898 0.028659 -0.105801 -0.002914 -0.019103 -0.036297 0.094856 ... 0.035392 -0.028573 -0.047273 0.069300 0.719126 1.000000 0.233616 -0.019032 -0.006854 -0.006034
V35 0.138933 0.146329 0.043648 0.080034 0.100010 -0.075191 0.044992 0.111166 0.179167 0.141703 ... 0.078588 -0.027987 0.152314 0.016901 0.167597 0.233616 1.000000 0.025401 -0.077991 0.140294
V36 0.231417 0.235299 0.316462 0.324475 0.113609 0.026596 0.433804 0.340479 0.326586 0.129542 ... -0.099309 0.006961 0.510851 -0.054411 0.031586 -0.019032 0.025401 1.000000 -0.039478 0.319309
V37 -0.494076 -0.494043 -0.734956 -0.229613 -0.031054 0.404799 -0.404817 -0.292285 -0.553121 -0.112503 ... 0.285581 -0.256814 -0.357785 -0.162417 -0.062715 -0.006854 -0.077991 -0.039478 1.000000 -0.565795
target 0.873212 0.871846 0.638878 0.512074 0.603984 -0.314676 0.370037 0.287815 0.831904 0.139704 ... 0.123329 0.187311 0.750297 0.066606 0.077273 -0.006034 0.140294 0.319309 -0.565795 1.000000

39 rows × 39 columns

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set_style('whitegrid')

# Heatmap of the correlation matrix
plt.figure(figsize=(20,12))
sns.heatmap(df_merge.corr(), annot=True)

[Figure: annotated heatmap of the 39×39 correlation matrix]
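The last row of the matrix (and of the heatmap) is the most informative for modeling: it gives each feature's linear correlation with target. Features such as V0 (0.87), V1 (0.87), V8 (0.83), V27 (0.81), and V31 (0.75) correlate strongly, while others sit near zero. A short sketch for ranking features by that correlation, a common follow-up step not in the original notebook:

# Rank features by absolute correlation with the target
corr_with_target = df_merge.corr()['target'].drop('target')
print(corr_with_target.abs().sort_values(ascending=False).head(10))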

4. Dataset Split

# df_merge now holds only the 2888 training rows (the test rows were dropped
# along with their NaN targets), so the first slice recovers the full training
# set and df_test comes out empty; slice before dropna if the normalized test
# features are needed
df = df_merge.iloc[:df.shape[0], :]
df_test = df_merge.iloc[df.shape[0]:, :]
df.shape
(2888, 39)
from sklearn.model_selection import train_test_split

# Hold out 25% of the training rows for evaluation
train, test = train_test_split(df, test_size=0.25, random_state=2023)

III. Model Construction

Build a fully connected neural network that maps the 38 features to the predicted steam value.

import paddle
import paddle.nn as nn

# Define the network as a dynamic-graph Layer
class Net(paddle.nn.Layer):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = paddle.nn.Linear(38, 1000)
        self.fc2 = paddle.nn.Linear(1000, 100)
        self.fc3 = paddle.nn.Linear(100, 50)
        self.fc4 = paddle.nn.Linear(50, 1)
    
    # Forward computation
    def forward(self, inputs):
        y = self.fc1(inputs)
        y = self.fc2(y)
        y = self.fc3(y)
        pred = self.fc4(y)
        return pred
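Note that forward chains the four Linear layers with no activation function in between, so the whole stack collapses mathematically to a single affine map of the inputs. A sketch of the same architecture with ReLU activations added (an assumption for illustration, not the code used to produce the results below):

import paddle
import paddle.nn.functional as F

class NetReLU(paddle.nn.Layer):
    def __init__(self):
        super(NetReLU, self).__init__()
        self.fc1 = paddle.nn.Linear(38, 1000)
        self.fc2 = paddle.nn.Linear(1000, 100)
        self.fc3 = paddle.nn.Linear(100, 50)
        self.fc4 = paddle.nn.Linear(50, 1)

    def forward(self, inputs):
        # ReLU after each hidden layer lets the network fit nonlinear relations
        y = F.relu(self.fc1(inputs))
        y = F.relu(self.fc2(y))
        y = F.relu(self.fc3(y))
        return self.fc4(y)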

IV. Model Training

Train the network with mini-batch gradient descent on the mean squared error.

model = Net()
# The regression loss (squared error) is computed inside the training loop
# below via F.square_error_cost
# Optimizer
opt = paddle.optimizer.Adam(learning_rate=0.1, parameters=model.parameters())
import paddle.nn.functional as F

EPOCH_NUM = 1000   # number of epochs
BATCH_SIZE = 256   # mini-batch size
import numpy as np
# Outer loop over epochs
for epoch_id in range(EPOCH_NUM):
    # Shuffle the training data before each epoch; sample() returns a new
    # frame, so the result must be assigned back
    train = train.sample(frac=1)
    # Split the training data into mini-batches of BATCH_SIZE rows
    mini_batches = [train.iloc[k:k+BATCH_SIZE] for k in range(0, len(train), BATCH_SIZE)]
    # Inner loop over mini-batches
    for iter_id, mini_batch in enumerate(mini_batches):
        x = np.array(mini_batch.iloc[:, :-1])  # features of the current batch
        y = np.array(mini_batch.iloc[:, -1:])  # labels of the current batch
        # Convert the numpy arrays to Paddle tensors
        features = paddle.to_tensor(x, dtype='float32')
        y = paddle.to_tensor(y, dtype='float32')
        # Forward pass
        predicts = model(features)
        # Loss: per-element squared error, averaged over the batch
        loss = F.square_error_cost(predicts, label=y)
        avg_loss = paddle.mean(loss)
        if iter_id % 20 == 0:
            print("epoch: {}, iter: {}, loss is: {}".format(epoch_id, iter_id, avg_loss.numpy()))
        
        # Backward pass: compute gradients of the loss w.r.t. every parameter
        avg_loss.backward()
        # Take one optimizer step at the configured learning rate
        opt.step()
        # Clear the gradients for the next iteration
        opt.clear_grad()
# Save the model parameters to LR_model.pdparams
paddle.save(model.state_dict(), 'LR_model.pdparams')
print("Model saved: parameters stored in LR_model.pdparams")
Model saved: parameters stored in LR_model.pdparams

V. Model Prediction

# target_min and target_max were recorded in section II before normalization;
# recomputing them from the normalized df_merge here would give 0 and 1 and
# turn the inverse transform below into a no-op
# Load the saved model parameters
model_dict = paddle.load('LR_model.pdparams')
model.load_dict(model_dict)
model.eval()

# Features and labels of the held-out split from section II
one_data = np.array(test.iloc[:, :-1])  # held-out features
label = np.array(test.iloc[:, -1:])     # held-out labels (still normalized)
# Convert the features to a Paddle tensor for inference
one_data = paddle.to_tensor(one_data, dtype='float32')
predict = model(one_data)
predict = predict.numpy()
# Map the predictions back to the original steam scale
predict = predict * (target_max - target_min) + target_min
# Map the labels back the same way
label = label * (target_max - target_min) + target_min

for i in range(10):
    print("Inference result is {}, the corresponding label is {}".format(predict[i], label[i]))
Inference result is [0.18284929], the corresponding label is [0.12307417]
Inference result is [0.3642674], the corresponding label is [0.46524543]
Inference result is [0.382918], the corresponding label is [0.46291652]
Inference result is [0.50540245], the corresponding label is [0.52185597]
Inference result is [0.63235503], the corresponding label is [0.62683626]
Inference result is [0.6009079], the corresponding label is [0.59620208]
Inference result is [0.6699288], the corresponding label is [0.74435686]
Inference result is [0.51734024], the corresponding label is [0.55499821]
Inference result is [0.4891268], the corresponding label is [0.59638123]
Inference result is [0.40018553], the corresponding label is [0.49390899]
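To tie these spot checks back to the competition metric, the MSE over the whole held-out split can be computed from the same arrays; a one-line sketch:

# MSE of the model on the held-out split, on the de-normalized scale
mse = np.mean((predict - label) ** 2)
print("held-out MSE: {:.4f}".format(mse))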

Comparing the model's predictions with the true steam values shows that the predictions track the ground truth closely. Steam prediction is about as simple as a model gets, and even here writing it with PaddlePaddle gets twice the result for half the effort; for the more complex models of real industrial practice, the cost savings are hard to overstate. PaddlePaddle is also performance-tuned for many application scenarios and hardware configurations, making it far stronger in both functionality and performance than a hand-rolled model.