🚀 My environment:

Language: Python 3.6.5
Editor: Jupyter Notebook
Deep learning framework: TensorFlow 2.4.1

🚀 From the column: "Deep Learning 100 Examples" (《深度学习100例》)


I. What Is a GCN

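Convolutional Neural Network (CNN): works with images.
Graph Convolutional Network (GCN): works with irregular, graph-structured data such as social networks and knowledge graphs.
In a CNN, the input is usually an image.
In a GCN, the input usually has the form [nodes, edges], where nodes is the set of nodes in the graph and edges is the set of edges.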
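
To make the [nodes, edges] input concrete, here is a minimal sketch on a toy graph. The four nodes, their feature vectors, and the three edges are all made up for illustration; only the overall shape (a feature matrix plus an adjacency matrix) mirrors what this tutorial builds later.

```python
import numpy as np

# Hypothetical toy graph: 4 nodes, each with a 3-dimensional feature vector.
nodes = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
], dtype=np.float32)

# Hypothetical edges as (source, target) index pairs.
edges = np.array([[0, 1], [0, 2], [2, 3]])

# Dense adjacency matrix: entry (i, j) is 1 when an edge i -> j is listed.
adjacency = np.zeros((len(nodes), len(nodes)), dtype=np.float32)
adjacency[edges[:, 0], edges[:, 1]] = 1.0

# The pair that a GCN consumes: node features and graph structure.
graph = [nodes, adjacency]
print(adjacency)
```

Unlike an image, nothing here has a fixed grid: the number of neighbors differs from node to node, which is why the structure is passed in explicitly.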

This example shows how to use a GCN to classify papers.

II. Dataset: Cora

1. About the Dataset

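The Cora dataset is a classification dataset of machine learning papers. It consists of three files:

README: a description of the dataset;
cora.cites: the citation graph between papers. Each line contains two paper IDs: the first is the ID of the cited paper and the second is the ID of the citing paper.
cora.content: information on 2708 papers, where each sample (one per line) is a scientific paper in the format id + word_attributes + class_label.
- id is the unique identifier of the paper;
- word_attributes is a 1433-dimensional binary word vector. Each element corresponds to one word: 0 means the word does not appear in the paper, 1 means it does.
- class_label is the category of the paper. Each paper is assigned to one of the following 7 classes: Case_Based, Genetic_Algorithms, Neural_Networks, Probabilistic_Methods, Reinforcement_Learning, Rule_Learning, Theory.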
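
As a small illustration of the id + word_attributes + class_label layout, the sketch below splits one tab-separated row by hand. The row is hypothetical and shortened to 5 word attributes (the real file has 1433); the tutorial itself loads the file with pandas instead.

```python
# Hypothetical row with only 5 word attributes (the real file has 1433).
sample_line = "31336\t0\t1\t0\t0\t1\tNeural_Networks"

fields = sample_line.split("\t")

paper_id = int(fields[0])                        # first field: paper ID
word_attributes = [int(v) for v in fields[1:-1]] # middle fields: 0/1 word flags
class_label = fields[-1]                         # last field: class name

print(paper_id, word_attributes, class_label)
```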

import pandas as pd
import numpy  as np

# Load the data; fields are separated by tabs and the file has no header row
raw_data_content = pd.read_csv('data/cora/cora.content', sep='\t', header=None)
raw_data_content.head()


	0	...	1426	1427	1434
0	31336	...	0	1	Neural_Networks
1	1061127	...	1	0	Rule_Learning
2	1106406	...	0	0	Reinforcement_Learning
3	13195	...	0	0	Reinforcement_Learning
4	37879	...	0	0	Probabilistic_Methods

5 rows × 1435 columns

raw_data_content.shape

(2708, 1435)

raw_data_cites = pd.read_csv('data/cora/cora.cites',sep = '\t',header = None)
raw_data_cites.head()


	0	1
0	35	1033
1	35	103482
2	35	103515
3	35	1050679
4	35	1103960

raw_data_cites.shape

(5429, 2)

2. Preparing the Data

# ================================================================
#   Renumber the paper IDs in raw_data_cites and convert them to an array.
#   If this part is hard to follow, try printing the intermediate variables.
# ================================================================

# Assign consecutive indices to the paper IDs and store the mapping in idx_map
idx     = np.array(raw_data_content.iloc[:, 0], dtype=np.int32)
idx_map = {j: i for i, j in enumerate(idx)}

# Store the re-indexed edges in the edge_indexs array
edge_indexs = np.array(list(map(idx_map.get, raw_data_cites.values.flatten())), dtype=np.int32)
edge_indexs = edge_indexs.reshape(raw_data_cites.shape)
edge_indexs

array([[ 163,  402],
       [ 163,  659],
       [ 163, 1696],
       ...,
       [1887, 2258],
       [1902, 1887],
       [ 837, 1686]])
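
The renumbering step is easier to follow on a tiny example. The sketch below uses made-up paper IDs and citation pairs; it maps each raw ID to its row position and rewrites the edge list with those positions, which is the same idea as idx_map and edge_indexs above.

```python
import numpy as np

# Hypothetical paper IDs, in the order they would appear in cora.content.
paper_ids = np.array([31336, 35, 1033, 103482])

# Map each raw paper ID to its row index (0, 1, 2, ...).
toy_idx_map = {paper_id: row for row, paper_id in enumerate(paper_ids.tolist())}

# Hypothetical citation pairs written with raw paper IDs.
toy_cites = np.array([[35, 1033], [35, 103482], [1033, 31336]])

# Replace every raw ID with its row index, then restore the (num_edges, 2) shape.
toy_edge_index = np.array(
    [toy_idx_map[p] for p in toy_cites.flatten().tolist()], dtype=np.int32
).reshape(toy_cites.shape)

print(toy_edge_index)
```

After this step every edge refers to rows of the feature matrix, so it can be used directly as coordinates in an adjacency matrix.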

features = raw_data_content.iloc[:,1:-1].astype(np.float32)

labels   = pd.get_dummies(raw_data_content.iloc[:, -1])

import scipy.sparse as sp

def normalize_adj(adjacency):
    """Compute L = D^-0.5 * (A + I) * D^-0.5."""
    adjacency  = adjacency + sp.eye(adjacency.shape[0])    # add self-loops
    degree     = np.array(adjacency.sum(1))
    d_hat      = sp.diags(np.power(degree, -0.5).flatten())

    return d_hat.dot(adjacency).dot(d_hat).tocsr().todense()

"""
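This builds a (2708, 2708) sparse matrix whose entries are 1 at every
(row, col) index pair listed in edge_indexs and 0 everywhere else.
For an explanation of sp.coo_matrix, see: https://blog.csdn.net/qq_38251616/article/details/120361746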
"""
adjacency = sp.coo_matrix((np.ones(len(edge_indexs)),(edge_indexs[:, 0], edge_indexs[:, 1])),
                            shape=(features.shape[0], features.shape[0]),
                            dtype="float32")
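
For readers unfamiliar with sp.coo_matrix, here is a minimal sketch of the same construction on a hypothetical 4-node edge list. The last line shows an optional variant that treats edges as undirected; that symmetrization is an assumption added for comparison and is not something the code in this tutorial does.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical edge list for 4 nodes, as (row, col) index pairs.
toy_edges = np.array([[0, 1], [0, 2], [3, 2]])

# coo_matrix((data, (rows, cols))) places data[k] at position (rows[k], cols[k]).
toy_adj = sp.coo_matrix(
    (np.ones(len(toy_edges)), (toy_edges[:, 0], toy_edges[:, 1])),
    shape=(4, 4),
    dtype="float32",
)

dense = toy_adj.toarray()
print(dense)

# Optional variant (NOT used in this tutorial): make the graph undirected.
symmetric = np.maximum(dense, dense.T)
```

With the directed form, row i only records the edges listed with i in the first column, so the later normalization uses row sums of that directed matrix.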

adjacency = normalize_adj(adjacency)

adjacency.shape

(2708, 2708)

adjacency

matrix([[0.25, 0.  , 0.  , ..., 0.  , 0.  , 0.  ],
        [0.  , 1.  , 0.  , ..., 0.  , 0.  , 0.  ],
        [0.  , 0.  , 1.  , ..., 0.  , 0.  , 0.  ],
        ...,
        [0.  , 0.  , 0.  , ..., 1.  , 0.  , 0.  ],
        [0.  , 0.  , 0.  , ..., 0.  , 0.2 , 0.  ],
        [0.  , 0.  , 0.  , ..., 0.  , 0.  , 0.25]])
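
The normalization D^-0.5 * (A + I) * D^-0.5 can be checked by hand on a small matrix. The 3-node adjacency below is hypothetical; the steps follow normalize_adj, written with dense NumPy arrays for readability.

```python
import numpy as np

# Hypothetical directed adjacency for 3 nodes: edges 0 -> 1 and 0 -> 2.
A = np.array([[0, 1, 1],
              [0, 0, 0],
              [0, 0, 0]], dtype=np.float64)

A_hat = A + np.eye(3)                  # add self-loops
degree = A_hat.sum(axis=1)             # row sums, here [3, 1, 1]
D_inv_sqrt = np.diag(degree ** -0.5)   # D^-0.5 as a diagonal matrix
L = D_inv_sqrt @ A_hat @ D_inv_sqrt

print(np.round(L, 4))
```

Each diagonal entry ends up as 1/d, where d is the row sum including the self-loop. This is consistent with the diagonal values in the output above: 0.25 corresponds to a row sum of 4, 0.2 to a row sum of 5, and 1 to a node whose row contains only its self-loop.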

# features:  the nodes of the graph (one feature row per paper)
# adjacency: the edges of the graph (normalized adjacency matrix)
graph = [features, adjacency]

III. Splitting into Training, Validation, and Test Sets

Samples [0, 2300) form the training set, [2300, 2500) the validation set, and [2500, 2708) the test set. In the implementation, boolean masks (train_mask, val_mask, test_mask) distinguish the three sets.

train_index = np.arange(2300)
val_index   = np.arange(2300, 2500)
test_index  = np.arange(2500, 2708)

# One boolean entry per node (paper). The built-in bool replaces the deprecated np.bool alias.
train_mask  = np.zeros(features.shape[0], dtype=bool)
val_mask    = np.zeros(features.shape[0], dtype=bool)
test_mask   = np.zeros(features.shape[0], dtype=bool)

train_mask[train_index] = True
val_mask[val_index]     = True
test_mask[test_index]   = True

edge_indexs.shape[0],edge_indexs.shape[0],edge_indexs.shape[0]

(5429, 5429, 5429)

train_mask,val_mask,test_mask

(array([ True,  True,  True, ..., False, False, False]),
 array([False, False, False, ..., False, False, False]),
 array([False, False, False, ..., False, False, False]))
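
The mask idea can be sketched on a toy example. The 10 nodes and the 6/2/2 split below are hypothetical; the point is that each mask has one entry per node and that the three masks do not overlap.

```python
import numpy as np

num_nodes = 10  # hypothetical; Cora has 2708 nodes

toy_train_mask = np.zeros(num_nodes, dtype=bool)
toy_val_mask = np.zeros(num_nodes, dtype=bool)
toy_test_mask = np.zeros(num_nodes, dtype=bool)

toy_train_mask[np.arange(0, 6)] = True
toy_val_mask[np.arange(6, 8)] = True
toy_test_mask[np.arange(8, 10)] = True

# Sanity checks: no node is in two splits, and every node is in one of them.
no_overlap = not (
    (toy_train_mask & toy_val_mask).any()
    or (toy_train_mask & toy_test_mask).any()
    or (toy_val_mask & toy_test_mask).any()
)
full_cover = bool((toy_train_mask | toy_val_mask | toy_test_mask).all())

# Boolean indexing selects the rows that belong to one split.
node_ids = np.arange(num_nodes)
print(node_ids[toy_val_mask])
```

Because a GCN processes the whole graph in one forward pass, the masks select which rows of the output contribute to the loss or to an accuracy figure, rather than splitting the input itself.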

IV. Training the Model

"""
This imports a custom module (graph.py) written for this project.
To run the project yourself,
download the full project files from https://mtyjkh.blog.csdn.net/article/details/120222803
"""
from graph   import GraphConvolutionLayer, GraphConvolutionModel
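
The graph module is not reproduced in this post. As a rough NumPy sketch of what a graph convolution layer typically computes (the propagation rule H' = activation(A_norm · H · W) from the GCN paper), consider the code below. The function names, layer sizes, and random weights are all assumptions for illustration; the actual GraphConvolutionLayer and GraphConvolutionModel may be implemented differently.

```python
import numpy as np

def gcn_layer(adj_norm, features, weights, activation=None):
    """One graph convolution: activation(adj_norm @ features @ weights)."""
    out = adj_norm @ (features @ weights)
    return activation(out) if activation is not None else out

def relu(x):
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
num_nodes, in_dim, hidden_dim, num_classes = 4, 5, 3, 2  # hypothetical sizes

adj_norm = np.eye(num_nodes)                  # placeholder for a normalized adjacency
node_features = rng.random((num_nodes, in_dim))
w1 = rng.normal(size=(in_dim, hidden_dim))    # hypothetical first-layer weights
w2 = rng.normal(size=(hidden_dim, num_classes))

# Two stacked layers: hidden representation, then one row of class scores per node.
hidden = gcn_layer(adj_norm, node_features, w1, activation=relu)
logits = gcn_layer(adj_norm, hidden, w2)
print(logits.shape)
```

Multiplying by the normalized adjacency is what mixes each node's features with those of its neighbors; the weight matrix is shared across all nodes.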

import time
import matplotlib.pyplot as plt
import tensorflow        as tf

gpus = tf.config.list_physical_devices("GPU")

if gpus:
    tf.config.experimental.set_memory_growth(gpus[0], True)  # allocate GPU memory on demand
    tf.config.set_visible_devices([gpus[0]],"GPU")

1. Computing the Loss

The loss is computed only on the training data (the entries where train_mask is True).

loss_object = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

def loss(model, x, y, train_mask, training):
    """
    Loss function: cross-entropy over the nodes selected by train_mask.
    """
    y_ = model(x, training=training)
    
    # tf.where() returns the indices of the True entries in train_mask
    # tf.gather_nd() picks the rows at those indices out of y / y_
    masked_logits = tf.gather_nd(y_, tf.where(train_mask))
    masked_labels = tf.gather_nd(y , tf.where(train_mask))

    return loss_object(y_true=masked_labels, y_pred=masked_logits)

def grad(model, inputs, targets, train_mask):
    """
    Compute the loss and its gradients with respect to the trainable variables.
    """
    with tf.GradientTape() as tape:
        loss_value = loss(model, inputs, targets, train_mask, training=True)
    
    return loss_value, tape.gradient(loss_value, model.trainable_variables)
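
To show what the masked loss does numerically, here is a NumPy sketch of a softmax cross-entropy averaged over the masked rows only. The helper name and the toy logits and labels are hypothetical; this is meant to mirror the idea of the loss function above, not to replace the TensorFlow implementation.

```python
import numpy as np

def masked_softmax_cross_entropy(logits, one_hot_labels, mask):
    """Mean softmax cross-entropy over the rows where mask is True."""
    selected_logits = logits[mask]
    selected_labels = one_hot_labels[mask]
    # Numerically stable log-softmax.
    shifted = selected_logits - selected_logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(selected_labels * log_probs).sum(axis=1).mean())

# Hypothetical scores for 3 nodes and 2 classes.
toy_logits = np.array([[2.0, 0.0], [0.0, 2.0], [5.0, -5.0]])
toy_labels = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
toy_mask = np.array([True, True, False])

masked_loss = masked_softmax_cross_entropy(toy_logits, toy_labels, toy_mask)
full_loss = masked_softmax_cross_entropy(toy_logits, toy_labels, np.ones(3, dtype=bool))
print(masked_loss, full_loss)
```

The third node is badly misclassified, but because it is masked out it does not affect masked_loss. This is how validation and test nodes stay out of the training signal even though they take part in the forward pass.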

2. Training the Model

def test(mask):
    """Return the accuracy over the nodes selected by mask."""
    logits = model(graph)

    masked_logits = tf.gather_nd(logits, tf.where(mask))
    masked_labels = tf.gather_nd(labels, tf.where(mask))

    # Compare the predicted class with the true class for each selected node
    correct  = tf.math.equal(tf.math.argmax(masked_labels, -1), tf.math.argmax(masked_logits, -1))
    accuracy = tf.reduce_mean(tf.cast(correct, dtype=tf.float64))

    return accuracy
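
The accuracy computation in test can be mirrored in a few lines of NumPy. The helper name and the toy scores below are hypothetical; the logic (argmax of the prediction compared with argmax of the one-hot label, restricted to masked rows) is the same as above.

```python
import numpy as np

def masked_accuracy(logits, one_hot_labels, mask):
    """Fraction of masked rows whose predicted class matches the label."""
    predicted = logits[mask].argmax(axis=-1)
    expected = one_hot_labels[mask].argmax(axis=-1)
    return float((predicted == expected).mean())

# Hypothetical scores for 4 nodes and 2 classes.
toy_logits = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6]])
toy_labels = np.array([[1, 0], [1, 0], [1, 0], [0, 1]])
toy_mask = np.array([True, True, False, True])

acc = masked_accuracy(toy_logits, toy_labels, toy_mask)
print(acc)
```

Of the three selected nodes, the first and last are predicted correctly and the second is not, so the accuracy is 2/3.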

model = GraphConvolutionModel()

optimizer=tf.keras.optimizers.Adam(learning_rate=1e-2, decay=5e-5)

# Record per-epoch values for the plots at the end
train_loss     = []
train_accuracy = []
val_accuracy   = []
test_accuracy  = []

num_epochs = 150

for epoch in range(num_epochs):
    # Compute the loss and gradients
    loss_value, grads = grad(model, graph, labels, train_mask)
    # Update the model parameters
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

    accuracy = test(train_mask)
    val_acc  = test(val_mask)
    test_acc = test(test_mask)

    train_loss.append(loss_value)
    train_accuracy.append(accuracy)
    val_accuracy.append(val_acc)
    test_accuracy.append(test_acc)

    print("Epoch {} loss={} accuracy={} val_acc={} test_acc={}".format(epoch, loss_value, accuracy, val_acc, test_acc))

Epoch 0 loss=1.947021484375 accuracy=0.4056521739130435 val_acc=0.37 test_acc=0.34615384615384615
Epoch 1 loss=1.763957142829895 accuracy=0.43 val_acc=0.4 test_acc=0.40865384615384615
Epoch 2 loss=1.6049641370773315 accuracy=0.5026086956521739 val_acc=0.45 test_acc=0.4519230769230769
Epoch 3 loss=1.451796054840088 accuracy=0.63 val_acc=0.55 test_acc=0.5336538461538461
......
Epoch 147 loss=0.0192192904651165 accuracy=0.9973913043478261 val_acc=0.74 test_acc=0.8076923076923077
Epoch 148 loss=0.01902584359049797 accuracy=0.9978260869565218 val_acc=0.74 test_acc=0.8028846153846154
Epoch 149 loss=0.018838025629520416 accuracy=0.9982608695652174 val_acc=0.74 test_acc=0.8028846153846154

As the log shows, after 150 epochs the GCN reaches 99.8% accuracy on the training set, 74.0% on the validation set, and 80.3% on the test set.
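
Since the training accuracy is far above the validation accuracy, one common use of the validation set is to pick the epoch to report. The sketch below assumes per-epoch histories like val_accuracy and test_accuracy have been converted to plain floats; the numbers are hypothetical and are not results from this run.

```python
import numpy as np

# Hypothetical per-epoch histories (stand-ins for val_accuracy / test_accuracy).
val_history = [0.37, 0.55, 0.74, 0.76, 0.74]
test_history = [0.35, 0.53, 0.79, 0.81, 0.80]

# Choose the epoch by validation accuracy, then read the test accuracy there.
best_epoch = int(np.argmax(val_history))
print(best_epoch, val_history[best_epoch], test_history[best_epoch])
```

Selecting on the validation set rather than on the test set keeps the reported test accuracy from being tuned to the test data.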

3. Visualizing the Results

plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)

plt.plot(train_accuracy, label='Training Accuracy')
plt.plot(val_accuracy, label='Validation Accuracy')
plt.plot(test_accuracy, label='Test Accuracy')
plt.legend(loc='lower right')
plt.title('Training, Validation, and Test Accuracy')

plt.subplot(1, 2, 2)
plt.plot(train_loss, label='Training Loss')
plt.legend(loc='upper right')
plt.title('Training Loss')
plt.show()

References

Official source code: github.com/tkipf/gcn


Deep Learning 100 Examples | Day 52 - Graph Convolutional Network (GCN): Paper Classification