PET-AI解读 | rs-fMRI的GNN和TCN建模（图构建，时间序列归一化）相关论文：A deep graph n

相关论文：A deep graph neural network architecture for modelling spatio-temporal dynamics in resting-state functional MRI data
相关repo：github.com/tjiagoM/spa…
笔记人：陈亦新

这个是代码解读的第一部分。主要是数据加载的理解。包含下面三个部分：

时间序列数据如何变成graph
graph如何转化成无向图和稀疏图的
时间序列如何进行归一化

数据加载

我们主要从UKB数据中看看对数据的处理线索：

class UKBDataset(BrainDataset):
    def __init__(self,...):
    
    @property
    def processed_file_names(self):
        return ['data_ukb_brain.dataset']
        
    def __create_data_object(self,...):
    
    def process(self):

process函数调用了__create_data_object函数，是很规范的书写方式，赞一个。我们按照process内的部分来盘逻辑。

    def process(self):
    ...
    conn_measure = ConnectivityMeasure(
            kind='correlation',
            vectorize=False)
    ...

这个ConnectivityMeasure是从nilearn.connectome包中引入的方法。

【nilearn包介绍】这是一个用可以让机器学习、模式识别和多元统计更容易的应用在神经成像的数据上。可以很容易的应用在任务态和静息态、VBM数据。此外nilearn具有绘图功能，可以可视化神经成像等等。

ConnectivityMeasure这个方法就是构建相关性矩阵。

def process(self):
    ...
    conn_measure = ConnectivityMeasure(
            kind='correlation',
            vectorize=False)
    ...
    corr_tmp = conn_measure.fit_transform([ts])
    ...

corr_tmp就是相关性矩阵，然后通过create_thresholded_graph生成graph。

def threshold_adj_array(adj_array: np.ndarray, threshold: int, num_nodes: int) -> np.ndarray:
    num_to_filter: int = int((threshold / 100.0) * (num_nodes * (num_nodes - 1) / 2))
    # 先将下三角的数据变成0，也包含对角线上的
    adj_array[np.tril_indices(num_nodes)] = 0
    # 获取非0元素的坐标索引
    indices = np.where(adj_array)
    #从array中提取非0元素，并且排序，然后变成从大到小排序
    sorted_indices = np.argsort(adj_array[indices])[::-1]
    # 然后选择num_to_filter个最大的数字保留，其他置0
    adj_array[(indices[0][sorted_indices][num_to_filter:], indices[1][sorted_indices][num_to_filter:])] = 0
    # 重新构建成无向图对称
    adj_array = adj_array + adj_array.T

    # 将对角线上的1补回来。
    adj_array[np.diag_indices(num_nodes)] = 1.0

    return adj_array

这个过程比较有趣。我在上面写了详细的注释，总结就是，先去除一般的相关性，因为要构建无向图。然后去排序，只保留相关性最大的num_to_filter的边。

这个graph通过这样的方法转换成networkx的格式：

graph = nx.from_numpy_array(adj_array, create_using=nx.DiGraph)

然后：

edge_index = torch.tensor(np.array(graph.edges()), dtype=torch.long).t().contiguous()

转化成下图的edge索引形式：

然后

edge_attr = torch.tensor(list(nx.get_edge_attributes(graph, 'weight').values()),
                                                 dtype=torch.float).unsqueeze(1)

edge_attr就是一个value的列表，刚好和上面edge_index对应起来：

然后，我们需要将所有我们得到的东西，放到这个函数当中：

data = self.__create_data_object(person=person, ts=ts, edge_index=edge_index, edge_attr=edge_attr,
                                                 covars=main_covars)

其中：

person应该就是一个字符串；
ts就是对应person的时间序列特征；
edge_index和edge_attr就是图中非0值的索引和数值；
main_covars不知道是啥，好像是一个dataframe

我们来看__create_data_object函数的逻辑：

def __create_data_object(self, person: int, ts: np.ndarray, covars: pd.DataFrame, edge_attr: torch.Tensor,
                             edge_index: torch.Tensor):
        assert ts.shape[0] > ts.shape[1]  # TS > N

        timeseries = normalise_timeseries(timeseries=ts, normalisation=self.normalisation)

        if self.encoding_strategy == EncodingStrategy.STATS:
            assert timeseries.shape == (self.num_nodes, self.time_length)
            timeseries = calculate_stats_features(timeseries)
            assert timeseries.shape == (self.num_nodes, 16)
            timeseries[np.isnan(timeseries)] = 0
            assert not np.isnan(timeseries).any()

        if self.analysis_type in [AnalysisType.ST_UNIMODAL, AnalysisType.ST_UNIMODAL_AVG]:
            x = torch.tensor(timeseries, dtype=torch.float)

        if self.target_var == 'gender':
            y = torch.tensor([covars.loc[person, 'Sex']], dtype=torch.float)
        elif self.target_var == 'bmi':
            y = torch.tensor([covars.loc[person, 'BMI.at.scan']], dtype=torch.float)
        else:
            y = torch.tensor([covars.loc[person, 'Age.at.scan']], dtype=torch.float)

        data = Data(x=x, edge_index=edge_index, edge_attr=edge_attr, y=y)
        data.ukb_id = torch.tensor([person])

        if self.target_var == 'gender':
            data.age = torch.tensor([covars.loc[person, 'Age.at.scan']])
            data.bmi = torch.tensor([covars.loc[person, 'BMI.at.scan']])
        elif self.target_var == 'bmi':
            data.sex = torch.tensor([covars.loc[person, 'Sex']])
            data.age = torch.tensor([covars.loc[person, 'Age.at.scan']])
        else:
            data.sex = torch.tensor([covars.loc[person, 'Sex']])
            data.bmi = torch.tensor([covars.loc[person, 'BMI.at.scan']])

        return data

对于时间序列，先继续归一化，采用的归一化方式为：Normalisation('subject_norm')

def normalise_timeseries(timeseries: np.ndarray, normalisation: Normalisation) -> np.ndarray:
    """
    :param normalisation:
    :param timeseries: In  format TS x N
    :return:
    """
    if normalisation == Normalisation.ROI:
        scaler = RobustScaler().fit(timeseries)
        timeseries = scaler.transform(timeseries).T
    elif normalisation == Normalisation.SUBJECT:
        flatten_timeseries = timeseries.flatten().reshape(-1, 1)
        scaler = RobustScaler().fit(flatten_timeseries)
        timeseries = scaler.transform(flatten_timeseries).reshape(timeseries.shape).T
    else:  # No normalisation
        timeseries = timeseries.T

    return timeseries

不管怎么说，我们把ROI归一化和SUBJECT归一化都了解一下。

【ROI归一化】 RobustScaler来自sklearn.preprocessing包，是对异常值鲁棒的统计信息来缩放特征。减去中值，除以分位数的范围（默认是四分位数）对数据进行缩放。IQR四分位数范围为第1个四分位数和第3个四分位数之间的范围。数据集的标准化是一个常见的需求，通常使用平均值和单位方差来实现，但是异常值往往会对样本均值方法产生负面影响，因此这个时候使用中位数和IQR的效果会更好。

计算公式如下： $v'=\frac{v-median}{IQR}$

【SUBJECT归一化】仔细看了一下，发现其实也是上卖弄的RobustScaler方法。区别在于，ROI归一化的数据ts是TSxN的数据。N是ROI的数目，TS是时间序列的范围。所以ROI归一化是对每一个ROI区域的时间序列进行归一化。SUBJECT则是将所有的ROI都flatten，对整体的数据做一个归一化。好处我猜测是因为，这样可以保留不同ROI区域时间序列之间的整体偏差，保留ROI的整体差异性。

然后我们回到__create_data_object方法中，后面就没什么了。我们可以看到，这个dataset输出的数据包含三个部分：

时间序列
图的索引
图的值

其中，图也是从时间序列产生的，然后经过一些阈值处理和无向图处理。时间序列后来也经过了归一化处理

下一步来看模型的构建，如何将TCN和GNN结合起来的。