
Original paper: arxiv.org/pdf/2402.17…

QN-Mixer: A Quasi-Newton MLP-Mixer Model for Sparse-View CT Reconstruction


Ishak Ayad1,2*†


^1 ETIS (UMR 8051), CY Cergy Paris University, ENSEA, CNRS, France

^2 AGM (UMR 8088), CY Cergy Paris University, CNRS, France

^3 University of Ljubljana, Slovenia

ishak.ayad@cyu.fr

Figure 1. CT Reconstruction with 32 views of State-of-the-Art Methods. Comparative analysis with post-processing and first-order unrolling networks highlights QN-Mixer's superiority in artifact removal, training time, and data efficiency.


Abstract


Inverse problems span across diverse fields. In medical contexts,computed tomography(CT)plays a crucial role in reconstructing a patient's internal structure, presenting challenges due to artifacts caused by inherently ill-posed inverse problems. Previous research advanced image quality via post-processing and deep unrolling algorithms but faces challenges, such as extended convergence times with ultra-sparse data. Despite enhancements, resulting images often show significant artifacts, limiting their effectiveness for real-world diagnostic applications. We aim to explore deep second-order unrolling algorithms for solving imaging inverse problems, emphasizing their faster convergence and lower time complexity compared to common first-order methods like gradient descent. In this paper, we introduce QN-Mixer, an algorithm based on the quasi-Newton approach. We use learned parameters through the BFGS algorithm and introduce Incept-Mixer, an efficient neural architecture that serves as a non-local regularization term, capturing long-range dependencies within images. To address the computational demands typically associated with quasi-Newton algorithms that require full Hessian matrix computations, we present a memory-efficient alternative. Our approach intelligently downsamples gradient information, significantly reducing computational requirements while maintaining performance. The approach is validated through experiments on the sparse-view CT problem, involving various datasets and scanning protocols, and is compared with post-processing and deep unrolling state-of-the-art approaches. Our method outperforms existing approaches and achieves state-of-the-art performance in terms of SSIM and PSNR, all while reducing the number of unrolling iterations required.


1. Introduction


Computed tomography (CT) is a widely used imaging modality in medical diagnosis and treatment planning, delivering intricate anatomical details of the human body with precision. Despite its success, CT is associated with high radiation doses, which can increase the risk of cancer induction [50]. Adhering to the ALARA principle (As Low As Reasonably Achievable) [37], the medical community emphasizes minimizing radiation exposure to the lowest level necessary for accurate diagnosis. Numerous approaches have been proposed to reduce radiation doses while maintaining image quality. Among these, sparse-view CT emerges as a promising solution, effectively lowering radiation doses by subsampling the projection data, often referred to as the sinogram. Nonetheless, images reconstructed with the well-known Filtered Back Projection (FBP) algorithm [34] suffer from pronounced streaking artifacts (see Fig. 1), which can lead to misdiagnosis. The challenge of effectively reconstructing high-quality CT images from sparse-view data is gaining increasing attention in both the computer vision and medical imaging communities.


*Corresponding author. †Equal contribution.


With the success of deep learning spanning diverse domains, initial image-domain techniques [6, 19, 25, 28, 59] have been introduced as post-processing tasks on the FBP-reconstructed images, exhibiting notable accomplishments in artifact removal and structure preservation. However, the inherent limitations of these methods arise from their constrained receptive fields, leading to challenges in effectively capturing global information and, consequently, suboptimal results.

To address this limitation, recent advances have seen a shift toward a dual-domain approach [18, 27, 29, 49], where post-processing methods turn to the sinogram domain. In this dual-domain paradigm, deep neural networks are employed to perform interpolation tasks on the sinogram data [15, 24], facilitating more accurate image reconstruction. Despite the significant achievements of post-processing and dual-domain methods, they confront issues of interpretability and performance limitations, especially when working with small datasets and ultra-sparse-view data, as shown in Fig. 1. To tackle these challenges, deep unrolling networks have been introduced [1, 7, 8, 11, 16, 20, 51, 54]. Unrolling networks treat the sparse-view CT reconstruction problem as an optimization task, resulting in a first-order iterative algorithm like gradient descent, which is subsequently unrolled into a deep recurrent neural network in order to learn the optimization parameters and the regularization term. Like post-processing techniques, unrolling networks have been extended to the sinogram domain [52, 56] to perform interpolation tasks.

Unrolling networks, as referenced in [12, 36, 44], exhibit remarkable performance across diverse domains. However, they suffer from slow convergence and high computational costs, as illustrated in Fig. 1, necessitating the development of more efficient alternatives [14]. More specifically, they confront two main issues. First, they frequently grapple with capturing long-range dependencies due to their dependence on locally-focused regularization terms using CNNs. This limitation results in suboptimal outcomes, particularly evident in tasks such as image reconstruction. Second, the escalating computational cost of unrolling methods aligns with the general trend of increased complexity in modern neural networks. This escalation not only amplifies the required number of iterations due to the algorithm's iterative nature but also contributes to their high computational demand.

To tackle the aforementioned issues, we introduce a novel second-order unrolling network for sparse-view CT reconstruction. In particular, to enable the learnable regularization term to apprehend long-range interactions within the image, we propose a non-local regularization block termed Incept-Mixer. Drawing inspiration from the multi-layer perceptron mixer [46] and the inception architecture [45], it is created to combine the best features from both sides: capturing long-range interactions from the attention-like mechanism of MLP-Mixer and extracting local invariant features from the inception block. This block facilitates a more precise image reconstruction. Second, to cut down on the computational costs associated with unrolling networks, we propose to decrease the required iterations for convergence by employing second-order optimization methods such as [21, 30]. We introduce a novel unrolling framework named QN-Mixer. Our approach is based on the quasi-Newton method, which approximates the Hessian matrix using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) update [10, 13, 57]. Furthermore, we reduce memory usage by working on a projected gradient (latent gradient), preserving performance while reducing the computational cost tied to the Hessian matrix approximation. This adaptation enables the construction of a deep unrolling network exhibiting superlinear convergence. Our contributions are summarized as follows:

  • We introduce a novel second-order unrolling network coined QN-Mixer where the Hessian matrix is approximated using a latent BFGS algorithm with a deep-net learned regularization term.


  • We propose Incept-Mixer, a neural architecture acting as a non-local regularization term. Incept-Mixer integrates deep features from inception blocks with MLP-Mixer, enhancing multi-scale information usage and capturing long-range dependencies.


  • We demonstrate the effectiveness of our proposed method when applied to the sparse-view CT reconstruction problem on an extensive set of experiments and datasets. We show that our method outperforms state-of-the-art methods in terms of quantitative metrics while requiring fewer iterations than first-order unrolling networks.

2. Related Works


In this section, we present prior work closely related to our paper. We begin by discussing the general framework for unrolling networks in Sec. 2.1, which is based on the gradient descent algorithm. Subsequently, in Sec. 2.2 and Sec. 2.3, we delve into state-of-the-art methods in post-processing and unrolling networks, respectively.

2.1. Background


Inverse Problem Formulation for CT. The image reconstruction problem in CT can be mathematically formalized as the solution to a linear equation of the form:

$$\mathbf{y} = \mathbf{A}\mathbf{x}, \tag{1}$$

where $\mathbf{x} \in \mathbb{R}^{n}$ is the (unknown) object to reconstruct with $n = h \times w$, and $\mathbf{y} \in \mathbb{R}^{m}$ is the data (i.e. the sinogram), where $m = n_v \times n_d$, and $n_v$ and $n_d$ denote the number of projection views and detectors, respectively. $\mathbf{A} \in \mathbb{R}^{m \times n}$ is the forward model (i.e. the discrete Radon transform [40]). The goal of CT image reconstruction is to recover the (unknown) object $\mathbf{x}$ from the observed data $\mathbf{y}$. Because of the missing data the problem is ill-posed: the linear system in Eq. (1) becomes underdetermined and may have infinitely many solutions. Hence, reconstructed images suffer from artifacts, blurring, and noise. To address this issue, iterative reconstruction algorithms minimize a regularized objective function with an $L^2$-norm data-fidelity term:

$$\underset{\mathbf{x}}{\arg\min}\; J(\mathbf{x}) = \frac{1}{2}\left\| \mathbf{A}\mathbf{x} - \mathbf{y} \right\|_2^2 + \lambda \mathcal{R}(\mathbf{x}), \tag{2}$$
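As a toy illustration of the underdetermination in Eq. (1) (the matrix `A` below is a random stand-in for the discrete Radon transform, not an actual projector), the following sketch shows that when there are fewer measurements than unknowns, infinitely many objects explain the same sinogram:

```python
import numpy as np

# Toy stand-in for Eq. (1) with missing data: fewer measurements (rows of A)
# than unknowns, as in sparse-view CT.
rng = np.random.default_rng(0)
n, m = 9, 4                          # n = h*w unknowns, m = n_v*n_d measurements
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
y = A @ x_true

# The least-norm solution A†y reproduces the data exactly ...
x_pinv = np.linalg.pinv(A) @ y

# ... and so does x_pinv shifted by any vector in the (n - m)-dimensional
# null space of A: the system is underdetermined.
null_dir = np.linalg.svd(A)[2][-1]   # a right singular vector with sigma = 0
x_alt = x_pinv + 5.0 * null_dir
```

Both `x_pinv` and `x_alt` fit the data perfectly, which is exactly why a regularization term is needed to select a plausible reconstruction.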

where $\mathcal{R}(\mathbf{x})$ is the regularization term, balanced by the weight $\lambda$. Such ill-posed problems were initially addressed using optimization techniques, such as the truncated singular value decomposition (SVD) algorithm [42], or iterative approaches like the algebraic reconstruction technique (ART) [4], simultaneous ART (SART) [2], conjugate gradient for least squares (CGLS) [22], and total generalized variation regularization (TGV) [43]. Additionally, techniques such as total variation [47] and Tikhonov regularization [9] can be employed to enhance reconstruction results.
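A minimal numerical sketch of Eq. (2), assuming a random toy system and a Tikhonov regularizer $\mathcal{R}(\mathbf{x}) = \frac{1}{2}\|\mathbf{x}\|_2^2$ (one of the classical choices listed above), minimized by plain gradient descent:

```python
import numpy as np

# Toy underdetermined system (m < n), standing in for sparse-view CT.
rng = np.random.default_rng(0)
n, m = 32, 16
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
y = A @ x_true

# Minimize J(x) = 0.5 ||Ax - y||^2 + lam * 0.5 ||x||^2 (Tikhonov regularizer).
lam, alpha = 1e-2, 1e-2
x = np.zeros(n)
for _ in range(2000):
    grad = A.T @ (A @ x - y) + lam * x   # data-fidelity + regularizer gradient
    x -= alpha * grad

# The regularizer picks one well-behaved solution out of infinitely many;
# the data are fit up to a small lam-dependent bias.
residual = np.linalg.norm(A @ x - y)
```

The regularization weight `lam` trades data fidelity against the prior, which is the balance the paper later learns from data instead of tuning by hand.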

Deep Unrolling Networks. Assuming that the regularization term $\mathcal{R}$ in Eq. (2) is differentiable and convex, a simple gradient descent scheme can be applied to solve the optimization problem:

$$\mathbf{x}_{t+1} = \mathbf{x}_t - \alpha \nabla_{\mathbf{x}} J(\mathbf{x}_t) = \mathbf{x}_t - \alpha \left( \mathbf{A}^{\top}(\mathbf{A}\mathbf{x}_t - \mathbf{y}) + \lambda \nabla_{\mathbf{x}} \mathcal{R}(\mathbf{x}_t) \right), \qquad \mathbf{x}_0 = \mathbf{A}^{\dagger}\mathbf{y}. \tag{3}$$

Here, $\alpha$ represents the step size (i.e. search step), and $\mathbf{A}^{\dagger}$ is the pseudo-inverse of $\mathbf{A}$.

Previous research [16, 53] has emphasized the limitations of optimization algorithms, such as the manual selection of the regularization term and of the optimization hyper-parameters, which can negatively impact performance and limit clinical application. Recent advances in deep learning have enabled automated parameter selection directly from the data, as demonstrated in [7, 11, 23, 33, 38, 56]. By allowing the terms in Eq. (3) to depend on the iteration, the gradient descent iteration becomes:

$$\mathbf{x}_{t+1} = \mathbf{x}_t - \left( \mathbf{A}^{\top}(\mathbf{A}\mathbf{x}_t - \mathbf{y}) + \mathcal{G}(\mathbf{x}_t) \right), \tag{4}$$

where $\mathcal{G}$ is a learned mapping representing the gradient of the regularization term. Note that the step size $\alpha$ from Eq. (3) is omitted, as it is redundant once the learned components of the regularization term are taken into account. Finally, Eq. (4) is unrolled into a deep recurrent neural network in order to learn the optimization parameters.
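The unrolled iteration of Eq. (4) can be sketched as follows. This is a toy sketch, not QN-Mixer: the "learned" per-iteration mappings `G[t]` are small random linear maps standing in for deep regularization networks, and `A` is a random matrix rather than the Radon transform.

```python
import numpy as np

# Toy forward model, spectrally normalized so the implicit unit step is stable.
rng = np.random.default_rng(1)
n, m, T = 32, 16, 8                 # T = number of unrolled iterations
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, 2)           # ||A||_2 = 1, so eig(A^T A) <= 1
x_true = rng.standard_normal(n)
y = A @ x_true

# Hypothetical "learned" regularizer gradients: one map per unrolled step.
G = [1e-3 * rng.standard_normal((n, n)) for _ in range(T)]

x = np.linalg.pinv(A) @ y           # x_0 = A† y (FBP-like initialization)
for t in range(T):
    data_grad = A.T @ (A @ x - y)   # gradient of the data-fidelity term
    x = x - (data_grad + G[t] @ x)  # Eq. (4): step size absorbed into G
```

In a trained unrolling network, each `G[t]` is a neural network whose weights are learned end-to-end, and the fixed-length loop is "unrolled" into a single feed-forward computation graph.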

2.2. Post-processing Methods


Recent advances in sparse-view CT reconstruction leverage two main categories of deep learning methods: post-processing and dual-domain approaches. Post-processing methods, including RedCNN [6], FBPConvNet [19], and DDNet [59], treat sparse-view reconstruction as a denoising step using FBP reconstructions as input. While effective in addressing artifacts and reducing noise, they often struggle with recovering global information from extremely sparse data. To overcome this limitation, dual-domain methods integrate sinograms into neural networks for an interpolation task, recovering missing data [15, 24]. Dual-domain methods, surpassing post-processing ones, combine information from both domains. DuDoNet [29], an initial dual-domain method, connects the image and sinogram domains through a Radon inversion layer. Recent Transformer-based dual-domain methods, such as DuDoTrans [49] and DDPTransformer [27], aim to capture long-range dependencies in the sinogram domain, demonstrating superior performance to CNN-based methods.

Self-supervised learning. SSL methods [5, 17, 26, 48, 58] have been applied to CT reconstruction. For instance, [5] proposed an equivariant imaging paradigm through a training strategy that enforces measurement consistency and equivariance conditions. To ensure equitable comparisons, we focus on supervised methods in this work.

2.3. Advancements in Deep Unrolling Networks


Unrolling networks constitute a line of work inspired by popular optimization algorithms used to solve Eq. (2). Leveraging the iterative nature of optimization algorithms, as presented in Eq. (4), unrolling networks aim to directly learn optimization parameters from data. These methods have found success in various inverse problems, including sparse-view CT [7, 20, 52, 54, 56], limited-angle CT [8, 11, 51], low-dose CT [1, 16], and compressed sensing MRI [12, 44].


First-order. One pioneering unrolling network, Learned Primal-Dual reconstruction [1], replaces traditional proximal operators with CNNs. In contrast, LEARN [7] and LEARN++ [56] directly unroll the optimization algorithm from Eq. (4) into a deep recurrent neural network. More recently, Transformers [3, 31] have been introduced into unrolling networks, such as RegFormer [54] and HUMUS-Net [12]. While achieving commendable performance, these methods require more computational resources than traditional CNN-based unrolling networks and incur a significant memory footprint due to linear scaling with the number of unrolling iterations.


Figure 2. Overall structure of the proposed QN-Mixer for sparse-view CT reconstruction, unrolled from Algorithm 2. The method leverages the advantages of the quasi-Newton method for faster convergence while incorporating a latent BFGS update.


Second-order. To address this, a new category of unrolling optimization methods has emerged [14], leveraging second-order techniques such as the quasi-Newton method [10, 13, 21]. These methods converge faster, reducing computational demands, but suffer from increased memory usage due to the Hessian matrix approximation, and their application has been limited to small-scale problems [30, 57]. In contrast, our method is memory-efficient: it operates within a latent space of the gradient information (i.e. $\nabla_{\mathbf{x}} J(\mathbf{x})$ in Eq. (3)).
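A back-of-the-envelope calculation illustrates why a dense inverse-Hessian approximation is infeasible at image resolution, and why downsampling the gradient to a latent space helps (the 64 × 64 latent size below is an illustrative assumption, not the paper's exact configuration):

```python
# Dense inverse-Hessian H_t in R^{n x n} for a 512 x 512 image, float32.
h = w = 512
n = h * w                          # number of unknowns (pixels)
full_gib = n * n * 4 / 2**30       # 4 bytes per entry -> 256 GiB

# Operating on a 64 x 64 latent gradient shrinks the matrix quadratically.
k = 64 * 64
latent_gib = k * k * 4 / 2**30     # -> 0.0625 GiB (64 MiB)
```

The cost scales with the square of the gradient's dimensionality, so even a modest downsampling factor yields a very large memory saving.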

Algorithm 1: Quasi-Newton for sparse-view CT

Data: $\mathbf{y}$ (sparse sinogram)

Manual choice of the regularization term $\mathcal{R}$;
$\mathbf{H}_0 \leftarrow \mathbf{I}^{n \times n}$; $\mathbf{x}_0 \leftarrow \mathbf{A}^{\dagger}\mathbf{y}$;
for $t \in \{0, \ldots, T-1\}$ do
  $\mathbf{s}_t \leftarrow -\mathbf{H}_t \nabla_{\mathbf{x}} J(\mathbf{x}_t)$;
  $\mathbf{x}_{t+1} \leftarrow \mathbf{x}_t + \mathbf{s}_t$;
  $\mathbf{z}_t \leftarrow \nabla_{\mathbf{x}} J(\mathbf{x}_{t+1}) - \nabla_{\mathbf{x}} J(\mathbf{x}_t)$;
  $\rho_t \leftarrow 1 / (\mathbf{z}_t^{\top} \mathbf{s}_t)$;
  $\mathbf{H}_{t+1} \leftarrow (\mathbf{I} - \rho_t \mathbf{s}_t \mathbf{z}_t^{\top})\, \mathbf{H}_t\, (\mathbf{I} - \rho_t \mathbf{z}_t \mathbf{s}_t^{\top}) + \rho_t \mathbf{s}_t \mathbf{s}_t^{\top}$

3. Methodology


QN-Mixer is a novel second-order unrolling network inspired by the quasi-Newton (Sec. 3.1) method. It approximates the inverse Hessian matrix with a latent BFGS algorithm and includes a non-local regularization term, Incept-Mixer, designed to capture non-local relationships (Sec. 3.2). To cope with the significant computational burden associated with the full approximation of the inverse Hessian matrix, we use a latent BFGS algorithm (Sec. 3.3). An overview of the proposed method is depicted in Fig. 2, and the complete algorithm is presented in Sec. 3.4.


3.1. Quasi-Newton method


The quasi-Newton method can be applied to solve Eq. (2), and the iterative optimization solution is expressed as:

$$\mathbf{x}_{t+1} = \mathbf{x}_t - \alpha_t \mathbf{H}_t \nabla_{\mathbf{x}} J(\mathbf{x}_t),$$

where $\mathbf{H}_t \in \mathbb{R}^{n \times n}$ represents the inverse Hessian matrix approximation at iteration $t$, and $\alpha_t$ is the step size. The BFGS method updates the Hessian matrix approximation at each iteration. This matrix is crucial for capturing the curvature of the objective function around the current point, guiding more efficient steps and avoiding unnecessary zigzagging. In the classical BFGS approach, the line search adheres to the Wolfe conditions [10, 13]. A step size of $\alpha_t = 1$ is attempted first, ensuring eventual acceptance for superlinear convergence [21]. In our approach, we adopt a fixed step size of $\alpha_t = 1$. The procedure is illustrated in Algorithm 1.
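The update of Algorithm 1 can be exercised on a small problem. The sketch below is a plain BFGS iteration with the fixed step $\alpha_t = 1$ on a toy strongly convex quadratic (eigenvalues kept below 2 so the unit step is stable); it contains none of the learned components of QN-Mixer.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
W = rng.standard_normal((n, n))
W = W @ W.T
W /= np.linalg.norm(W, 2)           # PSD with spectral norm 1
Q = 0.5 * np.eye(n) + W             # eigenvalues in [0.5, 1.5]
b = rng.standard_normal(n)
x_star = np.linalg.solve(Q, b)      # exact minimizer of J(x) = 0.5 x'Qx - b'x

def grad(x):                        # gradient of the quadratic objective
    return Q @ x - b

I = np.eye(n)
H = I.copy()                        # H_0 = I (inverse-Hessian approximation)
x = np.zeros(n)
for _ in range(25):
    g = grad(x)
    if np.linalg.norm(g) < 1e-10:   # converged; avoid a degenerate update
        break
    s = -H @ g                      # s_t = -H_t grad J(x_t)
    x_new = x + s                   # fixed step alpha_t = 1
    z = grad(x_new) - g             # z_t = grad J(x_{t+1}) - grad J(x_t)
    rho = 1.0 / (z @ s)             # curvature z's > 0 for a convex quadratic
    # BFGS inverse-Hessian update (last line of Algorithm 1):
    H = (I - rho * np.outer(s, z)) @ H @ (I - rho * np.outer(z, s)) \
        + rho * np.outer(s, s)
    x = x_new
```

Along the explored directions `H` approaches $Q^{-1}$, which is what gives the quasi-Newton iterates their superlinear convergence compared with gradient descent.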

3.2. Regularization term: Incept-Mixer


Recent research on unrolling networks has often focused on selecting the representation of the regularization term gradi-

