[论文翻译]CryoSTAR: Leveraging Structural Prior and Constraints for Cryo-EM Heteroge


原文链接:www.biorxiv.org/content/10.…

CryoSTAR: Leveraging Structural Prior and Constraints for Cryo-EM Heterogeneous Reconstruction

CryoSTAR: 利用结构先验和约束进行冷冻电子显微镜异质重建

Yilai Li$^{1\#}$, Yi Zhou$^{1\#}$, Jing Yuan$^{1\#}$, Fei Ye$^{1}$, Quanquan Gu$^{1*}$


$^{1}$ByteDance Research

#Contributed Equally

同等贡献

*Correspondence to: quanquan.gu@bytedance.com

*通讯至: quanquan.gu@bytedance.com

Abstract

摘要

Resolving conformational heterogeneity in cryo-electron microscopy (cryo-EM) datasets remains a significant challenge in structural biology. Previous methods have often been restricted to working exclusively on volumetric densities, neglecting the potential of incorporating any pre-existing structural knowledge as prior or constraints. In this paper, we present a novel methodology, cryoSTAR, that harnesses atomic model information as structural regularization to elucidate such heterogeneity. Our method uniquely outputs both coarse-grained models and density maps, showcasing the molecular conformational changes at different levels. Validated against four diverse experimental datasets, spanning large complexes, a membrane protein, and a small single-chain protein, our results consistently demonstrate an efficient and effective solution to conformational heterogeneity with minimal human bias. By integrating atomic model insights with cryo-EM data, cryoSTAR represents a meaningful step forward, paving the way for a deeper understanding of dynamic biological processes.$^{1}$

在冷冻电子显微镜(cryo-EM)数据集中解决构象异质性仍然是结构生物学中的一项重大挑战。以往的方法往往仅限于处理体积密度,忽视了将任何现有结构知识作为先验或约束纳入的潜力。在本文中,我们提出了一种新颖的方法论,cryoSTAR,利用原子模型信息作为结构正则化来阐明这种异质性。我们的方法独特地输出粗粒度模型和密度图,展示不同层次的分子构象变化。通过对四个不同的实验数据集进行验证,这些数据集涵盖了大型复合物、膜蛋白和小型单链蛋白,我们的结果始终展示了以最小的人为偏见有效且高效地解决构象异质性。通过将原子模型见解与冷冻电子显微镜数据相结合,cryoSTAR代表了一步重要的进展,为深入理解动态生物过程铺平了道路。$^{1}$

Introduction

引言

Single particle cryo-electron microscopy (cryo-EM) is a structural biology tool that can directly observe the conformational heterogeneity of each biomolecule, since each dataset contains many 2D projections of 3D structures from potentially different conformational states$^{1}$. Traditional algorithms (e.g., 3D classification) treat the heterogeneity in the dataset as discrete clusters and assign each particle to the best class$^{2-6}$. However, in many real datasets, heterogeneity often comes from conformational dynamics, a continuous process. Using traditional algorithms often leaves the 3D density maps blurry in the flexible regions.

单颗粒冷冻电子显微镜(cryo-EM)是一种结构生物学工具,可以直接观察每个生物分子的构象异质性,因为每个数据集包含来自潜在不同构象状态的3D结构的许多2D投影$^{1}$。传统算法(例如,3D分类)将数据集中的异质性视为离散簇,并将每个粒子分配给最佳类别$^{2-6}$。然而,在许多真实数据集中,异质性通常来自构象动态,这是一个连续过程。使用传统算法往往导致在柔性区域的3D密度图模糊不清。

A few algorithms in recent years have been developed to resolve continuous heterogeneity from cryo-EM datasets. For example, principal component analysis (PCA) and its variants have been used to describe the variability within the dataset; they model the heterogeneity as a linear combination of a few bases$^{7-10}$. To achieve more expressive power with nonlinearity, deep learning-based methods were developed to map such heterogeneity onto nonlinear manifold embeddings. For example, cryoDRGN$^{11}$ and cryoDRGN2$^{12}$ use a variational autoencoder (VAE)$^{13}$-based approach to map the heterogeneity within the dataset to a latent space. A generative decoder is used to generate a 3D volume given a sampled point from the latent space. On the other hand, 3DFlex$^{14}$ explicitly models the motion of flexible regions by learning a 3D deformation field and optimizing a canonical density, while encouraging local smoothness and rigidity. Nevertheless, these methods approach the continuous heterogeneity issue solely from a computer vision perspective, without leveraging any prior knowledge that could be used as structural constraints.

近年来开发了一些算法,以解决来自cryo-EM数据集的连续异质性。例如,主成分分析(PCA)及其变体已被用于描述数据集中的变异性,它将异质性建模为少数几个基的线性组合$^{7-10}$。为了实现更强的非线性表达能力,开发了基于深度学习的方法,将这种异质性映射到非线性流形嵌入上。例如,cryoDRGN$^{11}$ 和 cryoDRGN2$^{12}$ 使用基于变分自编码器(VAE)$^{13}$ 的方法将数据集中的异质性映射到潜在空间。生成解码器用于根据从潜在空间采样的点生成3D体积。另一方面,3DFlex$^{14}$ 通过学习3D变形场并优化典型密度,同时鼓励局部平滑性和刚性,明确建模柔性区域的运动。然而,这些方法仅从计算机视觉的角度出发,不利用任何可以作为结构约束的先验知识。

$^{1}$ The short version of this paper has been accepted by the NeurIPS workshop on New Frontiers of AI for Drug Discovery and Development.

$^{1}$ 本文的简短版本已被NeurIPS关于药物发现与开发新前沿的研讨会接受。

bioRxiv preprint doi: doi.org/10.1101/202…; this version posted December 7, 2023. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

bioRxiv 预印本 doi: doi.org/10.1101/202… 2023 年 12 月 7 日发布。该预印本的版权持有者(未经过同行评审认证)是作者/资助者,他们已授予 bioRxiv 永久展示该预印本的许可。该预印本根据 CC-BY-NC-ND 4.0 国际许可协议提供。

Some recent works tried to incorporate information from the atomic model into the pipeline, or to directly output coarse-grained (CG) atomic models for better interpretation. For example, Chen et al.$^{15}$ explore the possibility of explicitly modeling atoms or residues with a Gaussian density as a follow-up work of e2gmm$^{16}$. Other methods try to decompose heterogeneity into a few bases using normal mode analysis (NMA)$^{17,18}$ or Zernike polynomials$^{19}$. The atomic-level or residue-level information often helps provide interpretation by offering sensible models with motions. However, these methods either only find relatively small continuous motions$^{17,19}$, or are only verified on synthetic data$^{18,20,21}$. In particular, NMA is more suitable for finding the possible fluctuation rather than representing the real, complex motion$^{22}$.

一些近期的研究尝试将原子模型中的信息纳入流程,或直接输出粗粒度(CG)原子模型以便于更好的解释。例如,Chen 等人$^{15}$ 探讨了明确建模原子或残基的可能性,采用高斯密度作为 e2gmm$^{16}$ 的后续工作。其他方法尝试使用正常模式分析 (NMA)$^{17,18}$ 或 Zernike 多项式$^{19}$ 将异质性分解为几个基底。原子级或残基级的信息通常通过提供合理的运动模型来帮助解释。然而,这些方法要么仅找到相对较小的连续运动$^{17,19}$,要么仅在合成数据上得到验证$^{18,20,21}$。特别是,NMA 更适合寻找可能的波动,而不是表示真实的复杂运动$^{22}$。

In this paper, we introduce cryoSTAR (Structural Regularization), a deep neural network model that resolves continuous conformational heterogeneity from cryo-EM datasets by generating both density maps and reasonable coarse-grained (CG) models for different conformations. Our method requires an initial atomic model as the reference, whose structural information is used to properly regularize the inferred conformational dynamics. This enables us to correctly preserve the local structures, narrowing the search space by avoiding fallacious solutions, achieving better and faster convergence. The learned conformational heterogeneity allows for the concurrent generation of density maps and coarse-grained models. Notably, the density map can be used to evaluate and validate the conclusions from the coarse-grained models.

在本文中,我们介绍了 cryoSTAR(结构正则化),这是一种深度神经网络模型,通过生成不同构象的密度图和合理的粗粒度(CG)模型,从 cryo-EM 数据集中解决连续构象异质性。我们的方法需要一个初始原子模型作为参考,其结构信息用于适当地正则化推断出的构象动态。这使我们能够正确保留局部结构,通过避免错误的解决方案来缩小搜索空间,从而实现更好更快的收敛。学习到的构象异质性允许同时生成密度图和粗粒度模型。值得注意的是,密度图可以用于评估和验证粗粒度模型的结论。

We validate cryoSTAR on a synthetic dataset with known ground truth (Methods and Extended Data Fig. 1) and apply it to four public experimental datasets. On the pre-catalytic spliceosome$^{23}$ (EMPIAR-10180) and the U4/U6.U5 tri-snRNP$^{24}$ (EMPIAR-10073), we recover motions that generally agree with other methods$^{11,14}$, and generate both density maps and reasonable coarse-grained models with the corresponding dynamics. We further demonstrate the effect of using incomplete and slightly erroneous atomic models as reference. Despite the potential biases in the generated coarse-grained model, the density maps are robust and can serve as a reliable tool for validating the coarse-grained models. We then show the effectiveness of cryoSTAR on the TRPV1 channel$^{25}$ (EMPIAR-10059), a small membrane protein. Without excluding the lipid nanodisc region by manual masking or particle subtraction, cryoSTAR successfully resolves its conformational heterogeneity. Finally, we show the performance of cryoSTAR on α-latrocrustatoxin (α-LCT)$^{26}$ (EMPIAR-10827), a small protein with a molecular weight of 130 kDa. We resolve a continuous motion that agrees with the one the original paper found by discrete 3D classification$^{26}$, and also uncover a different type of flexible motion. Structural regularization is especially beneficial in challenging cases, such as resolving continuous motions in membrane proteins or in smaller proteins. Our experiments suggest that cryoSTAR is a powerful tool for solving the continuous conformational heterogeneity from cryo-EM images.

我们在一个具有已知真实值的合成数据集上验证了 cryoSTAR(方法和扩展数据图 1),并将其应用于四个公共实验数据集。在前催化剪接体$^{23}$ (EMPIAR-10180) 和 U4/U6.U5 三核糖核蛋白$^{24}$ (EMPIAR-10073) 上,我们恢复的运动通常与其他方法$^{11,14}$ 一致,并生成相应动态的密度图和合理的粗粒度模型。我们进一步展示了使用不完整和稍微错误的原子模型作为参考的效果。尽管生成的粗粒度模型可能存在潜在偏差,但密度图是稳健的,可以作为验证粗粒度模型的可靠工具。然后,我们展示了 cryoSTAR 在 TRPV1 通道$^{25}$ (EMPIAR-10059) 上的有效性,这是一种小型膜蛋白。在不通过手动掩膜或粒子减法排除脂质纳米盘区域的情况下,cryoSTAR 成功地解析了其构象异质性。最后,我们展示了 cryoSTAR 在 α-拉托克鲁斯塔毒素 (α-LCT)$^{26}$ (EMPIAR-10827) 上的表现,这是一种分子量为 130 kDa 的小蛋白。我们解析了一种连续运动,它与原始论文通过离散 3D 分类$^{26}$ 发现的运动一致,并揭示了另一种类型的柔性运动。在具有挑战性的情况下,例如解析膜蛋白或较小蛋白中的连续运动,结构正则化尤其有益。我们的实验表明,cryoSTAR 是解决来自冷冻电子显微镜图像的连续构象异质性的强大工具。

Results

结果

CryoSTAR

CryoSTAR

CryoSTAR models the conformational heterogeneity in cryo-EM particles as the deformations of a reference atomic model $S_{\text{ref}} \in \mathbb{R}^{N \times 3}$, where $N$ is the number of residues, and applies proper regularization to ensure the integrity of the structures (Fig. 1a). With a variational autoencoder (VAE)$^{13}$ as the neural network architecture, cryoSTAR compresses the conformational heterogeneity into a latent variable, and deforms a coarse-grained model $S_{\text{ref}}$, which is pre-computed from a reference atomic model provided as an input (Fig. 1b). Specifically, given an image $I \in \mathbb{R}^{D \times D}$ from the cryo-EM dataset, cryoSTAR uses a VAE to predict the corresponding deformation $\Delta \widehat{S}$ that modifies the reference structure to the deformed structure $\widehat{S} = S_{\text{ref}} + \Delta \widehat{S}$. The deformed model is then converted to a Gaussian density $V \in \mathbb{R}^{D \times D \times D}$, which is a combination of Gaussian functions mapping the volumetric density map to the CG atomic model (see Methods for details). The 2D projection of this Gaussian density can be computed with the given orientation angles and CTF parameters (Fig. 1b). The projection is compared with the input particle image, regularized by the structural constraints derived from the given atomic model (Fig. 1a).

CryoSTAR 将冷冻电子显微镜粒子的构象异质性建模为参考原子模型 $S_{\text{ref}} \in \mathbb{R}^{N \times 3}$ 的变形,其中 $N$ 是残基的数量,并应用适当的正则化以确保结构的完整性(图 1a)。使用变分自编码器(VAE)$^{13}$ 作为神经网络架构,CryoSTAR 将构象异质性压缩为潜在变量,并对粗粒度模型 $S_{\text{ref}}$ 进行变形,该模型是从提供的参考原子模型预先计算得出的(图 1b)。具体而言,给定来自冷冻电子显微镜数据集的图像 $I \in \mathbb{R}^{D \times D}$,CryoSTAR 使用 VAE 预测相应的变形 $\Delta \widehat{S}$,该变形将参考结构修改为变形结构 $\widehat{S} = S_{\text{ref}} + \Delta \widehat{S}$。然后,将变形模型转换为高斯密度 $V \in \mathbb{R}^{D \times D \times D}$,这是将体积密度图映射到 CG 原子模型的高斯函数的组合(详见方法)。可以根据给定的方向角和 CTF 参数计算该高斯密度的二维投影(图 1b)。该投影与输入粒子图像进行比较,并通过从给定原子模型导出的结构约束进行正则化(图 1a)。
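The forward model described above (deform the reference, render a Gaussian density, project it) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the isotropic Gaussian width `sigma`, the grid parameters, and the random stand-in for the VAE output are all assumptions made for illustration, and the CTF step is omitted.

```python
import numpy as np

def gaussian_density(coords, grid_size, voxel_size, sigma):
    """Render one isotropic Gaussian per residue on a D x D x D grid.

    coords: (N, 3) residue positions, centered at the origin.
    sigma and voxel_size are hypothetical parameters for this sketch.
    """
    # Voxel-center coordinates along one axis, centered at zero.
    axis = (np.arange(grid_size) - (grid_size - 1) / 2.0) * voxel_size

    def axis_factor(centers):
        # Gaussians are separable, so each axis is evaluated alone: (N, D).
        return np.exp(-((axis[None, :] - centers[:, None]) ** 2) / (2.0 * sigma**2))

    gx, gy, gz = (axis_factor(coords[:, k]) for k in range(3))
    # Multiply the per-axis factors and sum over residues: (D, D, D).
    return np.einsum("nx,ny,nz->xyz", gx, gy, gz)

def project_along_z(coords, rotation, grid_size, voxel_size, sigma):
    """Rotate the model, render it, and integrate along z (no CTF applied)."""
    rotated = coords @ rotation.T
    return gaussian_density(rotated, grid_size, voxel_size, sigma).sum(axis=2)

# Toy usage: a random "reference" and a random deformation stand in for
# S_ref and the VAE-predicted delta S.
rng = np.random.default_rng(0)
s_ref = rng.normal(scale=5.0, size=(50, 3))
delta_s = rng.normal(scale=0.5, size=(50, 3))
s_hat = s_ref + delta_s  # deformed structure
image = project_along_z(s_hat, np.eye(3), grid_size=32, voxel_size=1.0, sigma=2.0)
```

In a training loop, `image` would be compared with the observed particle after applying the CTF; here it only shows how a set of residue coordinates turns into a $D \times D$ projection.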


Structural regularization in cryoSTAR

CryoSTAR 中的结构正则化

CryoSTAR requires a reference atomic model as its structural prior, which differs from most existing methods. Incorporating the atomic model into heterogeneous reconstruction allows imposing meaningful constraints on the potential motions of the target. This structural prior and the constraints are pivotal for tackling the conformational heterogeneity and distinguish cryoSTAR from other methods. In particular, the atomic model assists the algorithm in filtering out evidently incorrect dynamics, thereby facilitating better solutions and rapid convergence.

CryoSTAR 需要一个参考原子模型作为其结构先验,这与大多数现有方法不同。将原子模型纳入异质重建允许对目标的潜在运动施加有意义的约束。这种结构先验和约束对于解决构象异质性至关重要,并使 CryoSTAR 与其他方法区分开来。特别是,原子模型帮助算法过滤明显不正确的动态,从而促进更好的解决方案和快速收敛。

CryoSTAR uses a structural regularization under three basic assumptions (Fig. 1a):

CryoSTAR 在三个基本假设下使用结构正则化(图 1a):

  1. Two adjacent residues in the same chain should always remain connected. CryoSTAR uses a continuous loss $\mathcal{L}_{\text{cont}}$ to enforce this:

  1. 同一链中的两个相邻残基应始终保持连接。CryoSTAR 使用连续损失 $\mathcal{L}_{\text{cont}}$ 来强制执行这一点:

where $d_{ij}$ and $\widehat{d}_{ij}$ denote the distance between the $i$-th and $j$-th residues in the reference structure $S_{\text{ref}}$ and the predicted structure $\widehat{S}$, respectively.

其中 $d_{ij}$ 和 $\widehat{d}_{ij}$ 分别表示参考结构 $S_{\text{ref}}$ 和预测结构 $\widehat{S}$ 中第 $i$ 个和第 $j$ 个残基之间的距离。

  2. Residues should not become too close after the predicted deformation. CryoSTAR uses a clash loss $\mathcal{L}_{\text{clash}}$ to penalize if clashing happens:

  2. 残基在预测变形后不应过于接近。CryoSTAR 使用冲突损失 $\mathcal{L}_{\text{clash}}$ 来惩罚发生冲突的情况:

where $P_{\text{clash}}$ denotes the set of residue pairs that experience collision during training, $(i,j)$ is an index pair numbering the residues in the structure, and the constant value $k_{\text{clash}}$ is a predetermined cutoff.

其中 $P_{\text{clash}}$ 表示在训练过程中经历碰撞的残基对的集合,$(i,j)$ 是编号结构中残基的索引对,常数值 $k_{\text{clash}}$ 是预定的截止值。

  3. Local structures should be as rigid as possible. CryoSTAR builds an elastic network (EN) from the reference atomic model, and uses an elastic network loss $\mathcal{L}_{\text{EN}}$ to encourage this local rigidity:

  3. 局部结构应尽可能刚性。CryoSTAR 从参考原子模型构建弹性网络(EN),并使用弹性网络损失 $\mathcal{L}_{\text{EN}}$ 来鼓励这种局部刚性:

where $P_{\text{EN}}$ is a set of edges for building the elastic network. Since the elastic network may be subject to changes during conformational dynamics, cryoSTAR adaptively selects the edges presented in the elastic network for regularization in training (see Methods for details).

其中 $P_{\text{EN}}$ 是用于构建弹性网络的边的集合。由于弹性网络可能在构象动态过程中发生变化,CryoSTAR 在训练中自适应地选择弹性网络中呈现的边进行正则化(详见方法部分)。

To summarize, the structural regularization in cryoSTAR enforces the continuity of a chain, prevents the residues from clashing, and encourages local rigidity with an adaptive elastic network (Fig. 1a). This structural regularization is critical for the model to output reasonable coarse-grained models and recover the correct dynamics.

总结来说,cryoSTAR 中的结构正则化强制链的连续性,防止残基发生冲突,并通过自适应弹性网络鼓励局部刚性(图 1a)。这种结构正则化对于模型输出合理的粗粒度模型和恢复正确的动力学至关重要。
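To make the three assumptions concrete, here is a minimal NumPy sketch of distance-based penalties in their spirit. The exact loss formulas are not reproduced in this section, so the functional forms below (root-mean-square deviation of pair distances, squared penetration below the cutoff) are assumptions, as are the function names, the cutoff values, and the choice to skip sequence neighbors in the clash check. The adaptive edge selection mentioned above is not implemented.

```python
import numpy as np

def pair_distances(coords, pairs):
    """Euclidean distance for each (i, j) index pair; coords is (N, 3)."""
    return np.linalg.norm(coords[pairs[:, 0]] - coords[pairs[:, 1]], axis=1)

def continuity_loss(ref, pred, chain_ids):
    """Penalize distance changes between sequence-adjacent residues of one chain."""
    idx = np.arange(len(ref) - 1)
    same_chain = chain_ids[:-1] == chain_ids[1:]
    pairs = np.stack([idx, idx + 1], axis=1)[same_chain]
    diff = pair_distances(pred, pairs) - pair_distances(ref, pairs)
    return np.sqrt(np.mean(diff**2))  # assumed RMS form

def clash_loss(pred, k_clash):
    """Penalize residue pairs closer than k_clash in the predicted structure."""
    # k=2 skips direct sequence neighbors (an assumption of this sketch).
    i, j = np.triu_indices(len(pred), k=2)
    d = np.linalg.norm(pred[i] - pred[j], axis=1)
    clashing = d < k_clash
    if not clashing.any():
        return 0.0
    return float(np.mean((k_clash - d[clashing]) ** 2))

def elastic_network_edges(ref, cutoff):
    """Connect every residue pair closer than `cutoff` in the reference."""
    i, j = np.triu_indices(len(ref), k=1)
    d = np.linalg.norm(ref[i] - ref[j], axis=1)
    return np.stack([i, j], axis=1)[d < cutoff]

def elastic_network_loss(ref, pred, edges):
    """Penalize distance changes along the elastic-network edges."""
    diff = pair_distances(pred, edges) - pair_distances(ref, edges)
    return np.sqrt(np.mean(diff**2))  # assumed RMS form
```

All three terms depend only on inter-residue distances, so a rigid translation or rotation of the whole model costs nothing, while stretching a chain, overlapping residues, or distorting a local neighborhood is penalized.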

Generating density maps with minimal bias

生成最小偏差的密度图

After the VAE is fully trained in Phase I, the latent variable is extracted and used to train a volume decoder to obtain a neural network representation of the densities in Phase II (Fig. 1c). The optimization of the density maps is solely guided by the input particles (Fig. 1d), remaining unaffected by the structural prior and regularization. This minimizes the reference bias on the output density. Therefore, the generated density maps can be used to evaluate and corroborate the coarse-grained models from Phase I. As a result, cryoSTAR can simultaneously generate both reasonable coarse-grained models and volumetric density maps at different conformations (Fig. 1a), helping the users evaluate and interpret the results at different levels.

在第一阶段 VAE 完全训练后,提取潜变量并用于训练体积解码器,以获得第二阶段的密度神经网络表示(图 1c)。密度图的优化完全由输入粒子引导(图 1d),不受结构先验和正则化的影响。这最小化了输出密度的参考偏差。因此,生成的密度图可用于评估和证实第一阶段的粗粒度模型。因此,cryoSTAR 可以同时生成合理的粗粒度模型和不同构象下的体积密度图(图 1a),帮助用户在不同层次上评估和解释结果。


CryoSTAR finds conformational heterogeneity of large complexes

CryoSTAR 发现大复合物的构象异质性

The yeast pre-catalytic B complex spliceosome is a large complex with more than 10,000 residues including amino acids and nucleotides. The cryo-EM dataset (EMPIAR-10180) includes conformational dynamics on the SF3b and the helicase regions (Fig. 2a), as resolved by other methods$^{11,19}$, and has been used as a benchmark to test continuous heterogeneity algorithms. Such continuous motions are impossible to uncover by traditional 3D classification, which classifies the particles into discrete clusters of conformation. We use the atomic model in Plaschka et al.$^{23}$ (PDB: 5NRL) as the reference atomic model to train cryoSTAR. By traversing along the first principal component of the latent space (Fig. 2b), cryoSTAR reveals the motion of the SF3b and the helicase regions (Fig. 2a, 2c and Supplementary Video 1). While the SF3b can bend towards the “body”, the helicase region can curve down to the “foot”. This result is generally consistent with the results found by other methods$^{11,19}$. Furthermore, the generated coarse-grained models do not exhibit unnatural deformations or disruptions, and more importantly, their motion is corroborated by the particle density maps (Fig. 2a and 2c). Compared to the other method that also outputs atomic models$^{19}$, our result shows a more obvious motion and more reasonable coarse-grained models without local shearing effects, e.g., the local structures of the alpha helices remain intact. For example, our findings suggest that the alpha helix of Spp381 shifts toward the foot domain, which can be verified by the associated density maps, showing a seamless transition without any discernible artifacts (Fig. 2c). The reference atomic model (PDB: 5NRL) also contains the U2 snRNP region, which exhibits weaker density in the consensus map, potentially due to compositional heterogeneity$^{11}$. This region is also evident in the particle density maps produced by cryoSTAR, with a similar diminished intensity compared to other areas. CryoSTAR suggests a possible motion of the U2 core in the generated coarse-grained models; however, such movement cannot be fully supported by the particle density maps, which do not reveal obvious motions (Extended Data Fig. 2). Resolving the dynamics at different levels provides the users with rich information for better evaluation and interpretation.

酵母前催化 B 复合体剪接体是一个大型复合体,包含超过 10,000 个残基,包括氨基酸和核苷酸。冷冻电子显微镜数据集 (EMPIAR-10180) 包含 SF3b 和解旋酶区域的构象动态(图 2a),这些动态通过其他方法得以解析$^{11,19}$,并已被用作测试连续异质性算法的基准。这种连续运动是传统三维分类无法揭示的,传统三维分类将粒子分类为离散的构象簇。我们使用 Plaschka 等人$^{23}$ 的原子模型(PDB: 5NRL)作为参考原子模型来训练 cryoSTAR。通过沿潜在空间的第一个主成分进行遍历(图 2b),cryoSTAR 揭示了 SF3b 和解旋酶区域的运动(图 2a、2c 和补充视频 1)。虽然 SF3b 可以向“主体”弯曲,但解旋酶区域可以向“脚”弯曲。该结果与其他方法$^{11,19}$ 得到的结果基本一致。此外,生成的粗粒度模型没有表现出不自然的变形或破坏,更重要的是,它们的运动得到了粒子密度图的证实(图 2a 和 2c)。与其他同样输出原子模型的方法$^{19}$ 相比,我们的结果显示出更明显的运动和更合理的粗粒度模型,没有局部剪切效应,例如,α 螺旋的局部结构保持完整。例如,我们的研究结果表明,Spp381 的 α 螺旋向脚域移动,这可以通过相关的密度图进行验证,显示出无缝过渡,没有任何明显的伪影(图 2c)。参考原子模型(PDB: 5NRL)还包含 U2 snRNP 区域,该区域在共识图中表现出较弱的密度,可能是由于成分异质性$^{11}$。在 cryoSTAR 生成的粒子密度图中,该区域也明显存在,与其他区域相比,强度减弱。CryoSTAR 在生成的粗粒度模型中建议 U2 核心可能的运动;然而,这种运动无法完全通过粒子密度图得到支持,后者并未揭示明显的运动(扩展数据图 2)。在不同层次上解析动态为用户提供了丰富的信息,以便更好地评估和解释。
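Traversing the first principal component of a latent space, as done for the spliceosome above, can be sketched in a few lines of NumPy. This is a generic illustration, not the authors' code: the function name, the number of sample points, and the 5th to 95th percentile range are assumptions, and decoding each latent point into a coarse-grained model or density map is left to the trained decoders.

```python
import numpy as np

def traverse_first_pc(latents, num_points=10, low=5.0, high=95.0):
    """Return latent points evenly spaced along the first principal component.

    latents: (num_particles, latent_dim) array of per-particle embeddings.
    low/high: percentiles of the PC1 scores that bound the traversal
              (hypothetical defaults chosen to avoid outliers).
    """
    mean = latents.mean(axis=0)
    centered = latents - mean
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pc1 = vt[0]
    # Coordinate of every particle along PC1.
    scores = centered @ pc1
    steps = np.linspace(np.percentile(scores, low),
                        np.percentile(scores, high), num_points)
    # Walk from the mean along PC1, keeping all other components fixed.
    return mean + steps[:, None] * pc1
```

Each row of the returned array is one latent code; feeding the rows to the decoders in order yields a sequence of structures that can be played back as a motion.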

The U4/U6.U5 tri-snRNP is a considerable part of the spliceosome before activation and is known to have flexible regions especially in the head and arm parts$^{14,24}$ (Fig. 3a), which cannot be resolved by traditional 3D classification. An essentially complete atomic model (PDB: 5GAN) was previously built from the consensus density map solved by cryo-EM$^{24}$. Using this atomic model as the input, we train cryoSTAR on the 138,899 particles in the EMPIAR-10073 dataset. We then sample along the first and second principal component of the learned latent space (Fig. 3b and Extended Data Fig. 3b), from which we generate the corresponding coarse-grained models and density maps (Fig. 3c and Extended Data Fig. 3c). Notably, cryoSTAR resolves the motion of the head domains of tri-snRNP in this dataset, which can bend towards the foot domain, as supported by the coarse-grained models and density maps (Fig. 3c, Extended Data Fig. 3c and Supplementary Video 2). Moreover, cryoSTAR also finds a possible rotation of the arm domain along the first principal component of the latent variable, which is uncorrelated with the bending of the head domain (Extended Data Fig. 3c and Supplementary Video 2). Nevertheless, the generated density maps do not exhibit a sufficiently close alignment to substantiate this finding, necessitating further investigation.

U4/U6.U5 三聚体是剪接体在激活前的重要组成部分,尤其在头部和臂部区域具有灵活性$^{14,24}$ (图 3a),这些特征无法通过传统的三维分类解析。之前基于冷冻电子显微镜 (cryo-EM) 解算的共识密度图构建了一个基本完整的原子模型 (PDB: 5GAN)$^{24}$。我们以该原子模型为输入,在 EMPIAR-10073 数据集中对 138,899 个粒子训练 cryoSTAR。然后,我们沿着学习到的潜在空间的第一和第二主成分进行采样 (图 3b 和扩展数据图 3b),从中生成相应的粗粒度模型和密度图 (图 3c 和扩展数据图 3c)。值得注意的是,cryoSTAR 解析了该数据集中三聚体头域的运动,头域可以向脚域弯曲,这得到了粗粒度模型和密度图的支持 (图 3c、扩展数据图 3c 和补充视频 2)。此外,cryoSTAR 还发现臂域沿潜在变量的第一主成分可能存在旋转,这与头域的弯曲无关 (扩展数据图 3c 和补充视频 2)。然而,生成的密度图并未表现出足够的对齐程度来证实这一发现,因此需要进一步的研究。

CryoSTAR is robust to imperfect reference atomic models

CryoSTAR 对不完美的参考原子模型具有鲁棒性

Using an atomic model as the initial reference may incorporate a strong bias in data processing. To test the robustness of cryoSTAR to imperfect reference atomic models, we manually edit the input atomic models of both the yeast pre-catalytic spliceosome and the U4/U6.U5 tri-snRNP and evaluate if the results are consistent. For the pre-catalytic spliceosome, we manually remove SmG in U5 snRNP (71 amino acids) from the input atomic model (Fig. 4a), and use the edited model as the input reference atomic

使用原子模型作为初始参考可能会在数据处理过程中引入强烈的偏差。为了测试 cryoSTAR 对不完美参考原子模型的鲁棒性,我们手动编辑了酵母前催化剪接体和 U4/U6.U5 三小核 RNA 复合物的输入原子模型,并评估结果是否一致。对于前催化剪接体,我们手动从输入原子模型中移除了 U5 snRNP 中的 SmG(71 个氨基酸)(图 4a),并使用编辑后的模型作为输入参考原子模型。
