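Innovation Background
Determining the three-dimensional shapes of biological molecules is one of the hardest problems in modern biology and medical discovery. Companies and research institutions often spend millions of dollars to determine a molecular structure, and even efforts on that scale frequently fail.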
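Innovation Process
Stanford doctoral students Stephan Eismann and Raphael Townshend, working under the guidance of Ron Dror, associate professor of computer science, have used ingenious new machine learning techniques to develop a method that overcomes the difficulty of determining biomolecular structures by predicting accurate structures computationally. Most notably, their method succeeds even when it learns from only a few known structures, which makes it applicable to the types of molecules whose structures are hardest to determine experimentally.
Their work is presented in two papers, one on RNA molecules and one on multiprotein complexes, published in Science on August 27, 2021, and in Proteins in December 2020, respectively. The Science paper was a collaboration with the laboratory of Rhiju Das, associate professor of biochemistry at Stanford.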
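Structural biology, the study of molecular shapes, has a saying: structure determines function. Because the researchers' algorithm predicts accurate molecular structures, scientists can use it to explain how different molecules work, with applications ranging from basic biological research to informed drug design.
Proteins are molecular machines that perform a wide range of functions, and to do so they often bind to other proteins. According to Eismann, co-lead author of both papers, if you know that a pair of proteins is implicated in a disease and you know how the two interact in 3D, you can try to target that interaction with a drug.
Rather than specifying what makes a structure prediction more or less accurate, the researchers let the algorithm discover these molecular features on its own. They did so because they found that the conventional practice of supplying such knowledge can bias an algorithm toward certain features and keep it from finding other informative ones. The problem with hand-crafted features, Eismann said, is that the algorithm becomes biased toward what the person who picked them considers important, and it may miss information it would need to do better.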
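After their success with proteins, the researchers applied the algorithm to another important class of biomolecules: RNA. They tested it on a series of "RNA Puzzles" from a long-standing competition in their field. In every test the tool outperformed all other puzzle participants, even though it was not designed specifically for RNA structures. Having succeeded with both protein complexes and RNA molecules, the researchers are excited to see where else their approach can be applied.
Most of the major advances in machine learning at the time required large amounts of training data. That this method succeeds with very little training data suggests that related methods could address unsolved problems in the many fields where data are scarce.
For structural biology in particular, the team says it has only scratched the surface of the scientific progress still to be made.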
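Innovation Value
This new machine learning algorithm can accurately predict the 3D shapes of drug targets and other important biomolecules, so scientists can explain how different molecules work, with applications ranging from basic biological research to informed drug design.
Key Innovation Points
Using ingenious new machine learning techniques, the researchers developed a method that overcomes the difficulty of determining biomolecular structures by predicting accurate structures computationally.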
Innovative machine learning methods to accurately predict biological structure
Using ingenious new machine learning techniques, Stanford doctoral students Stephan Eismann and Raphael Townshend, under the guidance of Ron Dror, associate professor of computer science, have developed a way to overcome the difficulty of determining biomolecular structures by predicting accurate structures computationally.
Most notably, their method succeeds even when it learns from only a few known structures, making it applicable to the types of molecules whose structures are most difficult to determine experimentally.
Their work is presented in two papers, one on RNA molecules and one on multiprotein complexes, published in Science on August 27, 2021, and in Proteins in December 2020, respectively. The Science paper was a collaboration with the laboratory of Rhiju Das, associate professor of biochemistry at Stanford.
Structural biology, the study of molecular shapes, has a saying: structure determines function.
Because the researchers' algorithm predicts accurate molecular structures, scientists can use it to explain how different molecules work, with applications ranging from basic biological research to informed drug design.
Proteins are molecular machines that perform various functions. Proteins typically bind to other proteins in order to perform their functions, said Eismann, co-lead author of both papers. "If you know a pair of proteins is associated with a disease and you know how they interact in 3D, you can try to target that interaction with a drug," Eismann said.
Instead of specifying what makes a structure prediction more or less accurate, the researchers let the algorithm discover these molecular features on its own. They did so because they found that the conventional practice of supplying such knowledge can bias the algorithm toward certain features, preventing it from finding other informative ones. The problem with hand-crafted features, Eismann says, is that the algorithm becomes biased toward what the person who picked them thinks is important, and it may miss information it would need to do better.
Having succeeded with proteins, the researchers then applied their algorithm to another important class of biological molecules: RNA. They tested it on a series of "RNA Puzzles" from a long-running competition in their field. In every test the tool outperformed all other puzzle participants, even though it was not designed specifically for RNA structures. Having already had success with protein complexes and RNA molecules, the researchers are excited to see where else their approach can be applied.
Most of the major advances in machine learning at the time required large amounts of training data. That this approach succeeds with very little training data suggests that related methods could address unsolved problems in the many fields where data are scarce.
Specifically, for structural biology, the team says they've only scratched the surface in terms of the scientific advances yet to be made.