Introduction:
Drug property prediction currently relies mainly on mono-modal deep learning methods, which limits how comprehensively drug molecules can be understood. To overcome this limitation, we construct a multimodal fusion deep learning (MMFDL) model that uses different networks to learn molecular representations and adapts five fusion methods to capture feature information. Experiments show that the MMFDL model outperforms mono-modal models in accuracy, reliability, and noise resistance. The method provides an effective tool for drug development.
Methods and Problem:
To better understand drug molecules and accurately predict molecular properties, we construct a multimodal deep learning model. We convert drug molecules into three molecular representations: SMILES-encoded vectors, ECFP fingerprints, and molecular graphs. To process these modalities, a Transformer-Encoder, a BiGRU, and a GCN are used for feature learning, respectively, which can capture naturally occurring bioinformatics information. We then adapt five fusion methods to capture modality-specific features and to better leverage the contribution of each modality.
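As a rough illustration of the fusion step, the sketch below combines three per-modality predictions (SMILES, ECFP, graph) with one weight per modality, and fits those weights by stochastic gradient descent on squared error. The text does not spell out the five fusion methods, so this is a generic weighted late-fusion example and not the authors' implementation. The function names `fuse` and `fit_fusion_weights`, the learning rate, and the toy data are all assumptions made for illustration.

```python
import random


def fuse(preds, weights):
    """Weighted sum of the per-modality predictions for one molecule."""
    return sum(w * p for w, p in zip(weights, preds))


def fit_fusion_weights(modal_preds, targets, lr=0.05, epochs=200, seed=0):
    """Fit one weight per modality by SGD on squared error.

    modal_preds: list of (p_smiles, p_ecfp, p_graph) tuples, one per molecule.
    targets:     list of true property values, same length as modal_preds.
    """
    rng = random.Random(seed)
    n_modal = len(modal_preds[0])
    weights = [1.0 / n_modal] * n_modal  # start from a plain average
    order = list(range(len(targets)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit molecules in a new random order each epoch
        for i in order:
            err = fuse(modal_preds[i], weights) - targets[i]
            # Gradient of 0.5 * err**2 with respect to weight k is err * p_k.
            weights = [w - lr * err * p for w, p in zip(weights, modal_preds[i])]
    return weights


# Toy data (hypothetical): random stand-ins for the outputs of the three
# mono-modal models, with targets built from known mixing weights.
rng = random.Random(42)
true_weights = (0.5, 0.3, 0.2)
modal_preds = [tuple(rng.random() for _ in range(3)) for _ in range(50)]
targets = [fuse(p, true_weights) for p in modal_preds]

weights = fit_fusion_weights(modal_preds, targets)
print([round(w, 3) for w in weights])
```

On this noise-free toy problem the fitted weights should approach the mixing weights used to build the targets. With real mono-modal outputs, the weights would instead reflect how much each modality contributes to the prediction.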
Conclusions:
We compared mono-modal and multimodal learning on six single-molecule datasets. The results indicate that the proposed multimodal fusion deep learning model not only improves prediction accuracy and stability but also increases noise resistance by leveraging different sources of information. The Tri_SGD fusion method, one of the multimodal fusion methods, further improves the performance of multimodal learning models when dealing with uncorrelated data sources, such as chemical language and molecular graphs. Moreover, we demonstrate the model's generalization ability in predicting binding affinity for protein-ligand complexes.