Introduction:
Fragment-based drug design is an emerging technology in pharmaceutical research and development, and the identification and quantitative characterization of molecular fragments are crucial in this field. Machine learning based on decision tree algorithms can be used for identifying important molecular fragments in protein-ligand binding. The approach combines molecular fingerprints and decision tree models to quantitatively characterize the feature importance and reliably extract significant molecular fragments. The feasibility of the method in predicting protein-ligand binding affinity has been verified.
Methods and Problem:
The study proposes a strategy that encodes the three-dimensional structures of protein-ligand complexes using extended-connectivity fingerprints (ECFP) and utilizes Random Forest, XGBoost, and LightGBM decision tree models to quantify the importance of features, thereby extracting reliable and significant molecular fragments. The results demonstrate that the extracted molecular fragments contribute significantly and consistently to the binding affinity, even with a small sample size. Despite the absence of location and distance information for molecular fragments in ECFP, three-dimensional visualization combined with the reverse ECFP process reveals that the majority of the extracted fragments are located at the binding interface of the protein and the ligand. This alignment with the critical distance constraints required for binding affinity further supports the reliability of the strategy for identifying important molecular fragments.
Conclusions:
This decision tree-based method enables rapid and accurate identification of important molecular fragments in protein-ligand binding, providing strong support for drug design and optimization. The study showcases the potential of artificial intelligence technology in the field of drug design and offers effective tools and methods for high-throughput screening and drug development. The application of this method is expected to accelerate progress in drug research and make significant contributions to disease treatment and the health industry.
简介:
在分子药物设计中,基于分子碎片的药物设计是一项新兴技术,该技术对药物研发具有重要意义。其中一个关键环节是对分子碎片的鉴定和定量表征。决策树的方法可用于识别蛋白质与配体结合中的重要分子碎片。该方法是基于分子指纹和决策树算法的结合,通过定量特征表征,实现对重要分子碎片的鉴定。研究验证了该方法在预测蛋白质-配体结合亲和力方面的可行性。
方法和问题:
本研究提出了一种策略,利用扩展连接指纹(ECFP)对蛋白质-配体复合物的三维结构进行编码,然后采用三个决策树模型(随机森林、XGBoost和LightGBM)对特征重要性进行定量表征,从而提取具有高可靠性的重要分子碎片。研究结果表明,即使在小样本情况下,提取的分子碎片对结合亲和力的贡献明显且一致。尽管ECFP中没有分子碎片的位置和距离信息,但通过三维可视化和反向ECFP过程,研究发现大多数提取的分子碎片位于蛋白质和配体的结合界面。这与结合亲和力所需的距离约束相吻合,进一步验证了该策略鉴定重要分子碎片的可靠性。
结论:
通过这种基于决策树的方法,能够快速准确地确定蛋白质-配体结合中的重要分子碎片,为药物设计和优化提供了有力的支持。这项研究展示了人工智能技术在药物设计领域的潜力,为高通量筛选和药物研发过程提供了有效的工具和方法。这一方法的应用有望加速药物研究的进展,为疾病治疗和健康产业的发展做出重要贡献。