Fragment-based drug design is an emerging strategy in pharmaceutical research and development, and it depends on identifying molecular fragments and quantitatively characterizing their contributions. Machine learning models built on decision tree algorithms can identify the molecular fragments that matter most for protein-ligand binding. The approach described here combines molecular fingerprints with decision tree models to quantify the importance of each fingerprint feature and to extract significant molecular fragments reliably. Its feasibility has been verified on the task of predicting protein-ligand binding affinity.
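As background only (the text does not specify which importance measure was used), a common way for tree ensembles to quantify feature importance is the mean decrease in impurity (MDI). This is what scikit-learn's `RandomForestRegressor.feature_importances_` reports; XGBoost and LightGBM also offer gain-based and split-count-based alternatives. For a fitted tree $T$ and fingerprint bit $j$:

$$
\mathrm{Imp}_T(j) = \frac{1}{N} \sum_{\substack{t \in T \\ v(t) = j}} \Big( N_t\, I(t) - N_{t_L}\, I(t_L) - N_{t_R}\, I(t_R) \Big),
\qquad
\mathrm{Imp}(j) = \frac{1}{|\mathcal{F}|} \sum_{T \in \mathcal{F}} \frac{\mathrm{Imp}_T(j)}{\sum_{k} \mathrm{Imp}_T(k)}
$$

Here the sum runs over internal nodes $t$ whose split variable $v(t)$ is bit $j$, $N_t$ is the number of training complexes reaching $t$, $t_L$ and $t_R$ are its children, $N$ is the total number of training complexes, $I(t)$ is the node impurity (for example, the variance of the binding affinities at $t$), and $\mathcal{F}$ is the set of trees in the ensemble. A bit that produces large impurity reductions at nodes reached by many complexes therefore receives a high score.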
Methods and Results:
The study proposes a strategy that encodes the three-dimensional structures of protein-ligand complexes as extended-connectivity fingerprints (ECFP) and uses three decision tree ensemble models, Random Forest, XGBoost, and LightGBM, to quantify the importance of each fingerprint feature, from which reliable and significant molecular fragments are extracted. The results show that the extracted fragments contribute significantly and consistently to binding affinity, even when the sample size is small. Although ECFP carries no information on where fragments are located or how far apart they are, three-dimensional visualization combined with reversing the ECFP encoding (mapping each important fingerprint bit back to the atoms that generated it) shows that most of the extracted fragments lie at the protein-ligand binding interface. This placement is consistent with the short-range distance constraints that govern binding affinity and further supports the reliability of the strategy for identifying important molecular fragments.
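The text does not say how the three models' importance rankings are combined into a single set of reliable fragments. One plausible rule, shown here purely as an assumption, is to normalize each model's scores and keep only the fingerprint bits that every model ranks in its top k. The function names (`normalize`, `top_k`, `consensus_bits`), the cutoff `k`, and all scores are hypothetical. In practice the per-model dictionaries could be filled from the fitted models' `feature_importances_` attributes, which scikit-learn's `RandomForestRegressor`, `xgboost.XGBRegressor`, and `lightgbm.LGBMRegressor` all expose, possibly on different scales.

```python
from typing import Dict, List, Set

Importances = Dict[int, float]  # fingerprint bit index -> importance score


def normalize(importances: Importances) -> Importances:
    """Rescale one model's scores to sum to 1, since models may report importance on different scales."""
    total = sum(importances.values())
    if total == 0:
        return {bit: 0.0 for bit in importances}
    return {bit: value / total for bit, value in importances.items()}


def top_k(importances: Importances, k: int) -> Set[int]:
    """The k highest-scoring bits; ties are broken by bit index so the result is deterministic."""
    ranked = sorted(importances.items(), key=lambda item: (-item[1], item[0]))
    return {bit for bit, _ in ranked[:k]}


def consensus_bits(per_model: Dict[str, Importances], k: int) -> List[int]:
    """Hypothetical rule: bits in the top k of every model, ordered by mean normalized importance."""
    normalized = [normalize(imp) for imp in per_model.values()]
    shared = set.intersection(*(top_k(imp, k) for imp in normalized))
    mean_score = {bit: sum(imp.get(bit, 0.0) for imp in normalized) / len(normalized)
                  for bit in shared}
    return sorted(shared, key=lambda bit: (-mean_score[bit], bit))


# Hypothetical importance scores for four fingerprint bits.
importances = {
    "RandomForest": {12: 0.30, 87: 0.25, 401: 0.05, 990: 0.40},
    "XGBoost": {12: 12.0, 87: 8.0, 401: 9.0, 990: 1.0},
    "LightGBM": {12: 40, 87: 35, 401: 5, 990: 20},
}
print(consensus_bits(importances, k=3))  # -> [12, 87]
```

Requiring agreement across all three models trades recall for robustness: bit 990, which Random Forest ranks first but XGBoost ranks last, is dropped.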
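Because ECFP bits carry no coordinates, locating a fragment means mapping a bit back to the atom environment that set it and then checking that environment against the complex's 3D structure. The sketch below illustrates the idea with a deliberately simplified, stdlib-only Morgan-style hashing: atom invariants are just element and degree, bond orders are ignored, and duplicate environments are not removed, so the bits will not match real ECFP. For every bit it records the (center atom, radius) pairs that produced it, analogous to the `bitInfo` dictionary that RDKit's `GetMorganFingerprintAsBitVect` can fill. The helper names, the toy molecule and coordinates, and the 4.0 Å interface cutoff are hypothetical assumptions, not the study's parameters.

```python
import math
import zlib
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Sequence, Set, Tuple

Coord = Tuple[float, float, float]


def stable_hash(obj) -> int:
    """Deterministic 32-bit hash (Python's built-in hash() is salted for strings)."""
    return zlib.crc32(repr(obj).encode("utf-8"))


def morgan_bits(elements: Sequence[str], adjacency: Sequence[Sequence[int]],
                radius: int = 2, n_bits: int = 2048) -> Dict[int, List[Tuple[int, int]]]:
    """Simplified ECFP-like hashing (assumption: invariants are element + degree only).

    Returns bit -> [(center_atom, radius), ...], analogous to RDKit's bitInfo.
    """
    # Radius 0: each atom's identifier comes from its own invariants.
    ids = [stable_hash((el, len(adjacency[i]))) for i, el in enumerate(elements)]
    bit_info: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for atom, ident in enumerate(ids):
        bit_info[ident % n_bits].append((atom, 0))
    for r in range(1, radius + 1):
        # Radius r: combine each atom's previous identifier with its neighbours' previous ones.
        ids = [stable_hash((r, ids[i], tuple(sorted(ids[j] for j in adjacency[i]))))
               for i in range(len(elements))]
        for atom, ident in enumerate(ids):
            bit_info[ident % n_bits].append((atom, r))
    return dict(bit_info)


def environment_atoms(adjacency: Sequence[Sequence[int]], center: int, radius: int) -> Set[int]:
    """Reverse step: the atoms within `radius` bonds of `center` form the fragment behind a bit."""
    seen = {center}
    frontier = deque([(center, 0)])
    while frontier:
        atom, depth = frontier.popleft()
        if depth == radius:
            continue
        for neighbour in adjacency[atom]:
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return seen


def at_interface(fragment_atoms: Iterable[int], ligand_xyz: Sequence[Coord],
                 protein_xyz: Sequence[Coord], cutoff: float = 4.0) -> bool:
    """True if any fragment atom lies within `cutoff` angstroms of any protein atom (cutoff is hypothetical)."""
    return any(math.dist(ligand_xyz[a], p) <= cutoff
               for a in fragment_atoms for p in protein_xyz)


# Toy three-atom ligand (C-C-O) with hypothetical coordinates and a single protein atom.
elements = ["C", "C", "O"]
adjacency = [[1], [0, 2], [1]]
ligand_xyz = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.9, 0.0, 0.0)]
protein_xyz = [(6.0, 0.0, 0.0)]

for bit, hits in sorted(morgan_bits(elements, adjacency, radius=1, n_bits=64).items()):
    for center, r in hits:
        fragment = environment_atoms(adjacency, center, r)
        print(bit, center, r, sorted(fragment), at_interface(fragment, ligand_xyz, protein_xyz))
```

Each printed row pairs a bit with one fragment and reports whether that fragment touches the toy protein; in a real workflow the check would be applied only to the bits selected as important.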
This decision tree-based method enables rapid and accurate identification of important molecular fragments in protein-ligand binding and can thereby support drug design and optimization. The study illustrates the potential of artificial intelligence in drug design and provides practical tools for high-throughput screening and drug development. Applying the method is expected to accelerate drug research and, in turn, contribute to disease treatment and the health industry.