Improving classical scoring functions using random forest

Authors: Afifi, Karim; Farouk Al-Sadek, Ahmed
Subtitle: The non-additivity of free energy terms' contributions in binding
Publisher: WILEY
Year: 2018
Keywords: AUTODOCK; AUTODOCK VINA; docking; drug design; RF-SCORE; scoring; scoring function; virtual screening; X-SCORE; PROTEIN-LIGAND-BINDING; AFFINITY PREDICTION; MOLECULAR DOCKING; PDBBIND DATABASE; ACCURACY
Institution: My University
Record date: 2019-11-06
Language: en
Type: Article
ISSN: 1747-0277, 1747-0285
DOI: https://doi.org/10.1111/cbdd.13206
PubMed: https://www.ncbi.nlm.nih.gov/pubmed/29655201

Abstract: Despite recent efforts to improve scoring functions, accurately predicting binding affinity remains a challenging task. Different approaches were therefore tried to improve the prediction performance of four scoring functions (X-SCORE, VINA, AUTODOCK, and RF-SCORE): replacing the linear regression model of the classical scoring functions with random forest, to examine how performance changes when an additive functional form is not imposed, and combining different scoring functions into hybrid ones. The datasets were derived from the PDBbind-CN database, version 2016. When the original scoring functions were evaluated on the generic dataset, RF-SCORE outperformed the classical scoring functions, which shows the superiority of descriptor-based scoring functions. Replacing linear regression (a linear model) with random forest (a nonlinear model) largely improved the scoring performance of AUTODOCK and VINA, while X-SCORE showed only a slight performance increase. All hybrid scoring functions showed only a slight improvement, if any, over both of the combined scoring functions, which is not worth the slower calculation time.
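
The idea of dropping the additive functional form can be illustrated with a small, self-contained sketch. It is not the authors' code and uses none of their descriptors or data: the two "energy terms", the synthetic affinity (their product plus noise), and the toy forest (bootstrap samples, one randomly chosen feature per split) are all assumptions made for illustration. The same two features are fitted once with ordinary least squares and once with the toy forest, and the test-set errors are compared.

```python
import math
import random

def make_data(n, rng):
    # Two made-up "energy terms" per complex (hypothetical, not real descriptors).
    # The target is their centered product, so no weighted sum can reproduce it.
    X = [[rng.random(), rng.random()] for _ in range(n)]
    y = [4.0 * (a - 0.5) * (b - 0.5) + rng.gauss(0.0, 0.05) for a, b in X]
    return X, y

def fit_linear(X, y):
    # Ordinary least squares with an intercept, solved via the normal equations.
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(k):
            if r != c:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    w = [M[i][k] for i in range(k)]
    return lambda x: w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def build_tree(X, y, rng, depth=0, max_depth=8, min_leaf=3):
    # Regression tree; a leaf is a float (mean target), a split is a tuple.
    mean = sum(y) / len(y)
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return mean
    f = rng.randrange(len(X[0]))  # one randomly chosen feature per split
    order = sorted(range(len(y)), key=lambda i: X[i][f])
    n, total = len(y), sum(y)
    best_score, best_i, left_sum = -1.0, None, 0.0
    for i in range(1, n):
        left_sum += y[order[i - 1]]
        if i < min_leaf or n - i < min_leaf:
            continue
        if X[order[i - 1]][f] == X[order[i]][f]:
            continue  # cannot split between identical feature values
        # Maximizing this score is equivalent to minimizing the squared error.
        score = left_sum ** 2 / i + (total - left_sum) ** 2 / (n - i)
        if score > best_score:
            best_score, best_i = score, i
    if best_i is None:
        return mean
    thr = 0.5 * (X[order[best_i - 1]][f] + X[order[best_i]][f])
    left, right = order[:best_i], order[best_i:]
    return (f, thr,
            build_tree([X[i] for i in left], [y[i] for i in left],
                       rng, depth + 1, max_depth, min_leaf),
            build_tree([X[i] for i in right], [y[i] for i in right],
                       rng, depth + 1, max_depth, min_leaf))

def predict_tree(node, x):
    while isinstance(node, tuple):
        f, thr, lo, hi = node
        node = lo if x[f] <= thr else hi
    return node

def fit_forest(X, y, rng, n_trees=25):
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]  # bootstrap sample
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return lambda x: sum(predict_tree(t, x) for t in trees) / len(trees)

def rmse(model, X, y):
    return math.sqrt(sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(y))

rng = random.Random(0)
X_train, y_train = make_data(400, rng)
X_test, y_test = make_data(200, rng)

lin_rmse = rmse(fit_linear(X_train, y_train), X_test, y_test)
rf_rmse = rmse(fit_forest(X_train, y_train, rng), X_test, y_test)
print(f"linear RMSE: {lin_rmse:.3f}  forest RMSE: {rf_rmse:.3f}")
```

The synthetic target is deliberately a pure interaction, which is the worst case for an additive model; on real scoring-function terms the gap would depend on how non-additive the contributions actually are. In practice a library implementation such as scikit-learn's `RandomForestRegressor` would normally be used in place of this toy forest.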