Sequence-Based Learning for Solubility Prediction from Molecular SMILES
Abstract
Molecular property prediction is one of the time-consuming steps in drug discovery. It covers properties such as solubility and toxicity. We propose a bidirectional LSTM (Bi-LSTM) approach that predicts the solubility of targets identified at the target identification step of drug discovery. The inputs to this sequence-based approach are SMILES (Simplified Molecular Input Line Entry System) strings, which represent molecules as character sequences. Each SMILES string is first tokenized, that is, broken into tokens. The tokens are then passed through an embedding layer, which converts them into dense vectors. The model is trained and tested on this data and then applied to produce solubility predictions. On molecular SMILES representations taken from the FreeSolv dataset, the proposed model outperforms traditional models, achieving an RMSE of 1.22.
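The preprocessing steps named in the abstract (tokenization, integer encoding, embedding lookup) and the RMSE metric can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The regex tokenizer, the function names (`tokenize`, `build_vocab`, `encode`, `make_embedding`, `embed`, `rmse`), the padding id 0, and the randomly initialized embedding table are all assumptions made for illustration. In a real model the embedding table would be learned jointly with the Bi-LSTM, which is omitted here.

```python
import math
import random
import re

# Assumed tokenizer: keeps bracket atoms ([Na+]), two-letter halogens (Cl, Br)
# and two-digit ring closures (%12) together; any other character is one token.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(smiles_list):
    """Map each distinct token to an integer id; id 0 is reserved for padding."""
    tokens = sorted({tok for s in smiles_list for tok in tokenize(s)})
    return {tok: i + 1 for i, tok in enumerate(tokens)}


def encode(smiles, vocab, max_len):
    """Convert a SMILES string to a fixed-length list of token ids."""
    ids = [vocab[tok] for tok in tokenize(smiles)][:max_len]
    return ids + [0] * (max_len - len(ids))  # right-pad with the padding id


def make_embedding(vocab_size, dim, seed=0):
    """Build a random lookup table standing in for a learned embedding layer."""
    rng = random.Random(seed)
    table = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
             for _ in range(vocab_size + 1)]
    table[0] = [0.0] * dim  # the padding id maps to the zero vector
    return table


def embed(ids, table):
    """Look up one dense vector per token id."""
    return [table[i] for i in ids]


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    molecules = ["CCO", "c1ccccc1Cl"]  # ethanol and chlorobenzene
    vocab = build_vocab(molecules)
    ids = encode("CCO", vocab, max_len=5)
    vectors = embed(ids, make_embedding(len(vocab), dim=4))
    print(tokenize("c1ccccc1Cl"))
    print(ids, len(vectors), len(vectors[0]))
```

The resulting sequence of dense vectors is what a bidirectional recurrent layer would consume. The `rmse` helper computes the metric the abstract reports, but the labels and predictions fed to it must come from a trained model and a real dataset split.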