New deep learning model could accelerate the process of discovering new drugs



MIT researchers have developed a deep learning model that can quickly predict the likely 3D shapes of a molecule based on a 2D diagram of its structure. This technology could accelerate drug discovery. Credit: Courtesy of the researchers, published by MIT News

Take some guesswork out of drug research

A deep learning model quickly predicts the 3D shapes of drug-like molecules, which could accelerate drug discovery.

In the search for effective new drugs, scientists look for drug-like molecules that can attach to disease-causing proteins and change their functionality. It is critical that they know the 3D shape of a molecule in order to understand how it attaches to certain surfaces of the protein.

But a single molecule can fold in thousands of different ways, so experimentally solving this puzzle is a time-consuming and expensive process, similar to finding a needle in a molecular haystack.

MIT researchers are using machine learning to streamline this complex task. They have developed a deep learning model that predicts the 3D shapes of a molecule based solely on a 2D graph of its molecular structure. Molecules are typically represented as small graphs.

Their system, GeoMol, processes molecules in just seconds and performs better than other machine learning models, including some commercial methods. GeoMol could help pharmaceutical companies accelerate the drug discovery process by narrowing down the number of molecules they have to test in laboratory experiments, says Octavian-Eugen Ganea, a postdoc in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and co-lead author of the paper.

“When you think about how these structures move in 3D space, there are really only certain parts of the molecule that are actually flexible, these rotatable bonds. One of the most important innovations in our work is that we are thinking about modeling conformational flexibility like a chemical engineer. It’s really about predicting the potential distribution of rotatable bonds in the structure,” says Lagnajit Pattanaik, a PhD student in the Department of Chemical Engineering and co-lead author of the work.

Other authors include Connor W. Coley, the Henri Slezynger Career Development Assistant Professor of Chemical Engineering; Regina Barzilay, the School of Engineering Distinguished Professor of AI and Health at CSAIL; Klavs F. Jensen, the Warren K. Lewis Professor of Chemical Engineering; William H. Green, the Hoyt C. Hottel Professor of Chemical Engineering; and senior author Tommi S. Jaakkola, the Thomas Siebel Professor of Electrical Engineering in CSAIL and a member of the Institute for Data, Systems, and Society. The research will be presented this week at the Conference on Neural Information Processing Systems.

Illustration of a molecule

In a molecular graph, the individual atoms of a molecule are represented as nodes and the chemical bonds that connect them are edges.
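As a minimal sketch of this representation, the snippet below stores the heavy atoms of ethanol as nodes and its bonds as edges, then builds a neighbor list. The molecule choice, the variable names, and the `build_adjacency` helper are illustrative assumptions and are not taken from GeoMol.

```python
# Heavy-atom graph of ethanol (CH3-CH2-OH); hydrogens are omitted for brevity.
# Illustrative only: names and layout are assumptions, not GeoMol's data format.
atoms = {0: "C", 1: "C", 2: "O"}   # node index -> element symbol
bonds = [(0, 1), (1, 2)]           # each edge is one chemical bond


def build_adjacency(atoms, bonds):
    """Map each atom index to the sorted list of atoms it is bonded to."""
    adjacency = {index: [] for index in atoms}
    for a, b in bonds:
        # Bonds are undirected, so record the edge in both directions.
        adjacency[a].append(b)
        adjacency[b].append(a)
    return {index: sorted(neighbors) for index, neighbors in adjacency.items()}


adjacency = build_adjacency(atoms, bonds)
```

Here the middle carbon (index 1) ends up with two neighbors, while the terminal carbon and the oxygen each have one.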

GeoMol uses a new tool in deep learning called a message passing neural network, which is designed specifically to operate on graphs. The researchers adapted a message passing neural network to predict specific elements of molecular geometry.
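The core idea of message passing can be sketched with a toy example in which every atom repeatedly adds its neighbors' values to its own. This is a simplification under stated assumptions: a real message passing neural network uses learned functions and feature vectors, and the `message_passing_step` function and the scalar features below are hypothetical, not GeoMol's architecture.

```python
def message_passing_step(features, adjacency):
    """One round: each node's new value is its own value plus the sum of its neighbors'."""
    return {
        node: features[node] + sum(features[n] for n in adjacency[node])
        for node in adjacency
    }


# Hypothetical 3-atom chain 0-1-2 with one scalar per atom (atomic numbers of C, C, O).
adjacency = {0: [1], 1: [0, 2], 2: [1]}
features = {0: 6.0, 1: 6.0, 2: 8.0}

after_one = message_passing_step(features, adjacency)   # only direct neighbors are seen
after_two = message_passing_step(after_one, adjacency)  # atoms two bonds away now contribute
```

After one round, atom 0 has only seen atom 1. After two rounds, its value also depends on atom 2, which is how repeated rounds let local updates carry information from more distant parts of the molecule.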

Based on a molecular diagram, GeoMol first predicts the lengths of the chemical bonds between atoms and the angles of these individual bonds. The way the atoms are arranged and connected determines which bonds can rotate.

GeoMol then predicts the structure of each atom's local neighborhood individually and assembles neighboring pairs of rotatable bonds by computing the torsion angles and then aligning them. A torsion angle describes how three connected segments twist relative to one another, in this case three chemical bonds that connect four atoms.
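For readers who want to see what a torsion angle is numerically, here is a minimal sketch that computes it from the 3D coordinates of four consecutively bonded atoms using the standard atan2 formula. The function name and the test coordinates are assumptions for illustration. The sketch measures an angle from known coordinates, whereas GeoMol predicts such angles from the 2D graph.

```python
import math


def _sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u, v):
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def torsion_angle_degrees(p0, p1, p2, p3):
    """Signed torsion angle (degrees) for atoms p0-p1-p2-p3 about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = _dot(b2, _cross(n1, n2))
    x = math.sqrt(_dot(b2, b2)) * _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates: the central bond lies along the x axis.
p0, p1, p2 = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
cis = torsion_angle_degrees(p0, p1, p2, (1.0, 1.0, 0.0))       # end atoms on the same side
twisted = torsion_angle_degrees(p0, p1, p2, (1.0, 0.0, 1.0))   # quarter turn about the bond
mirrored = torsion_angle_degrees(p0, p1, p2, (1.0, 0.0, -1.0)) # quarter turn the other way
```

Rotating the last atom about the central bond changes only this one angle, which is why a rotatable bond can be summarized by a single number.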

“Here, the rotatable bonds can assume a huge range of possible values. Using these message passing neural networks allows us to capture many of the local and global environments that affect this prediction. The rotatable bond can take on multiple values, and we want our prediction to reflect this underlying distribution,” says Pattanaik.

Overcome existing hurdles

Modeling chirality is a major challenge in predicting the 3D structure of molecules. A chiral molecule cannot be superimposed on its mirror image, like a pair of hands (no matter how you rotate your hands, there is no way their features can be lined up exactly). When a molecule is chiral, its mirror image does not interact with the environment in the same way.
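One common geometric way to tell a 3D arrangement from its mirror image is the sign of a signed volume spanned by four points, sketched below. The coordinates are a hypothetical, idealized tetrahedral arrangement of four substituents, and this test is offered as an illustration of handedness rather than as a description of how GeoMol handles chirality.

```python
def signed_volume(a, b, c, d):
    """Six times the signed volume of tetrahedron abcd: (a - d) . ((b - d) x (c - d))."""
    u = (a[0] - d[0], a[1] - d[1], a[2] - d[2])
    v = (b[0] - d[0], b[1] - d[1], b[2] - d[2])
    w = (c[0] - d[0], c[1] - d[1], c[2] - d[2])
    cross = (
        v[1] * w[2] - v[2] * w[1],
        v[2] * w[0] - v[0] * w[2],
        v[0] * w[1] - v[1] * w[0],
    )
    return u[0] * cross[0] + u[1] * cross[1] + u[2] * cross[2]


def mirror(point):
    """Reflect a point through the yz-plane by negating its x coordinate."""
    return (-point[0], point[1], point[2])


# Hypothetical positions of four distinct substituents, listed in a fixed order.
substituents = [(1.0, 1.0, 1.0), (1.0, -1.0, -1.0), (-1.0, 1.0, -1.0), (-1.0, -1.0, 1.0)]

original = signed_volume(*substituents)
reflected = signed_volume(*[mirror(p) for p in substituents])
```

The two values have the same magnitude and opposite signs. No rotation can change that sign, which is the numerical counterpart of being unable to line up a left and a right hand.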

This could cause drugs to improperly interact with proteins, which can lead to dangerous side effects. Current machine learning methods often require a long, complex optimization process to ensure that the chirality is correctly identified, says Ganea.

Since GeoMol determines the 3D structure of each bond individually, it explicitly defines the chirality during the prediction process and makes subsequent optimization superfluous.

After making these predictions, GeoMol outputs a set of likely 3D structures for the molecule.

“What we can do now is take our model and connect it end-to-end to a model that predicts this binding to specific protein surfaces. Our model is not a separate pipeline. It’s very easy to integrate with other deep learning models,” says Ganea.

A “super fast” model

The researchers tested their model on a data set of molecules and the likely 3D shapes they could take, developed by Rafael Gomez-Bombarelli, the Jeffrey Cheah Career Development Chair in Engineering, and PhD student Simon Axelrod.

They evaluated how many of these likely 3D structures their model was able to capture, compared with other machine learning models and other methods.
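The article does not spell out the metrics, so the following is only a sketch of one plausible recall-style measure: the fraction of reference structures that have at least one generated structure within a distance threshold. The `coverage` function, the threshold, and the distance values are all hypothetical, and the distances (for example, root-mean-square deviations after alignment) are assumed to be precomputed.

```python
def coverage(distances, threshold):
    """Fraction of reference structures matched by at least one generated structure.

    distances[i][j] is the (precomputed) distance between reference structure i
    and generated structure j; a reference counts as captured if any generated
    structure lies strictly within `threshold` of it.
    """
    if not distances:
        raise ValueError("need at least one reference structure")
    captured = sum(1 for row in distances if min(row) < threshold)
    return captured / len(distances)


# Made-up example: 3 reference structures, 2 generated structures.
example_distances = [
    [0.3, 1.9],  # reference 0 is close to generated 0
    [1.4, 0.8],  # reference 1 is close to generated 1
    [2.2, 1.6],  # reference 2 is not matched by anything
]
score = coverage(example_distances, threshold=1.0)
```

In this toy case two of the three reference structures are captured. A higher value means the generated set misses fewer of the shapes the molecule is known to adopt.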

In almost all cases, GeoMol outperformed the other models on all metrics tested.

“We found our model to be super fast, which was really exciting. And importantly, you would expect these algorithms to slow down significantly as you add more rotatable bonds. But we didn’t really see that. The speed scales well with the number of rotatable bonds, which holds great promise for using these types of models across the board, especially for applications where you are trying to quickly predict the 3D structures within these proteins,” says Pattanaik.

In the future, the researchers hope to apply GeoMol to high-throughput virtual screening, using the model to determine small-molecule structures that would interact with a particular protein. They also want to refine GeoMol with additional training data so that it can more effectively predict the structure of long molecules with many flexible bonds.

“Conformational analysis is a key component of many computational drug design tasks and an important component in advancing machine learning approaches in drug discovery,” says Pat Walters, senior vice president of computation at Relay Therapeutics, who was not involved in the research. “I am pleased with the continued advances in this area and I thank MIT for its contribution to broader insights in this area.”

Reference: “GeoMol: Torsional Geometric Generation of Molecular 3D Conformer Ensembles” by Octavian-Eugen Ganea, Lagnajit Pattanaik, Connor W. Coley, Regina Barzilay, Klavs F. Jensen, William H. Green and Tommi S. Jaakkola, June 8, 2021, Physics > Chemical Physics.
arXiv: 2106.07802

This research was funded by the Machine Learning for Pharmaceutical Discovery and Synthesis consortium.


