Model automates molecule design to speed drug development
Public Release: 6-Jul-2018
Machine-learning model could help chemists make molecules with higher potencies, much more quickly.
Massachusetts Institute of Technology
Designing new molecules for pharmaceuticals is primarily a manual, time-consuming process that's prone to error. But MIT researchers have now taken a step toward fully automating the design process, which could drastically speed things up -- and produce better results.
Drug discovery relies on lead optimization. In this process, chemists select a target ("lead") molecule with known potential to combat a specific disease, then tweak its chemical properties for higher potency and other factors.
Often, chemists use expert knowledge and conduct manual tweaking of molecules, adding and subtracting functional groups -- atoms and bonds responsible for specific chemical reactions -- one by one. Even if they use systems that predict optimal chemical properties, chemists still need to do each modification step themselves. This can take hours for each iteration and may still not produce a valid drug candidate.
Researchers from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) and Department of Electrical Engineering and Computer Science (EECS) have developed a model that better selects lead molecule candidates based on desired properties. It also modifies the molecular structure needed to achieve a higher potency, while ensuring the molecule is still chemically valid.
The model basically takes as input molecular structure data and directly creates molecular graphs -- detailed representations of a molecular structure, with nodes representing atoms and edges representing bonds. It breaks those graphs down into smaller clusters of valid functional groups that it uses as "building blocks" that help it more accurately reconstruct and better modify molecules.
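To make the "nodes are atoms, edges are bonds" idea concrete, here is a minimal Python sketch of a molecular graph. It is only an illustration: the `MolGraph` class, its method names, and the `ethanol` helper are hypothetical and are not taken from the MIT model.

```python
from dataclasses import dataclass, field


@dataclass
class MolGraph:
    """Toy molecular graph: atoms are nodes, bonds are edges (hypothetical helper)."""
    atoms: list = field(default_factory=list)   # element symbol for each node
    bonds: dict = field(default_factory=dict)   # (i, j) with i < j -> bond order

    def add_atom(self, symbol):
        """Append an atom and return its node index."""
        self.atoms.append(symbol)
        return len(self.atoms) - 1

    def add_bond(self, i, j, order=1):
        """Connect atoms i and j with a bond of the given order."""
        if i == j:
            raise ValueError("an atom cannot bond to itself")
        self.bonds[(min(i, j), max(i, j))] = order

    def neighbors(self, i):
        """Return the sorted indices of atoms bonded to atom i."""
        return sorted(b if a == i else a for (a, b) in self.bonds if i in (a, b))


def ethanol():
    """Heavy-atom graph of ethanol (C-C-O); hydrogens are left implicit."""
    g = MolGraph()
    c1, c2, o = g.add_atom("C"), g.add_atom("C"), g.add_atom("O")
    g.add_bond(c1, c2)
    g.add_bond(c2, o)
    return g
```

Calling `ethanol().neighbors(1)` lists the atoms bonded to the middle carbon. A real system would also track charges, aromaticity, and stereochemistry, which this sketch ignores.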
"The motivation behind this was to replace the inefficient human modification process of designing molecules with automated iteration and assure the validity of the molecules we generate," says Wengong Jin, a PhD student in CSAIL and lead author of a paper describing the model that's being presented at the 2018 International Conference on Machine Learning in July.
Joining Jin on the paper are Regina Barzilay, the Delta Electronics Professor at CSAIL and EECS, and Tommi S. Jaakkola, the Thomas Siebel Professor of Electrical Engineering and Computer Science in CSAIL, EECS, and at the Institute for Data, Systems, and Society.
The research was conducted as part of the Machine Learning for Pharmaceutical Discovery and Synthesis Consortium between MIT and eight pharmaceutical companies, announced in May. The consortium identified lead optimization as one key challenge in drug discovery.
"Today, it's really a craft, which requires a lot of skilled chemists to succeed, and that's what we want to improve," Barzilay says. "The next step is to take this technology from academia to use on real pharmaceutical design cases, and demonstrate that it can assist human chemists in doing their work, which can be challenging."
"Automating the process also presents new machine-learning challenges," Jaakkola says. "Learning to relate, modify, and generate molecular graphs drives new technical ideas and methods."
Generating molecular graphs
Systems that attempt to automate molecule design have cropped up in recent years, but their problem is validity. Those systems, Jin says, often generate molecules that are invalid under chemical rules, and they fail to produce molecules with optimal properties. This essentially makes full automation of molecule design infeasible.
These systems run on linear notations of molecules, called "simplified molecular-input line-entry system," or SMILES, where long strings of letters, numbers, and symbols represent individual atoms or bonds that can be interpreted by computer software. As the system modifies a lead molecule, it expands its string representation symbol by symbol -- atom by atom, and bond by bond -- until it generates a final SMILES string that scores higher on a desired property. In the end, the system may produce a final SMILES string that seems valid under SMILES grammar, but is actually chemically invalid.
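As a rough illustration of how a string can be well formed as SMILES yet chemically invalid, consider `C(C)(C)(C)(C)C`: the parentheses balance, but the central carbon would need five bonds. The Python sketch below flags that case with a simple valence count. The `MAX_VALENCE` table and the `valence_violations` function are assumptions made for this example (neutral atoms only, no charges or hypervalent cases), not part of the model described in the article.

```python
# Simplified maximum valences for a few neutral atoms (an assumption for
# this sketch; real chemistry toolkits handle charges and exceptions).
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1}


def valence_violations(atoms, bonds):
    """Return indices of atoms whose total bond order exceeds MAX_VALENCE.

    atoms: list of element symbols (must be keys of MAX_VALENCE)
    bonds: list of (i, j, order) tuples between atom indices
    """
    used = [0] * len(atoms)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    return [k for k, sym in enumerate(atoms) if used[k] > MAX_VALENCE[sym]]


# "CO" (methanol, heavy atoms only): both atoms stay within their limits.
methanol_atoms, methanol_bonds = ["C", "O"], [(0, 1, 1)]

# "C(C)(C)(C)(C)C": atom 0 is bonded to five other carbons.
bad_atoms = ["C"] * 6
bad_bonds = [(0, k, 1) for k in range(1, 6)]

print(valence_violations(methanol_atoms, methanol_bonds))
print(valence_violations(bad_atoms, bad_bonds))
```

The first call reports no offending atoms, while the second reports atom 0. A string-based generator has to get such constraints right one symbol at a time, which is the difficulty the article describes.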
The researchers solve this issue by building a model that runs directly on molecular graphs, instead of SMILES strings, which can be modified more efficiently and accurately.
Powering the model is a custom variational autoencoder -- a neural network that "encodes" an input molecule into a vector, which is basically a storage space for the molecule's structural data, and then "decodes" that vector to a graph that matches the input molecule.
In the encoding phase, the model breaks down each molecular graph into clusters, or "subgraphs," each of which represents a specific building block. Such clusters are automatically constructed by a common machine-learning concept, called tree decomposition, where a complex graph is mapped into a tree structure of clusters -- "which gives a scaffold of the original graph," Jin says.
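For a feel of what "breaking a graph into clusters" can look like, here is a simplified Python sketch. It is a stand-in, not the authors' algorithm: it treats each ring system (atoms joined by bonds that lie on a cycle) as one cluster and each remaining bond as its own two-atom cluster, then links clusters that share an atom. The function names are hypothetical, fused rings are lumped together, and the resulting links can still contain cycles at branching atoms, so a full method would need a further step to obtain a true tree.

```python
from collections import defaultdict


def find_bridges(n, edges):
    """Return the set of edges (i, j), i < j, that lie on no cycle."""
    adj = defaultdict(list)
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    disc, low, bridges = {}, {}, set()
    counter = 0

    def dfs(u, parent):
        nonlocal counter
        disc[u] = low[u] = counter
        counter += 1
        for v in adj[u]:
            if v == parent:
                continue
            if v in disc:                      # back edge closes a cycle
                low[u] = min(low[u], disc[v])
            else:
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:           # nothing below v reaches u or above
                    bridges.add((min(u, v), max(u, v)))

    for start in range(n):
        if start not in disc:
            dfs(start, None)
    return bridges


def ring_and_bond_clusters(n, edges):
    """Split a graph into ring systems and single non-ring bonds."""
    norm = {(min(i, j), max(i, j)) for i, j in edges}
    bridges = find_bridges(n, norm)
    ring_edges = norm - bridges
    parent = list(range(n))                    # union-find over ring bonds

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in ring_edges:
        parent[find(i)] = find(j)
    rings = defaultdict(set)
    for i, j in ring_edges:
        rings[find(i)].update((i, j))
    out = [frozenset(r) for r in rings.values()] + [frozenset(b) for b in bridges]
    return sorted(out, key=lambda c: sorted(c))


def cluster_links(clusters):
    """Pairs of cluster indices whose clusters share at least one atom."""
    return [(a, b) for a in range(len(clusters))
            for b in range(a + 1, len(clusters)) if clusters[a] & clusters[b]]


# Toluene, heavy atoms only: a six-membered ring (0-5) plus a methyl carbon (6).
toluene = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
toluene_clusters = ring_and_bond_clusters(7, toluene)
print(toluene_clusters, cluster_links(toluene_clusters))
```

For toluene this yields two clusters, the ring and the ring-to-methyl bond, joined by one link through the shared ring atom.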
(More at link: https://www.eurekalert.org/pub_releases/2018-07/miot-mam070618.php )
Leo11- Posts : 2
Join date : 2019-02-20
Age : 26
Re: Model automates molecule design to speed drug development
Hello Sir, what’s that you say?
Leo11 wrote: Cool teme
I remember a news story from over twenty-five years ago. They built a program to design electric circuits. It needed plenty of expert attention at first, but it soon started making the high-frequency circuits they were hoping for. Then surprise! After about three months it was creating circuits that none of the experts could understand. Using the charge field, please explain.
Let’s keep things simple enough to delay the day we must give in to our new machine overlords.
Welcome!
LongtimeAirman- Admin
- Posts : 2078
Join date : 2014-08-10