Using computational tools for molecule discovery

Assistant professor Connor Coley is developing tools that would be able to predict molecular behavior and learn from both successes and mistakes.

Discovering a drug, material, or anything new requires finding and understanding molecules. It’s a time- and labor-intensive process. A chemist’s expertise can help it along, but it can only go so quickly and be so efficient, and there’s no guarantee of success. Connor Coley is looking to change that dynamic. The Henri Slezynger (1957) Career Development Assistant Professor in the MIT Department of Chemical Engineering is developing computational tools that would be able to predict molecular behavior and learn from both successes and mistakes.

It’s an intuitive approach and one that still has obstacles, but Coley says that this autonomous platform holds enormous potential for remaking the discovery process. A reservoir of untapped and never-before-imagined molecules would be opened up. Suggestions could be made from the outset, offering a running start and shortening the overall timeline from idea to result. And human capital would no longer be a restriction: scientists would be freed from monitoring every step and could instead tackle bigger questions that they weren’t able to before. “This would let us boost our productivity and scale out the discovery process much more efficiently,” he says.

Playing detective

Molecules present a couple of challenges. They take time to figure out and there are a lot of them. Coley cites estimates that there are 10^20 to 10^60 molecules that are small and biologically relevant, but fewer than 10^9 have been synthesized and tested. To close that gap and accelerate the process, his group has been working on computational techniques that learn to correlate molecular structures with their functions.

One of the tools is guided optimization, which would evaluate candidate molecules across a number of dimensions and determine which will have the best properties for a given task. The aim is to have the model make better predictions as it runs, through a technique called active learning, and Coley says that it might reduce the number of experiments it takes for a hypothetical new drug to go from initial stages to clinical trials “by an order of magnitude.”
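To make the idea of active learning concrete, here is a minimal Python sketch. It is not Coley’s method or any real chemistry model: the one-number "descriptor" for each candidate, the `hidden_property` function standing in for a lab experiment, the nearest-neighbor surrogate, and the `kappa` exploration weight are all assumptions chosen for illustration. The loop repeatedly picks the untested candidate that looks most promising (predicted value plus a bonus for uncertainty), "runs the experiment," and updates what it knows.

```python
def hidden_property(x):
    # Hypothetical stand-in for a lab measurement of one candidate.
    # A real experiment would replace this; the peak at 0.7 is arbitrary.
    return -(x - 0.7) ** 2


def predict(x, measured):
    # Toy surrogate model: reuse the value of the nearest measured candidate,
    # and treat the distance to it as a crude uncertainty estimate.
    nearest = min(measured, key=lambda m: abs(m - x))
    return measured[nearest], abs(nearest - x)


def active_learning(candidates, budget, kappa=1.0):
    # Start from a single measured candidate.
    measured = {candidates[0]: hidden_property(candidates[0])}

    while len(measured) < min(budget, len(candidates)):
        untested = [c for c in candidates if c not in measured]

        def score(c):
            # Favor candidates that are predicted to be good or are far
            # from anything tested so far (kappa weights the exploration).
            value, uncertainty = predict(c, measured)
            return value + kappa * uncertainty

        pick = max(untested, key=score)
        measured[pick] = hidden_property(pick)  # "run the experiment"

    best = max(measured, key=measured.get)
    return best, len(measured)


if __name__ == "__main__":
    candidates = [i / 100 for i in range(101)]
    best, experiments = active_learning(candidates, budget=15)
    print(f"best candidate found: {best} after {experiments} experiments")
```

In this toy setting, the budget of 15 experiments covers only a fraction of the 101 candidates. How much a real system would save depends on the quality of the model and the data, which is where the limitations described next come in.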

There are still inherent limitations. The guided optimization relies on models that are currently available, and molecules, unlike images, aren’t numerical or static. Their shapes change based on factors like environment and temperature. Coley is looking to take those elements into account, so the tool can learn patterns, and the result would be “a more nuanced understanding of what it means to have a molecular structure and how best to capture that as an input to these machine learning models.”

One bottleneck, as he calls it, is having good test cases to benchmark performance. As an example, two molecules that are mirror images can still behave differently in different environments, one of those being the human body, but many datasets don’t show that. Developing new algorithms and models requires having specific tasks and goals, and he’s working on creating synthetic benchmarks that would be controlled but would still reflect real applications.


An article by Steve Calechman for MIT News



