The data science of digital alchemy



In nature, particles arrange themselves spontaneously into crystalline structures that have certain properties. Humans have learned to use this behavior to make useful materials in the laboratory, but we are limited by the physical nature of experimentation. A University of Michigan research group, led by “digital alchemist” Sharon C. Glotzer, uses data science and high-performance computing resources to predict which nanoparticles will self-organize and thus accelerate the creation of novel materials.

Glotzer, one of the world’s leading researchers in the self-assembly of nanoparticles, heads the Glotzer Group, a team of about 30 researchers in the Department of Chemical Engineering and the Biointerfaces Institute at the University of Michigan. During the ACM SIGKDD 2021 conference last week, Glotzer described how her team uses data science software, HPC hardware, and human ingenuity to make the magic work.

Crystals are ubiquitous in nature. Ice and table salt are examples of crystal structures that form spontaneously when the right elements are brought together under the right conditions. But nature also has far more complicated crystals that are hidden from view. At the microscopic level, crystal structures made up of various elements can assemble into extremely complex repeating units, each with tens of thousands of atoms. The space of possible combinations is too vast to explore exhaustively, but Glotzer and her colleagues have dedicated themselves to this area.

“We are obsessed with understanding how such complexity comes about. How did the system figure out how to organize itself into different crystal structures? Why does it prefer one crystal structure over another? And how does it get there? How does that work?” Glotzer asked during her session at KDD 2021, which took place virtually due to the COVID pandemic.

“We know that quantum mechanics explains a lot about bonding. Thermodynamics is important in determining what the stable phase will look like. And every crystal structure you get has to obey the laws of statistical thermodynamics,” continued Glotzer. “But what we have no theory for is an understanding of the microscopic factors that lead from disorder to order.”


Glotzer and her colleagues approach the problem systematically, using data science and HPC resources. The goal is to expand our understanding of the assembly pathways by which nanoparticles self-arrange, that is, form stable crystal structures on their own with a minimum of human encouragement. The end goal is to create novel materials that are useful to humans in a variety of applications.

“It really is an infinite scope of possibilities,” said Glotzer, who was referred to as a “digital alchemist” in a 2017 Quanta Magazine article. “Computer simulations are the perfect tool for exploring design space because we can do it faster than experiments, and we can keep certain things constant and vary other things in ways that experiments may not.”

Glotzer’s team doesn’t focus on metallic structures, but rather on “soft matter,” things like proteins, DNA, virus capsids and gamma particles, she said. One of the key aspects of research is knowing which organic molecules act as binders or ligands that bind the building blocks together. DNA is an example of a ligand.

The researchers work backwards from where they want to be. “We want to start with ‘Here are the properties we want our material to have.’ On that basis, this is the crystal structure that we need,” said Glotzer. “And on that basis, these are the nanoparticles we should make and the binding elements we should use so that these particles, when we throw them into a bucket of water, assemble themselves into exactly the desired structure.”

Biochemists today have a great deal of control over the manufacturing process of nanoparticles. According to Glotzer, it is now possible “to produce practically any type of nanoparticle shape from many, many types of materials with great uniformity, so that all particles are approximately the same size and shape.”

Presented with a vast and compelling space of building blocks and glue, Glotzer has to figure out how all of it can come together in the most beneficial way.


“What if I gave you a crystal structure and asked you to tell me which nanoparticle shape to use? It would be hard to say what that shape should look like,” she said. “For example, if you have a bunch of differently shaped particles that can all arrange themselves into structures like this clathrate structure, which one does it best? Which makes the best crystal with the highest yield and highest quality? We try to answer such questions with computer simulations.”

Glotzer takes a few basic approaches. One is molecular dynamics simulation, which computes the forces that the particles exert on one another and follows the structure that results. Another is Monte Carlo simulation, in which random particle moves mimic the Brownian motion of nanoparticles in a liquid.
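To make the Monte Carlo idea concrete, here is a minimal sketch, assuming the simplest possible model: hard disks in a two-dimensional periodic box. Each step proposes a small random displacement for one particle and rejects it if it would cause an overlap. This is not the Glotzer Group's HOOMD-blue code; the function names, the box size, and the step size are hypothetical choices made for illustration.

```python
import random


def overlaps(positions, i, trial, box, diameter):
    """Return True if a disk placed at `trial` would overlap any disk other than i."""
    for j, (xj, yj) in enumerate(positions):
        if j == i:
            continue
        dx = trial[0] - xj
        dy = trial[1] - yj
        # Minimum-image convention for a periodic square box.
        dx -= box * round(dx / box)
        dy -= box * round(dy / box)
        if dx * dx + dy * dy < diameter * diameter:
            return True
    return False


def run_hard_disk_mc(n_side=4, box=8.0, diameter=1.0, max_step=0.3,
                     sweeps=200, seed=0):
    """Run a hard-disk Monte Carlo simulation; return positions and acceptance rate."""
    rng = random.Random(seed)
    spacing = box / n_side
    # Start from a square grid so that the initial state has no overlaps.
    positions = [((ix + 0.5) * spacing, (iy + 0.5) * spacing)
                 for ix in range(n_side) for iy in range(n_side)]
    accepted = 0
    attempts = 0
    for _ in range(sweeps):
        for _ in range(len(positions)):
            i = rng.randrange(len(positions))
            x, y = positions[i]
            # Propose a small random displacement, wrapped back into the box.
            trial = ((x + rng.uniform(-max_step, max_step)) % box,
                     (y + rng.uniform(-max_step, max_step)) % box)
            attempts += 1
            # Hard particles: accept the move only if it creates no overlap.
            if not overlaps(positions, i, trial, box, diameter):
                positions[i] = trial
                accepted += 1
    return positions, accepted / attempts
```

Calling `run_hard_disk_mc()` returns the final particle positions and the fraction of accepted moves. Real studies of self-assembly use three-dimensional anisotropic shapes and far larger systems, which is where the computing cost comes from.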

“When we examine the systems, we don’t know what they’re going to do,” she said. “When we start with a shape, we don’t know what they’re going to make or if they’re going to make anything at all. We do not know whether they arrange themselves, or at what concentration, at what pressure or at what temperature they assemble themselves into this crystal structure. We don’t know any of that, and so we have to simulate a lot in the hope that we start to see something that assembles itself.”

The Glotzer Group has developed its own code, called HOOMD-blue, to run the simulations. She said her team ran hundreds of thousands of simulations over 50,000 particle shapes. With so much uncertainty about what structures, if any, will emerge, her team needs access to a lot of computing power to make the effort worthwhile. That includes Summit, the 200-petaflops supercomputer installed at Oak Ridge National Laboratory.

The Glotzer Group shares the tools it has developed for studying the self-organization of nanoparticles (image courtesy of the Glotzer Group)

“We don’t know how big the unit cell is, and we don’t want to bias it by having a system of particles that is too small, and therefore we have to have really large systems,” she says. “All of this together means that we generate shiploads of data every day, terabytes and terabytes of data, and so we need a way to organize that data so that we can work with it scientifically.”

One of the tools that Glotzer’s team uses is signac, a lightweight, application-independent framework written in Python that helps users manage and scale file-based workflows. According to Glotzer, signac is the glue that connects the various components in her team’s HPC workflow and is crucial for ensuring that the data generated is transparent, reproducible, usable and extensible by others.

“What signac is great at is managing file-based heterogeneous data in a local file system, so you can search for and access data,” she said. “You can do this in Python or from the command line and develop scalable and reproducible computing workflows, including very complex ones.”
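The general idea of a searchable, file-based data space can be sketched with the Python standard library alone. The sketch below is not signac's API: `job_id`, `open_job`, and `find_jobs` are hypothetical helpers, and the parameter names are made up. It only illustrates the pattern of giving each simulation its own directory, named by a hash of its parameters, with the parameters stored alongside the data so that they can be searched later.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def job_id(state_point):
    """Derive a stable identifier from a dictionary of simulation parameters."""
    canonical = json.dumps(state_point, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:32]


def open_job(workspace, state_point):
    """Create (or reuse) the directory that belongs to one set of parameters."""
    job_dir = Path(workspace) / job_id(state_point)
    job_dir.mkdir(parents=True, exist_ok=True)
    # Store the parameters next to the data so the directory describes itself.
    (job_dir / "state_point.json").write_text(
        json.dumps(state_point, sort_keys=True))
    return job_dir


def find_jobs(workspace, **query):
    """Return the directories whose stored parameters match every query item."""
    matches = []
    for sp_file in sorted(Path(workspace).glob("*/state_point.json")):
        state_point = json.loads(sp_file.read_text())
        if all(state_point.get(key) == value for key, value in query.items()):
            matches.append(sp_file.parent)
    return matches


def demo():
    """Create four hypothetical jobs and count those that use cubes."""
    with tempfile.TemporaryDirectory() as workspace:
        for shape in ("cube", "tetrahedron"):
            for pressure in (1.0, 2.0):
                open_job(workspace, {"shape": shape, "pressure": pressure})
        return len(find_jobs(workspace, shape="cube"))
```

Because the directory name is derived from the parameters, running the same parameters twice reuses the same directory instead of creating a duplicate, which is one way such a layout supports reproducibility.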

It is also important to pick out the relevant patterns from the flood of simulation data. To do this, Glotzer’s team uses unsupervised machine learning algorithms to generate descriptors.

“The idea is to use machine learning to develop a microscopic understanding of assembly pathways,” she said. “So we start with particles, and we need some descriptors that tell us what the local particle environment is so that we can distinguish one crystal structure from another, and crystal from liquid, or parts of a crystal structure from other parts of the crystal structure.”
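The article does not say which descriptors the group uses. As a rough illustration of what a local-environment descriptor can look like, the sketch below assumes a deliberately simple one: the sorted distances from a particle to its nearest neighbors, scaled by the nearest distance. All function names here are hypothetical.

```python
import math


def neighbor_distance_descriptor(points, index, k=6):
    """Sorted distances to the k nearest neighbors, scaled by the nearest one.

    Assumes at least k + 1 distinct points, so the nearest distance is nonzero.
    """
    px, py = points[index]
    distances = sorted(
        math.hypot(px - qx, py - qy)
        for j, (qx, qy) in enumerate(points)
        if j != index
    )[:k]
    nearest = distances[0]
    return [d / nearest for d in distances]


def square_lattice(n):
    """An n-by-n square lattice with unit spacing."""
    return [(float(i), float(j)) for i in range(n) for j in range(n)]


def hexagonal_lattice(n):
    """An n-by-n patch of a hexagonal (triangular) lattice with unit spacing."""
    return [(i + 0.5 * (j % 2), j * math.sqrt(3) / 2)
            for i in range(n) for j in range(n)]
```

For the central particle of a 5-by-5 patch (index 12), the hexagonal lattice gives six equal distances, while the square lattice gives four equal distances followed by two that are larger by a factor of the square root of 2. Even this crude vector separates the two environments; descriptors used in practice are much higher-dimensional, which is why dimensionality reduction becomes necessary.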

Sharon C. Glotzer, a professor at the University of Michigan, received her Ph.D. in Soft Condensed Matter Theoretical Physics from Boston University in 1993

A major challenge here is the high dimensionality of the data. Glotzer’s group uses an algorithm called Uniform Manifold Approximation and Projection (UMAP) to reduce the dimensionality of the data while preserving its structure in the lower-dimensional space. It also performs well on GPUs with Nvidia’s RAPIDS library for CUDA, she said.

Equipped with the continuous topological order parameter from UMAP, Glotzer’s group now has insights into the self-organization of nanoparticles. The color-coded results of this analysis also provide information about the type of packing of the nanoparticles. “We can follow these on the way from the fluid to the crystal and know how each particle environment changes over time along the way,” she says.

When Glotzer merges the various UMAP embeddings, a picture emerges. “We can see the whole structure of the manifold with all the different crystal structures that can be made from all the different systems we are looking at, from the beginning as a liquid to the end as a crystal,” she said. “This information helps us to develop new assembly routes.”

For more information on Glotzer’s work, visit her team’s website at glotzerlab.engin.umich.edu/home.
