Researchers from Lawrence Berkeley National Laboratory trained an algorithm called Word2Vec on scientific papers to see if there was any “latent knowledge” that humans weren’t able to grok on first pass.
The study, published in Nature on July 3, reveals that the algorithm produced predictions for potential thermoelectric materials, which convert heat into electricity and are used in various heating and cooling applications.
The algorithm didn’t know the definition of thermoelectric, though. It received no training in materials science. Using only word associations, the algorithm was able to provide candidates for future thermoelectric materials, some of which may be better than those we currently use. –Motherboard
“It can read any paper on material science, so can make connections that no scientists could,” said researcher Anubhav Jain. “Sometimes it does what a researcher would do; other times it makes these cross-discipline associations.”
The algorithm was designed to assess the language in 3.3 million materials-science abstracts, building a vocabulary of around half a million words. Word2Vec used machine learning to analyze the relationships between those words.
“The way that this Word2vec algorithm works is that you train a neural network model to remove each word and predict what the words next to it will be,” said Jain, adding that “by training a neural network on a word, you get representations of words that can actually confer knowledge.”
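For the curious, here’s a rough sketch of what that kind of training looks like in practice. It uses the open-source gensim library and a tiny, made-up list of tokenized abstract snippets standing in for the study’s 3.3 million abstracts; it illustrates the skip-gram idea Jain describes, not the team’s actual pipeline.

```python
# Minimal sketch of Word2Vec (skip-gram) training with gensim.
# The corpus below is a tiny hypothetical stand-in for the tokenized
# materials-science abstracts used in the actual study.
from gensim.models import Word2Vec

abstracts = [
    ["bi2te3", "is", "a", "well", "known", "thermoelectric", "material"],
    ["the", "seebeck", "coefficient", "of", "pbte", "was", "measured"],
    ["band", "gap", "engineering", "improves", "thermoelectric", "performance"],
]

# sg=1 selects skip-gram: each word is used to predict its neighbors,
# which is the "remove each word and predict the words next to it"
# objective Jain describes.
model = Word2Vec(
    sentences=abstracts,
    vector_size=200,   # dimensionality of each word vector
    window=8,          # how many neighboring words count as context
    min_count=1,       # keep every word in this toy corpus
    sg=1,              # skip-gram rather than CBOW
)

# Every word in the vocabulary now maps to a dense vector that
# encodes how it is used across the corpus.
print(model.wv["thermoelectric"][:5])
```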
Using just the words found in scientific abstracts, the algorithm was able to capture concepts such as the periodic table and the chemical structure of molecules. The algorithm linked words that were found close together, creating vectors of related words that helped define concepts. In some cases, words were linked to thermoelectric concepts but had never been described as thermoelectric in any of the abstracts surveyed. This gap in knowledge is hard to catch with a human eye, but easy for an algorithm to spot.
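Here is a similarly hypothetical sketch of that follow-up step: once trained, the word vectors can be ranked by how close they sit to “thermoelectric,” and any material name that scores highly without ever appearing alongside that word becomes a candidate. The most_similar call is part of gensim; the toy corpus and whatever it prints are placeholders, not results from the study.

```python
# Querying the embedding space for candidate thermoelectrics: words whose
# vectors are most similar to "thermoelectric", even if they never appear
# in the same abstract as that word.
from gensim.models import Word2Vec

abstracts = [
    ["bi2te3", "is", "a", "well", "known", "thermoelectric", "material"],
    ["the", "seebeck", "coefficient", "of", "pbte", "was", "measured"],
    ["pbte", "shows", "a", "high", "seebeck", "coefficient"],
    ["band", "gap", "engineering", "improves", "thermoelectric", "performance"],
]
model = Word2Vec(sentences=abstracts, vector_size=50, window=8, min_count=1, sg=1)

# Rank the vocabulary by cosine similarity to the query word. In the real
# study, a list like this (filtered down to chemical formulas) is what
# surfaced the candidate thermoelectric materials.
for word, score in model.wv.most_similar("thermoelectric", topn=5):
    print(f"{word}\t{score:.3f}")
```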
After showing its capacity to predict future materials, researchers took their work back in time, virtually. They scrapped recent data and tested the algorithm on old papers, seeing if it could predict scientific discoveries before they happened. Once again, the algorithm worked. –Motherboard
The technology isn’t restricted to materials science, either: it can be applied to a wide variety of disciplines by retraining the model on literature from whatever subject one wants to analyze more deeply.
“This algorithm is unsupervised and it builds its own connections,” said the study’s lead author, Vahe Tshitoyan, adding “You could use this for things like medical research or drug discovery. The information is out there. We just haven’t made these connections yet because you can’t read every article.”
Check out the full thing right over here.