In 1863 five members of the Chōshū han in Japan made a secret journey to study at University College London in the UK. At the time of their departure, travel overseas was illegal in Japan; nonetheless, all five students made an impact on the university that is commemorated to this day. They then returned home to help establish institutions that heralded a new era, including the National Mint and the Japanese railways, and one of them became Japan's first prime minister. In the same spirit of international collaboration fostering pioneering innovation, materials and data scientists met at the Japanese Embassy in London on Friday 21 June during the “Season of Culture” to discuss “Global Trends in Research on Data-driven Discovery in Materials Science”. The event was the 10th scholarly colloquium organized by the journal Science and Technology of Advanced Materials (STAM).
Developments in data present an interesting example of science diplomacy, in which science and technology facilitate a diplomatic agenda that in turn serves the interests of science. Speaking to attendees at the embassy, Teruo Kishi, science and technology adviser to the Minister for Foreign Affairs in Japan, described a programme the Japanese government has recently begun promoting to strengthen the global digital economy: “Data Free Flow with Trust”. The initiative aims to establish global rules and norms for data sharing that prevent digital databases and institutions from fragmenting and becoming less productive. “The engine for growth, if you think about it, is fuelled no longer by gasoline but more and more by data,” added Kishi, quoting the Japanese prime minister Shinzo Abe at the World Economic Forum earlier in 2019.
Data quantity and quality
The work at the National Institute for Materials Science (NIMS) in Japan exemplifies the benefits that extensive data can offer materials research. NIMS is a global leader in nickel-based materials research and has been developing alloys with world-record creep rupture performance at high temperatures since the mid-1980s. The secret to this success, suggested NIMS president and STAM editor-in-chief Kazuhito Hashimoto in his presentation at the embassy, lies in the high-quality nickel alloy samples that NIMS has safeguarded for almost 40 years.
“Both data quality and quantity are important,” Hashimoto told attendees. The point becomes ever more pertinent in an age when machine learning algorithms are making increasingly significant contributions to materials science. How these algorithms reach their results is not always obvious. However, feeding in large quantities of data “trains” the algorithms to identify overarching trends, from which they can then extrapolate the likely outcomes of unfamiliar scenarios. As such they have become powerful tools for materials discovery, but as the old adage goes, “rubbish in, rubbish out”. As well as large quantities of data to train the algorithms, high quality is imperative.
Machine learning in data science
So what materials science projects could benefit from machine learning? NIMS researcher and STAM board member Ryo Tamura described work at NIMS using machine learning to find new molecules that not only offer valuable properties but can also be physically synthesized. He also described a smell sensor that determines the proportions of ethanol, methanol (two very similar organic molecules) and water in a mixture, using two channels to reduce the machine learning error. Even the uncertainties in machine learning calculations can be useful. Tamura explained how he and his colleagues had used uncertainty sampling, which directs each new measurement to the point where the model is least confident, to generate phase diagrams.
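To make the idea of uncertainty sampling more concrete, here is a minimal sketch of least-confidence sampling for mapping a two-phase diagram. The article does not say which model Tamura's group used, so this sketch swaps in a simple k-nearest-neighbour classifier. The function `true_phase` is a hypothetical stand-in for a real experiment or simulation, with a made-up circular phase boundary, and all names and parameters here are illustrative assumptions.

```python
import math
import random


def true_phase(x, y):
    """Hypothetical stand-in for an experiment or simulation (an assumption):
    phase 1 inside a circle in the normalized (x, y) plane, phase 0 outside."""
    return 1 if (x - 0.5) ** 2 + (y - 0.5) ** 2 < 0.09 else 0


def knn_probs(point, labelled, k=5):
    """Estimate [P(phase 0), P(phase 1)] from the k nearest labelled points."""
    nearest = sorted(labelled, key=lambda item: math.dist(point, item[0]))[:k]
    p1 = sum(label for _, label in nearest) / len(nearest)
    return [1.0 - p1, p1]


def least_confidence(probs):
    """Least-confidence score: 1 minus the probability of the most likely phase."""
    return 1.0 - max(probs)


def uncertainty_sampling(grid, oracle, n_initial=5, n_queries=20, seed=0):
    """Label a few random points, then repeatedly query the least-confident one."""
    rng = random.Random(seed)
    pool = list(grid)
    rng.shuffle(pool)
    labelled = [(p, oracle(*p)) for p in pool[:n_initial]]
    pool = pool[n_initial:]
    for _ in range(n_queries):
        # Pick the unlabelled point the current model is least sure about
        query = max(pool, key=lambda p: least_confidence(knn_probs(p, labelled)))
        pool.remove(query)
        labelled.append((query, oracle(*query)))
    return labelled


def predict_phase(point, labelled, k=5):
    """Predict the most likely phase at a point from the labelled samples."""
    probs = knn_probs(point, labelled, k)
    return probs.index(max(probs))


# Candidate measurement points on a 21 x 21 grid of normalized coordinates
grid = [(i / 20, j / 20) for i in range(21) for j in range(21)]
labelled = uncertainty_sampling(grid, true_phase)
phase_map = {p: predict_phase(p, labelled) for p in grid}
```

The design choice is the point of the technique. Random sampling would spend many measurements deep inside a single phase, whereas least-confidence sampling tends to concentrate new queries near the phase boundary, where the predicted phase is most ambiguous.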
James Elliott, a researcher at the University of Cambridge and STAM board member, highlighted some of the artificial intelligence algorithms that have raised eyebrows in the worlds of chess, shogi and even Go, where world champions have now all met their match against machines. Like other machine learning algorithms, programs such as AlphaGo were trained on previous human and machine games. The newest player in the field, AlphaZero, receives no input other than the rules of the game and trains by playing against itself. Elliott asked attendees whether this type of algorithm could discover new materials from the rules of physics alone.
He went on to describe work that uses machine learning to complement more conventional ab initio calculations. Because machine learning calculations run much faster, they can be used to identify which ab initio calculations are worth running, making numerical research more efficient. Work in Elliott’s group has also helped to explain how the layers of graphite slide over one another. Despite the relevance of this process to systems ranging from the humble pencil to a proposed carbon nanotube space elevator, it remained poorly understood until recently.
Data demands
While a physical hoard of nickel samples stretching back over four decades of materials science may be hugely beneficial to researchers at NIMS, digital data repositories make it easier to share and build on that advantage. In the spirit of Data Free Flow with Trust, and recognizing the demand for high-quality data in materials science, NIMS has launched a Materials Database project, expected to run from 2017 to 2021, that will automatically collect data from participating scientific publications and facility data repositories.
The idea seems likely to benefit materials science in general a great deal, although whether it will cater for fields like nanomaterials remains to be seen. “If the UK and Japan are willing to create a proper database for nanomaterials – that up to now does not exist,” Francesca Baletto told Physics World. A researcher at King’s College London and an attendee at the event, she pointed out that existing materials databases mainly cover bulk materials, whereas nanomaterials, which have many more parameters to control, have more specific logging requirements.
Automatic collection of data from institutional repositories poses a range of challenges. At present, institutions record data in different forms, prompting suggestions that some form of international data standardization is needed. Yet however vast the task of sorting and storing such large and varied collections of data, standardization may still prove more tractable than potential privacy issues. Adarsh Sandhu, a researcher at the University of Electro-Communications in Japan and STAM deputy editor, raised this point in the discussions after the talks. Data Free Flow with Trust and shared materials science databases are intended to handle purely non-personal data. However, it is easy to see how valuable it would be if a machine learning algorithm for driverless cars, for example, could automatically access and train on data from people’s car journeys, and how quickly that might start to impinge on people’s expectations of privacy.
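As an illustration of why standardization matters, here is a minimal sketch of two institutions recording the same kind of creep rupture measurement with different field names and units. Each record is converted into one common schema. The schema, both input formats, the field names and the example rows are all hypothetical assumptions, not the format used by NIMS or any real database.

```python
from dataclasses import dataclass

KSI_TO_MPA = 6.894757  # 1 ksi (kilopound per square inch) is about 6.894757 MPa


@dataclass(frozen=True)
class CreepRecord:
    """Hypothetical common schema that all contributors would map into."""
    alloy: str
    temperature_k: float
    stress_mpa: float
    rupture_time_h: float


def from_lab_a(row):
    """Hypothetical format A: temperature in °C, stress in MPa, rupture time in hours."""
    return CreepRecord(
        alloy=row["alloy"].strip(),
        temperature_k=row["temp_C"] + 273.15,
        stress_mpa=row["stress_MPa"],
        rupture_time_h=row["life_h"],
    )


def from_lab_b(row):
    """Hypothetical format B: temperature in K, stress in ksi, rupture time in seconds."""
    return CreepRecord(
        alloy=row["material"].strip(),
        temperature_k=row["T"],
        stress_mpa=row["sigma_ksi"] * KSI_TO_MPA,
        rupture_time_h=row["t_rupture_s"] / 3600.0,
    )


# Made-up example rows, not real measurements
records = [
    from_lab_a({"alloy": "alloy-001", "temp_C": 1000.0, "stress_MPa": 137.0, "life_h": 1000.0}),
    from_lab_b({"material": " alloy-001 ", "T": 1273.15, "sigma_ksi": 19.87, "t_rupture_s": 3.6e6}),
]
```

Once both rows share field names and units, they can be pooled and compared directly. Without an agreed schema, every consumer of the data has to write converters like these for each source.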
Databases that scour the literature for data would also benefit if more scholarly journals were published open access rather than behind paywalls. There are strong arguments and growing support for mandatory open access publication of all publicly funded research, points passionately voiced by David Sweeney, executive chair of Research England, part of UK Research and Innovation. Even so, funds and procedures for universal open access have proved difficult to establish.
Flying the flag for open access is the journal STAM, which organized the colloquium on materials informatics at the Japanese Embassy. Many of the speakers at the event belong to its editorial board, including Ryo Tamura, James Elliott, Adarsh Sandhu and Masanobu Naito, as well as past and present editors-in-chief Teruo Kishi and Kazuhito Hashimoto. As Naito explained, STAM began as a shop window for research at NIMS and other institutions in Japan and now competitively represents research from across the world. An open access publication since 2008, STAM will celebrate its 20th anniversary next year, in 2020.
Photo credits: Embassy of Japan in the UK