A systematic approach to retraining deep-learning artificial intelligence algorithms to deal with different situations has been developed by climate researchers in the US. The team found that, contrary to conventional wisdom, retraining the earlier layers of an algorithm often achieves better results than retraining the later ones.
Deep learning is a highly advanced, sometimes controversial type of machine learning in which computer algorithms teach themselves the important features of a system and learn to classify its nature and predict its behaviour, often with accuracies that outstrip the capabilities of humans. Perhaps the most famous demonstration of deep learning in action was the victory of Google’s AlphaGo program over the champion go player Lee Sedol in 2016. But deep learning also has more practical applications: it can predict protein folding, screen tissue biopsies for early signs of cancer and predict weather patterns.
However, as deep learning algorithms are not programmed by an external operator, they cannot simply be reprogrammed either. Instead, if the system changes, the algorithm must be retrained using data from the new system. This is important in climatology if deep learning algorithms trained using today’s climatic conditions are to make useful predictions about weather in a world affected by climate change. This process of adapting prior experience to unfamiliar situations – familiar to humans – is known to computer scientists as transfer learning.
Deep mystery
Climate scientist Pedram Hassanzadeh of Rice University in Texas explains that deep learning algorithms process information in a sequence of layers. “The information goes into a layer, which extracts some information, and then sends this information to another layer, which extracts more information.” This process eventually produces the output, but as Hassanzadeh explains, “Nobody knows exactly what the job of each layer is because we don’t design any of them – they are all learned.” Transfer learning uses the small amount of available data from the new data set to retrain one (or a few) of these layers, and Hassanzadeh says it is “important which level you pick”.
Conventional wisdom, he says, dictates that the specifics of the problem are worked out in the deepest layers of the network (those layers closest to the output). Therefore, to perform transfer learning, these are the best to retrain. “What’s been done in the past is that, say, Google trains a thousand-layer network on Google Images, and then somebody brings a small number of X-rays, so they retrain layers 998 and 999,” Hassanzadeh explains. Now he and his colleagues have taken a systematic approach instead.
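In code, choosing which part of a pre-trained network to retrain amounts to freezing the weights of every layer except the chosen one. The minimal sketch below, written in PyTorch with a placeholder 10-layer fully connected network rather than any architecture used in the study, illustrates the idea:

```python
# Minimal sketch of selecting a single layer of a pre-trained network for
# retraining by freezing all of the others. The 10-layer fully connected
# network is a stand-in, not the model used in the study.
import torch.nn as nn

# Stand-in for a pre-trained 10-layer network
model = nn.Sequential(*[nn.Sequential(nn.Linear(64, 64), nn.ReLU()) for _ in range(10)])

def select_layer_to_retrain(model, layer_index):
    """Freeze every layer except `layer_index`; only that layer's weights
    will be updated when the network is retrained on the new data."""
    for p in model.parameters():
        p.requires_grad = False          # freeze everything
    for p in model[layer_index].parameters():
        p.requires_grad = True           # unfreeze the chosen layer

select_layer_to_retrain(model, layer_index=9)   # conventional choice: a deep layer near the output
select_layer_to_retrain(model, layer_index=0)   # the shallow choice the Rice team investigated
```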
The researchers performed high-resolution simulations of the behaviour of fluids under three different sets of conditions. They used these data to train three 10-layer deep learning algorithms, each predicting the behaviour of fluids under one of those specific sets of conditions. They then changed some parameters in each case, such as the Reynolds number (the ratio of inertial forces to viscous forces) or the vorticity of the fluid, and conducted another set of high-resolution simulations of the fluids’ behaviour under the new conditions. In each of the three cases, they trained fresh algorithms of the same architecture from scratch on the new data. Finally, they conducted transfer learning of the old algorithms using a small subset of the new data, looking at the effect of retraining each layer in turn and comparing the performance of the retrained old algorithm with the algorithm that had been trained from scratch on the new data.
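A schematic version of that comparison, with synthetic data and a placeholder network standing in for the team’s high-resolution fluid simulations and actual model, might look something like the following:

```python
# Sketch of the layer-by-layer retraining protocol described above, with
# placeholder models and synthetic data standing in for the fluid simulations.
import copy
import torch
import torch.nn as nn

def make_net():
    # Placeholder 10-layer fully connected network
    return nn.Sequential(*[nn.Sequential(nn.Linear(64, 64), nn.ReLU()) for _ in range(10)])

def fit(model, x, y, params=None, epochs=200, lr=1e-3):
    """Train `params` (or the whole model) on (x, y) with a mean-squared-error loss."""
    params = list(model.parameters()) if params is None else list(params)
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    return model

# Synthetic data: large "old regime" set, small "new regime" subset, new-regime test set
x_old, y_old = torch.randn(2048, 64), torch.randn(2048, 64)
x_new, y_new = torch.randn(128, 64), torch.randn(128, 64)
x_test, y_test = torch.randn(512, 64), torch.randn(512, 64)

base = fit(make_net(), x_old, y_old)                     # network trained on the old regime

# Baseline: a fresh network trained from scratch on the small new-regime subset
scratch = fit(make_net(), x_new, y_new)
scratch_err = nn.functional.mse_loss(scratch(x_test), y_test).item()

# Transfer learning: retrain one layer at a time and compare against the baseline
for k in range(10):
    model = copy.deepcopy(base)
    for p in model.parameters():
        p.requires_grad = False
    for p in model[k].parameters():
        p.requires_grad = True
    fit(model, x_new, y_new, params=model[k].parameters())
    err = nn.functional.mse_loss(model(x_test), y_test).item()
    print(f"retrained layer {k}: test MSE {err:.3f} (from scratch: {scratch_err:.3f})")
```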
Retraining shallow layers
The results were surprising. “In this paper, we found that the shallowest layers were the best to retrain,” says Hassanzadeh. Having access to the predicted signal produced by retraining each layer in turn gave the researchers a window into the effect each layer had on the final output. They therefore used spectral analysis of each signal to see how each layer was modifying each frequency present. Some layers controlled the low frequencies, and retraining these was useful because they captured the smoothly varying, macroscopic features of the system. Other layers, meanwhile, predicted the fine details, and retraining these alone was near-useless. The researchers have provided a protocol for determining the most important layers in any given case. “We didn’t want to say we have a rule of thumb in this paper,” says Hassanzadeh. “Now we have found systems where, for example, the middle layers are the best [to retrain].”
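As a rough illustration of this kind of analysis, the sketch below uses synthetic one-dimensional signals in place of the flow fields from the study, and compares the Fourier amplitude spectrum of each retrained prediction with that of the target, split into low- and high-frequency bands:

```python
# Sketch of the spectral analysis idea: compare the Fourier spectrum of the
# prediction obtained after retraining a given layer with the spectrum of the
# target signal. The 1D signals are synthetic placeholders, not study data.
import numpy as np

def amplitude_spectrum(signal):
    """Amplitude of each Fourier mode of a real 1D signal."""
    return np.abs(np.fft.rfft(signal))

# Placeholder signals: the "true" new-regime field and two retrained predictions
x = np.linspace(0, 2 * np.pi, 256, endpoint=False)
truth = np.sin(x) + 0.3 * np.sin(8 * x)
pred_shallow = np.sin(x) + 0.28 * np.sin(8 * x)          # shallow layer retrained
pred_deep = 0.6 * np.sin(x) + 0.3 * np.sin(8 * x)        # deep layer retrained

for name, pred in [("shallow", pred_shallow), ("deep", pred_deep)]:
    err = amplitude_spectrum(pred) - amplitude_spectrum(truth)
    low, high = np.abs(err[:8]).mean(), np.abs(err[8:]).mean()
    print(f"{name}-layer retraining: low-frequency error {low:.3f}, high-frequency error {high:.3f}")
```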
The team describes the work in a paper published in PNAS Nexus.
“I think it’s a really interesting paper,” says astrophysicist and machine learning expert Shirley Ho of the Flatiron Institute in New York City. She adds, “On the other hand, in many other scientific disciplines we’ve been using spectral analysis for a long time now, so I guess the question is whether or not applying it to the multiple layers is a significant contribution. I get the feeling that it’s probably one of those things that’s been in people’s minds, but no-one has written it. It may be one of those great papers where, once you say it, it’s obvious to everybody.”