The groundwork for machine learning was laid down in the middle of last century. But increasingly powerful computers – harnessed to algorithms refined over the past decade – are driving an explosion of applications in everything from medical physics to materials, as Marric Stephens discovers
When your bank calls to ask about a suspiciously large purchase made on your credit card at a strange time, it’s unlikely that a kindly member of staff has personally been combing through your account. Instead, it’s more likely that a machine has learned what sort of behaviours to associate with criminal activity – and that it’s spotted something unexpected on your statement. Silently and efficiently, the bank’s computer has been using algorithms to watch over your account for signs of theft.
Monitoring credit cards in this way is an example of “machine learning” – the process by which a computer system, trained on a given set of examples, develops the ability to perform a task flexibly and autonomously. As a subset of the more general field of artificial intelligence (AI), machine-learning techniques can be applied wherever there are large and complex data sets that can be mined for associations between inputs and outputs. In the case of your bank, the algorithm will have analysed a vast pool of both legitimate and illegitimate transactions to produce an output (“suspected fraud”) from a given input (“high-value order placed at 3 a.m.”).
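To make that pipeline concrete, here is a minimal sketch of the idea in Python – not any bank’s actual system – in which a classifier trained on labelled transactions flags a suspicious new one. The features, figures and choice of model are invented for illustration.

```python
# A toy fraud detector: learn from labelled examples, then classify new input
from sklearn.ensemble import RandomForestClassifier

# Each transaction is [amount, hour of day]; labels: 0 = legitimate, 1 = fraud
X_train = [[12.50, 13], [30.00, 18], [8.99, 9],       # legitimate purchases
           [2400.00, 3], [1800.00, 2], [3100.00, 4]]  # known fraud cases
y_train = [0, 0, 0, 1, 1, 1]

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Input: a high-value order placed at 3 a.m. -> output: "suspected fraud"
print(model.predict([[2750.00, 3]]))  # [1]
```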
But machine learning isn’t just used in finance. It’s being applied in many other fields too, from healthcare and transport to the criminal-justice system. Indeed, Ge Wang – a biomedical engineer from the Rensselaer Polytechnic Institute in the US who is one of those pioneering its use in medical imaging – believes that when it comes to machine learning, we’re on the cusp of a revolution.
The inside story
Wang’s research involves taking incomplete data from scans of human patients (the input) and “reconstructing” a real image (the output). Image reconstruction is essentially the inverse of a more common application of machine-learning algorithms, whereby computers are trained to spot and classify existing images. Your smartphone, for example, might use these algorithms to recognize your handwriting, while self-driving cars deploy them to identify vehicles and other potential hazards on the road.
Image reconstruction is not just a medical technique – it’s found in ports and airports, where it allows security staff to use X-rays to peer inside sealed containers. It’s also valuable in the construction and materials industries, where 3D ultrasound images can reveal dangerous flaws in structures long before they fail. Wang’s own goal, though, is to overcome the noise and artefacts that arise when reconstructing a volumetric image of an object (such as a patient’s heart) from imperfect and incomplete medical-physics data.
There are good reasons for making do with as little data as possible. In magnetic-resonance imaging (MRI), for example, taking scans quickly avoids unwanted movements of the patient’s heart and lungs that would otherwise smear the resulting picture unacceptably. In X-ray computed tomography (CT), meanwhile, you want to minimize the radiation dose to the patient, which means capturing just enough data to produce an image – and no more.
Traditional “analytic” reconstruction methods produce images by combining measurements made from every angle around the patient – a demanding requirement, since it means acquiring complete data sets. Although “iterative-reconstruction” algorithms developed in recent years are better at tolerating gaps in the data, they need lots of computing power. That’s because these algorithms produce multiple candidate images, each of which has to be compared to the “correct” data, so that a final reconstruction is arrived at gradually.
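The iterative idea can be illustrated in the roughest possible terms – as a generic least-squares loop, not Wang’s method: start from a guess, simulate its measurements, compare them with the recorded data and nudge the candidate image accordingly.

```python
# Toy iterative reconstruction: refine a candidate image x so that its
# simulated measurements A @ x approach the recorded data b
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(80, 100))   # forward model: maps image -> measurements
x_true = rng.normal(size=100)    # the unknown "image" (flattened pixels)
b = A @ x_true                   # incomplete data: 80 measurements, 100 unknowns

x = np.zeros(100)                # initial guess (a network could supply this)
step = 1.0 / np.linalg.norm(A, 2) ** 2
for _ in range(2000):            # each pass compares the candidate to the data
    x -= step * A.T @ (A @ x - b)

print(np.linalg.norm(A @ x - b)) # the residual shrinks towards zero
```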
In the short term, Wang envisages machine-learning techniques replacing specific individual components of the reconstruction process. The techniques would be based on “artificial neural networks” (see box below), which approximately emulate the workings of a biological brain, with each input processed by one or more “hidden” layers of artificial neurons. Interactions between the layers are weighted so that the process is nonlinear, and these parameters change as the system learns, modifying the output accordingly. So-called “deep-learning” approaches are those that use “deep” networks containing many hidden layers.
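In code, the forward pass of such a network with a single hidden layer can be sketched in a few lines (the sizes and random weights below are arbitrary placeholders; training would adjust them):

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(16, 4)), np.zeros(16)  # 4 inputs -> 16 hidden neurons
W2, b2 = rng.normal(size=(1, 16)), np.zeros(1)   # 16 hidden neurons -> 1 output

def forward(x):
    hidden = np.tanh(W1 @ x + b1)  # weighted sums passed through a nonlinearity
    return W2 @ hidden + b2        # a "deep" network would stack many such layers

print(forward(np.array([0.2, -0.5, 1.0, 0.3])))
```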
To begin with, Wang thinks that improvements will be marginal rather than revolutionary. In iterative reconstruction, for example, using a neural network to make the initial “guess” for the image based on a large data set would simply make the whole process more efficient. Another substitution would see a neural network take the role of deciding when enough iterations have been performed to produce an adequate output.
Longer term, however, Wang is more ambitious. He calls for a completely integrated system, in which machine-learning algorithms – using raw imaging data as inputs – reconstruct the image and then extract and classify pathological features like cancers and neural diseases. Such a system could even be extended to encompass treatment planning, automating the whole process from data acquisition to therapy.
Yet despite its achievements and its promise, Wang says, deep learning lacks a decent overarching theory, which means the technique’s inconsistencies are still mysterious. “By changing one small pixel’s value, the artificial neural network could return weird results. It’s not always right,” says Wang. A goal for the future, then, is to develop more easily explainable, interpretable AI, opening the black box, which – Wang jokes – “is still a grey box”.
Quantum questions
Machine learning could also have a profound impact on quantum physics, notably solving “quantum many-body problems”. Such problems arise when you have a set of interacting objects that can be understood only by accounting for their quantum nature. “What these problems have in common is the fact that studying their properties requires, in principle, a full knowledge of the many-body wave function,” says Giuseppe Carleo, a physicist at the Simons Foundation’s Flatiron Institute in New York, US.
The many-body wave function is, in Carleo’s words, “a monster, whose complexity scales exponentially with the number of constituents”. Imagine, for example, a system of particles that can each spin either clockwise or anticlockwise. With two particles, you have four possible states. With three particles, eight states, which is still manageable. Go much further, however, and things quickly get out of hand.
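A few lines of arithmetic show just how quickly: N two-state particles need 2**N numbers to pin the wave function down exactly.

```python
# Number of amplitudes needed for N two-state particles: 2**N
for n in (2, 3, 10, 50, 100, 300):
    print(n, 2**n)
# 2 -> 4 and 3 -> 8 are manageable; 50 -> ~1e15 already strains a
# supercomputer's memory; 300 -> ~2e90, more numbers than there are atoms
# in the observable universe (~1e80)
```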
Traditional methods are ineffective at tackling the problem for more than a few components, so Carleo and Matthias Troyer – who was then a colleague at ETH Zurich in Switzerland – applied a machine-learning approach. The pair found that a relatively “shallow” neural-network architecture – using just a single hidden layer – could efficiently “learn” a representation of the wave function, for an example problem of spins on a 1D or 2D lattice.
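The architecture in question was a “restricted Boltzmann machine”, whose single hidden layer gives a closed-form expression for the wave-function amplitude of any spin configuration. Here is a minimal sketch, with small random numbers standing in for parameters that would in practice be optimized variationally (and that were, in Carleo and Troyer’s work, complex-valued):

```python
import numpy as np

rng = np.random.default_rng(2)
n_spins, n_hidden = 8, 16
a = rng.normal(scale=0.01, size=n_spins)              # spin (visible) biases
b = rng.normal(scale=0.01, size=n_hidden)             # hidden-unit biases
W = rng.normal(scale=0.01, size=(n_hidden, n_spins))  # spin-hidden couplings

def amplitude(s):
    """Unnormalized amplitude psi(s) for a spin configuration s in {-1, +1}."""
    return np.exp(a @ s) * np.prod(2 * np.cosh(b + W @ s))

print(amplitude(rng.choice([-1, 1], size=n_spins)))
```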
The same difficulties in solving the quantum many-body problem arise in “quantum state tomography”. Just as tomographic imaging reconstructs the interior of an object from measurements made from without, so quantum state tomography determines a system’s quantum state from a small number of measurements made on its more accessible parts. As with the quantum many-body problem, the information encoded in the wave function grows exponentially with the number of components in the system.
One quantum state that would be useful to describe is the way in which qubits are entangled in a quantum computer, making quantum state tomography vital for understanding how such a computer would cope with noise and loss of coherence. The problem is, any quantum computer worth having will include dozens or hundreds of qubits, so a brute-force approach to determining its quantum state will be inadequate. That’s where artificial neural networks come to the rescue, making it possible – Carleo found – to efficiently reconstruct the state of a quantum computer comprising 100 qubits. Standard approaches, in contrast, are limited to around eight qubits.
And there’s more to come. Machine-learning approaches have been applied to this field only recently, which means that the techniques used by researchers are still at the proof-of-principle stage. Indeed, the methods demonstrated by Carleo and colleagues typically involve neural networks with just one or two hidden layers, whereas more mature commercial applications – such as those used by the likes of Google and Facebook – can employ much deeper architectures, and run on dedicated hardware that has been optimized for the job.
Unfortunately, the notorious weirdness of quantum physics means that these more complex neural networks could not simply be translated directly to the quantum regime; Carleo and others had to rewrite the algorithms almost from scratch, and are yet to match the complexity seen at the cutting edge of machine-learning applications. Catching up with those mature systems will allow artificial neural networks to solve even more complex quantum problems. “I think that the next few years will see this methodological and technological gap shrink more and more, leading to applications we cannot even imagine right now,” says Carleo.
Finding new materials
Whereas artificial neural networks must typically be fed large data sets before they produce useful results, over at the University of Virginia in the US, Prasanna Balachandran employs tools that are not so data-hungry. The aim of his research is to identify, from the vast, multidimensional space of possibilities, the relatively few formulations that yield materials with favourable properties. To explore such a space by trial and error would take much too long, and the mapped regions – corresponding to materials whose properties are known – are a vanishingly small part of the whole.
The method that Balachandran uses to solve this problem is a particular form of machine learning known as statistical learning. This approach gets around the need for large training sets by assuming that patterns in the data follow strict statistical rules. “We train machine-learning models to learn about things that we already know, and we apply those models to predict things that we do not know,” he explains.
In this case, we know the behaviour of certain material combinations, and what we essentially want to predict are the properties of every other possible formulation. However, the confidence with which the properties of a given material can be predicted depends on how well the surrounding neighbourhood is known, so Balachandran also quantifies the error bars associated with each predicted value.
Regions where knowledge is lacking can therefore be identified, and the system can suggest the most profitable experiments to do next. It’s a novel approach. “Generally, in materials science, the way that experiments have been carried out is biased by the intuition of the scientist who is running them,” says Balachandran.
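As a hedged illustration of that predict-with-error-bars loop, the sketch below uses Gaussian-process regression as a stand-in surrogate model; the “composition” and “property” numbers, and the choice of model, are illustrative assumptions rather than Balachandran’s actual setup.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Compositions whose property has already been measured (the mapped region)
X_known = np.array([[0.10], [0.30], [0.50], [0.90]])
y_known = np.array([2.1, 1.4, 1.0, 2.6])

gp = GaussianProcessRegressor().fit(X_known, y_known)

# Predict across the whole space, with an error bar on every expected value
X_all = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
mean, std = gp.predict(X_all, return_std=True)

# The least certain prediction marks the most informative experiment to run next
print(X_all[np.argmax(std)])
```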
Balachandran and colleagues in the US and China recently demonstrated the fruitfulness of this approach by discovering a set of high-performing “shape-memory alloys” from nearly a million possible compositions (Nature Comms 7 11241). Such materials are useful because they deform as they change phase upon heating or cooling. The temperature of the phase change depends on the direction of the transition, with this difference – the thermal hysteresis – determining the applications that the alloy is suited to. Balachandran’s group was particularly keen on materials with the smallest possible thermal hysteresis and found that almost half of the few-dozen alloys that they synthesized on the basis of the machine’s predictions beat the best sample to date.
Exploring the infinite space of material properties might be one of those activities derided by Ernest Rutherford as mere “stamp collecting”, but it could be key to discovering new physics. “In the next five to 10 years we want to go beyond correlation and start thinking about causation,” says Balachandran. “You need to have the right kind of data to explore the concept of causation itself. In my opinion we have the solution to that part of the puzzle, and we know how to find representative samples for any given problem that is of interest to us – fast.”
Statistics, statistics, statistics
While machine-learning techniques have yielded concrete results and insights in medical, quantum and materials physics that wouldn’t be possible otherwise, progress has been less clear in statistical physics. “We are still waiting for the big example that the community would agree we would not have done without machine learning,” admits Lenka Zdeborová, who studies the theory of machine learning at Université Paris-Saclay in France.
Sure, there have been promising developments in statistical physics, but Zdeborová says these techniques have so far not been deployed at the frontiers of the field. She points to dozens of papers that use neural networks to study models such as the 2D Ising model – which describes interacting spins arranged on a 2D lattice – but says none so far are telling us anything fundamentally new.
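For reference, the model is simple enough to state in a few lines: spins of value +1 or −1 sit on a square lattice, and the energy is a sum over nearest-neighbour products. Below is a minimal sketch of that energy function (coupling J = 1, periodic boundaries); a typical study trains a network on configurations sampled above and below the critical temperature and asks it to classify the phase.

```python
import numpy as np

def ising_energy(spins):
    """Energy E = -sum of s_i * s_j over nearest neighbours; spins are +/-1."""
    right = np.roll(spins, -1, axis=1)  # neighbour to the right (wraps around)
    down = np.roll(spins, -1, axis=0)   # neighbour below (wraps around)
    return -np.sum(spins * (right + down))

rng = np.random.default_rng(3)
lattice = rng.choice([-1, 1], size=(16, 16))  # one random 2D configuration
print(ising_energy(lattice))
```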
It may be disappointing that machine learning is not yet driving advances in statistical physics, but knowledge and insight are certainly flowing the other way. Imagine, for example, a neural network required to identify images. Each image will contain lots of data (pixels) and be noisy (because any given image will be masked by masses of irrelevant features); and there will also be correlations between the different weights in the network.
Happily, problems that are multidimensional, noisy and correlated are just the sort of thing that statistical physicists have been learning how to deal with since the middle of the last century. “Just think about the theories that physics has developed in disordered systems,” says Zdeborová, whose own background is in a specific kind of disordered magnet known as spin glasses. Such systems have lots of particles (i.e. lots of dimensions), have a finite temperature (i.e. are thermally noisy) and many inter-particle interactions (i.e. lots of correlations). In fact, in some cases the equations that describe models of machine learning are exactly the same as those used to handle systems in statistical physics.
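One concrete point of contact, offered here as a textbook illustration rather than Zdeborová’s own example: the energy of a Hopfield network – an early neural-network model of associative memory – has exactly the form of a spin-glass Hamiltonian,

$$E = -\tfrac{1}{2}\sum_{i \neq j} J_{ij}\, s_i s_j,$$

with the neuron states s_i = ±1 playing the role of spins and the learned weights J_ij playing the role of the couplings.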
This insight could be key to developing a comprehensive theory that explains just why these methods work so well. Machine learning may have advanced further than was generally predicted a couple of decades ago, but its successes still arise largely from empirical trial-and-error approaches. “We want to be able to predict the optimal architecture, how we should set the parameters, and what the algorithm should be,” Zdeborová concludes. “Currently we have no clue how to get those without huge human effort.”
Machine-learning jargon buster
Artificial intelligence (AI)
Intelligent behaviour exhibited by machines. But the definition of intelligence is controversial, so a more general description of AI that would satisfy most is the behaviour of a system that adapts its actions in response to its environment and prior experience.
Machine learning
As a group of approaches for endowing a machine with artificial intelligence, machine learning is itself a broad category. In essence, it is the process by which a system learns from a training set so that it can autonomously deliver an appropriate response to new data.
Artificial neural networks
A subset of machine learning in which the learning mechanism is modelled on the behaviour of a biological brain. Input signals are modified as they pass through networked layers of neurons before emerging as an output. Experience is encoded by varying the strength of interactions between neurons in the network.
- A new IOP Publishing ebook Machine Learning for Tomographic Imaging by Ge Wang, Yi Zhang, Xiaojing Ye and Xuanqin Mou will be published later this year.