Deputy director of the School of Computing at the Australian National University, Amanda Barnard talks to Hamish Johnston about research at the interface of computational modelling, applied machine learning and artificial intelligence
From using supercomputers to design new kinds of materials to training machine learning models to study complex properties at the nanoscale, Australian computational scientist Amanda Barnard works at the interface of computing and data science. A senior professor in the School of Computing at the Australian National University, Barnard is also its deputy director and computational-science lead. These days, she uses a variety of computational methods to solve problems across the physical sciences, but Barnard began her career as a physicist, receiving her PhD in theoretical condensed-matter physics in 2003.
After spending the next few years as a postdoc at the Center for Nanoscale Materials at Argonne National Laboratory in the US, she began to broaden her research interests to encompass many aspects of computational science, including the use of machine learning in nanotechnology, materials science, chemistry and medicine.
A fellow of both the Australian Institute of Physics and the Royal Society of Chemistry, in 2022 Barnard was appointed a Member of the Order of Australia. She has also won a number of awards, including the 2014 Feynman Prize in Nanotechnology (Theory) and the 2019 medal from the Association of Molecular Modellers of Australasia. She speaks to Hamish Johnston about her interest in applying machine learning to a range of problems, and about the challenges and rewards of doing university administration.
Can you tell us a bit about what you do as a computational scientist?
Computational science involves designing and using mathematical models to analyse computationally demanding problems in many areas of science and engineering. This includes advances in computational infrastructure and algorithms that enable researchers across these different domains to perform large-scale computational experiments. In a way, computational science involves research into high-performance computing, and not just research using a high-performance computer.
We spend most of our time on algorithms and trying to figure out how to implement them in a way that makes best use of the advanced hardware; and that hardware is changing all the time. This includes conventional simulations based on mathematical models developed specifically in different scientific domains, be it physics, chemistry or beyond. We also spend a lot of time using methods from machine learning (ML) and artificial intelligence (AI), most of which were developed by computer scientists, making it very interdisciplinary research. This enables a whole bunch of new approaches to be used in all these different scientific areas.
Machine learning enables us to recapture a lot of the complexity that we’ve lost when we derive those beautiful theories
Simulation was born out of the theoretical side of each scientific area, where convenient levels of abstraction enabled us to solve the equations. But when we developed those theories, they were almost an oversimplification of the problem, made either in the pursuit of mathematical elegance or just for the sake of practicality. ML enables us to recapture a lot of the complexity that we lost when we derived those beautiful theories. Unfortunately, not all ML works well with science, so computational scientists spend a lot of time figuring out how to apply algorithms that were never intended for these kinds of data sets, and how to overcome the problems that arise at the interface. And that's one of the areas I find most exciting.
You began your career as a physicist. What made you move to computational science?
Physics is a great starting point for virtually anything. But I was always on the path to computational science without realizing it. During my first research project as a student, I used computational methods and was instantly hooked. I loved the coding, all the way from writing the program to getting the final results, and I knew straight away that supercomputers were destined to be my scientific instrument. It was exciting to think about what a materials scientist could do if they could make perfect samples every time. Or what a chemist could do if they could remove all contamination and have perfect reactions. What could we do if we could explore harsh or dangerous environments without the risk of injuring anyone? And more importantly, what if we could do all of these things simultaneously, on demand, every time we tried?
The beauty of supercomputers is that they are the only instrument that enables us to achieve this near-perfection. What captivates me most is that I can not only reproduce what my colleagues do in the lab, but also do everything they can't do in the lab. So from the very early days, my physics was computational physics. My research then evolved through computational chemistry to materials, materials informatics, and now pretty much exclusively ML. But I've always focused on the methods in each of these areas, and I think a foundation in physics enables me to think very creatively about how I approach all of them computationally.
How does machine learning differ from classical computer simulations?
Most of my research is now ML, probably 80% of it. I still do some conventional simulations, however, as they give me something very different. Simulations fundamentally are a bottom-up approach. We start with some understanding of a system or a problem, we run a simulation, and then we get some data at the end. ML, in contrast, is a top-down approach. We start with the data, we run a model, and then we end up with a better understanding of the system or problem. Simulation is based on rules determined by our established scientific theories, whereas ML is based on experiences and history. Simulations are often largely deterministic, although there are some examples of stochastic methods such as Monte Carlo. ML is largely stochastic, although there are some examples that are deterministic as well.
With simulations, I’m able to do very good extrapolation. A lot of the theories that underpin simulations enable us to explore areas of a “configuration space” (the co-ordinates that determine all the possible states of a system) or areas of a problem for which we have no data or information. On the other hand, ML is really good at interpolating and filling in all the gaps and it’s very good for inference.
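To make that distinction concrete, here is a minimal sketch (an invented illustration, not code from Barnard's group): a flexible model fitted to samples of sin(x) interpolates well inside its training range but fails badly outside it, because it carries no physics beyond its data.

```python
import numpy as np

# Train a flexible model on samples of sin(x) drawn only from [0, pi].
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0.0, np.pi, 30))
y_train = np.sin(x_train)

# A degree-7 polynomial stands in for a generic ML regressor.
model = np.poly1d(np.polyfit(x_train, y_train, deg=7))

x_inside = 0.5 * np.pi    # interpolation: inside the training range
x_outside = 2.0 * np.pi   # extrapolation: far outside it

print(f"interpolation: model={model(x_inside):+.3f}  truth={np.sin(x_inside):+.3f}")
print(f"extrapolation: model={model(x_outside):+.3f}  truth={np.sin(x_outside):+.3f}")
# The interpolated prediction is typically accurate to a few parts in a
# thousand; the extrapolated one is wildly wrong.
```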
Indeed, the two methods are based on very different kinds of logic. Simulation is based on an "if-then-else" logic: if I have a certain problem or a certain set of conditions, then I'll get a deterministic answer; or else, computationally, it will probably just crash if something is wrong. ML, in contrast, is based on an "estimate-improve-repeat" logic, which means it will always give an answer. That answer is always improvable, but it may not always be right, so that's another difference.
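That "estimate-improve-repeat" loop can be written down in a few lines. The sketch below (an invented example) fits a slope by gradient descent: every pass improves the estimate, an answer always comes out, and it is never guaranteed to be exact.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 200)
y = 3.0 * x + rng.normal(0, 0.1, 200)       # "true" slope is 3.0

w = 0.0                                      # estimate: start anywhere
for step in range(500):                      # repeat
    grad = -2.0 * np.mean((y - w * x) * x)   # gradient of mean squared error
    w -= 0.1 * grad                          # improve
print(f"learned slope: {w:.4f}")             # close to 3.0, never exactly 3.0
```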
Simulations are intradisciplinary: they have a very close relationship to the domain knowledge and rely on human intelligence. On the other hand, ML is interdisciplinary: using models developed outside of the original domain, it is agnostic to domain knowledge and relies heavily on artificial intelligence. This is why I like to combine the two approaches.
Can you tell us a bit more about how you use machine learning in your research?
Before the advent of ML, scientists had to pretty much understand the relationships between the inputs and the outputs. We had to have the structure of the model predetermined before we were able to solve it. It meant that we had to have an idea of the answer before we could look for one.
We can develop the structure of an expression or an equation and solve it at the same time. That accelerates the scientific method, and it’s another reason why I like to use machine learning
When you're using ML, the machines use statistical techniques and historical information to basically program themselves. It means we can develop the structure of an expression or an equation and solve it at the same time. That accelerates the scientific method, and it's another reason why I like to use it.
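One way to picture discovering an equation's structure while solving it is sparse regression: offer the model a library of candidate terms and let a sparsity penalty select which ones belong. The sketch below uses scikit-learn with an invented "hidden law"; it illustrates the idea rather than the specific methods discussed in the interview.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, 400)
y = 1.5 * x**2 - 0.8 * x + rng.normal(0, 0.05, 400)  # hidden "law"

# Candidate library: the model does not know which terms are real.
names = ["x", "x^2", "x^3", "sin(x)", "exp(x)"]
library = np.column_stack([x, x**2, x**3, np.sin(x), np.exp(x)])

fit = Lasso(alpha=0.01, max_iter=10_000).fit(library, y)
for name, coef in zip(names, fit.coef_):
    if abs(coef) > 1e-2:                     # keep only the surviving terms
        print(f"{name}: {coef:+.3f}")
# With luck, only x and x^2 survive, recovering both the structure of the
# equation (which terms appear) and its solution (their coefficients).
```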
The ML techniques I use are diverse. There are lots of different flavours and types of ML, just as there are lots of different types of computational physics or experimental physics methods. I use unsupervised learning, which is based entirely on input variables, and it looks for "hidden patterns" or tries to find representative data. That's useful for materials in nanoscience when we haven't yet done the experiments to measure a property, but we know quite a bit about the input conditions we used to develop the material.
Unsupervised learning can be useful in finding groups of structures, referred to as clusters, that have similarities in the high-dimensional space, or pure and representative structures (archetypes or prototypes) that describe the data set as a whole. We can also transform data to map them to a lower-dimensional space and reveal more similarities that were not previously apparent, in a similar way that we might change to reciprocal space in physics.
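In code, that unsupervised workflow might look like the following sketch (hypothetical descriptors and scikit-learn defaults, not data from the interview): clustering to find groups of similar structures, then a projection to a lower-dimensional space.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Hypothetical descriptors for 300 nanostructures (size, surface area,
# facet counts, ...): input variables only, no measured property.
rng = np.random.default_rng(3)
descriptors = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(150, 6)),
    rng.normal(loc=4.0, scale=1.0, size=(150, 6)),
])

# Step 1: find groups of similar structures (clusters).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(descriptors)

# Step 2: map the 6D descriptors to 2D, a little like switching to
# reciprocal space to make hidden similarities visible.
pca = PCA(n_components=2)
reduced = pca.fit_transform(descriptors)

print("cluster sizes:", np.bincount(labels))
print(f"variance kept in 2D: {pca.explained_variance_ratio_.sum():.2f}")
```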
I also use supervised ML to find relationships and trends, such as structure-property relationships, which are important in materials and nanoscience. This includes classification, where we have a discrete label. Say we already have different categories of nanoparticles and, based on their characteristics, we want to automatically assign them to either one category or another, and make sure that we can easily separate these classes based on input data alone.
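A minimal supervised counterpart, again with invented features and labels, trains a classifier on known nanoparticle categories and checks that held-out examples can be separated on input data alone.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Invented example: two nanoparticle categories whose five descriptors
# (size, shape factors, ...) have slightly different distributions.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0.0, 1.0, (200, 5)),
               rng.normal(1.5, 1.0, (200, 5))])
y = np.array([0] * 200 + [1] * 200)  # discrete class labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```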
I use statistical learning and semi-supervised learning as well. Statistical learning, in particular, is useful in science, although it's not widely used yet. We think of it as the kind of causal inference that is used a lot in medical diagnostics, and it can be applied to diagnose why a material, for example, forms the way it does, rather than just how it was created.
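The causal flavour can be sketched with a toy model (all numbers invented): a hidden confounder drives both a synthesis choice and the measured quality, so a naive comparison overstates the effect of the choice, while stratifying on the confounder recovers it.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
temperature = rng.normal(0.0, 1.0, n)                   # hidden confounder
treated = (temperature + rng.normal(0.0, 1.0, n)) > 0   # synthesis choice
quality = 2.0 * temperature + 1.0 * treated + rng.normal(0.0, 1.0, n)
# By construction, the causal effect of the choice on quality is +1.0.

naive = quality[treated].mean() - quality[~treated].mean()

# Back-door adjustment: compare treated and untreated samples only
# within narrow temperature strata, then average over strata.
edges = np.quantile(temperature, np.linspace(0, 1, 21)[1:-1])
strata = np.digitize(temperature, edges)
effects, weights = [], []
for s in np.unique(strata):
    m = strata == s
    if treated[m].any() and (~treated[m]).any():
        effects.append(quality[m & treated].mean() - quality[m & ~treated].mean())
        weights.append(m.sum())
adjusted = np.average(effects, weights=weights)

print(f"naive difference:    {naive:.2f}  (inflated by the confounder)")
print(f"adjusted difference: {adjusted:.2f}  (close to the true +1.0)")
```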
Your research group includes people with a wide range of scientific interests. Can you give us a flavour of some of the things that they’re studying?
When I started in physics, I never thought that I'd be surrounded by such an amazing group of smart people from different scientific areas. The computational science cluster at the Australian National University includes environmental scientists, earth scientists, computational biologists and bioinformaticians. There are also researchers studying genomics, computational neuroscience, quantum chemistry, materials science, plasma physics, astrophysics, astronomy, engineering, and – me – nanotechnology. So we're a diverse bunch.
Our group includes Giuseppe Barca, who is developing algorithms that underpin the quantum chemistry software packages used all around the world. His research focuses on how we can leverage new processors, such as accelerators, and on rethinking how large molecules can be partitioned and fragmented so that we can strategically combine massively parallel workflows. He is also helping us use supercomputers more efficiently, which saves energy. And for the past two years, he has held the world record for the best-scaling quantum chemistry algorithm.
Also on the small scale – in terms of science – is Minh Bui, who's a bioinformatician developing new statistical models in the area of phylogenomics [a multidisciplinary field that combines evolutionary research with systems biology and ecology, using methods from network science]. These include partitioning models, isomorphism-aware models and distribution-tree models. Applications include photosynthetic enzymes and transcriptome data for deep insect phylogeny, and he has also worked on algae, as well as bacteria and viruses such as HIV and SARS-CoV-2 (which causes COVID-19).
On the larger end of the scale is mathematician Quanling Deng, whose research focuses on mathematical modelling and simulation of large-scale systems, such as ocean and atmosphere dynamics, as well as Antarctic ice floes.
The best part is when we discover that a problem from one domain has already been solved in another, and it's even better when we find a problem experienced in multiple domains, so that one solution can scale superlinearly. It's great when one solution has multiple areas of impact. And how often would you find a computational neuroscientist working alongside a plasma physicist? It just doesn't normally happen.
As well as working with your research group, you’re also deputy director of the Australian National University’s School of Computing. Can you tell us a bit about that role?
It’s largely an administrative role. So as well as working with an amazing group of computer scientists across data science, foundational areas in languages, software development, cybersecurity, computer vision, robotics and so on, I also get to create opportunities for new people to join the school and to be the best version of themselves. A lot of my work in the leadership role is about the people. And this includes recruitment, looking after our tenure-track programme and our professional-development programme as well. I’ve also had the opportunity to start some new programmes for areas that I thought needed attention.
One such example arose during the global COVID pandemic. A lot of us were shut down and unable to access our labs, which left us wondering what we could do. I took the opportunity to develop a programme called the Jubilee Joint Fellowship, which supports researchers working at the interface between computer science and another domain, where they're solving grand challenges in their own areas but also using that domain knowledge to inform new types of computer science. The programme supported five such researchers across different areas in 2021.
I am also the chair of the Pioneering Women Program, which has scholarships, lectureships and fellowships to support women entering computing and ensure they’re successful throughout their career with us.
And of course, one of my other roles as deputy director is to look after the computing facilities for our school. I look at ways we can diversify our pipeline of resources to get through tough times, like during COVID, when we couldn't order any new equipment. I also look into how we can be more energy efficient, because computing uses an enormous amount of energy.
It must be a very exciting time for people doing research in ML, as the technology is finding so many different uses. What new applications of ML are you most looking forward to in your research?
Well, probably some of the ones you're already hearing about, namely AI. While there are risks associated with AI, there's also enormous opportunity, and I think that generative AI is going to be particularly important for science in the coming years – provided we can overcome some of the issues with it "hallucinating" [when an AI system, such as a large language model, generates false information based on its training data set or contextual logic, or a combination of the two].
No matter what area of science we’re in, we’re restricted by the time we have, the money, the resources and the equipment we have access to. It means we’re compromising our science to fit these limitations rather than focusing on overcoming them
But no matter what area of science we’re in, whether computational or experimental, we’re all suffering under a number of restrictions. We’re restricted by the time we have, the money, the resources and the equipment we have access to. It means we’re compromising our science to fit these limitations rather than focusing on overcoming them. I truly believe that the infrastructure shouldn’t dictate what we do, it should be the other way around.
I think generative AI has come at the right time to enable us to finally overcome some of these problems because it has a lot of potential to fill in the gaps and provide us with an idea of what science we could have done, if we had all the resources necessary.
Indeed, AI could enable us to get more by doing less, and to avoid some of the pitfalls, such as selection bias, which is a really big problem when applying ML to scientific data sets. We need to do a lot more work to ensure that generative methods produce meaningful science, not hallucinations. This is particularly important if they're going to form the foundation for large pre-trained models. But I think this is going to be a really exciting era of science, where we work collaboratively with AI rather than it just performing a task for us.
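Selection bias, in particular, is easy to demonstrate with a toy example (invented data, not one of the data sets discussed here): fit a model only to samples that passed some screening and the inferred relationship is distorted.

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(0.0, 1.0, 50_000)
y = 2.0 * x + rng.normal(0.0, 1.0, 50_000)  # true slope is 2.0

# Selection: imagine only "successful" results (high y) ever get recorded.
kept = y > 1.0

slope_all = np.polyfit(x, y, 1)[0]
slope_sel = np.polyfit(x[kept], y[kept], 1)[0]
print(f"slope from all data:      {slope_all:.2f}")
print(f"slope from selected data: {slope_sel:.2f}  (attenuated by selection)")
```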