From statistical physics to bioinformatics: using data science for biological discoveries

08 Sep 2021 Laura Hiscott
Taken from the September 2021 issue of Physics World where it first appeared under the headline "Physics for biological breakthroughs". Members of the Institute of Physics can enjoy the full issue via the Physics World app.

It might not sound like an obvious place for a physicist to work, but the European Molecular Biology Laboratory (EMBL) is highly multidisciplinary, employing people from across all scientific fields. Laura Hiscott speaks to Wolfgang Huber, a physicist at EMBL who uses his mathematical skills to contribute to the life sciences.

Getting physical: Wolfgang Huber uses his numerical skills to develop methods for analysing and interpreting large biological datasets, which can be used to make biological discoveries. (Courtesy: Sabine Arndt)

Wolfgang Huber was always fascinated by how the world works. As a kid, he used to love construction sets and building little machines – anything with motors and gears. Later, he liked taking apart old TVs and making his own electronic circuits, and got into computer programming, with home computers like the Sinclair ZX81. However, Huber felt that he was doing things a bit haphazardly, and that he was missing something – the underlying theory.

No-one in Huber’s family had studied science, but as a boy growing up in rural Germany, he came across two books about physics in his local village library. Written by the Nobel-prize-winning physicist Emilio Segrè, the books – From Falling Bodies to Radio Waves and From X-rays to Quarks – describe the history of physics through the lives and discoveries of important physicists. “I was fascinated by this period around the turn of the last century when relativity and quantum mechanics emerged,” recalls Huber. “I wanted to learn more, so I enrolled in a physics degree course at the University of Freiburg.”

Initially, Huber spent the summers working in factories to help pay for his studies, but programming jobs were starting to become available offering better pay and conditions. He found a job in the neurology department of his university, coding computer models of neuroscientists’ data.

“It was accidental that I got a job in the clinic,” he says, “but I quickly grew fond of it. Although all the biology and neuroanatomy seemed quite intimidating to me as a physicist and mathematician, I realized that there was a huge need in the biomedical field for more quantitative skills.”

Not yet set on a particular field, Huber did his PhD in a department focused on statistical physics. His project was on theoretical laser physics, modelling how atoms and molecules absorb and emit photons. The idea was to use stochastic processes to simulate quantum jumps. Although this was not directly related to his neurology work, the mathematical tools he was using were similar. Indeed, other scientists in the same department were applying dynamical-systems models to neurological data.

Change of scene

After finishing his PhD in 1998, Huber did a postdoc at IBM in California, working on cheminformatics. “I had done my PhD in the same place as my undergrad,” he says, “so I longed for a change of scenery, both geographically and topic-wise. Going to Silicon Valley was such a cool opportunity.” At the time, IBM was seeking to apply its database technologies to the needs of pharma companies that wanted to search for molecular structures in their large compound libraries. Huber was creating tools that drug developers could use to look for molecules based on their 3D shape. For example, if a high-throughput screen tested lots of molecules and turned up some hits for potential drugs, then Huber’s similarity search could find similarly shaped molecules that might also be promising.

During his time at IBM, Huber enrolled in evening classes for professionals at Stanford University and the University of California, Berkeley, to learn some more biology. He found out about the Human Genome Project and the then new technology of microarrays – microscopic samples of DNA attached to chips, which scientists use to study genetics. Huber sensed that biology was in the midst of a revolution, reminding him of the great upheaval in physics in the early 20th century that had inspired him to study the subject in the first place. “I felt like the same spirit was now present in biology,” he says. “It was really exciting, and I realized that computational and data science skills were in great demand.”

Deciding therefore to move into the life sciences, Huber found a postdoc position at the German Cancer Research Center (DKFZ) in Heidelberg, Germany. “My mother had died of cancer some years before,” he says. “At the time it had been an abstract and mysterious disease to me, but I decided that I wanted to learn more about it, and maybe even make a small contribution.” He therefore moved back to Germany in 2000, working at the DKFZ on cancer transcriptomics – the study of the role of mRNA molecules (which carry instructions for protein production in cells) in cancer development.

It was after that second postdoc that Huber moved to the European Molecular Biology Laboratory (EMBL), which in 2004 offered him his first position as a group leader. EMBL has six sites, and the one he joined was EMBL’s European Bioinformatics Institute (EMBL-EBI) in Hinxton, near Cambridge, UK. Here, he got involved in new developments in statistical computing, including Bioconductor – an open-source collaboration creating tools to analyse large quantities of genomic data from molecular biology labs. In 2009 a genome biology unit was opened at EMBL’s Heidelberg site, and he moved there as a statistician.

Part of EMBL’s mission is to train scientists in molecular biology research techniques at all levels, from interns and Master’s students to first-time group leaders, who can then take those new skills back to institutes and companies in the member states. This is why group leader jobs at EMBL are usually limited to nine years, with recruitment based on candidates’ potential, rather than on long-proven accomplishment.

However, there are a small number of open-ended positions, for leadership, management and people with specific technical expertise, where staff can stay for longer than nine years. “I was expecting to move on,” says Huber, “but I guess somebody decided that it would be good to keep a statistician in the house, and EMBL made me an offer that I couldn’t resist.”

Data analysis

Today, Huber continues to work as a group leader in Heidelberg, where his research team has three main aims. The first is to develop new statistical methods that allow others to make new biological measurements, or analyse big and complex datasets. “For instance, we might make noisy measurements of the mRNA levels of thousands of genes in millions of cells taken from a hundred different tumour specimens,” he says, “and then we fit a complex high-dimensional model to the data. We use techniques from machine learning and Bayesian inference to do this.”

The second aim is to make scientific discoveries by collaborating with biologists and medical researchers who have interesting new data that Huber’s group can analyse and interpret. One such collaboration recently led them to publish a paper about a subgroup of patients with a particular type of leukaemia. “Knowing that this subgroup exists can help in therapy planning,” Huber says, “because people with one type might respond differently to a certain treatment than people with another.”

A third aspect of the group’s work is developing software packages that other people can use, to benefit the research community more widely. Huber’s team continues to contribute, for example, to the open-source project Bioconductor. This ties in with another part of EMBL’s mission, which is to create resources for the life-science community that researchers anywhere in the world can use.

To accomplish all of this, Huber’s lab draws on a whole range of disciplines, with his team including theoretical physicists, computational physicists, statisticians, mathematicians, biologists, a pharmacologist and an immunologist. “People can move into bioinformatics from different directions,” he says. “Someone with a physics background may initially tend towards method development, while someone with more biological training might use the methods to make biological discoveries. With time, many people become more confident and get involved in both sides.”

Huber’s role also involves mentoring other scientists, writing grant proposals and reviewing papers, and serving on committees to draw up new policies and strategic areas of engagement for EMBL. It’s rewarding work both in the lab and beyond. “My favourite part is interacting with people, mentoring, and creating new methods that allow us to see things that we couldn’t see before,” he says.

He advises prospective scientists that learning is a lifelong process, and that the content of your degree might be outdated in 20 years’ time, so it’s important to stay up to date and keep your eyes open to new opportunities. And even if your background is in physics, you shouldn’t be afraid of biological research. “The nature of knowledge in physics tends to be ‘vertical’, and you often have to spend years learning and climbing the tower to get to the boundaries of our knowledge,” he says. “Biology, in contrast, is much more spread-out, but the knowledge is not as built-up. You can start digging anywhere and quickly find something that’s unknown or poorly understood, so it’s possible to make original contributions quite quickly.”

Among the exciting technological developments in the field, Huber cites microscopy, single-cell sequencing and CRISPR technology, which have taken huge strides, or were even only developed, in recent years. Indeed, Huber believes that the COVID-19 pandemic has showcased what life scientists can do. “The fact that it was possible to sequence the virus within a few days of identifying it and put that information online, so that scientists elsewhere could synthesize it and start developing vaccines – it shows how far we’ve come. And the pace of progress is still really fast. It’s breathtaking.”

EMBL at a glance

Why was it set up? The European Molecular Biology Laboratory (EMBL) was founded in 1974 with the mission of promoting molecular biology research in Europe, training young scientists, and developing new technologies. One of its founders was the physicist Leo Szilard, and the library at the Heidelberg campus is named after him. EMBL trains and supports people at all stages of their scientific careers, taking interns, Master’s students, PhD students, postdocs, technicians, staff scientists and group leaders. There is a high turnover and dissemination of ideas as people move on to other institutes or industry.

Number of member states: 27

Where is it based? EMBL has six sites: Barcelona, Grenoble, Hamburg, Heidelberg, Hinxton (near Cambridge) and Rome.

How many staff? EMBL employs about 1800 people across its six sites.

Main areas of research: Barcelona: tissue biology and disease modelling; Grenoble: structural biology; Hamburg: structural and infection biology; Heidelberg: cell biology and biophysics, developmental biology, genome biology and structural and computational biology; Hinxton: bioinformatics; Rome: epigenetics and neurobiology.

EMBL recruits physicists at all levels – PhD students, postdocs, technicians, staff scientists and group leaders – to work on topics ranging from theoretical biology and bioinformatics to data science and instrument development. It also runs a technology development programme called Career Accelerator for Research Infrastructure Scientists (ARISE), for people interested in this aspect of research. Applications for ARISE are open from 1 September until 31 October 2021.

Copyright © 2021 by IOP Publishing Ltd and individual contributors