Living cells store genetic information as strings of symbols, much as data are stored in books or on computer disks. This information is written in a four-letter alphabet and is read by the cell to produce the proteins that do just about everything in the body. The four letters of this alphabet are the four nucleotides found in DNA: adenine (A), cytosine (C), guanine (G) and thymine (T). Proteins are made from amino acids, and since there are 20 amino acids, the process by which proteins are produced is frequently likened to a translation from one language with a four-letter alphabet (the nucleotides in the DNA) to another language with a 20-letter alphabet (the amino acids in proteins).
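To make the translation analogy concrete, here is a minimal Python sketch. It relies on a fact not spelled out above: in the standard genetic code, nucleotides are read three at a time (as "codons"), because pairs of four letters give only 16 combinations, too few for 20 amino acids, whereas triplets give 64. The table below is a deliberately tiny subset of the standard code, and the names `CODON_TABLE` and `translate` are illustrative choices, not part of any library.

```python
from itertools import product

NUCLEOTIDES = "ACGT"

# Count the possible "words" of length 2 and 3 in a four-letter alphabet.
n_pairs = len(list(product(NUCLEOTIDES, repeat=2)))     # 4**2
n_triplets = len(list(product(NUCLEOTIDES, repeat=3)))  # 4**3

# Illustrative subset of the standard genetic code (DNA codon -> one-letter
# amino-acid symbol). The full table has 64 entries; "*" marks a stop codon.
CODON_TABLE = {
    "ATG": "M",  # methionine (also the usual start codon)
    "TTT": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "GCA": "A",  # alanine
    "AAA": "K",  # lysine
    "TGG": "W",  # tryptophan
    "TAA": "*",  # stop
}


def translate(dna):
    """Read `dna` three letters at a time and return the amino-acid string.

    Codons missing from the partial table are shown as "?"; translation
    ends at the first stop codon. Trailing letters that do not fill a
    complete codon are ignored.
    """
    residues = []
    usable = len(dna) - len(dna) % 3
    for i in range(0, usable, 3):
        residue = CODON_TABLE.get(dna[i:i + 3], "?")
        if residue == "*":
            break
        residues.append(residue)
    return "".join(residues)


if __name__ == "__main__":
    print(n_pairs, n_triplets)
    print(translate("ATGTTTGGCAAATAA"))
```

The output of `translate` is only the order of the amino acids in the chain; the lookup says nothing about the shape that chain goes on to adopt.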

However, this simple model overlooks a crucial fact about proteins: the DNA code just lists the order in which the amino acids combine. For the amino acids to form a working protein (for them to "come alive", so to speak) the extended chain of amino acids must "fold" into a compact globular object with exactly the right shape. This folding process is akin to understanding language, rather than merely manipulating symbols mechanically, independent of context and meaning.

The idea that protein folding is the process responsible for understanding the genetic information has been around for a long time at a qualitative, hand-waving level. In the March issue of Physics World, Alexander Grosberg of the University of Minnesota, USA, describes how Thomas Fink, now at the Ecole Normale Supérieure in Paris, France, and Robin Ball, now at Warwick University in the UK, have been able to raise it to the level of an exact quantitative statement (Phys. Rev. Lett. 2001 87 198103).