Google could be a good way of measuring the "impact" of a particular scientific paper and might even be used to replace traditional citation indices, according to a new statistical analysis by physicists in the US. The researchers have found that the Google PageRank algorithm, which measures the relative importance of Web pages, can provide a systematic way to find important papers. The technique also uncovers scientific "gems" -- top papers overlooked by conventional searches (physics/0604130).
Scientists usually measure the importance of a paper by counting the number of times it is cited by other papers. However, the technique is not always reliable. It can, for example, overlook papers with relatively few citations that have nonetheless had a great influence on physics. One example is Richard Feynman and Murray Gell-Mann’s 1958 publication “Theory of the Fermi Interaction”, which introduced a new theory that subsequently became the “standard model” of weak interactions. This was one of the papers discovered with the new technique.
Sidney Redner and Pu Chen of Boston University and Huafeng Xie and Sergei Maslov at the Brookhaven National Laboratory now propose a new technique to unearth such papers using the Google PageRank algorithm. In their study, the researchers simply applied this algorithm to the entire network of citations for all articles in the Physical Review family of journals published between 1893 and June 2003. The network in the experiment consists of 353,268 “nodes”, which represent all articles published during this time, and 3,110,839 “links” that represent all citations to Phys. Rev. articles from other Phys. Rev. articles.
The algorithm involves launching many random “walkers” on the network of citations. Half the time, each walker jumps from a paper to one of its references (each with equal probability) and the rest of the time the walker jumps to a random paper anywhere in the network. This hopping process is repeated until the population of random walkers at each node becomes statistically constant. The average number of walkers at a given node in the network is the Google number.
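The hopping process can be illustrated with a short Python sketch. This is not the researchers' code. Instead of simulating individual walkers, it repeatedly updates the expected fraction of walkers on each paper until that fraction stops changing, which gives the same steady state. The function name `google_numbers`, the parameter `follow_prob`, the four-paper network `toy`, and the rule that a walker on a paper with no references always jumps to a random paper are all assumptions made for illustration.

```python
def google_numbers(references, follow_prob=0.5, tol=1e-12, max_iter=1000):
    """Return the steady-state fraction of walkers on each paper.

    `references` maps every paper to the list of papers it cites; every
    cited paper must also appear as a key (hypothetical input format).
    """
    papers = list(references)
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}  # start with walkers spread evenly

    for _ in range(max_iter):
        # Walkers sitting on papers with no references cannot follow a
        # citation, so (by assumption) they all jump to a random paper.
        dangling = sum(rank[p] for p in papers if not references[p])

        # Share of walkers that lands on each paper through random jumps.
        base = ((1.0 - follow_prob) + follow_prob * dangling) / n
        new_rank = {p: base for p in papers}

        # Walkers that follow a citation pick each reference equally often.
        for p in papers:
            refs = references[p]
            if refs:
                share = follow_prob * rank[p] / len(refs)
                for q in refs:
                    new_rank[q] += share

        change = sum(abs(new_rank[p] - rank[p]) for p in papers)
        rank = new_rank
        if change < tol:  # populations are effectively constant
            break
    return rank


# Toy citation network (made up): C cites A and B, B and D cite earlier work.
toy = {
    "A": [],
    "B": ["A"],
    "C": ["A", "B"],
    "D": ["B"],
}

ranks = google_numbers(toy)
for paper in sorted(ranks, key=ranks.get, reverse=True):
    print(paper, round(ranks[paper], 3))
```

In this toy network A and B are each cited twice, yet A ends up with the larger number because one of its citations comes from B, which is itself well cited. That is the sense in which the method weighs who cites a paper and not only how often.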
The team found that the results from the PageRank technique are linearly correlated with those obtained from citation indices. In other words, highly cited papers also have high Google rank numbers. However, the team was surprised to find a few “outliers” — exceptional papers that have anomalously high Google rank numbers compared with their citation rank.
Examples of such “classic” papers are:
* 1933 Phys. Rev. paper by Wigner and Seitz, “On the Constitution of Metallic Sodium” – the Wigner-Seitz construction appears in most solid-state textbooks;
* 1957 Phys. Rev. publication by Gell-Mann and Brueckner, “Correlation Energy of an Electron Gas at High Density”, which is important for many-body theory;
* 1963 Phys. Rev. Lett. paper by Glauber, “Photon Correlations”, recognized in last year’s Nobel Prize for physics.
“I imagine using Google PageRank to help organize scientific literature searches,” says Redner. “The technique might also emerge as a more useful measure of scientific impact than merely the number of citations alone,” he adds. The method will also help unearth recent important papers — not just older gems.
However, Maslov warns that the technique should not become the only way to do literature searches and that scientists should continue to randomly browse papers the “old-fashioned” way. As he points out, a mediocre paper that temporarily appears near the top of a Google ranking list could attract a disproportionately large number of citations. “The high status of this paper would then become a self-fulfilling prophecy,” he says. “Another danger is that a preferential placement could be either bought (think of ‘sponsored links’ on Google) or spontaneously generated because of inherent fluctuations in the algorithm itself.”