Papers introducing concepts that have since become common knowledge are often under-cited by researchers, skewing measures of those articles’ true impact. That’s the conclusion of a new study that uses machine learning to identify “foundational” work in science that is often not properly cited. Being able to count such hidden citations could provide more accurate bibliometric measures of impact, the study says (PNAS Nexus 3 pgae155).
The number of times a paper is cited is widely seen as a marker of its scientific credibility. But some concepts or ideas are so well known that no-one cites them. It would be unusual for an article on, say, general relativity to refer to Albert Einstein’s original 1915 paper on the subject. Xiangyi Meng, a physicist at Northwestern University in the US, who led the new study, calls such non-references “hidden citations”.
In their work, Meng and colleagues used a machine-learning model to analyse one million papers on the arXiv preprint server. It detected catchphrases that suggest specific discoveries and then linked each to at least one foundational paper. The researchers identified 343 topics in physics that accumulate hidden citations, each of which has at least one catchphrase.
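The idea of matching catchphrases to foundational papers can be illustrated with a toy example. The sketch below is not the researchers' machine-learning model: it swaps in plain phrase matching, and the `CATCHPHRASES` table, the `count_citations` function and the three sample papers are all hypothetical, made up purely for illustration. A paper that mentions a catchphrase without citing the linked foundational paper is counted as a hidden citation.

```python
import re
from collections import Counter

# Hypothetical catchphrase -> foundational-paper table (illustrative labels only).
CATCHPHRASES = {
    "quantum discord": "Ollivier & Zurek (2001)",
    "cosmological inflation": "Guth (1981)",
}


def count_citations(papers):
    """Return (explicit, hidden) Counters keyed by foundational paper.

    Each paper is a dict with 'text' (str) and 'references' (set of labels).
    Citing the foundational paper counts as an explicit citation; mentioning
    its catchphrase without citing it counts as a hidden citation.
    """
    explicit, hidden = Counter(), Counter()
    for paper in papers:
        text = paper["text"].lower()
        for phrase, source in CATCHPHRASES.items():
            mentions = re.search(r"\b" + re.escape(phrase) + r"\b", text)
            if source in paper["references"]:
                explicit[source] += 1
            elif mentions:
                hidden[source] += 1
    return explicit, hidden


# Made-up sample papers, not real arXiv data.
papers = [
    {"text": "We compute the quantum discord of a two-qubit state.",
     "references": set()},
    {"text": "Quantum discord is bounded from above in this model.",
     "references": {"Ollivier & Zurek (2001)"}},
    {"text": "Cosmological inflation predicts a nearly flat spectrum.",
     "references": set()},
]

explicit, hidden = count_citations(papers)
print(dict(explicit))
print(dict(hidden))
```

In this toy data set the “quantum discord” paper picks up one explicit and one hidden citation, while the inflation paper picks up a single hidden citation. A real analysis would need to handle spelling variants, abbreviations and phrases that map to more than one paper, which simple matching like this does not attempt.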
The researchers found that the ratio of hidden citations – i.e. citations that should have been made but were not – to actual citations for foundational papers was, on average, 0.98:1, suggesting that such papers usually acquire hidden citations at almost the same rate as actual citations.
Some publications, however, acquire much higher rates of hidden citations. Alan Guth’s 1981 paper that introduced cosmological inflation theory, for example, has 8.8 times more hidden citations than actual citations.
In another example, their model predicts that the phrase “quantum discord” – a quantity that relates two subsystems of a quantum state – should in principle be accompanied by a reference to a 2001 paper by Harold Ollivier and Wojciech Zurek. The algorithm found that hidden citations account for 34.6% of all detectable credit for the “quantum discord” paper.
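The article quotes these figures in two forms: as a ratio of hidden to actual citations, and as a share of total credit. The short sketch below converts between them so the numbers can be compared directly. It rests on two assumptions that are not confirmed by the study: that “detectable credit” means hidden plus actual citations, and that “8.8 times more” means a ratio of 8.8:1. The function names are illustrative.

```python
def hidden_share(ratio):
    """Convert a hidden-to-actual ratio r into the hidden share r / (1 + r)."""
    return ratio / (1.0 + ratio)


def hidden_ratio(share):
    """Convert a hidden share s (0 <= s < 1) into the ratio s / (1 - s)."""
    return share / (1.0 - share)


# Average foundational paper, ratio 0.98:1
print(f"{hidden_share(0.98):.1%}")   # 49.5%

# Guth's 1981 paper, assuming "8.8 times" means a ratio of 8.8:1
print(f"{hidden_share(8.8):.1%}")    # 89.8%

# "Quantum discord" paper, hidden share of 34.6%
print(f"{hidden_ratio(0.346):.2f}")  # 0.53
```

On these assumptions, hidden citations make up roughly half of the credit for a typical foundational paper and about nine-tenths for Guth’s paper, while the “quantum discord” paper has about one hidden citation for every two actual ones.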
Foundational papers that acquire hidden citations are nevertheless still highly cited, with an average of 434 citations, compared with an average of 1.4 citations for all physics papers.
Meng adds that when hidden citations are counted, the order of the 100 most-cited papers in physics changes. Many publications drop down the pecking order, such as Juan Maldacena’s 1999 work on the anti-de Sitter/conformal field theory correspondence. Lying top for explicit citations, it falls to second in the revised charts, mostly because other papers gain more from hidden citations than it does.
A few papers with high numbers of hidden citations show significant increases. Guth’s 1981 paper, for example, jumps from eighth place to top spot, overtaking Maldacena’s paper. “Without hidden citations, citation ranks don’t really mean anything,” Meng adds.
Community acceptance
To explore the impact of hidden citations on authors, the researchers used Microsoft Academic Graph’s “author saliency” metric. It judges the academic impact of scientists using a range of measures, such as the connectivity of articles, authors and journals, as well as an author’s citation count.
The team found that authors with more hidden citations also have a higher author saliency, with this effect particularly notable for those with lower numbers of citations. In other words, while these authors have credibility and reputation, citation counts are not fully capturing the true impact of their work.
“Authors with more hidden citations actually have a higher impact, they appear to be more reputational than those authors with fewer hidden citations,” says Meng. “If you have hidden citations, it means that your concept, your work has been widely accepted by the community.”
Meng explains that hidden citations are also inevitable, given that it is difficult for researchers to cite every paper or concept used in their work. That, he says, is why it is important that they are counted in some way.