Robert P Crease describes his adventures creating a database to tackle the history of physics
I’ve gone over to the dark side.
That, at any rate, was what one historian told me when I mentioned I was working on a database with imaging capabilities. Databases, I was informed, are booby traps for historians. Convenient boxes for storing data, they only return reshuffled versions of what you put in. History is about the unboxable motives and meanings that give rise to events. I was ceasing to be a true historian, my colleague concluded, and about to become a big-data wonk.
I did not mean to go to the dark side. It was forced on me by my attempt to understand how research develops at synchrotron light sources – central facilities that produce multiple tunable and intense beams of X-rays. Half a century ago, researchers from academia or industry who needed information about a process or material would run an experiment at one of a synchrotron’s many beam lines. They would take the results back to their labs, generate new questions and then do another experiment. A historian could easily describe this process, which consisted mainly of short and simple feedback loops, by consulting the facility’s annual reports.
No longer. The research at synchrotrons – and at other materials-science facilities with a large user base – is teeming with webs of instruments, industries and interests. A team from China, say, could be doing research on an instrument built by a German company that’s been installed temporarily at a US synchrotron, working alongside art historians, chip makers and government scientists. The same instrument can support several different research programmes simultaneously, while the same programme can use different instruments at the same facility.
What we historians like to call the research “space” at a synchrotron facility is therefore less like a set of feedback loops and more like a rapidly changing ecosystem.
How to get a grip
I recently spoke to someone who had worked at Brookhaven National Laboratory’s National Synchrotron Light Source (NSLS) in its early days in the mid-1980s. He described an evening at the bar with friends placing bets on which research carried out at a synchrotron source would be the first to get a Nobel prize. Like his buddies, he’d bet on some aspect of condensed-matter physics, probably involving superconductivity. “Nobody ever dreamed it would be for structural biology!” he exclaimed.
Once a minor player at synchrotron sources, structural biology unexpectedly zoomed in importance – and has claimed all synchrotron-related Nobels given out so far (four, I count, all in chemistry). But how can a historian discover why, with hard data to back it up? Ditto other seemingly simple questions, such as: how has the ecosystem evolved over the years? Which instruments and techniques drive the ecosystem? Which industries put the most into this ecosystem and get the most out? What research paths lead to applied products? How might such paths be improved?
Answering such questions will not only help us to understand the history of physics and physics facilities, but also improve how we plan research and train new scientists. Unfortunately, finding the answers by reading all the NSLS annual reports on the shelf in my office – several feet worth of dense information – would be a time-consuming, if not Sisyphean, task. So to see if digital tools could help, I contacted Elyse Graham, a Stony Brook colleague who is a professor in the booming field of “digital humanities”. I wanted to see how to identify key markers, mine data about them from available sources, and then put that information in a database. By mapping and imaging that information – not in spreadsheet-style but the way sociologists map and image demographic information – I hoped to guide my understanding of research ecologies.
Having already used software tools to analyse James Joyce’s Ulysses, Graham was hugely encouraging and we began to collaborate. Many other historian colleagues, however, were sceptical. Databases, they warned, are confined to fixed categories. The way they select and process data is not transparent. What’s more, databases can take over your thinking by leading you to pose only those questions that can be solved by consulting them.
But digital humanists, I discovered, know about these dangers and have been trying to address them for some time by establishing better research practices and designing superior tools. In any case, to say that databases are confined to fixed categories is not a real objection, Graham pointed out – after all, the same is true of traditional sources of historical information. Even those annual reports were assembled by lab administrators who picked the types of information useful for their purposes.
As for the lack of transparency, it’s possible for the structure and operation of databases to be made clear enough to users, even if this requires historians to acquire a new skill that Graham calls “production literacy” – describing, designing and using the software for digital tools. Finally, if a database is designed with care, it can not only rediscover existing relationships but also unearth new patterns and relationships, such as those in the research ecology at a synchrotron. Historians can then put new questions to new informants – not just scientists but technicians, programme planners and grant administrators too.
The critical point
A database with imaging capabilities, in short, is like a highly flexible and sophisticated map. Maps don’t do your moving for you, but they allow you to move more confidently in complex and unfamiliar landscapes. Databases will never take over the function of historians, because unravelling the motives, meanings and interpretations that underlie scientific events will always be a job for humans. But today, those humans who take on this task will often need to rely on big-data tools. Now that I’m on the dark side, I’ve finally seen the light.