Web searches can peak and trough in line with the prevalence of flu, dengue fever, kidney stones, stroke, depression and deaths by suicide. Now researchers from Austria and Portugal have found that internet data can also indicate corn planting and harvest dates in the US.
“This paper is the first to appear on an environmentally related application of web-search data, which we feel has great potential in other environmental areas,” Marijn van der Velde of the International Institute for Applied Systems Analysis (IIASA) told Physics World. “We believe this is only the first glimpse of a host of options that may become possible once search volumes increase around the world and access to absolute search volumes becomes possible.”
Van der Velde and colleagues from IIASA and Portugal’s University of Aveiro used Google Insights for Search to analyse searches relating to corn (maize) planting and harvesting in the US, state by state and week by week.
Crop-calendar dates are invaluable for monitoring and modelling. At the moment, according to van der Velde, they are either derived directly from climate data or based on older information from the 1990s and 2000s, such as the crop-calendar dataset that Sacks et al compiled in 2010.
But climate is not necessarily a good indicator of planting date, since social and technological factors may also come into play. Van der Velde said that when Waha et al (2011) simulated global planting dates based on climate conditions and crop-specific temperatures, the simulated dates were accurate to within one month (giving a three-month window) for only 50% of the grid cells.
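As a rough illustration of what "accurate to within one month" means as a score, the sketch below counts the share of grid cells whose simulated planting month falls within one month of the observed month. The function names and the month values are invented for the example and are not taken from Waha et al or from the IIASA study.

```python
def month_gap(a, b):
    """Shortest distance between two calendar months (1-12), wrapping over the year end."""
    diff = abs(a - b) % 12
    return min(diff, 12 - diff)


def share_within(simulated, observed, tolerance=1):
    """Fraction of grid cells whose simulated month is within `tolerance` months of the observed one."""
    hits = sum(month_gap(s, o) <= tolerance for s, o in zip(simulated, observed))
    return hits / len(simulated)


# Hypothetical planting months for six grid cells (made-up values).
simulated = [4, 5, 12, 7, 3, 10]
observed = [5, 8, 1, 7, 6, 6]

print(share_within(simulated, observed))
```

A tolerance of one month on either side is what produces the three-month window: the observed month plus the month before and the month after.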
“The search data effectively encapsulates many of the different factors that affect planting dates such as weather and technology,” he said. “In a practical sense the results of our work are comparable to global datasets on planting dates (Sacks et al 2010) with the advantage that our results are up to date.”
Web searches in the US for corn planting and harvesting peaked at the end of May and the middle of October, respectively. Google Correlate indicated a strong link between searches for corn-planting terms and searches on other agricultural issues.
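Finding such a peak in a weekly series is simple in principle. The sketch below is only an illustration, not the team's method: it assumes a list of weekly search-interest values and the start date of the first week, and returns the start date of the busiest week. The function name and all numbers are invented.

```python
from datetime import date, timedelta


def peak_week(first_week_start, weekly_interest):
    """Return the start date of the week with the highest search interest."""
    best_index = max(range(len(weekly_interest)), key=weekly_interest.__getitem__)
    return first_week_start + timedelta(weeks=best_index)


# Hypothetical weekly interest values (made up, not data from the study).
first_week = date(2011, 4, 3)
interest = [12, 30, 55, 80, 100, 70, 40]

print(peak_week(first_week, interest))
```

If two weeks tie for the highest value, this version returns the earlier one.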
At a state level, search data for corn planting week-by-week generally agreed well with a global crop-calendar dataset. Iowa and Nebraska, which are among the highest corn-producing US states, both showed high search volumes for the term “corn planting”. Data trends for harvest searches were not so clear, perhaps because some people were searching for the term for cultural reasons such as Thanksgiving.
The team’s efforts to look at wheat and soybean in the same way were foiled by inadequate search volumes. At the moment, Google Insights for Search provides sub-national, week-by-week data only for relatively popular search terms. It also normalizes results relative to the highest search volume, rather than providing absolute search volumes.
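To see why this kind of normalization hides absolute volumes, consider a minimal sketch that rescales a series so its largest value becomes 100. The 0 to 100 scale, the function name and the numbers are assumptions made for illustration; the exact procedure Google applies is not described here.

```python
def normalize_to_peak(volumes, scale=100):
    """Rescale a series so that its largest value equals `scale`."""
    peak = max(volumes)
    if peak == 0:
        return [0 for _ in volumes]
    return [round(scale * v / peak) for v in volumes]


# Two hypothetical weekly series whose absolute volumes differ a hundredfold.
small_region = [2, 5, 10, 4]
large_region = [200, 500, 1000, 400]

print(normalize_to_peak(small_region))
print(normalize_to_peak(large_region))
```

Both series come out identical after rescaling, so the normalized data alone cannot show which region generated more searches.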
“Once internet use is increasing in data-poor regions it may well be possible to obtain data on, for instance, crop-disease infestations before a capacity to monitor and report these infestations has been established,” said van der Velde.
Monitoring real-time web-search activity also has potential for predicting how farmers are responding to climate change. “Other agronomical uses could include combining real-time weed and pest infestation searches with remotely sensed satellite observations of phenological development,” said van der Velde. “For instance, in 2009, the searches for ‘weeds’ and ‘blight’ – including ‘potato blight’ – both peaked in comparison to other years, which corresponds to online reports of blight impacting both potatoes and tomatoes.”
At the moment, low search volumes limit widespread use of the method. “Given the exponential increases in search volumes over the last five years and increasing internet penetration in the developing world, the real future benefit will be in deriving crop calendars in those countries where such information is currently sparse or unreliable,” said van der Velde. “Ultimately, we anticipate that a combination of different types of crowd-sourced information such as pictures, data, reports and observations can be analysed in parallel to improve the understanding of our interactions with and in the environment and improve our capacity to respond adequately.”
The team reported the study in Environmental Research Letters (ERL).