An international team of information scientists has begun a two-year study to investigate how web links in scientific and other academic articles fail to lead to the resources being referenced. This is the focus of the Hiberlink project, in which the team from the University of Edinburgh and the Los Alamos National Laboratory (LANL) will assess the extent of ‘reference rot’ using a vast corpus of online scholarly work. The project is funded by a grant of US$496,000 (£310,000) from the US-based Andrew W. Mellon Foundation and is coordinated by EDINA, the Jisc-designated online services centre at the University of Edinburgh, which serves the needs of universities and colleges across the UK.
Increasingly, web-based scholarship includes links that point at resources needed or created in research activity, including software, datasets, websites, presentations, blogs and videos, as well as scientific workflows and ontologies. Unlike traditional scholarly articles, these referenced resources often evolve over time. The ‘reference rot’ problem occurs whenever the original version of a linked resource is no longer available.
The problem has two aspects. First, the http:// link that references a resource may no longer work. Second, the content at the end of the link may have evolved and may even have become dramatically different from when it was originally referenced. So when a researcher eventually revisits an online scholarly work and double-checks the referenced resources to confirm evidence or establish context, the original information on websites or in online databases may have changed or even ceased to exist. The same is true when such work is consulted for policy, decision-making or practical purposes.
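The two aspects described above can be told apart mechanically. The following is a minimal illustrative sketch in Python, not the project's own method: the names `LinkCheck`, `fingerprint` and `classify` are hypothetical, and it assumes a fingerprint of the resource was recorded at the time of citation. Comparing exact hashes is a deliberate simplification, since real web pages often change in trivial ways (dates, adverts) without changing the referenced content.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkCheck:
    """Result of fetching a referenced link today (hypothetical structure)."""
    status_code: Optional[int]  # None if the request failed entirely
    body: Optional[bytes]       # None if no content was returned


def fingerprint(body: bytes) -> str:
    """Return a SHA-256 hex digest used to compare two versions of a resource."""
    return hashlib.sha256(body).hexdigest()


def classify(check: LinkCheck, original_fingerprint: str) -> str:
    """Label a reference as 'broken_link', 'changed_content' or 'intact'."""
    # First aspect: the link itself no longer works.
    if check.status_code is None or not (200 <= check.status_code < 300):
        return "broken_link"
    if check.body is None:
        return "broken_link"
    # Second aspect: the link works, but the content differs from what was cited.
    if fingerprint(check.body) != original_fingerprint:
        return "changed_content"
    return "intact"


# Fingerprint assumed to have been recorded when the resource was first cited.
original = fingerprint(b"dataset v1")

print(classify(LinkCheck(404, None), original))           # broken_link
print(classify(LinkCheck(200, b"dataset v2"), original))  # changed_content
print(classify(LinkCheck(200, b"dataset v1"), original))  # intact
```

The sketch takes the fetch result as input rather than making a network request itself, so the classification logic can be checked without depending on any live website.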
The Hiberlink project builds directly upon a pilot study from LANL, powered by its Memento “Time Travel for the Web” technology. That study confirmed that as much as 30% of the http:// links in a selection of 400,000 arXiv.org papers did not function, and that 65% of the remaining links referred to a resource that was not archived and was hence in danger of disappearing without a trace. Using the text mining and information extraction tools developed by the Language Technology Group (LTG) at the University of Edinburgh School of Informatics, the project will examine a vast corpus of scholarly publications in order to assess which links still work as intended and which web content has been successfully archived and therefore preserved for use by future researchers and students.
The ultimate goal of the Hiberlink project is to identify practical solutions to the ‘reference rot’ problem, and to develop approaches that can be integrated easily into the publication process. The project intends to work with academic publishers and other web-based publication venues to ensure more effective preservation of web-based resources, so as to increase the prospect of continued access for future generations of researchers, students and their teachers.
The project runs for 24 months from March 2013.