Robust Links - Specification


Ongoing explorative and experimental revision:
   https://robustlinks.org/spec/

Authored by:
   Martin Klein - Los Alamos National Laboratory
   Shawn M. Jones - Los Alamos National Laboratory
   Harihar Shankar - Los Alamos National Laboratory
   Richard Wincewicz - University of Edinburgh
   Michael L. Nelson - Old Dominion University
   Herbert Van de Sompel - Data Archiving and Networking Services (DANS)

Abstract

Robust Links provide multiple pathways to revisit a link's original content, even a long time after the link was put in place. This document describes approaches to robustify links in HTML pages. All approaches assume that, when linking to a web resource, a snapshot of the state of that resource is created, for example, in a web archive or a versioning system. When linking, the URI of the resource, the URI of the snapshot, and the datetime of linking are conveyed.

1. Why robustify links?

Robust Links are intended to provide multiple pathways to revisit a link's original content over time. A separate document describes the motivation and rationale for robustifying HTML links.

2. The Robust Links approach

The approach assumes that, when linking to a web resource, a snapshot of the state of that resource is created, for example, in a web archive or a versioning system. Several web archives provide services that allow taking such snapshots, and versioning systems take them automatically. With a snapshot taken, a link can be robustified by including:
  • The URI of the original resource;
  • The URI of a snapshot of the original resource;
  • The datetime of linking, i.e. of taking the snapshot.
This information, when provided in a machine-actionable manner, allows:
  • Revisiting the original resource, which meanwhile may have changed, some time after linking;
  • Visiting the snapshot that was taken around the time of linking;
  • Finding snapshots that are temporally close to the one taken, in case the snapshot itself becomes temporarily or permanently inaccessible.
The Robust Links approach is to convey this information on a link by leveraging HTML5's attribute extensibility mechanism. It introduces the following data- attributes for the anchor (<a>) element:
  • data-originalurl for the URI of the original resource;
  • data-versionurl for the URI of the snapshot;
  • data-versiondate for the datetime of linking, i.e. of taking the snapshot.
The remainder of this document details how to use these attributes for various cases.
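
As an illustration of what "machine-actionable" can mean in practice, the following minimal sketch reads the three data- attributes from anchor elements using Python's standard html.parser module. Only the attribute names come from this specification; the class name RobustLinkParser and the function extract_robust_links are hypothetical names chosen for this example, and treating an anchor as a Robust Link whenever at least one of the attributes is present is an assumption of the sketch.

```python
from html.parser import HTMLParser

# Attribute names defined by the specification. All other names in this
# sketch (class, function, record layout) are hypothetical.
ROBUST_ATTRS = ("data-originalurl", "data-versionurl", "data-versiondate")


class RobustLinkParser(HTMLParser):
    """Collect href plus any Robust Links attributes from each <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        if tag != "a":
            return
        found = dict(attrs)
        # Assumption: an anchor counts as a Robust Link if it carries at
        # least one of the three attributes.
        if not any(name in found for name in ROBUST_ATTRS):
            return
        record = {"href": found.get("href")}
        for name in ROBUST_ATTRS:
            record[name] = found.get(name)  # None when the attribute is absent
        self.links.append(record)


def extract_robust_links(html_text):
    """Return one dict per robustified anchor found in html_text."""
    parser = RobustLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

Each returned record keeps the href alongside the three attributes, because, as the following sections describe, the href plays a different role depending on which attributes accompany it.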

3. Robustifying a link when linking to the original resource

If the main intent is to link to an original resource but also allow future users of that link to see the state of the original resource around the time the link was put in place, then Robust Link information is conveyed as follows:
  • href for the URI of the original resource;
  • data-versionurl for the URI of the snapshot of the original resource;
  • data-versiondate for the datetime of linking, i.e. of taking the snapshot.
For example, assume that we created a Robust Link to https://www.w3.org/ on January 21 2015.
  • In case we created a snapshot of the resource, as recommended, the Robust Link to the W3C home page would look like this:
    <a href="https://www.w3.org/"
       data-versionurl="https://archive.today/r7cov"
       data-versiondate="2015-01-21">Robust Link to the W3C home page</a>
    
  • In case we did not create a snapshot of the resource, the Robust Link to the W3C home page would look like this:
    <a href="https://www.w3.org/"
       data-versiondate="2015-01-21">Robust Link to the W3C home page</a>
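
The two markup variants above could be generated programmatically. The following is a minimal sketch, not a reference implementation: the function name robust_link_to_original and its parameters are hypothetical, and the only behavior assumed is the attribute placement described in this section (href for the original resource, an optional data-versionurl, and data-versiondate).

```python
from html import escape


def robust_link_to_original(original_url, version_date, text, version_url=None):
    """Build an <a> element whose href points at the original resource.

    version_url is optional: leave it as None when no snapshot was created.
    """
    attrs = [("href", original_url)]
    if version_url is not None:
        attrs.append(("data-versionurl", version_url))
    attrs.append(("data-versiondate", version_date))
    # Escape attribute values so characters such as & and " stay valid HTML.
    rendered = " ".join(
        '{}="{}"'.format(name, escape(value, quote=True)) for name, value in attrs
    )
    return "<a {}>{}</a>".format(rendered, escape(text))
```

Called with "https://www.w3.org/", "2015-01-21", a link text, and version_url="https://archive.today/r7cov", the function yields the first example on a single line; omitting version_url yields the second.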
    

4. Robustifying a link when linking to a specific version

If the main intent is to link to a specific state of an original resource, for example a snapshot of the original resource in a web archive or a version in a version control system, then Robust Link information is conveyed as follows:
  • href for the URI of the specific version, i.e. the URI of the snapshot of the original resource;
  • data-originalurl for the URI of the original resource;
  • data-versiondate for the datetime of the specific version.
For example,
  • Assume that we created a Robust Link on January 21 2015 that was primarily intended to convey the state of https://www.w3.org/ on that day. In order to do so, we created the snapshot https://archive.today/r7cov. In this case, the Robust Link to this specific version of the W3C home page looks like this:
    <a href="https://archive.today/r7cov"
       data-originalurl="https://www.w3.org/"
       data-versiondate="2015-01-21">Robust Link to this specific version of the W3C home page</a>
    
  • Assume that we created a Robust Link on January 21 2015 that was primarily intended to point to the version of https://en.wikipedia.org/wiki/Web_archiving that was operational on that day, which is https://en.wikipedia.org/w/index.php?title=Web_archiving&oldid=637465880. In this case, the Robust Link to this specific version of the Wikipedia page looks like this:
    <a href="https://en.wikipedia.org/w/index.php?title=Web_archiving&oldid=637465880"
       data-originalurl="https://en.wikipedia.org/wiki/Web_archiving"
       data-versiondate="2015-01-21">Robust Link to this specific version of the Wikipedia page</a>
       
  • Assume that we created a Robust Link on January 21 2015 that was primarily intended to point to the version of https://en.wikipedia.org/wiki/Web_archiving that was operational on March 20 2012, which is https://en.wikipedia.org/w/index.php?title=Web_archiving&oldid=485347845. In this case, the Robust Link to this specific version of the Wikipedia page looks like this:
    <a href="https://en.wikipedia.org/w/index.php?title=Web_archiving&oldid=485347845"
       data-originalurl="https://en.wikipedia.org/wiki/Web_archiving"
       data-versiondate="2012-03-20">Robust Link to this specific version of the Wikipedia page</a>
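
Because href carries the original URI in one pattern and the version URI in the other, a consumer has to look at the accompanying attributes to tell the two apart. The sketch below shows one way this could be done. The function name resolve_robust_link and the plain-dict input are hypothetical, and the handling of a link that carries both data-originalurl and data-versionurl (a case not covered by the examples in this document) is an assumption of the sketch.

```python
def resolve_robust_link(attrs):
    """Return (original_url, snapshot_url, version_date) for one anchor.

    attrs is a plain dict of the anchor's attributes, for example
    {"href": "...", "data-originalurl": "...", "data-versiondate": "..."}.
    """
    href = attrs.get("href")
    original = attrs.get("data-originalurl")
    snapshot = attrs.get("data-versionurl")
    if original is not None and snapshot is None:
        # Linking to a specific version: href is the snapshot.
        snapshot = href
    if original is None:
        # Linking to the original resource: href is the original,
        # and the snapshot stays None if none was recorded.
        original = href
    # Assumption: if both attributes are present, they take precedence
    # over href for both roles.
    return original, snapshot, attrs.get("data-versiondate")
```

For the last Wikipedia example above, the function would return the article URI as the original, the oldid URI as the snapshot, and "2012-03-20" as the version date.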

5. Acknowledgements

The Robust Links approach is an outcome of the 2013-2015 Hiberlink project that was funded by the Andrew W. Mellon Foundation. The project was a collaboration between The University of Edinburgh (Edina and the School of Informatics) and the Los Alamos National Laboratory (Research Library Prototyping Team), with respective Principal Investigators Peter Burnhill and Herbert Van de Sompel.

The project researched the extent of reference rot (the combination of link rot and content drift, the latter a term originally coined by the project) for links to web-at-large resources found in scholarly papers. The research leveraged several large scholarly corpora, and the major findings were published in two papers:
  • Klein, M., Van de Sompel, H., Sanderson, R., Shankar, H., Balakireva, L., Zhou, K., and Tobin, R. (2014) Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot. PLoS ONE, 9(12): e115253. https://doi.org/10.1371/journal.pone.0115253
  • Jones, S., Van de Sompel, H., Shankar, H., Klein, M., Tobin, R., and Grover, C. (2016) Scholarly Context Adrift: Three out of Four URI References Lead to Changed Content. PLoS ONE, 11(12): e0167475. https://doi.org/10.1371/journal.pone.0167475
The original version of the Robust Links specification also included the following authors: Harihar Shankar (Los Alamos National Laboratory), Richard Wincewicz (University of Edinburgh).