Searching web archives for the original URI http://blogs.law.harvard.edu/futureoftheinternet/2013/09/22/perma/ returns these archived snapshots.
More generally, every existing web archive supports searching for archived snapshots by the URI of the original resource that was archived.
Also, all major web archives support the
Memento protocol that allows accessing archived snapshots by using the URI
of the original resource. In addition, some web archives have
bespoke APIs that provide such functionality.
Using the ongoing example, this means that, if
perma.cc were to go offline temporarily or permanently, snapshots could still be found in other
web archives.
However, in the common approach to address reference rot,
the original URI is replaced by the URI of the archived snapshot.
As a result, this approach prevents retrieving snapshots from other web archives, making it entirely dependent on the continued existence of the
web archive that assigned the URI for the archived snapshot. From the perspective of web persistence, this can hardly be regarded as a satisfactory solution.
In order to maximize chances of future retrieval of snapshots of a linked resource, that resource's original URI must be maintained when linking to it.
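The following minimal sketch illustrates why the original URI is the useful key: given only that URI, a lookup can be formed for any archive that supports URI-based search. The archive names and lookup prefixes are hypothetical placeholders, and real archives may expect a different URI pattern.

```python
from urllib.parse import urlsplit

# Hypothetical lookup prefixes, used for illustration only.
# Real web archives may expect a different pattern.
ARCHIVE_LOOKUP_PREFIXES = {
    "archive-a": "https://archive-a.example/web/",
    "archive-b": "https://archive-b.example/snapshots/",
}


def lookup_uris(original_uri, prefixes=ARCHIVE_LOOKUP_PREFIXES):
    """Build one lookup URI per archive from the original resource URI."""
    parts = urlsplit(original_uri)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URI: {original_uri!r}")
    # The original URI is appended unchanged, so the same key works everywhere.
    return {name: prefix + original_uri for name, prefix in prefixes.items()}


if __name__ == "__main__":
    original = "http://blogs.law.harvard.edu/futureoftheinternet/2013/09/22/perma/"
    for name, uri in lookup_uris(original).items():
        print(name, uri)
```

A link that carries only the snapshot URI assigned by one archive offers no comparable key for the other archives.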
The Case for Structured Temporal Context on Links
This temporal context information has, so far, been included in a way that is helpful for human consumption.
Despite the many variations in expressing the information that is relevant for a web citation, a user
can interpret it and connect the dots. Also, temporal context information has so far only been included in formal web citations.
However, since all links are subject to reference rot, addition of such information should not be limited to formal citations of web resources,
but should rather be applicable to all links to web resources.
There are compelling reasons to express temporal context information in a structured manner on links
to support use by applications such as browsers, crawlers, search engines:
- The many variations in expressing web citation information make machine interpretation challenging.
- In the current representation of information, the linkedurl and the versionurl
look like two independent URIs despite the tight - temporal - relationship between them.
- The approach used for formal web citations cannot be used for links in general because it would, for example, require
adding two links to the same anchor text.
- The versionurl, if provided in a structured manner, can be used by applications such as browsers,
to indicate and provide the option to retrieve the archived snapshot of the linked resource.
- The combination of the linkedurl and the versiondate, if provided in a structured manner, can be used by applications
such as browsers, to indicate and provide the option to obtain an archived snapshot of the linked resource that is temporally near to the versiondate,
even if no versionurl is provided.
The Memento protocol that specifies content negotiation
in the datetime dimension provides this functionality in an interoperable manner, but it could also be provided by leveraging bespoke APIs of web archives.
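The last two points can be sketched as follows, assuming the temporal context is available as structured data. The `LinkContext` class, the `snapshot_request` function, and the TimeGate prefix are hypothetical names introduced for illustration; the `Accept-Datetime` request header is the one the Memento protocol uses for datetime negotiation. The sketch only prepares the request and does not send it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Optional

# Hypothetical TimeGate endpoint that takes the original URI as a suffix.
TIMEGATE_PREFIX = "https://timegate.example/timegate/"


@dataclass
class LinkContext:
    """Structured temporal context for one link (illustrative field names)."""
    linkedurl: str                    # URI of the original resource
    versiondate: datetime             # datetime of linking
    versionurl: Optional[str] = None  # URI of a known archived snapshot


def snapshot_request(link, timegate_prefix=TIMEGATE_PREFIX):
    """Return (uri, headers) for retrieving a snapshot of the linked resource."""
    if link.versionurl:
        # A snapshot URI is known, so it can be fetched directly.
        return link.versionurl, {}
    when = link.versiondate
    if when.tzinfo is None:
        # Assumption: naive datetimes are interpreted as UTC.
        when = when.replace(tzinfo=timezone.utc)
    when = when.astimezone(timezone.utc)
    # Ask the TimeGate for a snapshot temporally near the versiondate.
    headers = {"Accept-Datetime": format_datetime(when, usegmt=True)}
    return timegate_prefix + link.linkedurl, headers


if __name__ == "__main__":
    link = LinkContext(
        linkedurl="http://blogs.law.harvard.edu/futureoftheinternet/2013/09/22/perma/",
        versiondate=datetime(2014, 1, 15),  # illustrative date, not from the source
    )
    print(snapshot_request(link))
```

When a versionurl is present it takes precedence in this sketch; an application could equally fall back to datetime negotiation if that snapshot turns out to be unavailable.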
The question then arises how best to convey the temporal context information
so that applications can use it, and how to do so in a uniform manner,
i.e. a manner that is independent of the venue conveying the information.
In this regard, it is interesting to observe that in 1995,
the definition of the anchor element
included an optional URN attribute, possibly provided to address concerns regarding web persistence.
The attribute was deprecated and it is probably a fair guess that this happened because no infrastructure
existed to act upon URNs. The
HTML 5 development page for the anchor element
includes a reminder that the URN attribute is obsolete.
There are several reasons to revisit the inclusion of attributes related
to web persistence in select HTML elements, most importantly the anchor element <a>:
- There is a growing concern regarding persistence at least in some pockets of the web:
  - Wikipedia has an active Link rot thread
    looking into the problem domain.
  - The Hiberlink project
    and CrossRef's OpCit explore the problem for scholarly communication.
    A pilot study that led to Hiberlink found disconcerting percentages
    of link rot and lack of archival versions for web resources
    referenced in the arXiv.org preprint collection and the thesis repository of the University of North Texas.
  - Reference rot has become a significant concern in legal cases
    that depend on web resources. The Perma effort at Harvard University has emerged to try and ameliorate the problem.
  - The Modern Language Association style for citing web resources
    no longer mandates the inclusion of the cited URI because "Web addresses are not static".
- Infrastructure has emerged that can play a role in achieving an
increased degree of web persistence: