Memento Tools - Apache Rewrite Rules

Last updated: January 19, 2015

Configuring an Apache server for Memento

For Apache servers, the following configuration can be used to add an HTTP Link header pointing to a TimeGate on an archival server. It uses mod_rewrite and mod_headers, which are typically available in the standard distribution, and hence does not require compiling or installing any code. It adds the Link header to every response for every page, regardless of whether the request carries an Accept-Datetime header. Because the response does not vary with that header, it can be cached.

The directives to add are:

 
RewriteEngine on

RewriteCond %{IS_SUBREQ} false
RewriteRule ^/(.*) - [E=ORIGURI:%{HTTP_HOST}/$1]

RewriteRule ^/(.*) - [E=ORIGQRY:]
RewriteCond %{QUERY_STRING} .+
RewriteRule ^/(.*) - [E=ORIGQRY:?%{QUERY_STRING}]

RewriteRule ^/(.*) - [E=ORIGPROTO:http]
RewriteCond %{HTTPS} on
RewriteRule ^/(.*) - [E=ORIGPROTO:https]

Header always set Link "<http://purl.org/memento/timegate/%{ORIGPROTO}e://%{ORIGURI}e%{ORIGQRY}e>;rel=timegate"

Note that the Header directive must be written on a single line, starting with "Header" and ending with "timegate".

If mod_rewrite or mod_headers is not currently loaded, you may also require lines like:

 LoadModule rewrite_module modules/mod_rewrite.so
 LoadModule headers_module modules/mod_headers.so

Step by step explanation

  • Line 1: This turns the URL rewriting engine on. If you already have URLs being rewritten, you do not need this line.
  • Line 2: Only generate the URI from the top-level request (e.g. /) rather than the internally rewritten request (e.g. /index.html).
  • Line 3: This records the URI from the requested hostname through to the end of the path (just before any query).
  • Lines 4-6: This block records the query if there is one, from the question mark through to the end of the URI, or sets it to an empty string by default.
  • Lines 7-9: This block records the protocol used to access the resource. The default is http; it becomes https if the HTTPS condition is met.
  • Line 10: This line then sets the Link header to point to the PURL for a TimeGate Aggregator, currently operated at Los Alamos.
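
As a worked illustration of the steps above, the comments below trace a hypothetical request for https://www.example.com/docs/page?lang=en through the directives. The host, path, and query are made-up values, and the trace assumes the rules sit in the main server or virtual host configuration, where the matched path begins with a slash.

```apache
# Hypothetical request: https://www.example.com/docs/page?lang=en

RewriteEngine on

# Top-level request only: ORIGURI = www.example.com/docs/page
RewriteCond %{IS_SUBREQ} false
RewriteRule ^/(.*) - [E=ORIGURI:%{HTTP_HOST}/$1]

# Default is empty; the query string is non-empty here, so ORIGQRY = ?lang=en
RewriteRule ^/(.*) - [E=ORIGQRY:]
RewriteCond %{QUERY_STRING} .+
RewriteRule ^/(.*) - [E=ORIGQRY:?%{QUERY_STRING}]

# Default is http; the request arrived over HTTPS, so ORIGPROTO = https
RewriteRule ^/(.*) - [E=ORIGPROTO:http]
RewriteCond %{HTTPS} on
RewriteRule ^/(.*) - [E=ORIGPROTO:https]

# Resulting response header for this request:
#   Link: <http://purl.org/memento/timegate/https://www.example.com/docs/page?lang=en>;rel=timegate
Header always set Link "<http://purl.org/memento/timegate/%{ORIGPROTO}e://%{ORIGURI}e%{ORIGQRY}e>;rel=timegate"
```

Setting ORIGQRY to an empty string first means the header never contains a placeholder for an unset variable when the request has no query.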

If you need to point to other TimeGates, replace the base URL of the Aggregator (http://purl.org/memento/timegate/) with the base URL of the appropriate service. For example, if your TimeGate was http://example.org/timegate/, the last line would look like:
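
A sketch of that directive, assuming the ORIGPROTO, ORIGURI, and ORIGQRY variables are set exactly as in the rules above and only the base URL is changed to the placeholder TimeGate from the text:

```apache
# Same Link header as before, with the Aggregator base URL replaced by
# the placeholder TimeGate http://example.org/timegate/
Header always set Link "<http://example.org/timegate/%{ORIGPROTO}e://%{ORIGURI}e%{ORIGQRY}e>;rel=timegate"
```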

Example sites

Some web sites that use this technique:

Testing

To confirm that the system is behaving correctly, you can simply request any resource on the web server and look at the headers. The Link header should always be there. For example, using the UNIX command line tool curl:
  curl -s -o /dev/null -D headers.txt http://your.server.here/
  
Then check that the contents of headers.txt look similar to:
  HTTP/1.1 200 OK
  Date: Tue, 13 Oct 2009 20:07:27 GMT
  Server: Apache
  Link: <http://purl.org/memento/timegate/http://your.server.here/>;rel=timegate
  Content-Type: text/html; charset=UTF-8