Documentation
BaseHandler.py implements a class called BaseProxyHandler, which processes the HTTP requests and returns the generated responses. It requires some minor configuration to work:
- Change the host name in the constructor to where your proxy will live. Currently this is set to mementoproxy.lanl.gov.
- The constructor takes a path as an argument for where it will listen. You'll set this in your actual implementation, so there is no need to change it in the file.
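For orientation, a rough sketch of the part of the constructor you edit; the attribute names here are assumed rather than copied from BaseHandler.py:

    # Sketch only: attribute names are assumed, check BaseHandler.py itself.
    class BaseProxyHandler(object):
        def __init__(self, path):
            # Edit this to the host your proxy is served from.
            self.hostname = 'mementoproxy.lanl.gov'
            # The path is supplied by the concrete handler; leave it alone here.
            self.path = path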
The class defines the following methods:
- send(data, req, status, ct):
This function sends the data back to the client. 'req' is the
mod_python request object. 'status' is the HTTP status code to use,
which defaults to 302. 'ct' is the MIME type to send in the
Content-Type header.
- error(data, req, status, ct):
A wrapper function for send which changes the defaults to an error
condition.
- fetch_changes(req, requri, dt):
This is the function that needs to be implemented for each proxy. It
returns a list of (datetime, URI) tuples to choose from for the
requested URI. 'req' is the mod_python request object. 'requri' is
the URI of the proxied resource. 'dt' is the datetime for which the
client wants the resource, in the case of the handle_dt() function.
- handle_event(req):
This function generates a Simile Timeline event stream in either XML or JSON, as requested in the URI.
- handle_aggr(req):
This function processes the TimeBundle URI and redirects to an appropriate TimeMap based on the content-negotiation headers.
- handle_rem(req):
This function generates the TimeMap from the set of times and resources returned by fetch_changes().
- handle_dt(req):
This is the main Memento function; it processes the redirects based on the requested URI and datetime.
- handle(req):
A dispatcher which calls the other handle_* functions based on the URI pattern.
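As an illustration of that dispatch, a hypothetical version of handle() might look like the following; the URI segments are invented, and the real patterns are defined in BaseHandler.py:

    def handle(self, req):
        # Hypothetical routing on path segments; the real patterns live
        # in BaseHandler.py.
        uri = req.uri
        if '/timebundle/' in uri:
            return self.handle_aggr(req)
        elif '/timemap/' in uri:
            return self.handle_rem(req)
        elif '/timeline/' in uri:
            return self.handle_event(req)
        else:
            return self.handle_dt(req)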
Individual handlers define subclasses of BaseProxyHandler, and need only implement the fetch_changes() function.
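A minimal sketch of such a subclass (the class name and target URI are invented):

    from datetime import datetime
    from BaseHandler import BaseProxyHandler

    class StaticProxyHandler(BaseProxyHandler):
        # Toy subclass: pretend every URI has exactly one archived copy.
        def fetch_changes(self, req, requri, dt):
            return [(datetime(2009, 6, 1),
                     'http://archive.example.org/2009/' + requri)]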
-
The Internet Archive handler makes use of the Archive's regular URL pattern, which produces a page listing the dates and URLs of the archived copies of a particular URI.
It parses the HTML using lxml's HTML parser and generates the change list from that.
The format to return is a list of tuples; in each tuple the first element is a datetime instance and the second is a string with the URI of the Memento.
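By way of illustration, a hedged sketch of how that might be written, assuming the listing page lives at web.archive.org/web/*/<uri> and that snapshot links carry a 14-digit timestamp; the shipped handler may differ in its details:

    import re
    import urllib2
    from datetime import datetime
    from lxml import html
    from BaseHandler import BaseProxyHandler

    SNAPSHOT = re.compile(r'/web/(\d{14})/')

    class ArchiveProxyHandler(BaseProxyHandler):
        def fetch_changes(self, req, requri, dt):
            # Fetch the Archive's listing page for the requested URI and
            # pull every snapshot link out of it.
            listing = 'http://web.archive.org/web/*/%s' % requri
            doc = html.parse(urllib2.urlopen(listing))
            changes = []
            for href in doc.xpath('//a/@href'):
                match = SNAPSHOT.search(href)
                if not match:
                    continue
                when = datetime.strptime(match.group(1), '%Y%m%d%H%M%S')
                if not href.startswith('http'):
                    href = 'http://web.archive.org' + href
                changes.append((when, href))
            return changes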
-
The Wikipedia handler is slightly more complex. It uses the Wikipedia API to extract the history information for the given article. After generating the appropriate URIs for the API, it uses lxml to process the XML responses.
It sends a fake User-Agent header to ensure that Wikipedia does not reject the request. In fetch_changes(), it requests 500 history items at a time and constructs the full change list across the multiple calls.
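A sketch of that loop against the standard MediaWiki query API; the endpoint, the continuation parameter (rvstartid), and the header value are assumptions rather than details taken from the shipped handler:

    import urllib
    import urllib2
    from datetime import datetime
    from lxml import etree
    from BaseHandler import BaseProxyHandler

    API = 'http://en.wikipedia.org/w/api.php'
    HEADERS = {'User-Agent': 'Mozilla/5.0 (compatible; MementoProxy)'}

    class WikipediaProxyHandler(BaseProxyHandler):
        def fetch_changes(self, req, requri, dt):
            # Page through the article's revision history 500 items at a
            # time and build one (datetime, oldid URI) tuple per revision.
            title = requri.rsplit('/', 1)[-1]
            changes = []
            cont = None
            while True:
                params = {'action': 'query', 'prop': 'revisions',
                          'titles': title, 'rvprop': 'ids|timestamp',
                          'rvlimit': 500, 'format': 'xml'}
                if cont is not None:
                    params['rvstartid'] = cont
                url = API + '?' + urllib.urlencode(params)
                request = urllib2.Request(url, headers=HEADERS)
                tree = etree.parse(urllib2.urlopen(request))
                for rev in tree.findall('.//rev'):
                    when = datetime.strptime(rev.get('timestamp'),
                                             '%Y-%m-%dT%H:%M:%SZ')
                    uri = ('http://en.wikipedia.org/w/index.php?title=%s&oldid=%s'
                           % (urllib.quote(title), rev.get('revid')))
                    changes.append((when, uri))
                node = tree.find('.//query-continue/revisions')
                if node is None:
                    break
                cont = node.get('rvstartid')
            return changes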
Handlers in mod_python must define a 'handler' function that takes the request object as its argument. For proxies, this function constructs the proxy handler object, giving it the path on the website where the proxy will be installed.
It then calls basehandler() from the BaseHandler.py script with the newly constructed object and the request, and basehandler() processes the request as described above.
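Put together, a handler module for the Internet Archive proxy sketched above might look like this; the module name and install path are placeholders:

    from BaseHandler import basehandler
    from ArchiveHandler import ArchiveProxyHandler   # hypothetical module name

    def handler(req):
        # mod_python entry point: build the proxy with the path it is
        # mounted under and hand the request to the shared dispatcher.
        proxy = ArchiveProxyHandler('/memento/archive')
        return basehandler(proxy, req)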