Sunday, March 28, 2010

Facilitating access to subscription-based resources -- Athens, Shibboleth, OpenURL, Reverse Proxy, etc

Australian and North American PhilPapers users have often told me how much they like its off-campus access feature. Off-campus access on PhilPapers works like this. First, users configure their institutional reverse proxy in their account. From that point on, PhilPapers will point them to the proxy for subscription-based articles. This will be transparent to the users. They might be asked for their credentials by the proxy, but aside from that they will have direct access to the papers. At the time of writing, 513 users have configured a proxy.
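To make the redirection step concrete, here is a minimal sketch of how a link could be rewritten for a user who has configured a proxy. It assumes a prefix-style proxy whose login URL takes the target after a `url=` parameter; the function name, the example hostnames and the `encode` switch are all hypothetical, and this is not necessarily how PhilPapers does it. Whether the target needs to be percent-encoded depends on the proxy.

```python
from urllib.parse import quote


def proxied_url(proxy_prefix, target_url, encode=False):
    """Route target_url through a prefix-style reverse proxy.

    proxy_prefix is the user's configured proxy prefix (hypothetical
    example: "https://proxy.example.edu/login?url="). Some proxies
    expect the raw target URL, others a percent-encoded one, hence
    the encode switch.
    """
    if encode:
        target_url = quote(target_url, safe="")
    return proxy_prefix + target_url


if __name__ == "__main__":
    print(proxied_url("https://proxy.example.edu/login?url=",
                      "http://publisher.example.com/article?id=1"))
```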

This system works well for our North American and Australian users because reverse proxies are widely used in those regions. Not so in Europe. The UK, in particular, has its own subscription management systems called Athens and Shibboleth. Athens is the older system but remains the more standard one as far as I can tell, and the two systems are identical from the end user's point of view. These systems don't use proxies. When a user wants to access an article through Athens, they first have to go to the publisher's page. Then they have to click a link on that page to reach an Athens login page specific to the publisher. Often they will have to look up their institution on Athens' site before logging in. Once logged in, they are taken back to the publisher's site and authorized to download papers from that publisher. As far as I can tell, users have to repeat this process for every publisher they visit, though the credentials ought to be remembered between visits to the same publisher. We don't have Athens accounts at SAS so I couldn't test this, but I can't see how it could be otherwise -- surely publisher sites do not constantly query Athens' servers to check whether their visitors' IPs have already been authenticated, whether through embedded JavaScript or backend connections (I can confirm there is no such JavaScript on Wiley's site). So there is a lot of clicking around for UK users when they're browsing papers from various sources.

I would like to facilitate things for our UK users. In theory, we could forward our users to an Athens URL containing the final URL of the resource the user wants to access and the user's institution, i.e. something like this:

http://www.athensams.net?u=FINAL_URL&inst=University%20of%20XYZ

Athens could easily look up the relevant publisher based on the submitted URL, then a) forward the user directly to FINAL_URL if they are already authenticated or b) present the user with a login page before forwarding them. This would make Athens as easy to use as a reverse proxy for PhilPapers users.

But this was not to be. The actual login URLs for Athens look something like this:

https://auth.athensams.net/?ath_returl=FINAL_URL&ath_dspid=WILEY

These are the URLs one finds on publishers' sites (Wiley in this case). As far as I can tell, there is no parameter to specify the institution, and the ath_dspid code is mandatory. When I tried changing the latter to 'OTHER' the login failed (I found someone with an Athens account who tested this for me).
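For illustration, a login URL of the observed form could be assembled as in the sketch below. Only the host and the two parameter names, ath_returl and ath_dspid, come from the URL above; the function name is made up, and percent-encoding the return URL is an assumption, since I don't know what Athens actually expects. The publisher code still has to come from somewhere.

```python
from urllib.parse import urlencode

# Host taken from the login URLs seen on publishers' sites.
ATHENS_LOGIN = "https://auth.athensams.net/"


def athens_login_url(final_url, publisher_code):
    """Build an Athens login URL of the observed form.

    publisher_code is the value of ath_dspid (e.g. "WILEY"); it has to
    be known in advance for each publisher.
    """
    query = urlencode({"ath_returl": final_url,
                       "ath_dspid": publisher_code})
    return ATHENS_LOGIN + "?" + query


if __name__ == "__main__":
    print(athens_login_url("http://example.com/paper", "WILEY"))
```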

So, to point our users to appropriate Athens login pages we would have to know the publisher codes. And of course they are not published by Eduserv, the company that runs Athens. Eduserv has refused to help us out in any way -- all we've ever heard from them is 'buy our product'.

Another company which hasn't been very cooperative is ExLibris, the company that makes the widely used SFX OpenURL resolver. Here too we'd need some data to make use of the service, because each institution has its own OpenURL resolver. I've repeatedly asked ExLibris for a list of institutional SFX server URLs (I'm sure they have that), and I never got any response at all.

Fortunately for us, WorldCat has come up with a solution to our problem (and that of everyone else trying to streamline access to subscription-based resources): they've created a big database of institutional OpenURL resolvers which everyone can query for free (so long as it's for non-profit use). Given an institution's name, WorldCat's 'Registry' will tell us what resolver(s) it uses. The resolver will then forward the user to an appropriate access point to the item, including (as far as I can tell) Athens access pages as appropriate. Thanks to WorldCat, we can give our users the benefits of all the secret data the likes of ExLibris and Eduserv have at almost no expense to ourselves.
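Once a resolver's base URL is known, pointing a user at an article comes down to appending an OpenURL query to it. Here is a minimal sketch using OpenURL 1.0 key/value pairs for a journal article. The resolver base URL and the function name are hypothetical, the Registry lookup itself is not shown, and a given resolver may want more or fewer fields.

```python
from urllib.parse import urlencode


def openurl_for_article(resolver_base, atitle, jtitle, year,
                        volume=None, spage=None):
    """Build an OpenURL 1.0 (key/value) link for a journal article.

    resolver_base is the institution's resolver URL, as returned by a
    registry lookup (not shown here).
    """
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.atitle": atitle,
        "rft.jtitle": jtitle,
        "rft.date": str(year),
    }
    # Optional fields are only sent when we actually have them.
    if volume is not None:
        params["rft.volume"] = str(volume)
    if spage is not None:
        params["rft.spage"] = str(spage)
    # Some resolver URLs already carry a query string.
    separator = "&" if "?" in resolver_base else "?"
    return resolver_base + separator + urlencode(params)


if __name__ == "__main__":
    print(openurl_for_article("https://resolver.example.edu/sfx",
                              "On Denoting", "Mind", 1905,
                              volume=14, spage=479))
```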

Friday, March 26, 2010

A new backup script with rsync, versioning and rotation

Here is a backup script that does what every backup script should do:
  1. Use rsync or equivalent for transfer to speed things up
  2. Keep rotating versions of backups
  3. Avoid duplicating unchanged files on the backup host (using hard links)
  4. Be configurable from a text file
One minor issue with this script, however, is that it runs from the backed-up machine instead of the backup host (ideally, you don't want an intruder on the backed-up machine to gain access to your backups). But I made the script to use with EVBackup's service. While I trust them not to tamper with my encrypted data, I wouldn't trust them with the keys to PhilPapers's production server.

The script should run on any *nix machine with rsync. It should work with any *nix backup host as well, but you will need shell access to the backup host. You will also need to configure the user running the script for password-less login.
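For readers who want to see how rotation and hard links fit together, here is a rough sketch of the general technique. It is not the script itself: the snapshot.N naming, the function names and the example host are all made up for illustration. The idea is to shift older snapshots up by one on the backup host, then let rsync's --link-dest option hard-link files that are unchanged since the previous snapshot instead of copying them again.

```python
import shlex


def rotate_commands(base_dir, keep):
    """Shell commands to run on the backup host before a new backup.

    Drops the oldest snapshot and shifts snapshot.N to snapshot.N+1,
    which frees up snapshot.0 for the new run.
    """
    oldest = shlex.quote(f"{base_dir}/snapshot.{keep - 1}")
    cmds = [f"rm -rf {oldest}"]
    for n in range(keep - 2, -1, -1):
        src = shlex.quote(f"{base_dir}/snapshot.{n}")
        dst = shlex.quote(f"{base_dir}/snapshot.{n + 1}")
        cmds.append(f"[ -d {src} ] && mv {src} {dst}")
    return cmds


def rsync_command(source, remote, base_dir):
    """rsync invocation to run on the backed-up machine.

    base_dir should be an absolute path on the backup host. Files
    identical to those in snapshot.1 are hard-linked, not copied.
    """
    return [
        "rsync", "-a", "--delete",
        f"--link-dest={base_dir}/snapshot.1",
        "-e", "ssh",
        source,
        f"{remote}:{base_dir}/snapshot.0/",
    ]


if __name__ == "__main__":
    for cmd in rotate_commands("/backups/site", 3):
        print(cmd)
    print(" ".join(rsync_command("/var/www/", "user@backup.example.com",
                                 "/backups/site")))
```

The rotation commands would be sent to the backup host over ssh, which is why shell access and password-less login are needed there.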