Adverts that follow you

Now, one of these days I will finally get my ‘moving flat’ epic published; in the meantime you can read my how to buy a property guide. It really is worth every penny.

Part of this process involves buying a sofa which doesn’t suck as much as my current sofa.

Of course, I’m doing this the proper way: procrastinating, constantly looking at websites and not deciding anything (look out for my exciting new book of the same name, the ultimate manager’s guide).

One of the sofas which was lucky enough to make the final rounds (i.e. got to perform in front of Simon Cowell and co) was this one, called – cutely – Oscar. Simple lines, modern look, sofa bed. From Furniture Village, which seems to be DFS’ more mature cousin, though this perhaps doesn’t say too much.

All fine and good.

Then the other day I visited Engadget.com, not a site I normally visit, but a link had caught my eye on Twitter.



Engadget screenshot, with interesting ad

Notice that ad? A bit like travelling to furthest Siberia, walking into the dodgiest bar and finding your nan there, distinctly out of place. Click on the image for a larger version.

At the top of an American website about gadgets is an advert for a UK middle-of-the-road furniture store, advertising the exact sofa I had been considering for some time.

This was no coincidence.

Nor had they been using alien technology to read my mind. I had visited the furniture site using the same laptop (and presumably the same browser); a cookie-based advertising system was at play here.

In fact my mind was made up when, this evening, I saw this.

This was on Time Magazine’s website (again a US publication), on a Photo Gallery about Afghan Women (see this for background, wonderful world).

It’s that cheeky little sofa again, this time popping up next to a repressive regime. You Guys!

Finally, notice that on the red border of the ad, bottom right, there is a little bulge. Clicking on it…

http://www.struq.com/consumer-opt-out/ “Totally personalised display ads”.

Does this freak me out? It probably should, but at the moment it borders on fun; like most people my tastes and wants are diverse enough to create stupid juxtapositions (Serious News and Girls Aloud, Global Warming and fast cars). It becomes an issue when it goes beyond ‘person x has looked at product y from company z, so show them the advert’, and becomes one entity building a database of everything you view and do online.

What a scary vision. Think I’ll stick with what I know and just use Google and Facebook.

Twitter clients

From about an hour after signing up to Twitter until very recently I used Twirl on both PC and Mac as my Twitter client. I was happy with it, and still am, but had noticed people using other clients and wanted to see if I was missing anything.

I round up my findings here:

Three twitter clients


University league tables combined data

Last year I collected the University League tables published from various sources and combined them into one spreadsheet.

I’ve been updating this for this year, i.e. tables published in early/mid 2009 aimed at those starting in 2010.

You can see the UK Combined University league table data Google spreadsheet here and re-order as you please. I’ve updated the three ‘UK only’ lists, and will update the international lists in the near future.

Notes:

  • This is a bit of fun, for my own interest. Don’t take it too seriously.
  • There are plenty of good guides to UK Higher Education including The Guardian, The Times and plenty more. Use them, not this site, if you are thinking of studying in the UK!
  • A quick glance will reveal that I have never studied statistics. In particular my made up scoring system is laughable.

You’ll find last year’s data in a separate sheet, accessible via a tab at the bottom of the document. Are you able to produce anything interesting with this data?

PubSubHubbub instant RSS and Atom

I have just come across PubSubHubbub via Joss Winn’s Learning Lab blog at the University of Lincoln.

It’s a way for RSS/Atom feed consumers (feed readers etc) to be instantly updated and notified when a feed is updated.

In a nutshell, the RSS publisher notifies a specific hub when it has a new item. The hub then notifies – instantly – any subscribers who have requested the hub to contact them when there is an update.

This is all automatic and requires no special setup by users. Once the feed producer has set up PubSubHubbub and specified a particular hub, the RSS feed gains an extra entry in the feed itself telling subscribing client systems that they can use that hub for this feed. Clients which do not understand this line will just ignore it and carry on as normal. Those that are compatible with PubSubHubbub can then contact the hub and ask to be notified when there are updates.
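
For the curious, the ‘notify the hub’ step is nothing exotic: it is just an HTTP POST from the publisher to the hub, which is presumably what the WordPress plugin mentioned below does when you hit publish. A minimal Perl sketch (assuming the public Google-hosted hub and using this blog’s feed as the example):

#!/usr/bin/perl
# Minimal sketch: tell a PubSubHubbub hub that a feed has just been updated.
# The hub then fetches the feed and pushes the new entries to subscribers.
use strict;
use warnings;
use LWP::UserAgent;

my $hub  = 'http://pubsubhubbub.appspot.com/';     # the hub the feed declares
my $feed = 'http://www.nostuff.org/words/feed/';   # the feed with new content

my $response = LWP::UserAgent->new->post($hub, {
    'hub.mode' => 'publish',
    'hub.url'  => $feed,
});

# A 204 'No Content' response means the hub accepted the ping.
print $response->code, "\n";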

It has been developed by Google, and they’ve implemented it in various Google services such as Google Reader and Blogger. This should help give it momentum (which is crucial for these sorts of things). In a video on Joss’ post (linked to above) the developers demonstrate posting an article and showing Google Reader instantly update the article count for that feed (in fact, before the blog software has even finished loading the page after the user has hit ‘publish’). It reminds me of the speed of Friendfeed: I will often see my friendfeed stream webpage update with my latest tweet before I see it sent from twirl.

I’ve installed a PubSubHubbub WordPress plugin for this blog. Let’s hope it takes off.

UPDATE: I’ve just looked at the source of my feed ( http://www.nostuff.org/words/feed/ ) and saw the following line:

<atom:link rel="hub" href="http://pubsubhubbub.appspot.com"/>

Amazon AWS EC2 and vufind

Today I saw a tweet from juliancheal mentioning that he was setting up his virtual server on slicehost. I hadn’t heard of this company but their offering looks interesting. This got me thinking about cloud hosting, and I decided it was time to actually try out Amazon’s AWS EC2. This allows you to run a virtual server (or multiple servers) in the Amazon cloud; servers can be created and destroyed at the click of a button.

First thing is to get a server ‘image’ to run in the cloud. Thankfully many have already been created. I went for a recent ubuntu server image by Eric Hammond. This is basically a vanilla ubuntu server install, but with a few tweaks to work nicely as an EC2 virtual server. Perfect!

Signing up is quick and easy; it just uses your Amazon (the shop) credentials. Once created, you are taken back to the main control panel where you can see your new instance, including details like the all-important public DNS name. Just save a private key to your computer and use it to ssh in to your new server.

e.g.: ssh -i key1.pem root@ec2-174-129-145-xx.compute.amazonaws.com

(you may need to chmod 400 the key file, but all this is documented)

Once in, well it’s a new server, what do you want to do with it?

I installed a LAMP stack (very easy in ubuntu: apt-get update and then tasksel install lamp-server). I initially couldn’t connect to apache (though I could from the server itself using ‘telnet localhost 80’). I presumed it was a ubuntu firewall issue, but it turned out you also control these things from the AWS control panel. The solution was to go to ‘security groups’ and modify the group I had created when setting things up, adding HTTP to ‘Allowed Connections’. This couldn’t have been easier. And then success: I could point my browser at the DNS name of the host and see my test index page from the web server.

Amazon aws control panel, modify to allow http connections
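
(As an aside, the same change can apparently be made from code rather than the control panel. The sketch below uses the Net::Amazon::EC2 module from CPAN; I haven’t tested it and am assuming its method and parameter names, so treat it purely as an illustration.)

#!/usr/bin/perl
# Untested sketch: open port 80 on an EC2 security group programmatically,
# rather than via the AWS control panel. Method and parameter names are
# assumed from the Net::Amazon::EC2 documentation.
use strict;
use warnings;
use Net::Amazon::EC2;

my $ec2 = Net::Amazon::EC2->new(
    AWSAccessKeyId  => 'YOUR-ACCESS-KEY',
    SecretAccessKey => 'YOUR-SECRET-KEY',
);

# Allow HTTP from anywhere in to the group the instance was launched with.
$ec2->authorize_security_group_ingress(
    GroupName  => 'default',       # or whichever security group you created
    IpProtocol => 'tcp',
    FromPort   => 80,
    ToPort     => 80,
    CidrIp     => '0.0.0.0/0',
);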

So now what? I pondered this out loud via Twitter, and got this reply:

vufind-twitter

Excellent idea!

Good news: vufind has some good – and simple – documentation for installing on ubuntu:

http://vufind.org/wiki/installation_ubuntu

Following the instructions (and editing them as I went – they specified an earlier release and lacked a couple of steps if you weren’t also reading the more general install instructions) I quickly had a vufind installation up and running. It took around 20-25 minutes in all.

Now to add some catalogue data to the installation. I grabbed a MARC file with some journal records from one of our servers at work and copied it across as a test (just by running an scp command while logged in to my ec2 server). After running the import script I had the following:

vufind results

If the server is still running when you read this then you can access it here:

http://ec2-174-129-145-75.compute-1.amazonaws.com/vufind/

EC2 is charged by the hour, and while cheap, I can’t afford to leave it running forever. :)

So, a successful evening. Mainly due to the ease of both Amazon EC2 and Vufind.

A final note that if you are interested in EC2 you may want to look at some notes made by Joss Winn as part of the jiscpress project: http://code.google.com/p/jiscpress/wiki/AmazonWebServices

Both EC2 and vufind are worth further investigation.

JISC, Monitter and DIUS (Department of Innovation, Universities and Skills)

Earlier this week the Jisc 2009 Conference went ahead – a one-day summary of where things are going in Jisc-land.

Like last year, I got a good feel for the day via twitter. I used a web app called monitter.com for real time updates from anyone on twitter who used the tag #jisc09. monitter.com allows you to track a number of words/searches (3 columns by default); this works well as these can be usernames, tags or just a phrase. I used ‘jisc09’, ‘brighton OR sussex’ and ‘library’.

The keynote talks were also streamed live on the web, and the quality was excellent. Check out the main Jisc blog for the event.

Linking to all the different sites, searches and resources on the web after the event wouldn’t do it justice. The usefulness was in the way these were all being published during the day itself, using things like twitter (and bespoke sites) as a discovery mechanism for all these different things being added around the web. I didn’t know who most of the people were, but I was finding their contributions. That’s good.

An email came out the next day about the conference and announcing a guest blog post by David Lammy, the Minister for Higher Education, on the Jisc Blog.

He finished by asking for the conversation to continue, specifically on http://www.yoosk.com/dius which is described as ‘a place to open up lines of communication between Ministers and the HE Community’. Yoosk.com is set up to allow users to ask famous people questions. Its homepage suggests that it is designed for any kind of ‘famous person’, though it seems to be dominated by UK politicians. It looks interesting, but I can’t help wondering if there are other sites which could facilitate a ‘discussion’ just as well or better.

The DIUS section of the site seems quite new. In fact my (rather quickly composed) question was the second to be added to the site. I think the idea of practitioners (yuck, did I just use that word?) raising issues directly with Ministers is an interesting one, and I hope it takes off – and, at the very least, that he/they answer the questions!

DIUS do seem to be making an effort to use web2.0 tools. I recently came across this sandbox idea of collecting sites from delicious based on tags – in this example, the library2.0 tag. Interesting stuff, but not specific to HE: it will work for any tag and really just creates a nice view of the latest items bookmarked with the tag in question. The code for it is here.

In any case, it is good to see a government department trying out such tools and also releasing the code under the GPL (even 10 Downing street’s flickr stream is under crown copyright, and don’t get me started on OS maps and Royal Mail postcodes). I’m reminded of the Direct.gov team who, when they found out there was a ‘hack the government‘ day to mashup and improve government web services, decided to join in.

DIUS homepage with web2.0 tools

On the DIUS homepage, just below the fold, they have a smart looking selection of tools. It is nice to see this stuff here, and so prominent, though the Netvibes link took me to just a holding page when I tried it.

Finally, they have set up a blog on the jiscinvolve (WordPress MU) site. At the time of writing it has a few blog posts which are one-line questions, and a couple of (good) responses. But I can’t help feeling that these sites need something more if they are to work. At the moment they are just there, floating in space. How can they integrate these more into the places that HE staff and students inhabit? Perhaps adding personal touches to the sites would encourage people to take part; for example the blog – a set of questions – is a little dry, it needs an introduction, a host, and photos.

To sum up: there is some good stuff going on here, but we will need to see if it takes off. It must be difficult for a government department to interact with HE and students – the two are very different – but they are trying. I hope it proves useful; if you’re involved in HE why not take a look and leave a comment?


short urls, perl and base64

One of my many many many faults is coming up with (in my blinkered eyes – good) ideas, thinking about them non-stop for 24 hours, developing every little detail and aspect. Then spending a few hours doing some of the first things required. Then getting bored and moving on to something else. Repeat ad nauseam.

Today’s brilliant plan (to take over the world)

Over the weekend it was ‘tinyurl.com’ services and specifically creating my own one.

I had been using is.gd almost non-stop all week; various things at work had meant sending out URLs to other people, both formally and on services like twitter. Due to laziness it was nearly always easier to just make another short URL for the real URL in question than to find the one I made earlier. It seemed a waste: one more short code used up when it was not really needed. The more slap-dash we are in needlessly creating short URLs, the quicker they become not-so-short URLs.

Creating my own one seemed like a fairly easy thing to do. Short domain name, bit of php or perl and a mysql database, create a bookmarklet button etc.

Developing the idea

But why would anyone use mine and not someone else’s?

My mind went along the route of doing more with the data collected (compared to tinyurl.com and is.gd). I noticed that when a popular news item / website / viral link comes out, many people will be creating the same short URL (especially on twitter).

What if the service said how many – and who – had already shortened that URL? What if it made the list of all shortened URLs public (like the twitter homepage)? Think of the stats and information that could be produced from data about the urls being shortened, the number of click-throughs, etc, maybe even tags. Almost by accident I’m creating a social bookmarking / social networking site.

This would require the user to log in (whereas most services do not), which is not so good, but it would give it a slightly different edge to the others and help fight spam – and it is not so much of a problem if users only have to log in once.

I like getting all wrapped up in an idea as it allows me to bump into things I would not otherwise come across. Like? Like…

  • This article runs through some of the current short URL services
  • The last one it mentions is snurl.com. I had come across the name on Twitter, but had no idea it offers so much more, with click-thru stats and a record of the links you have shortened. It also has the domain name sn.im (.im being the Isle of Man). Looks excellent (but they stole some of my ideas!)

    snurl.com
  • Even though domains like is.gd clearly exist, it seems – from the domain registrars I tried – that you cannot buy two-digit .gd domains, though three-letter ones seem to start from $25 a year.
  • The .im domain looked like it could be good. But what to call any potential service??? Hang on… what about tr.im! What a brilliant idea. Fits. Genius. Someone had, again, stolen my idea. Besides, when I saw it could be several hundred pounds, other top level domains started to look more attractive.
  • tr.im, mentioned above, is a little like snurl.com. Looks good, though mainly designed to work with twitter. Includes lots of stats. Both have a nice UI. Damn these people who steal my ideas and implement them far better than I ever could. :)
  • Meanwhile…. Shortly is an app you can download yourself to run your own short url service.
  • Oh, and in terms of user authentication, php user class seemed worth playing with.
  • Writing the code seemed fairly easy, but how would I handle creating those short codes (the random digits after the domain name)? They seem to increment while staying as short as possible.
  • Meanwhile I remembered an old friend and colleague from Canterbury had written something like this years ago, and look! He had put the source code up as well.
  • This was good simple perl, but I discovered that it just used hexadecimal numbers as the short codes, which themselves are just the hex version of the DB auto-increment id. Nice and simple, but it would mean the codes become longer more quickly than with other algorithms.
  • I downloaded the script above and quickly got it working.
  • I asked on twitter and got lots of help from bencc (who wrote the script above) and lescarr.
  • Basically the path to go down was base64 (i.e. 64 digits in a number system, instead of the usual 10), which was explained to me with the help of an awk script in a tweet. I got confused for a while as the only obvious base64 perl lib actually converts text/binary for MIME email, and created longer, not shorter, codes than the original (decimal) id numbers created by the database.
  • I did find a cpan perl module to convert decimal numbers to base64 called Math::BaseCnv, which I was able to get working with ease.
  • It didn’t take long to edit the script from Ben’s spod.cx site and add the Base64 code so that it produced short codes using lower case, upper case and numbers (there’s a rough sketch of the idea just after this list).
  • You can see it yourself – if I haven’t broken it again – at http://u.nostuff.org/
  • You can even add a bookmarklet button using this code
  • Finally, something I should have done years ago: I set up mod_rewrite to make the links look nice, e.g. http://u.nostuff.org/3
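
To give a feel for the approach, here is a stripped-down sketch of just the short-code step: not Ben’s actual script, just the Math::BaseCnv conversion bolted on to an imaginary auto-increment id (Math::BaseCnv uses its own digit set, so the codes won’t exactly match any particular service).

#!/usr/bin/perl
# Stripped-down sketch: turn a database auto-increment id into a short
# code by converting it from base 10 to base 64, and back again.
use strict;
use warnings;
use Math::BaseCnv;

my $id   = 125;                    # e.g. the auto-increment id from MySQL
my $code = cnv($id, 10, 64);       # decimal -> base 64 short code
print "http://u.nostuff.org/$code\n";

# ...and when someone follows the short link, recover the id to look up
# the original URL in the database:
my $lookup_id = cnv($code, 64, 10);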

So I haven’t built my (ahem, brilliant) idea. Of course the very things that would have made it different (openly showing which URLs have been bookmarked, by whom, how many click-throughs, and tags) were the very things that would make it time-consuming. And sites like snurl.com and tr.im have already done such a good job.

So while I’m not ruling out creating my own really simple service (and in fact u.nostuff.org already exists), and I learned about mod_rewrite, base64 on cpan, and a bunch of other stuff, the world is spared yet another short URL service for the time being.

Playing with OAI-PMH with Simple DC

Setting up ircount has got me quite interested in OAI-PMH, so I thought I would have a little play. I was particularly interested in seeing if there was a way to count the number of full text items in a repository, as ROAR does not generally provide this information.

Perl script

I decided to use the HTTP::OAI perl module by Tim Brody (who, not so coincidentally, is also responsible for ROAR, which ircount gets its data from).

A couple of hours later I have a very basic script which will roughly report on the number of records and the number of full text items within a repository; you just need to pass it a URL for the OAI-PMH interface.
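
The core loop is short. Here is a simplified sketch of the sort of thing the script does (the real one has more error handling plus the messy per-software checks described below; the HTTP::OAI calls here are from memory, so treat it as approximate rather than the exact code):

#!/usr/bin/perl
# Simplified sketch: walk every record in a repository over OAI-PMH
# (simple Dublin Core) and count those which look like they have full text.
use strict;
use warnings;
use HTTP::OAI;

my $base_url  = shift or die "usage: $0 <OAI-PMH base URL>\n";
my $harvester = HTTP::OAI::Harvester->new(baseURL => $base_url);

my ($records, $fulltext) = (0, 0);

my $list = $harvester->ListRecords(metadataPrefix => 'oai_dc');
while (my $record = $list->next) {
    next unless $record->metadata;              # skip deleted records etc
    $records++;
    my $dc = $record->metadata->dom->toString;  # the oai_dc XML as a string
    # Very crude test (see the per-software notes below): a PDF URL or an
    # application/pdf format line suggests a full text item is attached.
    $fulltext++ if $dc =~ m{\.pdf\b|application/pdf}i;
}

print "$records records, of which $fulltext appear to have full text\n";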

To show the outcome of my efforts, here is the verbose output of the script when pointed at the University of Sussex repository (Sussex Research Online).

Here is the output for a sample record (see here for the actual oai output for this record, you may want to ‘view source’ to see the XML):

oai:eprints.sussex.ac.uk:67 2006-09-19
Retreat of chalk cliffs in the eastern English Channel during the last century
relation: http://eprints.sussex.ac.uk/67/01/Dornbusch_coast_1124460539.pdf
MATCH http://eprints.sussex.ac.uk/67/01/Dornbusch_coast_1124460539.pdf
relation: http://www.journalofmaps.com/article_depository/europe/Dornbusch_coast_1124460539.pdf
dc.identifier: http://eprints.sussex.ac.uk/67/
full text found for id oai:eprints.sussex.ac.uk:67, current total of items with fulltext 6
id oai:eprints.sussex.ac.uk:67 is the 29 record we have seen

It first lists the identifier and date; the next line shows the title. It then shows a dc.relation field which contains a full text item on the eprints server. Because this looks like a full text item and is on the same server, the next line shows that the script has found a line that MATCHed the criteria, which means we add this item to the count of items with full text attached.

The next line is another dc.relation, again pointing to a fulltext URL for this item. However this time it is on a different server (i.e. the publisher’s), so this line is not treated as a fulltext item and does not show a MATCH (i.e. had the first relation line not existed, this record would not be considered one with a fulltext item).

Finally a dc.identifier is shown, then a summary generated by the script concluding that this item does have fulltext, is the sixth record seen with fulltext, and is the 29th record we have seen.

The script, as we will now see, has to use various ‘hacky’ methods to try and guess the number of fulltext items within a repository, as different systems populate simple Dublin Core in different ways.

Repositories and OAI-PMH/Simple Dublin Core.

It quickly became clear on experimenting with different repositories that different repository software populates Simple Dublin Core in different ways. Here are some examples:

Eprints2: As you can see above in the Sussex example, fulltext items are added as a dc.relation field, but so too are any publisher/official URLs, which we don’t want to count. The only way to differentiate between the two is to check the domain name within the dc.relation url and see if it matches that of the OAI interface we are working with. This is by no means solid: it is quite possible for a system to have more than one hostname, and what the user gives as the OAI URL may not match what the system gives as the URLs for fulltext items.

Eprints3: I’ll use the Warwick repository for this, see the HTML and OAI-PMH for the record used in this example.

<dc:format>application/pdf</dc:format>
<dc:identifier>http://wrap.warwick.ac.uk/46/1/WRAP_Slade_jel_paper_may07.pdf</dc:identifier>
<dc:relation>http://dx.doi.org/10.1257/jel.45.3.629</dc:relation>
<dc:identifier>Lafontaine, Francine and Slade, Margaret (2007) Vertical integration and firm boundaries: the evidence. Journal of Economic Literature, Vol.45 (No.3). pp. 631-687. ISSN 0022-0515</dc:identifier>
<dc:relation>http://wrap.warwick.ac.uk/46/</dc:relation>

Unlike Eprints2, the fulltext item is now in a dc.identifier field, while the official/publisher URL is still a dc.relation field, which makes it easier to count the former without the latter. EP3 also seems to provide a citation of the item, which is in a dc.identifier as well. (As an aside: EPrints 3.0.3-rc-1, as used by Birkbeck and Royal Holloway, seems to act differently, missing out any reference to the fulltext.)

Dspace: I’ll use Leicester’s repository, see the HTML and OAI-PMH for the record used. (I was going to use Bath’s but looks like they have just moved to Eprints!)

<dc:identifier>http://hdl.handle.net/2381/12</dc:identifier>
<dc:format>350229 bytes</dc:format>
<dc:format>application/pdf</dc:format>

This is very different to Eprints. dc.identifier is used for a link to the html page for this item (like eprints2, but unlike eprints3, which uses dc.relation for this). However it does not mention either the fulltext item or the official/publisher url at all (this record has both). The only clue that this record has a full text item is the dc.format (‘application/pdf’), and so my hacked-up little script looks out for this as well.
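
Pulling those differences together, the guesswork in my script boils down to something like the function below (a simplified sketch, not the script itself; it assumes the dc fields for one record have already been pulled out into arrays, and that we know the hostname of the OAI interface being harvested):

#!/usr/bin/perl
# Simplified version of the per-record guesswork described above: given
# the simple DC fields of one record, does it appear to have a locally
# held full text item?
use strict;
use warnings;

sub has_fulltext {
    my ($oai_host, $dc) = @_;    # $dc is a hashref of arrayrefs of dc values

    # Eprints2: full text turns up as a dc.relation URL on the same host as
    # the OAI interface (publisher URLs are also in dc.relation, hence the
    # hostname check, and hence why this is shaky).
    for my $rel (@{ $dc->{relation} || [] }) {
        return 1 if $rel =~ m{^https?://\Q$oai_host\E/.+\.\w+$}i;
    }

    # Eprints3: full text turns up as a dc.identifier URL on the same host.
    for my $id (@{ $dc->{identifier} || [] }) {
        return 1 if $id =~ m{^https?://\Q$oai_host\E/.+\.\w+$}i;
    }

    # Dspace: no full text URL at all, but a dc.format of application/pdf
    # is the give-away that a PDF is attached.
    for my $fmt (@{ $dc->{format} || [] }) {
        return 1 if $fmt =~ m{^application/}i;
    }

    return 0;
}

# Example: the Warwick (Eprints3) record shown above.
print has_fulltext('wrap.warwick.ac.uk', {
    identifier => ['http://wrap.warwick.ac.uk/46/1/WRAP_Slade_jel_paper_may07.pdf'],
    relation   => ['http://dx.doi.org/10.1257/jel.45.3.629', 'http://wrap.warwick.ac.uk/46/'],
    format     => ['application/pdf'],
}), "\n";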

I looked at a few other Dspace based repositories (Brunel HTML / OAI ; MIT HTML / OAI) and they seemed to produce the same sort of output, though not being familiar with Dspace I don’t know if this is because they were all the same version or if the OAI-PMH interface has stayed consistent between versions.

I haven’t even checked out Fedora, bepress Digital Commons or DigiTool yet (all this is actually quite time consuming).

Commentary

I’m reluctant to come up with any conclusions because I know the people who developed all this are so damn smart. When I read the articles and posts produced by those who were on the OAI-PMH working group, or were in some way involved, it is clear they have a vast understanding of standards, protocols, metadata, and more. Much of what I have read is clear and well written, and yet I still struggle to understand it due to my own mental shortcomings!

Yet what I have found above seems to suggest we still have a way to go in getting this right.

Imagine a service which will use data from repositories: ‘Geography papers archive’, ‘UK Working papers online’, ‘Open Academic Books search’ (all fictional web sites/services which could be created which harvest data from repositories, based on a subject/type subset).

Repositories are all about open access to the full text of research, and it seems to me that harvesters need to be able to presume that the fulltext item, and other key elements, will be in a particular field. And perhaps it isn’t too wild to suggest that one field should be used for one purpose. For example, both Dspace and Eprints provide a full citation of the item in the DC metadata, which an external system may find useful in some way; however it is in the dc.identifier field, and various other bits of information are also in the very same field, so anyone wishing to extract citations would need to run some sort of messy test to try and ascertain which identifier field, if any, contains the citation they wish to use.

To some extent things can be improved by getting repository developers, harvester developers and OAI/DC experts round a table to agree a common way of using the format. Hmm, but does that ring any bells? I’ve always thought that the existence of the Bath profile was probably a sign of underlying problems with Z39.50 (though I am almost totally ignorant on z39.50). And even this will only solve some problems: the issue of multiple ‘real world’ elements being put into the same field (both identifier and relation are used for multiple purposes), as mentioned above, remains.

I know nothing about metadata nor web protocols (left to me, we would all revert to tab delimited files!), so am reluctant to suggest or declare what should happen. But there must be a better fit for our needs than Simple DC, Qualified DC being a candidate (I think; again, I know nuffing). See this page highlighting some of the issues with simple dc.

I guess one problem is that it is easy to fall into the trap of presuming repository item = article/paper, when of course it could be almost anything; the former would be easy to narrowly define, but the latter – which is the reality – is much harder to give a clear schema for. Perhaps we need ‘profiles’ for the common different item types (articles/theses/images). I think this is the point where people will point out that (a) this has been discussed a thousand times already and (b) it has probably already been done! So I’ll shut up and move on (here’s one example of what has already been said).

Other notes:

  • I wish OAI-PMH had a machine readable way of telling clients if they can harvest items, reuse the data, or even access it at all (apologies if it does allow this already). The human text of an IR policy may forbid me sucking up the data and making it searchable elsewhere, but how will I know this?
  • Peter Millington of RSP/SHERPA recently floated the idea of an OAI-PMH verb/command to report the total number of items. His point is that it should be simple for OAI servers to report such a number with ease (probably a simple SQL COUNT(*)), but at the moment OAI-PMH clients – like mine – have to manually count each item, parsing thousands of lines of data, which can take minutes and creates processing requirements for both server and client, just to answer the simple question of how many items there are. I echo and support Peter’s idea of creating a count verb to resolve this.
  • It would be very handy if OAI-PMH servers could give an application name and version number as part of the response to the ‘Identify’ verb. This would be very useful when trying to work around the differences between applications and software versions.

Back to the script

Finally, I’m trying to judge how good the little script is: does it report an accurate number of full text items? If you run an IR and would be happy for me to run the script against your repository (I don’t think it creates a high load on the server), then please reply to this post, ideally with your OAI-PMH URL and how many full text items you think you have, though neither is essential. I’ll attach the results as a comment to this post.

Food for thought: I’m pondering the need to check the dc.type of an item, and only count items of certain types. For example, should we include images? One image of a piece of research sounds fine; 10,000 images suddenly distort the numbers. Should it include all items, or just those of certain types (article, thesis etc)?

Top UK Universities : Combined Rankings

There are various league tables out there for UK Universities. I’ve collected the results from a number of them: a league table based on league tables. This should hopefully help to remove any biases or weaknesses in particular methodologies. The results are further down this post.

I collected results for just 53 Universities, not the full 120-odd that exist in the UK. This was due to laziness, and to be honest I’m more interested in the higher end of the numbers. However I’m fairly sure no university I’ve excluded would come higher than those I’ve included. In fact it was originally going to be 50, but as I collected from the various sources I added a few more around the cut-off point.

For each ranking, I’ve recorded the position (e.g. 5th), and then converted it to a score. To create a score I simply subtracted the ranking position from 101, which ensures that the University ranked first will get 100 points. A good University (according to the rankings!) will have a low ranking and a high score; e.g. a University ranked 5th will get a score of 96 (101-5=96).

Let’s just be clear at this point, I’m not a statistician, this isn’t remotely scientific, or fair, or well thought out, or thought out at all in fact. Did you get that? Perhaps read it again to be safe. These numbers are crap, and any conclusions drawn on them are without foundation! I’m also no Higher Education expert.

Sources:

Comment on Sources:
I’m not going to go into detail about each source; you can follow the links, and if that seems like too much effort, then this Wikipedia page provides an overview of some.
I’ve provided two totals: one for the UK-only rankings, and another which also includes the international rankings.

The UK-only rankings – and it is my impression that the Guardian in particular does this – focus on Teaching. They are, after all, aimed at prospective students. Though there is a danger in focusing too much on teaching resources, as one University may have fantastic teachers, amazing classrooms and great support, but ultimately be seen as a bad University by employers and the public at large (and to be ‘highly respected’ normally requires a good research record, not to mention being very old). You see, that could be rubbish, I don’t really know; you’re taking this with a pinch of salt, right?

The ‘Performance Ranking of Scientific Papers’ from the ‘Higher Education Evaluation and Accreditation Council of Taiwan’ is perhaps the most controversial. A major ingredient is citation/impact factors from SCI and SSCI, so those stronger in the Humanities will suffer due to these disciplines being excluded. Interestingly those who focus on the Social Sciences also seem to suffer, notably the LSE and Warwick. As I added these numbers in last, it was very noticeable that some Universities moved several positions due to its inclusion.

The Result:

Click on one of the two following links:

Combined UK University Rankings (excel) (recommended)

Combined UK University Rankings (via Google docs) (as a spreadsheet)

You can order the list by any field. There are two totals: the first uses the three UK-only rankings, and the second, one of the middle columns, takes into account both UK and worldwide rankings.

The rest of the columns are either raw league table data (in black text) or scores (in red).

A score is 101 minus the ranking. The scores just make it easier to add up and order the totals by highest score, though working in this way does make things a little messy.

The worldwide rankings have an extra column, as they include the world ranking as well as the UK-only ranking (a University may be the 4th UK university in the list but the 28th University overall). You could potentially do something with the world ranking, e.g. if one comes 10th in the world results, the next comes 11th and the third comes 98th, then clearly it suggests that the first two are broadly similar while the third is not at the same level, though my method simply treats them as first, second, third, and does not take this into account.

Some Universities did not appear in all the world rankings. Simply giving them a zero score seemed a little harsh, so I hacked it a bit: if, say, the lowest score was 60, then any University without a score may get 40. I know just about everyone will be pulling their hair out at such random stupidity, but it seems to avoid those not appearing in certain tables being heavily penalised, especially as some Universities do seem to be randomly missing from certain worldwide tables.

As mentioned above, the Performance Ranking of Scientific Papers is perhaps the most controversial here, and I’m unsure whether it should be included at all (comments welcome). They do explain on their website the pros and cons of their method: the Humanities are more or less ignored, while the Social Sciences are treated like the Sciences; however, as they note, the datasets they use include far fewer Social Science journals, which means these subjects will score relatively lower than the sciences.

This seems true: the LSE amazingly does not appear at all (it normally appears in the top 5), and Warwick appears very low in the list, even though it has a medical school, something they say helps pull Universities up the list. In fact, before this data was added, the LSE was fourth overall; now I’ve added this data they are twelfth! I’ve created a column which shows totals ignoring this ranking.

Top 20

UK-only rankings

  1. Oxford
  2. Cambridge
  3. LSE
  4. Imperial
  5. St Andrews
  6. Warwick
  7. UCL
  8. York
  9. Durham
  10. Loughborough
  11. Bath
  12. Exeter
  13. Edinburgh
  14. Leicester
  15. Nottingham
  16. Kings college
  17. Lancaster
  18. Southampton
  19. Bristol
  20. SOAS
All Rankings

1= Oxford/Cambridge
3 Imperial
4 UCL
5 Edinburgh
6 Warwick
7 Kings college
8= St Andrews/Bristol
10 Nottingham
11 LSE
12 York
13 Manchester
14 Durham
15 Southampton
16= Leicester/Sheffield
18 Birmingham
19 Glasgow
20 Bath


My thoughts:

  • First, a week ago, I asked on this blog for people to provide their top 20 lists; you can see them here. My question was badly phrased, but the replies are interesting. It includes my results in the first comment (I wrote this without looking at any of these rankings first).
  • Looking at my guesses, I clearly have an aversion to Universities starting with L. Completely missed out Leicester, Loughborough, and Lancaster. The Scots also fared badly from my off-the-top-of-my-head list: St Andrews was nowhere to be seen, and yet is near the top of both lists. Aberdeen and Dundee are both close to the top 20, yet I would have probably failed to include them in a ‘top 30’. Oh, and I somehow forgot Durham.
  • I think I’ve always put UCL as the ‘one after oxbridge’, yet according to these results Imperial, LSE, St Andrews and Warwick are more or less level pegging with it.
  • I’ve also thought of the groupings a bit like the football league tables: Russell Group, then the 94 Group and then the rest, with universities joining/leaving these groups as they progress or stagnate. These results show this to be wrong. Looking at the UK-only top 20, 9 of them are 1994 Group (and so coming out better than many Russell Group Universities). In fact the LSE and Warwick were both in the 94 Group until recently, which would have led to the majority of Universities in the top 20 being in the 1994 Group! There are Universities in neither of these groups which are easily ahead of some of those in the Russell Group.
  • As you can see from my guesses, I put Manchester, Birmingham and Southampton higher than their actual results, so why were MY expectations high for these organisations? The first two are grand old Universities, and Southampton is perhaps accounted for because the one department I know something about – Electronics and Computer Science – is very highly regarded.
  • If these results really do reflect the Research (and teaching) ability of Universities, and if the Russell Group is, as it is often portrayed, the leading research Universities, and the 1994 Group the smaller research Universities, then there is an argument that there should be some movement in group membership (I shall leave it to the reader to look at the excel file and decide who should move up and down!).
  • Having said this, the Russell Group website reports that the group accounts for 68% of all research income, so not doing that badly.
  • Oxford and Cambridge were equal in the international results, Oxford just one point ahead in the UK-only results. So no conclusions there.
  • The Times notes in its own assessment how there is almost a clear split between pre and post 1992 Universities, the list starts with the ‘old’ Universities, and then the ‘new’ universities, with only a couple of exceptions.

And Finally…

I have tried to provide some comment, but this is just my personal view based on near total ignorance. By all means laugh, but don’t get upset.

Link to the results excel file again Combined UK University Rankings.

(this post was slightly updated in November 2008 to improve readability)

Radio Pop

Radio Pop is an interesting experimental site from the fantastic BBC radio labs.

It is a sort of social network site for radio listening. It only records your listening through the ‘radio pop’ live streams. I (like many) mainly listen via Listen Again and the radio iplayer, and they are working on integrating with both. You can see my profile here.

Screenshot of radio pop - click for a larger version

You can ‘pop’ what you are currently listening to (basically an ‘I like this’ button). I’ve added my ‘pop’ RSS feed to my dipity timeline.