Library search/discovery apps : intro

There’s a lot of talk in the Library world about ‘next generation catalogues’, library search tools and ‘discovery’. There’s good reason for this talk: in this domain the world has been turned on its head.

History in a nutshell:

  • The card catalogue became the online catalogue, which let users search for physical items within the Library.
  • Journals became online journals. Libraries needed to let users find the online journals they subscribed to through various large and small publishers and hosts. They built simple in-house databases (containing journal titles and links to their homepages), or added them to the catalogue, or used a third party web based tool. As the number of e-journals grew, most ended up using the last option, a third party tool (which could offer other services, such as link resolving, and do much of the heavy lifting with managing a knowledge base).
  • But users wanted one place to search. Quite understandable. If you are after a journal, why should you look in one place to see if there is a physical copy, and in another to see if there is online access? The same goes for books/ebooks.
  • So libraries started to try and find ways to add online content to catalogue systems in bulk (which weren’t really designed for this).

[Screenshot: Aquabrowser : Uni Sussex beta catalogue]

The online catalogues (OPACs) were simple web interfaces supplied with the much larger Library management system (ILS or LMS) which ran the backend the public never saw. These were nearly always slow, ugly, unloved and not very useful.

A couple of years ago(ish), we saw the birth of the next generation catalogue, or search and discovery tools. I could list them, but the Disruptive Technology Library Jester does an excellent job here. I strongly suggest you take a look.

Personally, I think the first one I heard about was Aquabrowser. At the time it was a new OPAC which was miles ahead of those supplied with Library systems, and was (I think) unique as a web catalogue interface not associated with a particular system and, shock, not from an established Library company. The second system I heard about was probably Primo from Ex Libris. At first I didn’t understand what it was: it sounded like Metalib (another product from the same company, which cross-searches various e-resources), so was Primo replacing it? Or replacing the OPAC? It took a while to appreciate that this was something that sat on top of the rest. Since then: VuFind, LibraryFind and more.

While some were traditional commercial products (Primo, Encore, Aquabrowser), many more were open source solutions, a number of them developed at American libraries. They are often built on common (and modern) technology stacks such as Apache Solr/Lucene, Drupal, PHP/Java, MySQL/Postgres etc.

[Screenshot: Primo : British Library]

In the last year or so a number of major Libraries have started to use one of these ‘Discovery Systems’: for example, the BL and Oxford are using Primo, the National Libraries of Scotland & Wales and Harvard have purchased Aquabrowser, and the LSE is trying VuFind. At Sussex (where I work) we have purchased and implemented Aquabrowser. We’ve added data enrichments such as tables of contents (searchable and visible on records), book covers and the ability to tag and review items (tagging/reviewing has since been removed for various reasons).

It would be a mistake to put all of these into one basket. Some focus on being an OPAC replacement, others on being a unified search tool, searching both local and online items. Some focus on social tools, tagging & reviewing. Some work out of the box, others are just a set of components which a Library can sew together, and some are ‘SaaS’.

It’s an area that is fast changing. Just recently an established Library web app Company announced a forthcoming product called ‘Summon’, which takes searching a library’s online content a step further.

So what should libraries go for? It’s not just the risk of backing the wrong horse, but of backing the wrong horse when everyone else has moved on to dog racing!

And within all this it is important to remember what users actually want. From the conversations and articles I’ve read, they want a Google search box, but one which returns results from trusted sources and academic content; whether they are looking for a specific book, a specific journal, a reference/citation, or one/many keywords. And not just one which searches the metadata, but one which brings back results based on the full text of items as well. There are some who worry that too many results are confusing. As Google proves, an intelligent ranking system makes the number of results irrelevant.

Setting up (and even reviewing) most of these systems takes time, and if users start to add data (tags, reviews) to one system, then changing could cause problems (so should we be using third party tag/rating/review systems?).

You may be interested in some other articles I’ve written around this:

There’s a lot of talk about discovery tools, but what sort to go for, and who to back? Many issues have yet to be resolved. I’ll come on to those next…

“Sitting on a gold mine” – improving provision and services for learners by aggregating and using ‘learner behaviour data’

I’m at a workshop today called “Sitting on a gold mine” – improving provision and services for learners by aggregating and using ‘learner behaviour data’ (it rolls off the tongue!), which is part of the wider JISC TILE project looking at, in a nutshell, how we can use data collected from users and user activity to provide useful services, and the issues and challenges involved (and some Library 2.0 concepts as well). As ever, these are just my notes; at some points I took more notes than others, there will be mistakes and I will badly misquote the speakers, so please keep this in mind.

There’s quite a bit of ‘workshop’ discussion coming up, which I’m a little tentative about, as I can rant on about many things for hours but I’m not sure I have a lot of views on this other than ‘this is good stuff’!

Pain Points & Vision – David Kay (TILE)

David gave an overview of the TILE project. Really interesting stuff, lots covered and good use of slides, but quite difficult to get everything down here.

TILE has three objectives:

  • Capture scope/scale of Library 2.0
  • Identify significant challenges facing library system developments
  • Propose high level ‘library domain model’ positioning these challenges in the context of library ‘business processes’

You can get context from click streams; this is done by the likes of Amazon and e-music providers.

E.g. First year students searching for Napoleon also borrowed… they downloaded… they rated this resource… etc.

David referred to an idea of Lorcan Dempsey’s: we get too bogged down by the mechanics of journals and provision without looking at the wider business processes in the new ‘web’ environment.

There are four ‘systems’ in the TILE architecture: Library systems (LMS, cross search, ERM), the VLE, Repositories, and associated content services. We looked at a model of how these systems interact, with the user in the middle.

Mark Tool (University of Stirling)

Mark (who used to be based down the road at the University of Brighton) talked about the different systems Stirling (and the other Universities he has worked at) use, and how we don’t really know how users use them. Not just now, but also historical trends: e.g. are users using e-books more now than in the past?

These questions are important to lecturers as they point students to resources and systems, but what do students actually use, and how do they use them? There is also a quality issue: are we pointing them to the right resources? And are we getting good value for money, e.g. licence and staff costs for a VLE?

If we were to look at how different students use different resources, would we see that ‘high achievers’ use different resources to weaker students? Could/should we point the weaker students to the resources that the former use? There are obvious privacy implications.

This could also be of use when looking at new courses and programmes and how to resource them. Nationally, it might help guide us to which resources should be negotiated for at a national level.

Danger:

  • small crowd -> small dataset  -> can be misleading (one or two people can look like a trend)
  • HEIs are very different from each other.

Mark thinks we should run some smallish pilots and then validate the data collected by some other means.

Joy Palmer – MIMAS

Will mainly be talking about COPAC, which has done some really interesting stuff recently in opening up their data and APIs (see the COPAC blog).

What COPAC are working on:

  • Googlisation of records (will be available on Google soon)
  • Links to Digital content
  • Service coherency with zetoc and suncat
  • Personalisation tools / APIs
    • ‘My Bibliography’
    • Tagging facilities
    • Recommender functions
    • ummm other stuff I didn’t have time to note
  • Generally moving from a ‘Walled garden’ to something that can be mashed up [good!]

One example of a service from COPAC is ‘My Bibliography’ (or ‘marked list’), which can be exported in the ATOM format (allowing it to be used anywhere that takes an ATOM feed). These lists will be private by default but could be made public.
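Since ATOM is a standard format, anything that can parse a feed could reuse such a list. As a minimal sketch, consuming one with PHP’s SimpleXML might look like this (the feed URL is hypothetical; the real export details weren’t covered):

```php
<?php
// Minimal sketch: consuming a public 'My Bibliography' ATOM feed with
// PHP's SimpleXML. The URL is hypothetical (the real export wasn't
// public at the time of writing).
$url = 'http://copac.ac.uk/mybibliography/example.atom'; // hypothetical
$feed = simplexml_load_file($url);
if ($feed === false) {
    die("Could not load the feed\n");
}
// Each ATOM <entry> is one saved record in the list.
foreach ($feed->entry as $entry) {
    $title = (string) $entry->title;
    $link  = (string) $entry->link['href'];
    echo "$title - $link\n";
}
?>
```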

Joy talked about the general direction and ethos of COPAC development with lots of good examples, and the issues involved. One of the slides was titled ‘From “service” to “gravitational hub”’, which I liked. She then moved on to her (and MIMAS/COPAC’s) perspective on the issue of using user generated data.

Workshop 1.

[Random notes from the group I was in, mainly the stuff that I agreed with(!); there were three groups] We talked about whether we should do this, and the threats (and which groups of people are affected by those threats). Good discussion. We talked about how these things could be useful, and why some may be averse to, or cautious of, it (including privacy, and encroaching on others’ areas: IT/the library telling academics that what they recommend to students is not being used, i.e. telling them they are doing it wrong, creates friction). Should we do this? It is a blunt tool, and we may see the wrong trends. But we need to give it a go and see what happens. Is it ‘anti-HE’ to be offering such services (i.e. recommending books)? No no no! Should we leave it to the likes of Google/Amazon? No, this is where the web is going. But there is real world experience of things to be aware of: e.g. a catalogue ranking one edition of a book highly due to high usage led to a newer edition being further down the list. [lots more discussion, I forget]

Dave Pattern – Huddersfield.

[Dave is the systems librarian at Huddersfield, who has ideas better than mine, then implements them better than I ever could, in a fraction of the time. He’s also a great speaker. I hate him. Check out his annoyingly fantastic blog]

Lots of data is generated just doing what we and users need to do, and we can mine this. Dave started off talking about supermarket loyalty cards: supermarkets were doing ‘people who bought this also bought’ 10 or more years ago. We can learn from them; we could do this.

We’ve been collecting circulation data for years; why haven’t we done anything (bar really basic stuff) with it?

Borrowing suggestions (‘people who borrowed this also borrowed’): this is working at Huddersfield, and librarians report it working well and suggesting the same books as they would. The sketch below shows the basic idea.
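For illustration only (this is the general technique, not Huddersfield’s actual code), a co-borrowing suggestion can be little more than a self-join over a loans table; the schema and names here are hypothetical:

```php
<?php
// Sketch of 'people who borrowed this also borrowed' as a single
// self-join. The 'loans' table (borrower_id, item_id) is hypothetical.
$pdo = new PDO('mysql:host=localhost;dbname=circ', 'user', 'pass');
$itemId = 12345; // hypothetical item number

$sql = "SELECT l2.item_id, COUNT(DISTINCT l2.borrower_id) AS score
        FROM loans l1
        JOIN loans l2 ON l1.borrower_id = l2.borrower_id
        WHERE l1.item_id = :item_a     -- everyone who borrowed this item
          AND l2.item_id != :item_b    -- what else did they borrow?
        GROUP BY l2.item_id
        ORDER BY score DESC
        LIMIT 10";
$stmt = $pdo->prepare($sql);
$stmt->execute(array('item_a' => $itemId, 'item_b' => $itemId));

foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    echo $row['item_id'] . ' (' . $row['score'] . " co-borrowers)\n";
}
?>
```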

Personalised suggestions: if you log in, it looks at what you have borrowed, and then at what other items were borrowed by those who borrowed the same things.

Lending paths: paths which join books together, potentially to predict what people will borrow and when particular books will be in high demand.

The library catalogue shows some book usage stats when used from a library staff PC (brilliant idea!); these can be broken down by different criteria (e.g. the courses borrowers are on).

Other functionality: keyword suggestions, and common zero-results keywords (e.g. newspapermen, asbo, disneyfication). Huddersfield have found digging into this data useful.

He’s released an XML file of anonymised circulation data, with the approval of the library, for others to play with, and hopes other libraries will do the same. (This is a stupidly big announcement; it feels insulting to put it as just one sentence like this. Perhaps I should enclose it in the <blink> tag!?) See his blog post.

(note to self, don’t try to download 50mb file via 3g network usb stick – bad things happen to macbook)

Mark van Harmelen

Due to said bad things I was slightly distracted during part of this talk. Being a man, I completely failed to multi-task.

This was an excellent talk (at a good level) about how the TILE project is building prototype/real system(s), with some really good models of how this will/could work. So far they have developed harvesting of data from institutions (and COPAC/similar services), adding ‘group use’ to their database: a searcher known to be a ‘chemistry student’ and ‘third year’ can then get relevant recommendations based on data from the groups they belong to. [I’m not doing this justice, but there were some really good models and examples of this working; see the sketch below]
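A very rough sketch of how I understood the group idea, with an entirely hypothetical schema (this is not TILE’s actual design): recommendations are simply scoped to borrowers who share the searcher’s groups.

```php
<?php
// Sketch of group-scoped recommendations: count loans only among
// borrowers who share the searcher's groups. Schema and names are
// hypothetical, not TILE's actual design.
$pdo = new PDO('mysql:host=localhost;dbname=tile', 'user', 'pass');

$sql = "SELECT l.item_id, COUNT(*) AS score
        FROM loans l
        JOIN borrowers b ON b.id = l.borrower_id
        WHERE b.department = :dept     -- e.g. chemistry
          AND b.year_of_study = :year  -- e.g. third year
        GROUP BY l.item_id
        ORDER BY score DESC
        LIMIT 10";
$stmt = $pdo->prepare($sql);
$stmt->execute(array('dept' => 'chemistry', 'year' => 3));

foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    echo $row['item_id'] . ': borrowed ' . $row['score'] . " times by this group\n";
}
?>
```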

David Jennings – Music Recommender systems

First off he referred to the Googlezon film (I’d never heard of this before) and the idea of big brother in the private sector, then moved on to talk about the (concept of) iPods which predict the music you want to hear next based on your mood, and even matchmaking based on how you react to music.

Discovery: we search, we browse, we wait for things to come along, we follow others, we avoid things everyone else listens to, etc.

He talked about Flickr’s (unpublished) popularity ranking as a way to bring things to the front based on views, comments, tags etc.

Workshop 2:

Some random comments and notes from the second discussion session (from all groups)

One University’s experience was that just ‘putting it out there’ didn’t work: no one added tags to the catalogue; the conclusion was the need for community.

Cold-start problem: new content doesn’t surface with the sort of approaches being discussed here.

Is a Subject Librarian’s (or researcher’s) recommendation of the same value as an undergrad’s?

Will Library Directors agree to library data being released in the same way as Huddersfield’s, even though it is anonymised? They may fear the risks and issues that could result, even if we/they are not sure what those risks are (will an academic take issue with a certain aspect of the released data?).

At a national level, if academics used these services to create reading lists, it may result in a homogenisation of teaching across the UK. There is also a risk of students’ reading focusing on a small group of items/books; we could end up with four books per subject!

Summary

This was an excellent event, and clearly some good and exciting work is taking place. What are my personal thoughts?…

This is one of those things that, once you get discussing it, you’re never quite sure why it hasn’t already been done, especially with circulation data. There’s a wide scope, from local library services (book recommendations) to national systems which use data from VLEs, registry systems and library systems. There is a lot of potential functionality, both in terms of direct user services and in informing HE (and others) to help them make decisions and tailor services for users.

Challenges include: privacy, copyright, resourcing (money) and the uncertainty of (and aversion to) change. The last one includes a multitude of issues: will making data available to others lead to a budget reduction for a particular department, will it create friction between different groups (e.g. between academics and central services such as Libraries and IT)?

Perhaps the biggest fear is not knowing what demons this will release. If you are a Library Director, and you authorise your organisation’s data to be made available – or the introduction of a service such as the ones discussed today – how will it come back to haunt you in the future? Will it lead to your institution making (negative) headlines? Will a system/service supplier sue you for giving away ‘their’ data?  Will academics turn on you in Senate for releasing data that puts them in a bad light? ‘Data’ always has more complex issues than ‘services’.

In HE (and I say this after talking to various people at different institutions over the last few years) we are sometimes too fearful of the 20% instead of thinking about the 80% (or is that more like 5/95%?). We will always get complaints about new services and especially about changes. No one contacts you when you are doing well (how many people contact Tesco to tell them they have allocated the perfect amount of shelf space to bacon?!). We must not let complaints dictate how we do things or how we allocate time (though of course we should not ignore them; relevant points can often be found).

Large organisations – both public and private – can be well known for being inflexible. But for initiatives like this (and those in the future) to have a better chance of succeeding we need to look at how we can bring down the barriers to change. This is too big an issue to get into here, and the reasons are both big and many: from too many stakeholders requiring approval to a ‘wait until the summer vacation’ philosophy, from long-term budget planning to knock-on effects across the organisation (a change in department A means the training/documentation/website of department B needs to be changed first). Hmmmm, I seem to have moved away from TILE and on to a general rant offending the entire UK HE sector!

Thinking about Dave Pattern’s announcement, what will it take for other libraries to follow? First, the techy stuff: he has (I think) created his own XML schema (is that the right term?) and will be working on an API to access the data. The bad thing would be for a committee to take this and spend years to finally ‘approve’ it. The good thing would be for a few metadata/XML type people to suggest minor changes (if any) and endorse it as quickly as possible (which is no disrespect to Dave). Example: will the use of UCAS codes be a barrier to international adoption? (I can’t see why; just thinking out loud.) There was concern at the event that some Library Directors would be cautious in approving such things. This is perhaps understandable. However, I have to say I don’t even know who the Director of Huddersfield Information Services is, but my respect for the institution and the person in that role goes about as high as it will go when they do things like this. They have taken a risk, taken the initiative and been the first (to the best of my knowledge) to do something like this worldwide. I will buy them a beer should I ever meet them!

I’ll be watching any developments (and chatter) that result from this announcement, and thinking about how we can support/implement such an initiative here. In theory, once (programming) scripts have been written for a library system, it should be fairly trivial to port them to other customers of the same software (the work will probably include mapping departments to UCAS codes, and the way user affiliation to departments is stored may vary between Universities). Perhaps Universities could club together to work on creating the code required? I’m writing this a few hours after Dave made his announcement and already his blog article has many trackbacks and comments.

So in final, final conclusion. A good day, with good speakers and a good group of attendees from mixed backgrounds. Will watch developments with interest.

[First blog post using WordPress 2.7. Other blogs covering the event are Phil’s CETIS blog, and Dave Pattern has another blog entry on his talk. If you have written anything on this event then please let me know!]

Mashed Libraries

Exactly a week ago I was coming home from Mashed Libraries in London (Birkbeck).

I won’t bore you with details of the day (or more to the point, I’m lazy and others have already done it better than I could (of course, I should have made each one of those words a link to a different blog, but I’m laz… never mind)).

Thanks to Owen Stephens for organising, UKOLN for sponsoring and Dave Flanders (and Birkbeck) for the room.

During the afternoon we all got to hacking with various sites and services.

I had previously played around with the Talis Platform (see long-winded commentary here; it seems weird that at the time I really didn’t have a clue what I was playing with, and it was only a year ago!).

I built a basic catalogue search based on the ukbib store. I called it Stalisfield (which is a small village in Kent).

But one area I had never got working was the holdings. So I decided to set to work on that. Progress was slow, but then Rob Styles sat down next to me and things started to move. Rob helped create Talis Cenote (which I nicked most of the code from) and generally falls into that (somewhat large) group of ‘people much smarter than me’.

We (well, I) wanted to show which libraries had the book in question, and plot them on a Google Map. So once we had a list of libraries we needed to connect to another service to get the location of each of them. The service which fitted this need was the Talis Directory (Silkworm). This raised a point with me: it was a good job there was a Talis service which used the same underlying ID codes for the libraries, i.e. the holdings service and the directory both used the same ID number. It could have been a problem if we had needed to get the geo/location data from something like OCLC or Librarytechnology.org: what would we have searched on? A library’s name? Hardly a reliable term to use (e.g. the University of Sussex Library is called ‘UNIV OF SUSSEX LIBR’ in OCLC!). Do libraries need a code which can be used to cross-reference them between different web services (a little like ISBNs for books)?

Using the Talis Silkworm Directory was a little more challenging than first thought, and the end result was a very long URL which used SPARQL (something which looks like a steep learning curve to me!).
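For a flavour of what that long URL involves, here is a hedged sketch: a SPARQL query URL-encoded into an HTTP GET. The endpoint path and predicate names below are illustrative rather than the exact Silkworm vocabulary.

```php
<?php
// Sketch of the 'very long URL': a SPARQL query URL-encoded into a GET
// request against the directory's SPARQL endpoint. Endpoint path and
// predicates are illustrative, not the actual Silkworm vocabulary.
$libraryId = 'example-library-id'; // hypothetical ID shared with the holdings data

$sparql = "SELECT ?lat ?long WHERE {
             ?lib <http://example.org/directory/id> '$libraryId' .
             ?lib <http://www.w3.org/2003/01/geo/wgs84_pos#lat>  ?lat .
             ?lib <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?long .
           }";

$url = 'http://api.talis.com/stores/directory/services/sparql'
     . '?query=' . urlencode($sparql);

// The results come back as SPARQL XML, which then needs parsing
// for the lat/long values.
$xml = file_get_contents($url);
echo $xml;
?>
```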

In the meantime, I signed up for Google Maps and gave myself a crash course in setting it up (I’m quite slow to pick these things up). So we had the longitude and latitude co-ordinates for each library, and we had a Google Map on the page; we just needed to connect the two.

[Photo: four people at Mashed Libraries trying to debug the last little bit of my code.]

Time was running short, so I was glad to take a back seat and watch (and learn) while Rob went into speed-JavaScript mode. This last part proved to be elusive: the PHP code which was generating the JavaScript was just not quite working. In the end the (final) problem was related to the order in which I was outputting the code, but we were out of time, and this required more than five minutes.

Back home, I fixed this (though I never would have known I needed to do this without help).

You can see an example here, and here and here (click on the link at the top to go back to the bib record for the item, which, by the way, should show a Google Book cover at the bottom, though this only works for a few books).

You can click on a marker to see the name of the library, and the balloon also has a link which should take you straight to the item in question on that library’s catalogue.

It is a little slow, partly due to my bad code and partly due to what it is doing:

  1. Connecting to the Talis Platform to get a list of libraries which have the book in question (quick)
  2. For each library, connect to the Talis Silkworm Directory and perform a SPARQL query to get back some XML which includes the geo co-ordinates (geo details are not available for all libraries).
  3. Finally generate some javascript code to plot each library on to a Google map.
  4. As this last point needs to be done in the <head> of the page, it is only at this point that we can push the page out to the browser.
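Putting that together, a cut-down sketch of steps 3 and 4 might look like this (using the 2008-era Google Maps v2 API; the library data is hard-coded here and the co-ordinates are approximate):

```php
<?php
// Cut-down sketch of steps 3-4: PHP emitting the Google Maps JavaScript
// (2008-era v2 API) into the <head>. $libraries would come from steps
// 1-2; it is hard-coded here for illustration.
$libraries = array(
    array('name' => 'University of Sussex Library', 'lat' => 50.865, 'lng' => -0.086),
    array('name' => 'Another Library',              'lat' => 53.645, 'lng' => -1.778),
);
?>
<html>
<head>
<script src="http://maps.google.com/maps?file=api&amp;v=2&amp;key=YOUR_KEY"
        type="text/javascript"></script>
<script type="text/javascript">
function load() {
    var map = new GMap2(document.getElementById("map"));
    map.setCenter(new GLatLng(54.0, -2.0), 5); // roughly the UK
<?php foreach ($libraries as $i => $lib): ?>
    var marker<?php echo $i; ?> = new GMarker(
        new GLatLng(<?php echo $lib['lat']; ?>, <?php echo $lib['lng']; ?>));
    // Clicking the marker opens a balloon with the library name.
    GEvent.addListener(marker<?php echo $i; ?>, "click", function() {
        marker<?php echo $i; ?>.openInfoWindowHtml(
            "<?php echo addslashes($lib['name']); ?>");
    });
    map.addOverlay(marker<?php echo $i; ?>);
<?php endforeach; ?>
}
</script>
</head>
<body onload="load()">
<div id="map" style="width: 500px; height: 400px"></div>
</body>
</html>
```

(Generating a distinct JavaScript variable per marker avoids the classic closure bug where every balloon shows the last library.)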

I added one last little feature.

It is all well and good to see which libraries have the item you are after, but you are probably interested in libraries near you. So I used the Maxmind GeoLite City code-library to get the user’s rough location, and then centred the map on this (which is clearly not good for those trying to use it from outside the UK!). This seems to work most of the time, but it depends on your ISP; some seem more friendly in their design towards this sort of thing. Does the map centre on your location?
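The lookup itself is only a few lines with MaxMind’s (legacy) GeoIP PHP API and the free GeoLiteCity.dat database; file paths here are illustrative:

```php
<?php
// Sketch of the IP-to-location lookup with MaxMind's legacy GeoIP PHP
// API and the free GeoLiteCity.dat database. File paths are illustrative.
include("geoipcity.inc");

$gi = geoip_open("GeoLiteCity.dat", GEOIP_STANDARD);
$record = geoip_record_by_addr($gi, $_SERVER['REMOTE_ADDR']);
geoip_close($gi);

if ($record) {
    // Centre the Google map on the visitor's rough location.
    $lat = $record->latitude;
    $lng = $record->longitude;
} else {
    // No match (e.g. an unhelpful ISP): fall back to centring on the UK.
    $lat = 54.0;
    $lng = -2.0;
}
echo "Centring map at $lat, $lng\n";
?>
```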