MARC Tools & MARC::Record errors

I know next to nothing about MARC, though being a shambrarian I have to fight it sometimes. My knowledge is somewhat binary: absolutely nothing for most fields/subfields/tags, but ‘fairly OK’ for the bits I’ve had to wrestle with.

[If you don’t know that MARC21 is an ageing bibliographic metadata standard, move on. This is not the blog post you’re looking for.]

Recent encounters with MARC

  • Importing MARC files into our Library System (Talis Capita Alto), mainly for our e-journals (so users can search our catalogue and find a link to a journal if we subscribe to it online). Many of the MARC records were of poor quality and often did not even state that the item was (a) a journal and (b) online. Additionally, Alto will only import a record if there is a 001 field, even though the first thing it does is move the 001 field to the 035 field and create its own. To handle these I used a very simple script – using MARC::Record – to run through the MARC file and add a 001/006/007 where required.
  • Setting up sabre – a web catalogue which searches the records of both the University of Sussex and the University of Brighton – where we need to pre-process the MARC records to add extra fields, in particular a field to tell the software (vufind) which organisation the record came from; a rough sketch of this kind of pre-processing follows this list.
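As a rough illustration only (not the actual script – the filenames, the made-up temporary 001 values and the 951 ‘institution’ tag are assumptions for the sake of the example), the pre-processing with MARC::Record looks something like this:

#!/usr/bin/perl
use strict;
use warnings;
use MARC::Batch;
use MARC::Field;

# Hypothetical filenames - the real script took these as arguments.
my $batch = MARC::Batch->new( 'USMARC', 'brighton.mrc' );
open my $out, '>', 'brighton-processed.mrc' or die $!;

my $count = 0;
while ( my $record = $batch->next() ) {
    $count++;

    # Alto refuses records without a 001, so invent one if it is missing.
    unless ( $record->field('001') ) {
        $record->insert_fields_ordered(
            MARC::Field->new( '001', sprintf( 'TEMP%06d', $count ) )
        );
    }

    # Tag the record with its source institution so vufind can tell
    # the two universities apart (951 is an assumed local tag).
    $record->append_fields(
        MARC::Field->new( '951', ' ', ' ', a => 'brighton' )
    );

    print $out $record->as_usmarc();
}
close $out;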

Record problems

One of the issues was that not all the records from the University of Brighton were present in sabre. Where were they going missing? Were they being exported from the Brighton system? Copied to the sabre server OK? Being output by the Perl script? Lost during the vufind import process?
To answer these questions I needed to see what was in the MARC files. The problem is that MARC is a binary format, so you can’t just fire up vi to investigate. The first tool of the trade is a quick script using MARC::Record to convert a MARC file to a text file, but this wasn’t getting to the bottom of it. This led me to a few PC tools that were of use.
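For reference, that MARC-to-text dump script really is only a handful of lines – something along these lines, with the file name taken from the command line:

#!/usr/bin/perl
use strict;
use warnings;
use MARC::Batch;

# Dump a binary MARC file as human-readable text so it can be read in a pager.
my $batch = MARC::Batch->new( 'USMARC', shift @ARGV );
my $n = 0;
while ( my $record = $batch->next() ) {
    $n++;
    print "=== Record $n ===\n";
    print $record->as_formatted(), "\n\n";
}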

PC Tools

MarcEdit: Probably the best-known PC application. It allows you to convert a MARC file to text, and contains an editor as well as a host of other tools. A good Swiss Army knife.
MARCView: Originally from Systems Planning and now provided by OCLC, I had not come across MARCView until recently. It allows you to browse and search through a file containing MARC records, though the browsing element does not work on larger files.
MARCView screenshot

USEMARCON is the final utility. It comes with a GUI interface; both can be downloaded from The National Library of Finland, and the British Library also has some information on it. Its main use is to convert MARC files from one flavour of MARC to another, something I haven’t looked into, but the GUI provides a way to delve into a set of MARC records.

Back to the problem…

So we were pre-processing MARC records from two Universities before importing them into vufind, using a Perl script which had been supplied by another University.

It turned out the script was crashing on certain records, and all records after the problematic record were not being processed. It wasn’t just that script: any Perl script using MARC::Record (and MARC::Batch) would crash when it hit a certain point.

By writing a simple script that just printed out each record we could at least see the record immediately before the one causing the crash (i.e. the last in the list of output). This is where the PC applications were useful: once we knew the record before the problematic record, we could find it using the PC viewers and then move to the next record.

The issue was certain characters (here in the 245 and 700 fields). I haven’t got to the bottom of what the exact issue is. There are two popular encodings for MARC records – MARC-8 and UTF-8 – and which one is in use is designated in the Leader (position 09). I think Alto (via its marcgrabber tool) exports in MARC-8, but perhaps the characters in the record did not match the specified encoding.
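As a quick way of seeing which encoding each record claims, the Leader is easy to inspect with MARC::Record – a rough sketch (though a record whose bytes don’t match its declared encoding may still kill the loop):

use strict;
use warnings;
use MARC::Batch;

# Report the character coding scheme declared in Leader position 09:
# a blank means MARC-8, 'a' means Unicode/UTF-8.
my $batch = MARC::Batch->new( 'USMARC', shift @ARGV );
my $n = 0;
while ( my $record = $batch->next() ) {
    $n++;
    my $coding = substr( $record->leader(), 9, 1 );
    printf "Record %d declares %s\n", $n, $coding eq 'a' ? 'UTF-8' : 'MARC-8';
}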

The title (245) in the original catalogue looked like this:

One workaround was to use a slightly hidden feature of MarcEdit to convert the file to UTF-8:

I was then able to run the records through the Perl script and import them into vufind.

But clearly this was not a sustainable solution. Copying files to my PC and running MarcEdit was not something that would be easy to automate.

Back to MARC::Record

The error message produced looked something like this:

utf8 "xC4" does not map to Unicode at /usr/lib/perl/5.10/Encode.pm line 174

I didn’t find much help via Google, though I did find a few mentions of this error in relation to working with MARC records.

The issue was that the script loops through each record; the moment it tries to start a loop iteration with a record it does not like, it crashes. So there is no way to check for certain characters in the record, as by then it is already too late.

Unless we use something like exceptions. The closest thing Perl has to exceptions out of the box is eval.

By putting the whole loop into an eval, if it hits a problem the flow simply passes down to the ‘or do’ part of the code. But we want to continue processing the records, so this simply calls the eval again, until it reaches the end of the file. You can see a basic working example of this here.
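The core of it looks something like this sketch – based on that approach rather than a copy of the script we actually ran, with the real per-record processing left as a comment:

#!/usr/bin/perl
use strict;
use warnings;
use MARC::Batch;

my $batch = MARC::Batch->new( 'USMARC', shift @ARGV );
$batch->strict_off();    # don't die on ordinary structural warnings

my $count = 0;
my $done  = 0;
until ($done) {
    eval {
        while ( my $record = $batch->next() ) {
            $count++;
            # ... normal processing of $record goes here ...
        }
        $done = 1;    # reached the end of the file cleanly
    } or do {
        # next() blew up on a record it could not decode; note it and
        # go round again, carrying on from the following record.
        $count++;
        warn "Skipping record $count: $@";
    };
}
print STDERR "Finished: $count records seen (including skipped ones)\n";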

So if you’re having problems processing a file of MARC records using Perl’s MARC::Record / MARC::Batch, try wrapping the loop in an eval. You’ll still lose the records it cannot process, but it won’t stop in its tracks (and you can output an error log recording the record numbers of the records with errors).

Post-script

So, after pulling my hair out, I finally found a way to process a file which contains records that cause MARC::Record to crash. It had caused me much stress as I needed to get this working, and quickly, in an automated manner. As I said, the script had been passed to us by another University and it already did quite a few things, so I was a little unwilling to rewrite it in another language (though a good candidate would be PHP, as the vufind import script was written in that language and didn’t seem to have these problems).

But in writing this blog post, I was searching Google to re-find the various sites and pages I had come across when I encountered the problem. And in doing so I found this: http://keeneworks.posterous.com/marcrecord-and-utf

Yes. I had actually already resolved the issue, and blogged about it, back in early May. I had somehow – worryingly – completely forgotten all of this. Unbelievable! You can find a copy of a script based on that solution (which is a little similar to the one above) here.

So there you are: a few PC applications and a couple of solutions to a Perl/MARC issue.

Academic discovery and library catalogues

A slightly disjointed post. The useful Librarytechnology.org website by Marshall Breeding announced that the eXtensible Catalog project has just released a number of webcasts in preparation for their software release later this year.

eXtensible Catalog webcast screenshot

I’ve come across this project before and, to put it a little simply, it is in the same field as the next generation catalogues such as Primo, Aquabrowser and VuFind.

However, where those are discrete packages, this seems like a more flexible set of tools and modules, and a framework which libraries can build on. I didn’t manage to watch all the screencasts, but the 30 minutes or so that I watched were informative.

As an aside, while the screen consisted of a PowerPoint presentation, the presenter appeared in a small box at the bottom, and watching him speak oddly made listening to what was being said more easily digestible (or perhaps it just gave my eyes something to focus on!).

This looks really interesting, and it will be good to see how it compares to other offerings. They certainly look like they are taking a different angle, and perhaps the biggest question will be how much time it will take to configure such a flexible and powerful setup (especially with the small number of technical staff found in most UK HE Libraries). Anyway, worth checking out: it uses various metadata standards and builds on – amongst others – Solr and Drupal.

While on the eXtensible Catalog website I came across a link to this blog post from Alex Golub (Rex), an ‘adjunct assistant professor of anthropology at the University of Hawai’i Manoa’. It talks about a typical day as he discovers and evaluates research and learns about others in the same academic discipline. Again, well worth a read.

It starts off with an email from Amazon.com recommending a particular book. He notes:

In exchange for giving Amazon.com too much of my money, I’ve trained it (or its trained me?) to tell me how to make it make me give it more money in exchange for books.

It doesn’t take a genius to see that the library catalogue could potentially offer a similar service. A library catalogue would be well placed to build up a history of what you have borrowed and produce a list of recommended items. But would this only suggest items your library has, and would it be limited by the relatively small user base? If there are only a few academics/researchers with a similar interest then it will be of limited use in producing books you may be interested in (i.e. serendipity).

This is where the JISC TILE project comes in (I blogged about an event I attended about TILE a few months ago). If we could share this data at a national level (for example) we could create far more useful services: in this case it could draw on the borrowing habits of many researchers in the same field, and could – if you wish – recommend books not yet in your own Library. As well as the TILE project, Ex Libris have announced a new product called bX which sounds like it will do a similar thing with journals.

Another nugget from the blog post mentioned above is that he uses the recommendations & reviews in Amazon as a way to evaluate the book and its author:

So I click on the amazon.com link and read more reviews, from authors whose work I know and respect.

I’ve been discussing with colleagues the merits of, and issues with, allowing user reviews in an academic library catalogue. I hadn’t considered a use such as this. Local reviews would be of limited use, as other authors in the same field that a researcher respects (as he describes in the quote) are likely to be based at other institutions (and we would be naive to expect such a flood of reviews to a local system that every book had a number of good reviews). Again, maybe a more centralised review system is needed for academic libraries, though preferably not one which requires licensing from a third party at some expense!

And briefly, while we are talking about library catalogues: I see that the British Library’s ‘beta catalogue’ (running on Primo) has tag functionality out of the box, and I’m pleased to see they have made this quite a central feature, with a ‘tag’ link right above the main search box. This link takes you to a list of the most frequently used and most recently added tags, creating a new way to browse and discover items. What I love about the folksonomy approach is that so often users find ways of using tags you would never expect. For example, would a cataloguer think to record an item in a museum as ‘over engineered’? (I think the answer would be no, but it occurs to me I know nothing regarding museum cataloguing standards.) Could finding examples of over-engineered items be useful for someone? Of course! (That example is from the Brooklyn Museum online collections, found via Mike Ellis’ excellent Electronic Museum blog.) The Library of Congress on Flickr pilot springs to mind as well.

So I guess to conclude all this, the quest continues in how we can ensure libraries (and their online catalogues and other systems) provide researchers and users with what they want, and use technology to enable them to discover items that in the past they might have missed.