Talis

Today Talis announced they are moving their focus away from Linked Data. I think my initial reaction, which I tweeted, holds true: this is along similar lines to Microsoft announcing plans to move away from Windows.

Earlier this year I was at an event where I bumped into two people from Talis. The one who knew me introduced me to the other; I think he described me as ‘a long-time Talis watcher’. Whether or not I’ve quoted him accurately, that statement is probably true, and I therefore want to muse a little. Forgive me (especially if you are involved).

Here are my thoughts in a personal capacity. I start with some history (it goes on a bit, feel free to skip past it). At the end I move on to today’s announcement.

Some History, it goes on a bit

First, I want to go back a few years. I had started working at Sussex in late 2002 and this was my first time using/running the Talis Library Management System. Talis’ history is fairly well known: it started in the late sixties as the Birmingham Libraries Cooperative Mechanisation Project (BLCMP), a co-operative shared service project between Birmingham libraries, and over the years it developed what became known as a Library Management System (or Integrated Library System), with other libraries – both public and academic – joining the co-operative (i.e. customer owned); the system was known as Talis. Around 2000 (which was around when Sussex migrated to Talis), the company changed its name to match the name of its core product – Talis – and (I think) at this point became employee owned. I should add, for those new to such things, that in the UK academic Library Management System (loans, fines, buying books, cataloguing, etc.) market there were probably around six main players. All but Talis were international companies, many with well over 1,000 customers. Talis had roughly 100 customers (half public, half academic). In simple terms this meant a tenth of the revenue others had to spend; I was always very aware of this and impressed by the way Talis kept up with the competition.

When I started at Sussex the system here had only been live a year, and the server was a bit of a mess from a system administration point of view. There seemed to be little structure to where components lived on the system, many components seemed to have several versions installed, and yet often an older version seemed to be the one running. The top level of the system had lots of files with silly names such as “     “ or “~”. The previous administrator was new to Unix/Solaris and, to be blunt, had taken to it like a duck to sulphuric acid. My frustration here was mostly that I had no idea what was the result of specific requirements of Talis and what were the whims of previous Sussex staff; what was essential to the running of the system (no matter how unusual) and what needed to be cleaned up. The crontab for root had about 1,000 lines, and I won’t even start describing the printing system.

One of these frustrations was the web catalogue. For a start it was running the CERN HTTPD (Netcraft pretty much confirmed you could have fitted all the servers still running CERN httpd into a classic Mini). Ironically the first thing I had done at my previous job was migrate the web catalogue to Apache. It is usual with third-party systems to have clear documentation as to which web server/version is supported (or even for it to be installed with the application itself with no choice or involvement by the customer, as was the case with Prism 2). It took a while to find out if we could move to Apache (perhaps CERN was the only thing they supported), which version we should move to, and whether there was guidance on configuring Apache to serve the web application. Oh, and the web application seemed to live in about five different duplicate locations on the file system (grrr). Like a lot of the web at the time, it used frames, and images for the menu text.

In 2003 Talis released Prism, their next-generation web catalogue. It ran on a separate server to the main Talis system. Based on the information we gave them they recommended two (entry level) Dell servers. These were essentially shipped to Talis where they were fully configured; we just had to plug them in. One slight annoyance was that it was a master/slave configuration, with the master passing sessions to the slave to load balance. If the master died, the whole service died.

Prism was a great leap forward, it looked like a modern web application, and I was quite pleased to see it was a Java-based application (the ugly URLs are always a give-away). We had our servers up and running by summer 2003, but did not make it our default catalogue until summer 2004. Prism was not perfect: it timed out after a period of inactivity (nothing worse than leaving a record on screen for the book you must have, only to come back to a ‘session has timed out’ message), and had no relevance ranking. One example: searching for books about the web, books starting with the word ‘web’ would come near the end (under ‘w’), with lots of unrelated (but matching the search) books coming before. You can see an example here (this uses Manchester Metropolitan University; we have recently shut down ours).

What made me happy was that Talis saw the release of Prism 1 (and 1.1) as just the very first steps in a long line of developments. I attended the Talis User Symposium 2003 where this was discussed. As a technical aside, I think it was at this event that I asked, if the Linux boxes were to be treated as ‘black boxes’, what would happen regarding security updates, OS upgrades etc. I was told that as they were such a minimal install, and so locked down (they’re public web servers!), this would not be required. I’ve just checked the last of these boxes, which will soon be decommissioned here: it reports “Red Hat Linux release 8.0 (Psyche)” (not to be confused with Red Hat Enterprise Linux or anything modern like that) – we are running a system released in 2002, which we were told we shouldn’t be patching.

If I’m being open about Talis, then I should be open about something else: its customers. I had been involved with a couple of other library systems. Customers were often frustrated with the slow pace of developments and poor service, but meetings and conferences were always professional and diplomatic exchanges, with both sides understanding the realities of the other. I had never witnessed such negativity and moaning, especially aimed at new developments. Each new product or version was seen by many customers purely in the negative: we’ll have to update our (overly complex and probably unneeded) user guide, it probably won’t work, it will probably have bugs, it won’t be useful to our users, what’s wrong with the old version. At conferences, on mailing lists, in meetings, the (vocal) majority were against anything new, and unable to see that while the new may not be perfect, it was considerably better than what came before and probably even the competition. (I should add this was some years ago, much has changed in the last 10 years, and due to my current job I have not attended any meeting about the LMS for at least seven years.)

In November 2004 two people came down from Talis to talk to a few of us at Sussex. As part of this we had a conversation about Prism: where it was going in the future. This mainly involved us saying ‘wouldn’t it be brilliant if Prism did X’ and them saying ‘That’s great, we’re so pleased you’re saying that, the next version can do that, it’s going to be available any day now’. That release never really came. A fix (Prism 1.2 I think) came out in early 2005 for a bug some users were having (this started out as an ‘only upgrade if you have this bug’ release but at some point became ‘why haven’t you upgraded to the latest version yet’). Prism 1.3 was released in August 2005, and in 2006 Prism 2 was released. This did have new features: it worked with MARC21 (a newer version of a common – and dire – bibliographic standard), worked with 13-digit ISBNs and worked with Unicode. While good stuff, these were hardly features that users would really notice.

A classic example of requested functionality was the ability to export to EndNote. A method to do this did appear on the Talis Developer Network (TDN), and a senior member of staff at my Library emailed me to check up that I was fully on top of such things. Only… it was written by me. I had created a filter for EndNote which allows you to cut and paste the output of Prism into a text file and then use the filter to correctly import the file into your EndNote library (it matched the text displayed next to the record details in Prism with the corresponding fields in EndNote). I had documented this on the Sussex website, and Talis, with my permission, had adapted it as a guide. I confess, I felt smug.

Around the same sort of time Talis Graphical was being developed. Talis Graphical was a Windows GUI to the Talis system; until then, all staff activity had been via a Unix application accessed through a Telnet or SSH client. Like all Unix applications that use function keys (hello Ingres client), term types and keyboard mappings were a bit of a dark art. I liked Graphical a lot and give Talis a lot of credit for it, especially compared with some of the other LMS vendors’ Windows applications, which were amazingly bad. I was also impressed that all keyboard functionality was consistent between the traditional Unix application (now called Talis Text) and the new Windows client.

Talis Graphical became Talis Alto on its release. A few years later the phrase Talis Alto started to be used in some places as a term to encapsulate the entire LMS, and gradually this became the norm. A minor thing, but I wished such changes in terminology had been announced; the message – when it was queried – was that the client was the LMS and hence Alto had always been the new name of the LMS, to distinguish it from the company (named, if you remember, after it).

The Talis Alto LMS was (is) a good system. It had its pros and cons. It used a relational database AND actually used it as a relational database (in the library system world this double is rare), and as a company they were open and approachable. At the same time their documentation was poor, and trying to find out what documentation was current, and likewise what the latest release of a product was, was a pain. At one point after an update our ‘scratch’ area on the system kept filling up, and on inspection a good number of very large files were there. After more work than it should have taken, it turned out the latest release had a new indexing system: half the files were the (essential) indexes and half were meaningless log files from the indexing operations. All stored within scratch, with no notice to set up jobs to plan for these large files, or to rotate the log files. Elsewhere, it felt like half the system was run by a series of slightly confusing Perl scripts where it was never fully clear which did what. It felt like it was painful for the system to adapt to a MARC21 world, one where bibliographic records are imported, bulk changed, and dropped with regularity (mainly for online content).

In 2005/06 Talis started to take on a new lease of life and confidence. Talis Insight 2005 had a list of high-profile speakers from the technology and library worlds. A number of new initiatives and projects were started (see some of them at the bottom of the forums). The Talis Developer Network was started. Linked Data was being talked of, Panlibus the blog started (so did many other blogs), and Panlibus the magazine did too. And then there were Talking with Talis and the Library 2.0 Gang. There was a library mashup competition, and the first mentions of the Talis Platform. Above all, Talis had recruited many new staff, and it was clear from the blogs and concepts such as the TDN that they had some very smart people who understood the web and how to do things properly. This gave me confidence in the applications I could expect to see in the future.

One of the early things I liked was Richard Wallis’ Talis Whisper project, a ‘catalogue’ which showed the price of the item on the right as you moved over search results (using the Amazon API), had a basic search auto-complete, and could show you via a Google Map which library had the item. In terms of look and feel, and these features, this was light years ahead of what we currently had. Soon after, we had Project Cenote – another library catalogue demo, built on top of the new Talis Platform. It was quick to build and showed the power of building applications on top of the new Platform thingy.

Talking of the Platform, this was something that was everywhere: on the library (Panlibus) blog, the TDN, on the new more technical blogs now being set up, and it had its own newsletter. In 2007 some clueless wanna-be developer tried to go through all these references to the Talis Platform and BigFoot and work out what these things were. The results were embarrassing.

Talis were certainly raising their profile: nationally and internationally, and beyond the traditional library market. Their call for open data, APIs, mashups and open standards struck a chord with me and gave hope for systems working together, especially (as we were Talis customers) systems we were paying for. They were pushing for the things I wanted. And they were telling much larger players in the international library market how to do things right, while talking about cutting-edge stuff at the WWW conference. And as we rolled into 2008, I got on to Twitter, where Talis staff were active, open and doing interesting stuff.

A few things troubled me. First I loved the Whisper catalogue demo, and waited, and then loved the Cenote catalogue demo. And I knew they were quick demos, and I knew they didn’t have half of the needed catalogue functionality in the real world, and I knew real products need testing, and I knew that they need to be built for exceptions, and for load. But… but… as I looked at Prism, the catalogue that was going to see regular updates and lots of new features from the day it was released in 2003 – but didn’t – I couldn’t help thinking, how long would it take to get just some of this stuff into Prism the product, Prism that we pay for? These demos did not time out, had a modern web look, nice URLs, worked on OS X(!). The ideas, projects, research areas, mashups and demos kept on coming on the blogs, but when was this going to filter into, well, you know, the products?

There seemed to be a separation between Talis the producer of cutting-edge thinking and ideas, and Talis the library systems vendor, which – like many LMS providers – had a product that was looking somewhat tired, and not exactly technologically cutting edge (though I imagine the small team working on it had much better things to be doing). This separation formalised as Talis showed a clear structure between its library application side and the Talis Platform side. You can see this clearly in this March 2007 Talis homepage and the May 2007 homepage. Would the ideas of web technology, standards and openness filter from one to the other?

And there was something else bothering me: the business plan. The Talis Platform had been public for some time; clearly a lot of development had gone into it, along with a lot of documentation, guides, webcasts, conference talks and more (all costing time and money). The same was true regarding modern library technology, with talks around the world, the worldwide Talking with Talis / Library 2.0 Gang podcasts and mashup competitions. What was the aim? Their library market was the UK and Ireland, and I could see the Platform had a potentially global market, but what was the plan regarding libraries? Were they going global there too? If not, why the push to market themselves (via talks and competitions) globally specifically in the library sector? Why not focus on the markets they planned to sell in and leave the global stuff to those advocating the global Talis Platform? How did a mainly US-based Library Gang help them with a fairly conservative (and often cynical of technology) UK library customer base? Meanwhile, while it was understandable in the early days of the Platform that they were trying to build momentum and mind share, when were they going to sell something? Click here to buy Platform space… or contact us to do a big enterprise deal. The website seemed short on calls to action. All this work, while good for the community, and great for someone like me, cost money and resources from a small company that I wasn’t convinced could afford to essentially pay for such broad community building, in the hope they would get some of the eventual business.

Talis had grown many tentacles, and had clearly made a name in the Linked Data and library technology sectors, but hard revenue on the back of this seemed slow. And this is from a customer (normally customers are crying out for companies to spend less time worrying about the bottom line and more on cutting-edge stuff!).

In 2007 ‘Prism 3’, a new version of the catalogue which would be hosted, was much trumpeted by Talis, at account meetings and elsewhere, as ready for release by the end of the year. It would be built on the Talis Platform, and all my concerns about ‘the talk’ not turning into ‘the walk’ would be gone.

This was good not just because I wanted to see some of the cool web stuff in our catalogue, but because of something more important: the National Student Survey. For those outside UK HE, the NSS is big, and doing well in it is important. Essentially, finalists fill out a survey of questions about their experience while at University (rate your feedback from assignments from 1–5), one of which is about the Library. Like most Universities, units such as the Library at Sussex produced a plan as to how they would increase student satisfaction (and hence the NSS score), and while certainly not a core issue, the catalogue had been noted as a weakness and we had therefore committed to improving it. It was a concern, therefore, when the end of 2007 came and went, and in early 2008, while we could see early demos of the new product (then ‘Project Nugget’), it was now clear it would not be ready for the summer, when Universities traditionally launch new services.

Instead we went for a relatively new (but actually ready to use) product called Aquabrowser (implemented by the likes of Harvard, Cambridge, Edinburgh and Chicago) which could import MARC21 records from any system and act like a catalogue (well it could, if you ignore the need to log in, renew books and place reservations, which as it happens is quite a big need). We hit a very common problem in the library technology world, and one I rant about at every opportunity (including this one): while MARC21 may be a bad standard, it is, at least, a standard, and it is, at least, used by most library systems for bibliographic information. The other type of information essential for a library catalogue is holdings and availability information, i.e. for each item, the shelfmark and whether it is on loan. This is simple information, and a common standard could be developed in an afternoon in a pub on the back of an envelope, and yet one does not exist. Instead, Aquabrowser screen-scraped the Prism 2 record page for the item it wanted to show details for (all geeks will dance with joy at this wonderful solution). Active development in Aquabrowser seemed to end the day we signed up for it.

A quick note about another of Talis’ products: TalisList. TalisList was a reading list system and the unloved ugly duckling. At one event, which a colleague was at, a senior person at Talis said ‘with the new Talis Platform it’s just going to take a few months of Agile Scrums to quickly build the new TalisList application, that’s why this stuff is so good’. This follows the given wisdom in IT that to accurately predict development time you should take the time given in months and convert the units to years. And then add a couple more as well. At the end of it, we had Aspire, of which we were the second customer.

2011

In March 2011 Talis announced the sale of the Library division to Capita. This was big news and, I confess, quite shocking; I hadn’t seen it coming. In a sense it quickly made sense: Talis was more and more acting like two companies under the same roof. One provided a traditional library system and made money from support, consulting and extra services. The other had the Talis Platform and a number of exciting developments around it, plus the Education side, built on top of the Platform, which consisted of Aspire.

Talis had made a big statement: they had sold the unit they were named after, which was their history and main source of revenue. It showed a coming of age of the Linked Data work that had been taking place, and a confidence that there was business there to be made. The sale of the Library division gave them cash to get going and allowed them to be lean and focused.

The library systems market had matured: most libraries had a system they were happy with, and the cost of changing systems (training, integration with other existing systems) was high. While there was ongoing income from service contracts and new developments, this was not a good fit for where Talis wanted to be; for a company like Capita there must have been additional benefits in offering such a system as part of a larger suite of products to their public sector customers.

The split was, from the outside, a clean one. The only slight grey areas were the web catalogue (Prism 3) on the Library side being built on top of the Talis Platform, now on the Talis side (but then, Talis were always keen to host other companies’ data), while many customers of the library system (Alto), now owned by Capita, were also customers of Aspire, which remained with Talis. Any benefits for customers of having two products from the same supplier were gone.

Talis continued as three units: Talis Education (Aspire), Kasabi (a data store and related services) and Talis Systems, which provided consulting. Presumably the developers and system administrators who managed the actual Platform fitted under the latter.

2012

And now we get to the point of this post. All of the above is really just a very long comment in parentheses to set the context of where I am coming from. I’ve been following Talis for a long time, partly because it is what I should do when they are our main system supplier, and partly because I was interested in their direction. And as my job moved more into innovation and new developments, again, this proved useful for work (as one example, by following Talis, I grew interested in Linked Data, which in turn led me to the idea for SALDA, which became a JISC project).

Today, 9th July 2012, Talis announced a move away from Linked Data and the shutting down of Kasabi.

But there is a limit to how much one small organisation can achieve. In our view, the commercial realities for Linked Data technologies and skills whilst growing is still doing so at a very slow rate, too slow for us to sustain our current levels of investment.

We have therefore made the decision to cease any further activities in relation to the generic semantic web and to focus our efforts on investing in our growing Education business.

Effective immediately we are ceasing further consulting work and winding down Kasabi. We have already spoken to existing customers of our managed services and, where necessary, are working with them on transition plans.

You can see some extra thoughts here.

There are a lot of talented people at Talis, and this must be a sad day for both current and former staff. This has been their passion, and endless code, documentation, talks, presentations, plans and meetings have gone into building what they currently have. To think it could all be turned off must be an incredible blow.

While working in my cosy public sector job, I admire those who take risks. I follow the start-up scene in the US and closer to home, and in my own small way try to encourage those who work in Universities to try new ideas and be less afraid of risk (one example being blogging about a key commercial supplier in a way that might damage a client/customer relationship). Those who never fall never take big steps.

And Talis did take a massive risk. Credit to them. It sold, let us be blunt, the cash cow which must have brought in the vast majority of revenue. Yes this created a large pile of one-off cash, but this would not last, and they now had a set time limit for the Linked Data work to prove itself. Like any company, they would have planned this carefully, trying to predict growth and take-up of their services and plan for profitability.

Which perhaps makes today’s announcement even more surprising. No one could predict how long the economic downturn would continue for, or how much the Tories would cut back on new public sector IT projects, or how ideas like Kasabi would go down. But my basic calculations as to how far several million pounds from the sale, plus consulting fees, plus Aspire fees, would go did not leave me thinking that the crunch point would be now. Of course no company waits until their last pound before taking appropriate action when outgoings are more than income, but I’m amazed that just 16 months after confidently saying Linked Data is the future for Talis, we now hear the opposite is true.

Two out of three divisions are to close, and the third looks like it will be re-engineered to move away from Linked Data. (Ironically, in this blog’s drafts folder is a recent attempt at playing with the Aspire API and being ever so slightly frustrated – while acknowledging my limited experience – about how the Linked Data approach can result in having to make many calls to the API for what one can get in just one API call using the laughably uncool CSV format.)

There were some signs: in the last six months a number of key people have left, and Aspire development has made mention of a new infrastructure and platform. I did find talk of this odd, when so much of Aspire’s infrastructure is the Platform, removing the need to worry about it, so I guessed one interpretation of moving to a new platform involved moving away from the Linked Data based Talis Platform.

And what about the Platform? It really is the thing that has been there from the start; for many years you could see the release notes for the latest monthly upgrade to the Platform’s software on a wiki, and it was constantly being improved. The announcement does not explicitly say what will happen with the Platform. In the short term, not a lot: Aspire and Prism 3 are built on it, and I have no idea if other third parties are using it (plus, disclosure, we have some data on there).

Perhaps the Platform on its own could be profitable? And if companies don’t like the idea of hosting their data on someone else’s servers, could the Platform be bundled up and sold as a standalone product to be deployed internally by large enterprises? I guess probably not, especially if it is built with other third-party components.

I admire Talis’ steps to create a new company in a new area, in uncharted waters, but wonder if there is anything to be learned from this: could more have been done to test the market and accurately plot take-up, demand, and the best range of products/services before taking the jump?

Aspire (a University reading list system) has good take-up in the UK, and even some international customers. And while breaking into new countries is hard – and every country has a different HE setup (especially the US) – Aspire is a fairly unique product which might have many potential customers in countries that run Universities similarly to the UK. There are also complementary developments which may interest Universities (though with Moodle being free and Open Source, moving towards VLE/e-learning-like functionality would be difficult).

I don’t normally publicly dissect the decisions and history of a business, nor my frustrations and experience with a commercial system in such detail (though where I have here, it is mainly from at least several years ago). I hope I haven’t upset anyone – or at least not too many people. Since I joined Sussex, nearly ten years ago, Talis has reinvented itself several times, and I’ve enjoyed trying to second-guess its next move.

But I end with this. Never am I reminded so much that I am a creature with a small brain, of at best average intellect and shockingly poor ability to grasp what should be basic concepts, than when I read the blogs of those who work, or have worked, at Talis. Time and again I am blown away by smart thought, insight, comment and ideas. I wish Talis the best in its new direction (so long as that direction involves making Aspire bloody awesome… I’m still a customer).

MARC Tools & MARC::Record errors

I know next to nothing about MARC, though being a shambrarian I have to fight it sometimes. My knowledge is somewhat binary: absolutely nothing for most fields/subfields/tags but ‘fairly ok’ for the bits I’ve had to wrestle with.

[If you don’t know that MARC21 is an ageing bibliographic metadata standard, move on. This is not the blog post you’re looking for]

Recent encounters with MARC

  • Importing MARC files into our Library System (Talis/Capita Alto), mainly for our e-journals (so users can search our catalogue and find a link to a journal if we subscribe to it online). Many of the MARC records were of poor quality and often did not even state that the item was (a) a journal (b) online. Additionally Alto will only import a record if there is a 001 field, even though the first thing it does is move the 001 field to the 035 field and create its own. To handle these I used a very simple script to run through the MARC file – using MARC::Record – to add a 001/006/007 where required (a minimal sketch of this kind of script appears after this list).
  • Setting up sabre – a web catalogue which searches the records of both the University of Sussex and the University of Brighton – where we needed to pre-process the MARC records to add extra fields, in particular a field to tell the software (VuFind) which organisation the record was from.
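For what it’s worth, here is a minimal sketch of the kind of pre-processing described above, using MARC::Record/MARC::Batch. The filenames, the made-up 001 value and the 998 field used for the owning organisation are purely illustrative assumptions, not the actual fields or values used at Sussex.

#!/usr/bin/perl
# Sketch: read a binary MARC file, add a 001 where missing (Alto insists on one)
# and tag each record with its source organisation, then write the file back out.
use strict;
use warnings;
use MARC::Batch;
use MARC::Field;

my $batch = MARC::Batch->new( 'USMARC', 'in.mrc' );     # hypothetical input file
open my $out, '>', 'out.mrc' or die "Cannot open out.mrc: $!";

my $n = 0;
while ( my $record = $batch->next() ) {
    $n++;
    # Invent a 001 if the record lacks one (Alto moves it to 035 anyway)
    unless ( $record->field('001') ) {
        $record->insert_fields_ordered( MARC::Field->new( '001', sprintf( 'TEMP%06d', $n ) ) );
    }
    # A similar call could add a 006/007 where needed.
    # For the sabre-style union catalogue: record which organisation this came from
    # (998 $a is an assumption; use whatever local field your discovery layer expects)
    $record->append_fields( MARC::Field->new( '998', ' ', ' ', a => 'University of Sussex' ) );
    print {$out} $record->as_usmarc();
}
close $out;
print "Processed $n records\n";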

Record problems

One of the issues was that not all the records from the University of Brighton were present in sabre. Where were they going missing? Were they being exported from the Brighton system? Copied to the sabre server OK? Output by the Perl script? Lost during the VuFind import process?
To answer these questions I needed to see what was in the MARC files; the problem is that MARC is a binary format, so you can’t just fire up vi to investigate. The first tool of the trade is a quick script using MARC::Record to convert a MARC file to a text file. But this wasn’t getting to the bottom of it. This led me to a few PC tools that were of use.
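Such a dump-to-text script is only a few lines of Perl. A sketch (assuming the file is standard binary MARC21/USMARC; the filename handling is illustrative):

#!/usr/bin/perl
# Sketch: dump a binary MARC file to human-readable text so it can be read or grepped.
use strict;
use warnings;
use MARC::Batch;

my $file  = shift or die "usage: $0 file.mrc\n";
my $batch = MARC::Batch->new( 'USMARC', $file );

my $count = 0;
while ( my $record = $batch->next() ) {
    $count++;
    print "=== Record $count ===\n";
    print $record->as_formatted(), "\n\n";   # mnemonic text form of the record
}
print STDERR "$count records read\n";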

PC Tools

MarcEdit: Probably the best known PC application. It allows you to convert a MARC file to text, and contains an editor as well as a host of other tools. A good Swiss Army knife.
MARCView : Originally from Systems Planning and now provided by OCLC, I had not come across MARCView until recently. It allows you to browse and search through a file containing MARC records. Though the browsing element does not work on larger files.

USEMARCON is the final utility. It comes with a GUI, and both can be downloaded from the National Library of Finland. The British Library also have some information on it. Its main use is to convert MARC files from one type of MARC to another, something I haven’t looked into, but the GUI provides a way to delve into a set of MARC records.

Back to the problem…

So we were pre-processing MARC records from two Universities before importing them into VuFind, using a Perl script which had been supplied by another University.

It turned out the script was crashing on certain records, and all records after the problematic record were not being processed. It wasn’t just that script: any Perl script using MARC::Record (and MARC::Batch) would crash when it hit a certain point.

By writing a simple script that just printed out each record we could at least see what the record was immediately before the one causing the crash (i.e. the last in the list of output). This is where the PC applications were useful. Once we knew the record before the problematic record, we could find it using the PC viewers and then move on to the next record.

The issue was certain characters (here in the 245 and 700 fields). I haven’t got to the bottom of what the exact issue is. There are two popular encodings: MARC-8 and UTF-8, and which one a record uses is designated in the Leader (position 09). I think Alto (via its marcgrabber tool) exports in MARC-8, but perhaps the characters in the record did not match the specified encoding.

The title (245) on the original catalogue looks like this:

One workaround was to use a slightly hidden feature of MarcEdit to convert the file to UTF-8:

I was then able to run the records through the Perl script and import them into VuFind.

But clearly this was not a sustainable solution. Copying files to my PC and running MarcEdit was not something that would be easy to automate.

Back to MARC::Record

The error message produced looked something like this:

utf8 "\xC4" does not map to Unicode at /usr/lib/perl/5.10/Encode.pm line 174

I didn’t find much help via Google, though did find a few mentions of this error related to working with MARC Records.

The issue was that the script loops through each record; the moment it tries to start a loop iteration with a record it does not like, it crashes, so there is no way to check for certain characters in the record, as by then it is already too late.

Unless we use something like exceptions. The closest thing Perl has to this out of the box is eval.

By putting the whole loop into an eval, if it hits a problem the flow simply passes down to the ‘or do’ part of the code. But we want to continue processing the records, so this simply enters the eval again, until it reaches the end of the file. You can see a basic working example of this here.
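Roughly, the pattern looks like this (a sketch only, not the exact script linked above; the filename and the processing step are placeholders):

#!/usr/bin/perl
# Sketch: keep processing a MARC file even when MARC::Record/Encode dies on a record.
use strict;
use warnings;
use MARC::Batch;

my $batch = MARC::Batch->new( 'USMARC', 'records.mrc' );   # placeholder filename

my $count = 0;
my $done  = 0;
until ($done) {
    eval {
        while ( my $record = $batch->next() ) {
            $count++;
            # ... do the real processing of $record here ...
        }
        $done = 1;    # reached the end of the file without a fatal error
        1;
    } or do {
        # A record blew up somewhere after record $count: log it and carry on.
        warn "Bad record after record $count, skipping: $@";
        # The until loop re-enters the eval and the batch continues from the next record.
    };
}
print "Processed $count good records\n";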

So if you’re having problems processing a file of MARC records using Perl’s MARC::Record / MARC::Batch, try wrapping the loop in an eval. You’ll still lose the records it cannot process, but it won’t stop in its tracks (and you can output an error log to record the numbers of the records with errors).

Post-script

So, after pulling my hair out, I finally found a way to process a file which contains records that cause MARC::Record to crash. It had caused me much stress as I needed to get this working, and quickly, in an automated manner. As I said, the script had been passed to us by another University and it already did quite a few things, so I was a little unwilling to rewrite it in another language (though a good candidate would be PHP, as the VuFind import script was written in that language and didn’t seem to have these problems).

But in writing this blog post, I was searching using Google to re-find the various sites and pages I had come across when I encountered the problem. And in doing so I found this: http://keeneworks.posterous.com/marcrecord-and-utf

Yes. I had actually already resolved the issue, and blogged about it, back in early May. I had somehow – worryingly – completely forgotten any of this. Unbelievable! You can find a copy of a script based on that solution (which is a little similar to the one above) here.

So there you are: a few PC applications and a couple of solutions to a Perl/MARC issue.

VuFind in 8 minutes using Amazon EC2

I’ve created a screencast showing how easy it can be to install VuFind. Here I go from nothing (no server, no OS) to a full VuFind install in under 8 minutes.

It would probably take less than two minutes (under 10mins in total) to add MARC records to the installation, but I didn’t have any to hand at the time.

This demo cheats a bit by using a script that does the heavy work; the script was a mash-up I created from existing scripts and commands that come with VuFind, with a few tweaks. It probably would have been only slightly slower to run most of the commands manually.

The script in question is at http://www.nostuff.org/vufind-install.txt and of course anyone is free to download it (and improve it, please share changes). There’s lots of potential to improve its ability to work on different flavours of Linux.

Multi Instance

One of the key aspects of the script is that it allows you to easily install multiple instances of VuFind on to a server. By default VuFind installs into /usr/local/vufind and names other things (databases, Apache conf) vufind. The script prompts for an ‘instance name’ and then uses that in place of ‘vufind’.

The rationale for this is my feeling that VuFind is an excellent tool for creating niche catalogues that are a subset of the full Library collection (or as someone put it, a ‘tranche of the catalogue’). A branch library, a particular collection, rare books, a particular School, core reading (short loan books), specialist resources (AV / laptop items for loan), etc. The idea of an organisation’s records being in one system, rather than many (of varying quality), makes sense, but it’s reasonable for those moving their records to a central system to want to be able to search their records independently of other records (and expecting users to go to an Advanced search or use a refine option on the main catalogue is really not an option). VuFind seems like an obvious answer to this, especially if new instances can be set up quickly.

In fact it seems to be a failing of most of the main Library Management Systems (ILS) and their web interfaces that creating lots of interfaces (slicing and dicing the underlying single large pool of records) is not easy. Most require a large amount of effort to create a second web interface to a catalogue. This seems like such an obvious flaw and a barrier to moving to one system to manage resources such as books and other items for an entire organisation.

Amazon EC2

Amazon AWS is a great tool to use for this. A small instance costs around $0.10 an hour; the micro instance is even cheaper (just over $0.02). Run an instance for ten hours and you have spent around a dollar. Mess it up? Just destroy it and create a new one. No risk and no hassle (for first-time users the hardest thing is probably the SSH private keys).

Jisc bid writing

Today I submitted a JISC bid on behalf of a team, as part of the recent JISC call Infrastructure for Education and Research (‘15/10’ to its friends). The call was actually a set of broadly related strands; we submitted (with a whole 30 minutes to spare) under a strand called Infrastructure for Resource Discovery, and there’s a nice web-based version of the call on Jiscpress.

Jiscpress was created in part by Joss Winn, and a post of his I saw today inspired me to knock out this rambling. Go read it before you read this, thanks.

I admire his openness and I should strive to do the same. Funny that I try to – and to an extent automatically do – make much of what I do open, but with this sort of thing there is a tendency to keep it close to your chest. There were very few tweets in the run-up to the bid. Why are we not more open? He also talks about his JISC bid writing and tips; here are mine.

My first experience was attending a ‘town hall meeting’ in Birmingham about the JISC Capital programme, around 2006. For a start I didn’t even know what a town hall meeting was (I think it means a briefing day; everyone presumed you should know this). I do remember it felt daunting: there were lots of people in suits, lots and lots of sentences I didn’t understand (we’re going to innovate to take vertical solutions and silos and break them into a horizontal landscape to help foster a new wave of disruption to help inject into the information environment testbed) and no one I knew. I looked at the massive documents that accompanied the programme, many of them, many times. And looked at what I needed to do to write a bid. Budgets, signatures, partners, matched funding. I didn’t submit one.

Since then the community has developed, in no small part thanks to Twitter, but also to things like mashlib and many one-day events (which either never used to exist in the field I work in, or I was just more ignorant then than I am now). Beer has been a big part of forging links in the HE Library / tech community. Seriously. It really needs its own cost code.

I looked at a number of potential calls over the last few years – often they required a report or development that I had no real knowledge of. I came close to putting something in for the JISC rapid innovation call (and helped mark it). When the JISC LMS call came out about a year ago the time and subject were right to submit a bid. I knew the subject matter, I had a natural interest and passion, and I knew the people who would be involved in these sorts of things.

These are tips for people who are thinking of putting in a bid, especially those who are stupidly disorganised like me:

  • Time between a call being released and the submission deadline is short, normally about a month, which in HE terms is not long. Use the JISC funding timeline to get a heads-up on future funding opportunities so that you can prepare for working on a bid (including blocking off time during the month, and arranging initial meetings with others) before it comes out, and are not taken by surprise. The JISC 15/10 call had a blog post a few weeks before the call came out giving a feel for the call and confirming the date it should be released. It helped me to start thinking about ideas and block out time to read it (even if some of that time was in the evening) on the day it was released.
  • Every organisation is different (that applies to everything I say), but for us, setting up a meeting a couple of days after the call was out was very useful. It included those whom it could affect and relevant senior staff. The call had lots of areas which matched our goals (and some, not always the same, that matched my personal interests), and it was good to prepare a briefing and then bounce those ideas around to see what had potential and what other ideas came up. It helps in many ways: to quickly focus and refine potential ideas (and drop those that people show no interest in), keep everyone in the loop and see who’s willing to work on it. It stops it being one person’s or department’s little side project.
  • The briefing day was very useful, especially for talking to people, finding potential partners and getting great advice.
  • Now I have an incredible number of bad points, but two of them are leaving everything to the last minute and working in a very linear fashion. Often things that feel like they are the last things you need to do are actually things you need to set in motion earlier on. This seems so simple typing it now, but I’ll probably (be warned, colleagues) do the same next time. These include the budget, project name and supporting letters.
  • The budget is hard. See if your org offers support in doing this. The problem is certain magic numbers (the wonders of TRAC and fEC) can only be calculated once you know all your other costs. However I tend to find that near the end of the bid-writing process you suddenly think of some work a particular group/person/org will need to do, so you need to factor in those hours and costs, or you invite someone from the other end of the country to be on a panel and need to cater for their travel and hotel costs. In best ‘do as I say, not as I do’ tradition, I would try and bash this out well in advance so it can be sent to those who can then check it over and fill in the magic numbers.
  • Asking a friend at another Uni if they don’t mind asking their (P)VC to drop everything so that he/she can write a nice supporting letter for your project is hard. So try and avoid it by getting it done sooner. Again, often easier said than done, as projects tend to evolve during the bid-writing process, which can mean letters reflect out-of-date ideas or stress the wrong areas.
  • Letters and other stuff need a project name. I’m guilty of really not thinking a name is that important. The acronym will be meaningless to all. On my first bid I just used a working name (all of 5 seconds’ thought) and right at the end asked everyone if they were happy to go with it. Mistake. Changing a project name at the last minute is a pain.
  • A key point. You need a good idea. And a good idea is one that is a good fit to the call. You may have a perfect methodology but if the idea doesn’t fit with the call then you could be in trouble. I’m guessing ‘It’s not really what you’re after but it’s such a good idea you must want to fund it’ is not a good sign.
  • Speak to people. I mentioned the briefing day above, but also speak to the programme manager, they’re nice people! Talk about it on Twitter.
  • You don’t need to be an expert. I was put off for years from writing a bid about things I was interested in but didn’t think I knew enough about. You can ask people to work with you! People who know how to do stuff. I’ve just submitted a bid about Linked Data. Now I’ve followed the rise of Linked Data for years and tried to learn about it, but taking an XML file and ‘converting’ (is that the right term?) it into Linked Data, I had no idea how to even start. But I spoke to some people, who recommended someone, and they do know what to do.
  • Approaching others out of the blue is difficult, especially if you don’t feel ‘part of it’. All I can say is ask. And if you don’t know who to approach, ask people (either at JISC or via Twitter) for advice.
  • If you have a clear(ish) idea of what you are going to do, broken down into mini packages of work, and who is going to do each one of them, then writing the actual bid is easy. Treat it like a job application. We all know that when writing a job app you use the person spec as a structure, a paragraph for each entry of the person spec, perhaps even with headings to help those marking your application. A bid is just the same: the clearly laid out structure of a bid is worth sticking to, as it’s the same thing the markers will have to use to score each section. If the JISC call document’s ‘proposal outline’ refers to a section which talks about project management, leadership, copyright, evaluation and sustainability, then write about those things together as clearly as possible. Long-winded paragraphs which ramble on about everything and make subtle implied passing references are a bane to the marker and no help to you.
  • But, having been involved in marking and assessing bids, what impressed me was the impartial way bids were judged, and the real desire to fund Good Ideas, even if the actual bid document needs a little clarity in some parts, especially from first-time bidders. To stress: there was a real desire to see first-time bidders (with a good idea) be successful.
  • So the actual bid write-up can in a way be left later than the other tasks mentioned above, as it mainly involves just you (and probably a couple of people who work closely with you to check it).
  • In an ideal world this would all be done weeks before it needed submission. In real life other factors (and work) can mean it is a last-minute dash. That’s fine. But make sure a few days before it has to be submitted you put into an email to yourself (and others involved in writing it up): the email address to send it to, the submission date and time, and cut-and-paste things such as the exact format it needs to be submitted in (how many files, how big) and the number of pages. Add a link below each of these facts to the actual source of the information (Jiscpress is excellent for this), so when you’re panicking and presume everything you know is wrong you can follow a link and see for yourself that it really is eight pages max for the proposal, direct from the horse’s mouth.

The whole process of submitting a bid, and running a project, is good experience. It often involves working with people you would not normally work with, and doing things differently to your normal job. Now if I get a chance in the next few days I will follow Joss’ example and blog about our proposal.

Talis Aspire, checking if a course has a list

Talis Aspire is a new-ish Reading List system used at the University of Sussex Library.

On Aspire, a url for a Department looks like this:
http://liblists.sussex.ac.uk/departments/anthropology.html

A page for a course looks like this (for course l6061):
http://liblists.sussex.ac.uk/courses/l6061.html

The nice thing is that you can replace the ‘.html’ with ‘.json’ or ‘.rdf’ – while the HTML only has a link to the related list, the JSON and RDF expose other links and relationships, such as the department.

For us, most (but not all) courses only have one list. URLs for lists are not predictable in the same way as the course URLs, e.g.:
http://liblists.sussex.ac.uk/lists/EEC1E2AA-C350-DAFC-BDE4-1E9EF5EC69E5.html
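As an illustration, here is a sketch of one way to check whether a course has a list by fetching the .json version of the course page. I’m not assuming anything about the exact JSON structure: it simply walks whatever decode_json returns, looking for URIs containing ‘/lists/’ (the course code and the choice of modules are my own assumptions).

#!/usr/bin/perl
# Sketch: does a given Aspire course have a reading list?
use strict;
use warnings;
use LWP::Simple qw(get);
use JSON qw(decode_json);

my $course = shift || 'l6061';
my $json   = get("http://liblists.sussex.ac.uk/courses/$course.json")
    or die "No JSON returned for course $course\n";
my $data = decode_json($json);

# Walk the decoded structure (hashes/arrays/strings) collecting anything
# that looks like a list URL.
my %lists;
my @queue = ($data);
while (@queue) {
    my $node = shift @queue;
    if    ( ref $node eq 'HASH' )  { push @queue, keys %$node, values %$node; }
    elsif ( ref $node eq 'ARRAY' ) { push @queue, @$node; }
    elsif ( defined $node && $node =~ m{/lists/} ) { $lists{$node} = 1; }
}

if (%lists) {
    print "Course $course has list(s):\n";
    print "  $_\n" for sort keys %lists;
}
else {
    print "Course $course appears to have no list\n";
}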

Continue reading

Library Catalogues need to cater for light-weight discovery clients

Way back I wrote a piece about the changing model for library catalogues; you can see it here. The main premise was that trying to maintain records in a Library Management System (LMS/ILS) for all the items you want your users to discover is no longer feasible. This is especially true in this here digital age: trying to maintain records for all the e-journals a University has access to is an almost impossible task, and LMSs were not designed for thousands of MARC records to be dropped and then re-imported (i.e. sync’d) with another source. And what about all the free stuff, is an e-book not worth being discovered by users because it is free?

So let the LMS be a record of what your library physically holds, and your discovery service a place where users can find (and see how to access) resources that are of interest to their research and work. The former is just one element (albeit a major one) of the latter. Meanwhile your LMS physical holdings can be shared with other discovery systems (such as union catalogues) to show what your library physically contains.

Continue reading

ircount : update

One Sunday morning in January this year I got an email sent automatically from the web hosting company. It contained the output of the script that ran weekly; when all ran fine the script produced no output, and when something went wrong the error messages were emailed to me. Judging by the length of the email, something big had gone wrong.

The script collected data from http://roar.eprints.org/ – to be used as that week’s ‘number of records’ for each repository.

The reason became clear quickly. A major revamp of ROAR had just been launched, showing off a new interface which used the EPrints software as a platform (essentially a repository of repositories). This was a great leap forward but unfortunately removed the simple text file I used to collect the data, and what was more, the IDs for each IR had changed.

I finally got around to fixing this in May. The most fiddly bit was linking the data I collected now with the data I already had. This involved matching URLs and repository names.

Anyways. Things should be more or less as they were. A few little tweaks have been added. A few bugs still remain.

As ever you can view the code and changes here: http://trac.nostuff.org/ircount/browser/trunk

And checkout the svn here: http://svn.nostuff.org/ircount/

ircount can be found here: http://www.nostuff.org/ircount/

Summon @ Huddersfield

I attended an event at Huddersfield looking at their and Northumbria’s experience of Summon. http://library.hud.ac.uk/blogs/summon4hn/?p=22 These are my rough notes. Take it all with a pinch of salt.

The day reaffirmed my view of Summon: it is ground-breaking in the library market, and with no major stumbling blocks. They are very aware that coverage is key and seem to be adding items and publishers. It searches items that an organisation has access to (though users can tick an option to search all items in the knowledge base, not just those they can access). They have good metadata, merging records from a number of sources, and making use of subject headings (to refine or exclude from the search).

There was general consensus that it made sense to maintain only one knowledge base, and therefore in this case, to use 360 Link if implementing Summon. There was also general dissatisfaction with federated search tools.

To me, and I must stress this is a personal view, there are two products that I have seen which are worth future consideration: Summon and Primo. Summon’s strength is in the e-resources realm and as a resource discovery service. Primo’s strength, while offering these features/services, is as an OPAC (with My Account features etc.) and personalisation (tags, lists). Both products are in a stage of rapid development.

In my view, a decision to implement one of these products – whichever it is – will have a chain reaction. And I think this is an important point. Using Sussex as an example, it currently has Aquabrowser (as a Library Catalogue), Talis Prism 2 (for borrower account, reservations, renewals), SFX (link resolver) and Metalib (federated search).

Two example scenarios (and I stress there are other products on the market and this is just my personal immediate thoughts):

One: Let’s say Sussex first decide to replace Metalib with Summon. They would probably cancel Metalib (Summon replaces it), and probably move from SFX to 360 Link (one knowledge base). They may then wish to review their Library Catalogue in a year’s time: Primo is no longer on the cards (too much cross-over with Summon, which they now already have), so they either stick with Aquabrowser (but the new SaaS v3 release) or perhaps move to Prism 3 (Talis’ new-ish SaaS catalogue). Sussex would end up with no Ex Libris products, but would potentially subscribe to several Serials Solutions products.

Two: Let’s say Sussex decide to replace Aquabrowser with Primo (which acts more like a catalogue than AB). They cancel Aquabrowser. Primo would (in addition to being the primary OPAC) have Summon-like functionality, allowing users to search a large database of items instantly, with relevance and facets. So Summon would not be an option. They stick with SFX (Metalib would be a side feature of Primo, with a Primo-like interface). With a number of Ex Libris products they would want to keep an eye on the Ex Libris URM (next-generation LMS); they would have no Serials Solutions products.

The following are some notes from the day:

Sue White from Huddersfield Library started the day, saying it is probably the best decision they have ever made.

Helle Lauridsen from Serials Solutions Europe started with a basic introduction to what it is and what their key aims were (i.e. be like Google).
She emphasised that all content (different types and publishers) is treated equally.

“Better metadata for better findability”: merge metadata elements; use Serials Solutions, Ulrich’s, Medline and CrossRef to create the best record. ‘The record becomes incredibly rich’.

She went through all the new features added in the last 12 months, including a notable increase in the size of the knowledge base, ‘dials’ to play with the relevancy of different fields, and a recommender service coming.

She showed a list of example publishers, which included many familiar names; they have just signed with JSTOR. She showed the increase in ‘click-throughs’ for particular publishers; the biggest were for JSTOR and ScienceDirect. Newspapers have proved to be very popular.

There is an advanced search. There has been negative feedback ‘please bring back title/author search’.

Eileen Hiller from Huddersfield talked about product selection. She mentioned having people from across the Library and campus on the selection/implementation group, getting student feedback and talking to academics. They used good feedback in their communications (e.g. in the student newspaper and their blog). Student feedback questionnaire has been useful.

Dave Pattern talked about the history of e-resource access at Huddersfield, which started with an online Word document and then a OneNote version. Metalib was slow, and they found more students using Google Scholar than Metalib.

They started with a blank sheet of paper and as a group thrashed out their ideal product, without knowing about Summon. First class search engine, ‘one stop shop’, improved systems management, etc. Invited a number of suppliers in, showed them the vision and asked them to present their product against it, Huddersfield rated each one against The Vision. Report to Library Management Group. Summon was the clear fit.

Implementation: Starts off with a US conference call. MARC21 mapping spreadsheet, they went with defaults. they add a unique id to the 999|a field.

Be realistic with early implementation, e.g. the library catalogue and repository were the only two local databases. Be aware of when your LMS deletes things flagged for deletion; Huddersfield had early issues with this.

Do you want your whole catalogue on Summon? ebook/ejournal records etc.

Summon originally screen-scraped for holdings/availability (Aquabrowser does this for Sussex), which could bring the traditional catalogue to its knees.

Moving to 360 Link makes your life much easier if moving to Summon: only one knowledge base to maintain.

They asked Elsevier to create a custom file for their sciencedirect holdings to upload to 360.

Huddersfield found activating journals in 360 a quick process.

The 360 API is more open than the Summon API (for customers only). You can basically build your own interface. Virginia are using it to produce a mobile-friendly version of their catalogue. Huddersfield used it to identify problem MARC records.

94% of Huddersfield’s subscribed journals are on Summon (no agreement with the following: BSP 80%, ScienceDirect, JSTOR… Westlaw/LexisNexis 55%). They now have an agreement with LexisNexis and JSTOR, and are in discussion with Elsevier. They manage to have this level of coverage for these resources by using other sources for the data (e.g. publishers for Business Source Premier and A&I databases for ScienceDirect).

Dummy journal records for journal titles (print and e) so that they are easily found on Summon. See this example.

Can recommend specific resources (‘you might be interested in ACM Digital Library’), can be useful for subjects like Law.

Summon at Huddersfield now has 60 million items indexed (see left-hand side for breakdown). Judging by this, Summon seems to have 575 million items indexed in total.

Survey results: users found the screens easy to understand; many (43%) refined their results. Dave thinks that now Google has facets on the left, facet usage may increase. 82% said the results were relevant to their research topics.

They will go live in July. Currently working on training materials and staff training. Considering adding archives and Special Collections in the future.

Annette Coates, Digital Services Manager, Northumbria Uni.

She gave a history of e-resource provision from 2005 onwards: WebFeat (they brand it NORA, a name they are keeping for Summon). ‘We have the same issues with federated search that everyone else has’. Both Northumbria and Huddersfield are keeping a separate A-Z list for e-resources (Northumbria are using LibGuides, like Sussex).

User evaluation: is it improving the user’s search experience? How can we improve it further? NORA user survey. Timing important, getting people involved, incentives, capturing the session. They will use all the user feedback in a number of ways: ‘triangulate to ensure depth’, use good quotes as a marketing tool (including to library staff), feed back good/bad points to Serials Solutions, use it to improve the way they show it to others…

Northumbria Summon implementation timeline

Q&A

Q: Focus groups – any guidance given?
A: Very little guidance in the focus groups; they let people play with it.

Q: What is the position regarding authentication?

A: Northumbria use Citrix. They will be ‘Shibbolising’ their 360 Link.

Huddersfield channel as much as possible through EZproxy; they don’t have Shibboleth. They promote usage through the portal, which authenticates them.

No shibboleth integration at the moment.

(discussion about how summon may mean you can stop trying to add journal records, and can raise lots of questions… should summon be the interface on your catalogue kiosks).

You can send a list of ISSNs to Serials Solutions to see matches, to find out what your coverage would be.

There was a very vague indication that OPAC integration may be on the cards for Summon. This is an important thing IMO.

Number of comments about Library staff being far more critical than users.

Summon ingests records (MARC) from the LMS four times a day. It uses the DLF standard for getting holdings data from the LMS (this is a good thing). Huddersfield wrote the DLF protocol code.

Q: Are Serials Solutions (ProQuest) struggling to get metadata from their direct competitors?

A (Serials Solutions): EBSCO is the main one, but we go direct to publishers. And for Elsevier, we are able to index it from elsewhere (and are in talks with them).

Q: For Lexis and Westlaw, where there is only 50% coverage, how do students know to go elsewhere (i.e. direct to the resource)?

A: For law students they point them to the e-resource pages (a wiki) as well as Summon, to promote direct access. Also (and perhaps more importantly) there will be a recommender which can suggest Lexis/Westlaw for law searches.

Q: Can you search the whole Summon knowledge base, not just the things we subscribe to?
A: Yes.

Q: Are there personalisation options (saving lists, items, marking records)?
A: These may come in the future; Summon are thinking about it.

ircount development

I’ve finally got around to spending a bit of time on the ircount code.

This post goes through some of the techy stuff behind it. If you’re just interested in features, I’m afraid there are none yet (well, you can now compare more than four repositories), but that’s as far as you’ll want to read. Continue reading