or08: eprints track, session 2

After coffee a little more talk about new features and the future as we ran out of time before. Christopher Gutteridge has now turned up, he may have had a few grown up fizzy drinks last night.

(lost concentration here, so take this with a grain of salt) An Eprints plugin will try to pick up when people enter their names wrong (e.g. get first/last names mixed up). A ‘report an eprint’ (or ‘report an issue with an eprint’) link on item/record pages?

3.1 beta: should be released in a day or so. Live CD available.

When will we get the new template (for records/items), including related papers (or ‘people who liked this also liked…’)? An HTML designer is working on this. Abstract pages can be recreated daily for fresh data (e.g. I think for stats/other papers).

People come in via Google for an item and then leave again. Soton ECS put links to the postgrad prospectus and more on abstract pages for items, and found hits to the postgrad prospectus tripled.

Talking about more finely grained controls and privileges, i.e. who can edit what, and where, and giving people additional power. Includes, for example, this person can edit wording of fields/help, but not edit workflow.

11:42: now moving on to research assessment experience.

Bill Mortimer – Open University.

How Open used eprints to support the RAE experience.

used eprints as a publication database because it was publicly available and helped increase citations. Also because of the reporting tool developed for eprints.

Open use mediated deposit but also imported records and self deposit.

Only peer reviewed items in ORO. Had up to 7 temp ‘editors’ processing the buffer.

Very slow uptake when mediated. Now have just under 7,000 items in ORO.

Simplified the workflow (which of course ep3+ have improved). Researchers responsible for depositing items for RAE submission.

Pro: increased awareness (of IR) increased deposits.

con: overlap of perceptions of ORO and RAE process (some felt RAE took over the IR). Lots of records but only 16% carry full text (% of full text varied by department).

Slide with some future ideas, good, see presentation on (though not currently there) http://pubs.or08.ecs.soton.ac.uk/

12:06pm

Susan Miles – Kingston

metadata-only repository at the moment but plan to add content and full text this year.

uni departmental structure and hierarchy has been the most controversial thing. Didn’t use RAE tool, wasn’t out the box.

Subject team staff created records, but focus moved to the collection of physical items. (some) staff really got in to the IR, but this had the downside that many left with their new skills and experience!

misc bits

  • non existent items
  • people trying to pass off others’ work
  • items being removed and then re-entered constantly at the last minute for the rae
  • overseas academics caused issues.
  • proof of performances and other ‘arts’ outputs were a challenge (next time get the academics to do it).
  • a barrel moving back and forth in a room was a piece of research to be submitted for the RAE (How? Evidence, metadata?)

Unexpected, but lots of interest in the IR across the University. But lots of things in the buffer and no staff.

University committee has endorsed the IR as the source of publication data.

Because of using subject team staff for IR RAE, subject support now have good knowledge of the IR, which is good.

12:27

Wendy from soton

higher profile in Uni due to RAE work means people are including her – and the IR – more in discussions across campus such as looking at the REF.

question (from me): were any academics reluctant/against their rae information being put online? Answer: no

[anon comment, etheses mandate being reviewed regarding animal rights issues etc]

William Nixon: also planning to upload rae data. Does not foresee any problems, BUT recommends not flagging items as rae08 as some academics may have issues with this.

Les: HEFCE put metadata for items submitted to rae on web anyway.

q for open: you currently only publish peer reviewed items, do you plan to change this?

a: yes reviewing.

or08: Eprints 3.1

At the sucks-less-than-dspace Eprints track today. First up Eprints 3.1 and Future. Haven’t seen anything about 3.1 before this, v3 was released over a year ago so looking forward to seeing what is new.

9:10am: Les Carr is talking. reviewing v3 released last year. Talking about the large amount of work surrounding a repository (for all), which he experienced first hand running the soton ecs repository, and the work they have put in to help this. He found that when he contacted academics to point out problems he had fixed with their items/records they seemed pleased that someone was doing this. Last year they (eprints team) wanted to focus on ‘things on the ground’ to make things easier and not focus too much on rejigging the internals.

9:20: 3.1 more control for users. manage the repository without needing technical time (especially as University IT services often want to just set something up and leave it). showing Citation impact for authors.

Eprints 3 platform is built of two parts: ‘core’ backend, and plugins. Plugins control everything you see (I didn’t know plugins were used to this extent). A lot of the new things are just new plugins ‘slotted in’. Plugins can be updated separately which means upgrading specific parts of functionality is easy and doesn’t affect the whole system.
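
A rough sketch of the ‘core plus plugins’ idea, in Python rather than Eprints’ own Perl. The registry, the plugin names and the record fields are all made up for illustration; this is not how Eprints actually implements it, just the general pattern of a core that looks plugins up by name.

```python
from typing import Callable, Dict

# Hypothetical registry: maps a plugin name to a function that renders a record.
PLUGINS: Dict[str, Callable[[dict], str]] = {}


def register(name: str):
    """Decorator that slots a plugin into the registry under `name`."""
    def wrap(func):
        PLUGINS[name] = func
        return func
    return wrap


@register("export/citation")
def export_citation(record: dict) -> str:
    return f"{record['creator']} ({record['year']}) {record['title']}"


@register("export/title")
def export_title(record: dict) -> str:
    return record["title"]


def run_plugin(name: str, record: dict) -> str:
    """The 'core' only looks plugins up by name; it knows nothing about what they do."""
    return PLUGINS[name](record)


# Made-up record, purely for the example.
record = {"creator": "Smith, J.", "year": 2008, "title": "An example paper"}
print(run_plugin("export/citation", record))
```

Swapping or upgrading one plugin here means replacing one registered function, which is roughly the appeal described in the talk.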

Lots of things moved from the command line to the web interface.

Administration: user interface for creating new fields and configuring administrative tasks (sounds good).

Easily extend metadata, what gets stored, in a nice user interface.

9:31: live demo of adding new fields: ‘manage metadata fields’, you can edit them for each dataset e.g. document, eprints, users, imports. First get a screen showing all existing fields, a text field to enter a new field name (and something to show if you have any fields half created, to continue). Interface looks similar to creating an eprint item. select the different types of field e.g. boolean, date, name, etc, lots of them, with descriptions of what they are, also one is a set where you can add a list of defined options, another is compound which can have various subfields. This is looking great.

9:38: next screen, loads of options: required? include in export? index? As name was selected on prev screen various options specific to the name field type. lots more. Has help (click on the ‘?’).

9:41: next screen set of questions about how this is displayed in the user interface, i.e. text user would see, help text. Again seems well designed. Editing XML in the past wasn’t rocket science but it was easy to forget steps or get syntax wrong, plus (certainly for v2) you had to do it with no items in the archive (not easy on a live repository!)

By default new fields appear in the MISC step (screen) of the deposit process for users, which can be changed by editing the workflow.

9:53: configuration (via web interface), fairly crude at the moment but looks to be useful (though not turned on for the demo repository), basically can edit things that are in cfg files. plan to turn this in to a full user interface in the future (not sure if for 3.1 or beyond).

9:58: running through some of the things in the cfg files, such as how to make a field mandatory only for theses.
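
I didn’t note down the actual cfg syntax, so this is only a sketch of the logic in Python (the field names and the ‘thesis’ type value are my assumptions, not copied from Eprints): a field that is required only when the item is a thesis.

```python
def missing_required_fields(eprint: dict) -> list:
    """Return the names of required fields that are empty for this item."""
    required = ["title", "creators"]       # always required (illustrative choice)
    if eprint.get("type") == "thesis":
        required.append("institution")     # only required for theses
    return [name for name in required if not eprint.get(name)]


# Made-up items for the example.
thesis = {"type": "thesis", "title": "My PhD", "creators": ["A. Student"]}
article = {"type": "article", "title": "A paper", "creators": ["A. Author"]}

print(missing_required_fields(thesis))
print(missing_required_fields(article))
```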

Quality Assurance. ideas of an ‘issue’ (something amiss) and an ‘audit’.

issue: stale, missing metadata. issues reported by item and also aggregated by depository. notification of issues can be emailed to authors. We can define all this, i.e. what counts as an issue, in the cfg files. can also check for duplicates (good as it will make the god awful script we use at Sussex obsolete).
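
No idea how Eprints itself spots duplicates, but for a flavour of what such a check involves, here is a minimal Python sketch that groups records by a normalised title. The record layout (`id`, `title`) is made up, and a real check would presumably look at more than titles.

```python
import re
from collections import defaultdict


def normalise(title: str) -> str:
    """Lower-case and collapse punctuation/spaces so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def find_duplicates(records):
    """Return lists of record ids that share the same normalised title."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalise(rec["title"])].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Made-up records for the example.
records = [
    {"id": 1, "title": "Open Access: A Review"},
    {"id": 2, "title": "open access a review."},
    {"id": 3, "title": "Something else entirely"},
]
print(find_duplicates(records))
```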

Can have a nightly audit, and see if anyone has acted on the alerts and issues. reports can be generated for people.

10:07: batched editing. do a search and then batched change any fields for those search results. nice. running short of time so not demo’ing.
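
As it wasn’t demo’d, here is just the general shape of ‘search, then change a field on every result’ as a Python sketch. The function, the record fields and the search condition are all hypothetical, nothing here is taken from Eprints.

```python
def batch_edit(records, matches, changes):
    """Apply `changes` (field -> new value) to every record for which `matches` is true.

    Returns the number of records edited.
    """
    edited = 0
    for rec in records:
        if matches(rec):
            rec.update(changes)
            edited += 1
    return edited


# Made-up records for the example.
items = [
    {"id": 1, "year": 2007, "refereed": False},
    {"id": 2, "year": 2008, "refereed": False},
    {"id": 3, "year": 2007, "refereed": False},
]

# 'Search' for 2007 items, then flag them all as refereed in one go.
count = batch_edit(items, lambda rec: rec["year"] == 2007, {"refereed": True})
print(count, items)
```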

manage deposits screen (for users) icons on the right of each item of yours, to see, delete, move, etc. you can change which columns you see on this screen by using icons at the bottom of the screen, can also move them around.

Impact Evidence: citation tracking, researchers can track citation counts and rank papers. volatile fields don’t change the history of a record. download counts from irstat.

Better bibliographies. can reorder, choose what to view, better control. this is very much needed as different researchers want their publication list in a different way. uses stylesheets.

Complex objects: all public objects have official URIs. expanded document-level metadata.

Versioning (based on VERSION project). ‘simple and useful’. published material ‘pre post or reprints’. unpublished material, early draft, working paper. looks good.

10:19: Improved document uploader. can upload a zip file of many files.

10:25 discussion about versions, e.g. how a user may add a draft (with limited metadata) and then go on and re-edit the item later on when they have a published version.

‘Contributors’ field. roles taken from dc relator names (225). large list of roles, may want to cut down.

A new skin, but not for 3.1 – i.e. record/abstract page will show a thumbnail of the item at the top, because the item is the important thing not the metadata (which is what is emphasised in the ui at the moment), i.e. in the same way that flickr shows the photo as the main thing on the page, and metadata at the bottom, good idea. new layout looks good.

Future: no time to talk: cloud computing, amazon eprints services perhaps (you just sign up to an IR on amazon and one is automatically created). On top of Fedora (saw folks on IRC talking about the same for Dspace the other day), or the Microsoft offering just announced. In a box (i.e. comes out the box as a pre-installed server) honeycomb.

or08: live blogging experiment

Today – as you probably have seen – I posted some badly written notes that meant nothing to no one, and interested even fewer. This was my experiment in live blogging. I’ve seen others do it quite a bit recently and always thought it worked well, so wanted to give it a try.

Some thoughts:

  • Using a different tense is a little weird. Normally we write in the past tense if reviewing an event, when blogging as it happens I found myself switching between present and past tense (the latter out of habit). This wasn’t helped with no internet access before coffee, so I was writing in to a text editor (not that you wanted to know, called Smultron) something I planned to paste in to a blog post in the future, which when posted will be talking about the past, but I wanted it to read as if it was live!
  • I looked up at one point while switching between wordpress and twitter and saw two laptop screens of people in front of me, one had twitter across the screen, the other had the wordpress composition window. Am I boring or with the in-crowd?!
  • Perhaps the biggest point was my difficulty in note taking. I wanted to write stuff that other people not there would find useful. However, my notes were largely rather basic and not meaty enough to say much. Someone reading would get a feel for the outline of a talk, but not what the key points were, which I think is a crucial difference.
  • As well as taking notes, I had various tabs open, including the excellent crowdvine conference site, twitter, bloglines, google blog search (searching to see what turned up for ‘or08’… oh look! me! god I’m so vain). At times, between the note taking, twittering (and learning about tags on twitter) and checking out crowdvine, I would look up and have no idea what the presenter was talking about (I’m a man, I have evolved to be an expert single tasker). Must try and ensure I’m not being distracted from the actual reason I’m there.
  • My notes were rough. Not helped by the fact that the lecture hall was very full (and it wasn’t one of those poncy MBA lecture theaters with big wide seats), so I was being careful of my elbows – which limits typing, and for me, using the shift key. Does the embarrassment of badly typed, ill thought, ungrammatical notes get trumped by their potential interest to others and timeliness?
  • Timeliness is an important point, I could have waited until the end of the day but wanted to get them out straight away.
  • After morning coffee I had the internet. I sat down next to two people I had met before, and while there were quite a few with their laptops open, they were not, and I felt a little self conscious. They were trying to listen to the talk, and here was this guy next to them mucking around on his laptop the whole time. Actually I don’t think they were bothered.
  • While talking about being self-conscious, does posting things as quickly as possible look like attention seeking and ego massaging? Never thought that about anyone else doing it so hopefully the answer is no (but then I love this sort of thing, so I won’t).

So will I do it again? Yes, and I like having the web to hand while at these things. I think I need to improve my note taking, and perhaps take more time writing up points (and my thoughts) on the things of interest rather than writing lots of little snippets. I basically need to take notes anyway (whether notepad and pen, MS Word or a blog), and it does make it stick in my head better than just sitting there, so I may as well make my notes open to others. The timeliness (that’s time – li – ness, not Time Lines!) is perhaps harder to argue, but I like the idea that things are hitting the web the moment they happen, so think I will continue it.

I remember a couple of years ago when the www 2006 conference was taking place (can’t believe it was two years ago), I was sitting at my desk watching flickr, blogs, and just about everything else being updated – a lot – in real time. The ability for me to see photos, watch videos and see notes of things that happened a couple of minutes ago was amazing and really helped capture the feel of the whole event.

Other bits

Battery was running low (why didn’t campus designers in the 60s think to add plug points in lecture theaters for laptops?) so had to revert to pen/paper for session 3. All good talks, but the SWORD talk by Julie Allinson was excellent.

Didn’t stay for the poster session minute madness but of the few posters I did have a chance to see, the one for feedforward really caught my eye and just looks excellent.

Crowdvine (link to or08 on crowdvine)

This is an excellent tool, and I recommend it to anyone setting up a conference. Though I think web savvy crowds will get more out of it (e.g. integration with twitter and web feeds). It helped to put names to faces, but it also helped to get a feel for who are some of the more prominent people. For example: If Les Carr talks about Gnu Eprints, I know to listen as he manages the thing, and if Bill Hubbard talks about IRs I know to listen to what he says because he manages Sherpa in the UK. However I couldn’t tell you the same about the Dspace or US equivalents. I still can’t tell you their names (I don’t do names) but I certainly recognised faces of those who seemed to be very active in their area. I know this sounds a little elitist or hierarchical, but it really isn’t meant to be.

Handy hint: if you want your profile page to be at the top of the conference homepage, just make superficial changes to it every few hours!

As someone mentioned on twitter, this, and every social networking site, needs much more than just ‘friend’. Perhaps: ‘i have seen a few emails from them on mailing lists and I may have even replied to one’, ‘I kinda stood in the same group as them during a coffee break at a conference once’ and ‘I read their blog and see them mentioned here and there so we are a little like friends’. I felt a little unsure when clicking on a few people as friend, but then they all added me back (except Christopher Gutteridge, bastard). Of course this is no different to facebook, the amount of people who have requested me as a friend who I swear I have never spoken to, even if they knew a girl who lived in the corridor above me in the first year of University (that’s a real one).

(PS I used too many brackets and exclamation marks in this blog post!)

or08: session 2b: Sustainability Issues

[again unedited, unchecked, sorry for mistakes!]

Warwick Cathro
Assistant Director-General, National Library of Australia

[sorry didn’t take very good notes for Warwick’s good talk]

“towards the australian data commons” paper on the web for reference on Australian policy in this area.

various sites/projects:

arrow: aggregates IRs in Uni repositories, 90,000 records, expects to grow rapidly. not a ‘native search service’ intended to let others use the metadata.
future: evolve, supported financially by ‘australian national data service’ (like everything else in this talk). will use shibb and poss openid.

registry services
[interesting stuff, another project, but didn’t make any notes]

pilin – identity management
handle mirror/proxy.
tools and define requirements for a national service
national persistent identifier service.

Obsolescence notification
aons project
toolkit on sourceforge
adapters for ir software
compares profile with data from external registries, for each registry they have built an adapter

Australian METS profile
encoding of preservation metadata
data exchange format.
three layer model, top, generic profile, middle: content models, bottom: implementation profiles

———-

Libby Bishop (Leeds/Essex)

Timescapes: looking at relationships, family life (young people, fatherhood, older people).
But also building a data archive in the process, some objects not born digital.

400+ participants
5000+ objects
500+ gb size.

Sustainable = Shareable + desirable.

Share:
IP sorted, resource discovery, harvestability.

Desirability
what makes people want to use this, this issue is at the service
researchers are the primary audience, but also media, policy makers, students.
Longitudinal (new term to me) e.g. track people as they move through time
needs to be multimedia: voice, video, audio.
video helps to engage young people
thematic data
reuse helps make it desirable.

Distinctive features of timescapes
answer: data (primarily)
but also: multimedia, sensitive content, complex access
Longitudinal, dynamic updating.
Integration of research, archive and reuse.
researchers are central to the design, they interact with repository.

Timescape Repository (at leeds), Timescapes data preserved at UK Data Archive (essex):
no point recreating a preservation service at leeds. uses digitool at leeds because mandras (?) was. digitool not open. wanted to use an existing tool at leeds rather than setup a new one.

metadata:
lots of challenges, especially in what is needed.
lots of people, expertise, and different institutions.
researchers tend to be the experts and know their area,
and IR people know current practice in metadata.
looking in to how to mark up audiovisual, e.g. looking at a METS wrapper.
modifying depositor interface to repository, let people add their own metadata, with some stuff still being added by the IR staff.

showing an example of the sort of data (in a MS Word file) the researchers are collecting. need a fair bit of conversation to encourage researchers to do this. (transcript guidelines/forms)

back to sustainability:
“key strategies for sustainability”
– embedding in multiple institutions (can’t predict the future).
– build trust with researchers in what you are doing (and asking them to do) is essential, esp in long term.
– reuse!
lots of people want to be part of the project: affiliates programme. those who want to work closely have to agree to contribute their own data and reuse current data.

Summary:
researchers agreed to share and reuse data: success
waiting list of affiliates

issues:
quality of researcher-submitted data, some reluctant to share, digitool multimedia support limited.
Collaboration takes time, especially across institutions.

or08 session 1 (part3)

Rich Tags: cross repository browsing
ds
ECS, Southampton

categories can be unhelpful, if you don’t know it (LOC!) hard to see related articles.

solution, unified categories across repositories. Automated.

aggregated data from oai, then got more from external sources.
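
The talk didn’t show how the harvesting was done, but an OAI-PMH harvest boils down to HTTP requests like the one built below. `ListRecords` and `oai_dc` are the standard verb and metadata prefix; the repository base URL is a made-up example, and this sketch only builds the request URL rather than fetching or parsing anything.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL for a repository's OAI endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # only harvest records changed since this date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint, purely for illustration.
full = list_records_url("https://repository.example.org/oai")
incremental = list_records_url("https://repository.example.org/oai", from_date="2008-01-01")
print(full)
print(incremental)
```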

eg.
institution name from whois, decade from date.
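
The ‘decade from date’ bit is the easy one to picture. A tiny Python sketch, assuming the date string starts with a four-digit year (my assumption, the talk didn’t say what format the dates come in):

```python
def decade(date: str) -> str:
    """Derive a decade label from a date string that starts with a four-digit year."""
    year = int(date[:4])
    return f"{year // 10 * 10}s"


print(decade("2008-04-01"))
print(decade("1997"))
```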

tf-idf, algorithm for categories (?)
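
I didn’t catch exactly how Rich Tags uses it, but for reference tf-idf scores a term highly when it is frequent in one document and rare across the collection. A minimal Python sketch of one common variant (term frequency times log of inverse document frequency); the toy documents are made up.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: score} dict per document."""
    n = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return scores


# Made-up, already-tokenised 'records'.
docs = [
    ["open", "access", "repository"],
    ["open", "data", "archive"],
    ["open", "crystal", "data"],
]
scores = tf_idf(docs)
print(scores[0])
```

A term that appears in every record (‘open’ here) scores zero, so it would be useless as a category, while terms unique to one record score highest.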

mSpace, ecs project.

richtags.org, example of above, with records from various universities.

categories come from dmoz

or08: session1 (part 2)

again unedited/checked:

I’m tagging with or08:

http://blogsearch.google.com/blogsearch?hl=en&q=or08&ie=UTF-8&scoring=d
http://technorati.com/search/or08?authority=a4&language=en

On the margins of scholarship

Richard Davis, Uni London Computer Centre

flickr, good example of an online document repository

“flickr for eprints”
1 – the data i enter to be used by other applications
2 – rss feeds i can use elsewhere
3 – clickable keywords, leading to similar articles in other repositories.

linnean online

demo’d images in an eprints ir, allowed photos to be previewed on the same page, bookmarks, comments.

may scan in original comments.

sneep. social networking extensions for eprints.
jisc funded, tagging, bookmarking, open source, exploit eprint objects

about giving choice, let users find it, e.g. facebook app, if we make it, it will be useful for someone (in a way we can’t predict).

web2.0 is raising expectations of what websites should do.

or08: session1 part1

again, unedited or checked. draft notes:

ian mulvany -nature (who produce connotea) – speaker.
david kane – Waterford Institute of Technology

openid

semantic web, very useful to get data from, but hard to do and very few do it.

contributing: plain text easy, semantic web hard
data mining: semantic web easy, plain text hard.

talked about how nature are working on integrating social tools with connotea.

“we want to connect repositories with connotea”.

connotea could act as an interchange with repositories.

showed how Waterford Institute of Technology catalogue uses tags etc, example of how social tools can be integrated.

openid:
your signon can be a URI, like a URL or email address,
When you try to sign on with openid, you are redirected to your openid provider (eg yahoo). Your details are not shared with the site you are trying to access, just keys.
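
The redirect step might look roughly like this in Python. The parameter names are the ones from the OpenID Authentication 2.0 spec as I understand it; the provider endpoint and site URLs are made up, and a real site would use an OpenID library (discovery, associations, verifying the response) rather than building URLs by hand.

```python
from urllib.parse import parse_qs, urlencode, urlparse

OPENID2_NS = "http://specs.openid.net/auth/2.0"


def provider_redirect_url(provider_endpoint, identity, return_to, realm):
    """Build the URL the site redirects the browser to, so the provider can log you in."""
    params = {
        "openid.ns": OPENID2_NS,
        "openid.mode": "checkid_setup",   # interactive login at the provider
        "openid.claimed_id": identity,
        "openid.identity": identity,
        "openid.return_to": return_to,    # where the provider sends you back to
        "openid.realm": realm,            # the site asking for authentication
    }
    return f"{provider_endpoint}?{urlencode(params)}"


# All hypothetical URLs. Note that no password ever appears in the request.
url = provider_redirect_url(
    "https://openid.example.org/auth",
    "https://alice.example.org/",
    "https://repo.example.org/openid/return",
    "https://repo.example.org/",
)
query = parse_qs(urlparse(url).query)
print(query["openid.mode"], query["openid.return_to"])
```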

security risk, one website (your open id provider) is the key to access to your details on many websites, i.e. phishing risk. something to be aware of.

OAuth (?) allows one site to access things on another site (e.g. Dopplr accessing your photos on flickr) without you having to give the first site your login details for the second.

connotea now supports open id. think it is the future.

or08: Open Repositories 2008

I’m at Open Repositories 2008.

Got in to southampton central at 8:15am, for a 9 start, thought I had loads of time but with the wait for the bus, the journey, and wandering around campus I only just got here for 9. so did just about everyone else, so no username and password allocated until coffee time :( [but now online, hence you seeing this!]

draft notes from the first session, unedited or checked.
peter murray-rust
repositories data

Believes data is the most important thing for scientists, as opposed to open access final full text.

“PDF destroys information”: Word files contain data which pdf just loses. Word files (and latex, xml) are useful for scientists as they can reuse the data, formulae, metadata etc contained within, which is lost in a pdf file.

academic theses are one of the most important things for institutions/researchers. electronic theses are going to be very powerful.

technical problems slowed the talk down.

showed pdb repository, Protein Data Bank, going since the 70s. showed research he *put* in to the *repository* while working at glaxo.

message is that scientists are already putting stuff in repositories.

crystaleye, built/started by a postgrad. now has over 100,000 crystal structures. harvests from those that release their crystallography (acs, rsc, etc). links to paper via doi.

scientists will not put things in to repositories (presume he means article-based IRs, as he has just been describing how scientists do!).

OSCAR text extraction. showed example of cutting the text of a PDF doc in to OSCAR, it produced a table of formula that were contained within the text.

Royal Society of Chemistry: PROSPECT, semantic markup of papers.
SPECTRa (cam/imperial), how can we capture data as part of the academic process

“do not try to invent electronic notebooks”, success rate approx zero. i.e. don’t try and make capturing data the integral part of their workflow.

No one knows when their paper gets published.

“get at the authoring process”, that is the key.