SQL grouping mulitple values in to one SELECT field

SQL is one of those things I always need to understand more than I do. Sometimes odd little problems can seem impossible with my limited understanding of the language.

One such hit this me this morning. We wanted a report from our Institutional Repository (a online archive of the University’s research publications). The report was to include title, journal title, type (journal article, book, etc), author(s) and departments for each item in the report.

Anyone with a basic understanding of relational databases will be able to picture the main table of publications, with another table for publication authors – one record for each author of each publication, with the publication’s id number to link them to the item in question. The same set up for the departments associated with a particular item.

With basic SQL, it would be easy to return a list which included multiple entries for items with multi authors, i.e. an item with three authors would be repeated three times in the list, each showing a different author. An item with three authors and two departments would be repeated six times to cover all combinations of authors and departments. Not very desirable. We could also provide a list which only shows each item once (as required) but only show one author and department (ie by using GROUP BY and perhaps a function to select the first author). Again would be less than ideal.

A bit of googling showed creating an agregate function in postgress (we are using mysql), and this solution which seems to use a tempory table.

The solution in the end was much more simpler, and involves using a function called group_concat.

The final SQL looked like this:

SELECT a.eprintid, a.type, a.title, a.publication,
GROUP_CONCAT(c.creators_given, " ", c.creators_family) as Authors,
GROUP_CONCAT(d.depts) as Departments
FROM `archive` a, `archive_creators` c, `archive_depts` d
WHERE a.eprintid = c.eprintid
AND a.eprintid = d.eprintid
GROUP BY c.eprintid, d.eprintid

The two group_concats will produce a list of authors (and a list of departments) both comma separated. The author’s is a little more complex as the name is within two separate fields, though this isn’t a problem, we just need to specify both with a space in between to make them readable.