Wikidata, for those of you unfamiliar with it, is the backend we are developing for Wikipedia. At its simplest, it’s a spine linking together the same concept in different languages – so we can tell that a coronation in English matches Tacqoyma in Azeri or Коронація in Ukranian, or thirty-five other languages between. This all gets bundled up into a single data entry – the enigmatically named Q209715 – which then gets other properties attached. In this case, a coronation is a kind of (or subclass of, for you semanticians) “ceremony” (Q2627975), and is linked to a few external thesauruses. The system is fully multilingual, so we can express “coronation – subclass of – ceremony” in English as easily as “kroning – undergruppe af – ceremoni” in Danish.
So far, so good.
There has been a great deal of work around Wikipedia in recent years in connecting our rich-text articles to static authority control records – confirming that our George Washington is the same as the one the Library of Congress knows about. During 2012-13, these were ingested from Wikipedia into Wikidata, and as of a year ago we had identified around 420,000 Wikidata entities with authority control identifiers. Most of these were from VIAF, but around half had an identifier from the German GND database, another half from ISNI, and a little over a third LCCN identifiers. Many had all four (and more). We now support matching to a large number of library catalogue identifiers, but – speaking as a librarian – I’m aware this isn’t very exciting to anyone who doesn’t spend much of their time cataloguing…
So, the next phase was to move beyond simply “authority” identifiers and move to ones that actually provide content. The main project that I’ve been working on (along with Charles Matthews and Magnus Manske, with the help of Jo Payne at OUP) is matching Wikidata to the Oxford Dictionary of National Biography – Wikipedia authors tend to hold the ODNB in high regard, and many of our articles already use it as a reference work. We’re currently about three-quarters of the way through, having identified around 40,000 ODNB entries who have been clearly matched to a Wikidata entity, and the rest should be finished some time in 2015. (You can see the tool here, and how to use that will be a post for another day.) After that, I’ve been working on a project to make links between Wikidata and the History of Parliament (with the assistance of Matthew Kilburn and Paul Seaward) – looking forward to being able to announce some results from this soon.
What does this mean? Well, for a first step, it means we can start making better links to a valuable resource on a more organised basis – for example, Robin Owain and I recently deployed an experimental tool on the Welsh Wikipedia that will generate ODNB links at the end of any article on a relevant subject (see, eg, Dylan Thomas). It means we can start making the Wikisource edition of the (original) Dictionary of National Biography more visible. It means we can quickly generate worklists – you want suitable articles to work on? Well, we have all these interesting and undeniably notable biographies not yet covered in English (or Welsh, or German, or…)
For the ODNB, it opens up the potential for linking to other interesting datasets (and that without having to pass through wikidata – all this can be exported). At the moment, we can identify matches to twelve thousand ISNIs, twenty thousand VIAF identifiers, and – unexpectedly – a thousand entries in IMDb. (Ten of them are entries for “characters”, which opens up a marvellous conceptual can of worms, but let’s leave that aside…).
And for third parties? Well, this is where it gets interesting. If you have ODNB links in your dataset, we can generate Wikipedia entries (probably less valuable, but in oh so many languages). We can generate images for you – Wikidata knows about openly licensed portraits for 214,000 people. Or we can crosswalk to whatever other project we support – YourPaintings links, perhaps? We can match a thousand of those. It can go backwards – we can take your existing VIAF links and give you ODNB entries. (Cataloguers, take note.)
And, best of all, we can ingest that data – and once it’s in Wikidata, the next third party to come along can make the links directly to you, and every new dataset makes the existing ones more valuable. Right now, we have a lot of authority control data, but we’re lighter on serious content links. If you have a useful online project with permanent identifiers, and you’d like to start matching those up to Wikidata, please do get in touch – this is really exciting work and we’d love to work with anyone wanting to help take it forward.
Update: Here’s part 2: on how to use the mix-and-match tool.