Projects and plans

15th January. A bit late for New Year’s resolutions, and I’m never much of a one for them anyway.

Still, it’s a good time to take stock. What am I hoping to achieve this year? I have omitted the personal aims, as they’re not of great interest to anyone who’s not me, but otherwise, hopefully without overcommitting myself…


2015 was pretty good. I planned a rather complex library move (twice, after the first time was delayed, which is a good way to learn from your mistakes without having actually had to commit them). Two weeks into the new year and ~400 metres of books shifted, it’s looking like it’s actually working, so let’s call that one a conditional success. First order of business: finish it off. And write up some notes on it so that others may learn to not do as I have done.

Secondly, get something published again. I had my first ‘proper’ academic publication in late 2015, and though it’s on a topic that approximately three people care about, I’m still glad it’s done and out there. (I have something to point at next time I’m glibly assured “oh, that approach never happens any more”. This is a recurrent theme in discussions about scholarly publishing; but I digress.) I would recommend it to any academic librarian as an exercise in understanding what your researchers suffer.

(I have a couple of projects on the boil which I’d like to write up properly, of which more anon.)

Thirdly, finish putting together the papers from the 2014 Polar Libraries Colloquy. Call this a public admission that I’ve been dragging my heels on it.

Lastly, consider Chartership. I’ve avoided this for many years, seeing it as a rather daunting pile of paperwork, but it’s probably a sensible thing to think about.


Firstly, I’d like to clear off the History of Parliament work on Wikidata. I haven’t really written this up yet (maybe that’s step 1.1) but, in short, I’m trying to get every MP in the History of Parliament database listed and crossreferenced in Wikidata. At the moment, we have around 5200 of them listed, out of a total of 22200 – so we’re getting there. (Raw data here.) Finding the next couple of thousand who’re listed, and mass-creating the others, is definitely an achievable task.

Secondly, and building on this, I did some work in the autumn of 2015 on building a framework for linking EveryPolitician and Wikidata. I need to pick this back up and work out how we can best represent politicians in general – what are the best data structures for things like constituencies, parliamentary terms, parties?

This leads into the third project, which is the general use of Wikidata as a “biographical spine”. Charles Matthews, Magnus Manske, and I have been working on this for a couple of years, and it really is beginning to bear fruit. We’re working to pull together as many large biographical databases as possible, and have them talking to one another through Wikidata, so that we can start bringing data and links from one to the users of another. This certainly won’t be completed in 2016 – but it would be good to write some of it up in a single report so that it’s clear what we’re doing, and hopefully start advertising it to researchers who could benefit.

Fourthly (oh, goodness), the Oxford Dictionary of National Biography. This is a project I embarked on back in 2013; the goal is to get a reliable crossreference between Wikipedia/Wikidata and the ODNB – now complete, mainly thanks to Charles Matthews – and then to turn all the vague, unhelpful “see DNB” Wikipedia citations into nicely formatted, linkable ones, which readers can actually benefit from. This second part is going to take a long time, but I’ve made some rudimentary attempts at auto-predicting the required citations, which can then be fixed by hand, and hopefully we’ll get there in time.

Moving away from Wikidata, early last year I started on what has turned into the Birthdays Project – an attempt to study the way in which people misremember their birthdays when they’re not well-documented. The effect is generally known and the basic result is kind of obvious, but it has only been (very cursorily) discussed in the academic literature before, and I don’t think anyone’s properly attacked it with substantial data, multiple cultural contexts, etc. I wrote up a few notes on this in early 2015 (part 1, part 2), but since then I’ve nailed down some more data, figured out a useful way of visualising it, and so on. No idea if it’s publishable per se, but it would be good to have it written up.

That… looks like a busy year ahead.

Finally, going places and doing things. I have a couple of long-awaited holidays planned, and some people I’m looking forward to seeing on them. I will be going to the Polar Libraries Colloquy in Alaska, but I won’t be going to Wikimania in June – I’ll be elsewhere. I’m sad to miss it this year, as it looks to be an excellent event.

Most popular videos on Wikipedia, 2015

One of the big outstanding questions for many years with Wikipedia was the usage data of images. We had reasonably good data for article pageviews, but not for the usage of images – we had to come up with proxies like the number of times a page containing that image was loaded. This was good enough as far as it went, but didn’t (for example) count the usage of any files hotlinked elsewhere.

In 2015, we finally got the media-pageviews database up and running, which means we now have a year’s worth of data to look at. In December, someone produced an aggregated dataset of the year to date, covering video & audio files.

This lists some 540,000 files, viewed an aggregated total of 2,869 million times over about 340 days – equivalent to 3,080 million over a year. This covers use on Wikipedia, on other Wikimedia projects, and hotlinked by the web at large. (Note that while we’re historically mostly concerned with Wikipedia pageviews, almost all of these videos will be hosted on Commons.) The top thirty:

14436640 President Obama on Death of Osama bin Laden.ogv
10882048 Bombers of WW1.ogg
10675610 20090124 WeeklyAddress.ogv
10214121 Tanks of WWI.ogg
9922971 Robert J Flaherty – 1922 – Nanook Of The North (Nanuk El Esquimal).ogv
9272975 President Obama Makes a Statement on Iraq – 080714.ogg
7889086 Eurofighter 9803.ogg
7445910 SFP 186 – Flug ueber Berlin.ogv
7127611 Ward Cunningham, Inventor of the Wiki.webm
6870839 A11v 1092338.ogg
6865024 Ich bin ein Berliner Speech (June 26, 1963) John Fitzgerald Kennedy trimmed.theora.ogv
6759350 Editing Hoxne Hoard at the British Museum.ogv
6248188 Dubai’s Rapid Growth.ogv
6212227 Wikipedia Edit 2014.webm
6131081 Newman Laugh-O-Gram (1921).webm
6100278 Kennedy inauguration footage.ogg
5951903 Hiroshima Aftermath 1946 USAF Film.ogg
5902851 Wikimania – the Wikimentary.webm
5692587 Salt March.ogg
5679203 CITIZENFOUR (2014) trailer.webm
5534983 Reagan Space Shuttle Challenger Speech.ogv
5446316 Medical aspect, Hiroshima, Japan, 1946-03-23, 342-USAF-11034.ogv
5434404 Physical damage, blast effect, Hiroshima, 1946-03-13 ~ 1946-04-08, 342-USAF-11071.ogv
5232118 A Day with Thomas Edison (1922).webm
5168431 1965-02-08 Showdown in Vietnam.ogv
5090636 Moon transit of sun large.ogg
4996850 President Kennedy speech on the space effort at Rice University, September 12, 1962.ogg
4983430 Burj Dubai Evolution.ogv
4981183 Message to Scientology.ogv

(Full data is here; note that it’s a 17 MB TSV file)
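For anyone wanting to check the sums, here is a rough sketch of how the full-year figure and the ranking above could be reproduced. It assumes each row can be reduced to a count followed by a file name, as in the listing above; the real TSV's column layout may well differ, and the function names are purely illustrative.

```python
def annualise(total, days_observed, days_in_year=365):
    """Scale a partial-year total linearly up to a full year."""
    return total * days_in_year / days_observed


def parse_line(line):
    """Split a 'count name' line (the layout of the listing above,
    assumed here) into an (int, str) pair. Only the first space is
    treated as the separator, since file names contain spaces."""
    count, name = line.strip().split(" ", 1)
    return int(count), name


def top_n(rows, n=30):
    """Return the n (count, name) pairs with the highest counts."""
    return sorted(rows, key=lambda row: row[0], reverse=True)[:n]


# Three lines copied from the listing, deliberately out of order.
sample = [
    "10882048 Bombers of WW1.ogg",
    "14436640 President Obama on Death of Osama bin Laden.ogv",
    "10675610 20090124 WeeklyAddress.ogv",
]
rows = [parse_line(line) for line in sample]

# 2,869 million views over roughly 340 days, scaled to 365 days.
print(round(annualise(2869, 340)))  # 3080 (millions)
print(top_n(rows, 2))
```

The scaling is purely linear, so it takes no account of any seasonal variation in viewing over the missing weeks.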

It’s an interesting mix – and every one of the top 30 is a video, not an audio file. I’m not sure there’s a definite theme there – though “public domain history” does well – but it’d reward further investigation…