Archive for August, 2016

History of Parliament and Wikidata – the first round complete

Sunday, August 14th, 2016

Back in January, I wrote up some things I was aiming to do this year, including:

Firstly, I’d like to clear off the History of Parliament work on Wikidata. I haven’t really written this up yet (maybe that’s step 1.1) but, in short, I’m trying to get every MP in the History of Parliament database listed and crossreferenced in Wikidata. At the moment, we have around 5200 of them listed, out of a total of 22200 – so we’re getting there. (Raw data here.) Finding the next couple of thousand who’re listed, and mass-creating the others, is definitely an achievable task.

Well, seven months later, here’s where it stands:

  • 9,372 of a total 21,400 (43.7%) of History of Parliament entries been matched to records for people in Wikidata.
  • These 9,372 entries represent 7,257 people – 80 have entries in three HoP volumes, and 1,964 in two volumes. (This suggests that, when complete, we will have about ~16,500 people for those initial 21,400 entries – so maybe we’re actually over half-way there).
  • These are crossreferenced to a lot of other identifiers. 1,937 of our 7,257 people (26.7%) are in the Oxford Dictionary of National Biography, 1,088 (15%) are in the National Portrait Gallery database, and 2,256 (31.1%) are linked to their speeches in the digital edition of Hansard. There is a report generated each night crosslinking various interesting identifiers.
  • Every MP in the 1820-32 volume (1,367 of them) is now linked and identified, and the 1790-1820 volume is now around 85% complete. (This explains the high showing for Hansard, which covers 1805 onwards)
  • The metadata for these is still limited – a lot more importing work to do – but in some cases pretty decent; 94% of the 1820-32 entries have a date of death, for example.

Of course, there’s a lot more still to do – more metadata to add, more linkages to make, and so on. It still does not have any reasonable data linking MPs to constituencies, which is a major gap (but perhaps one that can be filled semi-automatically using the HoP/Hansard links and a clever script).

But as a proof of concept, I’m very happy with it. Here’s some queries playing with the (1820-32) data:

  • There are 990 MPs with an article about them in at least one language/WM project. Strikingly, ten of these don’t have an English Wikipedia article (yet). The most heavily written-about MP is – to my surprise – David Ricardo, with articles in 67 Wikipedias. (The next three are Peel, Palmerston, and Edward Bulwer-Lytton).
  • 303 of the 1,367 MPs (22.1%) have a recorded link to at least one other person in Wikidata by a close family relationship (parent, child, spouse, sibling) – there are 803 links, to 547 unique people – 108 of whom are also in the 1820-32 MPs list, and 439 of whom are from elsewhere in Wikidata. (I expect this number to rise dramatically as more metadata goes in).
  • The longest-surviving pre-Reform MP (of the 94% indexed by deathdate, anyway) was John Savile, later Earl of Mexborough, who made it to August 1899…
  • Of the 360 with a place of education listed, the most common is Eton (104), closely followed by Christ Church, Oxford (97) – there is, of course, substantial overlap between them. It’s impressive to see just how far we’ve come. No-one would ever expect to see anything like that for Parliament today, would we.
  • Of the 1,185 who’ve had first name indexed by Wikidata so far, the most popular is John (14.4%), then William (11.5%), Charles (7.5%), George (7.4%), and Henry (7.2%):

  • A map of the (currently) 154 MPs whose place of death has been imported:

All these are of course provisional, but it makes me feel I’m definitely on the right track!


So, you may be asking, what can I do to help? Why, thankyou, that’s very kind…

  • First of all, this is the master list, updated every night, of as-yet-unmatched HoP entries. Grab one, load it up, search Wikidata for a match, and add it (property P1614). Bang, one more down, and we’re 0.01% closer to completion…
  • It’s not there? (About half to two thirds probably won’t be). You can create an item manually, or you can set it aside to create a batch of them later. I wrote a fairly basic bash script to take a spreadsheet of HoP identifiers and basic metadata and prepare it for bulk-item-creation on Wikidata.
  • Or you could help sanitise some of the metadata – here’s some interesting edge cases:
    • This list is ~680 items who probably have a death date (the HoP slug ends in a number), but who don’t currently have one in Wikidata.
    • This list is ~540 people who are titled “Honourable” – and so are almost certainly the sons of noblemen, themselves likely to be in Wikidata – but who don’t have a link to their father. This list is the same, but for “Lord”, and this list has all the apparently fatherless men who were the 2nd through 9th holders of a title…