Canadian self-reported birthday data

February 22nd, 2015

In the last post, we saw strong evidence for a “memorable date” bias in self-reported birthday information among British men born in the late 19th century. In short, they were disproportionately likely to think they were born on an “important day” such as Christmas.

It would be great to compare this result with other sources. However, finding a suitable dataset is challenging. We need a sample that covers a large number of men over several years, and that is unlikely to have been cross-checked against or drawn from official documentation such as birth certificates or parish registers. It also has to list full birthdates explicitly (not just the month or year).

WWI enlistment datasets are quite promising in this regard – lots of men, born about the same time, turning up and stating their details without particularly much of a reason to bias individual dates. The main British records have (famously) long since burned, but the Australian and Canadian records survive. Unfortunately, the Australian index does not include dates of birth, but the Canadian index does (at least, when known). So, does it tell us anything?

The index is available as a 770 MB+ XML blob (oh, dear). Running this through xmllint produces a nicely formatted file with approximately 575,000 birthdays for 622,000 entries. It’s formatted in such a way as to imply there may be multiple birthdates listed for a single individual (presumably if there’s contradictory data?), but I couldn’t spot any cases. There are also about ten thousand entries that don’t have nicely formatted dd/mm/yyyy dates; let’s omit those for now. Quick and dirty, but probably representative.
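A minimal sketch of the filtering and tallying step, assuming the birthdate strings have already been pulled out of the XML (the element names are not given here, so extraction is left out). The function names `parse_birthdate` and `count_birthdays` are hypothetical, and rejecting impossible dates such as 31/02 is an extra assumption beyond the "nicely formatted" test described above.

```python
import re
from datetime import date
from collections import Counter

# Strictly dd/mm/yyyy: two digits, two digits, four digits.
DATE_RE = re.compile(r"^(\d{2})/(\d{2})/(\d{4})$")


def parse_birthdate(raw):
    """Return a date for a well-formed dd/mm/yyyy string, else None."""
    match = DATE_RE.match(raw.strip())
    if match is None:
        return None
    day, month, year = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # Right shape but impossible date, e.g. 31/02/1890.
        return None


def count_birthdays(raw_dates):
    """Tally (month, day) pairs; also report how many strings were skipped."""
    counts = Counter()
    skipped = 0
    for raw in raw_dates:
        parsed = parse_birthdate(raw)
        if parsed is None:
            skipped += 1
        else:
            counts[(parsed.month, parsed.day)] += 1
    return counts, skipped
```

Calling `count_birthdays` on the full list of raw strings gives one count per calendar day plus the number of entries that were dropped, which is a useful sanity check against the "about ten thousand" figure.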

And so…

There’s clearly a bit more seasonality here than in the British data (up in spring, down in winter), but also the same sort of unexpected one-day spikes and troughs. As this is quite rough, I haven’t corrected for seasonality, but we still see something interesting.

The highest ten days are: 25 December (1.96), 1 January (1.77), 17 March (1.56), 24 May (1.52), 1 May (1.38), 15 August (1.38), 12 July (1.36), 15 September (1.34), 15 March (1.30).

The lowest ten days are: 30 December (0.64), 30 January (0.74), 30 October (0.74), 30 July (0.75), 30 May (0.78), 13 November (0.78), 30 August (0.79), 26 November (0.80), 30 March (0.81), 12 December (0.81).
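The figures in brackets read as ratios of observed to expected births per day. The exact normalisation is not spelled out here, so the following is only a sketch under one plausible assumption: every calendar day is equally likely, except 29 February, which gets a quarter of a normal day's weight. The names `relative_frequencies`, `extremes` and `leap_day_weight` are hypothetical, and no seasonal correction is applied.

```python
def relative_frequencies(counts, leap_day_weight=0.25):
    """Divide each (month, day) count by its expected count.

    Assumption: births are spread uniformly over the year, with
    29 February weighted at `leap_day_weight` of an ordinary day.
    """
    total = sum(counts.values())
    per_day = total / (365 + leap_day_weight)  # expected count, ordinary day
    ratios = {}
    for (month, day), observed in counts.items():
        weight = leap_day_weight if (month, day) == (2, 29) else 1.0
        ratios[(month, day)] = observed / (per_day * weight)
    return ratios


def extremes(ratios, k=10):
    """Return the k highest and k lowest days, each ordered most extreme first."""
    ranked = sorted(ratios.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k], ranked[-k:][::-1]
```

Under this scaling a value of 1.0 means a day has exactly its expected share, so a day at 1.96 has nearly twice as many reported birthdays as a uniform spread would predict.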

The same strong pattern for “memorable days” that we saw with the UK is visible in the top ten – Christmas, New Year, St. Patrick’s, Victoria Day, May Day, [nothing], 12 July, [nothing], [nothing].

Two of these are distinctively “Canadian” – both 24 May (the Queen’s birthday/Victoria Day) and 12 July (the Orange Order marches) are above average in the British data, but not as dramatically as they are here. Both appear to have been relatively more prominent in late-19th/early-20th century Canada than in the UK. Canada Day/Dominion Day (1 July) is above average but does not show up as sharply, possibly because it does not appear to have been widely celebrated until after WWI.

One new pattern is the appearance of the 15th of the month in the top ten. This was suggested as likely in the US life insurance analysis and I’m interested to see it showing up here. Another oddity is the leap day – in the British data, 29 February was dramatically undercounted. In the Canadian data, it’s strongly overcounted – just not quite enough to get into the top ten. 28 February (1.28), 29 February (1.27) and 1 March (1.29) are all “memorable”. I don’t have an explanation for this, but it does suggest an interesting story.

Looking at the lowest days, we see the same pattern of 30/xx dates being very badly represented – seven of the ten lowest dates are the 30th of the month, and all seven fall in months with 31 days. This is exactly the same pattern we observed in the UK data, and I don’t have any convincing explanation for it. The other three dates all fall in low-birthrate months.
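The claim about the 30th is easy to check mechanically. This short sketch uses only the ten lowest days as listed above; the variable names and the choice of 1891 as a reference year are arbitrary.

```python
import calendar

# The ten lowest days listed above, as (day, month) pairs.
LOWEST_TEN = [(30, 12), (30, 1), (30, 10), (30, 7), (30, 5),
              (13, 11), (30, 8), (26, 11), (30, 3), (12, 12)]


def month_length(month, year=1891):
    """Days in the month; 1891 is an arbitrary non-leap year."""
    return calendar.monthrange(year, month)[1]


# Entries that fall on the 30th, and those among them in 31-day months.
thirtieths = [(d, m) for d, m in LOWEST_TEN if d == 30]
in_long_months = [(d, m) for d, m in thirtieths if month_length(m) == 31]

# Every 31-day month in the calendar, for comparison.
long_months = {m for m in range(1, 13) if month_length(m) == 31}

print(len(thirtieths), len(in_long_months))  # prints "7 7"
print({m for _, m in in_long_months} == long_months)  # prints "True"
```

The second line of output is worth noting: the seven low 30ths are not just in 31-day months, they cover every one of the seven 31-day months.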

So, in conclusion:

  • Both UK and Canadian data from WWI show a strong bias for people to self-report their birthday as a “memorable day”;
  • “Memorable” days are commonly a known and fixed festival, such as Christmas;
  • Overreporting of arbitrary numbers like the 15th of the month is more common in Canada (& possibly the US?) than the UK;
  • The UK and Canadian samples seem to treat 29 February very differently – Canadians overreport, British people underreport;
  • There is a strong bias against reporting the 30th of the month, particularly in months with 31 days.

Thoughts (or additional data sources) welcome.


2 Responses to “Canadian self-reported birthday data”

  1. Nicolas Bouliane Says:

    A word of warning about the Canadian Expeditionary Force data set: The dates it contains are not reliable, as their format is completely arbitrary. Moreover, many Canadians lied to get into the army, as mentioned by Library and Archives Canada.

    I’ve been working on the same data set for a while, too.

  2. Good job Says:

    Good post, keep
