Crime statistics

A couple of interesting blog posts on the BBC – part 1, part 2 – about a recent set of crime statistics publicised by the Conservatives.

The basic gist of the Conservative claim is that violent crime has vastly increased over the past decade; the basic problem is that the method of recording violent crime changed in the middle of the period, to a much more “permissive” approach, where police were obliged to record a complaint rather than dismissing it. This, unsurprisingly, tends to lead to a lot more reported crime, without actually saying anything about the underlying crime rates.

I suppose in an ideal world Labour would be running a campaign of “Do you really want to be governed by people who can’t read printed warnings on graphs?”, but sadly all we’ll get is a bit of he-said-she-said over the next two weeks and a few more people will be left believing that the country is a far scarier place now than it ever was.

Verified by Visa

Verified by Visa and MasterCard SecureCode: or, How Not to Design Authentication. [via]

This is a very interesting paper; it confirms most of the basic misgivings I’ve had about the 3D Secure model of online card approval. (Basically: it’s not that it’s inherently not very secure, although it is, it’s that it encourages people to be overly trusting of weird middleman attempts to get financial information. I mean… a frame pops up, which shows no obvious signs of whether or not it’s secure, coming from a domain which has no obvious connection to the card provider, registered in another country…)

Amazon and Macmillan

In an interesting move, Amazon (.com, anyway) recently pulled a large number of books published by Macmillan, or its imprints; this was a reaction to a dispute over how to establish the sale & distribution conditions for ebooks.

(Basically: two big players having a game of chicken, and someone is blinking a bit later than usual. It caused… some entirely justified outcry from the people caught in the middle.)

Charlie Stross has an interesting explanation about the two duelling models of the publishing supply chain here – basically, Amazon trying to grab a slice of the cake that previously went to the publishers.

Demographics in Wikipedia

There’s a lengthy internal debate going on in Wikipedia at the moment (see here, if you really want to look inside the sausage factory) about how best to deal with the perennial issue of biographies of living people, of which there are about 400,000.

As an incidental detail to this, people have been examining the issue from all sorts of angles. One particularly striking graph that’s been floating around shows the number of articles marked as being born or died in any given year from the past century:


As the notes point out, we can see some interesting effects here. Firstly – and most obviously – is the “recentism”; people who are alive and active in the present era tend to be more likely to have articles written about them, so you get more very recent deaths than (say) people who died forty years ago. Likewise, you have a spike around the late 1970s / early 1980s of births of people who’re just coming to public attention – in other words, people in their early thirties or late twenties are more likely to have articles written about them.

If we look back with a longer-term perspective, we can see that the effects of what Wikipedia editors have chosen to write about diminish, and the effects of demographics become more obvious. There are, for example, suggestions of prominent blips in the deathrate during the First and Second World Wars, and what may be the post-war baby boom showing up in the late 1940s.

So, we can distinguish two effects; underlying demographics, and what people choose to write about.

(In case anyone is wondering: people younger than 25 drop off dramatically. The very youngest are less than a year old, and are invariably articles about a) heirs to a throne; b) notorious child-murder cases; c) particularly well-reported conjoined twins or other multiple births. By about the age of five you start getting a fair leavening of child actors and the odd prodigy.)

Someone then came up with this graph, which is the same dataset drawn from the French Wikipedia:


At a glance, they look quite similar, which tells us that the overall dynamic guiding article-writing is broadly the same in both cases. This doesn’t sound that drastic a change, but different language editions can vary quite dramatically in things like standards for what constitutes a reasonable topic, so it is useful to note. French has a more pronounced set of spikes in WWI, WWII, and the post-war baby boom, though, as well as a very distinctive lowering of the birthrate during WWI. These are really quite interesting, especially the latter one, because it suggests we’re seeing a different underlying dynamic. And the most likely underlying dynamic is, of course, that Francophones tend to prefer writing about Francophones, and Anglophones tend to prefer writing about Anglophones…

So, how does this compare in other languages? I took these two datasets, and then added Czech (which someone helpfully collected), German and Spanish. (The latter two mean we have four of the five biggest languages represented. I’d have liked to include Polish, but the data was not so easily accessible.) I then normalised it, so each year was a percentage of the average for that language for that century, and graphed them against each other:
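For anyone wanting to reproduce the normalisation step, here is a minimal sketch in Python of what is described above: each year's count divided by that language's average over the whole period, expressed as a percentage. The function name and the toy figures are invented for illustration; they are not the real counts from any Wikipedia.

```python
def normalise_to_mean(counts):
    """Express each year's count as a percentage of the mean over all years.

    `counts` maps year -> number of articles (births or deaths) for one language.
    """
    if not counts:
        return {}
    mean = sum(counts.values()) / len(counts)
    if mean == 0:
        raise ValueError("cannot normalise a series whose mean is zero")
    return {year: 100.0 * n / mean for year, n in counts.items()}


# Hypothetical toy figures: three sample years for two languages.
toy_counts = {
    "en": {1910: 200, 1940: 400, 1980: 600},
    "de": {1910: 50, 1940: 150, 1980: 100},
}

# Normalise each language separately so that editions of very
# different sizes can be drawn on the same axis.
normalised = {lang: normalise_to_mean(c) for lang, c in toy_counts.items()}

for lang, series in normalised.items():
    print(lang, series)
```

A value of 100 means "exactly average for that language", so a large edition and a small one can be compared by shape rather than by raw size.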

What can we see from these? Overall, every project has basically the same approach to inclusion; ramping up steadily over time, a noticeable spike in people who died during WWII or in the past two decades, and a particular interest in people who are about thirty and in the public eye. There is one important exception to this last case – German, which has a flat birthrate from about 1940 onwards, and apparently no significant recentism in this regard. The same is true of Czech to a limited degree. (Anecdotally I believe the same may be true of Japanese, but I haven’t managed to gather the data yet.)

The WWII death spike is remarkably prominent in German and Czech, moderately prominent in French, and apparent but less obvious in English and Spanish. This could be differential interest in military history, where biographies tend to have deaths clustered in wartime, but it also seems rational to assume this reflects something of the underlying language-biased data. More Central Europeans died in WWII than Western Europeans; proportionally fewer died in the Anglosphere because English-speaking civilian populations escaped the worst of it, and the Spanish-speaking world was mostly uninvolved. The deaths in WWI are a lot more tightly clustered, and it’s hard to determine anything for sure here.

The other obvious spike in deaths is very easy to understand from either interpretation of the reason; it’s in 1936, in Spanish, which coincides with the outbreak of the Civil War. Lots of people to write articles about, there, and people less likely to be noted outside of Spain itself.

I mentioned above that (older) birthrates are more likely to represent an underlying demographic reality than deathrates are; localised death rates could be altered by a set of editors who choose to write on specific themes. You’d only get a birthdate spike, it seems, if someone was explicitly choosing to write about people born in a specific period; it’s hard to imagine it from a historical perspective. Historically linked people are grouped by when they’re prominent and active, and that happens at a variable time in their lives, so someone specifically writing about a group of people is likely to “smear” out their birthdates in a wide distribution.

So, let’s look at the historic births graph and see if anything shows up there. German and French show very clear drops in the birth rate between 1914 and about 1920, round U-shaped falls. German appears to have a systemic advantage over the other projects in birthrate through the 1930s and 1940s, though as the data is normalised against an average this may be misleadingly inflated – it doesn’t have the post-1970 bulge most languages do. The very sharp drop in births in 1945 is definitely not an artefact, though; you can see it to a lesser degree in the other languages, except English, where it’s hardly outside normal variance.
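The caveat about normalising against an average is easy to see with a small worked example. This is only a sketch with invented numbers, not the real data: two series with identical early counts, one of which has a late bulge and one of which stays flat.

```python
def as_percent_of_mean(series):
    """Return each value as a percentage of the series mean."""
    mean = sum(series) / len(series)
    return [100.0 * x / mean for x in series]


# Hypothetical counts for four periods: two early, two late.
with_late_bulge = [100, 100, 200, 200]  # most languages: more recent births
flat_throughout = [100, 100, 100, 100]  # German-like: no late bulge

bulge_norm = as_percent_of_mean(with_late_bulge)
flat_norm = as_percent_of_mean(flat_throughout)

# The raw early counts are identical (100 each), but the late bulge
# raises the mean, which pushes the early normalised values down.
print("early period, with bulge:", round(bulge_norm[0], 1))
print("early period, flat:      ", round(flat_norm[0], 1))
```

The flat series comes out looking about half as high again in the early periods, although the raw counts there are exactly the same; the difference comes entirely from the denominator.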

So, there does seem to be a real effect here; both these phenomena seem predictable as real demographic events, and the difference between the languages is interpretable as different populations suffering different effects in these periods and being represented to different degrees in the selection of people by various projects.

The next step would be, I suppose, to compare those figures to known birth and death rates both globally and regionally over the period; this would let us estimate the various degrees of “parochialism” involved in the various projects’ coverage of people, as well as the varying degrees of “recentness” which we’ve seen already. Any predictions?

On the love of books

A book of which I am greatly fond is the Philobiblon of Richard de Bury, an early fourteenth-century clergyman who was Bishop of Durham and Lord Chancellor under Edward III. For a text written by such an imposing figure, it is remarkably sweet; Philobiblon is literally “the love of books”. de Bury collected an immense amount of literature during his life – according to his biographer, “more than all the other English bishops put together”.

People, of course, found this a little odd, so he had to do something about it:

…we have resigned all thoughts of other earthly things, and have given ourselves up to a passion for acquiring books. That our intent and purpose, therefore, may be known to posterity as well as to our contemporaries, and that we may for ever stop the perverse tongues of gossipers as far as we are concerned, we have published a little treatise written in the lightest style of the moderns; for it is ridiculous to find a slight matter treated of in a pompous style. … And because it principally treats of the love of books, we have chosen, after the fashion of the ancient Romans, fondly to name it by a Greek word, Philobiblon.

The first section is devoted to explaining the importance of reading and learning, as it appeared to him.

Books delight us, when prosperity smiles upon us; they comfort us inseparably when stormy fortune frowns on us. They lend validity to human compacts, and no serious judgments are propounded without their help. Arts and sciences, all the advantages of which no mind can enumerate, consist in books. How highly must we estimate the wondrous power of books, since through them we survey the utmost bounds of the world and time, and contemplate the things that are as well as those that are not, as it were in the mirror of eternity. In books we climb mountains and scan the deepest gulfs of the abyss; in books we behold the finny tribes that may not exist outside their native waters, distinguish the properties of streams and springs and of various lands; from books we dig out gems and metals and the materials of every kind of mineral, and learn the virtues of herbs and trees and plants, and survey at will the whole progeny of Neptune, Ceres, and Pluto.

He explains the merits of reading:

…what pleasantness of teaching there is in books, how easy, how secret! How safely we lay bare the poverty of human ignorance to books without feeling any shame! They are masters who instruct us without rod or ferule, without angry words, without clothes or money. If you come to them they are not asleep; if you ask and inquire of them they do not withdraw themselves; they do not chide if you make mistakes; they do not laugh at you if you are ignorant.

and the terrible temptations of power:

…we were reported to burn with such desire for books, and especially old ones, that it was more easy for any man to gain our favour by means of books than of money. Wherefore, since supported by the goodness of the aforesaid prince of worthy memory, we were able to requite a man well or ill, to benefit or injure mightily great as well as small, there flowed in, instead of presents and guerdons, and instead of gifts and jewels, soiled tracts and battered codices, gladsome alike to our eye and heart.

After a while, it becomes apparent that what we’re reading is, in its way, a text on basic librarianship, filtered down from the fourteenth-century collector with an eye on the future.

He discusses collection management, explaining why he has collected books of poetry (because it is hard to understand the great authors if one cannot understand their allusions) and downplayed civil law; why he has preferred the classical authors but not neglected modern writings. He explains the need to provide standard reference works – Greek and Hebrew grammars, and perhaps even Arabic (for the “numerous astronomical treatises”); indeed, if they’re not available, to commission them and make them available.

He even deals with weeding and stock replacement, in a more direct way than we normally have to:

But because all the appliances of mortal men with the lapse of time suffer the decay of mortality, it is needful to replace the volumes that are worn out with age by fresh successors, that the perpetuity of which the individual is by its nature incapable may be secured to the species; and hence it is that the Preacher says: Of making many books there is no end.

But then we come to probably the best part: his lengthy rant near the end about The Scholars Of Today, and How They Are Just Really Vile.

But the race of scholars is commonly badly brought up, and unless they are bridled in by the rules of their elders they indulge in infinite puerilities. They behave with petulance, and are puffed up with presumption, judging of everything as if they were certain, though they are altogether inexperienced.

You may happen to see some headstrong youth lazily lounging over his studies, and when the winter’s frost is sharp, his nose running from the nipping cold drips down, nor does he think of wiping it with his pocket-handkerchief until he has bedewed the book before him with the ugly moisture. … But the handling of books is specially to be forbidden to those shameless youths, who as soon as they have learned to form the shapes of letters, straightway, if they have the opportunity, become unhappy commentators, and wherever they find an extra margin about the text, furnish it with monstrous alphabets, or if any other frivolity strikes their fancy, at once their pen begins to write it. There the Latinist and sophister and every unlearned writer tries the fitness of his pen, a practice that we have frequently seen injuring the usefulness and value of the most beautiful books.

Other hazards, apparently, included people filling books with pressed violets, dropping cheese into them, cutting the margins and flyleaves off to write letters on, laymen holding them upside down, children grubbying the illustrated capitals by touching them, and the vague horror of the “smutty scullion reeking from his stewpots”. (I have never had problems with people cutting the margins off, but every other one of these is familiar in some way…) Not quite the image of the silent, austere, medieval monastery we have in mind most of the time!

He then explains why he has collected these books – “to found in perpetual charity a Hall in the reverend university of Oxford, the chief nursing mother of all liberal arts, and to endow it with the necessary revenues, for the maintenance of a number of scholars” – and includes, presumably so it wouldn’t get misplaced, a copy of the charter for its library.

It’s quite an interesting set of rules, as a historical document; a committee of five was to run the library, three of whom could agree to lend out anything the library had a duplicate copy of, if they were given a pledge of equal value in response, and the borrower’s name was carefully written down. Once a year books were to be brought back so that they could be seen by the librarians, and they were not allowed to be taken outside of the city or its environs. And, every year, the librarians were to check every volume was accounted for… which, to my great amusement, is recommended to happen in the first week of July, high summer and – these days – prime stock-checking time.

Sadly, de Bury’s excessive collecting took its toll – he died in exceptional poverty, and his personal library was broken up and sold to pay his debts. The college at Oxford was formed – the plan completed by his successor – and lasted until the Reformation, when it ceased to exist; a few of its buildings are now absorbed into Trinity College, if you look carefully enough. His library never made it there, and its catalogue is now lost; we only know of two volumes from it, one at the Bodleian and one at the British Library.

The book survived, though; given its appeal, it’s not hard to see why. The full text has been digitised by both the University of Virginia Library and Project Gutenberg, and is worth an hour’s reading; the edition I have, a lovely pocket-sized hardback from 1902, has also been digitised by the Internet Archive with page images, if you prefer that sort of thing.

Newspaper priorities

I’ve just dealt with a pile of today’s and yesterday’s newspapers.

The Guardian, the Times and the Independent, both days: large full-colour photograph of Haiti on the front page as the main headline story, four inside pages of coverage (six in today’s Independent, and a few more in the second section of today’s Guardian) plus a scattering of editorials or leader articles.

The Telegraph and the Financial Times, both days: Haiti prominent, but other headline stories as well. One or two inside pages; the Telegraph also runs a background feature on Haiti’s history.

And, then, the Daily Mail: two inside pages each day, no front-page mention. The stories that displaced it, you’ll be pleased to know, were Gary McKinnon getting judicial review, and some research on a possible Alzheimer’s test; the front-page photographs were of Kate McCann and Beyoncé.

I shouldn’t be surprised by this, but, really. The only element of it that doesn’t seem like self-parody is that neither story was about Labour incompetence causing a crash in house prices.

Google leaving China?

If this goes ahead it’ll be a pretty far-reaching move:

We launched Google.cn in January 2006 in the belief that the benefits of increased access to information for people in China and a more open Internet outweighed our discomfort in agreeing to censor some results. …

These attacks and the surveillance they have uncovered–combined with the attempts over the past year to further limit free speech on the web–have led us to conclude that we should review the feasibility of our business operations in China. We have decided we are no longer willing to continue censoring our results on Google.cn, and so over the next few weeks we will be discussing with the Chinese government the basis on which we could operate an unfiltered search engine within the law, if at all. We recognize that this may well mean having to shut down Google.cn, and potentially our offices in China.

[Full post]

Google’s been pretty widely criticised for its Chinese operations in the past; this is certainly to their credit, though I suspect those who’re most opposed to them won’t see stopping as much of a penance for having done it to begin with. (It’ll also shake up the Chinese internet market a bit – Google apparently has a market share of a quarter to a third of all searches there)

What’s interesting about the post is what it doesn’t say. The attacks are “highly sophisticated and targeted”, and a primary goal was reading the mail of individual human rights activists, not something that you’d routinely aim for as part of corporate espionage. They’re pointedly not accusing the government, not in so many words; it’s just hanging there waiting to see who runs with it.

Four hundred years ago today

7th January, 1610. Galileo turned his telescope to the sky, and “I have seen Jupiter accompanied by three fixed stars…”.

Die itaque septima Ianuarii, instantis anni millesimi sexcentesimi decimi, hora sequentis noctis prima, cum cælestia sidera per Perspicillum spectarem, Iuppiter sese obviam fecit; cumque admodum excellens mihi parassem instrumentum (quod antea ob alterius organi debilitatem minime contigerat), tres illi adstare Stellulas, exiguas quidem, veruntamen clarissimas, cognovi; quæ, licet e numero inerrantium a me crederentur, nonnullam tamen intulerunt admirationem, eo quod secundum exactam lineam rectam atque Eclipticæ parallelam dispositæ videbantur, ac cæteris magnitudine paribus splendidiores.

(I am faintly pleased that I understand more than three words of that. Life’s simple pleasures.)

Here’s where it all began, in a way. Four centuries on, we can talk quickly and happily of three-hundred-mile sulphur plumes, ice-bound planetary oceans teeming with hypothetical life, moons larger than planets, or geological features a thousand miles across; we have photographed them from near and from far, mapped them to a point, studied and speculated for lifetimes. They allowed us, less than a century later, to work out a fundamental physical constant for the first time. We have even worked with them so closely that we have begun to worry about the risk of physically contaminating them, amazing as it sounds.

But they’re still four little dots in the sky, clustered around a bigger dot; people can still pick up binoculars for the first time, or a cheap child’s telescope, look up, and feel the same rush. This is the easiest way to see it; they’re the most available sign that the system works, that there are some kind of universal laws out there, and that we ourselves can stand still and watch them played out in front of our eyes, regular as clockwork. I remember years ago, clustered in a telescope dome in a night which felt almost as cold as this one, watching the three moons strung out around Jupiter, the blurry cloud bands of light and dark on its surface… and then a perfect black circle in the middle of the southern hemisphere, as Io swung between it and the sun, letting us see an eclipse from four hundred million miles away. (We were meant to be looking at Saturn, but you don’t forget that kind of opportunity. I still have the slip of paper with the picture in a box somewhere.)

There’s a good set of posts here on the discovery and its context; if you feel up to the Latin, there’s a transcription of the Sidereus Nuncius here, or a scanned copy here (the Jovian moons are from leaf 17 on). Note the diagrams.

Snow, snow, snow

It may not have escaped anyone’s notice that Oxford is under six inches or so of snow. I called into work this morning, to see what was happening, and got told “— isn’t coming in, nor is —. And neither are you.”

Well, I know better than to argue. So, a day off to go photographing!

The real delight was Hinksey Lake, which was entirely iced (or at least slushed) over. Large amounts of it were covered in loose drifts of snow, with occasional duck-tracks; here and there were small craters where a duck had flown in and landed too heavily, or some snow had fallen in and broken through the crust.

The last couple make me think of photographs of an icy surface somewhere in the outer solar system; craters on Europa or Callisto, perhaps. (The one with a buoy, meanwhile, looks like an Antarctic research station seen from the air.)