Graphing Shakespeare

Today I came across a lovely project from JSTOR & the Folger Library – a set of Shakespeare’s plays, with each line annotated by the number of times it is cited or discussed in articles on JSTOR.

“This is awesome”, I thought, “I wonder what happens if you graph it?”

So, without further ado, here’s the “JSTOR citation intensity” for three arbitrarily selected plays:

Blue is the number of citations per line; red marks lines with no citations. In no particular order, a few things that immediately jumped out at me –

  • basically no-one seems to care about the late middle – the end of Act 2 and the start of Act 3 – of A Midsummer Night’s Dream;
  • “… a tale / told by an idiot, full of sound and fury, / signifying nothing” (Macbeth, 5.5) is apparently more popular than anything else in these three plays;
  • Othello has far fewer “very popular” lines than the other two.

Macbeth has the most popular bits, and is also the most densely cited – only 25.1% of its lines were never cited, against 30.3% in Othello and 36.9% in A Midsummer Night’s Dream.

I have no idea if these are actually interesting thoughts – my academic engagement with Shakespeare more or less reached its high-water mark sixteen years ago! – but I liked them…


How to generate these numbers? Copy-paste the page into a blank text file (called text), then use the following bash command to clean it all up –

grep "FTLN " text | sed 's/^.*FTLN/FTLN/g' | cut -b 10- | sed 's/[A-Z]/ /g' | cut -f 1 -d " " | sed 's/text//g' > numberedextracts

Paste the contents of numberedextracts into a spreadsheet against a column numbered 1-4000 or so, and graph away…
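
If you would rather skip the manual numbering, the same step can be roughed out on the command line, along with the "never cited" percentages quoted earlier. This is only a sketch: it assumes numberedextracts ends up with one citation count per line, and that a blank line or a 0 means the line was never cited. The function names and the citations.csv file name are made up for illustration.

```shell
# Sketch only: assumes numberedextracts has one citation count per line,
# and that a blank line or a 0 means "never cited" (an assumption).

# number_lines FILE: prefix each count with its line number, as CSV.
number_lines() {
    awk -v OFS=',' '
        BEGIN { print "line", "citations" }
        { print NR, ($1 == "" ? 0 : $1) }
    ' "$1"
}

# percent_uncited FILE: percentage of lines with no citations.
percent_uncited() {
    awk '
        { total++; if ($1 + 0 == 0) uncited++ }
        END { if (total > 0) printf "%.1f\n", 100 * uncited / total }
    ' "$1"
}

# Usage (file names are illustrative):
#   number_lines numberedextracts > citations.csv
#   percent_uncited numberedextracts
```

The CSV opens directly in a spreadsheet with the line number already in the first column, so the graphing step stays the same.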