Marking authorship in texts

December 27th, 2012

While writing something about Wikipedia, and talking about the idea of traceable attribution of text, I’ve been thinking of ways in which works with multiple discrete authors have displayed the different contributions of those authors.

At one extreme, there’s a fully “collaborative” work – no-one makes a distinction between the two authors, and while they’re named on the title page the writing is implicitly attributed to both. At the other extreme, we have individual chapters or articles – A writes chapter 1, B writes chapter 2, etc., and they may never have known of the other contributors.

In the middle, there are cases where the work is broadly collaborative but with individual elements – the main text is jointly written, but particular contributors sign their own footnotes, sidebar sections, forewords, appendices, etc.

The one that interests me, though, is something I saw in I.S. Shklovsky’s Intelligent Life in the Universe when I read it as a student – I seem to have lost my copy in the intervening ten years, so this is from memory.

The book was originally published in the USSR in the early 1960s, and translated and expanded in English with the aid of Carl Sagan later in the decade. The original text was updated by Sagan, who also added several new chapters; the two then shared drafts, editing “each other’s” sections. Given the political climate, however, they were keen to avoid claiming to be in agreement on some sensitive topics, and so they experimented with explicitly marking the appearance of a single voice in the text itself.

In the end, the result ran something like:

Lorem ipsum dolor sit amet, consectetur adipisici elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. ▲Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.▼ △Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.▽

Unmarked text was jointly written; black triangles marked remarks by one author, and white triangles those by the other. (On at least one occasion, delightfully, they started arguing.)

So, the question: was this something common in the period that I’ve just never noticed elsewhere? Is there a name for it? What other novel ways of marking authorship have been used?

Letters from the West

November 11th, 2012

The British Library has recently released the first tranche of some material it digitised as part of the Europeana 1914-1918 program. Most of this first installment involves papers from the India Office Records, which (for various reasons) ended up being transferred from the FCO to the British Library rather than the National Archives. As the Indian Government was responsible for India’s participation in the war, they include all sorts of unexpected primary sources – official reports, intelligence briefings, policy papers, etc. – rather than the more usual printed histories.

But the most interesting, by far, are the “Reports of the Censor of Indian Mails in France”. The system of censorship in force at the time had two goals – the first was the most obvious, to prevent the transmission of negative rumours or sensitive information, either by obscuring the comments or by returning the letters unsent. The second was more subtle; the censors who were reading the letters used them to prepare reports on morale, the reaction of front-line soldiers to news, and the like. Inbound mail was also censored, with much the same effect.

The Indian reports contained a brief summary of themes and comments in the censored mail for the period, along with a selection of translated extracts giving the name of the sender and recipient, and a note of the language they had written in. The originals weren’t kept, and the chances are that very few survive anywhere.

Many letters deal with the fighting, with reports of the war. Others are simply slices of life, reports of the strange world far from home:

I had an opportunity of seeing London. It is an enormous city. It took about an hour and a half for the train to go through the city without stopping. The buildings are high, the streets are very clean. There are many big factories. Many kinds of gardens, fields without any walls, short cows with long hair and short horns are some of the objects that came to my notice. (“a Mahratta Brahmin”, 21/1/15)

No one has any clue as to the language of this place. Even the British soldiers do not understand it. They call milk “doolee” and water “deolo” [du lait, de l'eau]. There is much comeliness in this country. The people dress themselves like the English. Of black men they think a great deal. No one keeps “parda” as we do. The country is a very open one. The ladies shake hands freely. They are not bashful about this. They do as the English do. (“X.Y., a wounded Sikh”, 27/1/15)

We enjoyed the Saloono festival as best as it could be enjoyed by a foreigner in a distant country far from home. Here we managed to get camphor, sandal wood and other necessities of “Hawan” which was done with Vedic mantras in France, which the French people might never have expected. We also get “saimis” here, they are manufactured in Paris as well as in Italy and are sold in small packets. (Ram Seran Das, August 1915)

The French language:
Kya, tum mere sath aoge? = Walé wo wené éwac má?
Tumhara ghar kidhar hai = U é watr mézon?
Main tum ko pyar karta hun = Y ém wu boku. (Jemadar Sohbat Khan, 29/8/15)

When this letter was written we four, viz, Gul Din, Gul Shah, Rakib Shah, and I were sitting under a tree, eating apples and pears and had made a pipe out of an empty shell-case and were smoking, with the pipe standing in front of us. (Jemadar Zar Gir, 57th Rifles, 30/8/15)

In this country rain falls every day. The country is cold and abounds in fruit. (Sowar Sharif Khan, 13/9/15)

As to your request to send you a copy of the Qu’ran, I have already written and told you that I cannot get one here. What is the use of repeating it? If I could get one here, I would send it. You say the Qu’ran can be got in London, but London is 52 miles from here [Brighton] and we do not go there. (Khadim Ali Khan, 17/10/15)

These are the result of skimming two or three volumes; there’s a wealth of social history buried in these papers, and it would really reward some intensive reading.

They’re all listed through the Digitised Manuscripts interface, which is a little tricky to use; for reference, here’s a full index of the digitised papers by date covered:

The origins of “scientists”

October 16th, 2012

(Hello all! I haven’t touched this blog in months. I really should post an update soon…)

So, today is Ada Lovelace Day. I’m working on preparing some material for the Royal Society event we’re running on Friday (more of which anon), and looking at Orlando to find what content is in there.

To my surprise, for Mary Somerville, it notes:

March 1834: Mathematician William Whewell’s anonymous assessment of On the Connexion of the Physical Sciences by MS in the Quarterly Review took up the question of gender difference (and proposed the adoption of a new word, ‘scientist’). This word, which Whewell had coined in a talk in 1833, he now proposed in print as necessary to embrace all enquirers into different aspects of the natural world.

Well, that was an unexpected footnote. The word “scientist” first appeared in print in a review of a book by a woman, who was writing to argue for a uniform model of the natural sciences.

scientist, n. 1. A person with expert knowledge of a science; a person using scientific methods. [citations:] 1834 Q. Rev. LI. 59.

The encyclopedia anyone can [be told to] edit

February 10th, 2012

A moment of amusement, from the (thankfully) long-distant past:

The Great Soviet Encyclopedia, which contains more than 100,000 entries and fills fifty-one volumes, includes some distortions so flamboyant as to be beyond belief. These are an old story. But such distortions have importance [...]

Almost everyone has heard about what happened to Beria in the Encyclopedia. After his liquidation, subscribers were notified, with full instructions, that they should snip out the article about him and insert in its place substitute articles which were duly enclosed, about the Bering Strait and an obscure eighteenth-century statesman named Berholtz. These were the best available substitutes beginning with ‘Ber’. During Stalin’s day when the party line changed on some matter so important that the Encyclopedia itself had to be changed, subscribers were obliged to turn in the volume affected to the party secretary; it was pulped and a new whole volume, cut and patched, was then sent out to the subscriber. Nowadays the reader is allowed to keep the book, and trusted to make the proper emendation himself. Progress!

Another person ‘expelled’ from the Encyclopedia was a Chinese Communist leader, Kao Kang. To replace him, a substitute page went out dealing with a city in Tibet. [...] In their haste to make the revision, the editors overlooked the fact that the same Tibetan city also appeared elsewhere in the Encyclopaedia, spelled differently.

– John Gunther, Inside Russia Today (Penguin, 1964).


February 10th, 2012

Some photographs from the snowfall in Cambridge earlier in the week:

Cam in snow

Towpath at night

Snow on branches

Pots in snow

His and hers … cameras?

November 24th, 2011

From’s camera section:

His and Hers ... cameras

…yeah. “Gifts for Her, Gifts for Him”. It is apparently now a useful commercial approach to gender cameras. (Interestingly, this is the only part of the “Electronics” section which has his-and-hers gift recommendations – I wonder why…)

Note that one camera, the Olympus XZ-1, is even on both lists. For women, it comes in white at £289.99, and for men it comes in black at £311.08. I don’t even want to know the logic behind that one.

I really should have thought of this earlier

November 23rd, 2011

Every now and again, I find myself with a pile of telephoto shots of something which was very hard to focus on properly, where I want to select the best few images and crop them for display. If I’ve made a hundred images, this can get very tedious – I have to manually zoom in on each one to see how sharp it is before comparing it to the next.

Tedious, repetitive tasks. Surely, this is something a computer can do for me? Lo and behold, ImageMagick saves the day…

convert -crop 1024x768+1632+1040 *.JPG -set filename:f 'crop_%t.%e' +adjoin '%[filename:f]'

…takes a series of 4288×2848 pictures, crops out the centre 1024×768, and drops this into a separate file called crop_FILENAME. Skimming through these is far quicker…
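The offsets in that geometry are just the margins needed to centre the crop in the frame. A minimal Python sketch of the arithmetic, for working out the numbers for a different camera or crop size; `centre_crop_geometry` is a hypothetical helper name of my own, not something ImageMagick provides:

```python
def centre_crop_geometry(width, height, crop_w, crop_h):
    """Return a WxH+X+Y geometry string for a crop centred in the image."""
    if crop_w > width or crop_h > height:
        raise ValueError("crop is larger than the image")
    # The offset is half of whatever is left over on each axis.
    x_offset = (width - crop_w) // 2
    y_offset = (height - crop_h) // 2
    return f"{crop_w}x{crop_h}+{x_offset}+{y_offset}"


# The 4288x2848 frames and 1024x768 crop described above.
print(centre_crop_geometry(4288, 2848, 1024, 768))  # prints 1024x768+1632+1040
```

The result can be pasted straight into the -crop argument.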

I know, I know, trivial solutions. But it saves me a lot of time. And as a result:



…pictures of the woodpecker outside my living-room window, shot with a D90 and an old manually focused f/5.5 300mm lens.

It works! I had almost two hundred frames to run through to find these (which may explain why they waited a month and a half for me to get around to it…)

Moments of peace

November 11th, 2011

We think of the Armistice as being a moment of flags, of applause, of music in the silent air. But, for many, it was just a quiet morning; millions of men, sitting in the dust and the frost, looking around them and wondering what to do next. An eyewitness:

November 11th.—There had been so much talk of an armistice that a Brigade message in the morning telling us of its having been signed at 8 o’clock, and that hostilities were to cease at 11, fell somewhat flat. The event was anticlimax relieved by some spasmodic cheering when the news got about, by a general atmosphere of ‘slacking off for the day’, and by the notes of a lively band in the late afternoon. The men betook themselves to their own devices. There was a voluntary Service of Thanksgiving in the cinema which the Germans had built; the spacious building was quite full. [...] ‘To me the most remarkable feature of that day and night was the uncanny silence that prevailed. No rumbling of guns, no staccato of machine-guns, nor did the roar of exploding dumps break into the night as it had so often done. The War was over.

November 12th.—Baths were a first concern.

— The War The Infantry Knew, 1914-1919, ed. Capt. J.C. Dunn.

JSTOR: where does your money go?

July 22nd, 2011

Writing some comments elsewhere about the recent events involving JSTOR, I said something along the lines of – well, they’re a nonprofit organization, unlike most journal publishers. Then it occurred to me: they say that, but they’re remarkably reticent. What sort of nonprofit? Where does their money go? After all, the fees paid by member organizations can’t all go on servers; either there’s an endowment being built up to support the work (which would actually be a pretty smart move), or the publishers aren’t doing badly out of it.

So, let us dig a little. Who are JSTOR? How does their money flow work? Their site tells us:

JSTOR is part of ITHAKA, a not-for-profit organization helping the academic community use digital technologies to preserve the scholarly record and to advance research and teaching in sustainable ways.
©2000-2011 ITHAKA. All Rights Reserved. JSTOR®, the JSTOR logo, and ITHAKA® are registered trademarks of ITHAKA.

Okay, so, we have a name. Their About pages don’t give much more information; no details on who exactly this “non-profit organization” is. No annual report, of course, god forbid. They do give a contact address, in New York – on Fifth Avenue, in fact, very fancy – and so the obvious guess is that they’re a New York corporation.

And, lo and behold, they are: “Ithaka Harbors, Inc.”. They changed their name when JSTOR and Ithaka amalgamated in 2009. The older iteration of Ithaka can be found as a Delaware corporation operating in New York. Confusingly, JSTOR remained in existence, absorbed Ithaka, and changed its name.

A little more digging turns up the current Form 990 for the merged organization (and some older ones for JSTOR alone) here. It does indeed seem to have 501(c)(3) tax-exempt status, though they’re not very helpful about letting us find the paperwork.

Well, we have it now. JSTOR/Ithaka turned over sixty million dollars in 2009, and employed 211 people. The 2007 & 08 reports both give around $45m in turnover; let’s look at 2008, to strip out the effect of the amalgamation so that we’re only looking at the “JSTOR division”.

To briefly explain the charging, first, when an organisation joins JSTOR it pays an upfront capital sum (the ACF) and then an annual subscription (AAF); the general idea is that the ACF pays for the cost of building the archive and the AAF pays for the actual day-to-day service. Poking around the various fees pages suggests the ACF varies wildly by institution and by which content you’re taking, but an average of double the annual fee seems plausible.

The income breakdown, from a total of $43.5m, was $8.6m in Archive Capital Fees, $30.3m in Annual Access Fees, and $1.8m in Service Revenue. What “service revenue” covers is unclear. Buried down in section 11, meanwhile, is the intriguing “miscellaneous revenue”: $133k in publishers fees, $35k in remote session fees, $145k in pay-per-view. This other revenue was then offset by a loss of a third of a million, which is later explained as a currency loss – presumably the vagaries of foreign exchange in a volatile year.

The next section lists expenses of “FEES AND PUBLISHERS PAYMENTS”, $8,358,557, of which $8,242,126 is attributable to program costs rather than management overhead. Journal scanning accounts for about three million – though this is low, it was eleven million in 2007 and five in 2009 – with another five million on administrative costs & travel, three million on IT, eleven and a half million on salaries and staff costs. A million went to “old” Ithaka in grants, a million was written off as depreciation, a million on “occupancy” (rent?), and then some small bits of change like conference costs. Overall, an eight-million-dollar surplus, but the next year was a deficit; the fluctuations of scanning charges probably come into play here.

The payroll covers 113 staff, of whom 12 seem to be listed as officers, directors, etc. The senior staff average a salary of ~$155k, with the ED paid $300k, while the other staff average about $67k.

So, some interesting points.

  • The figure of $145k for individual articles is definitely interesting – only 0.35% of JSTOR’s revenue came from pay-per-view cases? This is vastly lower than I expected; quite possibly the prices are so high (and JSTOR access so common, academically) that very few people are willing to pay and unable to circumvent it via a friend. The estimate quoted is $19/article as an average – so perhaps only seven and a half thousand articles over the year?

  • Scanning averaged about six million dollars a year in 2007-9. The Archive Capital Fee averages about eight and a half. There’s a bit of a mismatch here, but it could be they compare more closely over a longer timeframe, or that this is building a surplus for future work. They’re reasonably close, at least.
  • Comparing the ACF to the AAF, estimating one to be twice the other for any given institution, we can get a proxy for what proportion of income is new – it looks like ~15% in 2008/9. There’s a corresponding growth in overall income (it’s masked by a sharp drop in investment income, which is only $2.5m in 2008, a third of what it was in 2007) which would seem to bear out this figure.
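The back-of-envelope figures in these points can be checked in a few lines. This is only a sketch using the rounded 2008 numbers quoted above; the variable names are my own, and the two-to-one ratio between capital fee and annual fee is the same guess as before, not something stated in the Form 990.

```python
# Rounded 2008 figures quoted above, in US dollars.
ARCHIVE_CAPITAL_FEES = 8_600_000   # ACF, paid once on joining
ANNUAL_ACCESS_FEES = 30_300_000    # AAF, paid every year
PAY_PER_VIEW = 145_000             # revenue from individual article sales
PRICE_PER_ARTICLE = 19             # quoted average price per article

# Assumption: a new member's ACF is roughly twice its AAF.
ACF_TO_AAF_RATIO = 2

# How many articles would have to be sold to bring in the pay-per-view revenue.
articles_sold = PAY_PER_VIEW / PRICE_PER_ARTICLE

# Annual fees implied by this year's capital fees, as a share of all annual fees.
new_annual_fees = ARCHIVE_CAPITAL_FEES / ACF_TO_AAF_RATIO
new_income_share = new_annual_fees / ANNUAL_ACCESS_FEES

print(f"articles sold: about {articles_sold:,.0f}")
print(f"share of annual fees from new members: about {new_income_share:.0%}")
```

Both results land close to the rough estimates in the list: a little over seven and a half thousand articles, and new income somewhere around 14-15%.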

So, overall…

The once-off capital fees charged by JSTOR look reasonable for the ongoing costs of actually digitising the documents. After that, about 30% of the annual fee is payments to the publishers, with the other 70% going on overhead. Within that overhead, 10% of the fee goes directly on running the servers, almost 40% on staffing, and the remaining 20% on various administrative costs; I am no expert in the field, but the salaries paid do seem quite high (and Manhattan offices aren’t cheap, either).

So if your library pays a $10,000 ongoing subscription, that’s effectively $3,000 direct to the publishers, $1,000 on servers, and $6,000 on people to feed and water those servers (or manage those people, etc.). It would be very interesting to know how those publisher payments break down – but, equally, it would be interesting to know how much of that 60% is actually essential for running the service.
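To try the same split on a different subscription figure, here is a minimal sketch. The percentages are the rough estimates from this post rather than anything JSTOR publishes, and `split_subscription` is a hypothetical name of my own.

```python
# Rough shares of the annual fee, in percent, as estimated in this post.
FEE_SHARES_PCT = {
    "publishers": 30,
    "servers": 10,
    "staffing": 40,
    "administration": 20,
}


def split_subscription(annual_fee):
    """Split an annual fee into dollar amounts using the rough shares above."""
    return {name: annual_fee * pct / 100 for name, pct in FEE_SHARES_PCT.items()}


breakdown = split_subscription(10_000)
# "People" in the loose sense used above: staffing plus administration.
people = breakdown["staffing"] + breakdown["administration"]

for name, amount in breakdown.items():
    print(f"{name}: ${amount:,.0f}")
print(f"staffing + administration: ${people:,.0f}")
```

Staffing and administration are lumped together here to match the "$6,000 on people" figure; splitting them apart properly would need the detailed expense lines from the Form 990.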

Crustacean crossings

June 27th, 2011

Things you never quite expect to see in suburban English cities: crayfish carefully picking their way across the road.




The first photo feels like I should edit in some 5mm-tall people fleeing the monster.

Photography was suspended briefly for a car to drive over it. (Literally: the wheels passed several feet to each side.) The driver couldn’t see what was in the road, but guessed we were photographing something small and fragile, and looked at us with a very guilty expression as she passed…