<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Generalising &#187; wikipedia</title>
	<atom:link href="http://www.generalist.org.uk/blog/tags/wikipedia/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.generalist.org.uk/blog</link>
	<description>because we can&#039;t think of anything wittier</description>
	<lastBuildDate>Wed, 15 May 2013 23:13:27 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Wikipedians in Residence: a recap</title>
		<link>http://www.generalist.org.uk/blog/2013/wikipedians-in-residence-a-recap/</link>
		<comments>http://www.generalist.org.uk/blog/2013/wikipedians-in-residence-a-recap/#comments</comments>
		<pubDate>Wed, 24 Apr 2013 19:47:17 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Andrew]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://www.generalist.org.uk/blog/?p=1057</guid>
		<description><![CDATA[To my great surprise, I got named in a BBC story today. The article is about the upcoming Wikipedian in Residence at the National Library of Scotland; it&#8217;s really pleasing that as my own work at the British Library is coming to an end, there&#8217;ll be someone else taking up the work at an equally [...]]]></description>
				<content:encoded><![CDATA[<p>To my great surprise, I got <a href="http://www.bbc.co.uk/news/uk-scotland-22264118">named in a BBC story</a> today. The article is about the upcoming Wikipedian in Residence at the National Library of Scotland; it&#8217;s really pleasing that as my own work at the British Library is coming to an end, there&#8217;ll be someone else taking up the work at an equally interesting organisation.</p>
<p>NLS is just the tip of the iceberg, though. <a href="http://outreach.wikimedia.org/wiki/Wikipedian_in_Residence">Here</a> is a list of all current and past Wikimedians in Residence, and below are all the institutions I&#8217;ve heard of that are currently looking for a Wikipedian (or Wikimedian) in Residence &#8211; please let me know if I&#8217;ve missed any!</p>
<ul>
<li><b><a href="http://blog.wikimedia.org.uk/2013/04/1533/">The National Library of Scotland</a></b> (paid)<br />
Four-month residency working with the National Library of Scotland in Edinburgh to help disseminate the Library&#8217;s content to Wikipedia, and to work with librarians to encourage understanding and use of the projects.</li>
<li><b><a href="http://www.jisc.ac.uk/fundingopportunities/funding_calls/2013/04/wikimedia.aspx">JISC &#8220;Wikimedia Ambassador&#8221; residency</a></b> (paid)<br />
Nine-month program to build skills and expertise in engaging with Wikimedia projects among JISC-funded research programs, and to help disseminate knowledge from that research. (In many ways, this fits very neatly with some of the work I was doing for AHRC&#8230;)</li>
<li><b><a href="http://www.wikimedia.de/wiki/WiR_2013_(m/w)">ZDF Television (Germany)</a></b> (paid)<br />
Short-term program (until mid-October) to liaise between the organisation and Wikipedia contributors on &#8211; I love this &#8211; a project to fact-check political claims in the months before the September 2013 federal election.</li>
<li><b><a href="https://en.wikipedia.org/wiki/Wikipedia:GLAM/SI/WIR">Smithsonian Institution</a></b> (paid)<br />
Internship (with stipend), aiming to build on and sustain the existing partnership programs with the Smithsonian.</li>
<li><b><a href="http://lists.wikimedia.org/pipermail/glam/2013-April/000389.html">Swiss Federal Archives</a></b> (paid)<br />
Three- to six-month program with a particular focus on digitising WWI-related photographs.</li>
<li><b><a href="http://metro.org/articles/open-data-fellowship-accepting-applications/">METRO (New York) Open Data Fellowship</a></b> (paid)<br />
An interesting two-track program: an eight-week fellowship working as a Wikipedian in Residence for a consortium of cultural institutions, and also as an adviser on open data, licensing, etc. US only; students preferred.</li>
<li><b><a href="http://lists.wikimedia.org/pipermail/libraries/2013-April/000131.html">Olympia Timberland Library (US)</a></b> (volunteer)<br />
The library is looking for a &#8220;Wiki-Ninja&#8221; (now there&#8217;s something to put on a job description) to help build and sustain a local-history editing program in the local community.</li>
</ul>
<p>And, of course, there are plenty more institutions setting up similar volunteer programs without going through a formal recruitment process &#8211; that tends to be needed only when money gets involved. If you&#8217;re a Wikipedia volunteer thinking about what you could do with a local institution, now is as good a time as any to approach them&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.generalist.org.uk/blog/2013/wikipedians-in-residence-a-recap/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>How many hours?</title>
		<link>http://www.generalist.org.uk/blog/2013/how-many-hours/</link>
		<comments>http://www.generalist.org.uk/blog/2013/how-many-hours/#comments</comments>
		<pubDate>Tue, 19 Feb 2013 10:55:07 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Andrew]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://www.generalist.org.uk/blog/?p=1048</guid>
		<description><![CDATA[A newly released (and very interesting) paper &#8211; Using Edit Sessions to Measure Participation in Wikipedia &#8211; looks at estimating the level of participation in Wikipedia using an estimate of time spent contributing, rather than previous studies based on raw edit numbers, etc. Their headline figure is an estimate that all of Wikipedia, up to [...]]]></description>
				<content:encoded><![CDATA[<p>A newly released (and very interesting) paper &#8211; <a href="http://www-users.cs.umn.edu/~halfak/publications/Using_Edit_Sessions_to_Measure_Participation_in_Wikipedia/geiger13using-preprint.pdf"><i>Using Edit Sessions to Measure Participation in Wikipedia</i></a> &#8211; looks at estimating the level of participation in Wikipedia using an estimate of <i>time</i> spent contributing, rather than previous studies based on raw edit numbers, etc.</p>
<p>Their headline figure is an estimate that all of Wikipedia, up to an unspecified date in 2012, represents &#8220;a total of 102,673,683 total labor-hours&#8221;. </p>
<p>As David White noticed, this is many lifetimes of labour:</p>
<blockquote class="twitter-tweet"><p>Circa 168 life-times RT @<a href="https://twitter.com/wikiresearch">wikiresearch</a> A total of 102,673,683 hours were spent editing Wikipedia -all languages- until 2012</p>
<p>&mdash; David White (@daveowhite) <a href="https://twitter.com/daveowhite/status/303813045044527104">February 19, 2013</a></p></blockquote>
<p>Some other ways to visualise these numbers:</p>
<ul>
<li>Nearly three and a half years&#8217; work by <a href="http://en.wikipedia.org/wiki/List_of_universities_in_the_United_Kingdom_by_size_of_student_population">a mid-sized university</a> of around 15,000 people (assuming a working day of eight hours and 250 working days in the year)</li>
<li>The users of the British Library reading rooms (capacity <a href="http://www.bl.uk/about/annual/2009to2010/performance/performancestats.html">~1500</a>) working for around thirty-four years.</li>
<li>One thousand &#8220;productive lives&#8221; (working years as above, over a fifty-year career, rather than 24/7 from cradle to grave)</li></ul>
<p>Or, in a sharp demonstration of the &#8220;<a href="http://en.wikipedia.org/wiki/Cognitive_Surplus">cognitive surplus</a>&#8221; theory:</p>
<ul>
<li>Seven minutes&#8217; writing time each from the <a href="http://uk.reuters.com/article/2012/08/07/uk-oly-ratings-day-idUKBRE8760V820120807?feedType=RSS&#038;feedName=sportsNews">global audience</a> of the 2012 Olympic opening ceremony.</li></ul>
<p>All of Wikipedia, in all its languages, could have been written in the time it took the world to make a cup of tea during the speeches.</p>
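<p>The comparisons above can be reproduced with a few lines of arithmetic. This is a minimal sketch using the paper&#8217;s headline figure and the post&#8217;s own assumptions (eight-hour days, 250 working days a year, fifty-year careers); the function names are illustrative, and the audience figure is derived from the seven-minute claim rather than taken from the Reuters report.</p>

```python
# Back-of-envelope comparisons for the paper's headline figure.
# TOTAL_HOURS comes from the paper; the 8-hour day, 250-day year and
# 50-year career are the post's stated assumptions.
TOTAL_HOURS = 102_673_683
HOURS_PER_WORKING_YEAR = 8 * 250  # 2,000 hours per person per year


def years_for_workforce(people: int) -> float:
    """Years a group of `people` working full-time would need to log TOTAL_HOURS."""
    return TOTAL_HOURS / (people * HOURS_PER_WORKING_YEAR)


def productive_lives(career_years: int = 50) -> float:
    """Number of full working careers (not 24/7 lifetimes) in TOTAL_HOURS."""
    return TOTAL_HOURS / (career_years * HOURS_PER_WORKING_YEAR)


def audience_needed(minutes_each: float) -> float:
    """How many people giving `minutes_each` minutes would add up to TOTAL_HOURS."""
    return TOTAL_HOURS * 60 / minutes_each


if __name__ == "__main__":
    print(f"University of 15,000: {years_for_workforce(15_000):.1f} years")
    print(f"BL reading rooms (~1,500 seats): {years_for_workforce(1_500):.1f} years")
    print(f"50-year working lives: {productive_lives():.0f}")
    print(f"Audience for 7 minutes each: {audience_needed(7) / 1e6:.0f} million")
```

<p>Run as a script, it prints roughly 3.4 years, 34.2 years, 1,027 working lives and an implied audience of about 880 million.</p>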
]]></content:encoded>
			<wfw:commentRss>http://www.generalist.org.uk/blog/2013/how-many-hours/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Wikipedia and the British Library</title>
		<link>http://www.generalist.org.uk/blog/2013/wikipedia-and-the-british-library/</link>
		<comments>http://www.generalist.org.uk/blog/2013/wikipedia-and-the-british-library/#comments</comments>
		<pubDate>Thu, 14 Feb 2013 11:54:48 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Andrew]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://www.generalist.org.uk/blog/?p=1046</guid>
		<description><![CDATA[Crossposted from the British Library Digital Scholarship blog I&#8217;ve been working as the Wikipedian in Residence at the British Library for the past nine months. This is a one-year project funded by the AHRC, which aims to study the ways in which academics and specialists can engage with Wikipedia and similar projects. It builds on [...]]]></description>
				<content:encoded><![CDATA[<p><i>Crossposted from the <a href="http://britishlibrary.typepad.co.uk/digital-scholarship/2013/02/wikipedia-and-the-british-library.html">British Library Digital Scholarship blog</a></i></p>
<hr />
<p>I&#8217;ve been working as the Wikipedian in Residence at the British Library for the past nine months. This is a one-year project funded by the AHRC, which aims to study the ways in which academics and specialists can engage with Wikipedia and similar projects.</p>
<p>It builds on the work previously done by a number of other Wikipedians in Residence at institutions around the world (<a href="https://outreach.wikimedia.org/wiki/Wikipedian_in_Residence">full list</a>); usually, they&#8217;ve worked with galleries or museums to help improve content relating to the collections of those institutions. The benefits for everyone are clear &#8211; Wikipedia improves in quality and scope; the institutions engage communities interested in their material, and reach potentially much broader audiences.</p>
<p>We&#8217;ve tried something a bit different this time around. While we&#8217;ve worked on some content projects, we&#8217;ve focused on working with researchers and librarians to help build skills and give people the confidence to engage directly with these communities. Over the past months, I&#8217;ve talked to well over three hundred people, demonstrating tools and encouraging them to think about taking a first step. There are three approaches we&#8217;ve been looking at here:</p>
<ul>
<li><b>Contextualising research.</b> Part of the perennial problem with academic projects is that they are often very specialised; it can be very difficult to explain the details of the work to a layperson. Wikipedia allows researchers to help improve the &#8220;background&#8221; material needed to put their work in context, indirectly supporting the public impact of their work. Working with the International Dunhuang Project, the BL hosted a series of workshops over a week, at which curators, Wikipedia contributors and students wrote articles about Central Asian archaeology and exploration &#8211; see <a href="http://en.wikipedia.org/wiki/Wikipedia:GLAM/BL/IDP/Report">our report</a>.</li>
<li><b>Capturing research.</b> Wikipedia &#8211; a publicly visible, constantly shifting draft awaiting further collaboration &#8211; is great for absorbing pieces of secondary research work that may never be formally published elsewhere. As a cataloguer, I used to spend time trying to chase down small details &#8211; who did this particular bookplate belong to? was this author the same as another writing under a pseudonym? what was the original title of this book, and was it first written in Russian or French? Many projects, especially those concentrating on historical networks or correspondence, produce a large number of incidental biographies or summaries of events; Wikipedia can be a very efficient way to get this work out to a wider audience, rather than keeping it in a local silo. Next month, I&#8217;ll be working with the <a href="http://www.darwinproject.ac.uk">Darwin Correspondence Project</a> in Cambridge to look at using some of their biographical summaries as the nucleus of Wikipedia articles.</li>
<li><b>Digital content.</b> Wikimedia is one of the largest open-content communities around, and is always keen to use new high-quality material. If your project is producing data or images (or anything else) under a free license, there may well be someone wanting to use it in an interesting and transformative way &#8211; and to expose it to new audiences. At the Library, we&#8217;ve been working to get high-quality imagery from our recently digitised Royal Manuscripts collection into related articles &#8211; such as the beautiful image below, illustrating the history of the fleur-de-lys in seven languages:
<p><a title="Bedford Master [Public domain], via Wikimedia Commons" href="https://commons.wikimedia.org/wiki/File%3AClovis_recevant_la_fleur_de_lys_-_XVe_si%C3%A8cle.jpg"><img width="256" align="center" alt="Clovis recevant la fleur de lys - XVe siècle" src="//upload.wikimedia.org/wikipedia/commons/thumb/8/8c/Clovis_recevant_la_fleur_de_lys_-_XVe_si%C3%A8cle.jpg/256px-Clovis_recevant_la_fleur_de_lys_-_XVe_si%C3%A8cle.jpg"/></a></p></li>
</ul>
<p>If you&#8217;re interested in what else we&#8217;ve done, you can see an outline presentation I gave to AHRC <a href="http://www.slideshare.net/generalising/ahrc-wikipedian-in-residence-report">here</a>.</p>
<p> I&#8217;m at the Library until the end of April &#8211; if you think you or a group you&#8217;re working with would be interested to hear more, please get in touch!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.generalist.org.uk/blog/2013/wikipedia-and-the-british-library/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Marking authorship in texts</title>
		<link>http://www.generalist.org.uk/blog/2012/marking-authorship-in-texts/</link>
		<comments>http://www.generalist.org.uk/blog/2012/marking-authorship-in-texts/#comments</comments>
		<pubDate>Thu, 27 Dec 2012 13:52:58 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Andrew]]></category>
		<category><![CDATA[editing]]></category>
		<category><![CDATA[publishing]]></category>
		<category><![CDATA[wikipedia]]></category>
		<category><![CDATA[writing]]></category>

		<guid isPermaLink="false">http://www.generalist.org.uk/blog/?p=1030</guid>
		<description><![CDATA[While writing something about Wikipedia, and talking about the idea of traceable attribution of text, I&#8217;ve been thinking of ways in which works with multiple discrete authors have displayed the different contributions of those authors. At one extreme, there&#8217;s a fully &#8220;collaborative&#8221; work &#8211; no-one makes a distinction between the two authors, and while they&#8217;re [...]]]></description>
				<content:encoded><![CDATA[<p>While writing something about Wikipedia, and talking about the idea of traceable attribution of text, I&#8217;ve been thinking of ways in which works with multiple discrete authors have displayed the different contributions of those authors.</p>
<p>At one extreme, there&#8217;s a fully &#8220;collaborative&#8221; work &#8211; no-one makes a distinction between the two authors, and while they&#8217;re named on the title page the writing is implicitly attributed to both. At the other extreme, we have individual chapters or articles &#8211; A writes chapter 1, B writes chapter 2, etc., and they may never have known of the other contributors.</p>
<p>In the middle, there are cases where the work is broadly collaborative but with individual elements &#8211; the main text is jointly written, but particular contributors sign their own footnotes, sidebar sections, forewords, appendices, etc.</p>
<p>The one that interests me, though, is something I saw in I.S. Shklovsky&#8217;s <i>Intelligent Life in the Universe</i> when I read it as a student &#8211; I seem to have lost my copy in the intervening ten years, so this is from memory.</p>
<p>The book was originally published in the USSR in the early 1960s, and translated and expanded in English with the aid of Carl Sagan later in the decade. The original text was updated by Sagan, who also added several new chapters; the two then shared drafts, editing &#8220;each other&#8217;s&#8221; sections. Given the political climate, however, they were keen to avoid claiming to be in agreement on some sensitive topics, and so they experimented with explicitly marking the appearance of a single voice in the text itself. </p>
<p>In the end, the result ran something like:</p>
<blockquote><p>Lorem ipsum dolor sit amet, consectetur adipisici elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. ▲Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.▼ △Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.▽</p></blockquote>
<p>Unmarked text was jointly written; black triangles marked remarks by one author, and white triangles those by the other. (In at least one place, delightfully, they started arguing.)</p>
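<p>The triangle convention is regular enough to parse mechanically. A minimal sketch, assuming each marked passage is opened and closed by its own pair of triangles and that markers never nest (so the arguing passages would need more care); the <code>attribute</code> function and the &#8220;black&#8221;/&#8220;white&#8221; labels are hypothetical, since which author used which pair isn&#8217;t recorded here.</p>

```python
from __future__ import annotations

import re

# Hypothetical parser for the convention described above:
# ▲ ... ▼ encloses one author's remarks, △ ... ▽ the other's, and anything
# outside the triangles is joint text. Nested or overlapping markers are
# not handled; "black" and "white" are placeholder labels.
MARKED = re.compile(r"▲(.*?)▼|△(.*?)▽", re.DOTALL)


def attribute(text: str) -> list[tuple[str, str]]:
    """Split text into (voice, passage) pairs: 'joint', 'black' or 'white'."""
    segments: list[tuple[str, str]] = []
    pos = 0
    for m in MARKED.finditer(text):
        # Any unmarked text before this marked passage is jointly written.
        joint = text[pos:m.start()].strip()
        if joint:
            segments.append(("joint", joint))
        if m.group(1) is not None:
            segments.append(("black", m.group(1).strip()))
        else:
            segments.append(("white", m.group(2).strip()))
        pos = m.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(("joint", tail))
    return segments
```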
<p>So, the question: was this something common in the period that I&#8217;ve just never noticed elsewhere? Is there a name for it? What other novel ways of marking authorship have been used?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.generalist.org.uk/blog/2012/marking-authorship-in-texts/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The encyclopedia anyone can [be told to] edit</title>
		<link>http://www.generalist.org.uk/blog/2012/the-encyclopedia-anyone-can-be-told-to-edit/</link>
		<comments>http://www.generalist.org.uk/blog/2012/the-encyclopedia-anyone-can-be-told-to-edit/#comments</comments>
		<pubDate>Fri, 10 Feb 2012 20:15:50 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Andrew]]></category>
		<category><![CDATA[commonplace]]></category>
		<category><![CDATA[history]]></category>
		<category><![CDATA[publishing]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://www.generalist.org.uk/blog/?p=941</guid>
		<description><![CDATA[A moment of amusement, from the (thankfully) long-distant past: The Great Soviet Encyclopedia, which contains more than 100,000 entries and fills fifty-one volumes, includes some distortions so flamboyant as to be beyond belief. These are an old story. But such distortions have importance [...] Almost everyone has heard about what happened to Beria in the [...]]]></description>
				<content:encoded><![CDATA[<p>A moment of amusement, from the (thankfully) long-distant past:</p>
<blockquote><p>The Great Soviet Encyclopedia, which contains more than 100,000 entries and fills fifty-one volumes, includes some distortions so flamboyant as to be beyond belief. These are an old story. But such distortions have importance [...]</p>
<p>Almost everyone has heard about what happened to Beria in the Encyclopedia. After his liquidation, subscribers were notified, with full instructions, that they should snip out the article about him and insert in its place substitute articles which were duly enclosed, about the Bering Strait and an obscure eighteenth-century statesman named Berholtz. These were the best available substitutes beginning with &#8216;Ber&#8217;. During Stalin&#8217;s day when the party line changed on some matter so important that the Encyclopedia itself had to be changed, subscribers were obliged to turn in the volume affected to the party secretary; it was pulped and a new whole volume, cut and patched, was then sent out to the subscriber. Nowadays the reader is allowed to keep the book, and trusted to make the proper emendation himself. Progress!</p>
<p>Another person &#8216;expelled&#8217; from the Encyclopedia was a Chinese Communist leader, Kao Kang. To replace him, a substitute page went out dealing with a city in Tibet. [...] In their haste to make the revision, the editors overlooked the fact that the same Tibetan city also appeared elsewhere in the Encyclopaedia, spelled differently.</p></blockquote>
<p>&#8211; John Gunther, <i>Inside Russia Today</i> (Penguin, 1964).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.generalist.org.uk/blog/2012/the-encyclopedia-anyone-can-be-told-to-edit/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Article ratings and expectations</title>
		<link>http://www.generalist.org.uk/blog/2010/article-ratings/</link>
		<comments>http://www.generalist.org.uk/blog/2010/article-ratings/#comments</comments>
		<pubDate>Fri, 01 Oct 2010 00:06:41 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Andrew]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://www.generalist.org.uk/blog/?p=769</guid>
		<description><![CDATA[I am working late and procrastinating, so a quick note on the recent Wikipedia article feedback pilot: It appears as though registered users are “tougher” in their grading of the articles than are anon users. This is especially notable in the area of “well sourced” (3.7 mean for anon vs. 2.8 mean for registered) and [...]]]></description>
				<content:encoded><![CDATA[<p>I am working late and procrastinating, so a quick note on the recent Wikipedia <a href="http://www.mediawiki.org/wiki/Article_feedback/Public_Policy_Pilot/Early_Data">article feedback pilot</a>:</p>
<blockquote><p>It appears as though registered users are “tougher” in their grading of the articles than are anon users. This is especially notable in the area of “well sourced” (3.7 mean for anon vs. 2.8 mean for registered) and “complete” (3.5 vs. 2.7). It’s interesting to note that the means for “neutral” are almost identical.</p></blockquote>
<p>Anecdotally, this fits well with a lot of what I&#8217;ve noticed with external feedback in the past; when someone writes in, it&#8217;s usually with a report of &#8220;X is wrong&#8221; rather than &#8220;the article on Y is atrocious&#8221;. Once X is fixed, people seem quite happy with the article, even if it is still a mess, with cleanup tags, ugly layout or the like.</p>
<p>Presumably, this suggests casual readers have low expectations of Wikipedia&#8217;s average quality; they accept bad (or terse) articles as par for the course but are pleasantly surprised by decent ones. Editors, meanwhile, are more closely familiar with the better ones, and apply somewhat more aspirational standards &#8211; a &#8220;tolerable&#8221; article is a deficient one.</p>
<p>On the matter of sourcing, I&#8217;d take a wild guess that if we went down to the article-specific level, we&#8217;d see a lot of this driven by the difference in articles with or without footnotes. Readers wanting a general overview may well be happy with general references or further-reading type external links; editors are more focused on the text, and more likely to prioritise specific footnoting of individual points.</p>
<p>The discrepancy in perceptions of completeness may come into play here, too &#8211; if you expect a terse cruddy article, then 5k of competently-written text <i>seems</i> relatively comprehensive. If you expect a detailed article with layout and images, then the 5k of text seems a bit of a damp squib.</p>
<p>A difference in expectations is probably partly driven by involvement &#8211; if you&#8217;re an editor, you&#8217;re more likely to expect good things and see room for improvement everywhere &#8211; but also partly by experience and estimation of quality. Which prompts the thought: do readers and editors read &#8220;different Wikipedias&#8221;? Do involved editors spend more time, on average, looking at or working with higher-quality text than casual readers do? An interesting question, but I&#8217;m not immediately sure how to quantify it. Ratio between raw pageviews and edits to an article, or pageviews versus talk pageviews?</p>
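<p>One way to make the &#8220;different Wikipedias&#8221; question concrete, beyond simple ratios, is to compare average article quality weighted by pageviews (what readers see) with the same average weighted by edits (what editors work on). This sketch assumes hypothetical per-article inputs; <code>ArticleStats</code> and the functions are illustrative names, and no particular data source is implied. Talk-page views could be swapped in as the editor weight.</p>

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class ArticleStats:
    """Hypothetical per-article inputs, e.g. from pageview and edit-history dumps."""
    title: str
    quality: float   # any quality score, such as a mean feedback rating
    pageviews: int   # proxy for reader exposure
    edits: int       # proxy for editor exposure


def weighted_quality(articles: list[ArticleStats],
                     weight: Callable[[ArticleStats], float]) -> float:
    """Average quality, weighting each article by how much attention it gets."""
    total = sum(weight(a) for a in articles)
    if total == 0:
        raise ValueError("no exposure to weight by")
    return sum(a.quality * weight(a) for a in articles) / total


def reader_vs_editor_quality(articles: list[ArticleStats]) -> tuple[float, float]:
    """Mean quality of 'the Wikipedia readers see' vs 'the Wikipedia editors see'."""
    readers = weighted_quality(articles, lambda a: a.pageviews)
    editors = weighted_quality(articles, lambda a: a.edits)
    return readers, editors
```

<p>If the editor-weighted figure came out consistently higher, that would suggest editors spend proportionally more of their attention on better articles than readers do.</p>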
]]></content:encoded>
			<wfw:commentRss>http://www.generalist.org.uk/blog/2010/article-ratings/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Notes on pending changes</title>
		<link>http://www.generalist.org.uk/blog/2010/notes-on-pending-changes/</link>
		<comments>http://www.generalist.org.uk/blog/2010/notes-on-pending-changes/#comments</comments>
		<pubDate>Fri, 13 Aug 2010 22:51:07 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Andrew]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://www.generalist.org.uk/blog/?p=717</guid>
		<description><![CDATA[Back in June, I wrote about the then-almost-implemented pending changes system on Wikipedia. What&#8217;s it like two months on? On the whole, I&#8217;m more than happy with its effects, and the feared imminent catastrophes haven&#8217;t materialised yet. Lag time to approve edits is pretty low; I haven&#8217;t dug up the turnaround times, but the page [...]]]></description>
				<content:encoded><![CDATA[<p>Back in June, I <a href="http://www.generalist.org.uk/blog/2010/pending-changes/">wrote about</a> the then-almost-implemented <a href="http://en.wikipedia.org/wiki/Wikipedia:Pending_changes">pending changes</a> system on Wikipedia. What&#8217;s it like two months on?</p>
<p>On the whole, I&#8217;m more than happy with its effects, and the feared imminent catastrophes haven&#8217;t materialised yet. Lag time to approve edits is pretty low; I haven&#8217;t dug up the turnaround times, but the page listing unchecked edits regularly changes completely in the few minutes between my first loading it and my remembering the tab is there and refreshing it. Indeed, it&#8217;s not uncommon to see the page empty entirely, and I&#8217;ve only rarely seen it listing more than half-a-dozen pages (out of a pool of ~2000). The lack of &#8220;pending pending changes&#8221; at any given moment also means that spotting them via the watchlist, or by casual browsing, is unlikely; to be aware of them, you usually need to go to the central page. &#8220;Review conflicts&#8221; are quite common &#8211; perhaps a result of the noticeable slowness of the system on larger pages &#8211; but, then, so are rollback conflicts. I suspect faster page loading would help here; the less time a page spends pending, the less chance there is of someone else coming in.</p>
<p>The biggest problem I&#8217;ve found so far is, if anything, one of overenthusiasm. Whereas before we&#8217;d have a degree of &#8220;masterly inactivity&#8221; practiced on a lot of edits &#8211; someone would look at it, decide they don&#8217;t know enough to determine if it&#8217;s good or bad, and leave it be &#8211; the new system seems to have the effect of making people feel they ought to say one way or the other. Net result: more suboptimal approvals or rejections (ie, reverts), by people unfamiliar with what they&#8217;re dealing with, than we had before.</p>
<p>Why? Well, we have the central page, blinking at us, telling us there were four pages needing checked &#8211; four, just four! &#8211; and that there was a timer somewhere to note how long they took, and so on and so forth. There&#8217;s an impulse there, even if an unconscious one, to just do something so as to drive down the backlog. </p>
<p>Interestingly, this may be a problem that disappears as the system settles down, and becomes familiar and less excitingly novel. While there&#8217;s a small backlog &#8211; especially for a flagship new system &#8211; people will always feel the urge to just wipe the board clean, to keep it resolved, to have the satisfaction of having sorted it out. Once that backlog grows to a constant buffer of maybe twenty or fifty edits, the impulse to knock them all off while you make a cup of tea is sharply reduced, and so the likelihood of them being done for the sake of it is lowered; it becomes more likely that the edits will be picked up by someone who is intentionally watching the page, which is a good first approximation to &#8220;someone who knows what&#8217;s good&#8221;.</p>
<p>Assuming the number of articles covered stays fixed &#8211; protecting pages for the sake of protecting them would be a bit odd &#8211; the rate of incoming edits will be roughly constant, so a growing buffer implies growing turnaround times, which is not ideal. On the other hand, it&#8217;s probably inevitable &#8211; as the novelty wears off, and we stop thinking of it as an Important New Thing That Must Be Perfect, people are going to patrol the central page a bit less. It could well be that this inevitable decrease in responsiveness will have the unexpected benefit of improving the <i>quality</i> of reviewing.</p>
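<p>The buffer-versus-turnaround argument is essentially Little&#8217;s law from queueing theory, which holds for a queue in rough steady state: the average wait before review, <i>W</i>, equals the average number of edits waiting, <i>L</i>, divided by the rate at which new edits arrive, λ. A quick worked sketch, treating the &#8220;half-a-dozen&#8221; pending pages as roughly six waiting edits (an approximation) and assuming λ stays fixed:</p>

```latex
W = \frac{L}{\lambda}
\quad\Longrightarrow\quad
\frac{W_{\text{later}}}{W_{\text{now}}}
  = \frac{L_{\text{later}}}{L_{\text{now}}}
  \approx \frac{20 \text{ to } 50}{6}
  \approx 3 \text{ to } 8
```

<p>On those assumptions, a standing buffer of twenty to fifty edits would mean reviews taking roughly three to eight times as long on average.</p>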
]]></content:encoded>
			<wfw:commentRss>http://www.generalist.org.uk/blog/2010/notes-on-pending-changes/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Pending changes</title>
		<link>http://www.generalist.org.uk/blog/2010/pending-changes/</link>
		<comments>http://www.generalist.org.uk/blog/2010/pending-changes/#comments</comments>
		<pubDate>Tue, 15 Jun 2010 22:48:25 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Andrew]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://www.generalist.org.uk/blog/?p=661</guid>
		<description><![CDATA[So, in under an hour, flagged revisions will go live on the English Wikipedia. Wait &#8211; flagged protection. No, that&#8217;s it, pending changes. It seems to change its name once a week at the moment &#8211; my small victory was getting rid of the word &#8220;revisions&#8221; in its current form. (We take our lasting moments [...]]]></description>
				<content:encoded><![CDATA[<p>So, in under an hour, flagged revisions will go live on the English Wikipedia. Wait &#8211; flagged protection. No, that&#8217;s it, <a href="http://en.wikipedia.org/wiki/Wikipedia:Pending_changes"><i>pending changes</i></a>. It seems to change its name once a week at the moment &#8211; my small victory was getting rid of the word &#8220;revisions&#8221; in its current form. (We take our lasting moments when we can.)</p>
<p>What is it? Surprisingly little, all told, for all the ink that has been spilled. I feel the need to write something simply because of all the misinformation I&#8217;ve seen floating around over the last week or so&#8230;</p>
<p>It&#8217;s a tool which will see a small number of pages &#8211; at the moment, hard-capped at 2,000, and in practice no more than a few dozen for the first few days &#8211; placed under a new form of editing protection. They&#8217;ll be pages which are either already protected or already liable for protection under the general rules for that &#8211; high levels of vandalism, repeated fights over content, or just ludicrously tempting targets.</p>
<p>(A quick recap &#8211; pages subject to protection are either &#8220;full protected&#8221; &#8211; only users with administrator privileges, about 1,500 people, can edit them &#8211; or &#8220;semi protected&#8221; &#8211; most logged-in users can edit them, but new users or passing contributors can&#8217;t.)</p>
<p>The new system works by allowing anyone to edit, but adding a simple form of pre-screening &#8211; at any given moment, the version of the article displayed to readers will not always be the same as the most <i>recent</i> version of the article. Any qualified user will be able to look at the edits and flag the most recent as &#8220;acceptable&#8221; &#8211; &#8220;not terrible&#8221; might be a more pragmatic standard, I suppose &#8211; making it the version displayed by default, until a few more edits down the line a new one is approved, etc. The aim is that there will be a few thousand of such qualified &#8220;reviewers&#8221;, certainly enough to scale to the likely task.</p>
<p>It&#8217;s important to remember that all edits are sequential and not parallel; it&#8217;s not a matter of allowing several versions of an article to develop and then picking one, but rather an edit not approved will still be incorporated into subsequent edits, unless it&#8217;s independently edited back out.</p>
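<p>To make the &#8220;sequential, not parallel&#8221; point concrete, here is a toy model of a single page under pending changes, written as a Python sketch. It is not how MediaWiki actually implements the feature: the class and method names are hypothetical, every revision is stored as its full text, and the text on the page at the moment it is protected is assumed to count as already accepted.</p>

```python
from typing import List


class PendingChangesPage:
    """Toy model of one page under pending changes (hypothetical, not MediaWiki code)."""

    def __init__(self, text: str) -> None:
        # Assumption: the text present when protection is applied counts as accepted.
        self.revisions: List[str] = [text]
        self.accepted = 0  # index of the latest revision flagged as acceptable

    def latest(self) -> str:
        return self.revisions[-1]

    def edit(self, new_text: str) -> None:
        # Anyone may edit. Callers build new_text from latest(), so an unreviewed
        # edit is inherited by every later revision rather than sitting on a branch.
        self.revisions.append(new_text)

    def accept_latest(self) -> None:
        # A reviewer flags the most recent revision (its whole text) as acceptable.
        self.accepted = len(self.revisions) - 1

    def displayed(self) -> str:
        # Readers see the latest accepted revision by default.
        return self.revisions[self.accepted]


page = PendingChangesPage("Paris is the capital of France.")
page.edit(page.latest() + " Paris is rubbish.")      # vandalism, not yet reviewed
page.edit(page.latest() + " It lies on the Seine.")  # a good edit, made on top of it
print(page.displayed())  # Paris is the capital of France.

# The unreviewed vandalism stays in the latest text until someone removes it.
page.edit(page.latest().replace(" Paris is rubbish.", ""))
page.accept_latest()
print(page.displayed())  # Paris is the capital of France. It lies on the Seine.
```

<p>In this model, the good edit about the Seine cannot be accepted on its own without also accepting the vandalism underneath it; someone has to edit the vandalism out first, which is exactly the sequential behaviour described above.</p>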
<p>The net result will primarily be to </p>
<ul>
<li>a) make these pages <i>more</i> open to editing, not less; whilst
<li>b) reducing the amount of vandalism and malicious content visible to readers</ul>
<p>a) comes from allowing anyone to edit, rather than turning them away by locking the page; b) comes from adding the post-edit sanity-check screening.</p>
<p>The counterarguments are that it will:</p>
<ul>
<li>a) act as a form of censorship;
<li>b) increase the workload for &#8220;reviewing&#8221; editors;
<li>c) reduce the involvement of casual users</ul>
<p>I honestly don&#8217;t think any of these are likely to be the case unless the implementation is fouled up. Let&#8217;s look at the simplest one first: c). The system will allow people to contribute &#8211; in a limited way &#8211; where previously they could not contribute at all. It&#8217;s possible that limited or conditional contributions will prove more of a deterrent than &#8220;normal&#8221; ones, but will they really deter people more than a complete pre-emptive rejection? We have some evidence from the rollout of a much broader version of this on the German Wikipedia: implementing it did decrease the proportion of edits by IPs &#8211; people other than logged-in users &#8211; but the decline is clearly part of an overall long-term trend:</p>
<p><a href="http://commons.wikimedia.org/wiki/File:Percentage_of_IP_edits_on_de.wp_by_date.svg"><img src="http://upload.wikimedia.org/wikipedia/commons/thumb/f/f1/Percentage_of_IP_edits_on_de.wp_by_date.svg/500px-Percentage_of_IP_edits_on_de.wp_by_date.svg.png"></a></p>
<p>b) assumes that these edits will force people to spend time looking at them and deciding whether to validate or reject them. But we have that already &#8211; every edit that is made, in theory, gets glanced over by someone who decides whether or not to remove it. The problem is that whilst removal is obvious, there is no way to say &#8220;I have looked at this edit and choose to validate it&#8221;; every valid-but-potentially-dubious edit will thus be looked at by a number of people who effectively have no way of signalling to each other that the work&#8217;s been done. Allowing an edit to be marked as &#8220;acceptable&#8221; should thus tend to reduce the overall effort &#8211; the first person needs to make slightly more effort than they would otherwise have done, but ten others are spared from assessing it.</p>
<p>a) is the most complicated. To a degree, this is a pretty visceral thing; I&#8217;ve debated it two or three times over the past few days and never seen anyone alter their position (either way). But fundamentally, it&#8217;s a variant on c) &#8211; the more people who get to edit, the more chance there is of more voices being heard. Yes, people can be &#8220;screened&#8221; by not having their edits prominently shown to passing readers &#8211; but if their edits were viewed as undesirable for whatever reason, under the old system they would have been reverted pretty quickly anyway. A page should not be put under protection in order to ensure that one perspective is presented and another legitimate perspective is locked out; if that is happening, the fundamental problem lies with the decision to protect, not the mechanism used to protect. I&#8217;m firmly of the opinion that &#8220;conventional&#8221; protection is a mechanism with fundamentally more potential for censorship and suppression than this approach.</p>
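<p>The workload argument can be put as rough arithmetic. The sketch below is a toy model with made-up, hypothetical parameters, not measured figures: an edit is looked at by some number of people; without a flag, each of them has to assess it in full; with a flag, the first pays a little extra to mark it and everyone after only needs to glance at the flag.</p>

```python
from typing import Tuple


def review_effort(viewers: int, assess: int, flag_cost: int, skim: int) -> Tuple[int, int]:
    """Total effort (arbitrary units, e.g. seconds) spent on one edit.

    Returns (without_flagging, with_flagging). All parameters are hypothetical.
    """
    # Without a way to signal "checked", every viewer repeats the full assessment.
    without_flagging = viewers * assess
    # With flagging, the first viewer assesses and marks the edit; the rest only skim.
    with_flagging = (assess + flag_cost) + (viewers - 1) * skim
    return without_flagging, with_flagging


# One reviewer plus "ten others", with illustrative costs in seconds.
print(review_effort(viewers=11, assess=60, flag_cost=10, skim=5))  # (660, 120)
```

<p>Rearranging the two totals, flagging saves effort whenever flag_cost is less than (viewers - 1) * (assess - skim), i.e. as long as marking an edit costs less than the duplicated checking it prevents. With a single viewer it is a small net cost, which matches the point that the first person does slightly more work.</p>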
<p>On the whole, I&#8217;m pretty positive about it. It&#8217;s not a panacea; it won&#8217;t solve everything, and it probably won&#8217;t have an overwhelmingly drastic effect on the areas it&#8217;s dealing with. But it will make some things better, it probably won&#8217;t have noticeable knock-on effects, and&#8230; well, we never pretended the old way was perfect. Why be afraid to experiment?</p>
<p>(For those of you wanting to read more: the <a href="http://en.wikipedia.org/wiki/Help:Pending_changes">help page</a>; the <a href="http://blog.wikimedia.org/2010/pending-changes-for-wikipedia/">official announcement</a>; and <a href="http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2009-08-31/Flagged_protection_background">a 2009 article on the long history of the proposal</a>.)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.generalist.org.uk/blog/2010/pending-changes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Quality versus age of Wikipedia&#8217;s Featured Articles</title>
		<link>http://www.generalist.org.uk/blog/2010/quality-versus-age-of-wikipedias-featured-articles/</link>
		<comments>http://www.generalist.org.uk/blog/2010/quality-versus-age-of-wikipedias-featured-articles/#comments</comments>
		<pubDate>Fri, 16 Apr 2010 13:25:33 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Andrew]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://www.generalist.org.uk/blog/?p=442</guid>
		<description><![CDATA[There&#8217;s been a brief flurry of interest on Wikipedia in this article, published last week: Evaluating quality control of Wikipedia&#8217;s feature articles &#8211; David Lindsey. &#8230;Out of the Wikipedia articles assessed, only 12 of 22 were found to pass Wikipedia’s own featured article criteria, indicating that Wikipedia’s process is ineffective. This finding suggests both that [...]]]></description>
				<content:encoded><![CDATA[<p>There&#8217;s been a brief flurry of interest on Wikipedia in this article, published last week:</p>
<p><a href="http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/2721/2482"><i>Evaluating quality control of Wikipedia&#8217;s feature articles</i></a> &#8211; David Lindsey.</p>
<blockquote><p>&#8230;Out of the Wikipedia articles assessed, only 12 of 22 were found to pass Wikipedia’s own featured article criteria, indicating that Wikipedia’s process is ineffective. This finding suggests both that Wikipedia must take steps to improve its featured article process and that scholars interested in studying Wikipedia should be careful not to naively believe its assertions of quality.</p></blockquote>
<p>A <a href="http://en.wikipedia.org/wiki/Wikipedia_talk:Featured_article_candidates#Journal_article_labels_FAC_a_failure">recurrent objection</a> to this has been that Lindsey didn&#8217;t take account of the age of articles. Age matters partly because article quality can degrade over time &#8211; if an article starts at a high level, the average later contribution is likely to be below the quality of the rest of it &#8211; and partly because the relative stringency of what constitutes &#8220;featured&#8221; has changed over time.</p>
<p>The interesting thing is, this partly holds and partly doesn&#8217;t. The article helpfully &#8220;scored&#8221; the 22 articles reviewed on a reasonably arbitrary ten-point scale; the average was seven, which I&#8217;ve taken as the cut-off point for acceptability. If we graph quality against time &#8211; time being defined as the last time an article passed through the &#8220;featuring&#8221; process, either for the first time or as a review &#8211; then we get an interesting graph:</p>
<p><a href="http://commons.wikimedia.org/wiki/File:Age_of_FAs_by_quality_in_Lindsey.svg"><img src="http://upload.wikimedia.org/wikipedia/commons/thumb/8/80/Age_of_FAs_by_quality_in_Lindsey.svg/500px-Age_of_FAs_by_quality_in_Lindsey.svg.png"></a></p>
<p>Here, I&#8217;ve divided them into two groups; blue dots are those with a rating greater than 7, and thus acceptable; red dots are those with a rating lower than 7, and so insufficient. It&#8217;s very apparent that these two cluster separately; if an article is good enough, then <i>there is no relation</i> between the current status and the time since it was featured. If, however, it is not good enough, then there is a very clear linear relationship between quality and time. The trendlines aren&#8217;t really needed to point this out, but I&#8217;ve included them anyway; note that they share a fairly similar origin point.</p>
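<p>For anyone wanting to try this kind of split themselves, a minimal way to do it is to divide the articles at the cut-off and fit an ordinary least-squares line of score against days-since-featured to each group separately. The Python sketch below assumes that a score of exactly 7 counts as acceptable (the text above doesn&#8217;t say where that boundary falls), and the sample points are invented for illustration; they are not Lindsey&#8217;s data.</p>

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (days since last featured, quality score)
Line = Tuple[float, float]   # (slope, intercept)


def fit_line(points: List[Point]) -> Line:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def split_and_fit(articles: List[Point], cutoff: float = 7) -> Tuple[Line, Line]:
    # Assumption: scores at or above the cut-off count as acceptable.
    acceptable = [p for p in articles if p[1] >= cutoff]
    insufficient = [p for p in articles if p[1] < cutoff]
    return fit_line(acceptable), fit_line(insufficient)


# Invented example points, NOT Lindsey's data.
sample = [(100, 8), (500, 9), (900, 9), (1300, 8), (200, 6), (600, 4), (1000, 2)]
(good_slope, good_icpt), (bad_slope, bad_icpt) = split_and_fit(sample)
print(f"acceptable:   slope {good_slope:.4f}/day, intercept {good_icpt:.2f}")
print(f"insufficient: slope {bad_slope:.4f}/day, intercept {bad_icpt:.2f}")
```

<p>With this invented sample the acceptable group comes out flat and the insufficient group slopes steadily downward, which is the shape described above; whether a given real dataset behaves that way is, of course, a separate question.</p>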
<p>Two hypotheses could explain this. Firstly, the quality when first featured varies sharply over time, but most older articles have been brought up to &#8220;modern standards&#8221;. Secondly, the quality when first featured is broadly consistent over time, and most articles remain at that level, but some decay, and that decay is time-linked.</p>
<p>I am inclined towards the second. If it were the first, we would expect to see some older articles which were &#8220;partially saved&#8221; &#8211; say, one that passed when the average score was three and was then &#8220;caught up&#8221; when the average was five. This would disrupt the linearity of the red group and make it more erratic &#8211; but there&#8217;s no sign of that. We also see that the low-quality group has no members older than about three years (1,100 days); this is consistent with a sweeper review process which steadily goes through old articles looking for bad ones, and weeding out or improving the worst.</p>
<p><small>(The moral of the story? Always graph things. It is <i>amazing</i> what you spot by putting things on a graph.)</small></p>
<p>So what would this hypothesis tell us? Assuming our 22 are a reasonable sample &#8211; which can be disputed, but let&#8217;s grant it &#8211; the data is entirely consistent with all of them being of approximately the same quality when they were first featured. So we can set aside the idea of a flaw in the review process; the flaw is more likely to be in the maintenance process.</p>
<p>Taking our dataset, the population of featured articles falls into two classes.</p>
<ul>
<li><b>Type A</b> &#8211; quality is consistent over time, even up to four years (!), and the articles comply with the standards we aim for when they&#8217;re first passed.
<li><b>Type B</b> &#8211; quality decays steadily with time, leaving the article well below FA status before even a year has passed.</ul>
<p>For some reason, we are doing a bad job of maintaining the quality of about a third of our featured articles; why, and what distinguishes Type B from Type A? My first guess was user activity, but no &#8211; of those seven, in only one case has the user who nominated it effectively retired from the project.</p>
<p>Could it be contentiousness? Perhaps. I can see why <a href="http://en.wikipedia.org/wiki/Belarus">Belarus</a> and <a href="http://en.wikipedia.org/wiki/Alzheimer's_Disease">Alzheimer&#8217;s Disease</a> may be contentious and fought-over articles &#8211; but why <a href="http://en.wikipedia.org/wiki/Toru_Takemitsu">Tōru Takemitsu</a>, a well-regarded Japanese composer? We have a decent-quality article on <a href="http://en.wikipedia.org/wiki/Global_Warming">global warming</a>, and you don&#8217;t get more contentious than that.</p>
<p>It could be timeliness &#8211; an article on a changing topic can be up-to-date in 2006 and horribly dated in 2009 &#8211; which would explain the problem with Alzheimer&#8217;s, but it doesn&#8217;t explain why some low-quality articles are on relatively timeless topics &#8211; Takemitsu or the <a href="http://en.wikipedia.org/wiki/California_Gold_Rush">California Gold Rush</a> &#8211; while some high-quality ones are on fast-moving topics such as climate change or <a href="http://en.wikipedia.org/wiki/Economy_of_India">the Indian economy</a>.</p>
<p>There must be something linking this set, but I have to admit I don&#8217;t know what it is.</p>
<p>We would be well-served, I think, to take this article as having pointed up a serious problem of decay, and start looking at how we can address that, and how we can help maintain the quality of <i>all</i> these articles. Whilst the process for actually identifying a featured article at a specific point in time seems vindicated &#8211; I am actually surprised we&#8217;re not seeing more evidence of lower standards in the past &#8211; we&#8217;re definitely doing our readers a disservice if the articles rapidly drop below the standards we advertise them as meeting.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.generalist.org.uk/blog/2010/quality-versus-age-of-wikipedias-featured-articles/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Demographics in Wikipedia</title>
		<link>http://www.generalist.org.uk/blog/2010/demographics-in-wikipedia/</link>
		<comments>http://www.generalist.org.uk/blog/2010/demographics-in-wikipedia/#comments</comments>
		<pubDate>Fri, 29 Jan 2010 17:30:05 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Andrew]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://www.generalist.org.uk/blog/?p=191</guid>
		<description><![CDATA[There&#8217;s a lengthy internal debate going on in Wikipedia at the moment (see here, if you really want to look inside the sausage factory) about how best to deal with the perennial problem of biographies of living people, of which there are about 400,000. As an incidental detail to this, people have been examining the [...]]]></description>
				<content:encoded><![CDATA[<p>There&#8217;s a lengthy internal debate going on in Wikipedia at the moment (see <a href="http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2010-01-25/BLP_madness">here</a>, if you really want to look inside the sausage factory) about how best to deal with the perennial problem of biographies of living people, of which there are about 400,000.</p>
<p>As an incidental detail to this, people have been examining the issue from all sorts of angles. One particularly striking graph that&#8217;s been floating around shows, for each year of the past century, the number of biographies recording a birth or a death in that year:</p>
<p><img src="http://upload.wikimedia.org/wikipedia/commons/thumb/2/23/Births_and_deaths_in_Wikipedia_biographies%2C_1899-2010.png/500px-Births_and_deaths_in_Wikipedia_biographies%2C_1899-2010.png"><br /><small><a href="http://commons.wikimedia.org/wiki/File:Births_and_deaths_in_Wikipedia_biographies,_1899-2010.png">User:Carcharoth</a></small></p>
<p>As <a href="http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2010-01-25/Births_and_deaths">the notes</a> point out, we can see some interesting effects here. Firstly &#8211; and most obviously &#8211; there is &#8220;recentism&#8221;: people who are alive and active in the present era are more likely to have articles written about them, so you get more articles on people who died very recently than on (say) people who died forty years ago. Likewise, there is a spike in births around the late 1970s and early 1980s, of people who are just coming to public attention &#8211; in other words, people in their late twenties or early thirties are more likely to have articles written about them.</p>
<p>If we look back with a longer-term perspective, we can see that the effects of what Wikipedia editors have chosen to write about diminish, and the effects of demographics become more obvious. There are, for example, suggestions of prominent blips in the death rate during the First and Second World Wars, and what may be the post-war baby boom showing up in the late 1940s.</p>
<p>So, we can distinguish two effects; underlying demographics, and what people choose to write about.</p>
<p>(In case anyone is wondering: people younger than 25 drop off dramatically. The very youngest are less than a year old, and are invariably articles about a) heirs to a throne; b) notorious child-murder cases; c) particularly well-reported conjoined twins or other multiple births. By about the age of five you start getting a fair leavening of child actors and the odd prodigy.)</p>
<p>Someone then came up with this graph, which is the same dataset drawn from the French Wikipedia:</p>
<p><img src="http://upload.wikimedia.org/wikipedia/commons/thumb/f/fb/Naissances_et_d%C3%A9c%C3%A8s_dans_les_biographies_de_frWP%2C_1899-2009.png/500px-Naissances_et_d%C3%A9c%C3%A8s_dans_les_biographies_de_frWP%2C_1899-2009.png"><br /><small><a href="http://commons.wikimedia.org/wiki/File:Naissances_et_d%C3%A9c%C3%A8s_dans_les_biographies_de_frWP,_1899-2009.png">User:Pymouss</a></small></p>
<p>At a glance, they look quite similar, which tells us that the overall dynamic guiding article-writing is broadly the same in both cases. This may not sound like much of a finding, but different language editions can vary quite dramatically in things like standards for what constitutes a reasonable topic, so it is useful to note. French has a more pronounced set of spikes in WWI, WWII, and the post-war baby boom, though, as well as a very distinctive <i>lowering</i> of the birth rate during WWI. These are really quite interesting, especially the latter one, because they suggest we&#8217;re seeing a different underlying dynamic. And the most likely underlying dynamic is, of course, that Francophones tend to prefer writing about Francophones, and Anglophones tend to prefer writing about Anglophones&#8230;</p>
<p>So, how does this compare in other languages? I took these two datasets, and then added Czech (which someone <a href="http://commons.wikimedia.org/wiki/File:Graf_biografi%C3%AD_podle_let_narozen%C3%AD_a_%C3%BAmrt%C3%AD_cswp.png">helpfully collected</a>), German and Spanish. (The latter two mean we have four of the five biggest languages represented. I&#8217;d have liked to include Polish, but the data was not so easily accessible.) I then normalised the data, expressing each year&#8217;s count as a percentage of that language&#8217;s average annual count over the century, and graphed them against each other:</p>
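<p>The normalisation step is simple enough to sketch. In this Python version (the function name and the sample counts are hypothetical), each year&#8217;s count is divided by the language&#8217;s mean annual count across all the years supplied and multiplied by 100, so a language edition ten times the size of another produces the same curve if its articles are distributed over time in the same way.</p>

```python
from typing import Dict


def normalise(counts_by_year: Dict[int, int]) -> Dict[int, float]:
    """Express each year's count as a percentage of the mean annual count."""
    mean = sum(counts_by_year.values()) / len(counts_by_year)
    return {year: 100 * count / mean for year, count in counts_by_year.items()}


# Hypothetical counts for two editions of very different sizes.
small = {1900: 50, 1901: 100, 1902: 150}
large = {1900: 500, 1901: 1000, 1902: 1500}
print(normalise(small))  # {1900: 50.0, 1901: 100.0, 1902: 150.0}
print(normalise(large))  # {1900: 50.0, 1901: 100.0, 1902: 150.0}
```

<p>One consequence worth keeping in mind: because every year is measured against the language&#8217;s own average, a language with an unusually small or large bulge in some period shifts the average, and with it every other year on its curve.</p>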
<p><a href="http://commons.wikimedia.org/wiki/File:Wikipedia_biographies_by_birth_year_1899-2009.svg"><img src="http://upload.wikimedia.org/wikipedia/commons/thumb/7/7c/Wikipedia_biographies_by_birth_year_1899-2009.svg/500px-Wikipedia_biographies_by_birth_year_1899-2009.svg.png"></a></p>
<p><a href="http://commons.wikimedia.org/wiki/File:Wikipedia_biographies_by_death_year_1899-2009.svg"><img src="http://upload.wikimedia.org/wikipedia/commons/thumb/3/3f/Wikipedia_biographies_by_death_year_1899-2009.svg/500px-Wikipedia_biographies_by_death_year_1899-2009.svg.png"></a></p>
<p>What can we see from these? Overall, every project has basically the same approach to inclusion: ramping up steadily over time, a noticeable spike in people who died during WWII or in the past two decades, and a particular interest in people who are about thirty and in the public eye. There is one important exception to this last case &#8211; German, which has a flat birth rate from about 1940 onwards, and apparently no significant recentism in this regard. The same is true of Czech to a limited degree. (Anecdotally, I believe the same may be true of Japanese, but I haven&#8217;t managed to gather the data yet.)</p>
<p>The WWII death spike is <i>remarkably</i> prominent in German and Czech, moderately prominent in French, and apparent but less obvious in English and Spanish. This could reflect differential interest in military history, where biographies tend to have deaths clustered in wartime, but it also seems reasonable to assume that it partly reflects the real demographics of the populations each language mostly writes about. More Central Europeans died in WWII than Western Europeans; proportionally fewer died in the Anglosphere because English-speaking civilian populations escaped the worst of it, and the Spanish-speaking world was mostly uninvolved. The deaths in WWI are a lot more tightly clustered, and it&#8217;s hard to determine anything for sure here.</p>
<p>The other obvious spike in deaths is easy to understand under either explanation; it&#8217;s in 1936, in Spanish, coinciding with the outbreak of the Spanish Civil War. Lots of people to write articles about there, and people less likely to be noted outside Spain itself.</p>
<p>I&#8217;d argue that (older) birth rates are more likely to represent an underlying demographic reality than death rates are; localised death rates could be altered by a set of editors who choose to write on specific themes. You&#8217;d only get a birth-year spike, it seems, if someone was explicitly choosing to write about people born in a specific period, and that is hard to imagine for historical subjects. Historically linked people are grouped by when they&#8217;re prominent and active, and that happens at varying points in their lives, so someone writing about a particular group of people is likely to &#8220;smear&#8221; their birthdates out into a wide distribution.</p>
<p>So, let&#8217;s look at the historical births graph and see if anything shows up there. German and French show <i>very</i> clear drops in the birth rate between 1914 and about 1920, rounded U-shaped falls. German appears to have a systematic advantage over the other projects in birth rate through the 1930s and 1940s, though as the data is normalised against an average this may be misleadingly inflated &#8211; German doesn&#8217;t have the post-1970 bulge most languages do, so its century average is lower and its earlier years look correspondingly higher. The very sharp drop in births in 1945 is definitely not an artefact, though; you can see it to a lesser degree in the other languages, except English, where it&#8217;s hardly outside normal variance.</p>
<p>So, there does seem to be a real effect here; both phenomena are what we would expect from real demographic events, and the difference between the languages is interpretable as different populations suffering different effects in these periods and being represented to different degrees in the selection of people by various projects.</p>
<p>The next step would be, I suppose, to compare those figures to known birth and death rates, both globally and regionally, over the period; this would let us estimate the various degrees of &#8220;parochialism&#8221; involved in the various projects&#8217; coverage of people, as well as the varying degrees of &#8220;recentness&#8221; which we&#8217;ve seen already. Any predictions?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.generalist.org.uk/blog/2010/demographics-in-wikipedia/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
