Posts Tagged ‘wikimedia’

Taking pictures with flying government lasers

Friday, October 2nd, 2015

Well, sort of.

A few weeks ago, the Environment Agency released the first tranche of their LIDAR survey data. This covers (most of) England, at varying resolution from 2m to 25cm, made via LIDAR airborne survey.

It’s great fun. After a bit of back-and-forth (and hastily figuring out how to use QGIS), here are two rendered images I made of Durham – one with buildings and one without – now on Commons:

The first is shown with buildings, the second without. Both are at 1m resolution, the best currently available for the area. Note in particular the very striking embankment and cutting for the railway viaduct (top left). These look like they could be very useful things to produce for Commons, especially since it’s – effectively – very recent, openly licensed, aerial imagery…

1. Selecting a suitable area

Generating these was, on the whole, fairly easy. First, install QGIS (simplicity itself on a Linux machine, and probably not too much hassle elsewhere). Then go to the main data page and find the area you’re interested in. It’s arranged on an Ordnance Survey grid – click anywhere on the map to select a grid square. Major grid squares (Durham is NZ24) are 10km by 10km, and the data for a square downloads as a single zip file containing the tiles for that region.

Let’s say we want to try Cambridge. The TL45 square neatly cuts off North Cambridge but most of the city is there. If we look at the bottom part of the screen, it offers “Digital Terrain Model” at 2m and 1m resolution, and “Digital Surface Model” likewise. The DTM is the version just showing the terrain (no buildings, trees, etc) while the DSM has all the surface features included. Let’s try the DSM, as Cambridge is not exactly mountainous. The “on/off” slider will show exactly what the DSM covers in this area, though in Cambridge it’s more or less “everything”.
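For those wondering how these square names are built, the digit-picking can be sketched in shell. (TL445585 here is a hypothetical six-figure grid reference, used purely for illustration – a reference is two letters for the 100km square, then a three-digit easting and a three-digit northing.)

```shell
# Hypothetical 6-figure OS reference: letters + 3-digit easting + 3-digit northing
ref="TL445585"
letters=${ref:0:2}     # 100 km square: TL
easting=${ref:2:3}     # 445
northing=${ref:5:3}    # 585
# The 10 km download square takes the first digit of each part:
echo "${letters}${easting:0:1}${northing:0:1}"   # TL45
```

So any reference beginning TL4…5… lands in the TL45 download.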

While this is downloading, let’s pick our target area. Zooming in a little further will show thinner blue lines and occasional superimposed blue digits; these define the smaller squares, 1 km by 1 km. For those who don’t remember learning to read OS maps, the number on the left and the number on the bottom, taken together, define the square. So the sector containing all the colleges along the river (a dense clump of black-outlined buildings) is TL4458.
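The same digit-pairing can be sketched in shell for the 1 km squares – TL445585 is a hypothetical six-figure reference used purely for illustration, and it falls inside TL4458:

```shell
# Hypothetical 6-figure OS reference: letters + 3-digit easting + 3-digit northing
ref="TL445585"
letters=${ref:0:2}     # TL
easting=${ref:2:3}     # 445
northing=${ref:5:3}    # 585
# The 1 km square takes the first two digits of each part:
echo "${letters}${easting:0:2}${northing:0:2}"   # TL4458
```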

2. Rendering a single tile

Now your zip file has downloaded, drop all the files into a directory somewhere. Note that they’re all named something like tl4356_DSM_1m.asc. Unsurprisingly, this means the 1m DSM data for square TL4356.
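If you end up handling lots of tiles, that naming convention is easy to pick apart in shell (the filename here is just the example above; the pattern, as far as I can tell, is square_model_resolution):

```shell
# Split a tile filename into its square, model type, and resolution
f="tl4356_DSM_1m.asc"
base=${f%.asc}                          # strip the extension
square=$(echo "$base" | cut -d_ -f1)    # tl4356
model=$(echo "$base" | cut -d_ -f2)     # DSM
res=$(echo "$base" | cut -d_ -f3)       # 1m
echo "$square $model $res"              # tl4356 DSM 1m
```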

Fire up QGIS, go to Layer > Add raster layer, and select your tile – in this case, TL4458. You’ll get a crude-looking monochrome image, immediately recognisable by a broken white line running down the middle. This is the Cam. If you’re seeing this, great – everything’s working so far. (This step is also a useful check that you’re looking at the right area.)

Now, let’s make the image. Project > New to blank everything (no need to save). Then Raster > Analysis > DEM (terrain models). In the first box, select your chosen input file; in the next, the output filename, with a .tif suffix. (Caution, Linux users: make sure to enter or select a full path here, otherwise it seems to default to your home directory.) Leave everything else as default – all unticked, and mode: hillshade. Click OK, and a few seconds later it’ll give a completed message; cancel out of the dialogue box at this point. It’ll be displaying something like this:

Congratulations! Your first LIDAR rendering. You can quit QGIS (closing without saving is fine – your converted file is already saved) and open this as a normal TIFF file; it’ll be about 1MB and cover an area 1km by 1km. If you look closely, you can see some surprisingly subtle details despite the low resolution – the low walls outside King’s College, for example, or cars on the Queen’s Road – Madingley Road roundabout by the top left.
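That file size is no coincidence – a quick back-of-envelope check (assuming the hillshade output is plain 8-bit greyscale, one byte per pixel):

```shell
# A 1 km tile at 1 m resolution is a 1000 x 1000 grid of pixels.
# At one byte per pixel, that's about a million bytes, i.e. roughly 1 MB uncompressed.
echo $((1000 * 1000))   # 1000000
```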

3. Rendering several tiles

Rendering multiple squares is a little trickier. Let’s try doing Barton, which conveniently fits into two squares – TL4055 and TL4155. Open QGIS up, and render TL4055 as above, through Raster > Analysis > DEM (terrain models). Then, with the dialogue window still open, select TL4155 (and a new output filename) and run it again. Do this for as many files as you need.

After all the tiles are prepared, clear the screen by starting a new project (again, no need to save) and go to Raster > Miscellaneous > Merge. In “Input files”, select the two exports you’ve just done. In “Output file”, pick a suitable filename (again ending in .tif). Hit OK, let it process, then close the dialog. You can again close QGIS without saving, as the export’s complete.

The rendering system embeds coordinates in the files, which means that when they’re assembled and merged they’ll automatically slot together in the correct position and orientation – no need to manually tile them. The result should look like this:

The odd black bit in the top right is the edge of the flight track – there’s not quite comprehensive coverage. This is a mainly agricultural area, and you can see field markings – some quite detailed, and a few bits on the bottom of the right-hand tile that might be traces of old buildings.

So… go forth! Make LIDAR images! See what you can spot…

4. Command-line rendering in bulk

Richard Symonds (who started me down this rabbit-hole) points out this very useful post, which explains how to do the rendering and merging via the command line. Let’s try the entire Durham area; 88 files in NZ24, all dumped into a single directory –

for i in *.asc ; do gdaldem hillshade -compute_edges "$i" "$i.tif" ; done

gdal_merge.py -o NZ24-area.tif *.asc.tif

rm *.asc.tif

In order, that: a) runs the hillshade program on each individual source file; b) assembles them into a single giant image file; c) removes the intermediate images (optional, but we may as well tidy up). Note that the merge step matches *.asc.tif rather than *.tif, so a rerun won’t try to fold the merged output back into itself. The -compute_edges flag helpfully removes the thin black lines between sectors – I should have turned it on in the earlier sections!

Laws on Wikidata

Tuesday, September 9th, 2014

So, I had the day off, and decided to fiddle a little with Wikidata. After some experimenting, it now knows about:

  • 1516 Acts of the Parliament of the United Kingdom (1801-present)
  • 194 Acts of the Parliament of Great Britain (1707-1800)
  • 329 Acts of the Parliament of England (to 1707)
  • 20 Acts of the Parliament of Scotland (to 1707)
  • 19 Acts of the Parliament of Ireland (to 1800)

(Acts of the modern devolved parliaments for NI, Scotland, and Wales will follow.)

Each has a specific “instance of” property – Q18009569, for example, is “act of the Parliament of Scotland” – and each is set up as a subclass of the general “act of parliament”. At the moment, there are detailed subclasses for the UK and Canada (which has a separate class for each province’s legislation) but nowhere else. Yet…

These numbers are slightly fuzzy – the list is mainly based on Wikipedia articles, and so there are a small handful of cases where the entry represents a particular clause (eg Q7444697, s.4 and s.10 of the Human Rights Act 1998), or where multiple statutes are treated in the same article (eg Q1133144, the Corn Laws), but these are relatively rare and, mostly, it’s a good direct correspondence. (I’ve been fairly careful to keep out oddities, but of course, some will creep in…)

So where next? At the moment, these almost all reflect Wikipedia articles. Only 34 have a link to (English) Wikisource, though I’d guess there are about 200–250 statutes currently on there. Matching those up will definitely be valuable; for legislation currently in force and on the Statute Law Database, it would be good to be able to crosslink to there as well.

Mechanical Curator on Commons

Sunday, December 15th, 2013

The internet has been enthralled by the British Library’s recent release of the Mechanical Curator collection: a million public-domain images extracted from digitised books, put online for people to identify and discover. The real delight is that we don’t know what’s in there – the images have been extracted and sorted by a computer, and human eyes may never have looked at them since they were scanned.

Image taken from page 171 of '[Seonee, or, camp life on the Satpura Range ... Illustrated by the author, etc.]'

I wasn’t directly involved with this – it was released after I left – but it was organised by former colleagues of mine, and I’ve worked on some other projects with the underlying Microsoft Books collection. It’s a great project, and all the more so for being a relatively incidental one. I’m really, really delighted to see it out there, and to see the outpouring of interest and support for it.

One of the questions that’s been asked is: why put them on Flickr and not Commons? The BL has done quite a bit of work with Wikimedia, and has used it as the primary way of distributing material in the past – see the Picturing Canada project – and so it might seem a natural home for a large release of public domain material.

The immediate answer is that Commons is a repository for, essentially, discoverable images. It’s structured with a discovery mechanism built around knowing that you need a picture of X, and finding it by search or by category browsing, which makes metadata essential. It’s not designed for serendipitous browsing, and not able to cope easily with large amounts of unsorted and unidentified material. (I think I can imagine the response were the community to discover 5% of the content of Commons was made up of undiscoverable, unlabelled content…) We have started looking at bringing it across, but on a small scale.

Putting a dump on archive.org has much the same problem – a lack of functional discoverability. There’s no way to casually browse material here, and it relies very much on metadata to make it accessible. If the metadata doesn’t exist, it’s useless.

And so: Flickr. Flickr, unlike the repositories, is designed for casual discoverability, for browsing screenfuls of images, and for users to easily tag and annotate them – things that the others don’t easily offer. It’s by far the best environment of the three for engagement and discoverability, even if it’s probably less useful for long-term storage.

This brings the question: should Commons be able to handle this use case? There’s a lot of work being done just now on the future of multimedia: will Commons in 2018 be able to handle the sort of large-scale donation that it would choke on in 2013? Should we be working to support discovery and description of unknown material, or should we be focusing on content which already has good metadata?