Historic maps are a key source of information on the evolution of both the natural and built environment. Many environmental processes and land-use changes unfold over long periods, so historical records of them in progress are essential to understanding them. The National Library of Scotland holds a unique collection of digitised historic maps covering the UK; these represent a major opportunity to mine for data on a whole host of environmental, land-use and built-environment changes over their period of coverage (accurate OS sequences cover the period since 1854, although not always completely across the whole UK for the entire sequence). Cataloguing and understanding the data content of these maps by hand is time-consuming and can be difficult simply because of the volume of data involved.
Modern deep learning tools for image segmentation are now sufficiently mature to form the basis of an efficient processing chain for digitised and geo-referenced map images. The aim would be to segment the map fully: first removing and geo-referencing text, using character recognition to produce a searchable, geo-located text layer; then performing similar analysis for other features once the confounding text overlays are removed. In particular, natural-environment markers such as woodland and coastline, as well as built-environment features (e.g. the road network, car parks), could all be usefully extracted as vectorised layers, on which transport activity could be identified and trip origin-destination maps further derived. This work would then enable changes in the extent and nature of these features to be examined across a sequence of maps (e.g. a traffic pollution map).
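The text-removal step described above can be illustrated with a minimal sketch. Assuming a segmentation model has already produced a binary text mask for a greyscale map scan, the masked pixels can be filled with an estimate of the background intensity so that later symbol detection is not confounded by text overlays (the function name and fill strategy here are illustrative, not the project's actual implementation):

```python
import numpy as np

def remove_text(image, text_mask, fill_value=None):
    """Blank out pixels flagged as text by a segmentation model.

    image: 2-D greyscale array; text_mask: boolean array of the same
    shape (assumed to come from a text-segmentation network). Masked
    pixels are replaced with the median background intensity, a crude
    stand-in for proper inpainting.
    """
    cleaned = image.astype(float).copy()
    if fill_value is None:
        # Estimate the background from the unmasked (non-text) pixels.
        fill_value = np.median(image[~text_mask])
    cleaned[text_mask] = fill_value
    return cleaned
```

A production pipeline would more likely use a dedicated inpainting routine, but the principle of masking predicted text and reconstructing the underlying map content is the same.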
However, various issues remain for which modern satellite-based Earth observation techniques can provide useful input. Firstly, accurately georeferencing features on maps can be difficult. Current approaches rely on coarse georeferencing at the corners of a map sheet; a more interesting and advanced approach would allow more detailed transformations across a sheet, giving a more accurate and detailed comparison of changes over time. This leads to the second way satellite data could be used: to extend any time series of changes through to the present. Similar techniques are expected to work for satellite images as for the maps themselves, making any methods developed more broadly useful.
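One simple way to move beyond corner-based georeferencing is to fit a polynomial transformation to a set of matched ground-control points spread across the sheet. The sketch below (a least-squares fit in NumPy; function names are hypothetical) shows the idea: order 1 reproduces the affine, corner-point case, while order 2 captures smooth sheet-wide distortions:

```python
import numpy as np

def _poly_basis(xy, order):
    # Design matrix of monomials 1, x, y, x^2, xy, y^2, ... up to `order`.
    x, y = xy[:, 0], xy[:, 1]
    cols = [np.ones_like(x)]
    for d in range(1, order + 1):
        for i in range(d + 1):
            cols.append(x ** (d - i) * y ** i)
    return np.stack(cols, axis=1)

def fit_polynomial_transform(pixel_xy, geo_xy, order=1):
    """Least-squares polynomial mapping from pixel to geographic coords.

    pixel_xy, geo_xy: (N, 2) arrays of matched ground-control points.
    order=1 is affine (3+ points); order=2 needs 6+ points and models
    the sheet-wide warping discussed above.
    """
    A = _poly_basis(pixel_xy, order)
    coeffs, *_ = np.linalg.lstsq(A, geo_xy, rcond=None)
    return coeffs

def apply_polynomial_transform(coeffs, pixel_xy, order=1):
    """Map pixel coordinates through a fitted transform."""
    return _poly_basis(pixel_xy, order) @ coeffs
```

In practice such control points could be obtained by matching stable map features against modern satellite imagery, which is exactly where Earth observation data can feed into the georeferencing step.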
A recent project working through some of the ideas for this processing chain has successfully used template matching to begin segmenting simpler symbols across a map, and a more complex technique based on a pre-trained UNet convolutional neural network to perform text segmentation. A longer PhD project is expected to build on these successes and develop further techniques to increase the amount of data that can be obtained automatically from the available map sequences. Furthermore, we will focus on using open multispectral imagery from Sentinel-2 to support map georeferencing and to experiment with extending the deep learning techniques to segment such images.
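The template-matching approach mentioned above can be sketched as a normalised cross-correlation scan: the symbol template is slid over the map image and locations with a high correlation score are reported. This pure-NumPy version is illustrative only (a real pipeline would use an optimised routine such as OpenCV's matchTemplate), but it shows the mechanism used to pick out repeated map symbols such as woodland markers:

```python
import numpy as np

def match_template(image, template, threshold=0.9):
    """Find template occurrences by normalised cross-correlation.

    Returns the (row, col) top-left corners where the correlation
    score exceeds `threshold`. Both inputs are 2-D greyscale arrays.
    """
    th, tw = template.shape
    t = template.astype(float) - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    hits = []
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            p = image[r:r + th, c:c + tw].astype(float)
            p = p - p.mean()
            denom = np.sqrt((p ** 2).sum()) * t_norm
            # Constant patches have zero variance: score them 0.
            score = (p * t).sum() / denom if denom > 0 else 0.0
            if score >= threshold:
                hits.append((r, c))
    return hits
```

Detected hits can then be converted to point geometries via the fitted georeferencing transform, giving a vectorised layer of symbol locations.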
This project is joint with the National Library of Scotland (NLS), who can supply a wealth of historic map scans, and Registers of Scotland (ROS), who have a significant interest in land-use change and the built environment specifically.
The images show examples of successful text segmentation to enable text removal, and of pattern matching for detecting forest-type markers in a historic OS map (with thanks to NLS).