Size matters

To the Deutscher Geographentag 2013 in Passau I brought a poster showing the framework and first results of my dissertation project.


Fundamentally, I examine the sensitivity of (social) simulation models to scaling in literally every conceivable model dimension (among others time, space, and social reference unit). The basis is an agent-based model of residential relocation decisions. On the poster I focus on the two technological-methodological work packages. The first is an algorithm for the disaggregation and amalgamation of various population datasets into a 1:1-scale population. This allows the input data, together with the level of analysis, to be re-aggregated step by step from this “largest meaningful resolution”, so that the model can be run with a multitude of combinations of different scalings across the dimensions. The technical implementation of this experimentation environment is the second part discussed on the poster.
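The experimentation environment boils down to a loop over all combinations of scaling levels. A minimal sketch in Python, where the dimension names and levels are purely illustrative placeholders, not the ones used in the project:

```python
import itertools

# Hypothetical resolution levels for three model dimensions; the
# actual levels used in the project may differ.
temporal = ["hour", "day", "month"]
spatial = ["building", "block", "tract"]
social = ["individual", "household"]

def run_model(t, s, soc):
    """Placeholder for a single simulation run at the given scaling."""
    return {"time": t, "space": s, "social": soc}

# Run the model once for every combination of scaling levels.
results = [run_model(t, s, soc)
           for t, s, soc in itertools.product(temporal, spatial, social)]
print(len(results))  # 3 * 3 * 2 = 18 runs
```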

The disaggregation algorithm is implemented in Python; the data are stored in a SpatiaLite backend (although the connection via GDAL/OGR permits various other setups; for example, the data could also reside in a more powerful database such as PostgreSQL/PostGIS).
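Since the data are accessed through GDAL/OGR, switching backends mostly comes down to handing OGR a different datasource string. A small sketch of that idea; the function and parameter names are my own illustration, not taken from the project code:

```python
def datasource_uri(backend, **params):
    """Build an OGR datasource string for the chosen backend.

    Illustrative only: the backend names and parameters here are
    assumptions, not part of the actual project code.
    """
    if backend == "spatialite":
        # OGR opens SQLite/SpatiaLite files directly by path.
        return params["path"]
    if backend == "postgis":
        # OGR's PostgreSQL driver expects a "PG:" connection string.
        return "PG:dbname={dbname} host={host}".format(**params)
    raise ValueError("unknown backend: %s" % backend)

# Usage (requires the GDAL Python bindings):
#   from osgeo import ogr
#   ds = ogr.Open(datasource_uri("spatialite", path="population.sqlite"))
```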

The source code is available in a git repository, at . Required dependencies are Python 2.x, SpatiaLite, Shapely, and GDAL/OGR. All of these libraries are also available via easy_install/pip/etc. For accompanying sample data and/or a closer explanation of the admittedly at times somewhat confusing (but commented) code, feel free to contact me.

The poster can be downloaded as a PDF here.

Frankenstein, PhD: pseudo-individual populations

The doctoral college holds its annual symposium this weekend (20-21 Sept 2013).

I will, among other things, contribute a poster about the recently completed first work package of my thesis. It covers disaggregating census data (a looooot of columns with a loooot of data) onto individual building polygons (no population data at all :/) with the help of a fine-meshed population grid (only population counts, but comparably high resolution). The final product is a set of interrelated tables of individuals, households, and buildings.

What I did was …

  • … apply a local/regional filter to the buildings, omitting every building whose area lies more than three standard deviations above or below the median of its building block. The aim here is to discard malls, supermarkets, news stands, and the like, and to work on mostly residential buildings.
  • Then, I distribute the grid cells’ population count over the buildings, by area. Buildings overlapping more than one grid cell receive aliquot population counts from all respective grid cells. In the same way as later on with the census data columns, I first assign decimal values and then round them, iterating from the highest to the smallest value (as long as the grid cell’s total is not exceeded). This “initial population” serves only as a seed value for later on – fortunately, because the sample data I used (the données carroyées from INSEE) turned out to contain errors.
  • Next, I calculate something like an IDW (inverse distance weighting) for each value for each building, taking into account every census tract polygon. Distance between centroids, obviously.
  • In the same processing step, I normalise the calculated IDW values by the value of the respective local census tract. This leaves me with “gradients” towards the neighbouring census tract polygons.
  • Then, I calculate each building’s share of the census tract polygons based on the “initial seed population”, and – together with the modified IDW value – use it as a multiplier on the census tract values.
  • We’re nearly there: I just repeat the whole-number thingy again (see second step), and I have an integer population in buildings.
  • Finally, I use an ugly fitting algorithm (developed by trial and error) to distribute the individuals over households so that they fit the buildings’ population counts. Household sizes and counts are in the census data – that gives a rough estimate for the distributed values.
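Two of the steps above can be sketched in a few lines of Python. The functions below are simplified illustrations of the ideas (an IDW over tract centroids, and a largest-remainder style integerisation), not the project code itself:

```python
import math

def idw(building_xy, tracts, power=2):
    """Inverse-distance-weighted mean of tract values at a building
    centroid; tracts is a list of (x, y, value) tuples."""
    num = den = 0.0
    for x, y, value in tracts:
        dist = math.hypot(building_xy[0] - x, building_xy[1] - y)
        weight = (dist or 1e-9) ** -power  # guard against zero distance
        num += weight * value
        den += weight
    return num / den

def integerise(shares, total):
    """Turn decimal allocations into integers summing to `total`:
    start from the floors, then hand out the remaining units to the
    largest fractional parts first."""
    floors = [int(s) for s in shares]
    remaining = total - sum(floors)
    by_fraction = sorted(range(len(shares)),
                         key=lambda i: shares[i] - floors[i],
                         reverse=True)
    result = list(floors)
    for i in by_fraction[:remaining]:
        result[i] += 1
    return result
```

For example, `integerise([2.6, 1.7, 0.7], 5)` yields `[2, 2, 1]`, preserving the grid cell total of 5.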

Find the poster as a PDF here

The source code is stored in a git repository at bitbucket: . You will need Python 2.x, SpatiaLite, Shapely, and GDAL/OGR, all of which are also available via easy_install/pip/etc. Data and/or explanations of the – I’m being honest with you – at times poorly commented source code upon request 🙂


ECTQG13: one by one


At the European Colloquium for Theoretical and Quantitative Geography (ECTQG) 2013 I presented the methodology of the first work package of my dissertation. The exact procedure is explained in a bit more detail in another post on a poster explaining the same work package.

This is the abstract I sent nearly half a year ago:

CREATING PSEUDO-INDIVIDUAL POPULATIONS as an input for empirically examining scaling issues in agent-based models

Temporal, spatial, and social scaling has always been a controversial topic for scholars working with social simulation models. Following Squazzoni (2012:xiii), the level of analysis determines much more than only a model’s level of detail. Bottom-up approaches try to inductively gather knowledge about macro-level patterns from micro-level actions. Top-down approaches, on the other hand, attempt to deductively explain how macro-patterns impact individual interactions. Still, we do not know very much about the influences of scale and scaling on social models (cf. e.g. Easterling & Polsky 2004, Swyngedouw 2004, Carson & Koch 2013).

The current research is the first step in an ongoing project which aims to examine more closely said influences of temporal, spatial, and social scaling in an urban residential model. A prerequisite for running the closely supervised and rigorously documented series of simulations, which will also take a closer look at the interaction of independent scaling of different model dimensions, is a data basis at the highest resolution that is meaningful for the respective research questions (subsequently, the data is aggregated step-wise). In the case of a residential model, individuals/households and dwelling units are the adequate reference units.

The core of the current research is a suite of custom Python scripts which make use of the GDAL/OGR library and the bindings for R. As input data, building polygons, a small-meshed population count grid, and various socio-economic variables on the basis of census tracts are used; currently, data from France and Austria are processed. First, the population is distributed among buildings; a simple inverse distance weighting (IDW) algorithm smooths sharp boundaries, and a local dynamic threshold excludes non-residential buildings. Next, as household sizes are available, inhabitants are grouped into households. Finally, socio-economic variables are assigned to individuals; their distribution is weighted according to the neighbouring census tracts.

As a result, a vector dataset with individual inhabitants grouped into households and assigned to geo-referenced buildings is created. To verify the results, a re-aggregation is carried out; privacy and licensing issues are discussed.

KEYWORDS: Agent-based modelling, social simulation, data preparation, artificial population, geo-statistics

Carson, D. B. & Koch, A., 2013: Divining the local: Mobility, scale and fragmented development. Local Economy (forthcoming issue).
Easterling, W. E. & Polsky, C., 2004: Crossing the Divide: Linking Global and Local Scales in Human–Environment Systems, in: Sheppard, E. & McMaster, R. B. (Eds.), Scale and Geographic Inquiry: Nature, Society, and Method, Malden, Oxford, Carlton: Blackwell, pp. 67–85.
Squazzoni, F., 2012: Agent-Based Computational Sociology, Chichester: Wiley.
Swyngedouw, E., 2004: Scaled Geographies: Nature, Place, and the Politics of Scale, in: Sheppard, E. & McMaster, R. B. (Eds.), Scale and Geographic Inquiry: Nature, Society, and Method, Malden, Oxford, Carlton: Blackwell, pp. 129–153.


As it happens, it was accepted as is. Even more surprisingly, I managed to finish in time what I had promised half a year before.
You can find my slides here (as html) and here (as pdf).


CSSS2013 – transport model

Time for an update 🙂

From 9th to 18th July I attended the Complex Systems Summer School in Le Havre in northern France. The overall theme was “Collective Behaviour and Mobility in Complex Systems” – which I think fits the research for my thesis quite well.

Apart from having a lot of fun there with fellow (PhD) students from all over France and a few from other countries, I also learned a lot. It’s fascinating how French geography still holds on to quantitative methods, while German-speaking human geography scholars usually shun anything even slightly leaning towards determinism like the devil shuns holy water. And, finally, what this post is all about: we were to conduct a group project, yielding some kind of complex model.

I was in a group with marvellous people: Claire Lagesse was our conceptual mastermind, Juste Raimbault our young and over-ambitious modelling geek, whom Florent Querini – our “senior” researcher – and I had to slow down a bit from time to time. Nicolas Dugue double-featured as our only professional software engineer and our only professional rock musician, and Bernadette Quinn (for whom I could not find a profile online – congrats, you managed to stay out 🙂) kept (a) the project’s overall aims and (b) our English language in sight.

After having discussed a lot of technicalities, we figured we would have to have a research question, and not only solutions (in the form of algorithms). We decided to build a transport model of a medium-sized city. To have some kind of real-world grounding, we chose to simplify the road network of an existing city – Toulouse.

Toulouse has a really nice, though a bit too “hip”, corporate identity (see their website) – so while our project language had switched to French once more during a heated discussion, I quickly made a caricature of their logo for our presentation layout. Unfortunately, someone later insisted on using LaTeX – but was not fluent enough in it to do so nicely … Nevertheless, voilà, here are my slide backgrounds:

[two slide background images]

If you want to use them somewhere – feel free to do so 😉

Well, back to our model. We created a network of streets, with population and commodities distributed over the city. Each agent was randomly assigned a set of daily routine journeys and could choose from different modes of transport.
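As an illustration of the mode-choice part: each journey’s mode could be picked by minimising a rough travel-time estimate. The speeds and fixed access times below are made-up placeholder values, not the parameters from our model:

```python
# (speed in km/h, fixed access/wait time in hours) per mode; all
# numbers are illustrative assumptions, not our model's parameters.
MODES = {
    "walk": (5.0, 0.0),
    "bike": (15.0, 0.05),
    "car": (40.0, 0.15),   # includes time to find parking
    "tram": (25.0, 0.2),   # includes waiting at the stop
}

def choose_mode(distance_km, modes=MODES):
    """Pick the mode with the lowest estimated door-to-door time."""
    def travel_time(mode):
        speed, overhead = modes[mode]
        return overhead + distance_km / speed
    return min(modes, key=travel_time)
```

Short trips then default to walking and long ones to the car; congestion feedback, as in our model, would add a load-dependent term to the car’s travel time.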


Funnily enough (no, not really funny, just as expected), our completely made-up model did not produce the results one might expect: adding an additional tram line, e.g., would result in even worse congestion on the streets … makes you wonder 😉

Bottom line: It was fun, but did not really make sense.

You can find the model’s source code at: , the slides of our final presentation here, and our “consulting agency’s” logo here:


Really nice to do something “just for the sake of it” 🙂 Looking forward to the next time 🙂


guerilla remote sensing @EGEA Euromed 13

The workshop Lisa and I taught at the EGEA Euromed Regional Congress 2013 dealt with citizens’ cartography, and how civil rights groups use (online) maps to amplify their messages. As a practical (“early morning”) part of the workshop, we followed Jeffrey Warren’s guide to grassroots mapping and built a low-cost kite from garbage bags. Although there was literally no wind at all, we tried flying it – and it was great fun. The resulting images, well – see for yourselves 😉

Well, a lot of feet, flying low and only briefly, but a spec… – wait for it – …tacular crash in the end 😛