KALL10

During the Interactions in Urban Space Seminar by EGEA Helsinki I attended a workshop on the Geography of the Night, taught by Dmitrii Komendenko from the University of Saint Petersburg.

As part of the workshop we carried out a small field survey, compiling the opening hours of all shops, bars, restaurants, and other amenities around Vaasankatu in Kallio, the “SoHo of Helsinki”. Kallio is a neighbourhood of just over one square kilometre, but it is densely populated, with 27 000 people living there. It is both a traditional working class district and a neighbourhood of the Bohème-Bourgeois, of creatives and students. Some scholars consider it the prime example of gentrification in Finland. All the same, Kallio is definitely at a very early stage of gentrification, and might – as I heard somewhere but cannot reproduce my source – never become truly gentrified due to the small size of its tenements.

To come back to the field survey: wonderful Eliška of EGEA Brno and I were to record the opening hours of shops in a part of Kallio. Classical field work is done with your feet. We walked from house to house and noted the opening hours, names, addresses, and types of businesses in our field diary.
Ultimately, our efforts resulted in a 150-row table and a beautiful, hand-drawn map for the seminar’s final round-up. See for yourselves!

P1000358_1024

P1000362_1024

P1000361_1024

getEPSG script

Like probably a lot of people, I recently converted to “thinner clients”. Most of my geoprocessing is now done on a server or on my office workstation, which I access via ssh. Naturally, after having been in the geo-whatever business for ten years, I have a lot of data lying around (some of which I am probably not allowed to re-use any more, but I still keep it for no reason in particular), and this data is lying around in numerous locations. First, there are literally a dozen directories on my laptop’s disk holding data that is unsorted, half-sorted according to one ordering system or another, or prepared to be archived and handed over to friends or colleagues. Then, there are a number of project directories residing on the same disk, each containing more or less geodata. The same goes for my various servers and the two computers I have access to at work. And of course, apart from the real backup disks, there is a bunch of external drives which might contain project directories or small sub-collections of geodata on one topic or another.

Enough said about the starting point of a long and slow effort I am currently going through: I’m consolidating my geodata into one place. I hope to eliminate a lot of duplicates and, at the end of the day, to be able to access the data easily and quickly from my laptop, my tablet, or any of my colleagues’ or friends’ places. I started with the vector data, because I feel the raster data is an entirely different, much more difficult topic with more and different constraints to be met. For the vector data I decided on a PostGIS database. It allows me to access the data r/w via an SSL encrypted connection, QGIS has more than decent support for it, and I can configure accounts for clients, friends, or colleagues that can access only parts of the data. I can embed these accounts directly into the QGIS project files, so nobody has to configure anything.

PostGIS comes with a handy shp2pgsql tool to import data into the database, which unfortunately lacks one thing: it cannot determine the imported shapefile’s spatial reference. You have to supply the SRID on the command line unless you want it to be set to 0.

Easy-peasy to accomplish that with a little Python script employing GDAL/OGR 🙂

The following script accepts one parameter, which should be the filename of a vector dataset (such as a shapefile). It returns an EPSG code (or nothing in case it cannot resolve the embedded SRS to EPSG). That’s perfectly simple, and at the same time incredibly convenient if you have a bash script which sorts or imports or renames or [insert your use case here] a bunch of vector geodata files.
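To give an idea of the approach, here is a minimal sketch of how such a lookup might work with the GDAL/OGR Python bindings. It is not the actual getEPSG source. The function names `epsg_from_srs`, `get_epsg` and `shp2pgsql_command` are made up for this illustration, and the sketch assumes that the `osgeo` package is installed and that the dataset's first layer carries the spatial reference you are interested in.

```python
def epsg_from_srs(srs):
    """Return the EPSG code of an osr.SpatialReference, or None.

    Hypothetical helper for this sketch, not part of the original script.
    """
    if srs is None:
        return None
    try:
        # Asks OGR to match the SRS against EPSG definitions it knows
        # and, on success, to add the authority code to the SRS.
        srs.AutoIdentifyEPSG()
    except RuntimeError:
        # Raised only if GDAL exceptions are enabled and no match is found.
        pass
    if srs.GetAuthorityName(None) == "EPSG":
        return srs.GetAuthorityCode(None)
    return None


def get_epsg(path):
    """Open a vector dataset and look up the EPSG code of its first layer."""
    from osgeo import ogr  # imported here so the helpers work without GDAL

    dataset = ogr.Open(path)
    if dataset is None:
        return None
    layer = dataset.GetLayer(0)
    if layer is None:
        return None
    return epsg_from_srs(layer.GetSpatialRef())


def shp2pgsql_command(shapefile, table, epsg):
    """Build the shp2pgsql argument list, falling back to SRID 0."""
    srid = epsg if epsg is not None else "0"
    return ["shp2pgsql", "-s", str(srid), shapefile, table]
```

In a shell script, the printed code would then go into the `-s` option of shp2pgsql, whose SQL output is typically piped into `psql`.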

You can find the source code – as always under your favourite open source license (I would prefer if you used GPLv2 or MIT) – in my bitbucket at https://bitbucket.org/christophfink/getepsg/. Have fun, use it, and leave me a comment if you do 🙂

The Spatiality of #BangkokShutdown

I have been following media reports on the ongoing protests in Thailand, and especially in its capital city Bangkok, quite closely. Bangkok is a city I loved the few times I was there – and by that I do not mean the tourist sites or the clichéd touristic “services”. Some friends of mine live there, and others have South East Asia as the focus area of their research. It is safe to say I have my bonds to the “City of Angels”.

So, I followed the news, and also a lot of blog posts, and at some point stumbled upon Richard Barrow’s Thailand page, which prominently featured (and features) a map of the protests. You know that I am incredibly fond of maps, and have worked on maps and urban protests before, so I was fascinated immediately.

Then, all of a sudden, it became obvious to me how the places of the protests had moved, and how at the same time the space(s) of the protests and of the protesters had changed shape: first, in late November and early December, the protesters had chosen places that are symbols of the Thai state and government for their demonstrations. Now, with their January 13 “Bangkok Shutdown” campaign, they moved to a relatively small number of large, central crossroads. All of these crossroads are important not only for the city’s motorists (BKK is, without exaggeration, one of the most congested cities I have visited so far, if not _the_ most congested), but also as hotspots for tourists visiting the Thai capital. Victory Monument is where backpackers would get off the bus to reach Khao San Road. The crossroads between National Stadium, MBK and the Siam malls is the number one shopping area for tourists of all kinds, closely followed by the market inside Chatuchak Park near the Pathumwan Intersection. And the whole length of Sukhumvit Road is one giant tourist strip anyway.

These observations are pure speculation, based on nothing but combining in my head what I read somewhere at some point with what I believe I know about places in Bangkok.

Well, then, why not gather more data? Admittedly, I cannot just leave Salzburg for two weeks to conduct interviews with people with whom I probably do not even share a common language. Fortunately, people use social media a lot, and in my working group there is some experience with gathering and analysing the short messages people send via Twitter. Fortunately, Twitter is very popular with political protesters, and hashtags for campaigns or movements emerge quickly and remain comparatively stable. Fortunately (no. 3), Twitter messages can be georeferenced by their authors, and very often are by default if sent from mobile devices.

This blog post will not cover any of the political issues, and will not try to understand or explain the arguments of either side of this conflict. There are people out there who really know what they are talking about, and I do not feel like I should join the endless row of not-so-sophisticated commenters. If you read the reports in the various blog posts carefully, I think you can form an opinion of your own on the topic. I try to stick to data here, and analyse it with some respectful distance, without digging into political reasoning.

I quickly wrote a script (OK, quickly is relative, it still took me the better part of today) to gather tweets into a PostgreSQL/PostGIS database according to a filter. The script collects all tweets from the past which are still available (6-9 days back according to Twitter help pages), and proceeds to gather further messages as they are posted. You can find the source code – as always under a GPLv2 license – in my bitbucket at https://bitbucket.org/christophfink/tweet-monitor.
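As an illustration of the filtering and storing step only, here is a rough sketch and not the tweet-monitor code itself. It assumes tweets arrive as dictionaries shaped like the JSON of Twitter's v1.1 API, with hashtags under `entities` and a GeoJSON point under `coordinates`. The table `tweets` and its columns are hypothetical, and the actual database call (for example via a PostgreSQL driver's `cursor.execute`) is left out.

```python
from datetime import datetime

# Parametrised statement for a hypothetical PostGIS table "tweets";
# ST_MakePoint expects longitude first, then latitude.
INSERT_SQL = (
    "INSERT INTO tweets (id, created_at, text, geom) "
    "VALUES (%s, %s, %s, ST_SetSRID(ST_MakePoint(%s, %s), 4326))"
)


def has_hashtag(tweet, hashtag):
    """True if the tweet carries the given hashtag (case-insensitive)."""
    wanted = hashtag.lstrip("#").lower()
    tags = tweet.get("entities", {}).get("hashtags", [])
    return any(tag.get("text", "").lower() == wanted for tag in tags)


def tweet_point(tweet):
    """Return (lon, lat) of a georeferenced tweet, or None."""
    geo = tweet.get("coordinates")
    if not geo or geo.get("type") != "Point":
        return None
    lon, lat = geo["coordinates"]
    return lon, lat


def insert_parameters(tweet):
    """Build the parameter tuple for INSERT_SQL, or None if not georeferenced."""
    point = tweet_point(tweet)
    if point is None:
        return None
    lon, lat = point
    created = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
    return (tweet["id"], created, tweet["text"], lon, lat)
```

Tweets without coordinates yield `None` here, so a caller can still keep them in a separate, non-spatial table if it wants to.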

I think I was fast enough to catch all of the current #BangkokShutdown messages (and some ten-ish more keywords). My script makes sure I do not miss any of the future tweets.

While I want to wait for more complete data to come in before doing any in-depth analyses, I can already whet your appetite for more:
bangkokshutdown_georeferencedtweets
bangkokshutdown_georeferencedtweets_2
bangkokshutdown_georeferencedtweets_3
Stay tuned for more!

Massive Reverse Geocoding

So my colleagues Izabela and Bartosz collected some 540 million georeferenced tweets for one of their research projects. For the research question it did not suffice to have individual pairs of coordinates. Rather, a placename was necessary – at least a city name, better the name of a neighbourhood or even street.

Bartosz quickly discovered Nominatim, a (reverse) geo-coding engine built upon OpenStreetMap data. He promptly ran into its usage limitations.

Fortunately, Nominatim is open source software, and the installation procedure is well-documented. Fortunately, we have a high-end workstation sitting in the corner of our office. Fortunately, they had me set the whole thing up.

First, strictly following the instructions from the Nominatim wiki page, I read in the 30 GiB OSM Planet file. This took some 5 days despite the database running on a SuperTalent PCIe SSD (which promises an astonishing 1000 MiB/s throughput, both reading and writing). Having two Xeons and 128 GiB of RAM definitely didn’t hurt either.

Initially, the idea was to discard the HTTP API from the Nominatim source, and call its core directly. As it turned out, Nominatim does substantial calculation in the PHP frontend code. Time was too short to port this to a locally callable version, so I also set up a webserver, and hooked it up to the API code.

In the meantime, Bartosz had prepared a huge CSV file with three columns, representing the latitude, longitude, and an ID for each of the tweets in question. It was (and is) 542400310 rows long, which accounts for an 18GiB file size.

I devised a Python script employing the multiprocessing module. One sub-process reads in the CSV file line by line, and puts each line into a multiprocessing.Queue which in turn serves input to cpu_count + 1 worker processes. These split the individual lines into columns, query the local (though configurable) Nominatim API interface, and reformat the results into a write-ready CSV output line. Via another Queue these lines are fed into one further sub-process which ultimately writes them to a new file. Finally, I added progress reporting and a stop-resume routine, as initial estimates tell us the processing will take yet another fortnight.
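The pipeline described above could look roughly like the following. This is a simplified sketch under a few assumptions, not the released script: the base URL in `NOMINATIM_URL` is a placeholder for wherever the local instance answers, the function names are made up, only the `display_name` field of the response is kept, and progress reporting and the stop-resume routine are left out.

```python
import csv
import io
import json
import multiprocessing
import urllib.parse
import urllib.request

NOMINATIM_URL = "http://localhost/nominatim/reverse"  # assumed local endpoint
SENTINEL = None  # marks the end of a queue


def reverse_geocode(lat, lon, base_url=NOMINATIM_URL):
    """Ask Nominatim's reverse endpoint for a place description."""
    query = urllib.parse.urlencode({"format": "json", "lat": lat, "lon": lon})
    with urllib.request.urlopen(f"{base_url}?{query}") as response:
        return json.load(response).get("display_name", "")


def reader(in_path, in_queue, n_workers):
    """Feed the input file line by line, then one sentinel per worker."""
    with open(in_path, newline="") as f:
        for line in f:
            if line.strip():
                in_queue.put(line)
    for _ in range(n_workers):
        in_queue.put(SENTINEL)


def worker(in_queue, out_queue, geocode=reverse_geocode):
    """Split each line (lat, lon, id), geocode it, emit one CSV line."""
    while True:
        line = in_queue.get()
        if line is SENTINEL:
            break
        lat, lon, tweet_id = line.strip().split(",")
        buffer = io.StringIO()
        row = [tweet_id, lat, lon, geocode(lat, lon)]
        csv.writer(buffer, lineterminator="\n").writerow(row)
        out_queue.put(buffer.getvalue())
    out_queue.put(SENTINEL)


def writer(out_path, out_queue, n_workers):
    """Write result lines until every worker has signalled completion."""
    finished = 0
    with open(out_path, "w", newline="") as f:
        while finished < n_workers:
            line = out_queue.get()
            if line is SENTINEL:
                finished += 1
            else:
                f.write(line)


def run(in_path, out_path):
    """Start one reader, cpu_count + 1 workers and one writer."""
    n_workers = multiprocessing.cpu_count() + 1
    in_queue = multiprocessing.Queue(maxsize=10000)  # bounded: limits memory
    out_queue = multiprocessing.Queue(maxsize=10000)
    processes = [
        multiprocessing.Process(target=reader, args=(in_path, in_queue, n_workers)),
        multiprocessing.Process(target=writer, args=(out_path, out_queue, n_workers)),
    ]
    processes += [
        multiprocessing.Process(target=worker, args=(in_queue, out_queue))
        for _ in range(n_workers)
    ]
    for process in processes:
        process.start()
    for process in processes:
        process.join()
```

Calling `run("tweets.csv", "geocoded.csv")` from within an `if __name__ == "__main__":` guard would start the whole thing; both file names are placeholders. Note that with several workers the output lines are not guaranteed to come out in input order, which is why the ID column is carried along.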

You can find the source code (which I release under the GPLv2 license) in my bitbucket at https://bitbucket.org/christophfink/nominatim_reversely_geocode_tweets.

Bildschirmfoto 2013-12-06 um 11.51.45

Okkupy Google Maps

In late April/early May 2013, Lisa, Stefan, and I taught a workshop at the EGEA Euromed Regional Congress 2013 in El Bosque near Sevilla. We had proposed to work on how social movements employ maps, with three examples: first, the Barrio Maravillas/triBall in Madrid; second, Gezi Park in Istanbul (a lucky pick in hindsight, as its situation escalated way after our workshop application!); and third, the planned high-voltage line through nature reserves in the Austrian region of Salzburg.

Our idea was to start off with a mapping exercise to raise awareness of how subjectively and discursively maps are produced; then have a bit of introduction to qualitative research methods, with a focus on discourse analysis and urban theories (such as Henri Lefebvre’s ideas); and finally dive into analysing the actual discourses of our examples, drawn solely from blog posts and media articles. We planned to conduct some interviews beforehand in Madrid, which didn’t really succeed: I was too self-conscious about my Spanish and didn’t approach possible “experts” in the various “centros sociales” across town until the very last day, and I didn’t meet anybody who was involved in the 2008 protests.

“Well, not exactly optimal, but hey: we have plenty of online resources,” I thought. Hm … we didn’t have internet access in the workshop rooms :/

We completed the mapping exercise, gave the theoretical introductions, and had great joy with the fun part of the workshop: building a DIY surveying kite (and attempting to fly it).

When it came to really examining texts, we were extremely lucky: someone who was somehow connected with the congress (I don’t want to give their name away straight away) was involved with the protests against the Torre Cajasol in Sevilla, and agreed to be interviewed by us. Great fun, and a great chance for insights.

Last week, the workshop report was due, and we managed to write what I think is a decent piece of scientific literature. It will be published soon in the congress proceedings, but I guess it’s OK if I give you a sneak preview of the manuscript we sent here. There’s still a review and language editing process going on, so this is definitely not the final version. Stay tuned for updates 🙂

2011-06-01_0404129