- Wed 26 February 2020
- data
In this text, I attempt to analyse the parking situation in Marburg, the city I currently live in. Traffic is a major aspect of moving through cities these days, as one is surrounded by cars at all times. When this traffic is not flowing, it is typically parked. Hence my motivation to study the parking situation in Marburg: what are the cars in Marburg doing when they are not driving?
This article is mostly about quantitative analyses based on (sort of) publicly available data. The city of Marburg maintains a system that summarises the currently free parking spots on a website, available at https://pls.marburg.de/.
If you are curious about the analyses and the corresponding figures, please read on: first, I give an overview of the data acquisition process, so that you understand what data I am actually using and how I obtained it. Second, I continue with the actual analyses, which cover descriptive analytics as well as predictions.
Data acquisition
I acquired the parking data in Marburg with a set of programming tools. The official “Parkleitsystem Marburg”, roughly translated as “parking guidance system Marburg”, serves as the data source. It is available online at https://pls.marburg.de/ and is shown below in Fig. 1 for reference.
The city website is updated every five minutes and lists parking decks spread across Marburg. These parking decks serve popular shopping locations and other places of interest, such as:
- Ahrens
- City parking deck
- Erlenring-Center
- Furthstraße
- Furthstraße - Parkdeck
- Hauptbahnhof
- Lahncenter
- Marktdreieck
- Marktdreieck - Parkdeck
- Oberstadt
For each of the parking decks, several data fields are given:
- Column “PARKHAUS”: The name of the parking deck.
- Column “FREI”: The number of free parking spots.
- Column “ROUTE”: A Google Maps link to the parking deck location.
- Column “max. Einfahrtshöhe”: The permitted vehicle height.
This website and its content serve as the data source for this study. The software for data acquisition is written in Python and uses Scrapy to download the data from the website. Scrapy also stores the data in a machine-readable format to facilitate the post-processing and analysis steps. The whole software stack runs as a Docker container on a small Linux server. The data is saved every three minutes in order to sample the website's five-minute update interval densely enough not to miss any updates. Note that this time scale is chosen to yield sufficiently dense data points without overloading the website servers; the latter aspect in particular should be given high priority when working on projects such as this one. Since only the table on the city website is processed and stored, the storage requirements on the Linux server are minimal. Figure 2 summarises the data acquisition pipeline.
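For illustration, here is a minimal sketch of what such a Scrapy spider could look like. The CSS selectors, field names and table structure are assumptions for illustration; the real page layout may differ.

```python
import scrapy


class ParkingSpider(scrapy.Spider):
    """Minimal sketch of the scraping step; selectors are hypothetical."""

    name = "pls_marburg"
    start_urls = ["https://pls.marburg.de/"]

    def parse(self, response):
        # One item per table row; the real page structure may differ.
        for row in response.css("table tr"):
            cells = [c.strip() for c in row.css("td::text").getall()]
            if len(cells) >= 2:
                yield {
                    "parkhaus": cells[0],  # column "PARKHAUS"
                    "frei": cells[1],      # column "FREI"
                }
```

Such a spider could be run with `scrapy runspider parking_spider.py -o snapshot.jl`, writing one JSON line per parking deck as the machine-readable output mentioned above.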
The analysis pipeline is also written in Python and builds on the scientific Python stack. In particular, the Pandas package has been used extensively to arrange and evaluate the data. The “data for analysis” shown in Fig. 2 is exactly one such Pandas DataFrame. The data structure for the analysis is a table with timestamps along the rows and parking decks along the columns, as shown in Fig. 3. Over the course of the data acquisition period, I accumulated around 100,000 snapshots of the website.
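As a sketch of this arrangement, long-format records as produced by the scraper can be pivoted into exactly such a table; the records shown here are hypothetical:

```python
import pandas as pd

# Hypothetical long-format records: one row per (timestamp, parking deck) pair.
records = pd.DataFrame({
    "timestamp": pd.to_datetime(["2020-01-01 12:00"] * 2 + ["2020-01-01 12:03"] * 2),
    "parkhaus": ["ahrens", "oberstadt"] * 2,
    "frei": [120, 80, 118, 82],
})

# Wide table as in Fig. 3: timestamps along the rows, decks along the columns.
data = records.pivot_table(index="timestamp", columns="parkhaus", values="frei")
print(data)
```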
Analysis of parking situation in Marburg
The data analysis is divided into four parts. We start with an introduction that verifies that the data is reasonable. The second and third parts analyse the parking demand summed over all parking decks in Marburg and for each parking deck separately, respectively. While the second part only takes temporal information into account, the third part also considers spatial information. Finally, the last part uses a machine learning method for spatial interpolation that comes with the benefit of returning uncertainty measures for its predictions.
Introduction
This introduction serves to get a first grip on the acquired data. Here, we look into the parking deck locations and into how the metric that quantifies “parking demand” is computed. It turns out that the data preprocessing steps presented in this section are necessary to correct corrupted data from the website.
Each of the snapshots consists of a timestamp, the number of free spots, the corresponding colour, the maximal vehicle height and a Google Maps link. The latter two are confirmed to remain unchanged throughout the whole dataset for each parking deck, and the link is used to derive the parking deck locations. The locations linked in the table are parsed from the corresponding URLs and shown in Fig. 4 by the red markers. It turns out that the coordinates obtained from the links on the website are wrong. Google Maps seems to correct for that automatically and redirects to the correct coordinates. I searched for the correct coordinates manually and indicated them by green markers in Fig. 4.
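A sketch of how such coordinates could be parsed from the links; the exact URL format, a `q=latitude,longitude` query parameter, is an assumption here:

```python
from urllib.parse import parse_qs, urlparse


def coords_from_maps_link(url: str) -> tuple[float, float]:
    # Assumes a "q=latitude,longitude" query parameter in the link.
    query = parse_qs(urlparse(url).query)
    lat, lon = query["q"][0].split(",")
    return float(lat), float(lon)


# Hypothetical example link, not taken from the real website:
print(coords_from_maps_link("https://maps.google.com/maps?q=50.81,8.77"))
```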
These corrected coordinates are used for a quick warm-up experiment. Using OpenStreetMap, I compile a list of all restaurants, pubs and beer gardens in Marburg. The list comprises classic Marburg pubs like the “Sudhaus” and “Quod” but also incorporates the “Mensa”, taking a whopping 153 items into account. For each of the parking decks, the distances to all of the restaurants and pubs are computed and summed up. The accumulated distances are shown in Fig. 5 and guide you to the parking deck you should choose whenever you are hungry: pick “oberstadt”, “lahncenter”, “city” or “ahrens” if you are particularly hungry, as they have the smallest accumulated distance to restaurants and pubs.
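The accumulated distances can be computed with the haversine formula, the standard great-circle distance; the coordinates below are placeholders, not the real dataset:

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


# Accumulated distance from one deck to all POIs (coordinates are placeholders).
deck = (50.810, 8.774)
pois = [(50.809, 8.770), (50.807, 8.772)]
accumulated = sum(haversine_km(*deck, *poi) for poi in pois)
```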
With the parking deck positions in place, let’s focus on the metric. “Parking demand” is quantified here as the number of used parking spots. However, the website only states the number of free parking spots. The capacity of each parking deck is therefore estimated as the maximum number of free parking spots observed over the whole dataset. To obtain the number of used parking spots, the number of free parking spots is subtracted from this capacity. The capacities determined by this procedure are shown in the corresponding column of the following table:
| Parking deck | Determined capacity | Documented capacity |
|---|---|---|
| ahrens | 213 | 225 |
| city | 195 | 160 |
| erlenring-center | 402 | 409 |
| furthstraße | 118 | 204 |
| furthstraße - parkdeck | 97 | n.a. |
| hauptbahnhof | 264 | 288 |
| lahncenter | 170 | 168 |
| marktdreieck | 190 | 280 |
| marktdreieck - parkdeck | 97 | n.a. |
| oberstadt | 209 | 235 |
These empirically determined values can be compared to the officially documented ones, shown in the table column “Documented capacity”. The two sets of numbers are close to each other. Since the empirically computed parking capacities are more granular (e.g. they distinguish between “Marktdreieck” and “Marktdreieck - Parkdeck”) and seem to be more up to date (empirically, the “City” parking deck was found to have had 195 free parking spots at maximum, despite only having 160 parking spots according to the official documentation), I will be using the empirically determined capacities for the analyses that follow.
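In code, the capacity estimation described above boils down to a column-wise maximum followed by a subtraction; sketched here on a toy free-spots table with hypothetical values:

```python
import pandas as pd

# Toy free-spots table standing in for the scraped data (values hypothetical).
data = pd.DataFrame({"ahrens": [120, 5, 60], "city": [195, 30, 0]})

# Capacity per deck: the maximum number of free spots ever observed.
capacity = data.max()

# Used spots per snapshot: capacity minus the currently free spots.
used = capacity - data
```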
With the coordinates and used parking spots determined, we turn towards the main analysis, which begins with the aggregate signal.
Aggregate parking analysis
The aggregate data is obtained as shown by the red arrow in Fig. 3: the columns of all parking decks are summed up to yield the total number of used parking spots.
Given the timestamps of the data, the number of used parking spots can be plotted against time. Since the data acquisition interval of 3 min is very short on the scale of the data collection period of several months, the data is resampled to hours, days and weeks, as shown in Fig. 6.
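A sketch of the summation and resampling with Pandas, using a synthetic stand-in for the real per-deck table:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the per-deck "used spots" table derived above.
idx = pd.date_range("2019-11-01", "2020-02-01", freq="3min")
rng = np.random.default_rng(0)
used = pd.DataFrame({"ahrens": rng.integers(0, 213, idx.size),
                     "oberstadt": rng.integers(0, 209, idx.size)}, index=idx)

# Aggregate signal: sum over all decks, then resample to coarser grids.
total_used = used.sum(axis=1)
hourly = total_used.resample("1H").mean()
daily = total_used.resample("1D").mean()
weekly = total_used.resample("1W").mean()
```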
The green line shows the weekly average. It drops during the time between Christmas and New Year’s Eve. Furthermore, the regularity of the data becomes obvious. There seem to be two types of periodicity: first, a periodicity per day (barely visible in the data resampled to one hour) and second, a periodicity per week (clearly visible in the data resampled to one day). The figure is nice to look at, but it does not quantify these periodicities any further.
To quantify the periodicity, the autocorrelation of the signal in Fig. 6 is computed. The autocorrelation function measures how similar a signal is to itself when shifted by some time delay; these delays make up the x-axis of autocorrelation plots. At time delays where the autocorrelation function has maxima, the signal is similar to itself. Hence, periodicity can be quantified by exactly these maxima of the autocorrelation function. Figure 7 shows the autocorrelation function of the parking demand in Marburg.
As suggested by the aggregate signal in Fig. 6, the autocorrelation shown in Fig. 7 peaks at seven days, confirming that the signal has a period of seven days. The daily periodicity is not visible in this plot as it is based on the daily resample of the data; the small initial peak at a shift of one in Fig. 7 corresponds exactly to the sampling interval of the resampled signal and therefore does not carry any relevant information about the periodicity of the signal. All other peaks correspond to multiples of the identified 7-day period.
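The autocorrelation itself can be computed with NumPy; sketched below on a synthetic signal with a built-in 7-day period:

```python
import numpy as np


def autocorrelation(x: np.ndarray) -> np.ndarray:
    """Normalised autocorrelation of a 1D signal for all non-negative lags."""
    x = x - x.mean()
    acf = np.correlate(x, x, mode="full")[x.size - 1:]
    return acf / acf[0]


# Synthetic daily signal with a built-in 7-day period, standing in for Fig. 6.
days = np.arange(120)
signal = 1000.0 + 300.0 * np.sin(2 * np.pi * days / 7)
acf = autocorrelation(signal)
# acf now peaks at lags 7, 14, 21, ... (in days), as in Fig. 7.
```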
Finally, to understand the full scope of the periodicity of the signal, I evaluated the day-hour histogram. This type of histogram visualises the number of used parking spots in Marburg against the hour and the day of the week, which makes up a two-dimensional matrix, as shown in Fig. 8.
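Such a day-hour matrix can be built with a Pandas pivot table; the series below is a synthetic stand-in for the aggregate signal:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the aggregate "used spots" series.
idx = pd.date_range("2020-01-01", periods=5000, freq="3min")
total_used = pd.Series(np.random.default_rng(1).integers(0, 1500, idx.size), index=idx)

# Day-hour matrix: mean used spots per (weekday, hour) cell, as in Fig. 8.
df = total_used.to_frame("used")
df["weekday"] = df.index.dayofweek  # 0 = Monday, 6 = Sunday
df["hour"] = df.index.hour
day_hour = df.pivot_table(index="weekday", columns="hour", values="used", aggfunc="mean")
```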
It becomes obvious that the peak hours of parking demand during the working week are between 11am and 4pm. Also, notably, the weekends are less busy, as people probably do not go into the city by car. On Saturdays, the peak hours of parking demand are shorter, only from around 12pm to 3pm. On Sundays, there is significantly less parking demand in Marburg compared to all other six days. Interestingly, the relaxed Sunday mood smears into Monday mornings, which are much less busy than regular mornings during the working week. Finally, it can be observed that even late at night, from around 10pm to 4am, there are still cars using the parking decks.
Now that we know the periodicity of the parking demand in Marburg, let’s focus on all of the parking decks separately.
Separate parking deck analysis
Now we turn to the more detailed analyses that take the different parking decks into account. First, Fig. 9 shows the parking demand against time for each parking deck separately.
Despite being overloaded with information, this figure points towards a few aspects. First, the individual signals seem to be periodic as well, and second, the parking decks seem to vary significantly in how many of their parking spots are used. The integral parking demand quantifies exactly that: it measures how many parking spots a parking deck has provided over the whole measurement period. Figure 10 shows the results.
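In code, the integral parking demand amounts to a per-deck sum over time; a sketch with synthetic data, including the weekday/weekend split shown in Fig. 10:

```python
import numpy as np
import pandas as pd

# Synthetic per-deck "used spots" table standing in for the real data.
idx = pd.date_range("2020-01-01", periods=5000, freq="3min")
rng = np.random.default_rng(2)
used = pd.DataFrame({"erlenring-center": rng.integers(0, 402, idx.size),
                     "lahncenter": rng.integers(0, 170, idx.size)}, index=idx)

# Integral demand per deck, split into weekday and weekend contributions.
weekend = used.index.dayofweek >= 5
integral_weekday = used[~weekend].sum()
integral_weekend = used[weekend].sum()
```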
The integral parking demand shows that the “Erlenring” parking deck is by far the most used one. The second most important parking deck, “Lahncenter”, provides around half the parking spots that “Erlenring” provides. The proportion of parking spots provided on weekends relative to the overall number of provided parking spots is roughly equal for almost all parking decks. Only “Marktdreieck - Parkdeck” differs, as it provides significantly fewer parking spots on weekends.
The integral parking demand can be enriched with temporal information by rendering a video. The following video shows the ratio of parking spots that are used as the hours pass, with the hour and date running in the title of the figure. As before, the number of used parking spots is normalised to the maximal usage across all parking decks and, additionally, across all times.
This video clearly visualises the periodicity in the signal. The parking deck bars disappear when there is no data, as certain parking decks close down during the night and on some days of the weekend. Most of the parking decks reach 100% occupancy and then drop below 5% during the night. The notable exceptions are the parking decks “Lahncenter” and “Erlenring”, which only drop to around 40% to 50% occupancy at night. That might explain the earlier observation in Fig. 8 of cars that remain parked during the night. At the beginning of the video some data is missing, as my data acquisition pipeline broke down during that time.
The dependence on time is studied further. Figure 11 shows the histograms of used parking spots for each parking deck separately. Not only that, it also shows how these histograms depend on the time of the day.
The temporal histograms shown in Fig. 11 visualise a number of results. For some of the parking decks it is clearly visible how the peak of parking demand shifts with time: parking decks like “ahrens”, “oberstadt”, “marktdreieck”, “marktdreieck - parkdeck”, “furthstraße” or “hauptbahnhof” all show little parking demand in the mornings, a maximal parking demand around noon and a decaying parking demand in the evening. In contrast, parking decks like “city”, “erlenring-center” or “lahncenter” do not seem to depend on the time of day as much.
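A sketch of how such time-of-day histograms could be computed for a single deck; the data and the exact hour bins are assumptions for illustration:

```python
import numpy as np
import pandas as pd

# Synthetic single-deck series standing in for one column of the data.
idx = pd.date_range("2020-01-01", periods=5000, freq="3min")
deck = pd.Series(np.random.default_rng(3).integers(0, 213, idx.size), index=idx)

# One histogram of used spots per coarse time-of-day slot.
hours = deck.index.hour
slots = {"morning (6-11h)": deck[(hours >= 6) & (hours < 11)],
         "noon (11-16h)": deck[(hours >= 11) & (hours < 16)],
         "evening (16-22h)": deck[(hours >= 16) & (hours < 22)]}
histograms = {name: np.histogram(values, bins=20) for name, values in slots.items()}
```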
After these analyses about the acquired data, we finally turn towards predictions.
Predictive analysis
As the last part of this article, I attempt a predictive analysis of the parking data in Marburg. The parking decks indicate parking demand at their locations. With the following spatial prediction, I would like to estimate parking demand where no parking decks are available.
Since I am also interested in the uncertainties of the predictions, I use Gaussian Process Regression (GPR) for the spatial predictions. GPR builds on Gaussian Processes (GPs), which generalise multivariate Gaussian distributions to function space. Being a Bayesian method, GPR returns not only a prediction but also the uncertainty of that prediction.
Although GPR is a non-parametric statistical method, it still needs hyperparameters. While I do not want to give an introduction to GPR in this article, the analysis details are noted here for the sake of transparency. Please consult the links provided above, and in particular the excellent book by Rasmussen and Williams, to understand the following technical details; a code sketch follows the list:
- Contrary to the typically chosen zero-mean GP prior, I use the mean of the data as GP prior mean.
- A radial basis function kernel is used as correlation function of the GP prior.
- The kernel hyperparameters, magnitude and length scale, are set heuristically based on visual fit quality and the characteristic length scale of Marburg.
- Very nearby parking decks (“marktdreieck” and “marktdreieck - parkdeck” as well as “furthstraße” and “furthstraße - parkdeck”) are merged in order to avoid discontinuities.
- I use a noise-free GP prior as the integer measurements from the website do not come with noise.
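To make these choices concrete, here is a minimal GPR sketch using scikit-learn; the article does not name its GP library, and the coordinates, usage values and kernel hyperparameters below are placeholders:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Placeholder training data: (lon, lat) of the merged decks and their mean usage.
X = np.array([[8.774, 50.810], [8.766, 50.807], [8.771, 50.802]])
y = np.array([0.55, 0.40, 0.35])

# RBF kernel; magnitude and length scale fixed heuristically (optimizer=None).
kernel = ConstantKernel(0.1) * RBF(length_scale=0.01)

# Subtracting the data mean emulates a GP prior mean equal to the data mean;
# the tiny alpha corresponds to the (almost) noise-free prior.
gpr = GaussianProcessRegressor(kernel=kernel, alpha=1e-10, optimizer=None)
gpr.fit(X, y - y.mean())

# Predict usage and uncertainty at an arbitrary point, adding the mean back.
X_query = np.array([[8.770, 50.806]])
mu, sigma = gpr.predict(X_query, return_std=True)
mu += y.mean()
```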
With these technical details in place, it is time to look at the GPR results. The following video shows the parking deck usage averaged over the whole measurement period. It is a 3D plot in which the plane spans the geographical coordinates of Marburg, the height of the upper subplot shows the predicted parking deck usage and the height of the lower subplot shows the prediction uncertainty of the parking deck usage.
The usage values of the parking decks are indicated by black dots. The maximum of the prediction, indicated by height as well as by the yellow colour, is located close to the “erlenring” parking deck. As expected from the GPR algorithm, the prediction uncertainties decrease close to the measurement positions. In order to fully see the spatial predictions, the 3D plots are rotated. At the end of the video, the viewport is rotated to a top view.
The second video uses exactly this top-down view to visualise the spatial predictions. However, it also introduces a temporal component by showing the spatial predictions as days pass; the data shown is the daily average. On the left of the following video, the colour-encoded top-down view introduced in the previous video is shown. The black dots correspond to the locations of the parking decks that served as training data. The red dots, however, correspond to points of interest (POIs) between the parking decks. Hence, the GPR is used to compute the predicted parking demand at these POIs. The right subplot shows the corresponding values of these POIs together with their names: I evaluated the parking demand as fitted by the GPR at the “mensa”, “physik”, “UB” (university library), “bahnhof” and “kino” in Marburg. These predictions do come with uncertainties, but they are very small. If you look very closely, you might spot the error bars in the right subplot.
This video shows that the location “mensa” is the place with the highest parking demand. “Physik”, “UB” and “bahnhof” have roughly the same parking demand and “kino” fluctuates the most across the days that are shown.
Conclusion
In this article I showed how publicly available parking data in Marburg is saved, analysed and finally used for spatial predictions. The statistical analysis comprises several aspects that are interesting both for people looking for free parking spots and for the city of Marburg. For the city, the usage statistics of the parking decks could be utilised to ultimately improve the parking situation in Marburg, e.g. by incorporating load balancing or by re-thinking the necessity of more or fewer parking decks. The spatial prediction yields statements about where particularly many or particularly few parking spots are required in Marburg.
You might ask why I compiled this study. First, I like to do such things. Second, I believe that open data can improve the quality of life for all of us, as existing systems can be analysed and optimised. This is daily business in the corporate sector but has not fully reached the public sector yet. This article is meant to foster the spirit of open data in Marburg, in the hope that open data improves the quality of life in Marburg in the future.
The more open data is available, the more open data enthusiasts like me can compile similar analyses for the city of Marburg. Dear Marburg, start providing us with exciting open data! :-)
If you have questions about this article or would like to know more about the analyses, please feel free to contact me via email.