Why so many people? Explaining non-habitual transport overcrowding with internet data


Francisco Câmara Pereira
Filipe Rodrigues (fmpr [at] dei.uc.pt)
Evgheni Polisciuc
Moshe Ben-Akiva


Public transport smartcard data can be used to detect large crowds. By comparing observed counts with statistics on habitual behavior (e.g. the average by time of day), one can specifically identify non-habitual crowds, which are often very problematic for the transport system. While habitual overcrowding (e.g. peak hour) is well understood by both traffic managers and travelers, non-habitual overcrowding hotspots can be even more disruptive and unpleasant because they are generally unexpected. By quickly understanding such cases, a transport manager can react and mitigate transport system disruptions.
We propose a probabilistic data analysis model that breaks each non-habitual overcrowding hotspot into a set of explanatory components. These components are automatically retrieved from social networks and special events websites and processed through text-analysis techniques. For each component, the model estimates its share of the total overcrowding counts.
We first validate the model with synthetic data and then test it with real data from the public transport system (EZLink) of Singapore, focusing on 3 case study areas. We demonstrate that it generates explanations that are intuitively plausible and consistent both locally (correlation coefficient, CC, from 85% to 99% for the 3 areas) and globally (CC from 41.2% to 83.9%).
This model is directly applicable to any other domain sensitive to crowd formation due to large social events (e.g. communications, water, energy, waste).
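
As a rough illustration of the first step described above (comparing observed counts with habitual behavior), the following Python sketch flags non-habitual overcrowding for a single area. It is not the model proposed in the paper: the function names, the input layout, and the "mean plus n standard deviations" threshold are all assumptions made for this example, since the abstract only mentions comparing with statistics such as the average by time of day.

```python
from collections import defaultdict
from statistics import mean, pstdev


def habitual_profile(history):
    """Summarize habitual behavior per hour of day.

    history: iterable of (hour, count) pairs from past days
             (hypothetical layout, assumed for this sketch).
    Returns {hour: (mean_count, std_count)}.
    """
    by_hour = defaultdict(list)
    for hour, count in history:
        by_hour[hour].append(count)
    return {h: (mean(v), pstdev(v)) for h, v in by_hour.items()}


def nonhabitual_excess(profile, observed, n_sigma=2.0):
    """Return {hour: excess_count} for hours that look non-habitual.

    observed: {hour: count} for the day under study.
    An hour is flagged when its count exceeds mean + n_sigma * std;
    this threshold rule is an assumption, not taken from the paper.
    The excess is the count above the habitual mean.
    """
    excess = {}
    for hour, count in observed.items():
        mu, sigma = profile[hour]
        if count > mu + n_sigma * sigma:
            excess[hour] = count - mu
    return excess


# Toy data: three past days for two hours of the day.
history = [(18, 100), (18, 110), (18, 90), (19, 50), (19, 50), (19, 50)]
profile = habitual_profile(history)
hotspots = nonhabitual_excess(profile, {18: 105, 19: 120})
print(hotspots)
```

In this toy case only hour 19 is flagged, because 105 arrivals at hour 18 fall within the usual variation. The excess counts returned here correspond to the "total overcrowding counts" that the proposed model then splits among explanatory components.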


IEEE Transactions on Intelligent Transportation Systems, 2015