Vancouver City Talks!

General description: A web application that computes and visualizes the quality of life in your neighborhood.

Implementation period: February 2013

Technologies: MongoDB, PHP, JavaScript, & HTML

Event: International Open Data Hackathon.

Contributors: Kazem Jahanbakhsh, Priyanka Gupta, Ajay Sridharan, Amir Moghaddam, & AmirHossein Hajizadeh

Technical details:

This application was developed for the Vancouver Open Data Hackathon. We used different types of data from the city of Vancouver (e.g. crime rate, business licences, property taxes, parks, libraries, and schools) to compute an overall score for different parts of the city. The core of the application is our algorithm for predicting neighborhood quality by integrating these different types of data.

Model Layer:

For our model layer we took the following steps:

1- Cleaning/Normalizing/Parsing Data: We parsed each dataset and normalized it into a uniform format that contains only the important information along with lat/lon.

a- Crime Data:
Value: indicates how safe your area is. It has a time dimension that should be taken into account.
Challenge: crime locations come in a truncated street format like "18XX SPYGLASS PL", so we need to use the Google or Yahoo API to get the lat/lon. Time granularity is month/year.

b- Property Tax Report:
Value: richness of different regions in the city
Challenge: translation from address to lat/lon

c- Business Licences:
Value: we can use this data to find nearby restaurants. This shows how much you can enjoy your neighborhood, and it is a notion of neighborhood richness.

d- Parks:
Value: how green your neighborhood is.

e- Libraries:
Value: another factor for the richness of different regions.

f- Schools:
Value: the density of schools around you, which matters for families.
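
The address challenge noted for the crime data can be illustrated with a small sketch. It assumes (this is not confirmed by the project) that a masked block address such as "18XX SPYGLASS PL" is turned into a geocodable string by replacing the masked digits with the midpoint of the block. The function name `block_to_geocodable` and the default city suffix are hypothetical; the resulting string would then be sent to whichever geocoding API is used.

```python
import re


def block_to_geocodable(address, city="Vancouver, BC, Canada"):
    """Turn a masked block address into a string a geocoder can resolve.

    Assumption: masked digits ("XX") are replaced by the block midpoint,
    e.g. "18XX" becomes "1850". Addresses without a mask pass through.
    """
    cleaned = address.strip()
    match = re.match(r"^(\d+)(X+)\s+(.*)$", cleaned, flags=re.IGNORECASE)
    if match is None:
        return f"{cleaned}, {city}"
    prefix, masked, street = match.groups()
    # Midpoint of the masked range: a 5 followed by zeros.
    midpoint = "5" + "0" * (len(masked) - 1)
    return f"{prefix}{midpoint} {street}, {city}"


query = block_to_geocodable("18XX SPYGLASS PL")
```

The midpoint is only an approximation of the true location, which is acceptable here because scores are aggregated over regions rather than individual addresses.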

2- Importing Data: We imported the cleaned version of each dataset into MongoDB. All of our queries are geospatial queries.
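
As an illustration of what such a geospatial query might look like, the sketch below builds a MongoDB filter that matches documents within a given radius of a point, using the `$geoWithin` and `$centerSphere` operators. The field name `loc` and the helper `within_radius_query` are assumptions made for this example, and the original app may have used different operators. `$centerSphere` expects coordinates in longitude, latitude order and a radius in radians, hence the division by the Earth's radius.

```python
EARTH_RADIUS_KM = 6378.1  # equatorial radius, used to convert km to radians


def within_radius_query(lat, lon, radius_km, field="loc"):
    """Build a MongoDB filter for documents within radius_km of (lat, lon).

    `field` is a hypothetical name for the document field that holds the
    [longitude, latitude] pair.
    """
    return {
        field: {
            "$geoWithin": {
                "$centerSphere": [[lon, lat], radius_km / EARTH_RADIUS_KM]
            }
        }
    }


# Filter for everything within 1 km of a point in downtown Vancouver.
query = within_radius_query(49.2827, -123.1207, 1.0)
```

With a driver such as pymongo, a filter like this could be passed to a collection's `count_documents` method to count, for example, the schools or crime records that fall inside a region.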

3- Computing Score: We segmented Vancouver into regions (circles) parametrized by their centre and radius. For each region we computed a single score that aggregates all datasets and reflects the quality of life in that neighbourhood. We used a linear function to map the six features to the output (dependent) variable, as shown below:

score = c1*crime + c2*business + c3*tax + c4*libraries + c5*schools + c6*parks

The main challenge is computing scores as well as making sure that the assumption about the linear relationship is valid.
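
A minimal sketch of the linear score follows. It assumes each feature is first rescaled to the range 0 to 1 across regions (min-max normalization) so that the coefficients are comparable; that normalization step, the example feature values, and the coefficients (including the negative weight on crime) are all illustrative assumptions, not the values used in the app.

```python
FEATURES = ("crime", "business", "tax", "libraries", "schools", "parks")


def min_max_normalize(regions):
    """Rescale every feature to [0, 1] across all regions.

    `regions` is a list of dicts with one numeric value per feature.
    A feature that is constant across regions is mapped to 0.0.
    """
    normalized = [dict() for _ in regions]
    for name in FEATURES:
        values = [region[name] for region in regions]
        low, high = min(values), max(values)
        for target, value in zip(normalized, values):
            target[name] = (value - low) / (high - low) if high > low else 0.0
    return normalized


def score(features, coefficients):
    """score = c1*crime + c2*business + ... + c6*parks"""
    return sum(coefficients[name] * features[name] for name in FEATURES)


# Hypothetical raw counts for two regions and hypothetical coefficients.
raw = [
    {"crime": 10, "business": 5, "tax": 100, "libraries": 1, "schools": 2, "parks": 3},
    {"crime": 20, "business": 15, "tax": 300, "libraries": 1, "schools": 4, "parks": 0},
]
coefficients = {"crime": -1.0, "business": 1.0, "tax": 1.0,
                "libraries": 1.0, "schools": 1.0, "parks": 1.0}
scores = [score(region, coefficients) for region in min_max_normalize(raw)]
```

Crime is given a negative coefficient here because more crime should lower the score; whether a linear combination is adequate at all remains the open question raised above.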

Visualization Layer:

For visualization we used the Google Maps Heatmap Layer.
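
The heatmap layer of the Google Maps JavaScript API accepts weighted locations (a position plus a weight). One way the server side might hand region scores to the front end is sketched below. The function name `to_heatmap_points`, the JSON keys, and the choice to shift scores so the lowest becomes zero are assumptions for illustration only.

```python
import json


def to_heatmap_points(region_scores):
    """Serialize (lat, lon, score) tuples as JSON points for a heatmap.

    Scores are shifted so the lowest region has weight 0; this is an
    illustrative choice to keep all weights non-negative.
    """
    if not region_scores:
        return "[]"
    lowest = min(score for _, _, score in region_scores)
    points = [
        {"lat": lat, "lng": lon, "weight": score - lowest}
        for lat, lon, score in region_scores
    ]
    return json.dumps(points)


payload = to_heatmap_points([(49.28, -123.12, 2.0), (49.25, -123.10, -1.0)])
```

On the client, each entry would be converted into a map position with its weight before being passed to the heatmap layer.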

The following image also shows our prototyping/planning steps for building the final app:

[Image: neighborhood prototype]

Statistical Graphics:
Another interesting approach would be to use ideas from statistical graphics to visualize the data so that a user can easily judge the quality of different neighborhoods by looking at our data map. In his book "The Visual Display of Quantitative Information", Edward Tufte lists a set of properties that statistical graphics should satisfy.

Given these requirements for data visualization, the main question is how we can design a visualization layer that consumes different kinds of data and presents them naturally, so that a viewer can find out the quality of life in different neighborhoods.

You can check out the app here: VanCityTalks App!

You can download the source code for this project from GitHub: VanCityTalks Github Repo.

Me and the team in Open Data Hack!

Firefox Hack Day!

You can follow me on Twitter: @kjahanbakhsh.