datamining the city
The project examines the opening of a major landmark through the lenses of both official and unofficial data. It asks an abstract question about the extent to which official and unofficial sources agree, and a substantive question about what impact the opening of the Barclays Center had on the surrounding area of Brooklyn. We hypothesize that the social media activity will mirror the official data sources and that we will be able to measure an impact of the economic development.
Social media data could provide a more personal and localized perspective on urban developments, and an additional lens through which to examine their effects on the local economies of the neighbourhoods in which they are built.
The Yelp data set was the most involved in terms of data collection. It contains all of the venues listed in the Bars category within the one-mile radius, together with the date and star rating of every review for each establishment. While we would have liked to include other venue types, collection was limited by very tight restrictions on scraping data from Yelp. From 2011 on, the data set contains several hundred reviews per month on around one hundred businesses (Figure 12).
First, the Yelp API was called via Python to gather the names and addresses of all of the establishments listed under the Bars category within our one-mile radius of the Barclays Center. Next, each business name was encoded into the URL of that business's Yelp page. Using Python and BeautifulSoup, the reviews were then scraped from the landing page and from every subsequent page of reviews for the business, collecting the date and the number of stars of each review. For some businesses this amounted to several hundred reviews, dating back to 2005.
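As a rough illustration of how the scraped records could be turned into the monthly review counts discussed here, the sketch below tallies reviews per calendar month. The record layout (business name, review date, star rating), the function name `reviews_per_month`, and the sample rows are assumptions made for the example; they are not the project's actual code or data.

```python
from collections import Counter
from datetime import date


def reviews_per_month(reviews):
    """Count reviews per (year, month).

    `reviews` is an iterable of (business_name, review_date, stars) tuples,
    a hypothetical layout for the scraped records.
    """
    counts = Counter()
    for _business, review_date, _stars in reviews:
        counts[(review_date.year, review_date.month)] += 1
    return counts


# Made-up sample records for illustration only; not real Yelp data.
sample_reviews = [
    ("Bar A", date(2011, 9, 3), 4),
    ("Bar A", date(2011, 9, 17), 5),
    ("Bar B", date(2013, 9, 1), 3),
    ("Bar B", date(2005, 6, 12), 2),
]

monthly = reviews_per_month(sample_reviews)
print(monthly[(2011, 9)], monthly[(2013, 9)])
```

Keying the counts on (year, month) makes it straightforward to pull out the two months being compared, September 2011 and September 2013.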
Yelp Reviews, September 2011
Yelp Reviews, September 2013
To make the necessary apples-to-apples comparison between September 2011 and September 2013, we repeated the data collection procedure for the same 1-mile radius circle in Williamsburg that was used for the scaling of the Foursquare data. We found that the number of reviews written per month in Williamsburg increased 31.1% between September 2011 and September 2013. Using this control area as a proxy for the overall increase in Yelp adoption, we divide its number of reviews in September 2011 by its number of reviews in September 2013 (151 / 198 = 0.763) and find that we should scale each September 2013 review in the Barclays Center area by 0.763 for an apples-to-apples comparison (Figure 13). Finally, in order to map the results, the venues were geocoded by their addresses, again using a LION file in ArcGIS.
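The scaling arithmetic can be checked with a short sketch. The counts 151 and 198 come from the text; the function names and the example study-area count of 100 are hypothetical and serve only to show how the factor would be applied.

```python
def adoption_scale_factor(control_before, control_after):
    """Ratio of control-area activity before vs. after, used to deflate later counts."""
    return control_before / control_after


def scale_count(count_after, factor):
    """Deflate a later-period count so it is comparable with the earlier period."""
    return count_after * factor


# Control-area review counts quoted in the text.
control_2011 = 151
control_2013 = 198

scale = adoption_scale_factor(control_2011, control_2013)
growth_pct = (control_2013 / control_2011 - 1) * 100

print(round(scale, 3))       # 0.763
print(round(growth_pct, 1))  # 31.1

# Hypothetical study-area count for September 2013, for illustration only.
scaled_2013 = scale_count(100, scale)
```

The two printed values reproduce the 0.763 factor and the 31.1% increase reported for the control area, which confirms that the two figures describe the same pair of counts.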
To show the social media activity on a single map, we created a combined social media metric. Because we did not have a strong reason to weight one data set more heavily than the other, we applied a simple weight of 16 to the number of Yelp reviews, since there were roughly 16 times more Foursquare check-ins than Yelp reviews in our study area in September 2011.
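A minimal sketch of such a combined metric follows. The weight of 16 is taken from the text; adding the weighted Yelp count to the Foursquare count is an assumption about how the two were combined, and the function name and input numbers are hypothetical.

```python
YELP_WEIGHT = 16  # from the text: roughly 16x more check-ins than reviews in September 2011


def combined_activity(foursquare_checkins, yelp_reviews, yelp_weight=YELP_WEIGHT):
    """Combine both sources into one score.

    Assumption: the sources are summed after weighting Yelp reviews, so that
    each source contributes on a similar scale.
    """
    return foursquare_checkins + yelp_weight * yelp_reviews


# Made-up counts for a single map cell, for illustration only.
score = combined_activity(foursquare_checkins=160, yelp_reviews=10)
print(score)  # 320
```

With this weighting, a location with 160 check-ins and 10 reviews draws equally on both sources, which matches the stated aim of not favouring either data set.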
Social Media Activity, 2011
Social Media Activity, 2013
Comparing the real estate data from September 2011 with the scaled data for September 2013, we see that there were a few areas in 2011 with very high-dollar-value sales, whereas in 2013 there appear to be more low-dollar-value sales. This could be explained by investors buying up larger developments in advance of the Barclays Center opening, whereas individuals buying condos preferred to buy after the area had already become more attractive due to the opening of the arena. However, there is no noticeable jump in sales in the Barclays Center area when compared with other areas. The Yelp data show an increase in the number of reviews in the area when compared with our control area. However, there is no comparable increase in Foursquare check-ins between 2011 and 2013 once we remove check-ins at the Barclays Center itself. In fact, in a plot of all Foursquare check-ins over time with the check-ins at the arena removed, Foursquare activity in the area appears to be steady or even declining. The outcome is therefore ambiguous: we are not able to show definitively that economic activity in the area has increased in either the social media data or the real estate data.
Real Estate Sales, 2011-2013
Subway Station Use, 2011-2013
Real Estate Sales (All Brooklyn), 2011-2013