I recently added around 1,000 new photos to the map on OldNYC. Read on to find out how!
At its core, OldNYC is based on geocoding: the process of going from textual addresses like “9th Street and Avenue A” to numeric latitudes and longitudes. There’s a bit of a mismatch here. The NYPL photos have 1930s addresses and cross-streets, but geocoders are built to work with contemporary addresses. OldNYC makes an assumption that contemporary geocoders will produce accurate results for these old addresses. For NYC, this is usually a good assumption! The street grid hasn’t changed too much in the past 150 years. But it is an assumption, and it doesn’t always pan out.
Two of the most noticeable problem spots are Stuytown and Park Avenue South:
The lettered Avenues (A, B, C, D) used to continue above 14th Street. This was the Gas House district. But in the 1940s, this area was destroyed to make way for the super-blocks of Stuyvesant Town. Intersections like “15th and A” do not exist in the contemporary Manhattan grid and geocoders can’t make sense of them. But there are photos there!
The problem for Park Avenue South is different. Until 1959, it was known as 4th Avenue. So photographs from the 1930s are recorded as being at, for example, “4th Avenue and 17th Street”, an intersection which no longer exists. Again, contemporary geocoders can’t make sense of this.
The frustrating thing here is that it’s perfectly obvious where all of these intersections should be. Manhattan has a regular street grid, after all. So I set out to build my own Manhattan street grid geocoder.
To begin with, I gathered lat/lons for every intersection that I could. With some simple logic, this handled the Avenue renaming issue.
My initial idea for geocoding unknown intersections was to interpolate along the avenues. For example, to find where the intersection of 18th Street and Avenue A should be, you can assume that the intersections of numbered streets and Avenue A are evenly spaced and then find where the 18th Street intersection would fall:
Mathematically, you fit two linear regressions per avenue: one from cross-street number to latitude and one from cross-street number to longitude. This feels like it should work but, because the streets aren’t all perfectly spaced, it winds up producing results that don’t quite look right.
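Here is a minimal sketch of that interpolation idea in Python. It is not the actual OldNYC code: the function names and the Avenue A coordinates are made up for illustration, and the sample points are perfectly evenly spaced, which the real grid is not.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def interpolate_on_avenue(known, street):
    """Estimate (lat, lon) where `street` would cross an avenue.

    `known` maps cross-street number -> (lat, lon) for intersections
    that a contemporary geocoder could find.
    """
    streets = sorted(known)
    lats = [known[s][0] for s in streets]
    lons = [known[s][1] for s in streets]
    # One regression per coordinate: street number -> lat, street number -> lon.
    lat_slope, lat_icept = fit_line(streets, lats)
    lon_slope, lon_icept = fit_line(streets, lons)
    return (lat_slope * street + lat_icept, lon_slope * street + lon_icept)


# Hypothetical, idealized coordinates for Avenue A below 14th Street.
avenue_a = {
    10: (40.7270, -73.9820),
    11: (40.7276, -73.9816),
    12: (40.7282, -73.9812),
    13: (40.7288, -73.9808),
    14: (40.7294, -73.9804),
}
lat, lon = interpolate_on_avenue(avenue_a, 18)
```

With this idealized input the estimate for 18th Street lands at roughly (40.7318, -73.9788). With real data, uneven block lengths pull the fit off, which is the weakness described above.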
While I was playing around with this approach, I realized that I was checking the results using a different technique: continuing the straight lines of the streets until they intersected:
Mathematically, this means that you fit a linear regression to the latitude→longitude mapping for each Street and Avenue. To find an intersection, you find the point where these lines intersect. This works so long as the Streets and Avenues are straight. Fortunately, with a few exceptions like Avenue C and the West Village, they are (r² > 0.99).
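The line-intersection approach can be sketched in a few lines of Python. Again, this is an illustration under assumptions rather than the real implementation: the helper names and every coordinate below are hypothetical, and each street or avenue is modeled as lon = slope * lat + intercept.

```python
def fit_lat_to_lon(points):
    """Least-squares fit of lon = slope * lat + intercept.

    `points` is a list of (lat, lon) pairs along one street or avenue.
    """
    n = len(points)
    mean_lat = sum(lat for lat, _ in points) / n
    mean_lon = sum(lon for _, lon in points) / n
    sxx = sum((lat - mean_lat) ** 2 for lat, _ in points)
    sxy = sum((lat - mean_lat) * (lon - mean_lon) for lat, lon in points)
    slope = sxy / sxx
    return slope, mean_lon - slope * mean_lat


def intersect(line_a, line_b):
    """Return the (lat, lon) where two fitted lines cross."""
    (m1, b1), (m2, b2) = line_a, line_b
    if m1 == m2:
        raise ValueError("parallel lines never intersect")
    # Solve m1 * lat + b1 == m2 * lat + b2 for lat.
    lat = (b2 - b1) / (m1 - m2)
    return lat, m1 * lat + b1


# Hypothetical known intersections: Avenue A below 14th Street, and
# 18th Street where it still exists west of Stuyvesant Town.
avenue_a_points = [(40.7270, -73.9820), (40.7282, -73.9812), (40.7294, -73.9804)]
street_18_points = [(40.7380, -73.9900), (40.7370, -73.9880), (40.7360, -73.9860)]

ix_lat, ix_lon = intersect(
    fit_lat_to_lon(avenue_a_points),
    fit_lat_to_lon(street_18_points),
)
```

For these made-up points the extended lines meet at about (40.73225, -73.9785). Because each line is fitted separately, uneven spacing between cross-streets no longer matters; only the straightness of each street and avenue does.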
This approach produced very good results. The oddities which remained were as likely to be problems with the data as with the geocoder (one image was nonsensically labeled as “25th & D”, which extrapolates to somewhere in the East River).
While Stuytown and Park Avenue South were clear winners, new photos appeared all over the map:
It even helped uptown:
All told, there are about 1,000 new images on the map. Go check them out! And please help transcribe the text on the back of them. My OCR system didn’t run on the new images, so they’re sorely lacking descriptions.
Here are a few favorites: