Google cannot geocode “236-238 25TH STREET” because the address is not specific enough: many places in the world have a “25TH STREET.”
from geopy import GoogleV3
api_key = 'YOUR_API_KEY'  # placeholder; supply your own Google Maps API key
geocode = GoogleV3(api_key).geocode
# The bare street address is too vague, so no location comes back
assert geocode('236-238 25TH STREET') is None
location1 = geocode('236-238 25TH STREET, BROOKLYN, NY')
location2 = geocode('236-238 25TH STREET, BRONX, NY')
assert location1 != location2
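Since a vague address yields no result, one simple option is to retry the lookup with a region appended. The sketch below is only an illustration of that idea, not part of geopy: `geocode_with_default_region` and `fake_geocode` are hypothetical names, and the coordinates are placeholder values so the example runs offline. In practice you would pass in the `geocode` callable created above.

```python
from typing import Callable, Optional


def geocode_with_default_region(
    geocode: Callable[[str], Optional[object]],
    address: str,
    default_region: str,
) -> Optional[object]:
    """Try the address as given; if nothing matches, retry with a region appended."""
    location = geocode(address)
    if location is None:
        location = geocode(f'{address}, {default_region}')
    return location


# Stand-in for a real geocoder (placeholder coordinates, not real data)
FAKE_RESULTS = {
    '236-238 25TH STREET, BROOKLYN, NY': (1.0, 2.0),
}


def fake_geocode(query: str) -> Optional[tuple]:
    # Return the stored placeholder, or None when the query is unknown
    return FAKE_RESULTS.get(query)


found = geocode_with_default_region(fake_geocode, '236-238 25TH STREET', 'BROOKLYN, NY')
missing = geocode_with_default_region(fake_geocode, 'abcdefg', 'BROOKLYN, NY')
```

The retry only helps when the first lookup returns nothing. If the geocoder returns a wrong match for a vague address, this sketch will not catch it.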
You can use the usaddress package to detect whether an address is incomplete and fill in default values for the missing information.
address = '236-238 25TH STREET'
import subprocess
# Install usaddress on the fly (convenient in a notebook)
assert subprocess.call('pip install usaddress'.split()) == 0
import usaddress
# parse() returns a list of (token, label) pairs
parts = usaddress.parse(address)
value_by_type = {label: token for token, label in parts}
missing_place = 'PlaceName' not in value_by_type
missing_state = 'StateName' not in value_by_type
missing_zip = 'ZipCode' not in value_by_type
# Append a default region only when place, state and zip code are all missing
if missing_place and missing_state and missing_zip:
    address += ', New York, NY'
address
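To make the check easier to follow, here is a minimal sketch of the same logic that works directly on (token, label) pairs, so it runs without installing usaddress. The helper names `has_region` and `add_default_region` are hypothetical, and the sample pairs are hand-written in the shape that `usaddress.parse` returns; they are not actual parser output.

```python
# Labels that indicate the address already names a region
REGION_LABELS = {'PlaceName', 'StateName', 'ZipCode'}


def has_region(parts):
    """Return True if any (token, label) pair carries a place, state or zip label."""
    return any(label in REGION_LABELS for _token, label in parts)


def add_default_region(address, parts, default_region):
    """Append default_region only when the parsed parts contain no region label."""
    if has_region(parts):
        return address
    return f'{address}, {default_region}'


# Hand-written samples (assumed shape of the parser output)
bare_parts = [
    ('236-238', 'AddressNumber'),
    ('25TH', 'StreetName'),
    ('STREET', 'StreetNamePostType'),
]
full_parts = bare_parts + [('BROOKLYN,', 'PlaceName'), ('NY', 'StateName')]

bare_fixed = add_default_region('236-238 25TH STREET', bare_parts, 'New York, NY')
full_fixed = add_default_region('236-238 25TH STREET, BROOKLYN, NY', full_parts, 'New York, NY')
```

An address that already names a place, state or zip code is left untouched, so the default region never overrides information the address provides.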
Here is a complete example for converting a table of addresses:
import geopy
g = geopy.GoogleV3('YOUR_API_KEY').geocode  # placeholder; supply your own Google Maps API key
import subprocess
assert subprocess.call('pip install usaddress'.split()) == 0
import numpy as np
from usaddress import parse as parse_address
def fix_address(address, default_region):
    # Append default_region when the address has no place, state or zip code
    address_parts = parse_address(address)
    value_by_type = {label: token for token, label in address_parts}
    missing_place = 'PlaceName' not in value_by_type
    missing_state = 'StateName' not in value_by_type
    missing_zip = 'ZipCode' not in value_by_type
    if missing_place and missing_state and missing_zip:
        address += ', ' + default_region
    return address
def get_location(row):
    address = row['address']
    address = fix_address(address, default_region='New York, NY')
    location = g(address)
    if location is None:
        # Return NaN so the row can be dropped later with dropna()
        return np.nan
    row['longitude'] = location.longitude
    row['latitude'] = location.latitude
    return row
import pandas as pd
address_table = pd.DataFrame([
    ['118 West 22nd Street'],
    ['415 E 71st St, New York, NY'],
    ['abcdefg'],
    ['65-60 Kissena Blvd, Flushing, NY'],
], columns=['address'])
geolocated_table = address_table.apply(get_location, axis=1)
clean_table = geolocated_table.dropna(subset=['longitude', 'latitude'])