New column with coordinates using geopy Nominatim


#1

Hi mentors and peers,

Using the table below as input, we are trying to derive a coordinates column from the Address column with geopy's Nominatim geocoder. We then want to compute each row's distance to a user-supplied coordinate and keep only the rows within a certain radius, say 5 miles.

With the code below, we get coordinates for 487 of the original 501 rows. We dropped the 14 addresses that can’t be converted, but then had trouble appending the result (which no longer has 501 rows) to the table.

So we did some research and found something useful at: https://stackoverflow.com/questions/31414481/new-column-with-coordinates-using-geopy-pandas
See the code below:

https://stackoverflow.com/questions/49640197/error-429-too-many-requests-while-geocoding-with-geopy-in-python

We tried running these two pieces of code separately, but each kept running without finishing and never produced the column.

Since we know 14 addresses can’t be converted, how should we modify our code to handle the failures, for example by catching BaseException? Is there any way to make this work?

We feel we are very close, so any hint would be greatly appreciated!


#2

One option is to define a function that yields np.nan for addresses that cannot be geocoded, apply it with DataFrame.apply, and then drop those rows.

import numpy as np
import pandas as pd
from geopy.geocoders import GoogleV3

# Replace the placeholder with your own Google Maps API key.
api_key = 'YOUR_API_KEY'
geocode = GoogleV3(api_key).geocode
   
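def get_location(row):
    # Geocode one row. When the lookup fails, leave the coordinates as NaN
    # so that apply() receives a row of the same shape every time, even if
    # the very first address cannot be geocoded.
    row = row.copy()
    location = geocode(row['address'])
    if location is None:
        row['longitude'] = np.nan
        row['latitude'] = np.nan
        return row
    row['longitude'] = location.longitude
    row['latitude'] = location.latitude
    return row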

address_table = pd.DataFrame([
    ['415 E 71st St, New York, NY'],
    ['abcdefg'],
    ['65-60 Kissena Blvd, Flushing, NY'],
], columns=['address'])
geolocated_table = address_table.apply(get_location, axis=1)
clean_table = geolocated_table.dropna(subset=['longitude', 'latitude'])
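
Since the original question uses Nominatim and mentions both failed lookups and the 429 "Too Many Requests" error, here is a minimal sketch of the same idea for that geocoder. The helper names safe_geocode, make_nominatim_geocode and geocode_all are made up for illustration, and the one-second delay is an assumption based on the public Nominatim usage policy. It catches Exception rather than BaseException so that Ctrl+C can still stop a long run.

```python
import math


def safe_geocode(geocode, address):
    """Return (latitude, longitude), or (nan, nan) if the lookup fails."""
    try:
        location = geocode(address)
    except Exception:
        # Exception, not BaseException, so Ctrl+C still interrupts the run.
        return (math.nan, math.nan)
    if location is None:
        # geopy geocoders return None when nothing matches the address.
        return (math.nan, math.nan)
    return (location.latitude, location.longitude)


def make_nominatim_geocode(user_agent):
    """Build a throttled Nominatim geocode function (needs geopy and network)."""
    from geopy.extra.rate_limiter import RateLimiter
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent=user_agent)
    # Wait at least one second between requests to avoid HTTP 429 responses.
    return RateLimiter(geolocator.geocode, min_delay_seconds=1)


def geocode_all(geocode, addresses):
    """Geocode every address, keeping exactly one result per input row."""
    return [safe_geocode(geocode, address) for address in addresses]
```

Because geocode_all returns one pair per address, the result has the same length as the table (501 rows in the question), so the latitudes and longitudes can be assigned as new columns directly and the NaN rows dropped afterwards with dropna.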

Another option is to define a pandas.Series where you specify only the row indices that have valid values.

import pandas as pd
from shapely.geometry import Point

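# Map row index -> WKT point, only for addresses that were geocoded.
d = {}
for index, row in address_table.iterrows():
    address = row['address']
    location = geocode(address)
    if not location:
        continue
    geometry = Point(location.longitude, location.latitude)
    d[index] = geometry.wkt

# pandas aligns on the index, so rows missing from d get NaN.
address_table['wkt'] = pd.Series(d)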
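
Once each row has a latitude and longitude, the radius filter from the original question could be sketched roughly as follows. This uses the haversine formula with a mean Earth radius of 3958.8 miles, which is an approximation. The function names and the sample coordinates are made up for illustration. Rows whose coordinates are NaN come out as False, so they are excluded without extra handling.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an approximation


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_radius(points, center, radius_miles=5.0):
    """Return one boolean per (lat, lon) point: True if inside the radius."""
    center_lat, center_lon = center
    # Any comparison involving NaN is False, so failed geocodes drop out.
    return [
        haversine_miles(lat, lon, center_lat, center_lon) <= radius_miles
        for lat, lon in points
    ]


# Hypothetical user location and two sample points.
user_point = (40.7660, -73.9540)
sample_points = [(40.7660, -73.9540), (40.7366, -73.8203)]
mask = within_radius(sample_points, user_point)
```

With a DataFrame, the points can be built with zip over the latitude and longitude columns, and the resulting list of booleans used as a row mask. geopy also ships geopy.distance.geodesic, which returns a distance object with a miles attribute and may be preferable when ellipsoidal accuracy matters.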

Here is a complete example.