How do I get the nearest locations?

Given a table of target locations, how do we identify the target locations nearest to a source location?

import pandas as pd

source_longitude = -73.9884995
source_latitude = 40.7703931

target_table = pd.DataFrame([
    ('Y', -73.8979288830126, 40.8610176767735),
    ('BS', -73.9891391691363, 40.7332101565336),
    ('LCL', -73.9863395026542, 40.7491760758241),
    ('JA', -73.9899024559513, 40.7441782553497),
    ('PC', -73.9929040802707, 40.7529424605577),
], columns=['Name', 'Longitude', 'Latitude'])

Sort by Distance

Since our coordinates are in longitude and latitude, we use a geodesic distance metric. Note that geopy expects (latitude, longitude) coordinate order. Also, geopy 2.0 removed vincenty, so we import geodesic instead.

from geopy.distance import geodesic as get_geodesic_distance
# from scipy.spatial.distance import euclidean as get_euclidean_distance

# geopy expects (latitude, longitude) coordinate order
source_latlon = source_latitude, source_longitude

def get_distance(row):
    target_latlon = row['Latitude'], row['Longitude']
    return get_geodesic_distance(target_latlon, source_latlon).meters

target_table['Distance'] = target_table.apply(get_distance, axis=1)

# Get the nearest 2 locations
nearest_target_table = target_table.sort_values(['Distance'])[:2]

# Get locations within 1000 meters
filtered_target_table = target_table[target_table['Distance'] < 1000]
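
As a dependency-free cross-check on the distances above, here is a minimal sketch of the haversine great-circle formula. It assumes a spherical Earth with a mean radius of 6,371,008.8 meters, so its results can differ slightly from an ellipsoidal geodesic distance. The names get_haversine_distance and EARTH_RADIUS_IN_METERS are illustrative and not part of geopy or pandas.

```python
from math import asin, cos, radians, sin, sqrt

# Assumption: treat the Earth as a sphere with this mean radius
EARTH_RADIUS_IN_METERS = 6371008.8


def get_haversine_distance(latlon1, latlon2):
    """Return the approximate great-circle distance in meters.

    Both points are (latitude, longitude) pairs in degrees.
    """
    latitude1, longitude1 = map(radians, latlon1)
    latitude2, longitude2 = map(radians, latlon2)
    half_chord = (
        sin((latitude2 - latitude1) / 2) ** 2 +
        cos(latitude1) * cos(latitude2) *
        sin((longitude2 - longitude1) / 2) ** 2)
    return 2 * EARTH_RADIUS_IN_METERS * asin(sqrt(half_chord))


source_latlon = 40.7703931, -73.9884995
# Same locations as target_table, written as (name, latitude, longitude)
targets = [
    ('Y', 40.8610176767735, -73.8979288830126),
    ('BS', 40.7332101565336, -73.9891391691363),
    ('LCL', 40.7491760758241, -73.9863395026542),
    ('JA', 40.7441782553497, -73.9899024559513),
    ('PC', 40.7529424605577, -73.9929040802707),
]

# Pair each distance with its name, then sort nearest first
named_distances = sorted(
    (get_haversine_distance(source_latlon, (latitude, longitude)), name)
    for name, latitude, longitude in targets)
nearest_names = [name for distance, name in named_distances[:2]]
```

If the nearest names from this sketch disagree with the geodesic results, the coordinate order is the first thing to check.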

If your coordinates are in X and Y, use Euclidean distance instead.
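
For projected X and Y coordinates, the same sort-and-filter pattern applies with a Euclidean metric. The sketch below uses only the standard library. The coordinates are hypothetical values in meters and get_euclidean_distance is an illustrative helper, not the scipy function of the same alias.

```python
from math import hypot

source_xy = 0, 0
# Hypothetical projected coordinates in meters
target_xy_by_name = {
    'A': (300, 400),
    'B': (-1200, 500),
    'C': (600, -900),
}


def get_euclidean_distance(xy1, xy2):
    """Return the straight-line distance between two (x, y) points."""
    return hypot(xy2[0] - xy1[0], xy2[1] - xy1[1])


distance_by_name = {
    name: get_euclidean_distance(source_xy, xy)
    for name, xy in target_xy_by_name.items()}

# Get the nearest 2 locations
nearest_names = sorted(distance_by_name, key=distance_by_name.get)[:2]

# Get locations within 1000 meters
filtered_names = [
    name for name, distance in distance_by_name.items() if distance < 1000]
```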

Use a K-D Tree

When repeatedly querying the same set of locations, a K-D tree is more efficient than computing the distance to every location for each query.

Since our coordinates are in longitude and latitude, we use a K-D tree implementation that supports a geodesic distance metric. Note that pysal.lib.cg.KDTree expects (latitude, longitude) coordinate order. The distance calculations will be completely wrong if you try to use the (longitude, latitude) coordinate order.

from pysal.lib.cg import KDTree as GeodesicKDTree, RADIUS_EARTH_KM
# from scipy.spatial import KDTree as EuclideanKDTree

# Drop rows that are missing coordinates
target_table.dropna(subset=['Latitude', 'Longitude'], inplace=True)

# Initialize k-d tree
target_latlons = target_table[['Latitude', 'Longitude']].values
target_tree = GeodesicKDTree(
    target_latlons, distance_metric='Arc', radius=RADIUS_EARTH_KM * 1000)

source_latlon = source_latitude, source_longitude

# Get the nearest 2 locations
distances, indices = target_tree.query(source_latlon, k=2)
nearest_target_table = target_table.iloc[indices].copy()
nearest_target_table['Distance'] = distances

# Get locations within 1000 meters
indices = target_tree.query_ball_point(source_latlon, 1000)
filtered_target_table = target_table.iloc[indices].copy()

If your coordinates are in X and Y, you can use a K-D tree implementation that uses the Euclidean distance metric, such as scipy.spatial.KDTree or sklearn.neighbors.KDTree.
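
In practice, the scipy or sklearn implementation is the better choice. To illustrate why a k-d tree helps with repeated queries, here is a minimal pure-Python sketch for 2-D points with a Euclidean metric. The tree is built once, and each query skips any branch that cannot contain a closer point. The names build_tree and find_nearest and the sample coordinates are hypothetical and do not come from either library.

```python
from math import hypot


def build_tree(points, depth=0):
    """Recursively split (x, y, name) points on alternating axes."""
    if not points:
        return None
    axis = depth % 2  # 0 splits on x, 1 splits on y
    points = sorted(points, key=lambda point: point[axis])
    middle = len(points) // 2
    return {
        'point': points[middle],
        'axis': axis,
        'left': build_tree(points[:middle], depth + 1),
        'right': build_tree(points[middle + 1:], depth + 1),
    }


def find_nearest(node, xy, best=None):
    """Return (distance, point) for the stored point nearest to xy."""
    if node is None:
        return best
    point, axis = node['point'], node['axis']
    distance = hypot(point[0] - xy[0], point[1] - xy[1])
    if best is None or distance < best[0]:
        best = distance, point
    # Search the side of the splitting line that contains xy first
    difference = xy[axis] - point[axis]
    if difference < 0:
        near, far = node['left'], node['right']
    else:
        near, far = node['right'], node['left']
    best = find_nearest(near, xy, best)
    # Visit the other side only if it could hold a closer point
    if abs(difference) < best[0]:
        best = find_nearest(far, xy, best)
    return best


# Hypothetical projected coordinates in meters
targets = [(300, 400, 'A'), (-1200, 500, 'B'), (600, -900, 'C'), (50, -60, 'D')]
target_tree = build_tree(targets)  # Build once

# Query many times against the same tree
nearest_names = []
for source_xy in [(0, 0), (500, -700), (-1000, 400)]:
    distance, (x, y, name) = find_nearest(target_tree, source_xy)
    nearest_names.append(name)
```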

If you get the following error, your dataset may contain null coordinates.

Troublesome data array

Be sure to remove null coordinates from your dataset before initializing your k-d tree.

# Drop rows that are missing coordinates
target_table.dropna(subset=['Latitude', 'Longitude'], inplace=True)

# Initialize k-d tree
target_latlons = target_table[['Latitude', 'Longitude']].values
target_tree = GeodesicKDTree(
    target_latlons, distance_metric='Arc', radius=RADIUS_EARTH_KM * 1000)

Links

lat and lon are reversed, which results in inaccurate measurement of distance

pysal.lib.cg.KDTree expects latitude, longitude coordinate order.

Assumes data in latitude and longitude

https://pysal.readthedocs.io/en/latest/generated/pysal.lib.cg.KDTree.html