Given a table of target locations, how do we identify the target locations nearest to a source location?
import pandas as pd
source_longitude = -73.9884995
source_latitude = 40.7703931
target_table = pd.DataFrame([
    ('Y', -73.8979288830126, 40.8610176767735),
    ('BS', -73.9891391691363, 40.7332101565336),
    ('LCL', -73.9863395026542, 40.7491760758241),
    ('JA', -73.9899024559513, 40.7441782553497),
    ('PC', -73.9929040802707, 40.7529424605577),
], columns=['Name', 'Longitude', 'Latitude'])
Sort by Distance
Since our coordinates are in longitude and latitude, we use a geodesic distance metric. Note that geopy expects (latitude, longitude) coordinate order, and that geopy 2.0 removed vincenty in favor of geodesic.
from geopy.distance import geodesic as get_geodesic_distance
# from scipy.spatial.distance import euclidean as get_euclidean_distance
source_latlon = source_latitude, source_longitude
def get_distance(row):
    # geopy points are (latitude, longitude)
    target_latlon = row['Latitude'], row['Longitude']
    return get_geodesic_distance(target_latlon, source_latlon).meters
target_table['Distance'] = target_table.apply(get_distance, axis=1)
# Get the nearest 2 locations
nearest_target_table = target_table.sort_values(['Distance'])[:2]
# Get locations within 1000 meters
filtered_target_table = target_table[target_table['Distance'] < 1000]
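As a dependency-free cross-check on the geodesic distances, the haversine formula gives the great-circle distance on a sphere. This is a minimal sketch, not what geopy computes: it assumes a spherical Earth with a mean radius of 6,371,008.8 meters, so results typically differ from ellipsoidal geodesic distances by up to about 0.5%. The function name get_haversine_distance is hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

# Assumption: treat the Earth as a sphere with this mean radius
EARTH_RADIUS_IN_METERS = 6371008.8


def get_haversine_distance(latitude1, longitude1, latitude2, longitude2):
    """Return the great-circle distance in meters between two points."""
    phi1, phi2 = radians(latitude1), radians(latitude2)
    delta_phi = phi2 - phi1
    delta_lambda = radians(longitude2 - longitude1)
    a = (
        sin(delta_phi / 2) ** 2 +
        cos(phi1) * cos(phi2) * sin(delta_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_IN_METERS * asin(sqrt(a))


# Distance from the source location to LCL, using values from the table
distance = get_haversine_distance(
    40.7703931, -73.9884995, 40.7491760758241, -73.9863395026542)
```

If a geodesic result disagrees with this value by much more than a percent, the most likely cause is swapped coordinate order.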
If your coordinates are in X and Y, use Euclidean distance instead.
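For projected coordinates, the same sort-and-filter pattern works with straight-line distance. The sketch below uses only the standard library; the point names and X/Y values are made up for illustration and assume a projected coordinate system whose units are meters.

```python
from math import dist  # Requires Python 3.8+

source_xy = 0, 0
# Hypothetical targets in a projected coordinate system, in meters
target_xys = {'A': (300, 400), 'B': (-1200, 500), 'C': (60, -80)}

distance_by_name = {
    name: dist(source_xy, xy) for name, xy in target_xys.items()}
# Get the nearest 2 locations
nearest_names = sorted(distance_by_name, key=distance_by_name.get)[:2]
# Get locations within 1000 meters
filtered_names = [
    name for name, d in distance_by_name.items() if d < 1000]
```

math.dist takes two points as sequences, so it can stand in for get_geodesic_distance inside get_distance, minus the .meters attribute.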
Use a K-D Tree
When repeatedly querying the same set of locations, a K-D tree is more efficient than recomputing and sorting every distance for each query.
Since our coordinates are in longitude and latitude, we use a K-D tree implementation that supports a geodesic distance metric. Note that pysal.lib.cg.KDTree expects (latitude, longitude) coordinate order. The distance calculations will be completely wrong if you try to use the (longitude, latitude) coordinate order.
from pysal.lib.cg import KDTree as GeodesicKDTree, RADIUS_EARTH_KM
# from scipy.spatial import KDTree as EuclideanKDTree
# Drop rows that are missing coordinates
target_table.dropna(subset=['Latitude', 'Longitude'], inplace=True)
# Initialize k-d tree
target_latlons = target_table[['Latitude', 'Longitude']].values
target_tree = GeodesicKDTree(
    target_latlons, distance_metric='Arc', radius=RADIUS_EARTH_KM * 1000)
source_latlon = source_latitude, source_longitude
# Get the nearest 2 locations
distances, indices = target_tree.query(source_latlon, k=2)
nearest_target_table = target_table.iloc[indices].copy()
nearest_target_table['Distance'] = distances
# Get locations within 1000 meters
indices = target_tree.query_ball_point(source_latlon, 1000)
filtered_target_table = target_table.iloc[indices].copy()
If your coordinates are in X and Y, you can use a K-D tree implementation that uses the Euclidean distance metric, such as scipy.spatial.KDTree or sklearn.neighbors.KDTree.
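A minimal sketch of the Euclidean case with scipy.spatial.KDTree, mirroring the two queries above. The X/Y values are made up for illustration and assume a projected coordinate system whose units are meters.

```python
import numpy as np
from scipy.spatial import KDTree as EuclideanKDTree

# Hypothetical targets in a projected coordinate system, in meters
target_xys = np.array([
    (300, 400),
    (-1200, 500),
    (60, -80),
])
# Initialize k-d tree
target_tree = EuclideanKDTree(target_xys)
source_xy = 0, 0
# Get the nearest 2 locations, as distances and row positions
distances, indices = target_tree.query(source_xy, k=2)
# Get row positions of locations within 1000 meters
filtered_indices = sorted(target_tree.query_ball_point(source_xy, 1000))
```

As in the geodesic version, the returned indices are row positions, so they pair with target_table.iloc when the tree is built from a table's coordinate columns.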