 # How do I get the nearest locations?

#1

Given a table of target locations, how do we identify the target locations nearest to a source location?

``````
import pandas as pd

source_longitude = -73.9884995
source_latitude = 40.7703931

target_table = pd.DataFrame([
    ('Y', -73.8979288830126, 40.8610176767735),
    ('BS', -73.9891391691363, 40.7332101565336),
    ('LCL', -73.9863395026542, 40.7491760758241),
    ('JA', -73.9899024559513, 40.7441782553497),
    ('PC', -73.9929040802707, 40.7529424605577),
], columns=['Name', 'Longitude', 'Latitude'])
``````

# Sort by Distance

Since our coordinates are in longitude and latitude, we use a geodesic distance metric. Older versions of geopy offered Vincenty distance for this, but `vincenty` was removed in geopy 2.0, so the code below uses `geopy.distance.geodesic`. Note that geopy expects points in (latitude, longitude) order.

``````
from geopy.distance import geodesic as get_geodesic_distance
# from scipy.spatial.distance import euclidean as get_euclidean_distance

# geopy expects (latitude, longitude) order
source_latlon = source_latitude, source_longitude

def get_distance(row):
    target_latlon = row['Latitude'], row['Longitude']
    return get_geodesic_distance(target_latlon, source_latlon).meters

# Compute the distance in meters from the source to each target
target_table['Distance'] = target_table.apply(get_distance, axis=1)

# Get the nearest 2 locations
nearest_target_table = target_table.sort_values(['Distance'])[:2]

# Get locations within 1000 meters
filtered_target_table = target_table[target_table['Distance'] < 1000]
``````
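For intuition about what a geodesic metric computes, here is a minimal, dependency-free sketch of the haversine formula. It treats the Earth as a sphere with an assumed mean radius of 6371008.8 meters, so its results can differ slightly from ellipsoidal methods such as those in geopy. The function name and the constant are illustrative and not part of any library.

```python
from math import asin, cos, radians, sin, sqrt

# Assumed mean Earth radius in meters (spherical approximation)
EARTH_RADIUS_IN_METERS = 6371008.8


def get_haversine_distance_in_meters(latlon1, latlon2):
    """Return the great-circle distance between two (latitude, longitude) points."""
    latitude1, longitude1 = map(radians, latlon1)
    latitude2, longitude2 = map(radians, latlon2)
    a = (
        sin((latitude2 - latitude1) / 2) ** 2 +
        cos(latitude1) * cos(latitude2) *
        sin((longitude2 - longitude1) / 2) ** 2)
    return 2 * EARTH_RADIUS_IN_METERS * asin(sqrt(a))


# Distance from the source location to the target named 'PC'
source_latlon = 40.7703931, -73.9884995
target_latlon = 40.7529424605577, -73.9929040802707
distance_in_meters = get_haversine_distance_in_meters(
    source_latlon, target_latlon)
```

Swapping latitude and longitude in either argument would silently produce a different, incorrect distance, which is why it pays to name variables by their coordinate order.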

If your coordinates are in X and Y, use Euclidean distance instead.
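As a sketch of the Euclidean case, the following uses only the standard library. The coordinates and names are made up for illustration and assume that X and Y are in meters in some planar coordinate system.

```python
from math import dist  # Requires Python 3.8 or later

# Hypothetical projected coordinates in meters
source_xy = 0, 0
target_xys = {
    'A': (300, 400),
    'B': (-1200, 500),
    'C': (50, -60),
    'D': (900, 900),
}

# Compute the straight-line distance from the source to each target
distance_by_name = {
    name: dist(source_xy, xy) for name, xy in target_xys.items()}

# Get the nearest 2 locations
nearest_names = sorted(distance_by_name, key=distance_by_name.get)[:2]

# Get locations within 1000 meters
filtered_names = [
    name for name, distance in distance_by_name.items() if distance < 1000]
```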

# Use a K-D Tree

When repeatedly querying the same set of locations, a K-D tree is more efficient.

Since our coordinates are in longitude and latitude, we use a K-D tree implementation that supports a geodesic distance metric. Note that `pysal.lib.cg.KDTree` expects (latitude, longitude) coordinate order. The distance calculations will be completely wrong if you try to use the (longitude, latitude) coordinate order.

``````
from pysal.lib.cg import KDTree as GeodesicKDTree, RADIUS_EARTH_KM
# from scipy.spatial import KDTree as EuclideanKDTree

# Drop rows that are missing coordinates
target_table.dropna(subset=['Latitude', 'Longitude'], inplace=True)

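# Initialize k-d tree
# Assumption: use the arc distance metric with the Earth's radius in meters,
# so that returned distances and the search radius below are in meters
target_latlons = target_table[['Latitude', 'Longitude']].values
target_tree = GeodesicKDTree(
    target_latlons, distance_metric='Arc', radius=RADIUS_EARTH_KM * 1000)
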
source_latlon = source_latitude, source_longitude

# Get the nearest 2 locations
distances, indices = target_tree.query(source_latlon, k=2)
nearest_target_table = target_table.iloc[indices].copy()
nearest_target_table['Distance'] = distances

# Get locations within 1000 meters
indices = target_tree.query_ball_point(source_latlon, 1000)
filtered_target_table = target_table.iloc[indices].copy()
``````
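To see why coordinate order and radius matter, it may help to sketch one common way of answering great-circle queries with a Euclidean structure: map each point onto a unit sphere in three dimensions, measure the straight-line (chord) distance there, and convert the chord back to an arc length. This is a generic illustration with hypothetical function names and an assumed spherical Earth, not a description of how any particular library is implemented.

```python
from math import asin, cos, dist, radians, sin

# Assumed mean Earth radius in meters (spherical approximation)
EARTH_RADIUS_IN_METERS = 6371008.8


def get_unit_xyz(latlon):
    """Map a (latitude, longitude) point in degrees onto the unit sphere."""
    latitude, longitude = map(radians, latlon)
    return (
        cos(latitude) * cos(longitude),
        cos(latitude) * sin(longitude),
        sin(latitude))


def get_arc_distance(chord_distance, radius=EARTH_RADIUS_IN_METERS):
    """Convert a chord length on the unit sphere into an arc length."""
    # Clamp to guard against floating point values slightly above 1
    return 2 * radius * asin(min(1, chord_distance / 2))


def get_chord_distance(arc_distance, radius=EARTH_RADIUS_IN_METERS):
    """Convert an arc length into a chord length on the unit sphere."""
    return 2 * sin(arc_distance / (2 * radius))


source_xyz = get_unit_xyz((40.7703931, -73.9884995))
target_xyz = get_unit_xyz((40.7529424605577, -73.9929040802707))
chord_distance = dist(source_xyz, target_xyz)
arc_distance_in_meters = get_arc_distance(chord_distance)
```

Because chord length grows monotonically with arc length, the nearest neighbors by chord distance are also the nearest by arc distance, and a search radius in meters can be converted with `get_chord_distance` before querying a Euclidean tree. Swapping latitude and longitude in `get_unit_xyz` would place points at the wrong positions on the sphere.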

If your coordinates are in X and Y, you can use a K-D tree implementation that uses the Euclidean distance metric, such as `scipy.spatial.KDTree` or `sklearn.neighbors.KDTree`.
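To illustrate what a Euclidean K-D tree does, here is a minimal pure-Python sketch for two-dimensional points. The function names and sample points are hypothetical, and the code favors readability over speed. In practice, a library implementation is a better choice.

```python
from math import dist  # Requires Python 3.8 or later


def build_kdtree(points, depth=0):
    """Build a 2-D tree as nested (point, left, right) tuples."""
    if not points:
        return None
    axis = depth % 2  # Alternate between splitting on X and on Y
    points = sorted(points, key=lambda p: p[axis])
    middle = len(points) // 2
    return (
        points[middle],
        build_kdtree(points[:middle], depth + 1),
        build_kdtree(points[middle + 1:], depth + 1))


def find_nearest(node, query, depth=0, best=None):
    """Return (distance, point) for the stored point closest to query."""
    if node is None:
        return best
    point, left, right = node
    distance = dist(point, query)
    if best is None or distance < best[0]:
        best = distance, point
    axis = depth % 2
    difference = query[axis] - point[axis]
    near, far = (left, right) if difference < 0 else (right, left)
    # Search the side of the splitting line that contains the query first
    best = find_nearest(near, query, depth + 1, best)
    # Search the other side only if it could hold a closer point
    if abs(difference) < best[0]:
        best = find_nearest(far, query, depth + 1, best)
    return best


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
nearest_distance, nearest_point = find_nearest(tree, (9, 2))
```

The tree is built once and can then answer many queries, each of which skips any branch that cannot contain a closer point. That pruning is where the savings over sorting every distance come from.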

#2

If you are getting the following error, it is possible that there are null coordinates in your dataset.

`Troublesome data array`

Be sure to remove null coordinates from your dataset before initializing your k-d tree.

``````
# Drop rows that are missing coordinates
target_table.dropna(subset=['Latitude', 'Longitude'], inplace=True)

# Initialize k-d tree
target_latlons = target_table[['Latitude', 'Longitude']].values
target_tree = KDTree(