Teams that are presenting at our second annual NYC Open Data Student Showcase on Sunday, March 3 should check here for last-minute technical updates.
GeoTable Maximum Display Count Increased to 1024
We updated crosscompute-geotable to increase the maximum display count for geotables rendered as maps.
PySAL>=2 Breaks Previous Code that Uses KDTree and KNN
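While updating the platform, we inadvertently upgraded PySAL to 2.0.0, which introduces changes that break existing code. The snippets below show the old and new forms of the imports, the KDTree query, and the k-nearest-neighbor weights.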
# OLD
import numpy as np
from pysal.cg import RADIUS_EARTH_KM
from pysal.cg.kdtree import KDTree
from pysal import knnW_from_array
kd_tree = KDTree(xys, distance_metric='Arc', radius=RADIUS_EARTH_KM)
distances, indices = kd_tree.query(xy, k=count, distance_upper_bound=1)
relative_indices = indices[~np.isnan(indices)]
relative_distances = distances[~np.isnan(indices)]
w = knnW_from_array(xys, k=2)
w.transform = 'R'
# NEW
from pysal.lib.cg import KDTree, RADIUS_EARTH_KM
from pysal.lib.weights import KNN
kd_tree = KDTree(xys, distance_metric='Arc', radius=RADIUS_EARTH_KM)
distances, indices = kd_tree.query(xy, k=count, distance_upper_bound=1)
relative_indices = indices[indices < count]
relative_distances = distances[indices < count]
w = KNN(kd_tree, k=2)
w.set_transform('R')
Click here to see a complete example.
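If it helps to see what the query above is asking for, here is a minimal pure-Python sketch of a great-circle ("arc") nearest-neighbor search. It is a brute-force illustration, not PySAL's implementation. The function names get_arc_distance_in_km and find_nearest are hypothetical, the radius of 6371.0 km is an assumed mean Earth radius, the points are assumed to be in (longitude, latitude) order, and the sample coordinates are made up for the example.

```python
from math import asin, cos, radians, sin, sqrt

# Assumption: mean Earth radius in kilometers
RADIUS_EARTH_KM = 6371.0


def get_arc_distance_in_km(lon_lat1, lon_lat2, radius=RADIUS_EARTH_KM):
    """Return the haversine great-circle distance between two points."""
    lon1, lat1 = map(radians, lon_lat1)
    lon2, lat2 = map(radians, lon_lat2)
    h = (
        sin((lat2 - lat1) / 2) ** 2 +
        cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius * asin(sqrt(h))


def find_nearest(xys, xy, k, distance_upper_bound):
    """Return distances and indices of up to k points within the bound."""
    # Sort every candidate by its distance to the query point
    pairs = sorted(
        (get_arc_distance_in_km(xy, p), i) for i, p in enumerate(xys))
    # Keep the k closest, then drop any that exceed the bound
    nearby = [(d, i) for d, i in pairs[:k] if d <= distance_upper_bound]
    distances = [d for d, i in nearby]
    indices = [i for d, i in nearby]
    return distances, indices


# Made-up sample points in (longitude, latitude) order
xys = [(-73.9857, 40.7484), (-73.9851, 40.7489), (-74.0445, 40.6892)]
xy = (-73.9855, 40.7485)
distances, indices = find_nearest(xys, xy, k=3, distance_upper_bound=1)
print(indices)
```

With a 1 km bound, the third point (several kilometers away) is left out even though k=3 was requested. The KDTree query handles the same situation by returning placeholder entries for the missing neighbors, which is why the snippets above filter the returned indices and distances.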
Spatial Regression Tutorials Fixed and Updated
We fixed and updated all code in our Spatial Regression Tutorial Notebooks and Example Tools.
- NYC Department of Health and Mental Hygiene temporarily restricted access to their Communicable Disease Surveillance Data, which broke our walkthrough. We replaced the dataset with HIV/AIDS Diagnoses by Neighborhood, Sex, and Race/Ethnicity and updated the Prepare Dependent Variable By Aggregating Over Column Walkthrough.
- We updated the Prepare Dependent Variable By Aggregating Over Image Walkthrough to extract the spatial reference in proj4 format using pollution_raster.crs.to_proj4().
Almost everything that you need to prepare your predictive tool using NYC Open Data is written in our Spatial Regression Tutorials:
- How to Create Your First Fun Predictive Tool
- How to Load Open Data
- How to Geocode Addresses
- How to Prepare Your Training Dataset with Spatial Statistics
- How to Train and Select Your Predictive Model
- How to Create a Predictive Tool
- How to Prepare and Fit a Spatial Regression Model Using PySAL
- How to Create an Animated Heatmap from a Raster
Convenience Method for Loading Datasets Fixed
We fixed our suggested method for loading open data from Socrata-based open data portals like NYC Open Data. Specifically, there was an issue where index labels were duplicated because we forgot to specify ignore_index=True when concatenating the buffered tables.
import pandas as pd
def load(
    endpoint_url,
    selected_columns=None,
    buffer_size=1000,
    search_term_by_column=None,
    **kw,
):
    buffer_url = (f'{endpoint_url}?$limit={buffer_size}')
    if selected_columns:
        select_string = ','.join(selected_columns)
        buffer_url += f'&$select={select_string}'
    for column, search_term in (search_term_by_column or {}).items():
        buffer_url += f'&$where={column}+like+"%25{search_term}%25"'
    print(buffer_url)
    tables = []
    if endpoint_url.endswith('.json'):
        f = pd.read_json
    else:
        f = pd.read_csv
    t = f(buffer_url, **kw)
    while len(t):
        print(len(tables) * buffer_size + len(t))
        tables.append(t)
        offset = buffer_size * len(tables)
        t = f(buffer_url + f'&$offset={offset}', **kw)
    return pd.concat(tables, ignore_index=True, sort=False)
Please use the latest method suggested in our Load NYC Open Data Walkthrough.
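To see why the ignore_index=True argument in the final pd.concat call matters, here is a small sketch with two made-up tables standing in for the buffered pages. The table contents and the tree_id column are hypothetical and only serve the illustration.

```python
import pandas as pd

# Two hypothetical buffered tables, each with the default index 0, 1
first_table = pd.DataFrame({'tree_id': [101, 102]})
second_table = pd.DataFrame({'tree_id': [103, 104]})
tables = [first_table, second_table]

# Without ignore_index, each table keeps its own labels, so they repeat
kept = pd.concat(tables, sort=False)
print(list(kept.index))  # [0, 1, 0, 1]

# With ignore_index=True, the rows are relabeled from 0 to n - 1
renumbered = pd.concat(tables, ignore_index=True, sort=False)
print(list(renumbered.index))  # [0, 1, 2, 3]

# A label lookup matches two rows in the first case and one in the second
print(len(kept.loc[[0]]), len(renumbered.loc[[0]]))  # 2 1
```

Duplicated labels are easy to miss because the table still displays normally. They tend to surface later, when a label-based lookup or assignment touches more rows than expected.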
Tool to Help Prepare Your Training Dataset Using Tree Statistics Now Available
If your team has not yet created a training dataset, you can now use our shortcut tool! We have created a tool that will augment your dataset with basic tree statistics from the 2015 Street Tree Census.
What You Can Do For the Next Few Weeks
Your team is only nine days away from presenting at our second annual NYC Open Data Student Showcase!
- Friday, March 1, 2019 4pm - Dress Rehearsal to Test Your Slides and Results
- Sunday, March 3, 2019 2pm - NYC Open Data Student Showcase 2019
- Rehearse your presentation at least three times. Make sure everyone on your team gets a chance to speak!
- Draft slides on slides.com only AFTER you have rehearsed your presentation. If you are using Google Slides, make sure to make your slides available to the public.
- Link your slides next to your team name on this spreadsheet.
- Add visualizations using matplotlib or seaborn. If you are using seaborn, remember to install the seaborn package first using the following commands, as described in this post.
import subprocess
assert subprocess.call('pip install seaborn'.split()) == 0
- Link to your pre-generated results in this spreadsheet. Don’t forget to take a screenshot or record a screencast of your result and include it in your slides. You do not want to have technical difficulties, nor do you want to wait anxiously for your demo to run in front of a live audience.
You are welcome to ask last-minute questions on this forum. Good luck!