NYC Open Data Student Showcase 20190303 Updates


#1

Teams that are presenting at our second annual NYC Open Data Student Showcase on Sunday, March 3 should check here for last-minute technical updates.

GeoTable Maximum Display Count Increased to 1024

We updated crosscompute-geotable to increase the maximum display count for geotables rendered as maps to 1024.

PySAL>=2 Breaks Previous Code that Uses KDTree and KNN

While updating the platform, we inadvertently upgraded PySAL to 2.0.0, which introduces changes that break existing code that uses KDTree and knnW_from_array. Update your code as follows:

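# OLD (PySAL<2)
import numpy as np
from pysal.cg import RADIUS_EARTH_KM
from pysal.cg.kdtree import KDTree
from pysal import knnW_from_array
# Arc distances are in km because radius=RADIUS_EARTH_KM
kd_tree = KDTree(xys, distance_metric='Arc', radius=RADIUS_EARTH_KM)
distances, indices = kd_tree.query(xy, k=count, distance_upper_bound=1)
# The old code filtered out missing neighbors by checking for NaN
relative_indices = indices[~np.isnan(indices)]
relative_distances = distances[~np.isnan(indices)]
w = knnW_from_array(xys, k=2)
w.transform = 'R'

# NEW (PySAL>=2)
from pysal.lib.cg import KDTree, RADIUS_EARTH_KM
from pysal.lib.weights import KNN
# Arc distances are in km because radius=RADIUS_EARTH_KM
kd_tree = KDTree(xys, distance_metric='Arc', radius=RADIUS_EARTH_KM)
distances, indices = kd_tree.query(xy, k=count, distance_upper_bound=1)
# Missing neighbors now come back with index len(xys) instead of NaN;
# this filter assumes count == len(xys)
relative_indices = indices[indices < count]
relative_distances = distances[indices < count]
# Build k-nearest-neighbor weights from the existing tree
w = KNN(kd_tree, k=2)
# Row-standardize the weights
w.set_transform('R')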

Click here to see a complete example.
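For intuition about what KNN(kd_tree, k=2) followed by set_transform('R') produces, here is a minimal pure-Python sketch. It is not PySAL's implementation: it uses plain Euclidean distance rather than the 'Arc' metric, breaks distance ties by point index, and the name knn_row_standardized is hypothetical. Each point gets its k nearest other points as neighbors, and row standardization ('R') divides each weight by the row sum, so every neighbor ends up with weight 1/k.

```python
import math


def knn_row_standardized(points, k):
    """Hypothetical helper: k-nearest-neighbor weights with 'R' (row) standardization.

    Returns {i: {j: weight}}. Uses Euclidean distance and breaks ties by index.
    """
    weights = {}
    for i, (x1, y1) in enumerate(points):
        # Rank every other point by distance, then by index to break ties
        ranked = sorted(
            (math.hypot(x2 - x1, y2 - y1), j)
            for j, (x2, y2) in enumerate(points)
            if j != i
        )
        neighbors = [j for _, j in ranked[:k]]
        # Binary weights (1 per neighbor) divided by the row sum
        weights[i] = {j: 1 / len(neighbors) for j in neighbors}
    return weights


xys = [(0, 0), (0, 1), (1, 0), (1, 1)]
w = knn_row_standardized(xys, k=2)
print(w[0])  # {1: 0.5, 2: 0.5}
```

Because each row sums to 1, a spatial lag computed with these weights is simply the average of a point's k neighbors.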

Spatial Regression Tutorials Fixed and Updated

We fixed and updated all code in our Spatial Regression Tutorial Notebooks and Example Tools.

Almost everything you need to prepare your predictive tool using NYC Open Data is covered in our Spatial Regression Tutorials.

Convenience Method for Loading Datasets Fixed

We fixed our suggested method for loading open data from Socrata-based open data portals like NYC Open Data. Previously, index labels were duplicated because we forgot to specify ignore_index=True when concatenating the buffered tables.

import pandas as pd

def load(
    endpoint_url,
    selected_columns=None,
    buffer_size=1000,
    search_term_by_column=None,
    **kw,
):
    # Request at most buffer_size rows per call
    buffer_url = f'{endpoint_url}?$limit={buffer_size}'
    if selected_columns:
        select_string = ','.join(selected_columns)
        buffer_url += f'&$select={select_string}'
    # Keep rows whose column contains the search term (%25 is a URL-encoded %)
    for column, search_term in (search_term_by_column or {}).items():
        buffer_url += f'&$where={column}+like+"%25{search_term}%25"'
    print(buffer_url)
    tables = []

    # Pick the pandas reader that matches the endpoint format
    if endpoint_url.endswith('.json'):
        f = pd.read_json
    else:
        f = pd.read_csv

    # Request the next page until the portal returns no rows
    t = f(buffer_url, **kw)
    while len(t):
        # Print the running row count
        print(len(tables) * buffer_size + len(t))
        tables.append(t)
        offset = buffer_size * len(tables)
        t = f(buffer_url + f'&$offset={offset}', **kw)
    # ignore_index=True renumbers rows so index labels are not duplicated
    return pd.concat(tables, ignore_index=True, sort=False)
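The while loop in load() pages through the portal with $limit and $offset until a request comes back empty. The sketch below replays that logic against an in-memory list so it runs without network access; page_through, fake_fetch, and the 2,500-row dataset are hypothetical stand-ins for the pd.read_json/pd.read_csv calls. It also shows why renumbering matters: every page numbers its own rows from 0, so stacking pages without renumbering repeats labels.

```python
def page_through(fetch, buffer_size):
    """Mimic load(): fetch pages at offsets 0, buffer_size, ... until one is empty."""
    pages, offsets = [], []
    offset = 0
    page = fetch(offset, buffer_size)
    while page:
        offsets.append(offset)
        pages.append(page)
        offset = buffer_size * len(pages)
        page = fetch(offset, buffer_size)
    offsets.append(offset)  # the final request that came back empty
    return pages, offsets


rows = [f'row{i}' for i in range(2500)]  # hypothetical 2,500-row dataset


def fake_fetch(offset, limit):
    # Stand-in for one request with $limit=limit and $offset=offset
    return rows[offset:offset + limit]


pages, offsets = page_through(fake_fetch, buffer_size=1000)
print(offsets)  # [0, 1000, 2000, 3000]

# Each page numbers its rows from 0, like a freshly read DataFrame,
# so stacking pages without renumbering repeats labels
labels = [label for page in pages for label in range(len(page))]
print(len(labels), len(set(labels)))  # 2500 1000
```

With ignore_index=True, pandas discards those per-page labels and numbers the combined rows from 0 to 2499.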

Please use the latest method suggested in our Load NYC Open Data Walkthrough.

Tool to Help Prepare Your Training Dataset Using Tree Statistics Now Available

If your team has not yet created a training dataset, you can now use our shortcut tool! We have created a tool that will augment your dataset with basic tree statistics from the 2015 Street Tree Census.

What You Can Do in the Next Few Days

Your team is only nine days away from presenting at our second annual NYC Open Data Student Showcase!

  • Friday, March 1, 2019 4pm - Dress Rehearsal to Test Your Slides and Results
  • Sunday, March 3, 2019 2pm - NYC Open Data Student Showcase 2019
  1. Rehearse your presentation at least three times. Make sure everyone on your team gets a chance to speak!
  2. Draft slides on slides.com only AFTER you have rehearsed your presentation. If you are using Google Slides, make sure your slides are viewable by the public.
  3. Link your slides next to your team name on this spreadsheet.
  4. Add visualizations using matplotlib or seaborn. If you are using seaborn, remember to install the seaborn package using the following command, as described in this post:
import subprocess
# Run pip from the notebook; an exit code of 0 means the install succeeded
assert subprocess.call('pip install seaborn'.split()) == 0
  5. Link to your pre-generated results in this spreadsheet. Don’t forget to take a screenshot or record a screencast of your results and include it in your slides. You do not want to run into technical difficulties or wait anxiously for your demo to run in front of a live audience.
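If you rerun your notebook often, a slightly more defensive variant of the seaborn install command only calls pip when the package is missing. This is a sketch, not part of the platform: ensure_package is a hypothetical helper, and it runs pip through sys.executable -m pip so the package is installed into the same Python environment that runs the notebook.

```python
import importlib.util
import subprocess
import sys


def ensure_package(module_name, pip_name=None):
    """Hypothetical helper: pip install a package only if it cannot be imported.

    Returns False if the module was already importable, True if it was installed.
    """
    if importlib.util.find_spec(module_name) is not None:
        return False
    command = [sys.executable, '-m', 'pip', 'install', pip_name or module_name]
    assert subprocess.call(command) == 0
    # Let the running interpreter see the newly installed package
    importlib.invalidate_caches()
    return True
```

Call ensure_package('seaborn') near the top of your notebook, before import seaborn.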

You are welcome to ask last-minute questions on this forum. Good luck!


#2

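PySAL>=2 Changes Return Values for KDTree

In PySAL>=2, KDTree.query no longer marks missing neighbors with NaN. Instead, any neighbor it cannot find within distance_upper_bound comes back with the index len(xys), so filter on the index. The filter below assumes count == len(xys).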

OLD
relative_indices = indices[~np.isnan(indices)]

NEW
relative_indices = indices[indices < count]

Here is a complete example:

# Points as (longitude, latitude) pairs in degrees
xys = [
    (0, 0),
    (0, 1),
    (1, 0),
    (1, 1),
]
xy = 0.1, 0.1
count = len(xys)

from pysal.lib.cg import KDTree, RADIUS_EARTH_KM
kd_tree = KDTree(xys, distance_metric='Arc', radius=RADIUS_EARTH_KM)
# Find up to count neighbors within 100 km of xy
distances, indices = kd_tree.query(
    xy, k=count, distance_upper_bound=100)
# Drop the placeholder index len(xys) used for missing neighbors
relative_indices = indices[indices < count]
relative_distances = distances[indices < count]
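To see the new return convention without installing PySAL, the sketch below mimics kd_tree.query(xy, k=count, distance_upper_bound=...) for the 'Arc' metric by brute force. It assumes, as with scipy.spatial.cKDTree, that unfilled neighbor slots come back with index len(xys) and distance inf. The names arc_distance_km and arc_query are hypothetical, points are (longitude, latitude) in degrees, and 6371 km is used as an approximate Earth radius rather than PySAL's RADIUS_EARTH_KM. With a 50 km bound only (0, 0) is close enough, and the mask indices < count keeps exactly that neighbor.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, not PySAL's constant


def arc_distance_km(a, b, radius=EARTH_RADIUS_KM):
    """Great-circle (haversine) distance between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(h))


def arc_query(xys, xy, k, distance_upper_bound=math.inf):
    """Brute-force k-nearest query, padding missing neighbors with (inf, len(xys))."""
    n = len(xys)
    ranked = sorted((arc_distance_km(xy, p), i) for i, p in enumerate(xys))
    found = [(d, i) for d, i in ranked[:k] if d < distance_upper_bound]
    padding = [(math.inf, n)] * (k - len(found))
    distances = [d for d, _ in found + padding]
    indices = [i for _, i in found + padding]
    return distances, indices


xys = [(0, 0), (0, 1), (1, 0), (1, 1)]
xy = 0.1, 0.1
count = len(xys)
distances, indices = arc_query(xys, xy, k=count, distance_upper_bound=50)
print(indices)  # [0, 4, 4, 4]

# Same filter as above, written with lists instead of NumPy masks
relative_indices = [i for i in indices if i < count]
relative_distances = [d for d, i in zip(distances, indices) if i < count]
print(relative_indices)  # [0]
```

Note that the padding index 4 is a valid integer, so the old ~np.isnan(indices) check would not remove it, and using it to index xys would fail.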

Thanks to Jendri and Paloma for pinpointing these issues.


#3

Please avoid using single or double quotes in the name of your notebook for now. Having a single or double quote in the name currently prevents your tool from running properly.

BAD
Find NYC's Restrooms

GOOD
Find Restrooms in NYC
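If you generate or rename notebooks programmatically, a quick check can catch the problem before you publish. This is a hypothetical helper that only tests for the two characters mentioned above; it does not know about any other characters the platform might reject.

```python
def has_quotes(notebook_name):
    """Hypothetical check: True if the name contains a single or double quote."""
    return any(quote in notebook_name for quote in ("'", '"'))


print(has_quotes("Find NYC's Restrooms"))  # True
print(has_quotes('Find Restrooms in NYC'))  # False
```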