Batch Configuration Design

invisibleroads · December 3, 2021, 3:00pm

To simplify the process of defining batches, we are considering the following batch configuration design:

batches:
  - name: {document_text}
    folder: batches/{document_text | normalize_key}
    variables:
      - id: document_text
        path: datasets/proverbs.txt

The author can specify batch variables to load from specific paths such as a text file or spreadsheet in order to loop through different values of an input variable.

If loading your batch variable from a text file, we will skip lines starting with the hash # character so that you easily comment out specific values.

Defining a batch variable was already supported in crosscompute 0.8 but we are moving the batch variable configuration to a separate section called batches and expanding functionality to support filters such as normalize_key and spreadsheet paths for batch variable combinations (see example below).

batches:
  - name: {author_name} - {document_text}
    folder: batches/{author_name}/{document_text}
    variables:
      - id: author_name
        path: datasets/quotes.csv
      - id: document_text
        path: datasets/quotes.csv

Here is the full configuration for the paint-letters widget example from crosscompute-examples:

---
# version of crosscompute
crosscompute: 0.9.0

# name of your automation
name: Paint Letters

# version of your automation
version: 0.0.1

# input configuration
input:

  # input variables
  # - id to use when referencing your variable in the template
  # - view to use when rendering your variable on the display
  # - path where your script loads the variable,
  #   relative to the input folder
  variables:
    - id: document_text
      view: text
      path: document.txt

# output configuration
output:

  # output variables
  # - id to use when referencing your variable in the template
  # - view to use when rendering your variable on the display
  # - path where your script saves the variable,
  #   relative to the output folder
  variables:
    - id: choropleth
      view: image
      path: choropleth.svg

# test configuration
# - folder that contains an input subfolder with paths for
#   input variables that define a specific test
tests:
  - folder: tests/standard

# batch configuration
# - name of the batch; can include variable ids for batch template
# - folder that contains an input subfolder with paths for
#   input variables that define a specific batch
# - variable configuration for batch template
      - id: { id of the input variable that you want to batch }
        path: { path containing different values for this variable }
batches:
  - name: {document_text}
    folder: batches/{document_text | normalize_key}
    variables:
      - id: document_text
        path: datasets/proverbs.txt

# script configuration
script:

  # folder where your script should run
  folder: .

  # command to use to run your script
  command: python -c "$(jupyter nbconvert run.ipynb --to script --stdout)"