Configuration File Design

Here is the proposed design for CrossCompute Analytics SDK protocol 1.0.0:

  • The following sections are new: network, pricing, tools, results, views, tests, presets.
  • The imports section is now named tools.
  • The batches section is now named presets.
  • The environment section now has processor and memory.
  • The environment batch parameter is now named concurrency.

UPDATED 20241001

  • Added dataset read and write modes to let the author specify how to synchronize datasets with the server.
  • Removed dataset script configuration; instead, use tool chains to run a script that generates a dataset to be used by later scripts.
  • Made dataset paths relative instead of absolute.

UPDATED 20241018

  • Removed dataset slugs

UPDATED 20241209

  • Simplified page options

UPDATE 20250403

  • Make copyright attribution more flexible

UPDATED 20250821

  • Add support for tool / worker compatibility
---
# Examples: https://github.com/crosscompute/crosscompute-examples
# Forum: https://forum.crosscompute.com/t/configuration-file-design/235
# Gallery: https://crosscompute.net
# Documentation: https://docs.crosscompute.com

# protocol version determines how this file is interpreted (required)
crosscompute: 1.0.0

# name summarizes what your tool does
name: Tool X
# slug customizes the tool uri
slug: tool-x
# title sets the page title
title: Tool X - Improve health, safety, quality of life
# description explains why your tool is useful
description: Improve health, safety, quality of life in our communities
# version should increment after you make changes to your tool
version: 0.0.0

# copyrights declare who owns this tool or kit
copyrights:
  - name: CrossCompute
    slug: crosscompute
    years:
      - 2024

# network declares how your server coordinates with other servers
network:
  # markets are storefronts for tools
  markets:
    - uri: https://crosscompute.net
    - path: market-uris.txt
  # peers are other servers that can send, receive or run tools
  peers:
    - uri: https://example.com/peer-uris.txt
    - path: peer-uris.txt
  # keys are public keys used to authenticate markets and peers
  keys:
    - uri: https://example.com/network-keys.txt
    - path: network-keys.txt

# pricing specifies how you want to price your tool or kit
pricing:
  # account is where the customer sends your subscription payment
  account: nano_abc
  # period is how often the customer needs to send payment
  period: month
  # amount is how much the customer needs to send
  amount: 100
  # currency is the ISO 4217 code or cryptocurrency abbreviation
  currency: xno

# tools let you include other tools and define a kit
tools:
  # path specifies the location of the tool to include;
  # aka references this tool in your templates;
  - path: abc/automate.yaml
    aka: abc
  # uri specifies the uri of the tool to include;
  # visibility index means the tool appears in index and search but not home
  - uri: https://github.com/crosscompute/crosscompute-examples
    aka: examples
    visibility: index
  # hashes provide a way to verify the downloaded code;
  # visibility search means the tool appears in search but not home and index
  - uri: /t/add-numbers
    aka: add_numbers
    visibility: search
    hashes:
      - sha256: abc

# results let you include other results in your reports
results:
  # path specifies the location of the result to include;
  # aka references this result in your templates
  - path: abc/results/aaa
    aka: aaa
  # uri specifies the uri of the result to include
  - uri: /t/add-numbers/r/1-2
    aka: one_two

# input is how your scripts get data from the user
input:
  # input variables are provided to your scripts from the user or from presets
  variables:
    # id references this variable in your templates (required);
    # view specifies how to render your variable (required);
    # path specifies the file where your scripts load this variable (required);
    #   note that path is relative to the input folder;
    #   specify ENVIRONMENT to prevent saving the variable for subsequent runs;
    # label sets the label text above the variable view;
    # configuration customizes the look and feel of the view
    - id: town_name
      view: string
      path: variables.dictionary
      label: What is the name of your town?
      configuration:
        suggestions:
          - value: Springfield
          - value: Branson
          - value: Nixa
          - value: Mansfield
          - value: Independence
    - id: age
      view: number
      path: variables.dictionary
      label: What is your age?
    - id: secret_code
      view: password
      path: ENVIRONMENT
      label: What is your secret code?
    - id: support_email
      view: email
      path: ENVIRONMENT
    - id: problem_description
      view: text
      path: problem.txt
    - id: blurb
      view: markdown
      path: blurb.md
    - id: flavor
      view: radio
      path: variables.dictionary
      configuration:
        options:
          - name: Vanilla
            value: vanilla
          - name: Chocolate
            value: chocolate
          - name: Strawberry
            value: strawberry
    - id: topics
      view: checkbox
      path: variables.dictionary
      configuration:
        options:
          - value: cooking
          - value: reading
          - value: writing
          - value: mathematics
          - value: swimming
    - id: profile_photo
      view: file
      path: photo{index}{suffix}
      configuration:
        mime-types:
          - image/jpeg
          - image/png
          - image/svg+xml
    - id: region
      view: map-mapbox-location
      path: variables.dictionary
  # input templates guide the user on how to specify the input variables
  templates:
    # path specifies the markdown file for your template (required)
    - path: input.md
    # expression determines whether this template shows next
    - path: input2.md
      expression: age >= 18

# output is how your scripts set data for the user
output:
  # output variables are provided by your scripts
  variables:
    # id references this variable in your templates (required);
    # view specifies how to render your variable (required);
    # path specifies the file where your scripts save this variable (required);
    #   note that path is relative to the output folder;
    # configuration customizes the view and is view-specific;
    # mode overrides rendering mode
    - id: document
      view: link
      path: document.pdf
      configuration:
        link-text: YOUR-LINK-TEXT
        file-name: YOUR-FILE-NAME
    - id: message
      view: string
      path: variables.dictionary
    - id: count
      view: number
      path: variables.dictionary
    - id: lyrics
      view: text
      path: lyrics.txt
    - id: monologue
      view: markdown
      path: monologue.md
    - id: logo
      view: image
      path: logo.svg
    - id: counts
      view: table
      path: counts.json
    - id: demo
      view: frame
      path: variables.dictionary
    - id: cards
      view: json
      path: cards.json
    # Here the checkbox is configured dynamically from buildings.json
    # and will render as input because of the mode override
    - id: buildings
      view: checkbox
      path: buildings.txt
      configuration:
        path: buildings.json
      mode: input
    - id: report
      view: pdf
      path: example.pdf
    - id: region
      view: map-mapbox
      path: region.geojson
      configuration:
        style: mapbox://styles/mapbox/dark-v11
        layers:
          - type: fill
            paint:
              fill-color: blue
          - type: circle
            paint:
              circle-color: red
    - id: incidents
      view: map-deck-screengrid
      path: incidents.json
      configuration:
        style: mapbox://styles/mapbox/dark-v11
    - id: identity
      view: barcode
      path: variables.dictionary
  # output templates guide the user on how to interpret the output variables
  templates:
    # path specifies the markdown file for your template (required)
    - path: output.md

# log is how your scripts communicate with the user while running
log:
  variables:
  templates:

# debug is how your scripts record information for troubleshooting bugs
debug:
  variables:
  templates:

# print defines how your output will render as a document or presentation
print:
  # print variables are generated by crosscompute printers
  variables:
    # id references this file in your templates (required);
    # view specifies how to render your file (required);
    # path specifies where the printer should save the file (required);
    # label sets the label text above the variable view;
    # configuration customizes how the printer generates the file
    - id: report
      view: pdf
      path: report.pdf
      configuration:
        header-footer:
          font-family: sans-serif
          font-size: 8pt
          color: '#808080'
          padding: 0.1in 0.25in
          skip-first: true
        page-number:
          location: footer
          alignment: right
        name: '{y2 | slug}-{y3}.pdf'
    - id: report-uri
      view: link
      path: report.pdf
      label: Report URI
      configuration:
        path: report-uri.json

# views are custom view classes for processing and rendering variables
views:
  # name is the view name;
  # configuration is applied to all variables with this view;
  # package specifies the path or uri of the python package defining the view
  - name: map-mapbox
    configuration:
      style: mapbox://styles/mapbox/dark-v11
  - name: a-view
    package:
      path: views/a-view
  - name: b-view
    package:
      uri: https://pypi.org/project/b-view

# tests verify that your tool works properly
tests:
  # folder sets input variables and checks selected output variables
  - folder: standard

# presets are pre-defined sets of input variables
presets:
  # folder sets values for input variables;
  # folder contains an input subfolder;
  # input subfolder contains files for the input variables
  - folder: presets/standard
  # configuration sets variable values, where each row is a separate preset
  # folder is the name of the folder saved to disk;
  # name is the preset name displayed online;
  # slug is the preset uri;
  # the above string templates can include variable ids and filters
  - folder: presets/{x1 | slug}-{x2}
    name: '{x1 | title} {x2}'
    slug: '{x1 | slug}-{x2}'
    configuration:
      path: presets.csv
  # reference folder sets variable values missing in the configuration path;
  # configuration sets variable values, where each row is a separate preset
  - folder: presets/{x1 | slug}-{x2}
    name: '{x1 | title} {x2}'
    slug: '{x1 | slug}-{x2}'
    reference:
      folder: presets/standard
    configuration:
      path: presets.csv

# datasets are files that are expected by your scripts
datasets:
  # path specifies the file where your scripts load this dataset (required);
  #   note that path is relative to the dataset folder;
  # reference path or uri specifies the location of your original file;
  - path: abc.csv
    reference:
      path: datasets/abc-2024.csv
  # use references to avoid changing paths in your scripts with new datasets;
  # for example, suppose you have a report that relies on a yearly dataset;
  # use path to fix a location where your scripts can expect to find the file;
  # use reference path to point to the latest version of your yearly dataset
  - path: def.csv
    reference:
      uri: https://example.net/def-2024.csv
  # input replace means the dataset will be downloaded to the input folder
  # output replace means the path in the output folder will replace the dataset
  - path: jkl.csv
    input: replace
    output: replace
  # input is none by default; the dataset will not be downloaded
  # output append means the path in the output folder will be appended remotely
  - path: mno.csv
    output: append
  # input is none by default; the dataset will not be downloaded
  # output is none by default; the dataset will not be uploaded
  - path: pqr.csv

# scripts contain code that turn input variables into output variables
scripts:
  # command runs in the tool folder;
  # folder paths are passed as arguments
  - command: >-
      python run.py
      {input_folder} {output_folder} {log_folder} {debug_folder}
  # path is a python script that runs in the tool folder;
  # folder paths are passed as environment variables
  - path: run.py
  # path is a python script that runs in the specified folder;
  # folder paths are passed as environment variables
  - path: run.py
    folder: scripts
  # path is a jupyter notebook that runs in the tool folder;
  # folder paths are passed as environment variables
  - path: run.ipynb
  # function is a python function that runs in the tool folder;
  # folder paths are passed as function arguments
  - function: run.plot_all

# environment configures how your scripts run
environment:
  # processor is what processor type you want to use to run your scripts
  processor: cpu
  # memory is how much memory you want to reserve for your scripts
  memory: 1gb
  # engine runs your scripts and can be either podman or unsafe;
  # podman is a container engine (see https://podman.io);
  # unsafe means that the scripts will run directly on your machine
  engine: podman
  # image is the container used to run your scripts when using podman engine
  image: python
  # packages are dependencies required by your scripts;
  # engine=unsafe will install the packages directly on your machine;
  # engine=podman will install the packages in the container image
  packages:
    # id is the name of the package as defined in the package manager;
    # manager is the name of a package manager such as pip, npm, dnf, apt
    - id: matplotlib
      manager: pip
    - id: turf
      manager: npm
    # uri is the uri of the package;
    - uri: "https://mirrors.rpmfusion.org/free/fedora/\
        rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm"
      manager: dnf
    - id: ffmpeg
      manager: dnf
    - id: libgeos-dev
      manager: apt
  # ports expose server processes running in your scripts
  ports:
    # id should correspond to a log or debug variable id that uses frame view;
    # number is the port on which your script server process is listening
    - id: demo
      number: 8888
  # variables are environment variables needed by your scripts
  variables:
    # id is the environment variable to make available to your script
    - id: GOOGLE_KEY
  # concurrency defines how your presets run;
  # thread runs each preset as a separate thread;
  # process runs each preset as a separate process;
  # none runs each preset one at a time
  concurrency: process
  # interval specifies how long to wait before running your scripts again;
  # use this setting to have dashboards update themselves
  interval: 30 minutes
  # add an exclamation point to ensure the scripts run even if nobody watches
  # interval: 30 minutes!

# display configures the overall look and feel of your tool
display:
  # styles customize how your templates look
  styles:
    # path specifies the location of a css file
    - path: style.css
    # uri specifies the uri of a css file
    - uri: https://cdn.jsdelivr.net/npm/pygments-css@1.0.0/default.css
  # templates override the core templates used to render the site
  templates:
    # id specifies the name of the template to override;
    # path specifies the location of a jinja template
    - id: base
      path: base.html
    - id: live
      path: live.html
    - id: root
      path: root.html
  # pages override the design of specific pages
  pages:
    # id specifies the name of the page to override;
    # configuration sets the design of the page
    - id: tool
      # design input puts the input template on the tool page
      design: input
    - id: tool
      # design output puts the output template on the tool page
      design: output
    - id: tool
      # design none only renders the list of presets on the tool page
      design: none
    - id: input
      # design none removes all variable labels and css
      design: none
      # buttons override the design of specific buttons
      buttons:
        - id: continue
          text: Continuar
        - id: back
          text: Volver
    - id: output
      # design flat prevents section folding
      design: flat

# authorization restricts access to your tool
authorization:
  # tokens are api keys for your tool;
  # each token defines an identity
  tokens:
    # path specifies the location of a yaml file;
    # the file should have tokens as keys and configuration settings as values;
    # abcd:
    #   role_name: admin
    #   town_name: Springfield
    # bcde:
    #   role_name: leader
    #   town_name: Branson
    # cdef:
    #   role_name: member
    #   town_name: Mansfield
    - path: tokens.yaml
  # groups define permissions
  groups:
    # configuration settings define how a group matches an identity;
    # permission ids specify which privileges the identity can access
    - configuration:
        role_name: owner
      permissions:
        - id: add_token
        - id: see_root
        - id: see_tool
        - id: see_preset
        - id: run_tool
    # note that the value for each configuration setting can also be a list;
    # then an identity will match this group if its setting matches a value
    - configuration:
        role_name: member
      permissions:
        - id: see_root
        - id: see_tool
        - id: see_preset
          # action match is specific to the permission see_preset;
          # in the example above, suppose that a user uses the token cdef;
          # then the user will have town_name Mansfield and
          # can only see presets whose input variable town_name is Mansfield
          action: match

# compatibility restricts worker access to your tool                                                                                              
compatibility:                                                                                                                                    
  # workers restrict access to specific workers;                                                                                                  
  # omit to remove this restriction                                                                                                               
  workers:                                                                                                                                        
    - slug: nKUOoxPSER8AxuBF5nFgpWhWycza9DMm                                                                                                      
  # labels restrict access to workers with specific labels;                                                                                       
  # omit to remove this restriction                                                                                                               
  labels:                                                                                                                                         
    - slug: processor/cpu  

By specifying a slug, you can indicate that the dataset is on a CrossCompute server. Set lock to false if your script will change the dataset and you want to upload the updated dataset after your script finishes running.

datasets:
  # slug identifies a dataset on the server
  # lock is true by default; set false if your script will change the dataset
  - path: jkl.csv
    slug: jkl
    lock: false
  # slug can also be omitted in which case the slug will be the file name
  - path: mno.csv

Instead of specifying lock, it might be more expressive to specify write mode for datasets, similar to how programming languages set write mode for files.

datasets:
  # path specifies the file where your scripts load this dataset (required);
  #   note that path is relative to the dataset folder;                  
  # slug identifies a dataset on the server
  # read replace means the dataset will be downloaded and overwritten locally
  # write replace means the dataset will be uploaded and overwritten remotely
  - path: jkl.csv
    slug: jkl
    read: replace
    write: replace
  # read is none by default; the dataset will not be downloaded
  # write append means the dataset will be appended remotely
  - path: mno.csv
    slug: mno
    write: append
  # slug can be omitted in which case the slug will be the file name
  # read is none by default; the dataset will not be downloaded
  # write is none by default; the dataset will not be uploaded
  - path: pqr.csv

Instead of having a separate datasets folder, I am debating whether to have the dataset in question be written to the input folder and read from the output folder.

datasets:
  # input replace means the dataset will be downloaded to the input folder
  # output replace means the path in the output folder will replace the dataset
  - path: jkl.csv
    input: replace
    output: replace
  # read is none by default; the dataset will not be downloaded
  # output append means the path in the output folder will be appended remotely
  - path: mno.csv
    output: append
  # input is none by default; the dataset will not be downloaded
  # output is none by default; the dataset will not be uploaded
  - path: pqr.csv

We successfully tested dataset locks, making it possible for future tools to add distributed persistence via an append or replace mechanism.

Earlier tools handled persistence using the local file system.

Future tools can now persist datasets even if workers run on multiple machines.

Here is an example of a tool that uses the append mechanism to append data to a remote dataset. Note that the append mechanism can only be used with line-based text datasets such as CSVs. Binary datasets should only use the replace mechanism.

automate.yaml

---
crosscompute: 0.9.5
name: Check Dataset Locks (Output Append)
version: 0.0.1
copyright:
  name: CrossCompute Inc.
  year: 2024
output:
  variables:
    - id: c
      view: number
      path: variables.dictionary
presets:
  - folder: presets/standard
datasets:
  - path: abc.txt
    output: append
scripts:
  - command: python run.py 3

run.py

import json
from os import getenv
from pathlib import Path
from sys import argv
from time import sleep
  
  
INPUT_FOLDER = Path(getenv('INPUT_FOLDER', 'presets/standard/input'))
OUTPUT_FOLDER = Path(getenv('OUTPUT_FOLDER', 'results/standard/output'))
OUTPUT_FOLDER.mkdir(parents=True, exist_ok=True)


lines = []
try:
    with open(INPUT_FOLDER / 'abc.txt', 'rt') as f:
        for line in f.read():
            line = line.strip()
            if not line:
                continue
            lines.append(line)
except OSError:
    pass
if len(argv) > 1:
    sleep(int(argv[1]))
lines.append(str(len(lines)))
with open(OUTPUT_FOLDER / 'abc.txt', 'wt') as f:
    f.write('\n'.join(lines))
with open(OUTPUT_FOLDER / 'variables.dictionary', 'wt') as f:
    json.dump({'c': len(lines)}, f)

Note that datasets are downloaded to the input folder and uploaded from the output folder.

This functionality is scheduled to be available in CrossCompute 0.9.5.

We made copyright attribution more flexible by simplifying the syntax.

# copyrights declare who owns this tool or kit
copyrights:
  - name: CrossCompute
    slug: crosscompute
    years:
      - 2024
      - 2025
  - name: DevLabs México
    slug: devlabs-mexico
    years:
      - 2025
  • If a tool has multiple copyright owners, list each copyright owner separately.
  • If a tool is updated, append another year to the years field.
  • Make sure to preserve prior years when updating the years field so that you can claim copyright properly.
  • Note that the name of a copyright owner will not change if the copyright owner already exists in the system. You will need to update the copyright owner name directly in the system.

We are proposing to repurpose the debug step to be a way to display errors, warnings as well as logs for debugging.

We plan to add support for variable.level, that will be recognized across all steps, with a definition analogous to Python’s logging level and HTTP error codes.

Whether a variable is rendered depends on whether the user is a site reader, tool manager or tool user / result viewer.

# debug is how your scripts record errors, warnings, logs                
debug:
  variables:                                                             
    # level filters variables based on user roles and settings;
    # level == DEBUG is only visible to site readers and tool managers;
    # level == INFO is visible to all result viewers;
    # level == WARNING is visible to all result viewers;
    # level == ERROR is visible to all result viewers;
    # level == CRITICAL is visible to all result viewers and is intended to
    #   indicate script errors (analogous to HTTP 50x)
    - id: town_name_debug                                                
      view: string                                                       
      path: variables.dictionary                                         
      level: DEBUG
    - id: town_name_info                                                 
      view: string                                                       
      path: variables.dictionary                                         
      level: INFO                                                        
    - id: town_name_warning                                              
      view: string                                                       
      path: variables.dictionary                                         
      level: WARNING                                                     
    - id: town_name_error                                                
      view: string                                                       
      path: variables.dictionary                                         
      level: ERROR
    - id: town_name_critical                                             
      view: string
      path: variables.dictionary                                         
      level: CRITICAL

We are proposing the following enhancements:

  1. Let a tool creator to control which workers can run results for their tool
  2. Let a worker creator to control which tools can be run by their worker.

Tool Configuration

The following section should be specified in the tool configuration:

# compatibility restricts worker access to your tool
compatibility:
  # workers restrict access to specific workers;
  # omit to remove this restriction
  workers:
    - slug: nKUOoxPSER8AxuBF5nFgpWhWycza9DMm
  # labels restrict access to workers with specific labels;
  # omit to remove this restriction
  labels:
    - slug: processor/cpu

If the tool configuration is missing a compatibility section, then the tool will not be runnable by the platform.

Worker Configuration

As a counter point, the following section should be specified in a worker configuration:

# compatibility restricts tool access to your worker
compatibility:
  # tools restrict access to specific tools;
  # omit to remove this restriction
  tools:
    - slug: add-numbers
  # labels restrict access to tools with specific labels;
  # omit to remove this restriction
  labels:
    - slug: processor/cpu