In this tutorial, you will create a simple automation from scratch that counts words in a document. See the complete code.
Step 0: Launch JupyterLab
Visit Learn Examples in JupyterLab and start a 30-minute session. You can also install the CrossCompute Analytics Software Development Kit on your machine:
pip install crosscompute-analytics jupyterlab~=3
Make a new folder called count-words and enter it. You can either use the JupyterLab interface or run the commands below in a terminal.
mkdir count-words
cd count-words
Step 1: Define input and output variables
Make a configuration file and save it as automate.yml. You can either use the JupyterLab interface to make automate.yml from scratch or run the command below in a terminal.
# Make a configuration file
crosscompute --configure
Update the input, output, and scripts sections in automate.yml and remove the lines you don’t need.
---
crosscompute: 0.9.4
name: Count Words
version: 0.1.0
input:
  variables:
    - id: document
      view: text
      path: document.txt
output:
  variables:
    - id: word_count
      view: number
      path: variables.dictionary
    - id: word_counts
      view: table
      path: word_counts.json
batches:
  - folder: batches/standard
scripts:
  - path: run.ipynb
environment:
  packages:
    - id: pandas
      manager: pip
Put dummy data for your input variables in batches/standard. This data is useful for prototyping your scripts. This example has only one input variable, stored in document.txt.
mkdir -p batches/standard/input
cd batches/standard/input
echo one two three > document.txt
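On Windows, `mkdir -p` and `echo` behave differently. You might also prefer to stay in Python. In either case, the following is a minimal equivalent of the commands above. It assumes that you run it from inside the count-words folder.

```python
from pathlib import Path

# Create batches/standard/input and any missing parents; no error if it already exists
input_folder = Path('batches/standard/input')
input_folder.mkdir(parents=True, exist_ok=True)

# Write the same dummy document that the echo command produces (echo adds a trailing newline)
(input_folder / 'document.txt').write_text('one two three\n')
```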
Step 2: Write scripts
Make a new notebook, rename it to run.ipynb, and add the following snippet to the top of your notebook or script.
from os import getenv
from pathlib import Path
input_folder = Path(getenv(
'CROSSCOMPUTE_INPUT_FOLDER', 'batches/standard/input'))
output_folder = Path(getenv(
'CROSSCOMPUTE_OUTPUT_FOLDER', 'batches/standard/output'))
output_folder.mkdir(parents=True, exist_ok=True)
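The `getenv` fallback lets the same notebook run in two settings. We assume that CrossCompute sets these environment variables to the folders of the batch it is running. When you run cells by hand, the variables are unset, so the defaults point to the dummy batch. The sketch below simulates both cases. `get_folder` and the path `/tmp/other-batch/input` are hypothetical.

```python
from os import environ, getenv
from pathlib import Path


def get_folder(variable_name, default_path):
    # Prefer the environment variable; fall back to the prototype batch folder
    return Path(getenv(variable_name, default_path))


# Case 1: variable unset, as when you run cells by hand
environ.pop('CROSSCOMPUTE_INPUT_FOLDER', None)
print(get_folder('CROSSCOMPUTE_INPUT_FOLDER', 'batches/standard/input'))

# Case 2: variable set, simulating a run on another batch (hypothetical path)
environ['CROSSCOMPUTE_INPUT_FOLDER'] = '/tmp/other-batch/input'
print(get_folder('CROSSCOMPUTE_INPUT_FOLDER', 'batches/standard/input'))

# Unset it again so that later cells fall back to the dummy batch
del environ['CROSSCOMPUTE_INPUT_FOLDER']
```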
Your script should load the input variables at the start and save the output variables at the end. Prototype your script cell by cell in JupyterLab.
document_text = (input_folder / 'document.txt').read_text().strip()
document_text
words = document_text.split()
words
from collections import Counter
count_by_word = Counter(words)
count_by_word
word_count = len(words)
word_count
import json
with (output_folder / 'variables.dictionary').open('wt') as f:
    json.dump({'word_count': word_count}, f)
from pandas import DataFrame
word_count_table = DataFrame(count_by_word.items(), columns=['word', 'count'])
word_count_table.to_json(output_folder / 'word_counts.json', orient='split', index=False)
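Once the cells work, you might want to test the counting logic separately from the file input and output. The following minimal sketch wraps the same steps in a function. `count_words` is a hypothetical helper name that is not used elsewhere in this tutorial.

```python
from collections import Counter


def count_words(document_text):
    # count_words is a hypothetical helper; it mirrors the notebook cells above
    # Split on any run of whitespace, exactly as document_text.split() does
    words = document_text.strip().split()
    # Return the total number of words and the number of occurrences of each word
    return len(words), Counter(words)


word_count, count_by_word = count_words('one two three two\n')
print(word_count)                    # 4
print(count_by_word.most_common(1))  # [('two', 2)]
```

Note that `str.split` treats `two` and `two.` as different words. Normalizing case and punctuation is outside the scope of this example.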
Then test your automation:
- Click the CrossCompute icon in the right toolbar to open the CrossCompute panel
- Click Launch and wait for the log to indicate that the server is serving
- Click Development Server
- Click Count Words
- Click Continue
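Before you open the browser, you can also check the files that your script wrote. The sketch below assumes the two outputs from this tutorial. `read_outputs` is a hypothetical helper. The demo writes sample files in the same layout to a temporary folder so that it runs on its own. In your notebook, you would call `read_outputs('batches/standard/output')` instead.

```python
import json
import tempfile
from pathlib import Path


def read_outputs(output_folder):
    # Hypothetical helper that reads back the two outputs of this tutorial
    output_folder = Path(output_folder)
    # variables.dictionary is a JSON object keyed by variable id
    with (output_folder / 'variables.dictionary').open('rt') as f:
        variables = json.load(f)
    # orient='split', index=False stores column names plus a list of rows
    with (output_folder / 'word_counts.json').open('rt') as f:
        table = json.load(f)
    rows = [dict(zip(table['columns'], row)) for row in table['data']]
    return variables['word_count'], rows


# Demo on a temporary folder filled with sample files in the same layout
with tempfile.TemporaryDirectory() as folder:
    Path(folder, 'variables.dictionary').write_text(json.dumps({'word_count': 3}))
    Path(folder, 'word_counts.json').write_text(json.dumps({
        'columns': ['word', 'count'],
        'data': [['one', 1], ['two', 1], ['three', 1]]}))
    word_count, rows = read_outputs(folder)

print(word_count, rows[0])
```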
For reference, below are some common commands for loading input variables and saving output variables.
Load dictionary
import json
with (input_folder / 'variables.dictionary').open('rt') as f:
    variables = json.load(f)
Load text
document_text = (input_folder / 'document.txt').read_text().strip()
Save dictionary
import json
with (output_folder / 'variables.dictionary').open('wt') as f:
    json.dump({'word_count': word_count}, f)
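Opening the file with `'wt'` truncates it, so running the snippet twice keeps only the last dictionary. If several output variables share variables.dictionary, either dump them together in one call or merge them, as in this minimal sketch. `save_variables` and `unique_word_count` are hypothetical names.

```python
import json
import tempfile
from pathlib import Path


def save_variables(output_folder, **new_variables):
    # Hypothetical helper: merge new values into variables.dictionary
    # instead of overwriting the values that were saved earlier
    path = Path(output_folder) / 'variables.dictionary'
    variables = json.loads(path.read_text()) if path.exists() else {}
    variables.update(new_variables)
    path.write_text(json.dumps(variables))
    return variables


with tempfile.TemporaryDirectory() as folder:
    save_variables(folder, word_count=3)
    # unique_word_count is a hypothetical second output variable
    variables = save_variables(folder, unique_word_count=3)

print(variables)
```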
Save text
(output_folder / 'x.md').write_text('**whee**')
Save table
from pandas import DataFrame
word_count_table = DataFrame(count_by_word.items(), columns=['word', 'count'])
word_count_table.to_json(output_folder / 'word_counts.json', orient='split', index=False)
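For a table this small, you might prefer to avoid the pandas dependency. The following minimal sketch writes the same split layout with the standard library alone. It assumes that the table view needs only the `columns` and `data` keys that pandas writes for `orient='split', index=False`, so check the result against your CrossCompute version. `save_table_split` is a hypothetical helper.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def save_table_split(path, columns, rows):
    # Hypothetical helper that mimics DataFrame.to_json(orient='split', index=False):
    # a list of column names and a list of rows, with no index
    Path(path).write_text(json.dumps({'columns': columns, 'data': rows}))


count_by_word = Counter('one two three two'.split())
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / 'word_counts.json'
    save_table_split(path, ['word', 'count'], [list(item) for item in count_by_word.items()])
    table = json.loads(path.read_text())

print(table)
```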
Step 3: Add templates and styles
Update automate.yml to add input and output templates and styles.
---
crosscompute: 0.9.4
name: Count Words
version: 0.1.0
input:
  variables:
    - id: document
      view: text
      path: document.txt
  templates:
    - path: input.md
output:
  variables:
    - id: word_count
      view: number
      path: variables.dictionary
    - id: word_counts
      view: table
      path: word_counts.json
  templates:
    - path: output.md
batches:
  - folder: batches/standard
scripts:
  - path: run.ipynb
display:
  styles:
    - path: style.css
  pages:
    - id: output
      configuration:
        design: none
environment:
  packages:
    - id: pandas
      manager: pip
Make input.md.
# Count Words
Count occurrences of each word.
{ document }
Make output.md.
There are { word_count } words in the document.
{ word_counts }
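Each `{ variable_id }` in a template marks the spot where CrossCompute places the view for that variable, such as a number for word_count or a table for word_counts. The sketch below illustrates the placeholder idea with plain text substitution only. It is not CrossCompute's renderer, which produces views rather than plain strings. `render_template` is a hypothetical helper.

```python
import re


def render_template(template_text, values):
    # Illustration only, not CrossCompute's renderer:
    # replace each "{ variable_id }" with its value and leave unknown ids untouched
    def replace(match):
        variable_id = match.group(1)
        return str(values[variable_id]) if variable_id in values else match.group(0)

    return re.sub(r'\{\s*(\w+)\s*\}', replace, template_text)


template_text = 'There are { word_count } words in the document.'
print(render_template(template_text, {'word_count': 3}))
```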
Make style.css.
body {
  font-family: sans-serif;
}
Then test your automation. In most cases, the development server picks up your changes automatically, so you do not need to restart it. If your changes do not appear, restart the development server using the steps below:
- Click the CrossCompute icon in the right toolbar to open the CrossCompute panel
- Click Stop to stop the Development Server
- Click Launch and wait for the log to indicate that the server is serving
- Click Development Server
- Click Count Words
- Click Continue