Create Automation by Example: Count Words

In this tutorial, you will create a simple automation from scratch that counts words in a document. See the complete code.

Step 0: Launch JupyterLab

You can work through this tutorial in your browser without installing anything (Option 1), or you can set up our open-source packages on your machine (Option 2).

  • Option 1: Visit Learn Examples in JupyterLab and start a 30-minute session straight from your browser. You do not need to install anything.
  • Option 2: Install the CrossCompute Analytics Software Development Kit on your machine and start JupyterLab: python3.11 -m venv ~/.virtualenvs/crosscompute && source ~/.virtualenvs/crosscompute/bin/activate && pip install crosscompute-analytics jupyterlab~=3 && jupyter lab

Make a new folder called count-words. Enter the count-words folder. You can either use the JupyterLab interface or run the commands below in a terminal.

mkdir count-words
cd count-words

Step 1: Define input and output variables

Make a configuration file and save it in automate.yml. You can either use the JupyterLab interface to make automate.yml from scratch or run the commands below in a terminal.

# Make a configuration file
crosscompute --configure

Update input, output and scripts in automate.yml. Remove lines you don’t need.

---
crosscompute: 0.9.4
name: Count Words
version: 0.1.0
input:
  variables:
    - id: document
      view: text
      path: document.txt
output:
  variables:
    - id: word_count
      view: number
      path: variables.dictionary
    - id: word_counts
      view: table
      path: word_counts.json
batches:
  - folder: batches/standard
scripts:
  - path: run.ipynb
environment:
  packages:
    - id: pandas
      manager: pip
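
To see how the configuration maps onto files on disk, here is a minimal Python sketch. It assumes, based on the folder layout used in the rest of this tutorial, that each variable path is resolved relative to the input or output subfolder of the batch folder. The helper name resolve_variable_paths is hypothetical and is not part of CrossCompute.

```python
from pathlib import Path

# Mirror of the variables declared in automate.yml above
BATCH_FOLDER = Path('batches/standard')
INPUT_VARIABLES = [
    {'id': 'document', 'view': 'text', 'path': 'document.txt'},
]
OUTPUT_VARIABLES = [
    {'id': 'word_count', 'view': 'number', 'path': 'variables.dictionary'},
    {'id': 'word_counts', 'view': 'table', 'path': 'word_counts.json'},
]


def resolve_variable_paths(batch_folder, step_name, variables):
    """Hypothetical helper: map each variable id to its expected file path."""
    return {v['id']: batch_folder / step_name / v['path'] for v in variables}


input_paths = resolve_variable_paths(BATCH_FOLDER, 'input', INPUT_VARIABLES)
output_paths = resolve_variable_paths(BATCH_FOLDER, 'output', OUTPUT_VARIABLES)
for variable_id, path in {**input_paths, **output_paths}.items():
    print(variable_id, '->', path.as_posix())
```

Listing the paths this way makes it easier to spot a mismatch between a path in automate.yml and the file name your script actually writes.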

Put dummy data for your input variables in batches/standard/input. This data is useful for prototyping your scripts. This example has only one input variable, stored in document.txt.

# Run from the count-words folder
mkdir -p batches/standard/input
echo one two three > batches/standard/input/document.txt
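
If you prefer to stay in Python, for example in a notebook cell, the same dummy data can be created with pathlib. This is a minimal sketch. The function name make_dummy_input and its parameters are hypothetical.

```python
from pathlib import Path


def make_dummy_input(base_folder='.', text='one two three\n'):
    """Create batches/standard/input/document.txt under base_folder."""
    input_folder = Path(base_folder) / 'batches' / 'standard' / 'input'
    # Same effect as mkdir -p: create parents, ignore existing folders
    input_folder.mkdir(parents=True, exist_ok=True)
    document_path = input_folder / 'document.txt'
    document_path.write_text(text)
    return document_path
```

Call make_dummy_input() from the count-words folder to create the file in the location that automate.yml expects.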

Step 2: Write scripts

Make a new notebook, rename it to run.ipynb, and add the following snippet to the top of your notebook or script.

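from os import getenv
from pathlib import Path

# Use the folders named by the environment variables if they are set;
# otherwise fall back to the dummy batch for prototyping
input_folder = Path(getenv(
    'CROSSCOMPUTE_INPUT_FOLDER', 'batches/standard/input'))
output_folder = Path(getenv(
    'CROSSCOMPUTE_OUTPUT_FOLDER', 'batches/standard/output'))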
output_folder.mkdir(parents=True, exist_ok=True)
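
The snippet above relies on getenv returning its second argument when the environment variable is not set, which is what lets the same notebook run both while prototyping and when launched with the variables defined. The sketch below demonstrates that fallback. The helper get_folder and the path /tmp/example-run/input are made up for illustration.

```python
import os
from pathlib import Path


def get_folder(variable_name, default_folder):
    """Return the folder named by an environment variable, or a default."""
    return Path(os.getenv(variable_name, default_folder))


# Variable unset: fall back to the prototyping folder
os.environ.pop('CROSSCOMPUTE_INPUT_FOLDER', None)
prototype_folder = get_folder(
    'CROSSCOMPUTE_INPUT_FOLDER', 'batches/standard/input')

# Variable set: the environment variable takes precedence
os.environ['CROSSCOMPUTE_INPUT_FOLDER'] = '/tmp/example-run/input'
run_folder = get_folder(
    'CROSSCOMPUTE_INPUT_FOLDER', 'batches/standard/input')

# Restore the unset state so later cells see the default again
os.environ.pop('CROSSCOMPUTE_INPUT_FOLDER')
```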

Your script needs to start with the input variables and end with the output variables. Prototype your script cell by cell in JupyterLab.

document_text = (input_folder / 'document.txt').read_text().strip()
document_text

words = document_text.split()
words

from collections import Counter

count_by_word = Counter(words)
count_by_word

word_count = len(words)
word_count

import json

with (output_folder / 'variables.dictionary').open('wt') as f:
    json.dump({'word_count': word_count}, f)

from pandas import DataFrame

word_count_table = DataFrame(count_by_word.items(), columns=['word', 'count'])
word_count_table.to_json(output_folder / 'word_counts.json', orient='split', index=False)
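
To check the whole pipeline outside the notebook, the cells above can be condensed into one function. The sketch below is not the tutorial's method: it uses only the standard library and writes word_counts.json by hand in the layout that pandas produces for orient='split' with index=False, namely an object with columns and data keys. The function name count_words is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def count_words(input_folder, output_folder):
    """Read document.txt, then write variables.dictionary and word_counts.json."""
    input_folder, output_folder = Path(input_folder), Path(output_folder)
    output_folder.mkdir(parents=True, exist_ok=True)

    words = (input_folder / 'document.txt').read_text().strip().split()
    count_by_word = Counter(words)

    # Number view: a JSON dictionary keyed by variable id
    with (output_folder / 'variables.dictionary').open('wt') as f:
        json.dump({'word_count': len(words)}, f)

    # Table view: same shape as DataFrame.to_json(orient='split', index=False)
    table = {
        'columns': ['word', 'count'],
        'data': [[word, count] for word, count in count_by_word.items()],
    }
    with (output_folder / 'word_counts.json').open('wt') as f:
        json.dump(table, f)
    return table
```

Calling count_words('batches/standard/input', 'batches/standard/output') from the count-words folder should produce the same two output files as the notebook.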

Then test your automation:

  • Click the CrossCompute icon in the right toolbar to open the CrossCompute panel
  • Click Launch and wait for the log to indicate that the server is serving
  • Click Development Server
  • Click Count Words
  • Click Continue

For reference, below are some common snippets for loading input variables and saving output variables.

Load dictionary

import json

with (input_folder / 'variables.dictionary').open('rt') as f:
    variables = json.load(f)

Load text

document_text = (input_folder / 'document.txt').read_text().strip()

Save dictionary

import json

with (output_folder / 'variables.dictionary').open('wt') as f:
    json.dump({'word_count': word_count}, f)

Save text

(output_folder / 'x.md').write_text('**whee**')

Save table

from pandas import DataFrame

word_count_table = DataFrame(count_by_word.items(), columns=['word', 'count'])
word_count_table.to_json(output_folder / 'word_counts.json', orient='split', index=False)
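
Load table

The snippets above save a table but do not load one. As a minimal sketch, assuming the table was saved with orient='split' and index=False as shown, the file can be read back with the standard library. The function name load_split_table is hypothetical. With pandas, read_json(path, orient='split') should return a DataFrame instead.

```python
import json
from pathlib import Path


def load_split_table(path):
    """Load a split-oriented JSON table as a list of row dictionaries."""
    with Path(path).open('rt') as f:
        table = json.load(f)
    columns = table['columns']
    # Pair each row's values with the column names
    return [dict(zip(columns, row)) for row in table['data']]
```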

Step 3: Add templates and styles

Update automate.yml to add input and output templates and styles.

---
crosscompute: 0.9.4
name: Count Words
version: 0.1.0
input:
  variables:
    - id: document
      view: text
      path: document.txt
  templates:
    - path: input.md
output:
  variables:
    - id: word_count
      view: number
      path: variables.dictionary
    - id: word_counts
      view: table
      path: word_counts.json
  templates:
    - path: output.md
batches:
  - folder: batches/standard
scripts:
  - path: run.ipynb
display:
  styles:
    - path: style.css
  pages:
    - id: output
      configuration:
        design: none
environment:
  packages:
    - id: pandas
      manager: pip
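
The updated configuration refers to three files that do not exist yet: input.md, output.md and style.css. Before testing, a quick check like the sketch below can list anything still missing. It assumes that paths in automate.yml are relative to the folder that contains it, which matches how run.ipynb is referenced. The function name find_missing_files is hypothetical.

```python
from pathlib import Path

# Files referenced by automate.yml in this step
REFERENCED_PATHS = ['input.md', 'output.md', 'style.css', 'run.ipynb']


def find_missing_files(automation_folder, relative_paths=REFERENCED_PATHS):
    """Return the referenced paths that do not exist in automation_folder."""
    folder = Path(automation_folder)
    return [p for p in relative_paths if not (folder / p).exists()]
```

Run find_missing_files('.') from the count-words folder. An empty list means every referenced file is in place.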

Make input.md.

# Count Words

Count occurrences of each word.

{ document }

Make output.md.

There are { word_count } words in the document.

{ word_counts }
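
In both templates, each { variable_id } placeholder marks where the matching variable appears on the page. The sketch below illustrates the idea with plain string substitution. It is not CrossCompute's renderer, which presumably draws each variable using its view (for example, a table for word_counts). The names render_template and PLACEHOLDER_PATTERN are hypothetical.

```python
import re

# Matches placeholders such as { word_count } or {word_count}
PLACEHOLDER_PATTERN = re.compile(r'\{\s*(\w+)\s*\}')


def render_template(template_text, value_by_id):
    """Replace each { variable_id } with its value; keep unknown ids as-is."""
    def replace(match):
        variable_id = match.group(1)
        if variable_id in value_by_id:
            return str(value_by_id[variable_id])
        return match.group(0)
    return PLACEHOLDER_PATTERN.sub(replace, template_text)


output_text = render_template(
    'There are { word_count } words in the document.', {'word_count': 3})
print(output_text)
```

Leaving unknown placeholders untouched makes a misspelled variable id easy to spot in the rendered text.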

Make style.css.

body {
    font-family: sans-serif;
}

Then test your automation. In most cases, the automation updates automatically and you do not need to restart the development server. If your changes do not appear, restart the development server using the steps below:

  • Click the CrossCompute icon in the right toolbar to open the CrossCompute panel
  • Click Stop to stop the Development Server
  • Click Launch and wait for the log to indicate that the server is serving
  • Click Development Server
  • Click Count Words
  • Click Continue