WordCount

Get started with Grainite by writing WordCount from scratch

This guide shows you how to develop the WordCount application in the Grainite environment provided to you.

A client reads documents (text files) line by line and appends each line to a topic in Grainite. From there, action handlers parse the words from each line and track the number of sentences and words per document, as well as the count of each individual word.

Application Architecture

The architecture of the application is simple. A client feeds text documents, line by line, into a topic in Grainite. From there, a "Line" table subscribed to the topic receives every event and runs business logic on it.

Unlike in other database systems, tables in Grainite can have compute associated with them. This lets developers write code that applies business logic to the data.
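The topic-to-table flow described above can be pictured with a small in-memory model. This is only a sketch in plain Python and does not use the Grainite API: the `Topic` class, its `subscribe` and `append` methods, and `line_table_handler` are hypothetical names chosen for illustration, and delivery is synchronous here for simplicity.

```python
class Topic:
    """Hypothetical in-memory stand-in for a topic (not the Grainite API)."""

    def __init__(self, name):
        self.name = name
        self.events = []        # every event appended so far, in order
        self.subscribers = []   # handlers that receive each new event

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def append(self, key, payload):
        event = {"key": key, "payload": payload}
        self.events.append(event)
        # Relay the event to every subscriber (synchronously, for simplicity).
        for handler in self.subscribers:
            handler(event)


received = []


def line_table_handler(event):
    """Stand-in for the compute attached to the Line table."""
    received.append(event["payload"])


line_topic = Topic("line_topic")
line_topic.subscribe(line_table_handler)

# The client feeds a document into the topic, one line at a time.
for line in ["Hello Grainite.", "Counting words."]:
    line_topic.append(key="test.txt", payload=line)
```

After the loop, `received` holds both lines in the order the client appended them, which is the role the subscription plays between the topic and the Line table.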

Grains in the Line table are keyed by a hash and are responsible for parsing lines and extracting words from them. The extracted words are sent to the Word Stats table, which is keyed by the first letter of each word. Each grain in the Word Stats table stores the count of each word in a sorted map.
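As a rough illustration of the first-letter keying, the sketch below groups words by the Word Stats grain that would receive them. It is plain Python rather than Grainite code; `grain_key` and `route_words` are hypothetical helpers, and lowercasing the first letter is an assumption based on the example later in this guide, where "Hello" goes to the "h" grain.

```python
from collections import defaultdict


def grain_key(word: str) -> str:
    """Key of the Word Stats grain for a word: its lowercased first letter (assumed)."""
    return word[0].lower()


def route_words(words):
    """Group words by the grain key that would receive them."""
    routed = defaultdict(list)
    for word in words:
        routed[grain_key(word)].append(word)
    return dict(routed)


routed = route_words(["Hello", "Grainite", "grain", "hash"])
print(routed)  # {'h': ['Hello', 'hash'], 'g': ['Grainite', 'grain']}
```

Keying by first letter means every occurrence of a given word lands on the same grain, so that grain can keep the word's count without coordinating with any other grain.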

In addition to storing the count of each word across all documents, WordCount also stores the number of words and sentences in each document in the Doc Stats table.

Example

Let's assume that the client reads the following line from test.txt and sends it to the Line topic:

Hello Grainite.

The Line topic will relay all events it receives to the Line table as a result of the subscription. The Line table will then:

  1. Parse all the words from the sentence and send them to the Word Stats table.

    1. "Hello" will be sent to the "h" grain in the Word Stats table.

    2. "Grainite" will be sent to the "g" grain in the Word Stats table.

  2. Send the number of sentences and words to the Doc Stats table.

    1. 1 sentence and 2 words.
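The steps above can be sketched in plain Python. This is not the Grainite handler API: `parse_line` and `handle_line` are hypothetical names, and the rules for what counts as a word (a run of letters and apostrophes) or a sentence (a run of `.`, `!`, or `?`) are simplifying assumptions.

```python
import re


def parse_line(line: str):
    """Return (words, sentence_count) for one line of text."""
    # Assumption: a word starts with a letter and may contain apostrophes.
    words = re.findall(r"[A-Za-z][A-Za-z']*", line)
    # Assumption: each run of terminal punctuation ends one sentence.
    sentences = len(re.findall(r"[.!?]+", line))
    return words, sentences


def handle_line(doc_name: str, line: str):
    """Build the messages the Line table would send to the other two tables."""
    words, sentences = parse_line(line)
    # One message per word, addressed to the grain for its first letter.
    word_events = [(word[0].lower(), word) for word in words]
    # One message per line, addressed to the grain for the document.
    doc_event = (doc_name, {"sentences": sentences, "words": len(words)})
    return word_events, doc_event


word_events, doc_event = handle_line("test.txt", "Hello Grainite.")
print(word_events)  # [('h', 'Hello'), ('g', 'Grainite')]
print(doc_event)    # ('test.txt', {'sentences': 1, 'words': 2})
```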

A grain in Grainite can store data in two ways:

  1. In the value of the grain.

  2. In one of the sorted maps of the grain.

The value usually contains a summary of the data associated with the grain, while the sorted maps usually contain detailed information for the grain.

Grains in the Word Stats table will track and store counts of each word in a sorted map. In this example, the "h" grain will increment and store the count for "Hello". Similarly, the "g" grain will increment and store the count for "Grainite".

Grains in the Doc Stats table will track and store the number of words and sentences in each document. In this example, the "test.txt" grain will increment and store the word and sentence counts for the document in its value.
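The two kinds of storage can be modeled with a small data class. This is a minimal sketch in plain Python, not the Grainite grain interface: the `Grain` class, its `map_put`, `map_get`, and `map_items` methods, and the map name `"counts"` are all hypothetical. Python dictionaries are not sorted, so the sorted map is emulated by sorting on read.

```python
from dataclasses import dataclass, field


@dataclass
class Grain:
    """Hypothetical model of a grain: one value plus named sorted maps."""
    key: str
    value: dict = field(default_factory=dict)  # summary data
    maps: dict = field(default_factory=dict)   # map name -> {key: value}

    def map_put(self, map_name, k, v):
        self.maps.setdefault(map_name, {})[k] = v

    def map_get(self, map_name, k, default=None):
        return self.maps.get(map_name, {}).get(k, default)

    def map_items(self, map_name):
        # Emulate a sorted map by returning entries in key order.
        return sorted(self.maps.get(map_name, {}).items())


def count_word(grain: Grain, word: str) -> None:
    """Word Stats: increment the word's count in the grain's sorted map."""
    grain.map_put("counts", word, grain.map_get("counts", word, 0) + 1)


def add_doc_stats(grain: Grain, sentences: int, words: int) -> None:
    """Doc Stats: accumulate per-document totals in the grain's value."""
    grain.value["sentences"] = grain.value.get("sentences", 0) + sentences
    grain.value["words"] = grain.value.get("words", 0) + words


h_grain, g_grain, doc_grain = Grain("h"), Grain("g"), Grain("test.txt")
count_word(h_grain, "Hello")
count_word(g_grain, "Grainite")
add_doc_stats(doc_grain, sentences=1, words=2)

print(h_grain.map_items("counts"))  # [('Hello', 1)]
print(g_grain.map_items("counts"))  # [('Grainite', 1)]
print(doc_grain.value)              # {'sentences': 1, 'words': 2}
```

The split mirrors the description above: the Doc Stats grain keeps a compact summary in its value, while each Word Stats grain keeps per-word detail in a map.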
