Google Cloud Extension (Early Access)

The GCP extension provides Action Handlers that write to Google Cloud Storage and BigQuery tables.

Setup

To use the Action Handlers provided by this extension, add the following dependency to your pom.xml:

<dependency>
  <groupId>ext.grainite</groupId>
  <artifactId>grainite-gcp</artifactId>
  <version>{GRAINITE-VERSION}</version>
</dependency>

Replace {GRAINITE-VERSION} with the version of Grainite you are also using for libgrainite (the Grainite Client library for Java).
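If your build uses Gradle instead of Maven, the equivalent declaration would be as follows (assuming the same coordinates; shown for reference only):

```groovy
dependencies {
    // Same coordinates as the Maven dependency above; replace GRAINITE-VERSION
    // with the version of Grainite you use for libgrainite.
    implementation 'ext.grainite:grainite-gcp:GRAINITE-VERSION'
}
```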

Contents

The GCP extension includes:

  • StorageWriterHandler: Handler that uploads the provided JSON list into a Google Cloud Storage bucket.

  • BigQueryWriterHandler: Handler that uploads a JSON blob from Google Cloud Storage to a BigQuery table.

Common Required Configuration Options

Both handlers require the following configuration options:

  • credentials (Example: $secret:sa_cred): Credentials for the service account that the Action Handlers will use.

  • project.id (Example: my-gcp-project): The ID of the GCP project in which the handlers operate.

Creating Service Account credentials and storing them in Grainite

To create the service account credentials, follow Google Cloud's documentation on creating service account keys. Alternatively, you can use the gcloud CLI to generate the JSON credentials file:

gcloud iam service-accounts keys create /path/to/key-file.json --iam-account <service-account-email> --key-file-type=json

Where:

  • /path/to/key-file.json is where you would like to store the JSON credentials file.

  • <service-account-email> is the service account for which you wish to generate credentials.

StorageWriterHandler

This handler takes a list of JSON blobs and uploads them to the configured bucket in Google Cloud Storage. Blob names have the format <optional_prefix>:<time_millis>, where <optional_prefix> (blob.name.prefix) is optional and <time_millis> is the time, in milliseconds, at which the handler uploaded the blob.
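The naming scheme above can be sketched as follows (the class and method names are illustrative, not part of the extension):

```java
public class BlobNames {
    /**
     * Builds a blob name of the form "<prefix>:<millis>", or just
     * "<millis>" when no blob.name.prefix is configured.
     */
    static String blobName(String prefix, long timeMillis) {
        return (prefix == null || prefix.isEmpty())
                ? Long.toString(timeMillis)
                : prefix + ":" + timeMillis;
    }
}
```

For example, with blob.name.prefix set to src-grainite, an upload at time 1700000000000 would be stored as src-grainite:1700000000000.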

The handler uploads blobs in batches for efficiency. It accumulates events until one of the following happens:

  • The max batch size (upload.batch_size) is reached.

  • The max upload size (upload.payload_size) is reached.

  • The upload timer (upload.timer) fires and there are events remaining in the Grain state.
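The three flush triggers above can be sketched as follows; UploadBatcher and its methods are illustrative names under assumed semantics, not part of the extension:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the accumulate-then-flush decision described above. */
public class UploadBatcher {
    private final int maxBatchSize;    // upload.batch_size
    private final int maxPayloadBytes; // upload.payload_size
    private final List<byte[]> pending = new ArrayList<>();
    private int pendingBytes = 0;

    UploadBatcher(int maxBatchSize, int maxPayloadBytes) {
        this.maxBatchSize = maxBatchSize;
        this.maxPayloadBytes = maxPayloadBytes;
    }

    /** Adds one event; returns true when a size-based flush is due. */
    boolean add(byte[] event) {
        pending.add(event);
        pendingBytes += event.length;
        return pending.size() >= maxBatchSize || pendingBytes >= maxPayloadBytes;
    }

    /** Timer trigger (upload.timer): flush whenever events are pending. */
    boolean shouldFlushOnTimer() {
        return !pending.isEmpty();
    }
}
```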

The handler can optionally be configured to send a notification for each successful blob upload by appending an event (whose payload is the blob URI) to a Grainite topic.

Usage

To include this handler in your application, you must specify the following in your application's configuration YAML file:

...
tables:
  - table_name: gcp_storage_uploader
    key_type: string
    action_handlers:
      - class_name: ext.grainite.gcp.storage.handlers.StorageWriterHandler
        type: java
        config:
          credentials: $secret:sa_cred
          project.id: my-gcp-project
          bucket.name: my-bucket
...

Below are the configuration options that can be passed in under config:

  • credentials (REQUIRED): See Common Required Configuration Options.

  • project.id (REQUIRED): See Common Required Configuration Options.

  • bucket.name (REQUIRED; Example: my-bucket): Bucket name where the blobs will be uploaded.

  • debug (Optional; true or false, Default: false): Prints additional debugging information when true.

  • blob.name.prefix (Optional; Example: src-grainite): A prefix to add to each blob name.

  • notification.topic (Optional; Example: storage_uploads_topic): The topic to which an event is appended upon each successful blob upload.

  • upload.batch_size (Optional; Example: 5, Default: 1): The maximum number of events to accumulate before uploading them to the bucket.

  • upload.payload_size (Optional; Example: 500000, Default: 3500000): The maximum size, in bytes, to accumulate before uploading the events to the bucket.

  • upload.timer (Optional; Example: 60, Default: 30): Interval, in seconds, at which the handler checks for pending uploads.

BigQueryWriterHandler

This handler takes a GCS blob URI for a JSON file and uploads its contents into a BigQuery dataset table.

Note that the limitations described at https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-json apply to this handler.
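In particular, BigQuery expects JSON data loaded from Cloud Storage to be newline-delimited (NDJSON): one complete JSON object per line, rather than a single JSON array. For example, a blob with two rows would look like:

```json
{"id": 1, "name": "alice"}
{"id": 2, "name": "bob"}
```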

Usage

To include this handler in your application, you must specify the following in your application's configuration YAML file:

...
tables:
  - table_name: bigquery_table_uploader
    key_type: string
    action_handlers:
      - class_name: ext.grainite.gcp.bigquery.handlers.BigQueryWriterHandler
        type: java
        config:
          credentials: $secret:sa_cred
          project.id: my-gcp-project
          dataset.name: my-dataset
          dataset.table: my-gcs-data-table
...

Below are the configuration options that can be passed in under config:

  • credentials (REQUIRED): See Common Required Configuration Options.

  • project.id (REQUIRED): See Common Required Configuration Options.

  • dataset.name (REQUIRED; Example: my-dataset): Name of the dataset containing the table.

  • dataset.table (REQUIRED; Example: my-table): Name of the table to which the rows will be added.

  • debug (Optional; true or false, Default: false): Prints additional debugging information when true.

Google Cloud Storage and BigQuery are trademarks of Google LLC.
