Storage

In the Global Mailbox system, the storage (file system) implementation is based on storage buckets. Buckets are containers (logical groups) in the file system that are configured according to business requirements for security and retention.

The storage system includes buckets, which store blobs. Buckets include variants, which are versions of the buckets. Blobs are stored in the variants. You must configure at least one bucket and variant for storage to operate.

Each variant can have a different configuration (for example, encryption settings). Each variant within a bucket is identified by a unique variant identifier (0 to 63). A variant can be marked as retired, after which it becomes read-only. The data in blobs is distributed among the active variants of the bucket. During a READ or GET operation, the blob is retrieved from the variant in which it exists.
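
The relationship between buckets, variants, and blobs can be illustrated with a minimal Python sketch. This is not the Global Mailbox implementation: the class names, the in-memory dictionaries, and the hash-based choice of an active variant are assumptions made for illustration only. The source states that blobs are distributed among active variants, but it does not say how.

```python
import zlib
from dataclasses import dataclass, field

MAX_VARIANTS = 64  # variant identifiers range from 0 to 63


@dataclass
class Variant:
    variant_id: int
    encrypted: bool = False   # example of a per-variant setting
    retired: bool = False     # a retired variant is read-only
    blobs: dict = field(default_factory=dict)


class Bucket:
    """Hypothetical in-memory model of a bucket and its variants."""

    def __init__(self, name):
        self.name = name
        self.variants = {}

    def add_variant(self, variant_id, encrypted=False):
        if not 0 <= variant_id < MAX_VARIANTS:
            raise ValueError("variant identifier must be between 0 and 63")
        if variant_id in self.variants:
            raise ValueError("variant identifier must be unique within a bucket")
        self.variants[variant_id] = Variant(variant_id, encrypted)

    def retire(self, variant_id):
        self.variants[variant_id].retired = True

    def write(self, blob_id, data):
        # Only active (not retired) variants accept new blobs.
        active = sorted(v for v, var in self.variants.items() if not var.retired)
        if not active:
            raise RuntimeError("no active variant available for writing")
        # Assumption: spread blobs by hashing the blob identifier.
        chosen = active[zlib.crc32(blob_id.encode()) % len(active)]
        self.variants[chosen].blobs[blob_id] = data
        return chosen

    def read(self, blob_id):
        # A read is served from whichever variant holds the blob,
        # including a retired one.
        for variant in self.variants.values():
            if blob_id in variant.blobs:
                return variant.blobs[blob_id]
        raise KeyError(blob_id)
```

In this sketch, retiring a variant stops new writes to it, while blobs that were already written there remain readable.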

File transfer process flow

At a high level, the file transfer process flow is as follows:
  1. A Global Mailbox enabled protocol server adapter calls the storage client to read or write a file.
  2. The storage client looks up the specified file or variant.
  3. The storage client transfers the file to or from the file system.

Storage configuration

You can configure storage by using the Storage Configurator, which is a command-line tool. The tool validates the data that you enter and generates an XML file based on that input. Storage configuration is stored in Cassandra. You can configure the following information for the storage component:
  • Bucket variants
  • File system base path
  • Security (hash value, encryption)
  • Maximum lifespan of blobs
  • Buffer size for storage
  • Input and output threads
  • Storage of blob metadata

When you install the initial Global Mailbox node, two storage buckets (1st_provisioned and global_mbx) and the first variant (0) are created. By default, Global Mailbox uses the global_mbx bucket to store message payloads. When you install the initial Global Mailbox node, you must also specify the shared storage path for all other data centers. After the initial node is installed, the configuration information (the 1st_provisioned and global_mbx buckets, and the global.properties and installinfo.properties files) is copied to the shared storage path that you specified for the other data centers.

When you configure your storage system, you must decide how you want to use buckets and variants to store the different kinds of information that flow through the Global Mailbox.

Pause and resume

If resumption is configured for a file that is uploaded, the file is broken into segments.

For example, if a file is broken into four segments, a total of five files are created and stored on disk: one file for each segment, plus a stub file. The stub file contains the metadata that is needed to reassemble the segments.
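
The segment and stub layout can be sketched as follows. The file naming scheme (.seg0, .seg1, .stub) and the JSON stub format are hypothetical and chosen only for illustration; the source does not describe the actual on-disk format.

```python
import json
import tempfile
from pathlib import Path


def store_segmented(data: bytes, segment_size: int, directory: Path, name: str) -> Path:
    """Write data as fixed-size segment files plus one stub file."""
    segment_names = []
    for index, offset in enumerate(range(0, len(data), segment_size)):
        segment_name = f"{name}.seg{index}"  # hypothetical naming scheme
        (directory / segment_name).write_bytes(data[offset:offset + segment_size])
        segment_names.append(segment_name)
    # The stub records what is needed to reassemble the segments in order.
    stub_path = directory / f"{name}.stub"
    stub_path.write_text(json.dumps({"segments": segment_names, "length": len(data)}))
    return stub_path


def reassemble(stub_path: Path) -> bytes:
    """Rebuild the original data from the segments listed in the stub."""
    stub = json.loads(stub_path.read_text())
    return b"".join((stub_path.parent / s).read_bytes() for s in stub["segments"])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        payload = bytes(range(40))
        stub = store_segmented(payload, 10, Path(tmp), "payload")
        print(len(list(Path(tmp).iterdir())))  # four segment files plus one stub
        print(reassemble(stub) == payload)
```

With a 40-byte payload and a 10-byte segment size, the sketch writes four segment files and one stub file, which matches the five-file example above.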

One of the following statuses can be seen when writing a file to storage:
  • New: A new blob is created
  • Writing: Writing the blob
  • Paused: Writing of the blob is paused
  • Complete: Writing the blob is completed

If the upload is interrupted (either by a pause or by loss of the network connection) while a segment is being uploaded, that entire segment is removed. The upload is resumed when the error is corrected.
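
The statuses and the interruption behavior can be combined in a small Python sketch. The class and method names are hypothetical, and the sketch assumes that a resumed upload restarts at the end of the last fully written segment. The source says only that the interrupted segment is removed, so the restart position is an inference.

```python
from enum import Enum


class Status(Enum):
    NEW = "New"
    WRITING = "Writing"
    PAUSED = "Paused"
    COMPLETE = "Complete"


class SegmentedUpload:
    """Hypothetical model of a resumable, segmented blob write."""

    def __init__(self, total_size: int, segment_size: int):
        self.total_size = total_size
        self.segment_size = segment_size
        self.segments = []     # fully written segments only
        self.partial = b""     # bytes of the segment in progress
        self.status = Status.NEW

    def resume_offset(self) -> int:
        # Assumption: resumption starts after the last complete segment.
        return sum(len(s) for s in self.segments)

    def write(self, chunk: bytes) -> None:
        if self.status is Status.COMPLETE:
            raise RuntimeError("blob is already complete")
        self.status = Status.WRITING
        self.partial += chunk
        # Commit every full segment that has accumulated.
        while len(self.partial) >= self.segment_size:
            self.segments.append(self.partial[:self.segment_size])
            self.partial = self.partial[self.segment_size:]
        if self.resume_offset() + len(self.partial) >= self.total_size:
            if self.partial:
                # The last segment may be shorter than segment_size.
                self.segments.append(self.partial)
                self.partial = b""
            self.status = Status.COMPLETE

    def interrupt(self) -> None:
        # A pause or network loss discards the segment in progress.
        if self.status is not Status.COMPLETE:
            self.partial = b""
            self.status = Status.PAUSED
```

For example, with a 25-byte file and 10-byte segments, an interruption after 15 bytes leaves one complete segment, and the client would send the remaining bytes starting at resume_offset(), which is 10 in this sketch.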