cdr define shardCollection

The cdr define shardCollection command creates a sharding definition for distributing a table or collection across multiple shard servers.

Syntax

>>-cdr define shardCollection----------------------------------->

>--definition_name--database--:--user--.--+-collection-+-------->
                                          '-table------'   

                                           .-delete--------.   
>--+------------------------+-- --type--=--+-keep----------+---->
   |                    (1) |              '-informational-'   
   '-| Connect Option |-----'                                  

>-- --key--=--+-column-----------+------------------------------>
              '-"--expression--"-'   

>--+------------------------------+----------------------------->
   '- --versionCol--=--+-field--+-'   
                       '-column-'     

                                 .----------------------------------.                          
                                 V                                  |                          
>-- --strategy--=--+-expression----ER_group--"--WHERE_expression--"-+--ER_group--REMAINDER-+-><
                   |         .----------.                                                  |   
                   |         V          |                                                  |   
                   +- chash----ER_group-+--+------------------------------+----------------+   
                   |                       '- --partitions--=--partitions-'                |   
                   |        .----------.                                                   |   
                   |        V          |                                                   |   
                   '- hash----ER_group-+---------------------------------------------------'   

Notes:
  1. See Connect Option.
Element Description Restrictions
collection The name of the collection that is distributed across database servers. Must be the name of an existing collection.
column The name of a table column. Must be the name of an existing column.
database The name of the database that contains the table or collection that is distributed across database servers. Must be the name of an existing database.
definition_name The name of the sharding definition that is created.  
ER_group The ER-group name of a database server that receives sharded data. Must be the ER-group name of an existing database server.
expression The WHERE-clause expression that is used to select rows or documents by shard key value.  
field The name of a collection field. Must be the name of an existing field.
REMAINDER Specifies the database server that receives rows or documents with a shard key value that is not selected by any of the other expressions.
partitions The number of hashing partitions to create on each shard server. Must be a positive integer.
table The name of the table that is distributed across database servers. Must be the name of an existing table.
user The owner of the table or collection that is distributed across database servers. Must be the name of an existing user.
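
Taken together, these elements form the general shape of an expression-based invocation. The following skeleton is an illustrative template only, not a runnable command: every name in it is a placeholder for one of the elements above, and the --type, --key, --strategy, and --versionCol parameters are described in the next table.

cdr define shardCollection definition_name database:user.table
   --type=delete --key=column --strategy=expression --versionCol=column
   ER_group_1 "WHERE_expression"
   ER_group_2 "WHERE_expression"
   ER_group_3 REMAINDER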

The following table describes the cdr define shardCollection parameters.

Long Form Short Form Description
--key= -k Defines the shard key on all database servers.
Possible values are:
  • A column name
  • An expression

All database servers in a shard cluster must use the same column or expression as the shard key.

--partitions= -p Specifies the number of hashing partitions to create on each shard server when the sharding strategy is consistent hashing. Default is 3. The more hashing partitions, the more evenly the data is distributed among shard servers. However, if you specify more than 10 hashing partitions, the resulting SQL statement to create the sharded table might fail because it exceeds the maximum character limit for an SQL statement.
--strategy= -s Specifies the method for determining which database server an inserted row or document is distributed to.
Possible values are:
  • expression: The expression that is defined for each shard server in the command is used.
  • chash: A consistent hash algorithm is used. When you add or remove a shard server, the consistent hashing algorithm redistributes a fraction of the data.
  • hash: A hash algorithm is used. When you add or remove a shard server, the hashing algorithm redistributes all the data.
--type= -t Specifies action on the shard server where a row or document was inserted:
  • delete (default): The row or document is deleted from the source shard server after it is replicated to the target shard server. If you do not set --versionCol=column, changes made to rows and documents can be lost during the replication process.
  • keep: The row or document is not deleted on the source shard server after the row or document is replicated to the target shard server, so that two copies of the data exist in the shard cluster (see the sketch after this table).
  • informational: Data is not replicated. You can run sharded queries but the data is not sharded during loading. You must load the data on the appropriate shard server according to the sharding definition.
--versionCol= -v When --type=delete is specified in the sharding definition, Enterprise Replication must verify that a source row or document was not updated before it can delete the row or document on the shard server.
Possible values are:
  • A table column name
  • A field name

If --type=delete is set in the sharding definition, but --versionCol=column is not, changes made to rows and documents can be lost during the replication process.

This parameter is required if any rows have out-of-row data, such as data stored in smart large objects, or if collections have BSON documents that are larger than 4 KB.
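
The examples later in this topic all use --type=delete. As a minimal sketch of the keep behavior (the collection_5 definition name is a placeholder; the table, key, and ER-group names are borrowed from the hash example below), the following hedged command differs from that example only in the --type value, so two copies of each row remain in the shard cluster:

cdr define shardCollection collection_5 db_1:john.customers
   --type=keep --key=state --strategy=hash --versionCol=version
   g_shard_server_A g_shard_server_B g_shard_server_C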

Usage

Use the cdr define shardCollection command to create a sharding definition for distributing a table or collection across multiple shard servers. The replicates that are created as part of the cdr define shardCollection command are mastered and use always-apply conflict resolution and row scope. You cannot specify that triggers fire.

Multiple sharding definitions are not allowed on the same table or collection.

You cannot manually define an Enterprise Replication replicate for a table that is sharded.

Return codes

A return code of 0 indicates that the command was successful.

If the command is not successful, one of the following error codes is returned: 3, 18, 39, 52, 83, 99, 125, 196, 215, 229.

For information about these error codes, see Return Codes for the cdr Utility.
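
Because the command reports success or failure only through its return code, a script that creates a sharding definition can check the code explicitly. The following fragment is a hedged shell sketch that wraps the consistent hash example from the next section; the cdr options and server names are unchanged from that example, and the backslashes are ordinary shell line continuations.

cdr define shardCollection collection_1 db_1:john.customers \
   --type=delete --key=b --strategy=chash --partitions=3 --versionCol=column_3 \
   g_shard_server_1 g_shard_server_2 g_shard_server_3
rc=$?
if [ "$rc" -ne 0 ]; then
    echo "cdr define shardCollection failed with return code $rc" >&2
fi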

Example: Creating a sharding definition that uses a consistent hash algorithm

The following example creates a sharding definition that is named collection_1. Rows that are inserted on any of the shard servers are distributed, based on a consistent hash algorithm, to the appropriate shard server. Enterprise Replication must verify that a replicated row or document was not updated before the row or document can be deleted on the source server. The b column in the customers table that is owned by user john is the shard key. Each shard server has three hashing partitions.

cdr define shardCollection collection_1 db_1:john.customers   
   --type=delete --key=b --strategy=chash --partitions=3 --versionCol=column_3      
   g_shard_server_1 g_shard_server_2 g_shard_server_3 

The partition range for each shard server is calculated based on the server group name. The data is distributed according to the following sharding definition:

g_shard_server_1     (mod(abs(ifx_checksum(b::LVARCHAR, 0)), 10000) between 4019 and 5469)
                      or (mod(abs(ifx_checksum(b::LVARCHAR, 0)), 10000) between 5719 and 6123)  
                      or (mod(abs(ifx_checksum(b::LVARCHAR, 0)), 10000) between 2113 and 2652)
g_shard_server_2     (mod(abs(ifx_checksum(b::LVARCHAR, 0)), 10000) between 6124 and 7415)
                      or (mod(abs(ifx_checksum(b::LVARCHAR, 0)), 10000) between 5470 and 5718)  
                      or (mod(abs(ifx_checksum(b::LVARCHAR, 0)), 10000) between 7416 and 7873)
g_shard_server_3     (mod(abs(ifx_checksum(b::LVARCHAR, 0)), 10000) between 2653 and 3950)
                      or mod(abs(ifx_checksum(b::LVARCHAR, 0)), 10000) >= 7874 
                      or mod(abs(ifx_checksum(b::LVARCHAR, 0)), 10000) < 2113   
                      or (mod(abs(ifx_checksum(b::LVARCHAR, 0)), 10000) between 3951 and 4018)

Example: Creating a sharding definition that uses a hash algorithm

The following example creates a sharding definition that is named collection_1. Rows that are inserted on any of the shard servers are distributed, based on a hash algorithm, to the appropriate shard server. Enterprise Replication must verify that a replicated row or document was not updated before the row or document can be deleted on the source server. The state column in the customers table that is owned by user john is the shard key.

cdr define shardCollection collection_1 db_1:john.customers
   --type=delete --key=state --strategy=hash --versionCol=version
   g_shard_server_A g_shard_server_B g_shard_server_C g_shard_server_D

Example: Creating a sharding definition that uses an IN expression

The following example creates a sharding definition that is named collection_2. The state column in the clients table that is owned by user joe is the shard key. Rows that are inserted on any of the shard servers are distributed, based on the defined expression, to the appropriate shard server. Enterprise Replication must verify that a replicated row or document was not updated before the row or document can be deleted on the source shard server.

cdr define shardCollection collection_2 db_2:joe.clients
   --type=delete --key=state --strategy=expression --versionCol=version
   g_shard_server_A "IN ('TX','OK')"
   g_shard_server_B "IN ('NY','NJ')"
   g_shard_server_C "IN ('AL','GA')"
   g_shard_server_D  REMAINDER
In the previous example:
  • Inserted rows that have a value of AL in the state column are sent to g_shard_server_C.
  • Inserted rows that have a value of NJ in the state column are sent to g_shard_server_B.
  • Inserted rows that have a value of CA in the state column are sent to g_shard_server_D.

Example: Creating a sharding definition that uses a BETWEEN expression

The following example creates a sharding definition that is named collection_3. The age column in the users table that is owned by user charles is the shard key. Rows that are inserted on any of the shard servers are distributed, based on the defined expression, to the appropriate shard server. Enterprise Replication must verify that a replicated row or document was not updated before the row or document can be deleted on the source shard server.

cdr define shardCollection collection_3 db_3:charles.users
   --type=delete --key=age --strategy=expression --versionCol=version
   g_shard_server_A "BETWEEN 0 and 20"
   g_shard_server_B "BETWEEN 21 and 62"
   g_shard_server_C "BETWEEN 63 and 100"
   g_shard_server_D  REMAINDER
In the previous example:
  • Inserted rows that have a value of 35 in the age column are sent to g_shard_server_B.
  • Inserted rows that have a value of 102 in the age column are sent to g_shard_server_D.
  • Inserted rows that have a value of 15 in the age column are sent to g_shard_server_A.

Example: Creating a sharding definition that defines a shard key by function

The following example creates a sharding definition that is named collection_4. The COLOR field in the cars collection that is owned by user mike is the shard key. Documents that are inserted on any of the shard servers are distributed, based on the defined expression, to the appropriate shard server.

cdr define shardCollection collection_4 db_4:mike.cars
   -t delete -k "bson_value_lvarchar(data,'COLOR')" -s expression -v version
   g_shard_server_E "IN ('blue','green')"
   g_shard_server_F "IN ('black','white')"
   g_shard_server_G "IN ('brown','gray')"
   g_shard_server_H "IN ('red','yellow')"
   g_shard_server_I  REMAINDER
In the previous example:
  • Inserted documents that have a value of yellow in the COLOR key are sent to g_shard_server_H.
  • Inserted documents that have a value of blue in the COLOR key are sent to g_shard_server_E.
  • Inserted documents that have a value of pink in the COLOR key are sent to g_shard_server_I.