Tiering data by using ScaleILM application

Use the IBM Spectrum® Discover ScaleILM application to move data to different tiers (pools) that are configured on the IBM Spectrum Scale connection.

Before you begin

  • Configure IBM Spectrum Scale with one or more internal pools. For more information, see Internal storage pools in the IBM Spectrum Scale: Administration Guide.
  • Optionally, configure IBM Spectrum Archive and its pools.
  • Create the IBM Spectrum Scale connection in IBM Spectrum Discover. If the external pool that is managed by IBM Spectrum Archive is used as the destination tier, the "host" setting of the IBM Spectrum Scale connection must specify one of the IBM Spectrum Archive nodes.
  • The IBM Spectrum Discover ScaleILM application runs Data Movement policies as the same user that is configured in IBM Spectrum Discover to scan IBM Spectrum Scale. For this data movement, the IBM Spectrum Scale connection is used as the 'source_connection'. For more information, see the topic Creating or identifying a user ID and password for scanning in the Data Cataloging: Concepts, Planning, and Deployment Guide.

About this task

The IBM Spectrum Discover ScaleILM application provides advanced data tiering through the IBM Spectrum Scale connection. For this function, it uses the system metadata and custom metadata of documents that IBM Spectrum Discover has already collected, which eliminates the need to scan file systems during IBM Spectrum Scale Information Lifecycle Management (ILM) execution. It also provides cognitive tiering by using the results of IBM Spectrum Discover AUTOTAG, CONTENT SEARCH, and DEEP-INSPECT policies.

IBM Spectrum Scale provides a built-in Information Lifecycle Management (ILM) capability that optimizes the cost-effectiveness of data by moving its physical location between storage pools with different cost or performance characteristics. IBM Spectrum Scale ILM supports data movement between internal storage pools, and between internal pools and an external storage repository. Data movement to and from the external storage repository is managed by external applications, such as IBM Spectrum Archive Enterprise Edition (EE) and IBM Spectrum Protect for Space Management.

Make sure that you know which IBM Spectrum Archive Enterprise Edition (EE) software versions are supported. For more information, see the topic IBM Spectrum Storage software requirements in the Data Cataloging: Concepts, Planning, and Deployment Guide.

When data is tiered to external pools, its current location is indicated by its file state. The following table lists the data locations and the corresponding file states:
Table 1. Data locations and file states
Data location | File state
Internal pool only | Resident (resdnt)
Both internal and external pools | Premigrated (premig)
External pool only | Migrated (migrtd)
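As a minimal sketch, assuming only the mapping in Table 1, the following Python function derives the state abbreviation that IBM Spectrum Discover displays (resdnt, premig, or migrtd) from where a file's data resides. The function name file_state is hypothetical and is not part of any IBM API.

```python
def file_state(in_internal_pool: bool, in_external_pool: bool) -> str:
    """Map a data location from Table 1 to the displayed state (hypothetical helper)."""
    if in_internal_pool and in_external_pool:
        return "premig"  # premigrated: data in both the internal and external pool
    if in_internal_pool:
        return "resdnt"  # resident: data only in the internal pool
    if in_external_pool:
        return "migrtd"  # migrated: data only in the external pool
    raise ValueError("data must be in at least one pool")


if __name__ == "__main__":
    for internal, external in [(True, False), (True, True), (False, True)]:
        print(f"internal={internal} external={external} -> {file_state(internal, external)}")
```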
Note: The functions of the ScaleILM application depend on the capabilities of the underlying storage hardware and its management software. For more information, see the following topics:
  • Information Lifecycle Management for IBM Spectrum Scale in the IBM Spectrum Scale: Administration Guide
  • IBM Spectrum Archive Enterprise Edition (EE) in IBM Documentation

Procedure

  1. Log in to the IBM Spectrum Discover GUI.
  2. Go to the Admin > Management Policies page.
  3. Click Add Policy to create a policy.
  4. Click the slider control and set the status to one of the following values:
    Active
    An active policy is run whenever its scheduling event is reached.
    Inactive
    An inactive policy is not run even when its scheduling event is reached, including the Now event.
  5. Enter a policy name.
  6. Enter a policy filter. The policy filter specifies the criteria for selecting the files to tier, such as filetype="pdf".
    Note: If the destination tier is an external pool that is managed by IBM Spectrum Archive, include state in 'resdnt' in the policy filter criteria. For example, filetype='txt' and state in 'resdnt'.

    If the filter criteria do not include state in 'resdnt' when you tier data to an external pool, the ScaleILM application skips the files that are not in the resdnt state.

  7. Click Next Step to select the type of policy.
  8. Select TIER as the Policy type.
  9. Select ScaleILM as the Agent.
  10. Select the source_connection from the drop-down list. The source connection is the connection from which the data is moved, that is, the name of the defined IBM Spectrum Scale connection.
  11. Enter the destination_tier to which you want to move your data, for example:
    • Internal Pools

      Gold, silver, bronze, flash system

    • External Pools
      • archive:pool1@library1
      • archive:pool2@library2
      • archive:pool1@library1,pool2@library3
    Note: When you specify an IBM Spectrum Scale internal pool as the "destination_tier", you must specify a valid internal pool name. The internal pool must be configured in the file system (data source) that corresponds to the IBM Spectrum Scale source connection. You can list these internal pools by using the following IBM Spectrum Scale command:
    mmlspool <device> all

    When you specify an external pool that is managed by IBM Spectrum Archive as the "destination_tier", you must specify a valid archive pool that is defined in the IBM Spectrum Archive instance configured on the IBM Spectrum Scale cluster node.

    Additionally, prefix the names of the pools that are defined in the IBM Spectrum Archive configuration with archive:. Specify the pool names in the same format as the -p option of the IBM Spectrum Archive Enterprise Edition (EE) CLI (eeadm), by using the following syntax:
    archive:<poolName>@<libraryName> 
    or
    archive:<poolName1>@<libraryName1>,<poolName2>@<libraryName2>, ...

    If the destination_tier value does not follow these rules, the policy execution fails.

  12. Select Next Step to enter a schedule. The schedule indicates when you want to start the tiering.
  13. Select Next Step to review the policy.
  14. Select Submit to create the policy.
  15. After the policy runs, view the files on the IBM Spectrum Discover Search catalog page to verify that they were moved to the new tier. The following metadata fields are updated:
    Tier
    Displays the name of the internal pool in IBM Spectrum Scale where the data is stored.
    Note: Even if the file is in the migrated state (migrtd), the tier field shows the name of the original internal pool.
    State
    Displays the current state as one of the following values:
    • resdnt
    • premig
    • migrtd
    Migloc
    Displays the location information of the external pool when the file is in the premigrated (premig) or migrated (migrtd) state. If the file is in the resident (resdnt) state, this field shows NA.
    SizeConsumed
    Displays the actual size of the file in bytes, which corresponds to the IBM Spectrum Scale file system field SizeConsumed Bytes.
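The destination_tier rules in step 11 can be checked before you submit a policy. The following Python sketch is a hypothetical validator, not part of IBM Spectrum Discover or eeadm. It treats a value that starts with archive: as a comma-separated list of <poolName>@<libraryName> entries, and any other value as an internal pool name that must appear in a set of names you supply (for example, names you read from mmlspool <device> all output; the sketch does not parse that output). It assumes that pool and library names contain no @, comma, or whitespace, and it tolerates spaces after commas, which the documented syntax does not show.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

ARCHIVE_PREFIX = "archive:"
# Assumption: pool and library names contain no "@", ",", or whitespace.
POOL_AT_LIBRARY = re.compile(r"^([^@,\s]+)@([^@,\s]+)$")


@dataclass
class DestinationTier:
    """Parsed destination_tier value (hypothetical helper type)."""
    internal_pool: Optional[str] = None  # set when the target is an internal pool
    archive_pools: List[Tuple[str, str]] = field(default_factory=list)  # (pool, library)


def parse_destination_tier(value: str, internal_pools: Set[str]) -> DestinationTier:
    """Validate a destination_tier string; raise ValueError if it is malformed."""
    value = value.strip()
    if value.startswith(ARCHIVE_PREFIX):
        pairs = []
        # archive:<poolName1>@<libraryName1>,<poolName2>@<libraryName2>,...
        for entry in value[len(ARCHIVE_PREFIX):].split(","):
            match = POOL_AT_LIBRARY.match(entry.strip())
            if match is None:
                raise ValueError(f"expected <poolName>@<libraryName>, got {entry!r}")
            pairs.append((match.group(1), match.group(2)))
        return DestinationTier(archive_pools=pairs)
    # Any other value is treated as an internal pool of the source file system.
    if value not in internal_pools:
        raise ValueError(f"{value!r} is not a known internal pool")
    return DestinationTier(internal_pool=value)


if __name__ == "__main__":
    print(parse_destination_tier("archive:pool1@library1,pool2@library2", {"gold"}))
    print(parse_destination_tier("gold", {"gold", "silver", "bronze"}))
```

Rejecting malformed values locally mirrors the rule that the policy execution fails on an invalid destination_tier, but only the ScaleILM application itself determines what it accepts.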
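The note in step 6 can be applied mechanically when you draft filters. This Python sketch, with the hypothetical helper build_policy_filter, appends state in 'resdnt' to a filter when the destination tier is an archive: pool and leaves the filter unchanged for internal pools. It assumes that the policy filter syntax accepts parentheses for grouping, which this topic does not state; the parentheses keep an or in the original criteria from changing the meaning of the added clause.

```python
ARCHIVE_PREFIX = "archive:"
RESIDENT_CLAUSE = "state in 'resdnt'"


def build_policy_filter(base_filter: str, destination_tier: str) -> str:
    """Add the resident-state clause for archive destinations (hypothetical helper)."""
    base_filter = base_filter.strip()
    if not destination_tier.strip().startswith(ARCHIVE_PREFIX):
        return base_filter  # internal pool: the note does not require the clause
    if RESIDENT_CLAUSE in base_filter:
        return base_filter  # simple substring check; clause already present
    if not base_filter:
        return RESIDENT_CLAUSE
    # Assumption: the filter syntax accepts parentheses for grouping.
    return f"({base_filter}) and {RESIDENT_CLAUSE}"


if __name__ == "__main__":
    print(build_policy_filter("filetype='txt'", "archive:pool1@library1"))
    print(build_policy_filter("filetype='pdf'", "gold"))
```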