Data Cataloging

The Data Cataloging service is modern metadata management software that provides insight into exabyte-scale, heterogeneous file, object, backup, and archive storage, both on premises and in the cloud. It helps you manage unstructured data by reducing storage costs, uncovering hidden data value, and reducing the risks associated with massive data stores.

Before you begin

  • Meet the system requirements to install the Data Cataloging service.
  • The following details are a baseline for estimating the resources that an IBM Storage Fusion Data Cataloging service deployment needs. Use the following tables to estimate the resources from the approximate number of files in your workload. The resource values are calculated per compute node. You must have at least two worker nodes, each with the same amount of resources available.
  • The IBM Storage Fusion Data Cataloging service must have dedicated compute resources. For the service to perform as expected, make sure that you have enough resources to cover the following limits:
    Table 1. Profile requirements
    CPU   RAM      Disk space   Network   Storage   Workload
    77    162 GB   120 GB       10 GB     500 GB    500 M files
  • The standard Data Cataloging service deployment sets the following project requests and limits:
    Table 2. OpenShift Container Platform requests and limits
    Resource          Value
    CPU requests      13400 m
    CPU limits        76500 m
    Memory requests   27278 Mi
    Memory limits     153628 Mi
  • Important: For the Data Cataloging service to run successfully on all platforms, ensure that the storage class that you use has the following attributes:
    • Support for the ReadWriteMany (RWX) access mode
    • volumeBindingMode set to Immediate
    • allowVolumeExpansion set to true
  • If you have not yet configured the IBM operator catalog, configure it. For the procedure, see Adding the IBM operator catalog.
  • Review the troubleshooting information for installing the Data Cataloging service. See Data Cataloging service issues.
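Before you install, you can sanity-check the node count and the Table 2 limits with a short script. The following Python sketch is illustrative only: the `Node` record and the `check_capacity` helper are hypothetical, and you must fill in each worker node's allocatable CPU (cores) and memory (MiB) yourself, for example from the Allocatable section of `oc describe node` output. It assumes that the combined allocatable capacity of the dedicated worker nodes must cover the CPU and memory limits in Table 2; confirm that interpretation against your own sizing before relying on it.

```python
from dataclasses import dataclass

# CPU and memory limits from Table 2 (standard deployment).
CPU_LIMIT_MILLICORES = 76500
MEMORY_LIMIT_MIB = 153628
MIN_WORKER_NODES = 2


@dataclass(frozen=True)
class Node:
    """Hypothetical record of one dedicated worker node's allocatable resources."""
    name: str
    cpu_cores: float
    memory_mib: int


def millicores_to_cores(millicores: int) -> float:
    """Convert a Kubernetes CPU quantity in millicores (m) to cores."""
    return millicores / 1000


def mib_to_gib(mib: int) -> float:
    """Convert mebibytes (Mi) to gibibytes (Gi)."""
    return mib / 1024


def check_capacity(nodes: list[Node]) -> list[str]:
    """Return the problems found; an empty list means none were detected.

    Assumption: the combined allocatable capacity of the dedicated worker
    nodes must cover the Table 2 limits.
    """
    problems = []
    if len(nodes) < MIN_WORKER_NODES:
        problems.append(f"need at least {MIN_WORKER_NODES} worker nodes, found {len(nodes)}")
    # Every worker node must offer the same amount of resources.
    if len({(n.cpu_cores, n.memory_mib) for n in nodes}) > 1:
        problems.append("worker nodes do not all have the same resources")
    total_cores = sum(n.cpu_cores for n in nodes)
    needed_cores = millicores_to_cores(CPU_LIMIT_MILLICORES)
    if total_cores < needed_cores:
        problems.append(f"total CPU {total_cores} cores is below {needed_cores} cores")
    total_mib = sum(n.memory_mib for n in nodes)
    if total_mib < MEMORY_LIMIT_MIB:
        problems.append(
            f"total memory {mib_to_gib(total_mib):.1f} GiB is below "
            f"{mib_to_gib(MEMORY_LIMIT_MIB):.1f} GiB"
        )
    return problems


workers = [Node("worker-0", 40, 81920), Node("worker-1", 40, 81920)]
print(check_capacity(workers))  # → []
```

For reference, the Table 2 limits work out to 76.5 cores and about 150 GiB of memory, and the requests to 13.4 cores and about 26.6 GiB.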
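You can check a candidate storage class against the required attributes before you select it in step 4. The following Python sketch takes the storage class as a dictionary, for example parsed with `json.loads` from the output of `oc get storageclass <name> -o json`. A StorageClass object has no access-mode field, so RWX support cannot be read from it; the sketch relies on a caller-supplied set of provisioners that you know support ReadWriteMany volumes. The function name `storage_class_problems` and the provisioner `rwx.example.com` in the usage example are hypothetical.

```python
def storage_class_problems(sc: dict, rwx_provisioners: set[str]) -> list[str]:
    """Check a StorageClass manifest (as a dict) against the Data Cataloging requirements.

    rwx_provisioners is supplied by the caller: a StorageClass has no access-mode
    field, so RWX support must come from knowledge of the provisioner.
    """
    problems = []
    name = sc.get("metadata", {}).get("name", "<unnamed>")
    if sc.get("provisioner") not in rwx_provisioners:
        problems.append(f"{name}: provisioner not known to support ReadWriteMany")
    # Kubernetes treats an unset volumeBindingMode as Immediate.
    if sc.get("volumeBindingMode", "Immediate") != "Immediate":
        problems.append(f"{name}: volumeBindingMode must be Immediate")
    # An unset allowVolumeExpansion means that expansion is not allowed.
    if sc.get("allowVolumeExpansion", False) is not True:
        problems.append(f"{name}: allowVolumeExpansion must be true")
    return problems


candidate = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "my-rwx-class"},
    "provisioner": "rwx.example.com",  # hypothetical provisioner name
    "volumeBindingMode": "WaitForFirstConsumer",
    "allowVolumeExpansion": True,
}
print(storage_class_problems(candidate, {"rwx.example.com"}))
# → ['my-rwx-class: volumeBindingMode must be Immediate']
```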

About this task

Important: You cannot install the Data Cataloging service on OpenShift® Container Platform version 4.15.
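A pre-installation script can enforce the version restriction above. This minimal Python sketch, built around a hypothetical `can_install_data_cataloging` function, parses a version string such as the server version that `oc version` reports and rejects any 4.15.x release. It encodes only the 4.15 restriction stated in this topic; a True result does not mean that a version is otherwise supported, so check the system requirements as well.

```python
def can_install_data_cataloging(ocp_version: str) -> bool:
    """Return False for OpenShift Container Platform 4.15.x.

    Hypothetical helper: only the 4.15 restriction from this topic is encoded;
    other support rules are not checked.
    """
    # Compare only the major and minor parts, so "4.15.12" and "4.15.0-rc.1" both match.
    major, minor = (int(part) for part in ocp_version.split(".")[:2])
    return (major, minor) != (4, 15)


print(can_install_data_cataloging("4.15.12"))  # → False
print(can_install_data_cataloging("4.14.20"))  # → True
```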

Procedure

  1. Go to the Services page in the IBM Storage Fusion user interface.
  2. In the Available section, click the Data cataloging tile.
  3. In the Data cataloging window, review the details of the service and click Install.
  4. In the Install service message box, select a Storage class.
    Important: If you use Global Data Platform as the storage provider, it is recommended that you select the default storage class, ibm-spectrum-fusion. If you use Fusion Data Foundation, select the ocs-storagecluster-cephfs storage class. You can also use a custom storage class that meets the storage class requirements listed in Before you begin.
  5. Click Install. If the installation fails, review the downloaded logs to determine the cause of the failure and fix the issue. For more information about service issues in IBM Storage Fusion, see Troubleshooting installation and upgrade issues in IBM Storage Fusion services.
  6. Validate the installation.
    • IBM Storage Fusion user interface:

      After you enable the Data Cataloging service, you can view the service version and health status. From the ellipsis menu, you can download logs and view documentation. After the logs are collected successfully, a success notification is displayed and then disappears automatically after a short time.

      Table 3. Health states of the Data Cataloging service
      State        Description
      Installing   Service installation is in progress
      Upgrading    Service upgrade is in progress
      Healthy      Service is healthy
      Degraded     Service is not healthy
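If you script the validation step rather than watching the user interface, the states in Table 3 split into transitional states (Installing, Upgrading) and settled states (Healthy, Degraded). The following Python sketch polls until the state settles. It is an illustration under assumptions: `get_state` is a hypothetical callable that you supply to read the state string, because this topic does not describe an API or command for retrieving it, and the 30-second poll interval and 30-minute timeout are arbitrary defaults, not documented values.

```python
import time
from enum import Enum
from typing import Callable


class HealthState(Enum):
    """Health states from Table 3."""
    INSTALLING = "Installing"
    UPGRADING = "Upgrading"
    HEALTHY = "Healthy"
    DEGRADED = "Degraded"


# Transitional states: keep polling while the service reports one of these.
IN_PROGRESS = {HealthState.INSTALLING, HealthState.UPGRADING}


def wait_for_settled_state(
    get_state: Callable[[], str],
    timeout_s: float = 1800,
    poll_s: float = 30,
) -> HealthState:
    """Poll until the service leaves Installing/Upgrading, or raise TimeoutError.

    get_state is a caller-supplied (hypothetical) function that returns the
    state string; an unknown string raises ValueError from HealthState().
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = HealthState(get_state())
        if state not in IN_PROGRESS:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"service still {state.value} after {timeout_s} s")
        time.sleep(poll_s)


readings = iter(["Installing", "Installing", "Healthy"])
final = wait_for_settled_state(lambda: next(readings), poll_s=0)
print(final.value)  # → Healthy
```

If the state settles at Degraded, download the logs from the ellipsis menu and see Data Cataloging service issues.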