Multiple connection managers

Multiple connection managers are a new capability designed to improve scanning performance and enable parallel ingestion. The capability is especially valuable when data sources are geographically dispersed and must be scanned as remote sources.

With this new capability, users can take advantage of two primary deployment scenarios:
  • The first scenario deploys multiple connection managers within the same cluster, which distributes scanning tasks among them. This makes better use of cluster resources and speeds up data processing.
  • The second scenario involves adding external nodes to the system and deploying one or more connection managers on these nodes. This distributed setup further enhances scanning performance by using extra computing resources and enabling parallel execution of scanning operations across multiple nodes.
    Note:
    • Multiple connection managers improve scanning performance, but faster scanning does not make indexing records into the database any faster.

      The indexing process has its own limitations and dependencies that can affect overall performance.
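One way to see why scanning gains stop short of indexing is to treat scanning and indexing as two stages of a pipeline, where records reach the database no faster than the slower stage allows. The sketch below is a simplified illustration under two assumptions: scanning throughput grows linearly with the number of connection managers, and indexing runs at a fixed rate. The function `end_to_end_rate` and all rates are hypothetical, not Spectrum Discover measurements.

```python
# Illustrative model only: scanning and indexing are treated as two pipeline
# stages, so end-to-end throughput is capped by the slower stage.
# All rates are hypothetical, not measured Spectrum Discover figures.


def end_to_end_rate(scan_rate_per_cm: float, num_cms: int, index_rate: float) -> float:
    """Return records/second reaching the database under this simple model."""
    # Assumption: scanning scales linearly with the number of connection managers.
    total_scan_rate = scan_rate_per_cm * num_cms
    # Indexing is a separate limit that extra connection managers do not raise.
    return min(total_scan_rate, index_rate)


if __name__ == "__main__":
    for cms in (1, 2, 4, 8):
        rate = end_to_end_rate(scan_rate_per_cm=500.0, num_cms=cms, index_rate=1500.0)
        print(f"{cms} connection managers: {rate} records/s")
```

With these hypothetical numbers, going from one to two connection managers doubles throughput, but from four managers onward the indexing limit of 1500 records per second caps it.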

    Example deployment scenario:
    • Main cluster located in Mexico with 2 connection managers deployed.
    • One remote worker located in France with 2 connection managers deployed to scan data sources in France.
    • One remote worker located in Canada with 3 connection managers deployed to scan data sources in Canada.
      Example:
      kind: SpectrumDiscover
      apiVersion: spectrum-discover.ibm.com/v1alpha1
      metadata:
        name: spectrumdiscover-sample
        namespace: discover
      spec:
        license:
          accept: true
        doInstall: true
        rwx_storage_class: ibmc-file-gold-gid
        connmgr:
          site: mexico
          replicas: 2
          extraLocations:
            - site: france
              locationType: remote
              replicas: 2
            - site: canada
              locationType: remote
              replicas: 3
              affinity:
        tolerations:
        - effect: PreferNoSchedule
          key: isd
          operator: Exists
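The following Python sketch shows how the connmgr section of the sample resource can be read as a per-site replica count: the main site comes from `site` and `replicas`, and each entry under `extraLocations` adds a remote site with its own `replicas`. The dictionary mirrors the YAML above by hand (so no YAML parser is needed), and the helper `replicas_by_site` is hypothetical; it is not part of the Spectrum Discover operator.

```python
from typing import Any, Dict

# Hand-written mirror of the connmgr section in the sample resource.
SAMPLE_CONNMGR: Dict[str, Any] = {
    "site": "mexico",
    "replicas": 2,
    "extraLocations": [
        {"site": "france", "locationType": "remote", "replicas": 2},
        {"site": "canada", "locationType": "remote", "replicas": 3},
    ],
}


def replicas_by_site(connmgr: Dict[str, Any]) -> Dict[str, int]:
    """Hypothetical helper: total connection manager replicas per site."""
    # Main location first.
    counts: Dict[str, int] = {connmgr["site"]: connmgr["replicas"]}
    # Each extra location adds its replicas to its own site.
    for location in connmgr.get("extraLocations", []):
        site = location["site"]
        counts[site] = counts.get(site, 0) + location["replicas"]
    return counts


if __name__ == "__main__":
    per_site = replicas_by_site(SAMPLE_CONNMGR)
    print(per_site)
    print("total:", sum(per_site.values()))
```

For the sample resource this yields two replicas in Mexico, two in France, and three in Canada, seven connection managers in total.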
      To change the main location, specify it in the site property, as follows:
      connmgr:
          site: france
    Note: If a site does not exist as a connection manager StatefulSet (for example, example.com) or is empty, the internal scheduler assigns the data source to any available local connection manager.
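The fallback described in the note can be sketched as a simple routing rule: a scan goes to a connection manager at the data source's site when that site is configured, and otherwise to any connection manager at a local (non-remote) site. This is a hypothetical model for illustration only; the function `pick_connection_manager`, the manager names, and the random choice among candidates are assumptions, not the product's internal scheduler.

```python
import random
from typing import Dict, List, Optional


def pick_connection_manager(
    source_site: Optional[str],
    managers_by_site: Dict[str, List[str]],
    local_sites: List[str],
    rng: random.Random,
) -> str:
    """Hypothetical routing rule; not the Spectrum Discover scheduler."""
    # Known, non-empty site: use a connection manager deployed at that site.
    if source_site and source_site in managers_by_site:
        return rng.choice(managers_by_site[source_site])
    # Empty or unknown site (for example "example.com"): fall back to any
    # connection manager at a local site.
    local = [m for site in local_sites for m in managers_by_site.get(site, [])]
    if not local:
        raise LookupError("no local connection manager available")
    return rng.choice(local)


if __name__ == "__main__":
    # Hypothetical manager names matching the sample deployment.
    managers = {
        "mexico": ["connmgr-mexico-0", "connmgr-mexico-1"],
        "france": ["connmgr-france-0", "connmgr-france-1"],
        "canada": ["connmgr-canada-0", "connmgr-canada-1", "connmgr-canada-2"],
    }
    rng = random.Random(0)
    for site in ("france", "example.com", ""):
        chosen = pick_connection_manager(site, managers, ["mexico"], rng)
        print(repr(site), "->", chosen)
```

Here the main site (mexico) is treated as the only local site, since the other two locations are declared with `locationType: remote` in the sample resource.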