Configuration file

The configuration file is used by the Notifier and Replay.

The configuration file includes:

  • Information about the dsNet
  • Runtime parameters for the Notifier and Replay
  • A list of vaults to scan

The configuration file is named scanner-settings.json and must sit in the /opt/ibm/metaocean/data/connections/cos/replay directory.
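
As a minimal sketch, the file can be read with Python's standard json module. The helper name load_settings is an assumption made for illustration and is not part of the product; the default path is the directory named in this topic.

```python
import json
from pathlib import Path

# Location stated in this topic; adjust if your deployment differs.
SETTINGS_PATH = Path(
    "/opt/ibm/metaocean/data/connections/cos/replay/scanner-settings.json"
)


def load_settings(path=SETTINGS_PATH):
    """Parse the configuration file and return it as a dict (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        settings = json.load(handle)
    # The examples in this topic all use a JSON object at the top level.
    if not isinstance(settings, dict):
        raise ValueError("the configuration file must contain a JSON object")
    return settings
```

json.load raises json.JSONDecodeError for malformed input, such as typographic quotation marks or a missing comma, so a check like this can catch editing mistakes before the Notifier or Replay is started.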

The rules for IBM Cloud® Object Storage Replay settings are:

  • All access logs are scanned.
  • All objects that were created or updated between 00:00:01 Coordinated Universal Time (UTC) on 11 April 2018 and 10:01:53 UTC on 21 September 2018 are scanned in batches of 1000.
  • Custom metadata is retrieved for each object or version.
  • Ten vaults are processed in parallel.
  • Each vault has a single process that issues LIST requests and 15 processes that issue HEAD requests.
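
The time window and batch size in these rules can be illustrated with a short sketch. It is not the Scanner's implementation: the function names and the (key, last_modified) pairs are assumptions, and the window is treated as inclusive at both ends to match the "on or after" and "on or before" wording used for min_utc and max_utc.

```python
from datetime import datetime, timezone
from itertools import islice

# Window and batch size taken from the rules listed above.
MIN_UTC = datetime(2018, 4, 11, 0, 0, 1, tzinfo=timezone.utc)
MAX_UTC = datetime(2018, 9, 21, 10, 1, 53, tzinfo=timezone.utc)
BATCH_SIZE = 1000


def in_window(last_modified, min_utc=MIN_UTC, max_utc=MAX_UTC):
    """Return True if the timestamp falls inside the inclusive scan window."""
    return min_utc <= last_modified <= max_utc


def batches(items, size=BATCH_SIZE):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def select_for_scan(objects):
    """Keep the keys of (key, last_modified) pairs in the window, grouped into batches."""
    keys = (key for key, modified in objects if in_window(modified))
    return batches(keys)
```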

The following examples show every setting, followed by typical configurations. Most settings have default values and can be omitted; the typical configurations rely on those default values.

Example of the Cloud Object Storage Replay settings
{
  "system": {
    "name": "Test dsnet",
    "uuid": "00000000-0000-0000-0000-000000000000",
    "manager_ip": "172.1.1.1",
    "accesser_ip": "172.1.1.2",
    "accesser_supports_https": false,
    "manager_username": "admin",
    "manager_password": "password",
    "is_ibm_cos": true
  },
  "timestamps": {
    "min_utc": "2018-01-01T00:00:00Z",
    "max_utc": "2018-09-21T10:01:53Z"
  },
  "policy_engine": {
    "spectrum_discover_host": "modevvm32.tuc.stglabs.ibm.com",
    "user": "sdadmin",
    "password": "password"
  },
  "scanner": {
    "max_requests_per_second": 5000,
    "max_parallel_list": 10,
    "parallel_head_per_list": 5,
    "list_objects_size": 100
  },
  "notifier": {
    "kafka_format": 1,
    "kafka_endpoint": "192.168.1.1:9092",
    "kafka_topic": "cos-le-connector-topic",
    "kafka_username": "cos",
    "kafka_password": "password",
    "kafka_pem": "-----BEGIN CERTIFICATE-----...\n-----END CERTIFICATE-----\n"
  },
  "logging": {
    "debug_log_max_bytes": 10000000,
    "debug_log_backup_count": 10000,
    "notification_log_max_bytes": 10000000,
    "notification_log_backup_count": 10000,
    "notification_log_all": true
  },
  "include_all_vaults": false,
  "has_custom_metadata": true,
  "override_warnings": true,
  "exclude-vaults": ["Manager"],
  "vaults": [
    {
      "vault_name": "Vault-1"
    },
    {
      "vault_name": "Vault-2",
      "has_custom_metadata": false
    },
    {
      "vault_name": "Vault-3",
      "has_custom_metadata": false,
      "prefix": "customers/live"
    }
  ]
}
Typical Cloud Object Storage configuration settings

{
  "dsnet": {
    "name": "Test dsnet",
    "uuid": "00000000-0000-0000-0000-000000000000",
    "manager_ip": "172.1.1.1",
    "accesser_ip": "172.1.1.2",
    "accesser_supports_https": false,
    "manager_username": "admin",
    "manager_password": "password",
    "is_ibm_cos": true
  },
  "timestamps": {
    "min_utc": "2018-01-01T00:00:00Z",
    "max_utc": "2018-09-21T10:01:53Z"
  },
  "policy_engine": {
    "spectrum_discover_host": "modevvm32.tuc.stglabs.ibm.com",
    "user": "sdadmin",
    "password": "password"
  },
  "scanner": {
    "max_requests_per_second": 5000,
    "max_parallel_list": 10,
    "parallel_head_per_list": 5,
    "list_objects_size": 100
  },
  "notifier": {
    "kafka_format": 1,
    "kafka_endpoint": "192.168.1.1:9092",
    "kafka_topic": "cos-le-connector-topic",
    "kafka_username": "cos",
    "kafka_password": "password",
    "kafka_pem": "-----BEGIN CERTIFICATE-----...\n-----END CERTIFICATE-----\n"
  },
  "logging": {
    "debug_log_max_bytes": 10000000,
    "debug_log_backup_count": 10000,
    "notification_log_max_bytes": 10000000,
    "notification_log_backup_count": 10000,
    "notification_log_all": true
  },
  "include_all_vaults": false,
  "has_custom_metadata": true,
  "override_warnings": true,
  "exclude-vaults": ["Manager"],
  "vaults": [
    {
      "vault_name": "Vault-1"
    },
    {
      "vault_name": "Vault-2",
      "has_custom_metadata": false
    },
    {
      "vault_name": "Vault-3",
      "has_custom_metadata": false,
      "prefix": "customers/live"
    }
  ]
}

{
  "dsnet": {
    "manager_ip": "192.168.2.106",
    "accesser_ip": "192.168.2.111"
  },
  "timestamps": {
    "min_utc": "2018-04-11T00:00:01.000Z",
    "max_utc": "2018-09-21T10:01:53Z"
  },
  "scanner": {
    "max_requests_per_second": 5000
  },
  "include_all_vaults": true
}
{
  "system": {
    "manager_ip": "192.168.2.106",
    "accesser_ip": "192.168.2.111"
  },
  "policy_engine": {
    "spectrum_discover_host": "modevvm32.tuc.stglabs.ibm.com"
  },
  "timestamps": {
    "min_utc": "2018-04-11T00:00:01.000Z",
    "max_utc": "2018-09-21T10:01:53Z"
  },
  "scanner": {
    "max_requests_per_second": 5000
  },
  "include_all_vaults": true
}
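
Table 1 requires min_utc to be earlier than max_utc, and max_utc to be earlier than the current time. A hedged sketch of that check for a timestamps section like the ones in these examples follows. The function names are hypothetical, and the sketch assumes that both values carry a trailing Z (UTC), as they do in every example here.

```python
from datetime import datetime, timezone


def parse_utc(value):
    """Parse an ISO 8601 UTC string such as 2018-09-21T10:01:53Z."""
    # Swap the trailing Z for an explicit offset so that Python versions
    # before 3.11 also accept the string.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def check_timestamps(timestamps, now=None):
    """Return a list of problems found in a timestamps section (empty if none)."""
    now = now or datetime.now(timezone.utc)
    min_utc = parse_utc(timestamps["min_utc"])
    max_utc = parse_utc(timestamps["max_utc"])
    problems = []
    if not min_utc < max_utc:
        problems.append("min_utc must be earlier than max_utc")
    if not max_utc < now:
        problems.append("max_utc must be earlier than the current time")
    return problems
```

Calling check_timestamps(settings["timestamps"]) on a parsed configuration returns an empty list when both rules hold.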
IBM Cloud Object Storage Scanner is highly configurable. Each element in the file is described in Table 1.
Remember: IBM Spectrum® Discover does not support file or file path names that use characters that are not part of the UTF-8 character set.
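
One way to screen names before a scan is to test whether their raw bytes decode as UTF-8. The following is only a sketch; the function name is an assumption, and how the product itself detects unsupported names is not described here.

```python
def is_valid_utf8(raw_name: bytes) -> bool:
    """Return True if the raw bytes of a name decode cleanly as UTF-8."""
    try:
        raw_name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```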
Table 1. Explanation of the configuration file
Element Description Optional Default value Restart scanner if changed Restart notifier if changed
System section          
name Free-text name of the dsNet. Appears in the 'system_name' element in all Kafka messages. Retrieved from Manager API if configured. If not, the name does not appear in Kafka messages.
uuid UUID of the dsNet. Appears in the 'system_uuid' element in all Kafka messages. Retrieved from Manager API.
manager_ip Single IP address or host name of the manager device. Not applicable
accesser_ip Single IP address or host name of an accesser device or load balancer that routes to the accessers. Not applicable
accesser_supports_https Boolean value that indicates whether HTTP or HTTPS is used when requests are sent to the accesser or load balancer. true
manager_username Username for accessing the Manager API.

For testing only. Not to be used in production.

Supplied by user at prompt
manager_password Password for accessing the Manager API.

For testing only. Not to be used in production.

Supplied by user at prompt
is_ibm_cos Boolean value that indicates whether the system is an IBM Cloud Object Storage system or another S3-compliant system. If true, the IBM® Get Bucket Extension is used to retrieve object keys from the vaults.
Note: Setting the value to false is not currently supported by the Scanner and Notifier.
True
accesser_access_key Access key ID for S3 calls to the accesser or load balancer.

For testing only. Not to be used in production.

Supplied by user at prompt if it cannot be retrieved from the Manager API for the user account that is specified in dsNet/manager_username.
accesser_secret_key Secret key for S3 calls to the accesser or load balancer.

For testing only. Not to be used in production.

Supplied by user at prompt if it cannot be retrieved from the Manager API.
Time stamps section          
min_utc Only objects or versions in the vaults that have a LastModified datetime on or after this time stamp are submitted to IBM Spectrum Discover.

Must be earlier than the max_utc value.

Note: Changing min_utc and restarting the scanner applies only to objects that are not yet scanned. Objects that were scanned before the restart might have a LastModifiedDate value that is earlier than the min_utc value.
 

See note.

max_utc Only objects or versions in the vaults that have a LastModified datetime on or before this time stamp are submitted to IBM Spectrum Discover. Must be later than min_utc and earlier than the current time.
Note: Changing max_utc to a more recent time and restarting does not mean that new objects written since the old max_utc are scanned. The scanner continues from the key of the last object that was scanned, in lexicographic order. New objects whose keys sort before that key are therefore not scanned.
 

See note.

Policy engine section  

(Only required for IBM Spectrum Discover 2.0.0.3 and later)

     
spectrum_discover_host Host name or IP address of the policy engine service from which the Kafka certificate is retrieved. none
user Username for authorization on policy engine. none
password Password for authorization on policy engine. none
Replay section          
access_log_directory Directory where the dsNet access log files are stored after download. Access logs must be in the root input folder. Files in subdirectories are not processed. [IBM Cloud Object Storage Replay]/access_logs Restart Replay if changed Restart Replay if changed
download If download is set to false, access logs are not downloaded and are assumed to already be present in access_log_directory. true Restart Replay if changed Restart Replay if changed
Notifier section        
kafka_format Format of the Kafka message. 1
kafka_endpoint IP address and port of the Kafka endpoint. Retrieved from Manager API
kafka_topic Name of the Kafka topic. Retrieved from Manager API
kafka_username The username for authentication with Kafka.
Note: For testing only. Not to be used in production.
Supplied by user at prompt if it cannot be retrieved from the Manager API.
kafka_password The password for authentication with Kafka.
Note: For testing only. Not to be used in production.
Supplied by user at prompt if it cannot be retrieved from the Manager API.
kafka_pem The certificate PEM for authentication with Kafka. Must include '\n' characters to ensure correct formatting.
Note: For testing only. Not to be used in production.
Supplied by user at prompt if it cannot be retrieved from the system.
Logging section        
debug_log_max_bytes The scanner.debug and notifier.debug roll over when this size is reached. 1,000,000
debug_log_backup_count The number of scanner.debug and notifier.debug files to retain. 10
notification_log_max_bytes The notification.log rolls over when this size is reached. 1,000,000
notification_log_backup_count The number of notification.log files to retain. 10
notification_log_all Boolean value that controls the level of Notifier logging.

When true: an entry is written to notification.log for every message that is sent to the Kafka cluster.

When false: only failed sends are written to notification.log.

False
Root-level items          
include_all_vaults Boolean value that determines whether all vaults in the dsNet are scanned. If false, the details of the vaults to be scanned must be specified in the 'vaults' element.

False
has_custom_metadata Boolean value that determines whether custom metadata and content type are retrieved for each object by using individual HEAD requests.

This value is only relevant when a versioned vault is scanned. For IBM Cloud Object Storage systems, non-versioned vaults always require a HEAD request for every object. Can be overridden for each vault in the 'vaults' element. True
override_warnings Boolean value that allows the scanner to run and ignore any warnings that are generated on start-up. For example, a warning is raised on start-up if versioning is suspended on a vault. False
exclude_vaults Comma-separated list of vault names to be excluded from scanning, such as:
"exclude-vaults": ["COSVault", "COSVault-V"]
[]

Empty list

vaults List of vaults to be scanned. If include_all_vaults is true, the vaults list can be left empty.

This list can be used to define more detailed scanning parameters for individual vaults. Any settings that are defined here take precedence over the corresponding root-level settings.

Each element in the list contains:

  • vault_name: the name of the vault.
  • has_custom_metadata: an optional Boolean that overrides the root-level has_custom_metadata value.
  • prefix: an optional string that is used to filter the objects or versions that are retrieved from the vault.

Dependent on settings include_all_vaults and exclude_vaults
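
The precedence between the root-level has_custom_metadata value and the per-vault value can be sketched as follows. This illustrates the rule described in Table 1 and is not the Scanner's code: resolve_vault_settings is a hypothetical name, and treating a missing prefix as None (no filtering) is an assumption of the sketch.

```python
def resolve_vault_settings(config):
    """Return one dict per listed vault with its effective settings."""
    # The root-level value applies unless a vault entry overrides it.
    # True is the default value that Table 1 gives for has_custom_metadata.
    root_value = config.get("has_custom_metadata", True)
    resolved = []
    for entry in config.get("vaults", []):
        resolved.append({
            "vault_name": entry["vault_name"],
            "has_custom_metadata": entry.get("has_custom_metadata", root_value),
            # Assumption: no prefix means that nothing is filtered out.
            "prefix": entry.get("prefix"),
        })
    return resolved
```

With the three-vault example from this topic, Vault-1 inherits true from the root level, while Vault-2 and Vault-3 use their own false value and Vault-3 also carries the customers/live prefix.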