Configuration file
The configuration file is used by the Notifier and Replay.
The configuration file includes:
- Information about the dsNet
- Runtime parameters for the Notifier and Replay
- A list of vaults to scan
The configuration file is named scanner-settings.json and must sit in the /opt/ibm/metaocean/data/connections/cos/replay directory.
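
As a minimal sketch of how a tool might read this file, the following Python loads the JSON from the location given above. The function name `load_settings` and the top-level type check are illustrative assumptions, not part of the product.

```python
import json
from pathlib import Path

# Directory and file name as stated in this section.
SETTINGS_PATH = Path("/opt/ibm/metaocean/data/connections/cos/replay/scanner-settings.json")


def load_settings(path: Path = SETTINGS_PATH) -> dict:
    """Parse the settings file and return its top-level JSON object.

    Hypothetical helper for illustration only.
    """
    with open(path, encoding="utf-8") as handle:
        settings = json.load(handle)
    # Every example in this section is a single JSON object, so reject anything else.
    if not isinstance(settings, dict):
        raise ValueError(f"{path} must contain a JSON object at the top level")
    return settings
```

Because `json.load` accepts only strict JSON, a file that contains typographic quotation marks or a missing comma fails here with `json.JSONDecodeError` before any scan is attempted.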
The rules for IBM Cloud® Object Storage Replay settings are:
- All access logs are scanned.
- All objects that were created or updated between 00:00:01 UTC on 11 April 2018 and 10:01:53 UTC on 21 September 2018 are scanned in batches of 1000.
- Custom metadata is retrieved for each object or version.
- Ten vaults are processed in parallel.
- Each vault has a single process that issues LIST requests and 15 processes that issue HEAD requests.
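
The time window and the process counts in these rules can be illustrated with a short sketch. The helper names `parse_utc`, `in_scan_window`, and `process_count` are hypothetical, and treating both ends of the window as inclusive is an assumption based on the "on or after" and "on or before" wording used for min_utc and max_utc.

```python
from datetime import datetime, timezone


def parse_utc(stamp: str) -> datetime:
    """Parse time stamps such as 2018-09-21T10:01:53Z or 2018-04-11T00:00:01.000Z."""
    parsed = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
    # Assumption: a time stamp without an offset is treated as UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def in_scan_window(last_modified: str, min_utc: str, max_utc: str) -> bool:
    """True when last_modified lies between min_utc and max_utc, both ends inclusive."""
    return parse_utc(min_utc) <= parse_utc(last_modified) <= parse_utc(max_utc)


def process_count(parallel_vaults: int, head_per_list: int) -> int:
    """One LIST process plus head_per_list HEAD processes for each vault in flight."""
    return parallel_vaults * (1 + head_per_list)
```

With ten vaults in parallel and 15 HEAD processes for each LIST process, `process_count(10, 15)` gives 10 × (1 + 15) = 160 request-issuing processes at most.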
The full examples that follow show every setting. Most settings have default values and can be omitted; the shorter examples after them show a typical configuration that relies on the defaults.
{
"system": {
"name": "Test dsnet",
"uuid": "00000000-0000-0000-0000-000000000000",
"manager_ip": "172.1.1.1",
"accesser_ip": "172.1.1.2",
"accesser_supports_https": false,
"manager_username": "admin",
"manager_password": "password",
"is_ibm_cos": true
},
"timestamps": {
"min_utc": "2018-01-01T00:00:00Z",
"max_utc": "2018-09-21T10:01:53Z"
},
"policy_engine": {
"spectrum_discover_host": "modevvm32.tuc.stglabs.ibm.com",
"user": "sdadmin",
"password": "password"
},
"scanner": {
"max_requests_per_second": 5000,
"max_parallel_list": 10,
"parallel_head_per_list": 5,
"list_objects_size": 100
},
"notifier":{
"kafka_format": 1,
"kafka_endpoint": "192.168.1.1:9092",
"kafka_topic": "cos-le-connector-topic",
"kafka_username": "cos",
"kafka_password": "password",
"kafka_pem": "-----BEGIN CERTIFICATE-----...\n-----END CERTIFICATE-----\n"
},
"logging": {
"debug_log_max_bytes": 10000000,
"debug_log_backup_count": 10000,
"notification_log_max_bytes": 10000000,
"notification_log_backup_count": 10000,
"notification_log_all": true
},
"include_all_vaults": false,
"has_custom_metadata": true,
"override_warnings": true,
"exclude-vaults": ["Manager"],
"vaults": [
{
"vault_name": "Vault-1"
},
{
"vault_name": "Vault-2",
"has_custom_metadata": false
},
{
"vault_name": "Vault-3",
"has_custom_metadata": false,
"prefix": "customers/live"
}
]
}
{
"dsnet": {
"name": "Test dsnet",
"uuid": "00000000-0000-0000-0000-000000000000",
"manager_ip": "172.1.1.1",
"accesser_ip": "172.1.1.2",
"accesser_supports_https": false,
"manager_username": "admin",
"manager_password": "password",
"is_ibm_cos": true
},
"timestamps": {
"min_utc": "2018-01-01T00:00:00Z",
"max_utc": "2018-09-21T10:01:53Z"
},
"policy_engine": {
"spectrum_discover_host": "modevvm32.tuc.stglabs.ibm.com",
"user": "sdadmin",
"password": "password"
},
"scanner": {
"max_requests_per_second": 5000,
"max_parallel_list": 10,
"parallel_head_per_list": 5,
"list_objects_size": 100
},
"notifier":{
"kafka_format": 1,
"kafka_endpoint": "192.168.1.1:9092",
"kafka_topic": "cos-le-connector-topic",
"kafka_username": "cos",
"kafka_password": "password",
"kafka_pem": "-----BEGIN CERTIFICATE-----...\n-----END CERTIFICATE-----\n"
},
"logging": {
"debug_log_max_bytes": 10000000,
"debug_log_backup_count": 10000,
"notification_log_max_bytes": 10000000,
"notification_log_backup_count": 10000,
"notification_log_all": true
},
"include_all_vaults": false,
"has_custom_metadata": true,
"override_warnings": true,
"exclude-vaults": ["Manager"],
"vaults": [
{
"vault_name": "Vault-1"
},
{
"vault_name": "Vault-2",
"has_custom_metadata": false
},
{
"vault_name": "Vault-3",
"has_custom_metadata": false,
"prefix": "customers/live"
}
]
}
{
"dsnet": {
"manager_ip": "192.168.2.106",
"accesser_ip": "192.168.2.111"
},
"timestamps": {
"min_utc": "2018-04-11T00:00:01.000Z",
"max_utc": "2018-09-21T10:01:53Z"
},
"scanner":{
"max_requests_per_second": 5000
},
"include_all_vaults": true
}
{
"system": {
"manager_ip": "192.168.2.106",
"accesser_ip": "192.168.2.111"
},
"policy_engine": {
"spectrum_discover_host": "modevvm32.tuc.stglabs.ibm.com"
},
"timestamps": {
"min_utc": "2018-04-11T00:00:01.000Z",
"max_utc": "2018-09-21T10:01:53Z"
},
"scanner":{
"max_requests_per_second": 5000
},
"include_all_vaults": true
}
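
In the full examples above, Vault-2 and Vault-3 override the root-level has_custom_metadata value, and Vault-3 adds a prefix. The following sketch shows one way such per-vault settings could be resolved. The function `effective_vaults` and its `all_vault_names` argument (standing in for the vault list that the Manager would report) are hypothetical. How exclusions interact with an explicit vaults list is an assumption.

```python
def effective_vaults(settings: dict, all_vault_names: list) -> list:
    """Return one dict per vault to scan, with root-level values as fallbacks.

    Hypothetical helper; the real scanner may resolve these settings differently.
    """
    # Documented defaults: has_custom_metadata is true, include_all_vaults is false.
    root_metadata = settings.get("has_custom_metadata", True)
    listed = {entry["vault_name"]: entry for entry in settings.get("vaults", [])}
    # This section spells the key both "exclude_vaults" and "exclude-vaults",
    # so the sketch accepts either spelling.
    excluded = set(settings.get("exclude_vaults", settings.get("exclude-vaults", [])))

    if settings.get("include_all_vaults", False):
        candidates = list(all_vault_names)
    else:
        candidates = list(listed)
    # Assumption: exclusions apply whichever way the candidates were chosen.
    names = [name for name in candidates if name not in excluded]

    resolved = []
    for name in names:
        entry = listed.get(name, {})
        resolved.append({
            "vault_name": name,
            # A per-vault value takes precedence over the root-level value.
            "has_custom_metadata": entry.get("has_custom_metadata", root_metadata),
            "prefix": entry.get("prefix"),
        })
    return resolved
```

Applied to the full example, this yields Vault-1 with custom metadata retrieval enabled, Vault-2 with it disabled, and Vault-3 with it disabled and the prefix customers/live.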
Element | Description | Optional | Default value | Restart scanner if changed | Restart notifier if changed |
---|---|---|---|---|---|
System section | | | | | |
name | Free-text name of the dsNet. Appears in the 'system_name' element in all Kafka messages. | ✓ | Retrieved from Manager API if configured. If not, the name does not appear in Kafka messages. | ✓ | ✗ |
uuid | UUID of the dsNet. Appears in the 'system_uuid' element in all Kafka messages. | ✓ | Retrieved from Manager API. | ✓ | ✗ |
manager_ip | Single IP address or host name of the manager device. | ✗ | Not applicable | ✓ | ✗ |
accesser_ip | Single IP address or host name of an accesser device or load balancer that routes to the accessers. | ✗ | Not applicable | ✓ | ✗ |
accesser_supports_https | Boolean value that indicates whether HTTPS can be used instead of HTTP when requests are sent to the accesser or load balancer. | ✓ | true | ✓ | ✗ |
manager_username | Username for accessing the Manager API. For testing only. Not to be used in production. | ✓ | Supplied by user at prompt | ✓ | ✗ |
manager_password | Password for accessing the Manager API. For testing only. Not to be used in production. | ✓ | Supplied by user at prompt | ✓ | ✗ |
is_ibm_cos | Boolean value that indicates whether the system is an IBM Cloud Object Storage system or another S3-compliant system. If true, the IBM® Get Bucket Extension is used to retrieve object keys from the vaults. Note: Setting the value to false is not currently supported by the Scanner and Notifier. | ✓ | true | ✓ | ✗ |
accesser_access_key | Access key ID for S3 calls to the accesser or load balancer. For testing only. Not to be used in production. | ✓ | Supplied by user at prompt if it cannot be retrieved from the Manager API for the user account that is specified in dsNet/manager_username. | ✓ | ✗ |
accesser_secret_key | Secret key for S3 calls to the accesser or load balancer. For testing only. Not to be used in production. | ✓ | Supplied by user at prompt if it cannot be retrieved from the Manager API. | ✓ | ✗ |
Time stamps section | | | | | |
min_utc | Only objects or versions in the vaults that have a LastModified datetime on or after this time stamp are submitted to IBM Spectrum Discover. Needs to be less than max_utc. Note: Changing min_utc and restarting the scanner applies only to objects not yet scanned. Objects scanned before the restart might have a LastModifiedDate value that is earlier than the min_utc value. | ✗ | | ✓ See note. | ✗ |
max_utc | Only objects or versions in the vaults that have a LastModified datetime on or before this time stamp are submitted to IBM Spectrum Discover. Needs to be more than min_utc and less than the current time. Note: Changing max_utc to a more recent time and restarting does not mean that new objects written since the old max_utc are scanned. The scanner continues from the key of the last object that was scanned, in lexicographic order. This means that new objects with names that sort before the last object scanned are not scanned. | ✓ | | ✓ See note. | ✗ |
Policy engine section | (Only required for IBM Spectrum Discover 2.0.0.3 and later) | | | | |
spectrum_discover_host | Host name or IP address of the policy engine service from which the Kafka certificate is retrieved. | ✗ | none | ✓ | ✓ |
user | Username for authorization on policy engine. | ✗ | none | ✓ | ✓ |
password | Password for authorization on policy engine. | ✗ | none | ✓ | ✓ |
Replay section | | | | | |
access_log_directory | The directory where the dsNet access log files are stored after download. Access logs must be in the root input folder. Files in subdirectories are not processed. | ✓ | [IBM Cloud Object Storage Replay]/access_logs | Restart Replay if changed | Restart Replay if changed |
download | If download is set to false, access logs are not downloaded and are assumed to already be present in access_log_directory. | ✓ | true | Restart Replay if changed | Restart Replay if changed |
Notifier section | ✓ | | | | |
kafka_format | Format of the Kafka message. | ✓ | 1 | ✗ | ✓ |
kafka_endpoint | IP address and port of the Kafka endpoint. | ✓ | Retrieved from Manager API | ✗ | ✓ |
kafka_topic | Name of the Kafka topic. | ✓ | Retrieved from Manager API | ✗ | ✓ |
kafka_username | The username for authentication with Kafka. Note: For testing only. Not to be used in production. | ✓ | Supplied by user at prompt if it cannot be retrieved from the Manager API. | ✗ | ✓ |
kafka_password | The password for authentication with Kafka. Note: For testing only. Not to be used in production. | ✓ | Supplied by user at prompt if it cannot be retrieved from the Manager API. | ✗ | ✓ |
kafka_pem | The certificate PEM for authentication with Kafka. Must include '\n' characters to ensure correct formatting. Note: For testing only. Not to be used in production. | ✓ | Supplied by user at prompt if it cannot be retrieved from the system | ✗ | ✓ |
Logging section | | | | | |
debug_log_max_bytes | The scanner.debug and notifier.debug files roll over when this size is reached. | ✓ | 1,000,000 | ✓ | ✓ |
debug_log_backup_count | The number of scanner.debug and notifier.debug files to retain. | ✓ | 10 | ✓ | ✓ |
notification_log_max_bytes | The notification.log file rolls over when this size is reached. | ✓ | 1,000,000 | ✓ | ✓ |
notification_log_backup_count | The number of notification.log files to retain. | ✓ | 10 | ✓ | ✓ |
notification_log_all | Boolean value that controls the level of Notifier logging. When true, an entry is written to notification.log for every message that is sent to the Kafka cluster. When false, only failed sends are written to notification.log. | ✓ | false | ✗ | ✓ |
Root-level items | | | | | |
include_all_vaults | Boolean value that determines whether all vaults in the dsNet are scanned. If false, the details of the vaults to be scanned must be specified in the 'vaults' element. | ✓ | false | ✓ | ✗ |
has_custom_metadata | Boolean value that determines whether custom metadata and content type are retrieved for each object by using individual HEAD requests. This value is only relevant when a versioned vault is scanned. For IBM Cloud Object Storage systems, non-versioned vaults always require a HEAD request for every object. Can be overridden for each vault in the 'vaults' element. | ✓ | true | ✓ | ✗ |
override_warnings | Boolean value that allows the scanner to run and ignore any warnings that are generated on start-up. For example, a warning is raised on start-up if versioning is suspended on a vault. | ✓ | false | ✓ | ✗ |
exclude_vaults | Comma-separated list of vault names to be excluded from scanning, such as ["Manager"]. | ✓ | [] (empty list) | ✓ | ✗ |
vaults | List of vaults to be scanned. If include_all_vaults is true, the vaults list can be left empty. This list can be used to define more detailed scanning parameters for individual vaults. Any settings that are defined here take precedence over the root-level settings. Each element in the list contains: vault_name, the name of the vault; has_custom_metadata, an optional Boolean that overrides the root-level has_custom_metadata value; and prefix, an optional string that is used to filter the objects or versions that are retrieved from the vault. | ✓ | Dependent on settings include_all_vaults and exclude_vaults | ✓ | ✗ |
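
Several constraints in the table can be checked before the scanner is started. The following is a minimal sketch, not the product's own validation: the function `validate_settings` and its messages are hypothetical, and it covers only the settings marked as not optional (manager_ip, accesser_ip, min_utc), the ordering of the time stamps, and the need for a vaults list when include_all_vaults is false. Policy engine settings are not checked.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_utc(stamp: str) -> datetime:
    """Parse time stamps such as 2018-09-21T10:01:53Z or 2018-04-11T00:00:01.000Z."""
    parsed = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
    # Assumption: a time stamp without an offset is treated as UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def validate_settings(settings: dict, now: Optional[datetime] = None) -> list:
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    now = now or datetime.now(timezone.utc)

    # The examples name this section either "system" or "dsnet".
    system = settings.get("system") or settings.get("dsnet") or {}
    for key in ("manager_ip", "accesser_ip"):
        if not system.get(key):
            problems.append(f"missing required setting: {key}")

    # If include_all_vaults is false (the default), vaults must be listed.
    if not settings.get("include_all_vaults", False) and not settings.get("vaults"):
        problems.append("vaults must be listed when include_all_vaults is false")

    stamps = settings.get("timestamps", {})
    if "min_utc" not in stamps:
        problems.append("missing required setting: min_utc")
    elif "max_utc" in stamps:
        min_utc = parse_utc(stamps["min_utc"])
        max_utc = parse_utc(stamps["max_utc"])
        if not min_utc < max_utc:
            problems.append("min_utc must be earlier than max_utc")
        if not max_utc < now:
            problems.append("max_utc must be earlier than the current time")
    return problems
```

Returning a list of messages instead of raising on the first failure lets an operator fix every reported problem in one pass before restarting the scanner or notifier.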