Known issues

The known issues in the IBM Spectrum Scale DAS 5.1.7 release and their possible workarounds are as follows:

S3 service creation fails with the error "Something went wrong while processing the request."

After IBM Spectrum Scale DAS is deployed, the mmdas command that creates the S3 service might fail.

For example,
mmdas service create s3 --acceptLicense --ipRange 192.0.2.13-192.0.2.15
Something went wrong while processing the request.
Check 'ibm-spectrum-scale-das-endpoint' pod logs in 'ibm-spectrum-scale-das' namespace for more details
Try using the IBM Spectrum Scale DAS REST API to check if there is an issue with the REST API interface as well:
curl -k -u s3-admin -X GET -H "accept: application/json" https://<ibm-spectrumscale_host>/scalemgmt/v2/das/services
Enter host password for user 's3-admin':

Error 401: SRVE0295E: Error reported: 401
If the IBM Spectrum Scale DAS REST API also returns an error, check whether the IBM Spectrum Scale GUI REST API is working correctly:
curl -kv -u 's3-admin' https://<ibm-spectrumscale_host>/scalemgmt/v2/filesystems
Trying x.x.x.x
TCP_NODELAY set
Connected to <ibm-spectrumscale_host> port 443 (#0)
..

Error 401: SRVE0295E: Error reported: 401

If the IBM Spectrum Scale GUI REST API also returns an error, there might be an issue with user authentication. The user 's3-admin' that was created for IBM Spectrum Scale DAS might have been deleted, or its password might have expired. If so, resolve the issue and then retry.

Otherwise, there might be an issue with the IBM Spectrum Scale GUI pod.
Workaround
  1. Restart the GUI pods in the IBM Spectrum Scale namespace by issuing the following command:
    oc delete pod <gui-0> <gui-1> -n ibm-spectrum-scale
  2. After the new GUI pods are up and running, check whether the REST API endpoints for IBM Spectrum Scale file systems (/scalemgmt/v2/filesystems) and DAS services (/scalemgmt/v2/das/services) are working correctly.

If the REST API is working, the mmdas command should also work as expected.

Note: This issue can also occur while running the mmdas service list command. If you see the error message, apply the same workaround.
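The diagnosis steps for this issue can be sketched as a small shell helper. This is a minimal sketch, not an IBM-provided tool, and it reflects one reading of the steps. It assumes that the caller exports S3_ADMIN_PASSWORD and that only HTTP 200 counts as success. The function names rest_status and diagnose_das_rest are hypothetical.

```shell
#!/bin/sh
# Sketch of the REST API diagnosis steps (uses "local", supported by bash and dash).
# Assumption: S3_ADMIN_PASSWORD is exported by the caller; HTTP 200 means success.

# Print only the HTTP status code returned by one GUI REST API path.
rest_status() {
  curl -k -s -o /dev/null -w '%{http_code}' \
    -u "s3-admin:${S3_ADMIN_PASSWORD}" \
    -H "accept: application/json" \
    "https://$1$2"
}

# Suggest a next step from the DAS and file system REST API results.
diagnose_das_rest() {
  local das_code fs_code
  das_code=$(rest_status "$1" /scalemgmt/v2/das/services)
  if [ "$das_code" = "200" ]; then
    echo "DAS REST API OK; check the ibm-spectrum-scale-das-endpoint pod logs"
    return 0
  fi
  fs_code=$(rest_status "$1" /scalemgmt/v2/filesystems)
  if [ "$fs_code" = "200" ]; then
    echo "DAS REST API returned $das_code but the GUI REST API works; restart the GUI pods"
  else
    echo "GUI REST API returned $fs_code; check the s3-admin user and its password, then restart the GUI pods if the user is valid"
  fi
  return 1
}
```

For example, after you export S3_ADMIN_PASSWORD, run diagnose_das_rest <ibm-spectrumscale_host>. The helper only suggests a next step. It does not restart pods or change users.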

I/O gets interrupted if the node running the noobaa-core and noobaa-db pods goes down

If the noobaa-core and noobaa-db pods are running on the same node and that node goes down, I/O might get interrupted.

Note: Endpoint refers to NooBaa endpoints.

This issue occurs because it takes approximately 6 minutes for the noobaa-db pod to come online. During this time, the noobaa-core pod cannot communicate with the noobaa-db pod, which causes the I/O interruption.

Workaround
Use the oc get pods command in the openshift-storage namespace to check the state of the noobaa-db pod. After the state of the noobaa-db pod changes to Running, I/O resumes.
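A polling loop can replace repeated manual oc get pods checks. This sketch assumes that the pod is named noobaa-db-pg-0, which is the name used in the Warp workaround on this page. Besides the Running phase, it also requires the Ready condition and no pending deletion. This is an added safeguard, because a pod on a node that went down can keep reporting the Running phase for a while. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: wait until noobaa-db-pg-0 is Running, Ready, and not being deleted.
# Assumptions: pod noobaa-db-pg-0 in openshift-storage; arguments are the
# timeout and the polling interval in seconds.
wait_for_noobaa_db() {
  local timeout="${1:-600}" interval="${2:-10}" elapsed=0 state=""
  [ "$interval" -gt 0 ] || interval=10
  while [ "$elapsed" -lt "$timeout" ]; do
    # Prints phase/ready/deletionTimestamp, for example "Running/True/".
    state=$(oc get pod noobaa-db-pg-0 -n openshift-storage \
      -o jsonpath='{.status.phase}/{.status.conditions[?(@.type=="Ready")].status}/{.metadata.deletionTimestamp}' \
      2>/dev/null)
    if [ "$state" = "Running/True/" ]; then
      echo "noobaa-db-pg-0 is Running and Ready (waited ${elapsed}s)"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "noobaa-db-pg-0 not ready after ${timeout}s (last state: ${state:-not found})" >&2
  return 1
}
```

For example, wait_for_noobaa_db 900 15 waits up to 15 minutes and polls every 15 seconds.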

I/O gets interrupted due to IBM Spectrum Scale container native update

The IBM Spectrum Scale container native update reboots each node. Because of the time each reboot takes, this concurrent update can take approximately 20 to 45 minutes. Administrators should plan for intermittent I/O outages during this period.

Workaround

This is currently a limitation in IBM Spectrum Scale DAS.

Unable to create new accounts or exports during noobaa-db pod migration

If the node on which the noobaa-db pod is running is shut down, new accounts or exports cannot be created for some time.

This issue occurs because it takes approximately 6 minutes for the noobaa-db pod to be migrated to another node. During this time, you cannot create new accounts or exports.

Workaround
Use the oc get pods command in the openshift-storage namespace to check the state of the noobaa-db pod. After the state of the noobaa-db pod changes to Running, you can create new accounts or exports.
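To avoid failed account creation attempts during the migration window, a wrapper can check the noobaa-db pod before it calls mmdas account create. This is a sketch that makes the same assumption about the pod name (noobaa-db-pg-0). The wrapper name is hypothetical, and it passes all of its arguments to mmdas account create unchanged.

```shell
#!/bin/sh
# Sketch: call "mmdas account create" only when noobaa-db-pg-0 is Running and Ready.
# Assumption: pod noobaa-db-pg-0 in the openshift-storage namespace.
create_account_if_db_ready() {
  local state
  state=$(oc get pod noobaa-db-pg-0 -n openshift-storage \
    -o jsonpath='{.status.phase}/{.status.conditions[?(@.type=="Ready")].status}' 2>/dev/null)
  if [ "$state" != "Running/True" ]; then
    echo "noobaa-db-pg-0 is not ready (${state:-not found}); retry later" >&2
    return 1
  fi
  # All arguments are forwarded unchanged.
  mmdas account create "$@"
}
```

For example: create_account_if_db_ready s3user1@example.com --gid 9999 --uid 8003 --newBucketsPath /mnt/fs_s3user1/exmp1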

mmdas commands might fail with the error could not open file "global/pg_filenode.map"

Commands such as mmdas account list and mmdas export list might fail with the following error message:
could not open file "global/pg_filenode.map": Permission denied

This error occurs when a network interface goes down on the node where the NooBaa database pods are running.

Workaround
Start the interface by applying the node network configuration policy (NNCP) with NMState. For more information, see Updating node network configuration in the Red Hat OpenShift Container Platform documentation.
Tip: You can use oc get nncp or oc get nnce to verify that the network policy is configured.
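The Red Hat documentation describes the full procedure. As an illustration only, the following sketch prints a minimal NodeNetworkConfigurationPolicy that asks NMState to set one interface to up on one node. It assumes that the NMState Operator serves the nmstate.io/v1 API. The policy name, node name, and interface name are placeholders. A real policy usually also needs the interface's IP configuration.

```shell
#!/bin/sh
# Sketch: print a minimal NodeNetworkConfigurationPolicy that brings one interface up.
# Assumptions: NMState Operator with the nmstate.io/v1 API; $1 = policy name,
# $2 = node host name, $3 = interface name (all placeholders).
nncp_interface_up() {
  cat <<EOF
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: $1
spec:
  nodeSelector:
    kubernetes.io/hostname: $2
  desiredState:
    interfaces:
      - name: $3
        type: ethernet
        state: up
EOF
}
```

For example, nncp_interface_up das-iface-up worker-1 ens5 | oc apply -f - applies the policy (all three names are hypothetical). Then run oc get nncp and oc get nnce to check the result.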

Changing scaleFactor might result in I/O failure

If you change the scale factor of the S3 service during active I/O, I/O failures might occur.

For example, consider a scenario in which the S3 service was initially created with a scaleFactor of 2. If you reduce the scaleFactor to 1 during active I/O, you might encounter I/O failures.
  • These failures occur because, when you change the scaleFactor to 1, Kubernetes initiates a cleanup to reduce the number of endpoints.
  • This cleanup results in skewed distribution of endpoints between the nodes such that on some nodes the number of endpoints might be high while on other nodes the number of endpoints might reduce to 0. This unbalanced configuration might lead to I/O failures.
Workaround

To avoid this unbalanced configuration, plan the scaleFactor according to your requirements and set it when you create the S3 service, so that the distribution of endpoints does not become skewed.

If you must change the scaleFactor, plan it during a maintenance window when there is no active I/O.

Account creation fails with the EOF message

Account creation by using the mmdas account create command might fail with the EOF message.
mmdas account create s3user1@example.com --gid 9999 --uid 8003 --newBucketsPath /mnt/fs_s3user1/exmp1

EOF
Workaround
Retry creating the account by using the mmdas account create command:
mmdas account list

No Accounts Available

mmdas account create s3user1@example.com --gid 9999 --uid 8003 --newBucketsPath /mnt/fs_s3user1/exmp1


Account created successfully, below are the secret and access keys
 Secret Key                                     Access Key
 ----------                                     -----------
 09PSsA/4zxV92X/Da30D7seOzaW4AXn7dps40Azh       w2g9l8NthQDWTIxAIG28

mmdas account list

 Name                   UID     GID     New buckets path
 ----                   ---     ---     ----------------
 s3user1@example.com    8003    9999    /mnt/fs_s3user1/exmp1
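You can script the retry with a bounded loop. This sketch assumes that a transient failure prints exactly EOF and that success can be recognized by the "Account created successfully" text shown above. MMDAS_RETRIES and MMDAS_RETRY_DELAY are hypothetical environment variables, not mmdas options.

```shell
#!/bin/sh
# Sketch: retry "mmdas account create" while it prints only "EOF".
# Assumptions: success text as shown in the example output; MMDAS_RETRIES and
# MMDAS_RETRY_DELAY are hypothetical knobs (defaults 3 attempts, 5 seconds).
create_account_with_retry() {
  local attempt=1 max="${MMDAS_RETRIES:-3}" out
  while [ "$attempt" -le "$max" ]; do
    out=$(mmdas account create "$@" 2>&1)
    case "$out" in
      *"Account created successfully"*)
        printf '%s\n' "$out"
        return 0 ;;
    esac
    if ! printf '%s\n' "$out" | grep -qx 'EOF'; then
      # Any other error: report it and stop instead of retrying blindly.
      printf '%s\n' "$out" >&2
      return 1
    fi
    echo "attempt $attempt returned EOF; retrying" >&2
    sleep "${MMDAS_RETRY_DELAY:-5}"
    attempt=$((attempt + 1))
  done
  return 1
}
```

Any other error stops the loop. This avoids repeating a request that failed for a different reason. Running mmdas account list before and after the retry, as in the example above, remains a useful check.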

Export creation fails with the INVALID_READ_RESOURCES error

S3 export creation might fail with the following error message:
"message": "INVALID_READ_RESOURCES"

This error is triggered if the NooBaa namespace store is in the Rejected phase. This namespace store is created for the IBM Spectrum Scale data back end and is configured with the S3 service.

Workaround
Before you create exports, use the following command to ensure that the NooBaa namespace store is not in the Rejected phase.
oc get namespacestore -n openshift-storage
If the namespace store is in the Rejected phase, perform checks such as the following:
  • Verify that the file system is mounted.
  • Ensure that the CNSA and CSI pods are running correctly.
  • Ensure that the PVC is bound.
  • Check the IBM Spectrum Scale DAS operator logs and make sure that service creation is logged.
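You can also script the phase check. This sketch assumes that the Rejected value in the PHASE column of oc get namespacestore comes from the .status.phase field of each NamespaceStore resource. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: list NooBaa namespace stores and fail if any is in the Rejected phase.
# Assumption: the phase is exposed as .status.phase on each NamespaceStore.
check_namespacestores() {
  oc get namespacestore -n openshift-storage \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.phase}{"\n"}{end}' |
    awk '
      { print }                          # show every store and its phase
      $2 == "Rejected" { rejected = 1 }  # remember any Rejected store
      END { exit rejected }'
}
```

check_namespacestores prints one line for each namespace store and returns a nonzero status if any of them is Rejected, so it can gate a script that creates exports.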

S3 service instance is in the FAILED state upon its creation

The S3 service instance might be in the FAILED state after its creation.

Workaround

If the S3 service instance is in the FAILED state, refer to the IBM Spectrum Scale DAS operator logs to determine the cause and then take appropriate action to resolve the issue.

Account names that contain special characters trigger an error

You cannot use special characters in account names. For example, the following account name is not supported:
user@12#

Workaround

Do not use special characters in account names.

Slow reader applications might lose S3 access to data

Applications that request IBM Spectrum Scale DAS to deliver data through read access and then consume the delivered data very slowly might lose S3 access to data. For such workloads, when a slow reader disconnects without first draining the requested data, the endpoint might fail to clean up its internal state. This state accumulates and eventually causes all applications to lose S3 access to data. The only workload known to cause this issue is COSBench run with the hashCheck=true option.
Workaround
  • To resolve this issue, restart the NooBaa endpoint pods.
  • There is no data loss or data corruption.
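One way to restart the NooBaa endpoint pods is to delete them and let their controller recreate them. This page uses the same approach for the GUI pods. This sketch assumes that the endpoint pods run in the openshift-storage namespace and that their names start with noobaa-endpoint. Verify both with oc get pods -n openshift-storage before you use it. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: delete the NooBaa endpoint pods so that their controller recreates them.
# Assumption: endpoint pods live in openshift-storage and are named noobaa-endpoint-*.
restart_noobaa_endpoints() {
  local pods
  pods=$(oc get pods -n openshift-storage --no-headers \
    -o custom-columns=NAME:.metadata.name | grep '^noobaa-endpoint' || true)
  if [ -z "$pods" ]; then
    echo "no noobaa-endpoint pods found" >&2
    return 1
  fi
  # Word splitting of $pods is intentional: one argument per pod name.
  oc delete pod -n openshift-storage $pods
}
```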

IBM Spectrum Scale DAS does not verify MD5 checksums if MD5-based ETags are disabled

IBM Spectrum Scale DAS does not verify MD5 checksums that clients send in the optional Content-MD5 header of HTTP requests if MD5-based ETags are disabled.
Workaround
To have Content-MD5 headers validated, enable the generation of MD5-based ETags in the S3 service, for example, by using mmdas service update s3 --enableMD5.
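For reference, the Content-MD5 value that an S3 client sends is the base64 encoding of the binary MD5 digest of the request body (RFC 1864). This sketch computes that value for a file with openssl and base64. It only illustrates the header format and is not part of IBM Spectrum Scale DAS. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: compute a Content-MD5 header value (base64 of the binary MD5 digest)
# for a file. Requires openssl and base64.
content_md5() {
  openssl dgst -md5 -binary "$1" | base64
}
```

For example, with the AWS CLI: aws s3api put-object --bucket <bucket> --key <key> --body file.bin --content-md5 "$(content_md5 file.bin)" --endpoint-url https://<s3-endpoint>. The header is validated only when MD5-based ETags are enabled.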

IBM Spectrum Scale DAS does not properly fail over the IP address

When a data access node loses its high-speed network connection, IBM Spectrum Scale DAS does not properly fail over its IP address to one of the two other data access nodes.
Workaround
To resolve this issue, shut down the Red Hat OpenShift node so that all of its IP addresses move to the other nodes. Then, resolve the network issue and restart the Red Hat OpenShift node.

The IBM Spectrum Scale file system must have sufficient space while writing S3 objects

When writing S3 objects, ensure that the IBM Spectrum Scale file system has sufficient space, because IBM Spectrum Scale DAS creates temporary files to process incoming data. For instance, writing a 30 GB object requires up to an additional 30 GB of temporary space in the file system until the upload request is completed.
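As a worked example of this rule, a 30 GB object needs up to 30 GB for the object and up to another 30 GB for temporary files. This adds up to 60 GB of free space while the upload is in progress. The following sketch applies the same factor of two. It assumes GNU df (for --output=avail and -B1) and that the given path is on the IBM Spectrum Scale file system that backs the bucket. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: check for room for an object plus up to the same amount of temporary space.
# Assumptions: GNU df; $1 is a path on the backing file system, $2 is the object size in bytes.
has_space_for_object() {
  local path="$1" object_bytes="$2" avail needed
  avail=$(df --output=avail -B1 "$path" | tail -n 1 | tr -d ' ')
  [ -n "$avail" ] || return 1
  needed=$((object_bytes * 2))   # object + up to the same size in temporary files
  if [ "$avail" -ge "$needed" ]; then
    echo "ok: $avail bytes available, up to $needed needed"
  else
    echo "insufficient: $avail bytes available, up to $needed needed" >&2
    return 1
  fi
}
```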

Workaround

This is a prerequisite of IBM Spectrum Scale DAS for writing S3 objects.

Performance degradation of S3 applications that connect to more than one data access node

The performance of S3 applications might degrade if they connect to more than one IBM Spectrum Scale data access node and write objects that are stored in the same directory of the underlying IBM Spectrum Scale file system.

Workaround
Ensure that such workloads use the same IP address for S3 access, so that this workload is handled from a single data access node.

Uneven distribution of NooBaa endpoint pods

The scaling factor determines the number of NooBaa endpoint pods that run on each data access node. The NooBaa endpoint pods should be evenly distributed. For instance, with a scaling factor of four, each data access node should run four NooBaa endpoint pods. Decreasing the scaling factor (for example, from four to three) and certain infrastructure issues can lead to an uneven distribution of NooBaa endpoint pods. IBM Spectrum Scale DAS tries to correct this by terminating imbalanced NooBaa endpoint pods and directing the Kubernetes scheduler where to start new NooBaa endpoint pods. However, this correction is not always successful. It ensures only that at least one NooBaa endpoint pod runs on each data access node (DAN) after scaling up or down.
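To check whether the endpoint pods are evenly distributed, you can count them on each node and compare each count with the scaling factor. This sketch assumes that the endpoint pods run in the openshift-storage namespace and that their names start with noobaa-endpoint. The function name is hypothetical. A data access node that runs no endpoint pods does not appear in the output, so compare the listed nodes against your data access nodes.

```shell
#!/bin/sh
# Sketch: count noobaa-endpoint pods per node and compare with the scaling factor.
# Assumptions: pods in openshift-storage named noobaa-endpoint-*; $1 = scaling factor.
# Output: "<node> <count> ok|imbalanced"; nonzero exit status if any node is off.
check_endpoint_balance() {
  oc get pods -n openshift-storage --no-headers \
      -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName |
    awk -v want="$1" '
      $1 ~ /^noobaa-endpoint/ { count[$2]++ }
      END {
        bad = 0
        for (node in count) {
          status = (count[node] == want) ? "ok" : "imbalanced"
          if (status != "ok") bad = 1
          printf "%s %d %s\n", node, count[node], status
        }
        exit bad
      }'
}
```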

Workaround

This is currently a limitation in IBM Spectrum Scale DAS.

I/O loss when the node running the noobaa-core and noobaa-db pods goes down

As per the current design, when the noobaa-db pod is moved to another node, it takes several minutes (approximately 6 minutes or more) to reach the Running state. In the interim, I/O loss is possible and expected, because the object interface is not in a healthy state. After noobaa-db reaches the Running state and the connection between noobaa-core and noobaa-db is established again, I/O continues and new I/O requests are serviced.

Workaround

This is currently a limitation in IBM Spectrum Scale DAS.

Warp workload fails occasionally with “The specified key does not exist” error

Warp I/O workloads occasionally run into an error with the message "The specified key does not exist".

Warp version:
warp --version
warp version 0.5.5 - 1baadbc

Monitor the NooBaa endpoint logs to check whether the error "_id must be unique. found 2 rows with _id=undefined in table bucketstats" is displayed.

When warp starts failing, the following error is observed in the NooBaa endpoint logs:
Sep-26 6:32:07.896 [Endpoint/14] [ERROR] CONSOLE:: RPC._on_request: ERROR srv object_api.update_endpoint_stats reqid 19524@fcall://fcall(7om8vqvf) connid fcall://fcall(7om8vqvf) AssertionError [ERR_ASSERTION]: _id must be unique. found 2 rows with _id=undefined in table bucketstats
Sep-26 6:32:07.897 [Endpoint/14] [ERROR] core.rpc.rpc:: RPC._request: response ERROR srv object_api.update_endpoint_stats reqid 19524@fcall://fcall(7om8vqvf) connid fcall://fcall(7om8vqvf) params { namespace_stats: [ { io_stats: { read_count: 2199279, write_count: 929200, read_bytes: 55346668240896, write_bytes: 13374358598656, error_write_bytes: 0, error_write_count: 0, error_read_bytes: 0, error_read_count: 0 }, namespace_resource_id: '632d5b3674e74100298682d4' }, [length]: 1 ], bucket_counters: [ { bucket_name: SENSITIVE-d11ed9bf0f42c55a, content_type: 'application/octet-stream', read_count: 1055154, write_count: 358804 }, { bucket_name: SENSITIVE-40584c364915f5f3, content_type: 'application/octet-stream', read_count: 1144123, write_count: 374277 }, [length]: 2 ] } took [8.8+0.4=9.2] [RpcError: _id must be unique. found 2 rows with _id=undefined in table bucketstats] { rpc_code: 'INTERNAL', rpc_data: { retryable: true } }
Sep-26 6:32:07.897 [Endpoint/14] [ERROR] core.sdk.endpoint_stats_collector:: failed on update_endpoint_stats. trigger_send_stats again [RpcError: _id must be unique. found 2 rows with _id=undefined in table bucketstats] { rpc_code: 'INTERNAL', rpc_data: { retryable: true } }
Sep-26 6:32:37.907 [Endpoint/14] [ERROR] core.util.postgres_client:: updateOneWithClient failed { system: 632d5af574e74100298682c0, bucket: 632f441da43595b2582184de, content_type: 'application/octet-stream' } { '$set': { last_write: 1664173957897, last_read: 1664173957897, system: 632d5af574e74100298682c0, bucket: 632f441da43595b2582184de, content_type: 'application/octet-stream' }, '$inc': { writes: 358804, reads: 1055154 } } UPDATE bucketstats SET data = jsonb_set(jsonb_set(jsonb_set(jsonb_set(jsonb_set(jsonb_set(jsonb_set(data,'{content_type}','"application/octet-stream"'),'{bucket}','"632f441da43595b2582184de"'),'{system}','"632d5af574e74100298682c0"'),'{last_read}','1664173957897'::jsonb),'{last_write}','1664173957897'::jsonb),'{reads}',to_jsonb(COALESCE(Cast(data->>'reads' as numeric),0)+1055154)),'{writes}',to_jsonb(COALESCE(Cast(data->>'writes' as numeric),0)+358804)) WHERE (data->>'system'='632d5af574e74100298682c0' and data->>'bucket'='632f441da43595b2582184de' and data->>'content_type'='application/octet-stream') RETURNING _id, data AssertionError [ERR_ASSERTION]: _id must be unique. found 2 rows with _id=undefined in table bucketstats
Workaround
  1. Open a shell in the noobaa-db pod in the openshift-storage namespace and connect to the nbcore database by using the following commands:
    oc rsh -n openshift-storage noobaa-db-pg-0
    psql -U postgres
    \c nbcore
  2. Identify the duplicate record by using the following query:
    SELECT data->>'bucket' as bucket,
                data->>'system' as system,
                jsonb_agg(jsonb_build_object('_id', _id)) as ids
                FROM bucketstats
                GROUP BY 1,2
                HAVING count(*) > 1;
    Inspect the records for which duplicate entries exist, as shown in the following example:
    nbcore=# select * from bucketstats where (data->>'system'='632431b4cab31d0029558440' and data->>'bucket'='63243a12cab31d0029558478' and data->>'content_type'='application/octet-stream');
               _id            |                                               data
    --------------------------+----------------------------------------------------------------------------------------------------------------------------------
     63243c108d5458000e5c5ea7 | {"_id": "63243c108d5458000e5c5ea7", "reads": 129634826905, "bucket": "63243a12cab31d0029558478", "system": "632431b4cab31d0029558440", "writes": 43169720959, "last_read": 1663676369913, "last_write": 1663676369913, "content_type": "application/octet-stream"}
     63243c10c781ba000e15953d | {"_id": "63243c10c781ba000e15953d", "reads": 129634807954, "bucket": "63243a12cab31d0029558478", "system": "632431b4cab31d0029558440", "writes": 43169713464, "last_read": 1663676369913, "last_write": 1663676369913, "content_type": "application/octet-stream"}
    (2 rows)
    The example shows two entries for the same record. Delete one of them as shown in the next step.
  3. Delete the duplicate entry by using the following command:
    nbcore=# delete from bucketstats where (data->>'system'='632431b4cab31d0029558440' and data->>'bucket'='63243a12cab31d0029558478' and data->>'content_type'='application/octet-stream' and data->>'_id'='63243c108d5458000e5c5ea7');
    DELETE 1
    nbcore=#
    
  4. Exit psql by entering \q, and then exit the noobaa-db pod shell.
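If you prefer not to open an interactive shell, you can run the query from step 2 through oc exec. This sketch assumes the pod name, the postgres user, and the nbcore database shown in the steps above. It only reads data. Deleting a duplicate entry remains the manual step 3. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: run the duplicate-detection query from step 2 non-interactively (read-only).
# Assumptions: pod noobaa-db-pg-0 in openshift-storage, user postgres, database nbcore.
find_bucketstats_duplicates() {
  oc exec -n openshift-storage noobaa-db-pg-0 -- \
    psql -U postgres -d nbcore -c "
      SELECT data->>'bucket' AS bucket,
             data->>'system' AS system,
             jsonb_agg(jsonb_build_object('_id', _id)) AS ids
      FROM bucketstats
      GROUP BY 1, 2
      HAVING count(*) > 1;"
}
```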

S3 service update with certain flag combinations is not fully honored

When the S3 service is updated with the enableMD5 or disableMD5 flag together with the scaleFactor flag in a single command, only the scaleFactor flag is honored. The enableMD5 setting remains unchanged.

For example,
mmdas service update s3 --enableMD5 --scaleFactor 2
Workaround
Update the S3 service with the scaleFactor flag and the enableMD5 or disableMD5 flag in separate commands, one after another.
For example,
mmdas service update s3 --enableMD5
mmdas service update s3 --scaleFactor 2
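You can wrap the workaround so that a combined request is always split. This hypothetical helper runs mmdas service update s3 once with the MD5 flag, if one is given, and then once more with the remaining flags. It assumes that all other flags can safely be passed together in one call.

```shell
#!/bin/sh
# Sketch: apply the MD5 flag and the remaining flags in separate mmdas calls.
update_s3_service() {
  local md5_flag="" arg n=$#
  # Rotate through the arguments: keep MD5 flags aside, re-append everything else.
  while [ "$n" -gt 0 ]; do
    arg=$1
    shift
    case "$arg" in
      --enableMD5|--disableMD5) md5_flag=$arg ;;
      *) set -- "$@" "$arg" ;;
    esac
    n=$((n - 1))
  done
  if [ -n "$md5_flag" ]; then
    mmdas service update s3 "$md5_flag" || return
  fi
  if [ "$#" -gt 0 ]; then
    mmdas service update s3 "$@"
  fi
}
```

For example, update_s3_service --enableMD5 --scaleFactor 2 issues the same two commands as the example above.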

mmdas command fails with the error "Something went wrong while processing the request"

After IBM Spectrum Scale DAS is deployed, any mmdas command might fail.

For example:
mmdas service list 
Something went wrong while processing the request.
Check 'ibm-spectrum-scale-das-endpoint' pod logs in 'ibm-spectrum-scale-das' namespace for more details
Try using the IBM Spectrum Scale DAS REST API to check if there is an issue with the REST API interface as well:
curl -k -u s3-admin -X GET -H "accept: application/json" https://<ibm-spectrumscale_host>/scalemgmt/v2/das/services
Enter host password for user 's3-admin':
Sample output:

Error 403: SRVE0295E: Error reported: 403

The HTTP return code 403 (Forbidden) indicates that the user is locked after multiple login attempts with an invalid password.

Workaround
  1. Remove the s3-admin user from the GUI pod in the IBM Spectrum Scale namespace and create a new user, as shown in the following example:
    oc exec -c liberty ibm-spectrum-scale-gui-0 -n ibm-spectrum-scale -- /usr/lpp/mmfs/gui/cli/rmuser s3-admin
    EFSSG0021I The user s3-admin has been successfully removed.
    EFSSG1000I The command completed successfully.
    oc exec -c liberty ibm-spectrum-scale-gui-0 -n ibm-spectrum-scale -- /usr/lpp/mmfs/gui/cli/lsuser       
    EFSSG0100I There are no values to return.
    oc exec -c liberty ibm-spectrum-scale-gui-0 -n ibm-spectrum-scale -- /usr/lpp/mmfs/gui/cli/mkuser s3-admin -p Passw0rd -g 'ProtocolAdmin'
    EFSSG0019I The user s3-admin has been successfully created.
    EFSSG1000I The command completed successfully.
    oc exec -c liberty ibm-spectrum-scale-gui-0 -n ibm-spectrum-scale -- /usr/lpp/mmfs/gui/cli/lsuser       
    Name     Long name Password status Group names   Failed login attempts Disable Password Expiry Target Feedback Date
    s3-admin           active          ProtocolAdmin 0                     FALSE
    EFSSG1000I The command completed successfully.
  2. Delete the das-gui-user secret from the IBM Spectrum Scale DAS namespace, and then create a new secret with the same user name and password, as shown in the following example:
    oc -n ibm-spectrum-scale-das delete secret das-gui-user
    oc -n ibm-spectrum-scale-das create secret generic das-gui-user --from-literal=username='s3-admin' --from-literal=password='Passw0rd'
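Steps 1 and 2 should use the same password. Otherwise, IBM Spectrum Scale DAS presumably still cannot authenticate. This sketch performs both steps with one password value. It assumes the pod, container, and namespace names from the example above. The delete uses --ignore-not-found in case the secret is already gone. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch of steps 1 and 2: recreate the s3-admin GUI user and the das-gui-user
# secret with the same password. Assumption: names from the example above.
reset_das_gui_user() {
  local password="$1"
  local gui_cli="/usr/lpp/mmfs/gui/cli"
  oc exec -c liberty ibm-spectrum-scale-gui-0 -n ibm-spectrum-scale -- \
    "$gui_cli/rmuser" s3-admin || return
  oc exec -c liberty ibm-spectrum-scale-gui-0 -n ibm-spectrum-scale -- \
    "$gui_cli/mkuser" s3-admin -p "$password" -g ProtocolAdmin || return
  oc -n ibm-spectrum-scale-das delete secret das-gui-user --ignore-not-found || return
  oc -n ibm-spectrum-scale-das create secret generic das-gui-user \
    --from-literal=username=s3-admin --from-literal=password="$password"
}
```

Avoid typing the password directly on the command line. For example, read it into a variable with read -r after stty -echo, and then call reset_das_gui_user "$pw".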
    

Performance degradation for read of small objects

When you use Red Hat OpenShift Data Foundation (ODF) 4.12 with IBM Spectrum Scale DAS 5.1.7, you might observe performance degradation when reading small objects (approximately 4 KB). This issue is caused by changes made to NooBaa in Red Hat OpenShift Data Foundation (ODF) 4.12. A fix for this issue might be provided in a later version of Red Hat OpenShift Data Foundation (ODF).

Workaround

This is currently a limitation in Red Hat OpenShift Data Foundation (ODF) 4.12.

IBM Spectrum Scale DAS 5.1.7 pods run into CrashLoopBackOff error or mmdas command fails after a fresh installation or upgrade of IBM Spectrum Scale DAS

After a fresh installation of IBM Spectrum Scale DAS 5.1.7, you might notice that the pods in the ibm-spectrum-scale-das namespace are in the CrashLoopBackOff state.

After an upgrade to IBM Spectrum Scale DAS 5.1.7, you might notice one or both of the following issues:
  • One or more pods in the ibm-spectrum-scale-das namespace are in the CrashLoopBackOff state.
  • The mmdas command might hang or return the following error message:
    # mmdas service list
    Something went wrong while processing the request.
    Check 'ibm-spectrum-scale-das-endpoint' pod logs in 'ibm-spectrum-scale-das' namespace for more details
Workaround
This issue might be caused by the network policies introduced in the IBM Spectrum Scale DAS 5.1.7 release. To work around this issue, perform the following steps:
  1. Apply the latest IBM Spectrum Scale DAS manifest file from the IBM GitHub repository:
    # oc apply -f https://raw.githubusercontent.com/IBM/ibm-spectrum-scale-container-native/v5.1.7.0/generated/das/install.yaml
    
  2. Check if there are network policies in the ibm-spectrum-scale-das namespace:
    # oc get networkpolicy -n ibm-spectrum-scale-das
    NAME                                      POD-SELECTOR   AGE
    ibm-spectrum-scale-das-nwpolicy-egress    <none>         16s
    ibm-spectrum-scale-das-nwpolicy-ingress   <none>         16s  
    
    Delete network policies if they are present:
    # oc delete networkpolicy -n ibm-spectrum-scale-das ibm-spectrum-scale-das-nwpolicy-egress ibm-spectrum-scale-das-nwpolicy-ingress
    networkpolicy.networking.k8s.io "ibm-spectrum-scale-das-nwpolicy-egress" deleted
    networkpolicy.networking.k8s.io "ibm-spectrum-scale-das-nwpolicy-ingress" deleted
    
  3. Restart all the pods in the ibm-spectrum-scale-das namespace:
    # oc delete pods --all -n ibm-spectrum-scale-das
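Steps 1 to 3 can be combined into one sketch. It uses the manifest URL, namespace, and network policy names from the steps above. It adds --ignore-not-found so that the delete does not fail when the policies are absent. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch of steps 1 to 3: reapply the manifest, remove the DAS network policies
# if present, then recreate all pods in the namespace.
DAS_NS=ibm-spectrum-scale-das
DAS_MANIFEST=https://raw.githubusercontent.com/IBM/ibm-spectrum-scale-container-native/v5.1.7.0/generated/das/install.yaml

remediate_das_network_policies() {
  oc apply -f "$DAS_MANIFEST" || return
  oc delete networkpolicy -n "$DAS_NS" --ignore-not-found \
    ibm-spectrum-scale-das-nwpolicy-egress \
    ibm-spectrum-scale-das-nwpolicy-ingress || return
  oc delete pods --all -n "$DAS_NS"
}
```

Afterward, check the pods with oc get pods -n ibm-spectrum-scale-das and retry mmdas service list.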