Known issues
The known issues in the IBM Spectrum Scale DAS 5.1.7 release and their possible workarounds are as follows:
- S3 service creation fails with the error "Something went wrong while processing the request."
- I/O gets interrupted if the node running the noobaa-core and noobaa-db pods goes down
- I/O gets interrupted due to IBM Spectrum Scale container native update
- Unable to create new accounts or exports during noobaa-db pod migration
- mmdas commands might fail with could not open file "global/pg_filenode.map"
- Changing scaleFactor might result in I/O failure
- Account creation fails with the EOF message
- Export creation fails with the INVALID_READ_RESOURCES error
- S3 service instance is in the FAILED state upon its creation
- Account names that contain special characters trigger error
- Slow reader applications might lose S3 access to data
- IBM Spectrum Scale DAS does not verify MD5 checksums, in case MD5 based Etags are disabled
- IBM Spectrum Scale DAS does not properly fail-over the IP address
- Performance degrade of S3 applications while connecting to more than one data access node
- Uneven distribution of NooBaa endpoint pods
- When noobaa-core and noobaa-db pod running node is made down
- Warp workload fails occasionally with “The specified key does not exist” error
- S3 service update with some combinational flags is not honored
- mmdas command fails with the error "Something went wrong while processing the request"
- Performance degradation for read of small objects
- IBM Spectrum Scale DAS 5.1.7 pods run into CrashLoopBackOff error or mmdas command fails on fresh install/upgrade of IBM Spectrum Scale DAS
S3 service creation fails with the error "Something went wrong while processing the request."
Once the IBM Spectrum Scale DAS is deployed, when you use the mmdas command to create the S3 service, the command might fail.
mmdas service create s3 --acceptLicense --ipRange 192.0.2.13-192.0.2.15
Something went wrong while processing the request.
Check 'ibm-spectrum-scale-das-endpoint' pod logs in 'ibm-spectrum-scale-das' namespace for more details
curl -k -u s3-admin -X GET -H "accept: application/json" https://<ibm-spectrumscale_host>/scalemgmt/v2/das/services
Enter host password for user 's3-admin':
Error 401: SRVE0295E: Error reported: 401
curl -kv -u 's3-admin' https://<ibm-spectrumscale_host>/scalemgmt/v2/filesystems
Trying x.x.x.x
TCP_NODELAY set
Connected to <ibm-spectrumscale_host> port 443 (#0)
..
Error 401: SRVE0295E: Error reported: 401
If using the IBM Spectrum Scale REST API also results in an error, it indicates that there might be an issue with the user authentication. The user 's3-admin' created for IBM Spectrum Scale DAS might be deleted or its password might have expired. If that is the case, resolve the issue and then retry.
- Workaround
-
- Restart the GUI pods in the IBM Spectrum Scale namespace by issuing the following command:
oc delete pod <gui-0> <gui-1>
- After the new GUI pods are up and running, check if the REST API interface to access IBM Spectrum Scale filesystems or das/services is working fine.
If the REST API is working, the mmdas command should also work as expected.
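The REST API check can also be scripted. The following is a minimal sketch rather than an official tool: the function names classify_rest_status and check_das_rest are hypothetical, and the interpretation of the 401 and 403 codes covers only the cases described in this document.

```shell
#!/bin/sh
# Hypothetical helper: map an HTTP status code returned by the
# IBM Spectrum Scale REST API to a short diagnosis. Only the codes
# discussed in this document are handled specifically.
classify_rest_status() {
  case "$1" in
    200) echo "ok" ;;
    401) echo "auth-failed" ;;   # user deleted or password expired
    403) echo "user-locked" ;;   # repeated attempts with an invalid password
    *)   echo "unexpected" ;;
  esac
}

# Hypothetical wrapper: query das/services and print the diagnosis.
# $1 = GUI host name, $2 = password of the s3-admin user.
check_das_rest() {
  code=$(curl -k -s -o /dev/null -w '%{http_code}' -u "s3-admin:$2" \
    -H "accept: application/json" \
    "https://$1/scalemgmt/v2/das/services") || true
  classify_rest_status "$code"
}
```

For example, check_das_rest <ibm-spectrumscale_host> <password> prints ok when the REST API accepts the credentials. Passing the password as an argument exposes it in the process list, so this form is suitable only for interactive troubleshooting.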
I/O gets interrupted if the node running the noobaa-core and noobaa-db pods goes down
If the noobaa-core and noobaa-db pods are running on the same
node and that node goes down, I/O might get interrupted.
This issue occurs because it takes approximately 6 minutes for the noobaa-db pod
to come online. During this time, the noobaa-core pod cannot communicate with the
noobaa-db pod, which causes the I/O interruption.
- Workaround
- Use the oc get pods command on the openshift-storage namespace to check the state of the noobaa-db pod. Once the state of the noobaa-db pod changes to Running, I/O resumes.
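Instead of checking the pod state by hand, you can poll it. The following is a minimal sketch under stated assumptions: the function name wait_for_noobaa_db is hypothetical, and the pod name noobaa-db-pg-0 is taken from the examples elsewhere in this document and might differ in your deployment.

```shell
#!/bin/sh
# Poll the phase of the noobaa-db pod until it reports Running.
# $1 = maximum number of attempts, $2 = seconds to sleep between attempts.
# Assumption: the pod is named noobaa-db-pg-0 in openshift-storage.
wait_for_noobaa_db() {
  attempts=$1
  interval=$2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # An oc failure (for example, API server unreachable) counts as "not Running".
    phase=$(oc get pod noobaa-db-pg-0 -n openshift-storage \
      -o jsonpath='{.status.phase}' 2>/dev/null) || phase=""
    if [ "$phase" = "Running" ]; then
      echo "noobaa-db is Running"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "noobaa-db did not reach Running after $attempts attempts" >&2
  return 1
}
```

For example, wait_for_noobaa_db 40 15 polls for up to about 10 minutes, which leaves some margin over the approximately 6 minutes that the pod needs to come online.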
I/O gets interrupted due to IBM Spectrum Scale container native update
The IBM Spectrum Scale container native update reboots each node. Due to the duration of each reboot, this concurrent update can take around 20 to 45 minutes. Administrators should plan for intermittent I/O outage for this duration.
- Workaround
-
This is currently a limitation in IBM Spectrum Scale DAS.
Unable to create new accounts or exports during noobaa-db pod migration
If the node on which the noobaa-db pod is running is shut down, new accounts or
exports cannot be created for some time.
This issue occurs because it takes approximately 6 minutes for the noobaa-db pod
to be migrated to another node. During this time, you cannot create new accounts or exports.
- Workaround
- Use the oc get pods command on the openshift-storage namespace to check the state of the noobaa-db pod. Once the state of the noobaa-db pod changes to Running, you can create new accounts or exports.
mmdas commands might fail with could not open file "global/pg_filenode.map"
could not open file "global/pg_filenode.map": Permission denied
This error occurs when one of the node's interfaces goes down and the NooBaa database pods were running on that node.
- Workaround
- Start the interface by applying the network policy with the nmstate command.
For more information, see Updating node network configuration in Red Hat OpenShift Container Platform documentation.
Tip: You can use oc get nncp or oc get nnce to verify if the network policy is configured.
Changing scaleFactor might result in I/O failure
If you change the scale factor of the S3 service during active I/O, I/O failures might occur.
For example, consider an S3 service that is configured with a scaleFactor of 2. If you reduce the scaleFactor to 1 during active I/O, you might encounter I/O failures.
- These failures occur because when you change the scaleFactor to 1, Kubernetes initiates a cleanup as the number of endpoints needs to be reduced.
- This cleanup results in a skewed distribution of endpoints between the nodes, such that on some nodes the number of endpoints might be high while on other nodes the number of endpoints might drop to 0. This unbalanced configuration might lead to I/O failures.
- Workaround
-
To avoid this unbalanced configuration, plan and configure the scaleFactor at the time of S3 service creation according to your requirements to ensure that the distribution of endpoints does not become skewed.
If you must change the scaleFactor, plan it during a maintenance window when there is no active I/O.
Account creation fails with the EOF message
mmdas account create s3user1@example.com --gid 9999 --uid 8003 --newBucketsPath /mnt/fs_s3user1/exmp1
EOF
- Workaround
- Retry creating the account by using the mmdas account create command:
mmdas account list
No Accounts Available
mmdas account create s3user1@example.com --gid 9999 --uid 8003 --newBucketsPath /mnt/fs_s3user1/exmp1
Account created successfully, below are the secret and access keys
Secret Key                                  Access Key
----------                                  -----------
09PSsA/4zxV92X/Da30D7seOzaW4AXn7dps40Azh    w2g9l8NthQDWTIxAIG28
mmdas account list
Name                   UID     GID     New buckets path
----                   ---     ---     ----------------
s3user1@example.com    8003    9999    /mnt/fs_s3user1/exmp1
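The retry can be automated with a small wrapper. The following is a minimal sketch: the function name retry is hypothetical, and it assumes that mmdas exits with a non-zero status when account creation fails with EOF, which this document does not confirm.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between failed attempts. Returns 0 as soon as the command succeeds.
# Assumption: the wrapped command signals failure through its exit status.
retry() {
  max=$1
  delay=$2
  shift 2
  n=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$n" -ge "$max" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```

For example, retry 3 5 mmdas account create s3user1@example.com --gid 9999 --uid 8003 --newBucketsPath /mnt/fs_s3user1/exmp1 tries the command at most three times with a 5-second pause between attempts. Run mmdas account list afterwards to confirm that the account exists.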
Export creation fails with the INVALID_READ_RESOURCES error
"message": "INVALID_READ_RESOURCES"
This error is triggered if the NooBaa namespace store is in the Rejected phase. This namespace store is created for the IBM Spectrum Scale data backend and it is configured with the S3 service.
- Workaround
-
Before you create exports, use the following command to ensure that the NooBaa namespace store is not in the Rejected phase.
oc get namespacestore -n openshift-storage
If the namespace store is in the Rejected state, perform checks such as the following:
- Basic file system mount check
- Ensure that CNSA and CSI pods are working
- Ensure PVC is bound
- Check the IBM Spectrum Scale DAS operator logs and make sure that service creation is logged
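The phase check can be scripted before exports are created. The following is a minimal sketch: the function names are hypothetical, and it assumes that the NamespaceStore resource exposes its phase in .status.phase.

```shell
#!/bin/sh
# Hypothetical helper: read "name<TAB>phase" lines on stdin and print the
# names of namespace stores whose phase is Rejected. Exits with status 1
# if at least one Rejected store is found, and with 0 otherwise.
find_rejected_stores() {
  awk -F '\t' '$2 == "Rejected" { print $1; found = 1 }
               END { exit (found ? 1 : 0) }'
}

# List every namespace store with its phase, one per line.
# Assumption: the phase is available in .status.phase.
list_namespacestore_phases() {
  oc get namespacestore -n openshift-storage \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'
}
```

For example, list_namespacestore_phases | find_rejected_stores prints nothing and returns 0 when no namespace store is in the Rejected phase.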
S3 service instance is in the FAILED state upon its creation
The S3 service instance might be in the FAILED state after its creation.
- Workaround
-
If the S3 service instance is in the FAILED state, refer to the IBM Spectrum Scale DAS operator logs to determine the cause and then take appropriate action to resolve the issue.
Account names that contain special characters trigger error
Account names that contain special characters, for example user@12#, are not supported.
- Workaround
-
Do not use special characters in account names.
Slow reader applications might lose S3 access to data
hashCheck=true option.
- Workaround
-
- To resolve this issue, restart the NooBaa endpoint pods.
- There is no data loss or data corruption.
IBM Spectrum Scale DAS does not verify MD5 checksums, in case MD5 based Etags are disabled
IBM Spectrum Scale DAS does not validate the Content-MD5 header of HTTP requests if MD5-based ETags are disabled.
- Workaround
- Customers who want Content-MD5 headers to be validated must enable the generation of MD5-based ETags for the S3 service.
IBM Spectrum Scale DAS does not properly fail-over the IP address
- Workaround
- To resolve this issue, shut down the Red Hat OpenShift node so that all IP addresses are moved to the other nodes. Then resolve the network issue and restart the Red Hat OpenShift node.
The IBM Spectrum Scale file system must have sufficient space while writing S3 objects
When writing S3 objects, ensure that the IBM Spectrum Scale file system has sufficient space because IBM Spectrum Scale DAS creates temporary files to process incoming data. For instance, writing a 30 GB object requires up to an additional 30 GB of temporary space in the file system until the upload request is completed.
- Workaround
-
This is a prerequisite of IBM Spectrum Scale DAS for writing S3 objects.
Performance degrade of S3 applications while connecting to more than one data access node
The performance of S3 applications might degrade if they connect to more than one IBM Spectrum Scale data access node and write objects that are stored in the same directory of the underlying IBM Spectrum Scale file system.
- Workaround
- Ensure that such workloads use the same IP address for S3 access, so that this workload is handled from a single data access node.
Uneven distribution of NooBaa endpoint pods
The scaling factor determines the number of NooBaa endpoint pods that run on each data access
node. The NooBaa endpoint pods should be evenly distributed. For instance, with a scaling factor of
four, each data access node should run four NooBaa endpoint pods. A decrease of the scaling factor,
such as reducing it from four to three, and certain infrastructure issues can lead to
an uneven distribution of NooBaa endpoint pods. IBM Spectrum Scale DAS tries to correct this by terminating
imbalanced NooBaa endpoint pods and directing the Kubernetes scheduler where to start new NooBaa
endpoint pods. However, this correction is not always successful. In either case, at least one
noobaa-endpoint pod runs on each data access node (DAN), whether the scaling factor is increased or decreased.
- Workaround
-
This is currently a limitation in IBM Spectrum Scale DAS.
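To see how the endpoint pods are currently spread across nodes, you can count them per node. The following is a minimal sketch: the function names are hypothetical, and it assumes that the endpoint pod names start with noobaa-endpoint and that the pods run in the openshift-storage namespace.

```shell
#!/bin/sh
# Hypothetical helper: read "pod-name node-name" lines on stdin and print
# "node count" for pods whose name starts with noobaa-endpoint, sorted by
# node name. Nodes without any endpoint pod do not appear in the output.
count_endpoints_per_node() {
  awk '$1 ~ /^noobaa-endpoint/ { n[$2]++ }
       END { for (node in n) print node, n[node] }' | sort
}

# Print one "pod-name node-name" line per pod in openshift-storage.
list_pods_with_nodes() {
  oc get pods -n openshift-storage --no-headers \
    -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName
}
```

For example, list_pods_with_nodes | count_endpoints_per_node prints one line per node. With an even distribution, every data access node shows the same count.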
When noobaa-core and noobaa-db pod running node is made down
As per the current design, the noobaa-db pod takes a few minutes (approximately 6 minutes or more) to reach
the Running state after it is moved to another node. In the interim, I/O loss is
possible and expected because the object interface is not in a healthy state. Once the
noobaa-db pod reaches the Running state and the connection between noobaa-core and
noobaa-db is re-established, I/O can continue and new I/O requests are
serviced.
- Workaround
-
This is currently a limitation in IBM Spectrum Scale DAS.
Warp workload fails occasionally with “The specified key does not exist” error
The Warp I/O workload occasionally runs into an error with the "The specified key does not exist" message.
warp --version
warp version 0.5.5 - 1baadbc
Monitor NooBaa endpoint logs to check whether the highlighted error is displayed.
Sep-26 6:32:07.896 [Endpoint/14] [ERROR] CONSOLE:: RPC._on_request: ERROR srv object_api.update_endpoint_stats reqid 19524@fcall://fcall(7om8vqvf) connid fcall://fcall(7om8vqvf) AssertionError [ERR_ASSERTION]: _id must be unique. found 2 rows with _id=undefined in table bucketstats
Sep-26 6:32:07.897 [Endpoint/14] [ERROR] core.rpc.rpc:: RPC._request: response ERROR srv object_api.update_endpoint_stats reqid 19524@fcall://fcall(7om8vqvf) connid fcall://fcall(7om8vqvf) params { namespace_stats: [ { io_stats: { read_count: 2199279, write_count: 929200, read_bytes: 55346668240896, write_bytes: 13374358598656, error_write_bytes: 0, error_write_count: 0, error_read_bytes: 0, error_read_count: 0 }, namespace_resource_id: '632d5b3674e74100298682d4' }, [length]: 1 ], bucket_counters: [ { bucket_name: SENSITIVE-d11ed9bf0f42c55a, content_type: 'application/octet-stream', read_count: 1055154, write_count: 358804 }, { bucket_name: SENSITIVE-40584c364915f5f3, content_type: 'application/octet-stream', read_count: 1144123, write_count: 374277 }, [length]: 2 ] } took [8.8+0.4=9.2] [RpcError: _id must be unique. found 2 rows with _id=undefined in table bucketstats] { rpc_code: 'INTERNAL', rpc_data: { retryable: true } }
Sep-26 6:32:07.897 [Endpoint/14] [ERROR] core.sdk.endpoint_stats_collector:: failed on update_endpoint_stats. trigger_send_stats again [RpcError: _id must be unique. found 2 rows with _id=undefined in table bucketstats] { rpc_code: 'INTERNAL', rpc_data: { retryable: true } }
Sep-26 6:32:37.907 [Endpoint/14] [ERROR] core.util.postgres_client:: updateOneWithClient failed { system: 632d5af574e74100298682c0, bucket: 632f441da43595b2582184de, content_type: 'application/octet-stream' } { '$set': { last_write: 1664173957897, last_read: 1664173957897, system: 632d5af574e74100298682c0, bucket: 632f441da43595b2582184de, content_type: 'application/octet-stream' }, '$inc': { writes: 358804, reads: 1055154 } } UPDATE bucketstats SET data = jsonb_set(jsonb_set(jsonb_set(jsonb_set(jsonb_set(jsonb_set(jsonb_set(data,'{content_type}','"application/octet-stream"'),'{bucket}','"632f441da43595b2582184de"'),'{system}','"632d5af574e74100298682c0"'),'{last_read}','1664173957897'::jsonb),'{last_write}','1664173957897'::jsonb),'{reads}',to_jsonb(COALESCE(Cast(data->>'reads' as numeric),0)+1055154)),'{writes}',to_jsonb(COALESCE(Cast(data->>'writes' as numeric),0)+358804)) WHERE (data->>'system'='632d5af574e74100298682c0' and data->>'bucket'='632f441da43595b2582184de' and data->>'content_type'='application/octet-stream') RETURNING _id, data AssertionError [ERR_ASSERTION]: _id must be unique. found 2 rows with _id=undefined in table bucketstats
- Workaround
-
- Check the noobaa-db pod in the openshift-storage namespace by using the following commands:
oc rsh noobaa-db-pg-0
psql -U postgres
\c nbcore
- Identify the duplicate record by using the following query:
SELECT data->>'bucket' as bucket, data->>'system' as system, jsonb_agg(jsonb_build_object('_id', _id)) as ids FROM bucketstats GROUP BY 1,2 HAVING count(*) > 1;
Check the record for which duplicate entries exist, as shown in the following example:
nbcore=# select * from bucketstats where (data->>'system'='632431b4cab31d0029558440' and data->>'bucket'='63243a12cab31d0029558478' and data->>'content_type'='application/octet-stream');
_id | data
--------------------------+--------------------------
63243c108d5458000e5c5ea7 | {"_id": "63243c108d5458000e5c5ea7", "reads": 129634826905, "bucket": "63243a12cab31d0029558478", "system": "632431b4cab31d0029558440", "writes": 43169720959, "last_read": 1663676369913, "last_write": 1663676369913, "content_type": "application/octet-stream"}
63243c10c781ba000e15953d | {"_id": "63243c10c781ba000e15953d", "reads": 129634807954, "bucket": "63243a12cab31d0029558478", "system": "632431b4cab31d0029558440", "writes": 43169713464, "last_read": 1663676369913, "last_write": 1663676369913, "content_type": "application/octet-stream"}
(2 rows)
The example shows two entries for a record. Delete one of them as shown in the next step.
- Delete the duplicate entry by using the following command:
nbcore=# delete from bucketstats where (data->>'system'='632431b4cab31d0029558440' and data->>'bucket'='63243a12cab31d0029558478' and data->>'content_type'='application/octet-stream' and data->>'_id'='63243c108d5458000e5c5ea7');
DELETE 1
nbcore=#
- Exit the noobaa-db pod shell.
S3 service update with some combinational flags is not honored
When the S3 service is updated with a combination of the enableMD5/disableMD5 flag
and the scaleFactor flag, only the scaleFactor flag is honored. The
enableMD5 flag value remains unchanged.
mmdas service update s3 --enableMD5 --scaleFactor 2
- Workaround
- Update the S3 service with the scaleFactor and enableMD5/disableMD5 flags individually, one after another. For example:
mmdas service update s3 --enableMD5
mmdas service update s3 --scaleFactor 2
mmdas command fails with the error "Something went wrong while processing the request"
After the IBM Spectrum Scale DAS deployment, when you run any mmdas command, the command might fail.
mmdas service list
Something went wrong while processing the request.
Check 'ibm-spectrum-scale-das-endpoint' pod logs in 'ibm-spectrum-scale-das' namespace for more details
curl -k -u s3-admin -X GET -H "accept: application/json" https://<ibm-spectrumscale_host>/scalemgmt/v2/das/services
Enter host password for user 's3-admin':
Error 403: SRVE0295E: Error reported: 403
The HTTP return code 403 (Forbidden) indicates that the user is locked after multiple
login attempts with an invalid password.
- Workaround
-
- Remove the s3-admin user from the GUI pods in the IBM Spectrum Scale namespace and create a new user, as shown in the following example:
oc exec -c liberty ibm-spectrum-scale-gui-0 -n ibm-spectrum-scale -- /usr/lpp/mmfs/gui/cli/rmuser s3-admin
EFSSG0021I The user s3-admin has been successfully removed.
EFSSG1000I The command completed successfully.
oc exec -c liberty ibm-spectrum-scale-gui-0 -n ibm-spectrum-scale -- /usr/lpp/mmfs/gui/cli/lsuser
EFSSG0100I There are no values to return.
oc exec -c liberty ibm-spectrum-scale-gui-0 -n ibm-spectrum-scale -- /usr/lpp/mmfs/gui/cli/mkuser s3-admin -p Passw0rd -g 'ProtocolAdmin'
EFSSG0019I The user s3-admin has been successfully created.
EFSSG1000I The command completed successfully.
oc exec -c liberty ibm-spectrum-scale-gui-0 -n ibm-spectrum-scale -- /usr/lpp/mmfs/gui/cli/lsuser
Name      Long name  Password status  Group names    Failed login attempts  Disable Password Expiry  Target Feedback Date
s3-admin             active           ProtocolAdmin  0                      FALSE
EFSSG1000I The command completed successfully.
- Delete the das-gui-user secret from the IBM Spectrum Scale DAS namespace, then create a new secret, as shown in the following example:
oc delete secret das-gui-user
oc -n ibm-spectrum-scale-das create secret generic das-gui-user --from-literal=username='s3-admin' --from-literal=password='Passw0rd'
Performance degradation for read of small objects
When using Red Hat OpenShift Data Foundation (ODF) 4.12 with IBM Spectrum Scale DAS 5.1.7, performance degradation might be observed
when reading small objects (approximately 4 KB in size). This issue is caused by
changes made for NooBaa in Red Hat OpenShift Data Foundation (ODF) 4.12. A fix for this issue might be
provided with newer versions of Red Hat OpenShift Data Foundation (ODF).
- Workaround
-
This is currently a limitation in Red Hat OpenShift Data Foundation (ODF) 4.12.
IBM Spectrum Scale DAS 5.1.7 pods run into CrashLoopBackOff error or mmdas command fails on fresh install/upgrade of IBM Spectrum Scale DAS
After a fresh installation or upgrade of IBM Spectrum Scale DAS 5.1.7,
you might notice one or both of the following symptoms:
- One or more pods in the ibm-spectrum-scale-das namespace are in the CrashLoopBackOff state.
- The mmdas command might hang or return an error message, as follows:
# mmdas service list
Something went wrong while processing the request.
Check 'ibm-spectrum-scale-das-endpoint' pod logs in 'ibm-spectrum-scale-das' namespace for more details
- Workaround
-
This issue might be caused by the network policy introduced in the IBM Spectrum Scale DAS 5.1.7 release. To work around this issue, perform the following steps:
- Apply the latest IBM Spectrum Scale DAS manifest file from the IBM GitHub repository:
# oc apply -f https://raw.githubusercontent.com/IBM/ibm-spectrum-scale-container-native/v5.1.7.0/generated/das/install.yaml
- Check if there are network policies in the ibm-spectrum-scale-das namespace:
# oc get networkpolicy -n ibm-spectrum-scale-das
NAME                                      POD-SELECTOR   AGE
ibm-spectrum-scale-das-nwpolicy-egress    <none>         16s
ibm-spectrum-scale-das-nwpolicy-ingress   <none>         16s
Delete network policies if they are present:
# oc delete networkpolicy -n ibm-spectrum-scale-das ibm-spectrum-scale-das-nwpolicy-egress ibm-spectrum-scale-das-nwpolicy-ingress
networkpolicy.networking.k8s.io "ibm-spectrum-scale-das-nwpolicy-egress" deleted
networkpolicy.networking.k8s.io "ibm-spectrum-scale-das-nwpolicy-ingress" deleted
- Restart all the pods in the ibm-spectrum-scale-das namespace:
# oc delete pods --all -n ibm-spectrum-scale-das