Topic
  • 54 replies
  • Latest Post - ‏2019-12-09T18:19:03Z by gpfs@us.ibm.com
gpfs@us.ibm.com

Pinned topic IBM Spectrum Scale V4.2.3 announcements

‏2017-04-07T22:34:25Z |
Updated on 2017-04-07T22:44:40Z by gpfs@us.ibm.com
  • gpfs@us.ibm.com

    Re: IBM Spectrum Scale V4.2.3 announcements

    ‏2017-04-07T22:40:21Z  

    Flash (Alert) IBM Spectrum Scale (GPFS) V4.2.3  AFM with DMAPI-enabled file system causes mmfsd daemon failure on gateway node

    Abstract

    IBM has identified an issue with AFM on a DMAPI-enabled file system in IBM Spectrum Scale V4.2.3, where health monitoring is enabled by default, that causes an mmfsd daemon assert on the gateway node. This issue may cause the mmfsd daemon on the gateway nodes to fail and restart repeatedly.

     

    See the complete Flash at:  http://www-01.ibm.com/support/docview.wss?uid=ssg1S1010103

  • gpfs@us.ibm.com

    Re: IBM Spectrum Scale V4.2.3 announcements

    ‏2017-05-30T21:59:44Z  

    Flash (Alert): IBM Spectrum Scale (GPFS): Asynchronous I/O write: file size change may not be updated

    Abstract

    IBM has identified an issue with IBM Spectrum Scale V4.1.0.4 through V4.1.1.14 and V4.2.0.0 through V4.2.3.0 levels when asynchronous Direct I/O is used to write to a file on Linux, using the io_submit interface.

    Problem Summary:

    As a result of an asynchronous Direct I/O write using the io_submit interface, a file size change may not be updated correctly, leading to possible undetected loss of data when a node fails before the file size change is committed to disk.

     

    See the complete Flash at: http://www-01.ibm.com/support/docview.wss?uid=ssg1S1010223

    Updated on 2017-05-30T22:00:14Z by gpfs@us.ibm.com
  • gpfs@us.ibm.com

    Re: IBM Spectrum Scale V4.2.3 announcements

    ‏2017-06-30T15:53:15Z  

    Flash (Alert): IBM Spectrum Scale (GPFS): On I/O with file size change on metanode takeover, the file size change may not be committed to disk

    Abstract

    IBM has identified an issue with IBM GPFS and IBM Spectrum Scale versions in which a file size change that happens during a small timing window at the non-metanode may not be committed to the disk during metanode takeover.

    Problem Summary:

    A very small timing window exists in which a file size change, resulting from an append that happens on a non-metanode, may not be committed to the disk when that node takes over as metanode, leading to possible undetected data loss.  If the data involved uses GPFS encryption (IBM GPFS 4.1.x or later), this may also result in undetected data corruption. 

    See the complete Flash at:  http://www.ibm.com/support/docview.wss?uid=ssg1S1010293

  • gpfs@us.ibm.com

    Re: IBM Spectrum Scale V4.2.3 announcements

    ‏2017-06-30T15:56:32Z  

    Flash (Alert): IBM Spectrum Scale (GPFS): RDMA-enabled network adapter failure on the NSD server may result in file IO error

    Abstract

    IBM has identified an issue with all IBM GPFS and IBM Spectrum Scale versions where the NSD server is enabled to use RDMA for file IO and the storage used in your GPFS cluster accessed via NSD servers (not fully SAN accessible) includes anything other than IBM Elastic Storage Server (ESS) or GPFS Storage Server (GSS); under these conditions, when the RDMA-enabled network adapter fails, the issue may result in undetected data corruption for file write or read operations.

    Problem Summary:

    As a result of a logic error when the NSD server is processing a file write or read request after the RDMA-enabled network adapter fails, the data intended to be written to the file will not be written or the data intended to be read from the file will not be read, and in both cases, the application file operation will return success as if the file IO was completed.

    See the complete Flash at:  http://www.ibm.com/support/docview.wss?uid=ssg1S1010233

  • gpfs@us.ibm.com

    Re: IBM Spectrum Scale V4.2.3 announcements

    ‏2017-07-26T16:25:19Z  

    Flash (Alert): IBM Spectrum Scale and IBM Elastic Storage Server:  Performance monitoring component potentially running fileset quota sensor on more than one node causing additional load 

    Abstract

    IBM has identified a configuration issue with the performance monitoring component in IBM Spectrum Scale V4.2.2 and V4.2.3, and IBM Elastic Storage Server (ESS) 5.0 and 5.1.  The default configuration is monitoring the fileset quota usage hourly from all nodes in the cluster instead of just from one node.  This issue may cause more load on the system than needed and can contribute to an overload situation.

    See the complete Flash at: http://www.ibm.com/support/docview.wss?uid=ssg1S1010423

  • gpfs@us.ibm.com

    Re: IBM Spectrum Scale V4.2.3 announcements

    ‏2017-08-02T19:14:10Z  

    Flash (Alert):  IBM Spectrum Scale: AFM incorrectly replicates rename operations

    Abstract: 

    IBM has identified an issue with AFM in IBM Spectrum Scale V4.1.1.12 through V4.1.1.15 and V4.2.2.0 through V4.2.3.2 levels where AFM might incorrectly replicate the rename operations. This issue may cause undetected data loss due to some files missing at the target site.

    Problem Summary:

    If a file is renamed twice under the same directory, AFM might incorrectly rename the file at the target site. For example, if the rename sequence is file1 -> file2 and file3 -> file1, then file1 might be missing at the target site. This happens only if multiple threads are executing these rename operations in parallel and the second rename is executed before the first one due to a missing dependency check.
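
The effect of the reordering can be illustrated with a small toy model. This is only a sketch under stated assumptions, not AFM's replication logic: the target directory is modeled as a plain name-to-content mapping, the helper `apply_renames` and the file contents "A" and "C" are hypothetical, and a rename is assumed to replace an existing destination entry, as POSIX rename does.

```python
def apply_renames(directory, renames):
    """Apply (src, dst) renames in order to a copy of a name -> content mapping."""
    result = dict(directory)
    for src, dst in renames:
        if src in result:
            # Assumption: like POSIX rename, an existing destination is replaced.
            result[dst] = result.pop(src)
    return result


# Hypothetical starting state at the target site.
target = {"file1": "A", "file3": "C"}

# The sequence from the Problem Summary: file1 -> file2, then file3 -> file1.
ops = [("file1", "file2"), ("file3", "file1")]

in_order = apply_renames(target, ops)
out_of_order = apply_renames(target, list(reversed(ops)))

print("in order:    ", sorted(in_order.items()))
print("out of order:", sorted(out_of_order.items()))
```

In this model the in-order replay ends with both file1 and file2 present, while the reversed replay ends with only file2, so file1 is missing and the original content of file1 has been overwritten.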

    See the complete Flash at:  http://www.ibm.com/support/docview.wss?uid=ssg1S1010426

  • gpfs@us.ibm.com

    Re: IBM Spectrum Scale V4.2.3 announcements

    ‏2017-08-14T13:25:58Z  

    Security Bulletin: A vulnerability in Samba affects IBM Spectrum Scale SMB protocol access method (CVE-2017-9461)

    Summary

    A Samba vulnerability affects IBM Spectrum Scale SMB protocol access method which could allow denial of service, caused by
    improper handling of dangling symlinks in smbd. A remote attacker could exploit this vulnerability to cause a fd_open_atomic infinite
    loop with high CPU usage and memory consumption on the system.


    See the complete bulletin at http://www-01.ibm.com/support/docview.wss?uid=ssg1S1010376

  • gpfs@us.ibm.com

    Re: IBM Spectrum Scale V4.2.3 announcements

    ‏2017-08-21T21:18:45Z  

    Flash (Alert):  IBM Spectrum Scale (GPFS) V4.2.3: failures in scanning file system metadata may result in file system data or metadata corruption

    Abstract:

    IBM has identified a problem with the GPFS file system metadata scanning function in IBM Spectrum Scale 4.2.3.0 - 4.2.3.3 which may result in silent file system data corruption or metadata corruption on certain failures.

    Problem Summary

    The first step in rebalancing or restoring the replication factor of all files in a file system (for example, using the mmrestripefs command) is to scan and repair the file system metadata. If an unrecoverable error occurs during this scan, some file system metadata blocks may be incorrectly de-allocated. If these blocks are reused by other files, corruption may occur in random locations. The consequences may range from corruption of file system metadata, to silent file data corruption, or potential loss of the entire file system being processed. An unrecoverable error, when it occurs, can be observed in the command output and mmfs.log. "Not enough memory to allocate internal data structure" is one of the unrecoverable errors, which indicates that the GPFS pagepool is not large enough for the current workload.
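
Because the error text is quoted above, a saved copy of the command output or of mmfs.log can be searched for it. The following is a minimal sketch, not an IBM tool: the helper names `scan_lines` and `scan_log` are hypothetical, the log path is left to the caller because its location on a given system is an assumption to verify locally, and only the one message quoted above is matched, so other unrecoverable errors would not be found.

```python
# Message quoted in the Problem Summary above.
MARKER = "Not enough memory to allocate internal data structure"


def scan_lines(lines, marker=MARKER):
    """Return (line_number, text) for every line that contains the marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if marker in line
    ]


def scan_log(path, marker=MARKER):
    """Scan a log file on disk; undecodable bytes are replaced, not fatal."""
    with open(path, "r", errors="replace") as handle:
        return scan_lines(handle, marker)
```

A call such as `scan_log(path_to_mmfs_log)` returns an empty list when the message is absent. An empty result does not rule out the problem, since the Flash lists this message only as one example.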

     

    See the complete Flash at:  http://www.ibm.com/support/docview.wss?uid=ssg1S1010487

    Updated on 2017-08-21T21:19:31Z by gpfs@us.ibm.com
  • gpfs@us.ibm.com

    Re: IBM Spectrum Scale V4.2.3 announcements

    ‏2017-08-24T20:54:49Z  

    Flash (Alert):  IBM Spectrum Scale V4.2.3.1 and V4.2.3.2 install toolkit on ppc64 platform (/usr/lpp/mmfs/4.2.3.x/installer/spectrumscale) might fail when attempting to run ./spectrumscale install, deploy or upgrade

    Abstract:
    IBM has identified an issue with V4.2.3.1 and V4.2.3.2 of the Spectrum Scale install toolkit on ppc64 in which the install toolkit may fail if it is run multiple times. This can occur during install, deploy or upgrade; it gives a FATAL message to the user and prevents the desired task from completing.

    Problem Summary:

    The Spectrum Scale install toolkit could potentially fail to perform an install or deploy task due to a failure in the toolkit internal license logic, resulting in the following error message: "[ FATAL ] license cannot be installed on given GPFS cluster as it's already running different version of license".


    See the complete Flash at:  http://www.ibm.com/support/docview.wss?uid=ssg1S1010505

    Updated on 2017-08-24T20:55:25Z by gpfs@us.ibm.com
  • gpfs@us.ibm.com

    Re: IBM Spectrum Scale V4.2.3 announcements

    ‏2017-08-26T00:24:57Z  

    GPFS 4.2.3.4 is now available from IBM Fix Central:

    http://www-933.ibm.com/support/fixcentral

    Problems fixed in GPFS 4.2.3.4

    August 25, 2017

    * Avoid erroneous FSSTRUCT error in rare cases after a SG panic.
    * Fix a problem in which ESS server node deadlocks with many threads showing 'wait for log buffers'.
    * Fix a memory corruption issue that can occur during/after a reconnect.
    * Fix a logAssert "!IsMemoryMappingFree" which is caused by a race between mmshutdown and 'tsctl nqStatus'.
    * Fix a possible GPFS daemon assert that can occur while running the mmdelsnapshot command. The assert can happen when prefetch is reading from a snapshot that is being deleted.
    * Fix an issue in AFM ADR environment where secondary mount failure causes kernel crash.
    * Ensure a copy of the keystone and auth config is created when the Object protocol is disabled and uninstalled from the protocol environment.
    * Fix an issue in AFM environment where unresponsive remote mount causes synchronous operations like Reads to fail intermittently.
    * Address a problem in resync/failover/changeSecondary in which recreating a deleted file at home/secondary might cause an invalid memory access and crash the daemon.
    * Fix a condition where cNFS on SLES12 or later fails to restart statd.
    * Fix deadlock that can occur during inode cleanup in Linux kernel 3.13 and later.
    * Address a problem where applyupdates should not be started until failback --start is completed successfully.
    * Fix an incorrect assertion which can go off when the file system manager is brought down while running one of the following commands mmrestripefs/mmdeldisk/mmrpldisk.
    * Fix an issue in the AFM environment where a fileset unlink or an unresponsive remote mount can cause a deadlock.
    * Improve snapshot command error reporting when batching is used.
    * Fix a problem with the GPFS file system metadata scanning function in IBM Spectrum Scale 4.2.3.0 - 4.2.3.3 which may result in file system data or metadata corruption on certain failures, such as running out of GPFS pagepool memory while running mmrestripefs, mmdeldisk, mmrpldisk, or mmadddisk -r.
    * Return ESTALE when file referenced by FileHandle is deleted instead of ENOENT.
    * Fix a kernel assert caused by missing buffer lock checking.
    * Address a problem where applyUpdates generates operations on files/dirs that were removed from primary - but never played to secondary and later applyUpdates fail to pull such files/dirs back.
    * Fix a problem in which GPFS was returning EBADF when Ganesha provided an fd which is not a GPFS fd.
    * Fix an mmfsd daemon crash that occurs after upgrading the GNR code level and issuing mmchconfig release=LATEST.
    * Fix an issue in the AFM environment where mmgetstate or mmdiag commands cause a daemon crash if handlers are being disabled.
    * Fix mmsetquota to work with non-standard username.
    * Fix a deadlock issue in the AFM environment that occurs when a new gateway node joins the cluster and takes over the fileset from an existing gateway node where the workload is running.
    * Change in lockd behavior on SLES12 SP2 may cause it to reboot while recovering another cNFS node.
    * Fix mmlsquota -u to work with a non-standard username.
    * Fix for missing arping command path declaration for CentOS.
    * Fix mmlscluster to correctly output the "Remote shell command" line.
    * This fix reduces the snap processing time for clusters which have the Object protocol deployed with the Unified File and Object feature enabled.
    * Fix a problem in which mmsmb exportacl list shows SID instead of user name for all users but the first.
    * Fix a memory leak in GNR that may cause mmfsd heap memory usage to increase over time, particularly when the workload does many small writes. The problem occurs in an ESS or GSS environment.
    * Fix a daemon crash in the AFM environment where a replication error or a fileset unlink causes a memory handler to be accessed incorrectly.
    * Fix a bug where failure to execute the mmdevdiscover script resulted in all pdisks temporarily losing their I/O paths. This caused the workload to pause while paths were recovered. Sometimes it caused the recovery group to fail over to the backup node. In a few instances, it resulted in the unmount of the file system, requiring manual intervention to restore service.
    * This update addresses the following APARs: IV98545 IV98609 IV98640 IV98641 IV98643 IV98683 IV98684 IV98685 IV98686 IV98687 IV98701 IV99044 IV99059 IV99060 IV99062 IV99063.
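
One item in the list above changes the error returned for a deleted file referenced by a FileHandle from ENOENT to ESTALE. Application code that checks for only one of the two values may behave differently across fix levels. A hedged sketch of a caller-side check that accepts both follows; the helper name `file_is_gone` is hypothetical, and which errno a particular operation actually returns should be confirmed on the system in question.

```python
import errno

# Both values are treated as "the file no longer exists".
GONE_ERRNOS = {errno.ENOENT, errno.ESTALE}


def file_is_gone(exc):
    """Return True if exc is an OSError reporting a deleted or stale file."""
    return isinstance(exc, OSError) and exc.errno in GONE_ERRNOS
```

A caller would typically use it inside an `except OSError as exc:` block and re-raise when `file_is_gone(exc)` is false.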

     

  • gpfs@us.ibm.com

    Re: IBM Spectrum Scale V4.2.3 announcements

    ‏2017-09-22T20:05:56Z  

    Flash (Alert):  IBM Spectrum Scale (GPFS) AFM incorrectly replicates data when write and truncate operations are interleaved

    Abstract:

    IBM has identified an issue with AFM in IBM GPFS (V3.5.0.0 through V3.5.0.34, or V4.1.0.0 through V4.1.0.8), or IBM Spectrum Scale (V4.1.1.0 through V4.1.1.16, or V4.2.0.0 through V4.2.3.4) levels where AFM might not transfer write operations completely when a file is truncated. This may cause a data mismatch between cache (or primary) and home (or secondary). This issue may result in undetected data corruption at home (or secondary).
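
The affected ranges listed above can be checked mechanically. The sketch below assumes that a level is always written as four dot-separated integers with an optional leading "V"; the names `parse_level`, `AFFECTED_RANGES` and `is_affected` are hypothetical. Levels are compared as integer tuples, because comparing them as strings would order 4.1.1.16 before 4.1.1.2.

```python
def parse_level(text):
    """Turn 'V4.2.3.4' or '4.2.3.4' into the tuple (4, 2, 3, 4)."""
    return tuple(int(part) for part in text.strip().lstrip("Vv").split("."))


# Inclusive ranges copied from the Abstract above.
AFFECTED_RANGES = [
    ("3.5.0.0", "3.5.0.34"),
    ("4.1.0.0", "4.1.0.8"),
    ("4.1.1.0", "4.1.1.16"),
    ("4.2.0.0", "4.2.3.4"),
]


def is_affected(level):
    """Return True if the level falls inside any of the listed ranges."""
    version = parse_level(level)
    return any(
        parse_level(low) <= version <= parse_level(high)
        for low, high in AFFECTED_RANGES
    )


print(is_affected("V4.2.3.4"))
print(is_affected("4.2.3.5"))
```

The check says nothing about whether AFM is in use on a cluster; it only compares the code level against the ranges in this Flash.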

     

    See the complete Flash at:  http://www.ibm.com/support/docview.wss?uid=ssg1S1010629

  • gpfs@us.ibm.com

    Re: IBM Spectrum Scale V4.2.3 announcements

    ‏2017-09-25T18:11:08Z  

    GPFS 4.2.3.3 is now available from IBM Fix Central:

    http://www-933.ibm.com/support/fixcentral

    Problems fixed in GPFS 4.2.3.3

    July 24, 2017

    * Fix an issue in the AFM environment where lookup and metadata operations on the same file from different nodes cause a daemon assert.
    * Allow a user to specify the afmrpo interval in weeks[W], days[D] or hours[H].
    * Fix a fcntl revoke handler exception that can occur after an EIO error.
    * Fix a pdisk firmware version attribute issue. After the pdisk firmware level is changed, mmlspdisk still shows the old version which may mislead the administrator.
    * Change EIO to ESTALE for the open operation of a file that was deleted.
    * Fix a race condition that causes command to fail with "invalid version on put".
    * Don't allow kernel modules to cleanup when removing gpfs.gpl if gpfs.gplbin is currently installed.
    * Fix an Assert: exp(dm != inv) L-813 in ../fs/fsop.C which can occur when trying to resend a read event.
    * Fix an mmfsd crash due to Signal 6 (abort). This can happen when removing socket connections in a CCR environment.
    * Fix a very rare fault that can occur during heavy directory update workloads.
    * Fix a problem in which mmchfirmware --type storage-enclosure fails if running in adminMode=central where only one node has ssh privileges. This fix applies to GSS/ESS customers.
    * The mmlsmount command has been changed on all platforms. The change affects only the output format of the -Y argument, and only when an IPv6 address is used.
    * This change does not allow mmchcluster -p LATEST on a CCR enabled cluster.
    * Fix a problem in which unnecessary allocation manager cursors are being consumed.
    * Fix an assert "ioStatsP == __null" that can occur when creating a file system with "-v yes" option, after the "fastest" readReplicaPolicy is enabled.
    * Fix an err 112 rename failure that can occur during recovery for an IW fileset.
    * Fix an AIX kernel crash due to assert "freeing vnode not on gnode list".
    * Fix an issue in the AFM environment where an unresponsive target causes a queue to be dropped during the attribute setting.
    * Fix a problem where a verbsRdmaSend-enabled node sent excessive nsdMsgRdmaPrepare to an AIX node.
    * Fix a replicas mismatch problem caused by mmrestripefs -b wrongly resetting the missupdate flag.
    * Fix CCR and/or mmsdrcli-RPC request errors that can occur during authentication of the incoming socket connections.
    * Address a problem where renames across directories do not reset the dirty bit which in future leads to a big list of dirty directories and hence recovery on AFM filesets might take longer to scan.
    * Fix an issue in the AFM environment where UIDs in ACLs are not remapped during replication over NSD protocol when UID remapping is enabled.
    * When an AFM fileset is to be converted to a regular independent fileset, first check for incomplete dirs, uncached files and orphans. If any are found, inform the user to run prefetch prior to the conversion.
    * Fix a condition where mmautoload may hang.
    * Fix a mmlogsort failure that can occur when mmlogsort attempts to query the time zone information on a node that is down.
    * Fix a gpfs.snap command failure in a sudo wrapper environment if the legacy log timestamp format is used.
    * Fix a divide by zero problem, when running mmrestripefs, which is specific to a file system using directly attached disks only with no NSD servers defined.
    * Fix a dynassert 'mmapFlushSXLock.isLockedShared' which may fail as a secondary failure while daemon is shutting down.
    * Fix a problem in which a CES IP could not be removed from a node. This can occur when problems are occurring during a CES IP move or failover (e.g. network issues, CCR issues, quorum loss). Subsequent runs of mmcesnetworkmonitor did not fix this, and the IP remained active on a node where it should not.
    * This fix adds more group memberships (up to 2048) on AIX.
    * Fix a "freeSpace != __null" assert. This issue could only happen when doing file system rebalance after suspending some disks.
    * Fix a problem in which you get an Unable to create file in fileset error even if the inode limit is not reached. This is most likely to occur if the user fills up the fileset from a single node.
    * Fix mmcommon test scpwrap.
    * This update addresses the following APARs: IV97601 IV97676 IV97677 IV97678 IV97680 IV97681 IV97682 IV97683 IV97685 IV97693 IV97808 IV97836 IV98052 IV98053 IV98054 IV98058.

     

  • gpfs@us.ibm.com

    Re: IBM Spectrum Scale V4.2.3 announcements

    ‏2017-09-25T18:14:29Z  

    GPFS 4.2.3.2 is now available from IBM Fix Central:

    http://www-933.ibm.com/support/fixcentral

    Problems fixed in GPFS 4.2.3.2

    June 14, 2017

    * Address a problem where AFM recovery stalls on a read of an IW fileset when it waits to fetch the file from home after the recovery completes.
    * Improve a conditional ccr update for CES IPs list file.
    * Fix a problem that causes a RenameHandler long waiter. This can occur if PIT is in progress.
    * Fix a Ganesha crash that can occur when the user enters a string which contains a colon in any mmnfs command that requires a client option or client list string.
    * Fix an assert that can occur when changing gateway nodes to non-gateway nodes while operations are being performed on an IW fileset and then the non-gateway nodes are turned back into gateway nodes.
    * Fix a rare deadlock that can occur between a thread handling mmap and a thread handling a memory map pagefault.
    * Fix an E_STALE failure that can occur during a DMAPI dm_read_invis.
    * Fix long waiters that can occur on a very busy system doing background snapshot deletion.
    * Fix a case where GPFS skipped shrinking the last data block, which causes excess space to be consumed.
    * Improve the mmsetquota error message that occurs when a block limit larger than 909T is specified in 'T' units.
    * Fix an assert that can occur when mmcheckquota and mmrepquota are passed fileset ids from deleted filesets.
    * Fix an issue in the AFM environment where afmHardMemThreshold configuration value is not honored and more memory is used than specified.
    * Increase the wait time for commands to execute, before failing.
    * Correct formatting of large call counts reported by "mmfsadm vfsstats".
    * Fix long waiters that can occur on a very busy system after a file system panic.
    * Fix a problem in which inodes become Busy after unmount with NFS and immutable files.
    * mmkeyserv: Make it possible to set certain attributes to the default with the use of "delete" or "default" keyword.
    * Fix a problem in which mkdir, creates, and resync can fail during revalidation from cache/primary to home/secondary in newer kernels 3.18 or later.
    * Fix a problem in which GPFS can not handle errors that occur when a DM application was unable to retrieve data due to offline tape.
    * Suppress repeated message "Expanded ... inode space N from X to Y inodes" in mmfs.log.
    * Fix a rare quota management deadlock caused by error conditions such as out of disk space.
    * Fix an issue in the AFM+HSM environment where resync/failover/changeSecondary commands fail to replicate migrated files.
    * Fix an issue in the AFM environment where a fileset force unlink could cause the daemon to crash.
    * Address a problem where a gateway node can assert/crash when having more than 1024 active fileset operations occurring across different filesets on a single gateway node.
    * Fix a problem in which gpfs.snap may not gather mmfs logs on AIX nodes correctly.
    * Fix a clone parent file deletion performance issue.
    * Fix a problem in which fsetxattr failed with ENOENT using a fd of an unlinked file.
    * Fix the Assert exp(fileLockHeld != LkObj::nl) in fetchBufferM() that can occur when compression is being used.
    * Fix a problem where DMAPI invis read/write fails with an err 22 when called from a non-session node.
    * Fix a problem in which the mmnetverify command does not correctly verify remote addresses when running many tests in parallel.
    * Fix a deadlock that is very rare and can occur after running snapshot commands.
    * Fix a policy problem which causes the LOWDISKSPACE callback to not trigger after a fs manager takeover when the old fs manager fails because of an abort or a lost connection.
    * This update addresses the following APARs: IV96355 IV96416 IV96417 IV96418 IV96419 IV96420 IV96425 IV96426 IV96429 IV96472 IV96473 IV96474 IV96476 IV96482 IV96483 IV96487 IV96488 IV96585 IV96761 IV96762 IV96763 IV96764 IV96783 IV96786 IV96791.

     

  • gpfs@us.ibm.com

    Re: IBM Spectrum Scale V4.2.3 announcements

    ‏2017-09-25T18:17:28Z  

    GPFS 4.2.3.1 is now available from IBM Fix Central:

    http://www-933.ibm.com/support/fixcentral

    Problems fixed in GPFS 4.2.3.1

    May 16, 2017

    * Fix a Ganesha crash caused by an applyUpdate.
    * Fix a ccrio initialization failure (err 811) when changing the daemon-interface.
    * Fix a rare segmentation fault in the mmgetstatus command.
    * Fix a SIGBUS error that can occur during a mmap read on a snapshot file.
    * Fix a problem in which a flood of "failed to scrub vdisk" log messages is seen when a GNR node experiences quorum loss. This is for ESS/GSS.
    * Fix a rare race between unlink, lookup and token revoke which causes a kernel crash in d_revalidate.
    * This fix ensures that Ganesha requests reference a valid GPFS file system.
    * Fix a system hang that can occur when a file system is suspended while doing a mmap.
    * This fix rejects unreasonably large requests to preallocate inodes immediately with ENOSPC.
    * Fix a directory rename issue with IW filesets that can occur if the rename target is an existing directory.
    * Fix a fault that can occur when restripe runs while the SG is not mounted on all NSD nodes.
    * This fix restricts the afmMaxParallelRecoveries config value to the range 0 to 128.
    * This fix removes the unnecessary error message "cannot open /proc/net/tcp6" when shutting down GPFS.
    * Fix a problem with not properly handling quotas in an AFM environment. This can occur when you have very large hard and soft limit values.
    * Fix a "exp(!sgP->isSGMgr())" assert that can occur when you delete a file system and then create a new file system with the same name at the same time.
    * Fix an err 112 that can show up in the mmfs logs when mmchnode --gateway is executed.
    * Fix a kernel crash that can occur while attempting to mount a loop device to a corresponding file in a GPFS file system or while using a GPFS file system file as a LIO backend.
    * Address a problem where applyUpdates continues to run even if the fileset at the old primary is unlinked or the mmfs daemon has been shut down.
    * Fix an outband resync failure that can occur if a recovery is triggered by deleting some files in a directory and the directory itself. This is an AFM/DR environment.
    * Fix rename conflicts that can occur in SW/DR filesets.
    * Update log code to prevent a log recovery error when a log file becomes illReplicated. This could happen on a file system with -K set to NO when there is not enough disk space for full replication.
    * This fix uses a new interface that reduces the multiple retries that occur every time a lock is freed and there are multiple waiters for the lock.
    * Fix an assert that can occur with a DR fileset when the file system is suspended.
    * Fix a bug that requires a large free space in /var/mmfs to run change commands.
    * Fix recovery failure err 17 when psnap0 deletion fails.
    * Fix a daemon assert that can occur in an AFM environment where the mmfsd daemon fails to start repeatedly with a DMAPI-enabled file system at a gateway node.
    * Address a problem where trying to queue a writeSplit message to the helper gateway's queue can fail with an error 28 (E_NOSPAC).
    * Fix an issue which returns EACCES (errno = 13) while running mmapplypolicy when there is a mounted NFS file system which has the same name as a GPFS file system.
    * A fastpath optimization defect can result in an internal error being returned to the user when it is safe to continue without entering the fast path.
    * Fix a problem in which mmapplypolicy/tspolicy hangs after otherwise finishing all work.
    * cNFS: fix a problem with /usr/sbin/rpcinfo not found in SLES12 or later.
    * Fix a failure in Object Authentication configuration with Active Directory or LDAP. This fix is only required if Object is being configured with Active Directory or LDAP and the DN of the Swift service user (specified in --ks-swift-user) is more than 79 characters.
    * Fix a problem with ESS disk replacement in which the mmchcarrier command may wipe out the pdisk location code. The problem prevents the subsequent mmchcarrier command from proceeding without a valid location code.
    * Fix a problem in which a GPFS command may wrongly terminate another process.
    * Fix a rare deadlock problem caused by stream write (enableRepWriteStream=yes).
    * Update log recovery code to avoid GPFS daemon assert after detecting invalid directory block during log recovery. Code has been changed to log a FSSTRUCT error and fail the log recovery so offline mmfsck can be run on the file system.
    * Fix an mmfsd crash (incompleteOk assertion) that occurs when the number of files in the committed directory doesn't match the number of files in CCR's file list in case of a new CCR file update request.
    * This update addresses the following APARs: IV94991 IV94992 IV94994 IV94995 IV94996 IV94997 IV94998 IV95015 IV95021 IV95230 IV95557 IV95643 IV95925 IV96037 IV96163.

     

  • gpfs@us.ibm.com

    Re: IBM Spectrum Scale V4.2.3 announcements

    ‏2017-09-25T18:26:58Z  

    IBM Spectrum Scale 4.2.3.0 is now available from IBM Fix Central:

    http://www-933.ibm.com/support/fixcentral

    This topic summarizes changes to the IBM Spectrum Scale licensed
    program and the IBM Spectrum Scale library.

    Summary of changes
    for IBM Spectrum Scale version 4 release 2.3
    as updated, April 2017

    Changes to this release of the IBM Spectrum Scale licensed
    program and the IBM Spectrum Scale library include the following:

    Addition of a new topic for AFM DR regarding failback of
    multiple filesets
              A new topic was added that
              details a method for minimizing
              application downtime during the failback procedure.

    Authentication considerations changes
              The following changes are done:
                * Authentication support matrix
                  has been divided to separate
                  out the File and object protocols and accordingly,
                  the corresponding explanation is modified.
                * The matrix is further divided
                  based on the authentication
                  service that is used.
                * A diagram is added to explain the high level flow of
                  authentication for File protocols.
                * "Authentication for file access" topic is renamed to
                  "Authentication and ID mapping for file access".

    Directory preallocation
              In environments in which many files are
              added to and removed from
              a directory in a short time, you can
              improve performance by setting
              the minimum compaction size of the
              directory. The minimum compaction
              size is the number of directory slots,
              including both full and empty
              slots, that a directory is allowed to retain
              when it is compacted.
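
The definition above can be read as a floor on how far compaction may shrink a directory. The following toy model is only an illustration of that reading, not GPFS behavior: the function `slots_after_compaction` is hypothetical, slots are counted individually rather than in directory blocks, and how the minimum compaction size is configured is not covered here.

```python
def slots_after_compaction(total_slots, full_slots, min_compaction_size):
    """Toy model: number of slots a directory keeps after compaction.

    Assumptions: compaction never drops full slots, never grows the
    directory, and keeps at least min_compaction_size slots if it has them.
    """
    if not 0 <= full_slots <= total_slots:
        raise ValueError("full_slots must be between 0 and total_slots")
    if min_compaction_size < 0:
        raise ValueError("min_compaction_size must not be negative")
    floor = max(full_slots, min_compaction_size)
    return min(total_slots, floor)


# A directory that grew to 1000 slots and now holds 10 entries keeps
# 256 slots in this model, so refilling it needs no new slots until 257.
print(slots_after_compaction(1000, 10, 256))
```

With a minimum of 0 the same directory would shrink to 10 slots in this model and would have to grow again on the next burst of file creation, which is the churn the setting is meant to avoid.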


    Express Edition no longer available
              IBM Spectrum Scale Express
              Edition is no longer available.

    FPO enhancements
              * Uses the mmrestripefile command to check whether
                the replicas of data blocks are matched for one file
              * Provides QoS support for autorecovery
              * Supports locality-aware data copy

    Installation toolkit support for gpfs.adv and gpfs.crypto
    packages
              The installation toolkit now
              supports installation, deployment,
              and upgrade of gpfs.adv and gpfs.crypto packages.

    Installation toolkit support for populating
    cluster definition file
              The installation toolkit now
              supports populating the cluster
              definition file with the current cluster state.

    IBM Spectrum Scale GUI changes
              The following main changes are made in the
              IBM Spectrum Scale GUI:
                * Supports mounting and unmounting of file systems
                  on selected nodes or groups of nodes by using the GUI.
                * Added new Storage > Pools page. The Pools page
                  provides details about configuration, health,
                  capacity, and performance aspects of storage
                  pools.
                * Added new Files > Active File Management page.
                  This new GUI page helps
                  to view the configuration,
                  health status, and performance of AFM, AFM DR,
                  and gateway nodes.
                * Added new Monitoring > Tips
                  page. The tip events give
                  recommendations to the user
                  to avoid certain issues
                  that might occur in the future.
                  A tip disappears from the GUI when the
                  problem behind the tip event is resolved.
                * Added option to select events
                  of type "tip" in the Settings >
                  Event Notifications > Email
                  Recipients page. You can configure
                  whether to send email to the
                  recipients if a tip event is
                  reported in the system.
                * Added detailed view in the
                  Files > Filesets page. You can
                  access the detailed view of
                  individual filesets either by
                  double-clicking the individual
                  filesets or by selecting
                  View Details option.
                * Modified the Storage > NSDs page
                  to list the rack, position,
                  and node of the NSD in an FPO-enabled
                  environment. This helps
                  to sort the NSDs based on these
                  parameters. The failure group
                  definition is also modified to accommodate these new
                  parameters.
                * Added the Customize the number of
                  replicas option in the Files
                  > Information Lifecycle page to
                  specify the number of replicas
                  in a file placement rule.
                * Modified the Settings > Event
                  Notifications page to accept both
                  IP address and host name for the email server.
                * Added Nodes and File Systems tabs
                  in the detailed view that is
                  available in the Files >
                  Transparent Cloud Tiering page.
                * Added a separate Properties tab
                  in the detailed view that is
                  available in the Monitoring >
                  Nodes, Files > File Systems, and
                  Storage > NSDs pages.

    Introduction of IBM Spectrum Scale management API Version 2
              The architecture and syntax of IBM
              Spectrum Scale management API
              is changed. The new implementation
              is based on the GUI stack. The
              GUI server manages and processes
              the API requests and
              commands. Version 2 has the following features:
                * Reuses the GUI deployment's
                  backend infrastructure, which
                  makes introduction of new API commands easier.
                * No separate configuration is
                  required as the GUI installation
                  takes care of the basic deployment.
                * Fixes scalability issues and introduces
                  new features such as
                  filter parameter, field parameter, and paging.
                * Supports large clusters
                  with thousands of nodes.
                * All POST, PUT, and DELETE
                  requests are completed
                  asynchronously.  A "jobs" object is created
                  immediately when such a request is submitted.
                * The APIs are driven by the
                  same WebSphere® server and object
                  cache that is used by the IBM Spectrum
                  Scale GUI.
                * The mmrest command is no longer
                  required for configuring the
                  management API. The IBM
                  Spectrum Scale GUI installation and
                  configuration takes care of the API
                  infrastructure configuration.
              Because the syntax and architecture
              of the API are changed, the entire
              set of commands that were available
              in Version 1 is modified.
              New API commands are also added for
              improved flexibility.
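
    Because all POST, PUT, and DELETE requests complete asynchronously, a client typically submits the request, notes the job that is created, and polls until the job finishes. The following is a minimal sketch of that polling loop, not the product's client code. It assumes the job is read back as JSON with a "jobs" list whose entries carry "jobId" and "status" fields, and that COMPLETED and FAILED are the terminal states; those field names, the state names, and the fetch_job callable are assumptions for illustration and should be checked against the API reference for your level.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal states; verify against the API reference for your release.
TERMINAL_STATES = {"COMPLETED", "FAILED"}


def wait_for_job(
    fetch_job: Callable[[int], Dict[str, Any]],
    job_id: int,
    poll_seconds: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a job until it reaches a terminal state and return its entry.

    fetch_job is a hypothetical callable that performs the GET for one job
    and returns the decoded JSON body, assumed to look like
    {"jobs": [{"jobId": 123, "status": "RUNNING"}]}.
    """
    for _ in range(max_polls):
        body = fetch_job(job_id)
        job = body["jobs"][0]
        if job["status"] in TERMINAL_STATES:
            return job
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")


if __name__ == "__main__":
    # Stand-in for the HTTP call: two RUNNING answers, then COMPLETED.
    answers = iter(["RUNNING", "RUNNING", "COMPLETED"])

    def fake_fetch(job_id: int) -> Dict[str, Any]:
        return {"jobs": [{"jobId": job_id, "status": next(answers)}]}

    print(wait_for_job(fake_fetch, 42, sleep=lambda seconds: None))
```

    Passing the HTTP call in as a function keeps the loop independent of any particular HTTP library and makes it easy to exercise without a running GUI server.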

    Linux on z Systems™ enhancements
              The following changes are made:
                * IBM Spectrum Scale for
                  Linux on z Systems now supports
                  Remote Cluster Mount (Multi-cluster).
                * SLES 12.2 and RHEL 7.3 are
                  now supported by IBM Spectrum
                  Scale for Linux on z Systems.

    mmcallhome command: Addition of --long option to
    mmcallhome group list command
              The --long option displays the long
              admin node names.

    mmchconfig command: Setting an InfiniBand partition key
              The --verbsRdmaPkey attribute
              specifies an InfiniBand partition
              key for a connection between a node
              and an InfiniBand server that
              is included in an InfiniBand partition.

    mmdiag command: Status and queue statistics for NSD
    queues
              The --nsd parameter displays the
              status and queue statistics for
              NSD queues.

    mmfsck command: Severity of errors
              The command displays a summary
              of the errors that were found
              that includes the severity of
              each error: CRITICAL, NONCRITICAL,
              or HARMLESS. You must specify the
              verbose or semi-verbose
              parameter to get this output.

    mmhealth command: Addition of new options to command
              Addition of AFM and THRESHOLD
              options to the mmhealth node show
              and mmhealth cluster show commands.
              The AFM option displays the
              health status of a gateway node or
              cluster. The THRESHOLD option
              monitors whether the node-related
              threshold rules evaluation
              is running as expected, and if
              the health status has changed
              as a result of the threshold limits
              being crossed.
              Addition of --clear option to the
              mmhealth node eventlog command.
              This option clears the event log's database.
              Addition of threshold add and
              threshold delete options to the
              mmhealth command. These options
              allow users to create and
              delete threshold rules.
              Addition of event hide, event
              unhide, and list hidden options
              to the mmhealth command. The
              event hide option hides the
              specified TIP events, while the
              event unhide option reveals
              all TIP events that were previously
              hidden. The list hidden
              option shows all the TIP events
              that are added to the list
              of hidden events.
              Addition of config interval option
              to the mmhealth command.
              The config interval option allows
              you to set the interval
              for monitoring the whole cluster.

    mmkeyserv command: Updating a certificate or a
    connection
              You can now get a fresh
              certificate from a Remote Key
              Management (RKM) server without
              rebuilding the connection.
              You can also temporarily update a
              connection by adding backup
              servers, reordering the list of
              backup servers, or changing
              the timeout, number of retries,
              or retry interval.

    mmlslicense command: Displaying disk and
    cluster size information
              You can now get information
              about disk and cluster size with
              the mmlslicense command.

    mmnetverify command: Enhancements
              Several enhancements increase
              the capabilities of the
              mmnetverify command. Network
              checks are added to measure
              the total bandwidth, to check
              connectivity with the CTDB port,
              and to check connectivity with
              servers that are used with
              the Object protocol. If there
              are multiple local nodes, the
              command is run on all the local
              nodes in parallel. The lists
              of local nodes and target nodes
              accept node classes. The
              --ces-override parameter causes
              the command to consider all
              the nodes in the configuration to
              be CES-enabled.

    mmrestripefile command: Fix inconsistencies
    between file data and replicas
              The -c option compares the data
              of individual files with their
              replicas and attempts to fix any inconsistencies.

    Monitoring of AFM and AFM DR
              Using commands:
                * Functionality added to mmhealth, mmdiag,
                  and mmperfmon.
                * New parameters added to mmpmon.
              Using IBM Spectrum Scale GUI:
                * Added new Files > Active
                  File Management page. This
                  new GUI page helps to view
                  the configuration, health
                  status, and performance of AFM,
                  AFM DR, and gateway nodes.

    Mount options specific to IBM Spectrum Scale:
    syncnfs is now the default on Linux nodes
              In the mount options specific
              to IBM Spectrum Scale,
              syncnfs is now the default on
              Linux nodes. On AIX® nodes,
              nosyncnfs is the default.

    Protocol support on remotely mounted file systems
              You can create an NFS/SMB
              export on a file system that
              is mounted from a remote cluster.

    Questionnaire for an AFM DR deployment

    Tip added to event status to inform
    users when a configuration is not optimal
              A new event type TIP is added
              to system health monitoring. A
              Tip is similar to a state-changing
              event, but can be hidden
              by the user. Like state-changing
              events, a tip is removed
              automatically if the problem is resolved.

    Quality of Service for I/O operations (QoS):
    Detailed statistics
              You can now display more
              detailed statistics about IOPS
              rates for the QoS programs
              that are running on each node.
              The statistics are intended to
              be used as input for programs
              that analyze and display data.

    Support for Samba 4.5

    Transparent cloud tiering enhancements
              The following changes are made:
                * Support for configuring and
                  deploying WORM solutions.
                  Your files will remain WORM-compliant,
                  both in the file
                  system and on the cloud object storage.
                * Support for configuring Transparent
                  cloud tiering with a proxy server.
                * Support for configuring cloud
                  retention time, which
                  overrides the default value.
                * Support for restoring only the
                  file stubs from the cloud
                  storage tier in situations where
                  files are deleted from
                  the local file system.
                * Substantial improvement in the
                  performance when files are
                  transparently recalled from the
                  storage tier.
                * Support for manually deleting
                  orphaned cloud objects
                  before retention time expires.
                * Support for migrating files in
                  the co-resident state,
                  by which applications can directly
                  access data without
                  performing any recall operation.

    -Y option
              Added the -Y option to the following commands:
                * mmblock mmhealth mmlsfileset
                * mmcloudgateway mmkeyserv mmlsfs
                * mmdf mmlscluster mmlslicense
                * mmdiag mmlsconfig mmlsmgr mmlsquota
                * mmgetstate mmlsdisk mmlsmount mmlssnapshot
                * mmlsnodeclass mmnetverify
                * mmlsnsd mmnfs mmlspolicy
                * mmrepquota mmsmb mmuserauth
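
    The -Y option produces colon-delimited output that is meant to be read by programs. The following is a minimal parsing sketch, not an official tool. It assumes that each section starts with a row whose third field is HEADER and whose remaining fields name the columns, that data rows follow the same layout, and that special characters in values are percent-encoded. The sample text is illustrative only and is not captured from a real cluster.

```python
from typing import Dict, List, Tuple
from urllib.parse import unquote

# Illustrative sample only; real output has more columns.
SAMPLE = (
    "mmgetstate::HEADER:version:reserved:reserved:nodeName:nodeNumber:state:\n"
    "mmgetstate::0:1:::node1:1:active:\n"
    "mmgetstate::0:1:::node2:2:down:\n"
)


def parse_y_output(text: str) -> List[Dict[str, str]]:
    """Turn colon-delimited -Y style output into a list of dictionaries.

    Assumes a HEADER row precedes the data rows of each section and that
    values are percent-encoded.
    """
    headers: Dict[Tuple[str, str], List[str]] = {}
    records: List[Dict[str, str]] = []
    for line in text.splitlines():
        fields = line.split(":")
        if len(fields) < 3:
            continue  # blank or malformed line
        key = (fields[0], fields[1])  # command name and section name
        if fields[2] == "HEADER":
            headers[key] = fields
            continue
        names = headers.get(key)
        if names is None:
            continue  # data row without a preceding header row
        records.append(
            {
                name: unquote(value)
                for name, value in zip(names[3:], fields[3:])
                if name and name != "reserved"
            }
        )
    return records


if __name__ == "__main__":
    for row in parse_y_output(SAMPLE):
        print(row["nodeName"], row["state"])
```

    Empty column names are skipped, so a trailing colon at the end of each line does not add a spurious field.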

    New commands
              mmclidecode

    Changed commands
                * mmblock mmhealth mmlsfileset mmlsnodeclass
                * mmcloudgateway mmkeyserv mmlsfs mmlsnsd
                * mmdf mmlscluster mmlslicense mmlspolicy
                * mmdiag mmlsconfig mmlsmgr mmlsquota mmsmb
                * mmgetstate mmlsdisk mmlsmount mmlssnapshot
                * mmbackup mmcallhome mmcesdr mmchattr
                * mmchqos mmcrnsd mmfsck mmgetstate
                * mmimgbackup mmimgrestore mmkeyserv
                * spectrumscale mmnetverify mmrepquota
                * mmuserauth mmchconfig mmhadoopctl
                * mmprotocoltrace mmnfs

    Changed structures
              gpfs_iattr64_t

    Changed subroutines
              gpfs_prealloc

    Deleted commands
              mmrest

    New messages
              6027-1525, 6027-1756, 6027-2392, 6027-2393,
              6027-2503, 6027-2504, and 6027-3258

    Changed messages
              6027-1023, 6027-1725

     

  • gpfs@us.ibm.com

    Re: IBM Spectrum Scale V4.2.3 announcements

    ‏2017-10-03T22:22:48Z  

    Technote (Troubleshooting):  IBM Spectrum Scale: SMB cluster export services (CES) must not be upgraded concurrently

    Problem (Abstract):
    SMB cluster export services (CES) within IBM Spectrum Scale must not be upgraded concurrently nor have differing versions of the gpfs.smb rpm active across multiple CES nodes at once.

     

    See the complete Technote at:  http://www.ibm.com/support/docview.wss?uid=ssg1S1010619

  • gpfs@us.ibm.com

    Re: IBM Spectrum Scale V4.2.3 announcements

    ‏2017-10-03T22:23:38Z  

    Flash (Alert): IBM Spectrum Scale: Quick restart of ctdb under SMB load can lead to a race condition

    Abstract:
    IBM has identified an issue with IBM Spectrum Scale V4.2.0.x in which a quick restart of ctdb under SMB load can lead to a race condition that keeps ctdb stuck in an endless recovery. This can occur during upgrade or by rapidly bringing the SMB service online / offline.

     

    See the complete Flash at:  http://www.ibm.com/support/docview.wss?uid=ssg1S1010618

  • gpfs@us.ibm.com

    Re: IBM Spectrum Scale V4.2.3 announcements

    ‏2017-10-09T19:19:15Z  

    Flash (Alert):  IBM Spectrum Scale (GPFS) V4.1 and 4.2 levels: network reconnect function may result in file system corruption or undetected file data corruption

    Abstract
    IBM has identified a problem with IBM Spectrum Scale (GPFS) V4.1 and V4.2 levels, in which resending an NSD RPC after a network reconnect function may result in file system corruption or undetected file data corruption.


    See the complete Flash at:  http://www.ibm.com/support/docview.wss?uid=ssg1S1010668

  • gpfs@us.ibm.com

    Re: IBM Spectrum Scale V4.2.3 announcements

    ‏2017-10-26T21:04:52Z  

    GPFS 4.2.3.5 is now available from IBM Fix Central:

    http://www-933.ibm.com/support/fixcentral

    Problems fixed in GPFS 4.2.3.5

    October 4, 2017

    * Fix a log assert "Unable to find cached PG map entry for pg X in vIndex Y". This fix will produce a GNR event log when unable to fix a media error IV99611.
    * Fix a problem in which an NSD was deleted and re-created, and a node that tried to reread the disk configuration to update its NSD information failed to do so because of a network issue. The node was left with stale NSD information, which led to a mount failure IV99675.
    * Fix a mutex locking order problem, which can lead to a deadlock when the file system is being closed IV99611.
    * Fix the use count leak on a stripe group to resolve a stripe group cleanup pending issue IV99611.
    * Fix an assert 'iter->second' in the GPFS daemon (CCR) that can occur during mmshutdown IV99611.
    * Fix a problem in which the CTIME is not updated correctly on files (Ganesha) IV99677.
    * Fix the 93 seconds delay always seen during GPFS daemon startup on the current cluster manager node IV99611.
    * Fix GPFS (CCR) logic to close used socket file descriptors just one time avoiding failed GPFS remote procedure calls IV99611.
    * Fix generating unnecessary recalls when truncating migrated files IV99676.
    * Fix a problem in which a file system unmount will fail if FileHeat is enabled and snapshots are present IV99611.
    * Fix a problem in which the mmnfs export list command fails in an unpredictable manner IV99611.
    * Fix log assert when a Windows node is added into a cluster that has an encrypted fs IV99611.
    * Fix a ofP->inodeLk.get_lock_state() & (0x2000000000000000LL | 0x4000000000000000LL) assert that can occur when FileHeat is enabled and snapshots are present IV99611.
    * Fix a problem in which offline fsck does not repair all ind block replicas in reserved files which can lead to more corruptions during fs use IJ00397.
    * The fix affects customers using mmhealth THRESHOLD SERVICE. Fix a problem in which the mmhealth THRESHOLD state for some nodes never changes from CHECKING. This is for all platforms IV99611.
    * Fix a problem in which the default grace period on a Ganesha system is not displayed correctly IV99611.
    * Fix for blocked cesFailoverLock (cesFailoverLock: failed with rc 99) IV99611.
    * Fix a problem in which accessing a TCT migrated file can result in a hang when thumbnail support is used IV99611.
    * Fix a ("delay forever" == "completed") daemon assert IV99678.
    * Fix an issue in the AFM environment where incorrect filtering under certain workloads causes the writes to be dropped. This causes the replication not to happen fully and causes the data mismatch between cache/primary and home/secondary IV99796.
    * The fix affects customers that have renamed the cluster and are using mmhealth THRESHOLD SERVICE. Fix a problem in which the SYSTEM HEALTH eventlog contains unhealthy alerts for pool_data, pool_metadata, and inode components even though they don't have capacity utilization problems. This can occur on any platform IV99611.
    * Fix a segmentation fault that happens when the file system rebalancing fails to open the file system IV99611.
    * Fix an issue in the AFM environment where incorrect entries in the prefetch list file (for example, . and ..) cause directory block corruption because AFM permits a file named '.' to be created without validation of the input IV99679.
    * Fix a performance degradation problem when running tar from an NFS client IV99709.
    * Correct the %filesetName that is passed to the callback command for the usageUnderSoftQuota callback event. This can only occur on FILESET quota types IV99680.
    * Fix quota usage accounting, in a file system with strict allocation "whenpossible", when not all data replicas can be allocated due to lack of space or failure groups IJ00031.
    * Fix a deadlock seen while using NFS with TCT IJ00094.
    * Fix possible file system corruption caused by a network reconnect IJ00398.
    * This update addresses the following APARs: IJ00031 IJ00094 IJ00397 IJ00398 IV99611 IV99675 IV99676 IV99677 IV99678 IV99679 IV99680 IV99709 IV99796.
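
    When checking whether a particular APAR is included in a fix level, it can help to extract the APAR numbers from the fix list and compare them with the summary line. The following is a small sketch of that cross-check. It assumes that APAR numbers follow the IV or IJ prefix plus five digits pattern seen in this list; the function names are hypothetical.

```python
import re
from typing import Iterable, List, Set

# Assumption: APAR numbers look like IV99611 or IJ00031.
APAR_PATTERN = re.compile(r"\b(?:IV|IJ)\d{5}\b")


def collect_apars(lines: Iterable[str]) -> Set[str]:
    """Return every APAR number that appears in the given lines."""
    found: Set[str] = set()
    for line in lines:
        found.update(APAR_PATTERN.findall(line))
    return found


def missing_from_summary(fix_lines: Iterable[str], summary_line: str) -> List[str]:
    """List APARs cited in the fix descriptions but absent from the summary."""
    return sorted(collect_apars(fix_lines) - collect_apars([summary_line]))


if __name__ == "__main__":
    fixes = [
        "* Fix a deadlock seen while using NFS with TCT IJ00094.",
        "* Fix possible file system corruption caused by a network reconnect IJ00398.",
    ]
    summary = "* This update addresses the following APARs: IJ00094 IJ00398."
    print(missing_from_summary(fixes, summary))
```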

     

  • gpfs@us.ibm.com

    Re: IBM Spectrum Scale V4.2.3 announcements

    ‏2017-11-28T13:29:10Z  

    Security Bulletin: Vulnerabilities in Samba affect IBM Spectrum Scale SMB protocol access method (CVE-2017-12163, CVE-2017-12151, CVE-2017-12150)

    Summary

    Vulnerabilities in Samba affect IBM Spectrum Scale SMB protocol access method that:
    - could allow a remote authenticated attacker to obtain sensitive information, caused by a memory leak over SMB1 (CVE-2017-12163)
    - could provide weaker than expected security, caused by the failure to properly sign and encrypt DFS redirects when the max protocol for the original connection is set as "SMB3" (CVE-2017-12151)
    - could allow a remote attacker to obtain sensitive information, caused by the failure to require SMB signing in SMB1/2/3 connections (CVE-2017-12150)

     

    See the complete bulletin at http://www-01.ibm.com/support/docview.wss?uid=ssg1S1010703

  • gpfs@us.ibm.com

    Re: IBM Spectrum Scale V4.2.3 announcements

    ‏2017-11-30T22:26:34Z  

    Technote (troubleshooting): IBM Spectrum Scale v4.2.x may experience cluster hangs, unacceptably high CPU load spikes, or high Command Line Interface command execution time on IBM System z14

    Problem (Abstract)

    Core components of Spectrum Scale initialize and use encrypted communication. This communication is happening whenever certain Spectrum Scale commands are performed, even when the system appears to be idle. Depending on the IBM System z14 configuration, cluster hangs may occur. In addition, encryption initialization may temporarily spike CPU usage to 100% and response times will slow down to unreasonable speeds.

    IBM has corrected the issues in IBM Spectrum Scale v4.2.3.6 and later releases for zLinux only.

     

    See the complete Technote at:  http://www.ibm.com/support/docview.wss?uid=ssg1S1010859
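
    Because the correction is tied to a fix level (v4.2.3.6 and later), a script that audits nodes needs to compare level strings numerically rather than as text. The following is a minimal sketch of such a comparison. The function names are hypothetical, the installed level is assumed to be available as a dotted string, and the check does not look at the platform even though the correction applies to zLinux only.

```python
from typing import Tuple


def parse_level(level: str) -> Tuple[int, ...]:
    """Convert a dotted level such as 'v4.2.3.6' into a tuple of integers."""
    return tuple(int(part) for part in level.lstrip("vV").split("."))


def has_fix(installed: str, fixed_in: str = "4.2.3.6") -> bool:
    """Return True when the installed level is at or above the fixed level."""
    return parse_level(installed) >= parse_level(fixed_in)


if __name__ == "__main__":
    for level in ("4.2.3.5", "4.2.3.6", "4.2.3.10"):
        print(level, has_fix(level))
```

    Comparing integer tuples avoids the pitfall of string comparison, where "4.2.3.10" would sort before "4.2.3.6".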

  • gpfs@us.ibm.com

    Re: IBM Spectrum Scale V4.2.3 announcements

    ‏2017-12-12T14:42:56Z  

    Technote (troubleshooting): IBM Spectrum Scale: Using O_DIRECT and fork(2) in the same process in Linux

    Problem (Abstract):
    IBM Spectrum Scale: Using O_DIRECT and fork(2) in the same process in Linux

    Symptom:
    System call read() may fail to record data into the user buffer when direct I/O is used, that is, when specifying the O_DIRECT flag when opening the file.

     

    See the complete Technote at:  http://www.ibm.com/support/docview.wss?uid=ssg1S1010878

     

  • gpfs@us.ibm.com

    Re: IBM Spectrum Scale V4.2.3 announcements

    ‏2017-12-18T19:33:45Z  

    GPFS 4.2.3.6 is now available from IBM Fix Central:

    http://www-933.ibm.com/support/fixcentral

    Problems fixed in GPFS 4.2.3.6

    November 22, 2017

    * This fix excludes hidden CCR files from the scheduled callhome data collection IJ00977.
    * Fix potential issue with cesiplist file updates in ccr that can result in messages like "cesiplistLocalSerial is not numeric: ()" IJ02158.
    * Fix code to avoid long running CCR synods on different quorum nodes causing long running GPFS 'mmgetstate -a' command IJ00977.
    * This fix respects the mmdelsnapshot -N option when resuming DeleteRequired snapshot IJ02220.
    * Fix an issue in the AFM environment where AFM prefetch causes daemon assert if the directories are deleted after the prefetch queueing IJ00977.
    * Fix a rare assert that can occur during metanode takeover due to a stalled indirect block left in the cache IJ00977.
    * Fix a deadlock caused by the allocation region requests handler. Users would see long waiters on allocation manager cursors when deadlock happens IJ01063.
    * Address a problem in which the failbackToPrimary --start command tries to delete any snapshots later than the latest snapshot present at the old primary. When that snapshot is not present at the acting primary, the error is now handled instead of being ignored while the failback command continues IJ00977.
    * Fix an inode count leak problem which may happen when the gpfs_iwrite/gpfs_iwritex API fails with ESTALE. The tsrestorefileset utility uses this GPFS API IJ00977.
    * Fix hangs and timeouts that occur during snapshot commands in rare failure cases IJ01335.
    * Fix a problem where the close issued on a remote NFS mount can get stuck and causes an unlink of an AFM fileset to get stuck IJ00977.
    * Fix the assert issue on generation number when flushing or writing indirect blocks. This issue only happens when the clone files were used and deleted IJ00977.
    * Fix a disk address assert that can occur when a thread reads a compressed region of a file at the same time when a different node uncompresses the same region IJ00977.
    * Fix a vectored DIO (writev/readv) deadlock which may happen if the file system is being quiesced IJ00977.
    * Fix a potential infinite loop when reading a compressed file with alternating compressed and uncompressed regions IJ00977.
    * Add specific handling of SKLM error messages in case a required configuration parameter in the SKLMConfig.properties file is missing. Add a more detailed error message from mmkeyserv command in case a configuration parameter is wrong or missing IJ00977.
    * Fix a potential snapshot file data corruption that can be caused by a crash occurring when a compressed file is being deleted from the active file system or a snapshot IJ00977.
    * Fix a NULL buffer pointer dereference problem by adding synchronization for accessing the buffer pointer IJ00977.
    * Fix a problem in disk verification that wrongly calculated on disk stripe group descriptor checksum IJ01325.
    * Fix a potential E_HOSTDOWN (80) error when a compressed file is being appended while the node is in the process of becoming a metanode at the same time IJ00977.
    * Fix an issue in the AFM environment where recovery fails with error 112 while checking for the deleted directories IJ00977.
    * Fix the no space issue when running mmchdisk start command. A similar issue can happen on normal writes IJ01065.
    * Add more provision to catch a case where a Queue item is becoming NULL when IO is happening to the fileset and the queue is being flushed IJ00977.
    * Fix an assert that can occur when xattrs is heavily used and there is an unusual block size setting IJ00977.
    * An inaccurate and unnecessary assert in buffer bitmap processing is removed IJ00977.
    * Fix wrong fs struct error format IJ00977.
    * Fix a deadlock situation involving lock conflict while stealing a buffer for file system metadata repair IJ00977.
    * Fix a logic bug which may cause a log recovery to fail with E_RECOV_INCOMPLETE (code 234). This problem can happen on PARALLEL_LOGRECOVERY enabled builds (since GPFS 4.2.1) if the log file size is bigger than 16MB IJ00977.
    * Add a debugging utility to calculate checksum values of disk data IJ00977.
    * Fix a problem in which mmgetstate -s may not display the correct number of quorum nodes defined in the cluster IJ01064.
    * Fix a signal 7 that can occur when a compressed file is expanded in hyper allocation mode IJ00977.
    * Address a problem where re-applyupdates should not invoke failbacktoprimary --start when failbacktoprimary --stop has failed because of changes that are detected at the acting primary IJ00977.
    * Fix assert "'false' failed" in paxosserver.C:3129 in the GPFS daemon (CCR) that happened during GPFS startup IJ00977.
    * Fix an issue in the AFM environment where deleting directories from .ptrash directory fails with directory not empty error. This issue happens when the directory is deleted from home before readdir is performed at cache IJ01066.
    * In a dump file, the dump directory size was incorrectly reported as PB when the unit is TB. The problem is now fixed IJ00977.
    * Fix a problem in which AIX/NFS servers deadlock trying to recover a client's fcntl lock following the loss of another node in the same cluster IJ01068.
    * Fix a problem in which the prefetch command did not work on a special character (--) named file IJ01067.
    * When the read or write of a log vdisk fails during rebuild operation, use the IO error code to trigger resignation, as opposed to using E_OK IJ00977.
    * Fix a potential assert that can occur when a compressed file is being closed after having been deleted and any compressed compression group within the file was partially copied (COW) before the file deletion IJ00977.
    * This fix supports callbacks with long list of parameters IJ01069.
    * With this fix the output of mmsmb export list -Y and mmsmb config list -Y is changed. It now has an additional colon at the end of the output lines IJ00977.
    * Fix an issue in the AFM environment where the cached bit is not set after reading the entire file. This causes the eviction failures and also performance degradation during the write operations IJ00977.
    * Address a problem where changing the backend from NFS to GPFS (or vice versa) can cause bad filehandle errors IJ00977.
    * Update directory code to avoid excessive recursion that could lead to stack overflow. Stack overflow could cause GPFS daemon to either crash with Signal 11 or get stuck in a signal handler IJ01070.
    * Fix a DBGASSERT exp(bytesLeftInStride > 0) which may happen if multiple threads access the same file and (at least) one of them accesses the file with a stride access pattern IJ01114.
    * Fix service_running appearing in the mmhealth eventlog without a reason IJ00977.
    * This fix is recommended if you see file system hangs requiring reboots to recover IJ00977.
    * Fix a GPFS daemon assert that can occur when the inode0 file grows to more than 4B blocks IJ01087.
    * Fix a deadlock scenario involving starting a disk at the time of recovery IJ01328.
    * Fix an issue in the AFM environment where a prefetch can cause a filesystem quiesce not to happen when home is not responding. This will cause a deadlock at cache cluster until home starts responding IJ00977.
    * This fix prevents a segmentation fault in tslspdisk. This fix applies to GSS/ESS customers IJ01327.
    * Fix an issue in the AFM environment where ACLs are not updated properly in the cache with directory inheritance. This happens when users do not have permission to update the ACLs IJ00977.
    * Fix an assert - exp(ioDataUpdateInProgress == 0 OR DaemonShuttingDown) which may happen if the application does IO with fuzzySequential access pattern IJ00977.

    * Fix for gpfs.snap to collect CES address marker files (node-affinity information) IJ01072.
    * Fix code to speed-up GPFS CCR read requests and mm-commands when reading from the CCR IJ01086.
    * This fix allows customers to run the mmchenclosure command to confirm that a storage enclosure fan is reporting a failure and can be replaced. This fix applies to GSS/ESS customers IJ01330.
    * In a mixed cluster where the HSM session manager runs on a 4.2.x node, the access to HSM migrated files from a 4.1.x node now works fine IJ01115.
    * Fix a problem in the GPFS mmap code that can result in negative mmap counters. When a file was memory mapped by a child process, GPFS skipped incrementing the mmap counters if it failed to verify the process credentials because the number of group IDs exceeded the limit, but it still decremented the counters at close time. The resulting negative mmap counter caused the node to crash IJ01913.
    * Fix a performance problem in mmsmb and a minor problem with its machine readable output IJ01073.
    * Fix a problem that can result in a flood of handle_network_problem_info mmhealth events. This can cause the GUI to crash IJ02010.
    * Fix the high CPU usage issue on Windows due to a busy loop in a receiver thread when there are some network errors IJ01863.
    * This update addresses the following APARs: IJ00977 IJ01063 IJ01064 IJ01065 IJ01066 IJ01067 IJ01068 IJ01069 IJ01070 IJ01072 IJ01073 IJ01086 IJ01087 IJ01114 IJ01115 IJ01325 IJ01327 IJ01328 IJ01330 IJ01335 IJ01863 IJ01913 IJ02010 IJ02158 IJ02220.

     

  • gpfs@us.ibm.com

    Re: IBM Spectrum Scale V4.2.3 announcements

    ‏2018-02-02T18:02:10Z  

    Flash (Alert):  IBM Spectrum Scale (GPFS):  Undetected corruption of archived sparse files (Linux)

    Abstract

    IBM has identified an issue with IBM GPFS and IBM Spectrum Scale for Linux environments, in which a sparse file may be silently corrupted during archival, resulting in the file being restored incorrectly.

     

    See the complete Flash at:  http://www.ibm.com/support/docview.wss?uid=ssg1S1012054
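
    To judge exposure to an issue that affects sparse files, it can be useful to list which files in a directory tree are sparse. The following is a heuristic sketch, not a procedure from the Flash. It assumes a Linux system where st_blocks is reported in 512-byte units, and it treats a file as sparse when its allocated bytes are fewer than its size. The function names are hypothetical.

```python
import os
import stat
from typing import Iterator


def looks_sparse(size_bytes: int, allocated_blocks: int, block_unit: int = 512) -> bool:
    """Return True when fewer bytes are allocated than the file size implies."""
    return allocated_blocks * block_unit < size_bytes


def find_sparse_files(root: str) -> Iterator[str]:
    """Yield regular files under root whose allocation is smaller than their size."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(info.st_mode) and looks_sparse(info.st_size, info.st_blocks):
                yield path


if __name__ == "__main__":
    for sparse_path in find_sparse_files("."):
        print(sparse_path)
```

    The check is only an approximation: compressed files, or small files whose data is stored in the inode, can also report fewer allocated blocks than their size suggests.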