
Readme and release notes for General Parallel File System Advanced Edition 4.1.0.1 (GPFS_ADV-4.1.0.1-power-Linux)

Fix Readme


Abstract

This Readme provides installation instructions, known issues, and the list of problems fixed for the General Parallel File System (GPFS) Advanced Edition 4.1.0.1 update for Linux on Power Systems.

Content

Readme file for: General Parallel File System Advanced Edition
Product/Component Release: 4.1.0.1
Update Name: GPFS_ADV-4.1.0.1-power-Linux
Fix ID: GPFS_ADV-4.1.0.1-power-Linux
Publication Date: 11 June 2014
Last modified date: 11 June 2014

Installation information

Download location

Below is a list of components, platforms, and file names that apply to this Readme file.

Fix Download for Linux

Product/Component Name: General Parallel File System Advanced Edition
Platforms: Linux 64-bit, pSeries RHEL; Linux 64-bit, pSeries SLES
Fix: GPFS_ADV-4.1.0.1-power-Linux

Prerequisites and co-requisites

None

Known issues

  • - Problem discovered in earlier GPFS releases

    During internal testing, a rare but potentially serious problem was discovered in GPFS. Under certain conditions, a read from a cached block in the GPFS pagepool may return incorrect data, which is not detected by GPFS. The issue is corrected in GPFS 3.3.0.5 (APAR IZ70396). All prior versions of GPFS are affected.

    The issue was discovered during internal testing, where an MPI-IO application was used to generate a synthetic workload. IBM is not aware of any occurrences of this issue in customer environments or under any other circumstances. Because the issue is specific to accessing cached data, it does not affect applications using DirectIO (the I/O mechanism that bypasses the file system cache, used primarily by databases such as DB2® or Oracle).

    This issue is limited to the following conditions:

    1. The workload consists of a mixture of writes and reads, to file offsets that do not fall on the GPFS file system block boundaries;
    2. The IO pattern is a mixture of sequential and random accesses to the same set of blocks, with the random accesses occurring on offsets not aligned on the file system block boundaries; and
    3. The active set of data blocks is small enough to fit entirely in the GPFS pagepool.

    The issue is caused by a race between an application IO thread doing a read from a partially filled block (such a block may be created by an earlier write to an odd offset within the block), and a GPFS prefetch thread trying to convert the same block into a fully filled one, by reading in the missing data, in anticipation of a future full-block read. Due to insufficient synchronization between the two threads, the application reader thread may read data that had been partially overwritten with the content found at a different offset within the same block. The issue is transient in nature: the next read from the same location will return correct data. The issue is limited to a single node; other nodes reading from the same file would be unaffected.
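
    To check whether a node is exposed, you can query the installed GPFS level and the configured pagepool size (illustrative commands; package names and output vary by platform and edition):

      rpm -q gpfs.base                       # installed GPFS level on Linux, e.g. gpfs.base-4.1.0-1
      /usr/lpp/mmfs/bin/mmlsconfig pagepool  # configured pagepool size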


Installation information

  • - Installing a GPFS update for Linux on Power Systems

    Follow the steps below to install the fix package:

    1. Unzip and extract the update package (<filename>.tar.gz file) with one of the following commands:

      gzip -d -c <filename>.tar.gz | tar -xvf -

      or

      tar -xzvf <filename>.tar.gz


    2. The directory should now contain the following RPM images (a quick check of the extracted files is shown after these steps):

      Required packages
      The following packages are required for SLES and RHEL Linux:

      gpfs.base-4.1.0-1.ppc64.update.rpm
      gpfs.gpl-4.1.0-1.noarch.rpm
      gpfs.msg.en_US-4.1.0-1.noarch.rpm
      gpfs.ext-4.1.0-1.ppc64.update.rpm (GPFS Standard Edition and GPFS Advanced Edition only)
      gpfs.crypto-4.1.0-1.ppc64.update.rpm (GPFS Advanced Edition only)

      Optional packages
      The following package is optional for SLES and RHEL Linux:

      gpfs.docs-4.1.0-1.noarch.rpm
    3. Follow the installation and migration instructions in your GPFS Concepts, Planning and Installation Guide (http://www-01.ibm.com/support/knowledgecenter/SSFKCN/gpfs_welcome.html).
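
    As a quick sanity check after step 1, you can verify the extracted images against the list in step 2 (a sketch; the exact set of files depends on your GPFS edition):

      ls -1 gpfs*.rpm                                # list the extracted update images
      rpm -qpi gpfs.base-4.1.0-1.ppc64.update.rpm    # inspect a package header before installing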
  • - Upgrading GPFS nodes

    Note that a node-by-node upgrade cannot be used to migrate from GPFS 3.4 or earlier releases. For example, upgrading from 3.4.x.x to 4.1.y.y requires a complete cluster shutdown, upgrading all nodes, and then restarting the cluster.

    GPFS may be upgraded either one node at a time or on all nodes in the cluster at once. When upgrading one node at a time, perform the steps below on each node in the cluster sequentially. When upgrading the entire cluster at once, GPFS must be shut down on all nodes before upgrading.

    When upgrading nodes one at a time, plan the order in which to upgrade them. Verify that stopping a particular node will not cause quorum to be lost and that the node is not the last available NSD server for any disks. Upgrade the quorum and manager nodes first. Among the quorum nodes, upgrade the cluster manager last to avoid an unnecessary failover and election of a new cluster manager.
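
    For example, the following commands can help identify the quorum nodes and the current cluster manager when planning the upgrade order (output formats vary by release):

      /usr/lpp/mmfs/bin/mmlscluster   # lists the nodes and their quorum/manager designations
      /usr/lpp/mmfs/bin/mmlsmgr -c    # shows the current cluster manager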

    1. Prior to upgrading GPFS on a node, all applications that depend on GPFS (e.g. DB2) must be stopped. Any GPFS file systems that are NFS exported must be unexported prior to unmounting GPFS file systems.
       
    2. Stop GPFS on the node. Verify that the GPFS daemon has terminated and that the kernel extensions have been unloaded (mmfsenv -u). If mmfsenv -u reports that it cannot unload the kernel extensions because they are "busy", the installation can proceed, but the node must be rebooted after the install. "Busy" means that some process has its current directory in a GPFS file system directory or holds an open file descriptor there. The freeware program lsof can identify such processes so they can be killed. Retry mmfsenv -u; if it then succeeds, a reboot of the node can be avoided. (Example commands follow these steps.)

    3. Upgrade GPFS using the rpm command as follows (make sure you are in the same directory as the files):

      For SLES or RHEL systems:
      rpm -Uvh gpfs*.rpm


    4. Recompile any GPFS portability layer modules you may have previously compiled. The recompilation and installation procedure is outlined in the following file:

      /usr/lpp/mmfs/src/README
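
    Putting steps 2 through 4 together, a typical per-node upgrade session might look like the following sketch. It assumes the update images are in the current directory and that /gpfs/fs1 is an example mount point; the authoritative portability layer build steps are in /usr/lpp/mmfs/src/README:

      /usr/lpp/mmfs/bin/mmshutdown                         # stop GPFS on this node
      /usr/lpp/mmfs/bin/mmfsenv -u                         # verify the kernel extensions unload
      lsof /gpfs/fs1                                       # if "busy", find processes still using GPFS
      rpm -Uvh gpfs*.rpm                                   # apply the update images
      rpm -qa | grep ^gpfs                                 # confirm the 4.1.0-1 levels are installed
      cd /usr/lpp/mmfs/src                                 # rebuild the portability layer
      make Autoconfig && make World && make InstallImages
      /usr/lpp/mmfs/bin/mmstartup                          # restart GPFS on this node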

Additional information

  • - Notices
  • - Package information

    The update images listed below and contained in the tar image with this README are maintenance packages for GPFS. They are standard RPM images that can be applied directly to your system.

    The update images require a prior level of GPFS, so this update is useful only on installations that already have the GPFS product. Contact your IBM representative if you want to purchase a fully installable product that does not require a prior level of GPFS.

    After all RPMs are installed, you have successfully updated your GPFS product.

    Before installing GPFS, it is necessary to verify that you have the correct levels of the prerequisite software installed on each node in the cluster. If the correct level of prerequisite software is not installed, see the appropriate installation manual before proceeding with your GPFS installation.

    For the most up-to-date list of prerequisite software, see the GPFS FAQ in the IBM® Knowledge Center (http://www-01.ibm.com/support/knowledgecenter/SSFKCN/gpfs_welcome.html).
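
    For example, the distribution and kernel levels can be collected on each node and compared against the FAQ (generic commands; the required levels themselves are listed only in the FAQ):

      cat /etc/SuSE-release /etc/redhat-release 2>/dev/null   # distribution level (SLES or RHEL)
      uname -r                                                # kernel level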

    Update to Version:

    4.1.0-1

    Update from Version:

    4.1.0-0

    Update (tar file) contents:

    README

    changelog
    gpfs.base-4.1.0-1.ppc64.update.rpm
    gpfs.docs-4.1.0-1.noarch.rpm
    gpfs.gpl-4.1.0-1.noarch.rpm
    gpfs.msg.en_US-4.1.0-1.noarch.rpm
    gpfs.ext-4.1.0-1.ppc64.update.rpm (GPFS Standard Edition and GPFS Advanced Edition only)
    gpfs.crypto-4.1.0-1.ppc64.update.rpm (GPFS Advanced Edition only)
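
    To confirm that a downloaded tar file matches this content list without extracting it, you can list its contents (illustrative):

      tar -tzf <filename>.tar.gz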

  • - Changelog for GPFS 4.1.x

    Unless specifically noted otherwise, this history of problems fixed for GPFS 4.1.x applies for all supported platforms.

    Problems fixed in GPFS 4.1.0.1 [June 06, 2014]

    • Fixed a thread-safety problem in dumping GPFS daemon thread backtraces.
    • Fixes a problem with fsck repair of deleted root directory inode of independent filesets.
    • Fixed a problem in clusters configured for secure communications (cipherList configuration variable containing a cipher other than AUTHONLY) which may cause communications between nodes to become blocked.
    • After a file system is panicked, new lock range requests will not be accepted.
    • This fix only affects customers running GNR/GSS on Linux, and who have in the past misconfigured their GNR servers by turning the config parameter "numaMemoryInterleave" off, and who experienced IO errors on Vdisks as a result of that misconfiguration. These IO errors can potentially corrupt in-memory metadata of the GNR/GSS server, which can lead to data loss later on. This fix provides a tool that can be used to locate and repair such corruption.
    • Remove mmchconfig -N restrictions for aioWorkerThreads and enableLinuxReplicatedAio.
    • Fixed a problem when reading a clone child from a snapshot.
    • Fixed a rare race condition causing an assert when two threads attempt a metanode operation at the same time while the node is in the process of becoming a metanode.
    • Fixed a deadlock in a complicated scenario involving restripe, token revoke, and exceeding the file cache limit.
    • Fixed a race between log recovery and the mnodeResign thread.
    • Fixed E_VALIDATE errors in the aclFile after node failure.
    • Dealt with a stress condition where mmfsd was running out of threads.
    • Fix a problem in log recovery that would cause it to fail when replaying a directory insert record. The error only occurs for filesystems in version 4.1 format, where the hash value of the file name being inserted is the same as an existing file in the directory. The problem is also dependent on the length of the file name, and only happens if the system crashes after the log record is committed, but before the directory contents are flushed.
    • Fixed a problem caused by a hole in the cleanBuffer after the file system panicked.
    • Closed a hole where the fileset snapshot restore tool (mmrestorefs -j) may fail to restore changed data for a clone child file.
    • Fixed a rare assert which happens under low disk space conditions.
    • Fixed a deadlock during mmap pagein.
    • Fixed the problem of excessive RPCs to get indirect blocks and the problem of metanode lock starvation involving a huge sparse file.
    • Fixed a problem where the GPFS daemon terminates abnormally when direct I/O and vector I/O (readv/writev) are used on encrypted files and the data is replicated or must go through an NSD server.
    • Fixed a potential deadlock when SELinux is enabled and the file system is DMAPI managed.
    • Closed a hole where the fileset snapshot restore tool (mmrestorefs -j) may fail to restore a snapshot due to a race condition in which one restore thread is deleting a file while another restore thread is trying to get that file's attributes.
    • Fixed a kernel oops caused by a race between multiple NFS reads on the same large file.
    • The mmchfirmware command now avoids accessing non-existent disk paths.
    • Fix a directory generation mismatch problem in an encrypted secvm file system.
    • Fixed a shutdown hang in the kernel while trying to acquire the revokeLock.
    • Apply at your convenience: even if you hit this bug, an equivalent cleanup is completed later in the command execution.
    • Improved stability of daemon-to-daemon communications when cipherList is set to a real cipher (that is, not AUTHONLY).
    • The serial number of physical disks is now recorded in the GNR event log, and displayed in the mmlspdisk command.
    • GNR on AIX allows only 32K segments.
    • Fixes a problem with fsck repair of a corrupt root directory inode.
    • mmbackup tricked by false TSM success messages: mmbackup can be fooled by TSM output when dsmc decides to roll back a transaction of multiple files being backed up. When the TSM server runs out of data storage space, the current transaction, which may hold many files, is rolled back and retried with each file separately. The failure of a file to be backed up in this case was not detected, because the earlier message from dsmc contained "Normal File --> [Sent]" even though the file was later rolled back. Fixes in tsbuhelper now detect the failure signature "** Unsuccessfull **" and, instead of simply ignoring these messages, revert the changes in the shadow DB for the matching record(s). The hash table already keeps track of the last change in each record, so reverting is now a legal state transition for hashed records. Also reorganized some debug messages, streamlined some common code, and now finds the 'failed' string to issue reversion updates as well. Fixed pattern matching in tsbackup33.pl to properly display all "ANS####" messages.
    • Fixed an RO cache I/O error when mounting the file system in read-only mode.
    • Don't release the mutex on daemon death.
    • Fix the path buffer length calculation to return the correct length for dm_handle_to_path() functions.
    • Fixed a bug in mmauth that may cause duplicate configuration entries and a node number mismatch in the configuration file.
    • Fixed a problem with creating a directory when the parent directory has a default POSIX ACL.
    • mmbackup fails to read hex env values: mmbackup debug values, progress reporting, and possibly other user settings may be presented in decimal or hex, especially the bit-mapped progress and debugging settings. Perl does not always interpret the hex values correctly unless they are converted with the oct() function.
    • Correct an NLS-related problem with mmchdisk and similar commands.
    • This update addresses the following APARs: IV60187 IV60468 IV60469 IV60471 IV60475 IV60478 IV60543 IV60863 IV60864.

[{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"SSFKCN","label":"General Parallel File System"},"Component":"","ARM Category":[],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"","Edition":"","Line of Business":{"code":"","label":""}},{"Business Unit":{"code":"BU016","label":"Multiple Vendor Support"},"Product":{"code":"HWZZZ","label":"Older System x->Netfinity SP Switch"},"Component":"","ARM Category":[],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"","Edition":"","Line of Business":{"code":"","label":""}}]

Document Information

Modified date:
25 June 2021

UID

isg400001819