Fixes are available
Rational ClearCase Fix Pack (7.1.2.5) for version 7.1.2
Rational ClearCase Fix Pack (8.0.0.1) for version 8.0
Rational ClearCase Fix Pack 14 (7.1.2.14) for 7.1.2
Rational ClearCase Fix Pack 11 (8.0.0.11) for 8.0
Rational ClearCase Fix Pack 12 (8.0.0.12) for 8.0
Rational ClearCase Fix Pack 15 (7.1.2.15) for 7.1.2
Rational ClearCase Fix Pack 13 (8.0.0.13) for 8.0
Rational ClearCase Fix Pack 16 (7.1.2.16) for 7.1.2
Rational ClearCase Fix Pack 17 (7.1.2.17) for 7.1.2
Rational ClearCase Fix Pack 14 (8.0.0.14) for 8.0
Rational ClearCase Fix Pack 18 (7.1.2.18) for 7.1.2
Rational ClearCase Fix Pack 15 (8.0.0.15) for 8.0
Rational ClearCase Fix Pack 19 (7.1.2.19) for 7.1.2
Rational ClearCase Fix Pack 16 (8.0.0.16) for 8.0
Rational ClearCase Fix Pack 17 (8.0.0.17) for 8.0
Rational ClearCase Fix Pack 18 (8.0.0.18) for 8.0
Rational ClearCase Fix Pack 19 (8.0.0.19) for 8.0
Rational ClearCase Fix Pack 20 (8.0.0.20) for 8.0
Rational ClearCase Fix Pack 21 (8.0.0.21) for 8.0
APAR status
Closed as program error.
Error description
Processes hang in a zombie state for several minutes under certain conditions.

ClearCase 7.0.x
Any Linux OS

Description of Problem:

Previous history of fix applied:
*** *** ***
When a process is killed, for whatever reason, the OS automatically closes all files that the process had open and did not close itself. However, the OS does not remove the process from the process list until the entire file cleanup completes. For a regular file, the close operation simply completes and the process exits. When the file being closed is within an MVFS mount, however, MVFS must first communicate with the VOB/View server using RPC calls. The problem arises when the process has been signaled (for example, by Ctrl-C): under these circumstances the OS denies RPC initiation, which is a serious problem because MVFS relies on RPC communication to complete its tasks. Because the communication attempt is blocked, MVFS retries after 5 seconds. To wait those 5 seconds, 7.0.x uses mdelay, which is a busy loop. This is why CPU usage climbs to 100 percent.
*** *** ***

Specific problem now with the code modification:
As a consequence of the defect described above, MVFS waits 5 seconds after each failed communication attempt before retrying the RPC to the VOB/View server. This delay occurs only when the process has open files and a pending fatal signal. Each close call expands into two or more calls: a 'flush' call for each time the file was opened (at least one), and a 'release' call when there are no more references to the file (the 'last' close, or the real close). A single close operation therefore requires more than one set of RPC communications to the VOB/View server, plus the time the underlying file system needs to close the cleartext file. Given all of these retries, a delay of a minute or so is expected with the current MVFS internals.
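The arithmetic behind the delay described above can be sketched roughly as follows. This is only an illustration: the function name and the retries_per_call parameter are hypothetical, and the description does not say how many retries MVFS actually performs per call. Only the 5-second wait and the flush/release breakdown come from the description.

```python
RETRY_DELAY_SECONDS = 5  # from the description: 5-second wait after a failed attempt


def estimated_exit_delay(opens_per_file, retries_per_call=1):
    """Estimate the seconds spent waiting while the files are closed.

    opens_per_file: list holding, for each file, how many times it was opened.
    retries_per_call: hypothetical count of failed RPC attempts per call;
                      the real value is not stated in the description.
    """
    total_calls = 0
    for opens in opens_per_file:
        flush_calls = max(opens, 1)  # one 'flush' per open, at least one
        release_calls = 1            # one 'release' on the last close
        total_calls += flush_calls + release_calls
    return total_calls * retries_per_call * RETRY_DELAY_SECONDS


# Six files, each opened once: 6 * (1 flush + 1 release) * 5 seconds
print(estimated_exit_delay([1] * 6))  # 60
```

Under these assumptions, even a handful of open files is enough to account for a delay on the order of a minute.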
Work Around: Install signal handlers for Ctrl-C (and other terminating signals) so that the process closes its open files gracefully and exits cleanly.
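A minimal sketch of this workaround, assuming an application that tracks its own open files, might look like the following. The names open_files, open_tracked, close_all, handle_termination and install_handlers are hypothetical and not part of ClearCase. Signals that cannot be caught, such as SIGKILL, are not covered by this approach.

```python
import signal
import sys

open_files = []  # hypothetical registry of files this program has opened


def open_tracked(path, mode="r"):
    """Open a file and remember it so it can be closed on a signal."""
    f = open(path, mode)
    open_files.append(f)
    return f


def close_all():
    """Close every tracked file and return how many were closed."""
    closed = 0
    while open_files:
        f = open_files.pop()
        if not f.closed:
            f.close()
            closed += 1
    return closed


def handle_termination(signum, frame):
    """Close the files from normal process context, then exit."""
    close_all()
    sys.exit(128 + signum)  # conventional shell exit status for a signal


def install_handlers(signals=(signal.SIGINT, signal.SIGTERM)):
    """Register the handler; must be called from the main thread."""
    for sig in signals:
        signal.signal(sig, handle_termination)


if __name__ == "__main__":
    install_handlers()
```

The intent is that the files are closed by the process itself while it is still running normally, rather than by the OS during signal-driven teardown.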
Local fix
Problem summary
****************************************************************
* USERS AFFECTED:
****************************************************************
* PROBLEM DESCRIPTION:
****************************************************************
* RECOMMENDATION:
****************************************************************
When a process is being terminated by a signal, the RPC layer blocks any further RPCs from that process, returning the error ERESTARTSYS to the calling layer. MVFS includes retry logic for certain RPC errors, and that retry loop, compounded by the number of files the process had open, was adding significant time to the process termination. The MVFS RPC code now handles this case by checking for the pending signal.
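The general idea of a retry loop that stops as soon as a signal is pending can be illustrated in user-space code as below. This is not the MVFS implementation, which is kernel code and is not shown in this APAR. The function call_with_retry, its parameters, and the rpc callable are hypothetical. The value 512 is the Linux kernel-internal ERESTARTSYS code, which Python's errno module does not expose.

```python
import time

ERESTARTSYS = 512  # Linux kernel-internal error code, defined here for illustration


def call_with_retry(rpc, max_attempts=3, delay=5.0, sleep=time.sleep):
    """Call rpc() until it returns 0, giving up at once on ERESTARTSYS.

    rpc: hypothetical callable returning 0 on success or an error code.
    Returns a (status, attempts_made) tuple.
    """
    status = 0
    for attempt in range(1, max_attempts + 1):
        status = rpc()
        if status == 0:
            return status, attempt
        if status == ERESTARTSYS:
            # A signal is pending, so a retry cannot succeed: stop without waiting.
            return status, attempt
        if attempt < max_attempts:
            sleep(delay)  # other errors are treated as transient: wait, then retry
    return status, max_attempts
```

With a check of this kind, a call that fails because of a pending signal returns after one attempt instead of waiting through the full retry cycle for every open file.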
Problem conclusion
Fixed in ClearCase 7.1.1.8, 7.1.2.5, and 8.0.0.1.
Temporary fix
Comments
APAR Information
APAR number
PK97793
Reported component name
CLEARCASE UNIX
Reported component ID
5724G2901
Reported release
701
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2009-10-02
Closed date
2011-12-16
Last modified date
2011-12-16
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
CLEARCASE UNIX
Fixed component ID
5724G2901
Applicable component levels
R701 PSN
UP
Document Information
Modified date:
16 December 2011