2004-04-27 kernel 2.6.5 bug fix patch 01 ("April 2004")

If you download any software from this web site please be aware of the Warranty Disclaimer and Limitation of Liabilities.

linux-2.6.5-s390-01-april2004.tar.gz / MD5 ... accumulated patch, recommended (2004-04-27)

linux-2.6.5-s390-01-april2004-patches.tar.gz / MD5 ... per-problem-patches, recommended (2004-04-27)

These patches contain the following linux kernel bug fixes:

Description:
3270: Missing pointer checks.
Symptom:
Kernel crashes if the subchannel of a 3270 device is detached.
Problem:
If the subchannel of a 3270 is detached while an i/o is in progress the common i/o layer passes an error encoded in the irb pointer to the 3270 interrupt handler. The interrupt handler needs to check for this special case.
Solution:
Add missing pointer sanity checks.
Problem-ID:
-
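
The check described in this entry can be illustrated in user space. The sketch below imitates the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() convention, in which a small negative errno is stored in the pointer value itself. The struct irb_sketch type and the handle_interrupt() function are hypothetical stand-ins, not the 3270 driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* User-space imitation of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline int is_err(const void *ptr)
{
	/* Encoded errors occupy the top MAX_ERRNO values of the address range. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Hypothetical stand-in for the interrupt response block. */
struct irb_sketch {
	int status;
};

/*
 * Hypothetical handler: returns a negative errno if the pointer is NULL or
 * carries an encoded error, otherwise the status read from the block.
 */
static int handle_interrupt(const struct irb_sketch *irb)
{
	if (irb == NULL)
		return -EINVAL;
	if (is_err(irb))
		return (int)ptr_err(irb);	/* never dereference an encoded error */
	return irb->status;
}
```

Calling handle_interrupt(err_ptr(-EIO)) returns -EIO without touching memory, which is the behaviour the missing check is meant to guarantee.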
Description:
cio: Clean up subchannels on reipl.
Symptom:
After a forced reboot via SysRq-b on LPAR, some devices that were active are not detected.
Problem:
On SysRq-b, no real cleanup is performed (especially the shutdown method is not called by the driver core and the reboot notifier chain is not triggered). After reipl on LPAR the subchannels may remain in a busy state and can not be sensed on startup.
Solution:
Insert a loop before do_reipl which tries to quiesce via csch() and disable all active subchannels.
Problem-ID:
-
Description:
cio: Correct handling of no path during path verification.
Symptom:
Linux hangs because path grouping is tried endlessly.
Problem:
If a subchannel has lpm==0 and path verification is performed (as may happen after logical vary off), cio_start() will return -ENODEV (and not -EACCES). The subchannel's vpm is not adapted in that case and the while loop in __ccw_device_verify_start() will never finish because vpm and lpm will always be unequal.
Solution:
Handle -ENODEV like -EACCES and switch off current path in the vpm.
Problem-ID:
-
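
The loop termination argument can be sketched as follows. This is a simplified user-space model, not the code of __ccw_device_verify_start(): start_on_path() and verify_paths() are hypothetical names, and the real function does considerably more. The point shown is that a path which fails with -ENODEV must be cleared from vpm just like one that fails with -EACCES, otherwise vpm and lpm never become equal.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical start function: -ENODEV if no path at all is available. */
static int start_on_path(uint8_t lpm, uint8_t mask)
{
	if (lpm == 0)
		return -ENODEV;
	if (!(lpm & mask))
		return -EACCES;
	return 0;
}

/* Adjust vpm bit by bit until it matches lpm; returns the final vpm. */
static uint8_t verify_paths(uint8_t lpm, uint8_t vpm)
{
	while (vpm != lpm) {
		uint8_t mask;
		int ret;

		/* Find the first path bit on which vpm and lpm disagree. */
		for (mask = 0x80; mask != 0; mask >>= 1)
			if ((vpm & mask) != (lpm & mask))
				break;
		assert(mask != 0);	/* guaranteed because vpm != lpm */

		ret = start_on_path(lpm, mask);
		if (ret == -EACCES || ret == -ENODEV)
			vpm &= (uint8_t)~mask;	/* switch the path off in vpm */
		else
			vpm |= mask;		/* path verified */
	}
	return vpm;
}
```

If the -ENODEV case were left out of the condition, a call such as verify_paths(0x00, 0xC0) would spin forever in this model, which corresponds to the reported hang.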
Description:
cio: Spurious timeouts.
Symptom:
Not user observable - cio does unneeded I/O reiterations.
Problem:
After getting an interrupt for killing I/O after a logical vary off, the timer is not deleted.
Solution:
Delete timer when interrupt for killing the I/O arrived.
Problem-ID:
-
Description:
cio: Usage of strings in cio_crw debug feature.
Symptom:
Oops when displaying /proc/s390dbf/cio_crw/sprintf.
Problem:
The debug feature saves a pointer to the bus_id string, not the string itself. If the subchannel is freed, the pointer isn't valid anymore.
Solution:
Use subchannel id instead of bus_id.
Problem-ID:
-
Description:
cio: unexpected event in state machine leads to kernel panic.
Symptom:
After a logical vary off of a networking chpid on which a device is in use, "dev_jumptable[0][1]==0" is printed and a BUG() occurs (only observed with pre-emptible kernel).
Problem:
There is a window in which an unsolicited interrupt may come in for a subchannel that no longer has an associated ccw_device but whose intparm is still set, leading to the attempted delivery of an interrupt to a non-existent device.
Solution:
Correctly disable the subchannel in ccw_device_nopath_notify, because after a logical vary off the subchannel is still existing but must not produce (unsolicited) interrupts.
Problem-ID:
-
Description:
ctc: Fix for a Kernel panic in CTC driver related to irq_handler
Symptom:
Kernel panic in the CTC driver when attempting to vary chpids off/on.
Problem:
The interrupt handler did not check for error codes in the irb.
Solution:
Add a checking routine, call it from the interrupt handler, and exit with the error code if appropriate.
Problem-ID:
-
Description:
dasd: Alternating 'steal lock' leads to hanging process.
Symptom:
Process performing reserve/release hangs and can not be killed.
Problem:
If reserve/release is performed after the existing lock was stolen by someone else, the SSH hits the pending 'lock stolen' sense and returns with unit check.
ERP generates a recovery request (TIC). The ERP is started and has to wait until the device is released.
Solution:
Introduce a generic 'USE_ERP' flag per request, which is set by default, but cleared for reserve/release.
Problem-ID:
-
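
A per-request flag of this kind might look like the sketch below. REQ_USE_ERP, struct request_sketch and the helper functions are hypothetical names chosen for illustration; the dasd driver's actual flag and request layout are not shown in this document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define REQ_USE_ERP 0x1u	/* hypothetical flag bit */

/* Hypothetical stand-in for a channel program request. */
struct request_sketch {
	unsigned int flags;
};

/* Ordinary requests use error recovery by default. */
static void init_request(struct request_sketch *req)
{
	assert(req != NULL);
	req->flags = REQ_USE_ERP;
}

/* Reserve/release requests opt out of error recovery. */
static void init_reserve_release(struct request_sketch *req)
{
	init_request(req);
	req->flags &= ~REQ_USE_ERP;
}

/* Recovery is started only for failed requests that still carry the flag. */
static bool should_start_erp(const struct request_sketch *req, bool unit_check)
{
	return unit_check && (req->flags & REQ_USE_ERP) != 0;
}
```

With the flag cleared, a reserve/release request that ends in unit check is returned to the caller with its error instead of waiting in recovery for the device to be released.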
Description:
dasd: open_count problems when using dasdfmt.
Symptom:
dasdfmt reports 'Disk in use' when trying to format a device.
Problem:
Usage of the open_count changed.
Solution:
Initialize the open_count with '-1'. This means that an online device has an open_count of '0' and a device opened once (e.g. for formatting) has an open_count of '1' -- as it was on kernel 2.4.
Problem-ID:
-
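
The counting scheme can be modelled in a few lines. The sketch assumes that setting a device online accounts for one increment and that formatting is refused once more than one opener exists; both are assumptions made for illustration, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-device structure. */
struct dasd_sketch {
	int open_count;
};

static void dasd_sketch_init(struct dasd_sketch *dev)
{
	assert(dev != NULL);
	dev->open_count = -1;	/* so that "online" reads as 0 */
}

/* Assumption: bringing the device online accounts for one increment. */
static void dasd_sketch_set_online(struct dasd_sketch *dev)
{
	dev->open_count++;
}

static void dasd_sketch_open(struct dasd_sketch *dev)
{
	dev->open_count++;
}

/* Assumption: formatting is allowed only if the formatter is the sole opener. */
static bool dasd_sketch_may_format(const struct dasd_sketch *dev)
{
	return dev->open_count <= 1;
}
```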
Description:
iucv: NULL pointer dereference
Symptom:
Kernel OOPS
Problem:
A return is missing in the ENOMEM case in netiucv_transmit_skb(), and an if-branch is in the wrong place in conn_action_txdone().
Solution:
Add the missing return in netiucv_transmit_skb() and move if (privptr) inside the if (skb=...) in conn_action_txdone().
Problem-ID:
-
Description:
iucv: potential memory leak
Symptom:
available memory decreases over time
Problem:
missing kfree(new_handler) in iucv_register_program()
Solution:
added kfree(new_handler)
Problem-ID:
-
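
The general shape of such a leak and its fix is sketched below in user-space C, with malloc()/free() standing in for kmalloc()/kfree(). The function is a hypothetical, much reduced model: it keeps a single registration slot and does not reproduce the logic of iucv_register_program(). The live_allocations counter exists only to make the effect observable.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a handler record. */
struct handler_sketch {
	char name[9];
};

static struct handler_sketch *registered;	/* single slot, for brevity */
static int live_allocations;			/* bookkeeping for the example */

static int register_program(const char *name)
{
	struct handler_sketch *new_handler;

	assert(name != NULL);
	new_handler = malloc(sizeof(*new_handler));
	if (new_handler == NULL)
		return -ENOMEM;
	live_allocations++;

	strncpy(new_handler->name, name, sizeof(new_handler->name) - 1);
	new_handler->name[sizeof(new_handler->name) - 1] = '\0';

	if (registered != NULL) {
		/* Early exit: release the record that will not be used. */
		free(new_handler);
		live_allocations--;
		return -EBUSY;
	}

	registered = new_handler;
	return 0;
}
```

Without the free() on the early-exit path, every rejected registration would leave one record allocated, which matches the symptom of slowly decreasing memory.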
Description:
Kernel: Race condition with kernel preemption.
Symptom:
If the configuration option CONFIG_PREEMPT is used the kernel can crash. This happens in particular under heavy network load.
Problem:
The do_softirq function uses the asynchronous interrupt stack. The switch from the per process kernel stack to the asynchronous interrupt stack might be done while running enabled. If an asynchronous interrupt comes in and the kernel gets preempted, the information stored by do_softirq on the asynchronous interrupt stack gets lost. This results in a crash as soon as the preempted process is scheduled again.
Solution:
Disable interrupts before switching to the asynchronous interrupt stack.
Problem-ID:
-
Description:
qdio: incorrect comments/messages
Symptom:
source code descriptions and messages are misleading
Problem:
used inappropriate names for OSA and HiperSockets devices
Solution:
change strings in code
Problem-ID:
-
Description:
qeth: bad network performance and potential network freeze
Symptom:
network performance (especially on the inbound path) is considerably worse than with the 2.4 version of qeth; network may hang under stress
Problem:
inbound and outbound traffic handled by tasklets does not perform as well as when buffers were handled directly in interrupt handler
Solution:
remove inbound and outbound tasklets and handle buffers directly in interrupt handlers
Problem-ID:
7789
Description:
qeth: ccwgroup device reference count problem
Symptom:
If a qeth ccwgroup device is ungrouped, the single ccw devices cannot be regrouped again
Problem:
qeth creates sub directories in sysfs for each device. These sub directories hold references to the corresponding ccwgroup device. On removal of qeth device the sub directories are not removed automatically; references on ccwgroup device are not released. As a result, common IO does not clean up the ccwgroup device and does not allow another grouping of related ccw devices.
Solution:
Add a call to qeth_remove_device_attributes to qeth_remove_device. This explicitly removes previously created subdirs in sysfs.
Problem-ID:
-
Description:
qeth: query ARP on HiperSockets overwrites HiperSockets device memory
Symptom:
after a qetharp -q on a HiperSockets device, a subsequent operation (e.g. shutdown of the device) causes all HiperSockets devices on the same CHPID to be recovered
Problem:
size of IPA PDU for the query ARP command is too large and causes HiperSockets memory to be overwritten
Solution:
fix size of IPA PDU
Problem-ID:
-
Description:
qeth: setting router type for qeth device in ccwgroup.conf not possible
Symptom:
if the route4 or route6 attribute is defined in ccwgroup.conf the startup of the corresponding qeth device fails
Problem:
when the ccwgroup script writes the router type to the sysfs attribute, the driver tries to send the SETRTG IPA command, which fails because the qeth device is not yet brought up
Solution:
don't send IPA command if card is not in state SOFTSETUP or UP
Problem-ID:
-
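
A state guard of this kind could look like the following sketch. The enum values, struct card_sketch and set_route_type() are hypothetical; only the state names SOFTSETUP and UP come from the description above. The sketch assumes the stored setting is applied later when the device is brought up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical card states; only SOFTSETUP and UP are named in the text. */
enum card_state_sketch {
	CARD_SKETCH_DOWN,
	CARD_SKETCH_HARDSETUP,
	CARD_SKETCH_SOFTSETUP,
	CARD_SKETCH_UP
};

struct card_sketch {
	enum card_state_sketch state;
	int route_type;
	int commands_sent;	/* counts simulated IPA commands */
};

/* Store the router type; talk to the card only if it is far enough up. */
static int set_route_type(struct card_sketch *card, int type)
{
	assert(card != NULL);
	card->route_type = type;	/* always remember the setting */

	if (card->state != CARD_SKETCH_SOFTSETUP &&
	    card->state != CARD_SKETCH_UP)
		return 0;		/* assumed to be applied at bring-up */

	card->commands_sent++;		/* stands in for sending the command */
	return 0;
}
```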
Description:
zfcp: Always initialize host_scribble field.
Symptom:
Addressing exception, system hangs
Problem:
The zfcp driver sometimes passes commands back to the SCSI midlayer where the host_scribble field is not set to NULL. When such a command is re-tried this might lead to an addressing exception since the host_scribble field then points to an already freed memory area.
Solution:
Initialize host_scribble to NULL in queuecommand function.
Problem-ID:
-
Description:
zfcp: Remove request debug feature.
Symptom:
Lots of obsolete debug entries.
Problem:
The zfcp lldd logs its sequence number for each request, thus resulting in a high number of written debug entries and an increased cpu usage.
Solution:
Remove request debug feature.
Problem-ID:
-
Description:
zfcp: Reset host_scribble field in scsi_cmnd.
Symptom:
Addressing exception, system hangs
Problem:
zfcp relies on the fact that the host_scribble field for each scsi command that is passed to it is NULL. When the zfcp lldd can not send a command, it passes the command back to the scsi mid layer but does not reset this field to NULL. Result is that if this command gets re-tried an addressing exception may occur since the zfcp lldd assumes that host_scribble points to a valid address if it is not NULL.
Solution:
Reset host_scribble field to NULL.
Problem-ID:
-
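
Both host_scribble fixes follow the same idea, which the sketch below models in user space. struct scsi_cmnd_sketch and queuecommand_sketch() are hypothetical stand-ins for the SCSI command structure and the driver's queuecommand function; the can_send parameter simulates whether the driver is able to send the command.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the SCSI command structure. */
struct scsi_cmnd_sketch {
	void *host_scribble;	/* driver-private pointer */
};

/*
 * Returns 0 if the command was accepted, 1 if it is handed back to the
 * mid layer for a later retry.
 */
static int queuecommand_sketch(struct scsi_cmnd_sketch *cmd,
			       void *request, int can_send)
{
	assert(cmd != NULL);

	/* Start from a known state, whatever an earlier attempt left behind. */
	cmd->host_scribble = NULL;

	if (!can_send)
		return 1;	/* handed back with host_scribble still NULL */

	cmd->host_scribble = request;
	return 0;
}
```

A retried command therefore never arrives with a stale pointer to memory that has already been freed.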
Description:
zfcp: error recovery stall in case of unavailable nameserver
Symptom:
system hangs
Problem:
error recovery is waiting infinitely for the completion of the 'nameserver open' action which was not issued at all because the nameserver port was already known to be inaccessible and thus had been marked as failed
Solution:
fail 'open port' action immediately if issuing an 'open nameserver port' action is not possible though required
Problem-ID:
-

Everybody should apply this patch.

To create the complete linux kernel sources, the following patches need to be applied in sequence:

linux-2.6.5.tar.gz (see www.kernel.org/pub/linux/kernel/v2.6)
+ linux-2.6.5-s390-base-april2004.diff (IBM)
+ linux-2.6.5-s390-01-april2004.diff (IBM)

Contact the IBM team

If you want to contact the Linux on System z IBM team refer to the Contact the Linux on System z IBM team page.