2004-06-21 kernel 2.6.5 bug fix patch 04 ("April 2004")

If you download any software from this web site, please be aware of the Warranty Disclaimer and Limitation of Liabilities.

linux-2.6.5-s390-04-april2004.tar.gz / MD5 ... accumulated patch, recommended (2004-06-21)

linux-2.6.5-s390-04-april2004-patches.tar.gz / MD5 ... per-problem-patches, recommended (2004-06-21)

These patches contain the following linux kernel bug fixes:

Description:
cio: Always get subchannel from list.
Symptom:
Oopses during heavy machine check load and vary on/off stress.
Problem:
Obtaining a subchannel via stsch() might yield a subchannel that is not yet registered or is no longer registered, leading to incorrect handling.
Solution:
Always obtain subchannels by their id via searching the devices list of the css bus.
Problem-ID:
-
Description:
cio: Disconnected ccw device missing after detach and re-attach.
Symptom:
After several devices were detached from VM and re-attached in a different order, not all devices appear under /sys/bus/ccw/devices, while all devices are present under /sys/devices/css0/.
Problem:
If devices are re-attached in a different order, the devices appear on different subchannels than before, leading to un-registration and re-registration of disconnected devices. However, there may be two devices with the same bus id now, since the old device structure has yet to be de-registered, leading to a (silent) failure when sysfs tries to create the symlink from /sys/bus/ccw/devices/<dev> to /sys/devices/css0/<subchannel>/<dev>.
Solution:
If the device number apparently changed, search amongst the disconnected devices for a device with the same bus id and un-register it as well.
Problem-ID:
-
Description:
cio: Incorrect error handling on ccwgroup device creation.
Symptom:
Oopses when trying to group a device (not observed yet).
Problem:
Devices are freed directly after being registered.
Solution:
Handle freeing in error path via release function.
Problem-ID:
-
Description:
cio: machine check and vary on/off problems.
Symptom:
Various oopses after machine checks or logical vary on/off.
Problem:
Handling of appearing and disappearing devices was not correctly serialized. In addition, a channel path structure cannot be allocated in the interrupt path.
Solution:
Convert the kernel thread for slow handling of crws to a single-threaded workqueue. Use this workqueue in addition to the existing one for proper serialization, to avoid double un-registration. When detecting a missing channel path structure, trigger a re-scan of the channel subsystem.
Problem-ID:
-
Description:
dasd: DIAG access does not work when compiling DIAG as module.
Symptom:
On systems where the DASD DIAG component was compiled as a kernel module, trying to set a DASD device online after activating use_diag (SysFS attribute) will always fail.
Problem:
The DASD-internal check for availability of the DIAG component is incorrect in case DIAG was compiled as module.
Solution:
Fix the DIAG check.
Problem-ID:
-
Note:
applicable for 31-bit Linux, only
Description:
dasd: DIAG access does not work with disks which have not been CMS reserved.
Symptom:
Trying to access a disk via the DIAG discipline which has been formatted, but not CMS reserved, will fail.
Problem:
There is an obsolete DASD-internal check for CMS reserved data when trying to access a DASD disk via the DIAG discipline.
Solution:
Remove the obsolete check.
Problem-ID:
-
Note:
applicable for 31-bit Linux, only
Description:
dasd: dasd module load->unload->load leads to corrupt pointers.
Symptom:
Load the dasd_mod and dasd_eckd_mod modules. Set at least one eckd device online and offline. Unload dasd_mod and dasd_eckd_mod and load them again. From that point on, errors may happen when the modules are unloaded and loaded again, but there might be other triggers as well. Typical errors would be 'Unable to handle kernel pointer dereference at virtual kernel address' or an 'addressing exception'. One routine that is known to be affected by this error is dasd_device_from_cdev.
Problem:
The first time a dasd device is set online, a struct dasd_devmap is allocated. A pointer to this struct is stored in the matching struct ccw_device (in ccw_device.dev.driver_data). When the dasd module is unloaded, all dasd_devmap structures are freed, but since the ccw_device structures belong to the common I/O layer these remain and contain a now invalid pointer. When the dasd modules are loaded again, these pointers and the invalid data areas are used.
Solution:
When the connection between the common I/O layer data structures and the dasd data structures is broken (in dasd_delete_device), the driver_data pointer is set to NULL.
Problem-ID:
-
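Example (illustrative sketch, not part of the patch):
The idea behind this fix can be shown with a small userspace sketch. The names fake_ccw_device, fake_devmap, devmap_attach and devmap_delete are hypothetical stand-ins and do not reproduce the kernel's definitions. The only point is that the back-pointer held by the longer-lived object is cleared before the shorter-lived object is freed.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a common I/O layer device that outlives the driver. */
struct fake_ccw_device {
    void *driver_data;              /* back-pointer owned by the driver */
};

/* Hypothetical stand-in for the driver's per-device map entry. */
struct fake_devmap {
    struct fake_ccw_device *cdev;
    int devno;
};

/* Allocate a map entry and link it to the device. */
struct fake_devmap *devmap_attach(struct fake_ccw_device *cdev, int devno)
{
    struct fake_devmap *map = malloc(sizeof(*map));

    if (map == NULL)
        return NULL;
    map->cdev = cdev;
    map->devno = devno;
    cdev->driver_data = map;
    return map;
}

/* Break the link first, then free, so no stale pointer survives. */
void devmap_delete(struct fake_devmap *map)
{
    if (map == NULL)
        return;
    if (map->cdev != NULL)
        map->cdev->driver_data = NULL;
    free(map);
}
```

After devmap_delete() returns, a later user of the device sees a NULL driver_data and can allocate a fresh entry instead of following a dangling pointer.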
Description:
dasd: dasd module load->unload->load can still corrupt pointers.
Symptom:
Load the dasd_mod and dasd_eckd_mod modules. Do not set the devices online but set the use_diag attribute to 0 or 1 for one or more devices. Unload the dasd modules and load them again. Set the dasd device of the previous step online. The device is not set up correctly. This can be checked by looking at /proc/dasd/devices. The device should be listed there, but it is not.
Problem:
A struct dasd_devmap is allocated the first time it is accessed, which is the case when a dasd is set online, but also when a dasd-specific attribute is set. When the dasd_devmap is allocated, a pointer to it is stored in ccw_device.dev.driver_data. This pointer needs to be set to NULL before module unload, or the invalid pointer will be used when the dasd module is loaded again. The cleanup code is invoked when a device is set offline, but if the device was never set online, the invalid pointer is kept.
Solution:
The driver_data pointer is set when setting the device online and not when the devmap is created.
Problem-ID:
-
Description:
dasd: ioctl BIODASDDISABLE does not clean up partitions.
Symptom:
A call to BIODASDDISABLE should remove all partition information for a dasd. This is typically seen during dasdfmt. When dasdfmt is started, it calls BIODASDDISABLE and all partition information from /sys/block/dasd<x>/ should disappear. Due to this defect, the partition information is not removed at the beginning of formatting, but only by the partition reread at the end of formatting.
Problem:
dasd_destroy_partitions invokes ioctl BLKPG_DEL_PARTITION, but this is not a valid ioctl command.
Solution:
The correct ioctl is BLKPG and BLKPG_DEL_PARTITION is an option that needs to be given as part of the argument.
Problem-ID:
-
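Example (illustrative sketch, not part of the patch):
The relationship between BLKPG and BLKPG_DEL_PARTITION can be sketched from userspace with the definitions in <linux/blkpg.h>. The helper names blkpg_prepare_delete and blkpg_delete_partition are hypothetical. The patch itself changes kernel code in dasd_destroy_partitions, so this only shows how the argument is assembled, not the patched code.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/blkpg.h>

/* Fill in the argument structures for deleting partition number pno. */
void blkpg_prepare_delete(struct blkpg_ioctl_arg *arg,
                          struct blkpg_partition *part, int pno)
{
    memset(part, 0, sizeof(*part));
    part->pno = pno;

    memset(arg, 0, sizeof(*arg));
    arg->op = BLKPG_DEL_PARTITION;  /* sub-operation, not the ioctl number */
    arg->datalen = sizeof(*part);
    arg->data = part;
}

/* Issue the request on an open block device file descriptor. */
int blkpg_delete_partition(int fd, int pno)
{
    struct blkpg_ioctl_arg arg;
    struct blkpg_partition part;

    blkpg_prepare_delete(&arg, &part, pno);
    return ioctl(fd, BLKPG, &arg);  /* BLKPG is the ioctl command */
}
```

The ioctl command is always BLKPG. Which partition operation is performed is selected by the op field of the argument.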
Description:
dasd: Fixed dasd I/O for eckd and fba with blocksize!=4k and mem > 2GB
Symptom:
I/O errors are reported in syslog; the system cannot boot with more than 2 GB of memory.
Problem:
Calculation of number of idal words needed for channel program was incorrect.
Solution:
Fix above calculation.
Problem-ID:
-
Note:
applicable for 64-bit Linux, only
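Example (illustrative sketch, not part of the patch):
The notes above do not state the corrected formula. As a general illustration, the number of indirect address words for a buffer equals the number of fixed-size blocks the buffer touches, which depends on the start offset within the first block as well as on the length. The 2 KB block size and the names IDA_BLOCK_SHIFT, IDA_BLOCK_BYTES and idal_words_needed are assumptions made for this sketch.

```c
#include <stdint.h>

#define IDA_BLOCK_SHIFT 11u                           /* assumed: 2 KB per word */
#define IDA_BLOCK_BYTES ((uint64_t)1 << IDA_BLOCK_SHIFT)

/* Number of blocks touched by the byte range [addr, addr + length). */
unsigned int idal_words_needed(uint64_t addr, uint64_t length)
{
    uint64_t offset = addr & (IDA_BLOCK_BYTES - 1);   /* offset in first block */

    if (length == 0)
        return 0;
    return (unsigned int)((offset + length + IDA_BLOCK_BYTES - 1)
                          >> IDA_BLOCK_SHIFT);
}
```

A buffer of only two bytes that starts on the last byte of a block still needs two words, which is the kind of case a length-only calculation gets wrong.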
Description:
dcssblk: Error handling in dcssblk_add_store() may lead to duplicate call to kfree().
Symptom:
Error message from "kfree_debugcheck" may occur in dmesg when adding a segment failed.
Problem:
The same kfree() may be called by both the device release function and the error handling code.
Solution:
Fixed error handling after calling device_register() in function dcssblk_add_store().
Problem-ID:
-
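Example (illustrative sketch, not part of the patch):
The general pattern is that once an object has been handed to a registration call, its memory is owned by the release callback, and the error path only drops a reference instead of freeing directly. The sketch below mocks this in userspace. The names fake_device, fake_register, fake_put and add_segment are hypothetical and do not mirror the driver model API.

```c
#include <stdlib.h>

struct fake_device {
    int refcount;
    void (*release)(struct fake_device *dev);
};

int release_calls;   /* counts how often the release callback ran */

/* The only place where the object is freed. */
void fake_release(struct fake_device *dev)
{
    release_calls++;
    free(dev);
}

/* Drop one reference; the last one triggers the release callback. */
void fake_put(struct fake_device *dev)
{
    if (--dev->refcount == 0)
        dev->release(dev);
}

/* Mock registration: takes the initial reference, then optionally fails. */
int fake_register(struct fake_device *dev, int simulate_failure)
{
    dev->refcount = 1;
    return simulate_failure ? -1 : 0;
}

/* Returns 0 on success and -1 on failure; never frees dev directly
 * once registration has been attempted. */
int add_segment(int simulate_failure, struct fake_device **out)
{
    struct fake_device *dev = calloc(1, sizeof(*dev));

    *out = NULL;
    if (dev == NULL)
        return -1;
    dev->release = fake_release;

    if (fake_register(dev, simulate_failure) != 0) {
        fake_put(dev);   /* release callback frees dev exactly once */
        return -1;
    }
    *out = dev;
    return 0;
}
```

Because the error path never calls free() itself, the object cannot be freed twice, whichever path runs.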
Description:
dcssblk: sleeping function called from invalid context.
Symptom:
Error message "bad: scheduling while atomic!" may occur in dmesg when a segment is being added.
Problem:
The device_register() function, which may sleep, is called with a spinlock held.
Solution:
Replaced spinlock with semaphore.
Problem-ID:
-
Description:
iucv: Connection lost if peers use different interface names.
Symptom:
Data sent via an iucv connection whose peers use different interface names is not received.
Problem:
When connecting to a peer via IUCV, it is possible to specify up to 16 bytes of user data. Originally, the same default string was always used, but the last patch stored the interface name in these 16 bytes. This caused problems in the interrupt handler of iucv.c, which could no longer determine its matching handler if the interface names differ on the two sides.
Solution:
Switch back to the original "iucvMagic" string specified as user data for iucv connect.
Problem-ID:
-
Description:
Kernel: Lost dirty bits.
Symptom:
Data corruption under memory pressure.
Problem:
The hardware dirty bit is cleared every time SetPageUptodate is called. The common memory management code calls the function even if the page is already up to date. In this case the dirty bit must not be cleared, because the page is potentially mapped writable in some user process which could have written to the page since the last writeback.
Solution:
Add a check to arch_set_page_uptodate to skip the clearing of the dirty bit if the page is already up to date.
Problem-ID:
-
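Example (illustrative sketch, not part of the patch):
The check described in the solution can be modelled with two flags. The struct fake_page and the function fake_set_page_uptodate are hypothetical stand-ins. The real arch_set_page_uptodate operates on hardware storage keys, which this sketch does not model.

```c
#include <stdbool.h>

struct fake_page {
    bool uptodate;   /* software "up to date" flag */
    bool hw_dirty;   /* stands in for the hardware dirty bit */
};

void fake_set_page_uptodate(struct fake_page *page)
{
    if (page->uptodate)
        return;               /* already up to date: keep the dirty bit */
    page->hw_dirty = false;   /* freshly filled page: clear stale dirty state */
    page->uptodate = true;
}
```

A page that is already up to date and has been written to through a user mapping keeps its dirty state, so the modification is not lost at the next writeback.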
Description:
lcs: cat /sys/bus/ccwgroup/drivers/lcs/0.0.ca00/type shows misleading device type.
Symptom:
The result of the command shows "OSA2 card" although it is an OSA Express card configured in LCS mode.
Problem:
The cu3088 module has no way to distinguish whether the card is an OSA Express card configured in LCS mode or just an OSA 2 card. Therefore the device type text is hardcoded in the cu3088 module.
Solution:
Changed the device type text to "OSA LCS card".
Problem-ID:
-
Description:
lcs: registering multicast addresses for Token Ring and Ethernet failed.
Symptom:
Trying to join another multicast group fails for all multicast addresses belonging to a Token Ring device and some multicast addresses belonging to an Ethernet device as the appropriate set_multicast_list function of the network device is not called.
Problem:
Network stack does not notify the network devices for every multicast address.
Solution:
Introduced a multicast notifier chain so that network devices may register if they need to be notified.
Problem-ID:
-
Description:
lcs: multicast ping does not work.
Symptom:
Issuing a multicast ping generates no multicast traffic, so no packets are received by other systems.
Problem:
Command structure for registering multicast addresses was misaligned.
Solution:
Pack the structure so there are no holes anymore.
Problem-ID:
-
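Example (illustrative sketch, not part of the patch):
The effect of packing can be seen by comparing the same field list with and without the packed attribute. The field names below are hypothetical and are not taken from the lcs driver. The sketch assumes a GCC-compatible compiler for __attribute__((packed)).

```c
#include <stddef.h>
#include <stdint.h>

/* Natural alignment: the compiler may insert padding between fields. */
struct cmd_unpacked {
    uint8_t  cmd_code;
    uint16_t count;
    uint8_t  flags;
    uint32_t ip_addr;
};

/* Packed: fields follow each other without holes. */
struct cmd_packed {
    uint8_t  cmd_code;
    uint16_t count;
    uint8_t  flags;
    uint32_t ip_addr;
} __attribute__((packed));

/* Number of padding bytes the compiler added to the unpacked layout. */
size_t cmd_padding_bytes(void)
{
    return sizeof(struct cmd_unpacked) - sizeof(struct cmd_packed);
}
```

When a structure is sent to hardware that expects a fixed byte layout, any padding shifts the following fields, so the device reads them at the wrong offsets.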
Description:
network: registering multicast addresses for Token Ring and Ethernet failed.
Symptom:
Trying to join another multicast group fails for all multicast addresses belonging to a Token Ring device and some multicast addresses belonging to an Ethernet device as the appropriate set_multicast_list function of the network device is not called.
Problem:
Network stack does not notify the network devices for every multicast address.
Solution:
Introduced a multicast notifier chain so that network devices may register if they need to be notified.
Problem-ID:
-
Description:
qeth: DIRECT SNMP not working correctly.
Symptom:
Direct SNMP does not work correctly. Osasnmpd displays error messages "... cannot handle OID ...".
Problem:
SNMP functions in qeth driver do not copy all data in a user's request from userspace to kernel. Size of IPA PDU encoded in IPA_PDU_HEADER is too small for SNMP commands. As a result, OSA copies only parts of its answers into qeth's i/o buffers, so we get broken OIDs and values back which cannot be handled by osasnmpd.
Solution:
Do not copy sizeof(struct qeth_snmp_ureq) from userspace, but use qeth_snmp_ureq.req_len instead. Further, for SNMP commands adjust the PDU size fields in IPA_PDU_HEADER. Introduced function qeth_send_ipa_snmp_cmd for this purpose.
Problem-ID:
-
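Example (illustrative sketch, not part of the patch):
Copying a variable-length request means reading the length field first and then copying that many payload bytes, rather than copying only the size of the fixed structure. In the sketch below, memcpy() stands in for copy_from_user(). The layout fake_ureq_hdr and the function copy_request are hypothetical and do not reproduce struct qeth_snmp_ureq.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical request layout: fixed header followed by the payload. */
struct fake_ureq_hdr {
    uint32_t req_len;   /* number of payload bytes after the header */
};

/*
 * Copy a complete request from "user" (stand-in for a userspace pointer).
 * max_len bounds the payload size that is accepted.
 * Returns a malloc()ed copy of header plus payload, or NULL on error.
 */
void *copy_request(const void *user, uint32_t max_len)
{
    struct fake_ureq_hdr hdr;
    unsigned char *buf;

    memcpy(&hdr, user, sizeof(hdr));        /* read the length first */
    if (hdr.req_len > max_len)
        return NULL;

    buf = malloc(sizeof(hdr) + hdr.req_len);
    if (buf == NULL)
        return NULL;

    memcpy(buf, &hdr, sizeof(hdr));         /* reuse the header already read */
    memcpy(buf + sizeof(hdr),
           (const unsigned char *)user + sizeof(hdr),
           hdr.req_len);                    /* copy the full payload */
    return buf;
}
```

The upper bound max_len is an addition for the sketch so that a bogus length cannot trigger an oversized allocation.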
Description:
qeth: potential race in debug feature.
Symptom:
With multiple CPUs it is possible that inconsistent debug output in the s390dbf is produced.
Problem:
We had only one global char buffer used by the macro QETH_DBF_TEXT_. This buffer can potentially be used concurrently by multiple CPUs.
Solution:
Defined the char buffer qeth_dbf_txt_buf per CPU. Modified the debug macro to use get_cpu_var and put_cpu_var.
Problem-ID:
-
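Example (illustrative sketch, not part of the patch):
The idea of one buffer per CPU can be modelled in userspace with an array indexed by a CPU number. The names NR_FAKE_CPUS, DBF_TXT_LEN, dbf_txt_buf and dbf_text are hypothetical. The kernel's get_cpu_var and put_cpu_var also disable and re-enable preemption around the access, which this sketch does not model.

```c
#include <stdio.h>

#define NR_FAKE_CPUS 4u   /* hypothetical CPU count */
#define DBF_TXT_LEN  32

/* One text buffer per CPU instead of a single shared one. */
char dbf_txt_buf[NR_FAKE_CPUS][DBF_TXT_LEN];

/*
 * Format a debug text into the buffer that belongs to "cpu" and return it.
 * Returns NULL for an out-of-range CPU number.
 */
const char *dbf_text(unsigned int cpu, const char *tag, int value)
{
    if (cpu >= NR_FAKE_CPUS)
        return NULL;
    snprintf(dbf_txt_buf[cpu], DBF_TXT_LEN, "%s%d", tag, value);
    return dbf_txt_buf[cpu];
}
```

Two CPUs formatting a message at the same time write to different rows of the array, so neither can overwrite the other's text.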
Description:
qeth: registering multicast addresses for Token Ring and Ethernet failed.
Symptom:
Trying to join another multicast group fails for all multicast addresses belonging to a Token Ring device and some multicast addresses belonging to an Ethernet device as the appropriate set_multicast_list function of the network device is not called.
Problem:
Network stack does not notify the network devices for every multicast address.
Solution:
Introduced a multicast notifier chain so that network devices may register if they need to be notified.
Problem-ID:
-
Description:
qeth: under heavy stress (200 guest scenario) qeth causes a kernel bug in case of sending packets.
Symptom:
In case of sending packets in a heavy stress scenario, qeth causes a kernel bug.
Problem:
If the driver catches up with the sending queue, BUG() is triggered, because this condition was assumed never to happen.
Solution:
Changed the buffer handling in qeth to return EBUSY rather than calling BUG().
Problem-ID:
-
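Example (illustrative sketch, not part of the patch):
Returning an error from a full queue instead of treating it as a fatal condition can be sketched with a small ring of slots. The structure out_ring, the depth RING_SLOTS and both functions are hypothetical and far simpler than the qeth buffer handling.

```c
#include <errno.h>

#define RING_SLOTS 4u   /* hypothetical queue depth */

struct out_ring {
    unsigned int next_fill;   /* next slot the sender will fill */
    unsigned int in_flight;   /* slots handed out and not yet completed */
};

/* Claim a slot for sending; returns the slot index, or -EBUSY if none is free. */
int ring_claim_slot(struct out_ring *ring)
{
    unsigned int slot;

    if (ring->in_flight == RING_SLOTS)
        return -EBUSY;        /* caller can retry later instead of crashing */
    slot = ring->next_fill;
    ring->next_fill = (ring->next_fill + 1) % RING_SLOTS;
    ring->in_flight++;
    return (int)slot;
}

/* Mark the oldest in-flight slot as completed. */
void ring_complete_slot(struct out_ring *ring)
{
    if (ring->in_flight > 0)
        ring->in_flight--;
}
```

Under heavy load the sender can legitimately catch up with the queue, so the caller is expected to handle -EBUSY by retrying once a slot has completed.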
Description:
xip2: disable readpage and readpages address space operations.
Symptom:
Errors with a backchain appear in syslog, warning that the readpage or readpages address space operation has been called in the xip2 file system.
Problem:
When sys_madvise is called for a memory area that was created by calling sys_mmap on a file stored on an xip2 file system, sys_madvise calls the readpage or readpages address space operations.
Solution:
Disable readpage and readpages which should not be used anyway.
Problem-ID:
-
Description:
xpram: xpram request queue backing_dev_info not initialized correctly.
Symptom:
Kernel crash due to branch to zero.
Problem:
The request queues of the xpram devices do not have a valid unplug_io_fn pointer in their struct backing_dev_info. As a result, swap_unplug_io_fn branches to zero when trying to call the unplug_io_fn function.
Solution:
Dynamically allocate xpram request queue with blk_alloc_queue which sets a pointer to the default unplug_io_fn.
Problem-ID:
-
Description:
z90crypt: domain=xx parameter ignored in certain situations (plus cleanup and minor changes)
Symptom:
domain=xx parameter ignored in certain situations.
Problem:
Incorrect checking in probe_crypto_domain.
Solution:
Remove the unneeded check.
Further changes:
  • reformat code (indenting and #defines primarily), to make maintenance easier
  • changed the names of a couple of fields in structs for correctness
  • change name from "zLinux" to "zSeries Linux" to match the legal requirements
  • make hotplug function static to prevent kernel namespace usage
  • use __user hints for sparse
Problem-ID:
9373
Description:
zfcp: Misleading abort messages.
Symptom:
Lots of zfcp messages and long aborts.
Problem:
By default, the zfcp lldd prints a lot of messages, especially when commands are being aborted, which might confuse the user.
Solution:
Change the default loglevel from info to normal. In addition, change a few messages from info to debug. The patch also changes the abort behaviour so that it no longer waits for SBALs.
Problem-ID:
-
Description:
zfcp: downloading generic ACT leads to kernel BUG.
Symptom:
Kernel BUG: illegal operation.
Problem:
Condition in BUG_ON statement was too strict.
Solution:
Changed condition in BUG_ON statement.
Problem-ID:
-

Everybody should apply this patch.

To create the complete linux kernel sources, the following patches need to be applied in sequence:

linux-2.6.5.tar.gz (see www.kernel.org/pub/linux/kernel/v2.6)
+ linux-2.6.5-s390-base-april2004.diff (IBM)
+ linux-2.6.5-s390-01-april2004.diff (IBM)
+ xipfs612 (see linuxvm.org/patches/index.html)
+ xipfs622 (see linuxvm.org/patches/index.html)
+ linux-2.6.5-s390-02-april2004.diff (IBM)
+ linux-2.6.5-s390-03-april2004.diff (IBM)
+ single threaded workqueue patch (see marc.theaimsgroup.com/?l=bk-commits-head&m=108305028322900&q=raw)
+ linux-2.6.5-s390-04-april2004.diff (IBM)
