2006-11-02 kernel 2.6.16 bug fix patch 08 ("October 2005")

If you download any software from this web site please be aware of the Warranty Disclaimer and Limitation of Liabilities.

linux-2.6.16-s390-08-october2005.tar.gz / MD5 ... accumulated patch, recommended (2006-11-02)

linux-2.6.16-s390-08-october2005-patches.tar.gz / MD5 ... per-problem-patches, recommended (2006-11-02)

This patch contains the following linux kernel bug fixes:

Description:
cio: 0 is a valid CHPID.
Symptom:
/sys/devices/css0/chp0.0 is not created, although devices that use the channel path with id 0 are present in the system.
Problem:
0 was not considered a valid CHPID.
Solution:
Check for CHPID validity via the path installed mask (pim).
Problem-ID:
27654
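
A minimal C sketch of the difference between the old and the fixed validity check. It is not the actual cio code. The struct and function names are hypothetical. It assumes that bit 0x80 of the path installed mask (pim) corresponds to the first path slot, 0x40 to the second, and so on.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified subchannel path data: up to eight CHPIDs
 * plus an 8-bit path installed mask (assumed: 0x80 = slot 0). */
struct path_info {
	uint8_t pim;
	uint8_t chpid[8];
};

/* Old behaviour: a CHPID value of 0 is treated as "no path". */
static int chpid_valid_old(const struct path_info *p, int slot)
{
	return p->chpid[slot] != 0;
}

/* Fixed behaviour: a slot is valid iff its bit is set in the pim,
 * regardless of the CHPID value (0 included) stored in that slot. */
static int chpid_valid(const struct path_info *p, int slot)
{
	assert(slot >= 0 && slot < 8);
	return (p->pim & (0x80 >> slot)) != 0;
}
```

With pim 0xC0 and CHPID 0 in the first slot, the old check rejects the path, while the mask-based check accepts it.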

Description:
cio: I/O request failure after CHPID deactivation.
Symptom:
CCW device I/O fails after a CHPID vary-off operation, even though other paths remain available.
Problem:
If I/O finishes during the CHPID vary grace period, path verification is not started.
Solution:
Start path verification when I/O finishes during the grace period.
Problem-ID:
28548

Description:
cio: css_probe_device() must be called enabled.
Symptom:
'BUG: sleeping function called from invalid context' message when re-defining a device number with 'DEFINE xxxx AS yyyy' in z/VM.
Problem:
In case of a revalidate machine check, css_probe_device() was called while interrupts were still disabled.
Solution:
Move call to css_probe_device() after interrupts have been re-enabled.
Problem-ID:
28522

Description:
cio: inaccessible device after CHPID deactivation.
Symptom:
CCW device enters state 'no device' after a CHPID is deactivated.
Problem:
Internal I/O operations are not re-tried when aborted by CHPID deactivation.
Solution:
Re-try internal I/O when aborted by CHPID deactivation.
Problem-ID:
28547

Description:
cio: incorrect detection of unsolicited interrupts during sense pgid.
Symptom:
Calls to set a device online with path grouping may get stuck in some cases. This can result in a hang during boot or when trying to set a device online at a later time.
Problem:
Certain device conditions were discarded in response to unsolicited interrupts.
Solution:
Check subchannel activity in these cases and re-try if the subchannel is idle.
Problem-ID:
26648

Description:
cio: incorrect device operational notification.
Symptom:
DASD device driver reports failed I/O request after running out of re-tries when channel paths become unavailable during path verification.
Problem:
Device operational notifications are reported to the device driver even though the device has become unavailable during path verification.
Solution:
Reset device operational notification flag on failed path verification.
Problem-ID:
28177

Description:
cio: incorrect no-path indication after machine check.
Symptom:
Device enters no-path state after disabling a channel path via the SE even though another path has been re-enabled at the SE.
Problem:
The device is set to the no-path state before path verification is triggered, even though other paths may have become available.
Solution:
Trigger path verification before setting a device into no-path state.
Problem-ID:
27255

Description:
cio: modalias missing from CCW bus uevent environment.
Symptom:
Hotplug, and coldplug at boot, do not work for CCW bus devices.
Problem:
If uevents are triggered for CCW bus devices, the modalias is not properly added to the environment.
Solution:
Add modalias to uevent environment for CCW bus devices.
Problem-ID:
26829

Description:
cio: path group not updated by CHPID vary operation.
Symptom:
CHPIDs that are logically varied off will not be removed from a CCW device's path group.
Problem:
The resign-from-pathgroup command is issued with an invalid path mask of 0, because internal CCW operations are masked by the logical path mask after the vary operation has cleared the relevant bits.
Solution:
Do not apply logical path mask to internal operations.
Problem-ID:
27257

Description:
cio: path verification ignores re-appearing channel paths.
Symptom:
Re-appearing channel paths are sometimes not utilized by CCW devices.
Problem:
Path verification incorrectly relies on path-operational-mask information which is not updated until a channel path has been used again.
Solution:
Modify path verification procedure to always query all available paths to a device.
Problem-ID:
27258

Description:
cio: race condition leaves device in inaccessible state.
Symptom:
Availability of CCW device becomes 'no path' or 'no device' for no apparent reason.
Problem:
Asynchronous subchannel evaluation operates on the channel without proper locking.
Solution:
Add locking to subchannel evaluation.
Problem-ID:
27256

Description:
cio: subchannel scan loop for re-appearing channel paths ends early.
Symptom:
Subchannel may incorrectly remain in state no-path after channel paths have re-appeared.
Problem:
The scan for subchannels which are using a channel path ends at the first occurrence if a full link address was provided by the channel subsystem.
Solution:
Always continue scan for subchannels after first occurrence.
Problem-ID:
27254
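
A minimal C sketch of the fixed scan: every subchannel that uses the re-appeared CHPID is marked for re-evaluation, and the loop does not stop at the first match. This is not the actual cio code. The struct and function names are hypothetical, and link address matching is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified subchannel record. */
struct subch {
	uint8_t chpid[8];
	int npaths;
	int needs_eval;
};

static int uses_chpid(const struct subch *s, uint8_t chpid)
{
	assert(s->npaths >= 0 && s->npaths <= 8);
	for (int i = 0; i < s->npaths; i++)
		if (s->chpid[i] == chpid)
			return 1;
	return 0;
}

/* Mark all subchannels using 'chpid'; returns how many were marked. */
static size_t mark_subchannels(struct subch *tab, size_t n, uint8_t chpid)
{
	size_t marked = 0;

	for (size_t i = 0; i < n; i++) {
		if (!uses_chpid(&tab[i], chpid))
			continue;
		tab[i].needs_eval = 1;
		marked++;
		/* No 'break' here: other subchannels may share this path. */
	}
	return marked;
}
```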

Description:
dasd: clean up timer when DASD device is set off-line.
Symptom:
dasd_timeout_device runs into problems such as invalid kernel pointers.
Problem:
Certain events, such as a long busy condition, require the DASD device driver to wait for some time and then restart I/O. If the DASD device is set offline right after such an event and the timer expires later, the timer function accesses device structures that are no longer valid.
Solution:
When a DASD device is set off-line, a pending timer must be removed as part of the clean up.
Problem-ID:
28122

Description:
kernel: incorrect copy_in_user.
Symptom:
Incorrect execution of 31-bit programs on a 64-bit kernel.
Problem:
The copy_in_user function copies 1 byte too much.
Solution:
Correct the copy_in_user function.
Problem-ID:
26754
Note:
Applicable to 64-bit Linux only.

Description:
kernel: user readable un-initialized kernel memory.
Symptom:
None.
Problem:
A user space program can read un-initialized kernel memory by appending data from a bad address to a file and then reading the result back. The cause is the copy_from_user function, which does not clear the remaining bytes of the kernel buffer after a fault on the user space address.
Solution:
Fix the copy_from_user function to clear the remaining bytes of the kernel buffer after a user space fault.
Problem-ID:
27706
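
A minimal user space simulation of the fixed copy_from_user semantics. It is not the s390 implementation. The function name is hypothetical, and so is the fault_at parameter, which stands in for the first byte offset at which accessing the user buffer would fault. Like the kernel function, the sketch returns the number of bytes not copied.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simulated copy_from_user(): 'fault_at' (hypothetical) is the offset of
 * the first faulting user byte; pass n or more for "no fault". */
static size_t sim_copy_from_user(void *to, const void *from, size_t n,
				 size_t fault_at)
{
	size_t copied = fault_at < n ? fault_at : n;

	assert(to != NULL && from != NULL);
	memcpy(to, from, copied);
	/* The fix: clear the uncopied tail so stale kernel data in the
	 * destination buffer can never be read back by user space. */
	memset((char *)to + copied, 0, n - copied);
	return n - copied;
}
```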

Description:
qeth: After cable pull (out/in) device does not work.
Symptom:
After the cable is pulled out and plugged back in, dmesg reports that qeth recovered successfully, but ping does not work.
Problem:
The recovery procedure does not handle the netif carrier state correctly.
Solution:
When the link is established again, call netif_carrier_on before starting the card recovery.
Problem-ID:
26756

Description:
qeth: do not manipulate outgoing cloned skbs.
Symptom:
tcpdump shows damaged link-layer (LL) headers for outgoing VLAN packets.
Problem:
The skb given to qeth is a clone, so qeth is not allowed to change the skb.
Solution:
Create a copy of the skb and change the copy - not the clone.
Problem-ID:
28120

Description:
qeth: kernel panic after module unload.
Symptom:
After the qeth module has been unloaded, a kernel panic (illegal operation: 0001) occurs. The panic does not occur on every module unload; it occurs randomly.
Problem:
The qeth driver overrides the default arp_constructor, which initializes a 'struct neighbour' object. For the 'ops' pointer of the neighbor object, qeth uses an 'ops' structure that it allocates with kmalloc() during module initialization. When the qeth module is unloaded, this ops structure is freed with kfree().
A garbage collector (dst_run_gc) asynchronously removes unused neighbor objects from time to time and calls the neighbor destruction callbacks defined in the 'ops' field. If the qeth module has already been unloaded at that point and the kernel has reused the freed 'ops' memory, the callback pointers are invalid and a kernel panic occurs.
Solution:
Export the standard kernel 'arp_direct_ops' and use them instead of the ops allocated by qeth. This ensures that the 'ops' structure remains valid after the qeth module has been unloaded.
Problem-ID:
28623

Description:
qeth: VLAN header re-ordering does not work on packets received through qeth interface in layer 2 mode.
Symptom:
dhcpcd (dhcp client) does not work.
Problem:
qeth removes the VLAN tag and uses vlan_hwaccel_rx(). The VLAN driver does not re-order the headers, because this is not necessary for hardware-accelerated packets. As a result, the IP header of the packet in memory is preceded by the VLAN header, not by the Ethernet header that dhcpcd expects.
Solution:
Because qeth layer 2 mode is not hardware accelerated, qeth should not remove the VLAN tag and call vlan_hwaccel_rx(). Instead, qeth leaves the packet as is and calls netif_rx(), so that the VLAN driver re-orders the headers.
Problem-ID:
27068

Description:
statistics: buffer overflow in histogram.
Symptom:
Histogram with linear scale might show unexpected results.
The system might show undefined behavior, e.g. crash, due to memory corruption.
Problem:
Values outside the range covered by a histogram with linear scale resulted in invalid indices pointing to non-existing 'buckets'.
Solution:
The index is clamped to the array boundaries if required.
Problem-ID:
28316
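
A minimal C sketch of a linear-scale histogram whose bucket index is clamped to the array boundaries. It is not the actual statistics code. The struct layout, the fixed limit of 16 buckets, and the function names are hypothetical. Values below the covered range are counted in the first bucket, and values above it in the last bucket.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical linear histogram: bucket i covers
 * [min + i * width, min + (i + 1) * width); 1 <= nbuckets <= 16. */
struct linear_hist {
	int64_t min;
	int64_t width;
	unsigned int nbuckets;
	uint64_t bucket[16];
};

static unsigned int hist_index(const struct linear_hist *h, int64_t value)
{
	int64_t idx;

	if (value < h->min)
		return 0;                        /* clamp below the range */
	idx = (value - h->min) / h->width;
	if (idx >= (int64_t)h->nbuckets)
		idx = h->nbuckets - 1;           /* clamp above the range */
	return (unsigned int)idx;
}

static void hist_add(struct linear_hist *h, int64_t value)
{
	unsigned int idx = hist_index(h, value);

	assert(idx < h->nbuckets);               /* never index past the array */
	h->bucket[idx]++;
}
```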

Description:
z90crypt: Logfile flooding when no crypto hardware is available.
Symptom:
When no crypto hardware is available, z90crypt floods log files with unnecessary error messages.
Problem:
probe_crypto_domain() in z90main.c is called periodically to scan the AP bus for devices. If no devices are found, an error message is printed regardless of whether the same message was printed before.
Solution:
Provide a new PRINTK.*_ONCE macro that prints an error message only once.
Problem-ID:
25692
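
A minimal user space C sketch of the print-once idea. It is not the actual PRINTK.*_ONCE definition. PRINT_ONCE, log_emit (a stand-in for printk that also counts emitted messages), probe_devices, and the message text are all hypothetical. Each macro expansion site gets its own static flag, so the message appears the first time that site is reached and is suppressed afterwards.

```c
#include <stdarg.h>
#include <stdio.h>

static unsigned int emitted;  /* number of messages actually printed */

/* Hypothetical stand-in for printk(): prints to stderr and counts. */
static void log_emit(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	emitted++;
}

/* One static flag per expansion site suppresses repeated messages. */
#define PRINT_ONCE(...)                          \
	do {                                     \
		static int printed_;             \
		if (!printed_) {                 \
			printed_ = 1;            \
			log_emit(__VA_ARGS__);   \
		}                                \
	} while (0)

/* Hypothetical periodic probe: reports "no devices" only once. */
static void probe_devices(int devices_found)
{
	if (devices_found == 0)
		PRINT_ONCE("example: no devices found\n");
}
```

Calling probe_devices(0) repeatedly, as a periodic scan would, prints the message a single time.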

Description:
zfcp: excess commands in statistic for pending read I/O.
Symptom:
The pending_scsi_read statistic shows a minimum greater than 1; the excess also affects the average and maximum values.
Problem:
An ambiguous data direction check caused the read I/O counter to be incremented by mistake for I/Os that transfer no data.
Solution:
Check the data direction of the scsi_cmnd instead of sbflags.
Problem-ID:
19109

Description:
zfcp: problems with larger block sizes (e.g. from the tape driver) when using the default max_sectors value (512).
Symptom:
st READ/WRITE requests with larger block sizes fail.
Problem:
zfcp never specified scsi_host_template.max_sectors, so the default (512) was used, which is too small.
Solution:
Set scsi_host_template.max_sectors according to the zfcp specifications.
Problem-ID:
28087

Description:
zfcp: dimension error on latency calculation.
Symptom:
The measured latency values are wrong by a factor of 4.
Problem:
The code assumed a different unit for the fetched clock value than the one it actually has.
Solution:
Calculate the latency correctly.
Problem-ID:
28550

Everybody should apply this patch.

To create the complete linux kernel sources, the following patches need to be applied in sequence:

linux-2.6.16.tar.gz (from http://www.kernel.org/pub/linux/kernel/v2.6)
+ linux-2.6.16-s390-base-october2005.diff (IBM)
+ linux-2.6.16-s390-01-october2005.diff (IBM)
+ linux-2.6.16-s390-02-october2005.diff (IBM)
+ linux-2.6.16-s390-03-october2005.diff (IBM)
+ linux-2.6.16-s390-04-october2005.diff (IBM)
+ linux-2.6.16-s390-05-october2005.diff (IBM)
+ linux-2.6.16-s390-06-october2005.diff (IBM)
+ linux-2.6.16-s390-07-october2005.diff (IBM)
+ linux-2.6.16-s390-08-october2005.diff (IBM)
