2008-02-01 kernel 2.6.5 bug fix patch 47 ("April 2004")

If you download any software from this web site please be aware of the Warranty Disclaimer and Limitation of Liabilities.

linux-2.6.5-s390-47-april2004.tar.gz / MD5 ... accumulated patch, recommended (2008-02-01)

linux-2.6.5-s390-47-april2004-patches.tar.gz / MD5 ... per-problem-patches, recommended (2008-02-01)

These patches contain the following linux kernel bug fixes:

Description:
dasd: Fix loop in request expiration handling.
Symptom:
I/O on a DASD is blocked and the message log fills with "termination failed, retrying in 5s" messages that repeat several times a second instead of once every 5 seconds.
Problem:
The first thing we do in the dasd device tasklet is to check for expired requests. When an expired cqr is found, we try to terminate that request so that we can give it a fresh start. If this termination fails, we want to wait for 5 seconds and do the same check/termination again. We set up a timer, which will schedule the tasklet in 5 seconds. Unfortunately the termination function itself schedules the tasklet as well, so the tasklet will be executed again right after it finishes and will find the same expired cqr. If the termination failed due to a hardware problem, it will probably fail again, and we are stuck in a loop until the hardware allows termination again.
Solution:
The schedule in the termination function may be needed in other contexts, so if we want to give a request some more time, we need to add this time to its "expires" value.
Problem-ID:
41604
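
The effect of the fix can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions, not the dasd driver code: struct sim_req, sim_terminate() and sim_count_attempts() are hypothetical names, one loop iteration stands for one tasklet run, and the rate of 10 tasklet runs per second is an arbitrary choice. Extending "expires" is modelled here as setting it to the current tick plus the retry delay.

```c
#include <stdbool.h>

#define TICKS_PER_SEC 10                    /* assumed: 10 tasklet runs per second */
#define RETRY_DELAY   (5 * TICKS_PER_SEC)   /* the 5 second retry interval */

/* Hypothetical stand-in for a request with an expiration time. */
struct sim_req {
    long expires;                           /* tick at which the request counts as expired */
};

/* Stand-in for a termination attempt that keeps failing (hardware problem). */
static bool sim_terminate(struct sim_req *req)
{
    (void)req;
    return false;
}

/*
 * Run the simulated tasklet once per tick for `duration` ticks and return
 * the number of termination attempts. With extend_expires == false the
 * request stays expired and is retried on every run (the reported loop).
 * With extend_expires == true a failed termination pushes "expires" into
 * the future, so the next attempt happens only after the retry delay.
 */
long sim_count_attempts(long duration, bool extend_expires)
{
    struct sim_req req = { .expires = 0 };
    long attempts = 0;

    for (long now = 0; now < duration; now++) {
        if (now < req.expires)
            continue;                       /* not expired, nothing to do */
        attempts++;
        if (!sim_terminate(&req) && extend_expires)
            req.expires = now + RETRY_DELAY;
    }
    return attempts;
}
```

Over 20 simulated seconds (200 ticks), the variant without the extension attempts a termination on every tick, while the variant that extends "expires" attempts one only every 50 ticks.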
Description:
af_iucv: af_iucv module does not load automatically.
Symptom:
To run an IUCV socket program, the af_iucv module must be loaded manually with "modprobe af_iucv".
Problem:
Missing MODULE_ALIAS_NETPROTO statement.
Solution:
Add "MODULE_ALIAS_NETPROTO(PF_IUCV);" to the af_iucv module.
Problem-ID:
39911
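
For illustration, the following user-space sketch shows the alias string such a statement is expected to produce. It assumes that MODULE_ALIAS_NETPROTO builds the alias as "net-pf-" followed by the stringified protocol family number, as it does in mainline kernels, and that PF_IUCV has the value 32. NETPROTO_ALIAS, STRINGIFY and iucv_module_alias() are hypothetical names used only in this example.

```c
/* Two-step stringification so that the macro argument is expanded first. */
#define STRINGIFY_1(x) #x
#define STRINGIFY(x)   STRINGIFY_1(x)

/* Assumption: protocol family number of IUCV sockets. */
#ifndef PF_IUCV
#define PF_IUCV 32
#endif

/* Sketch of the alias string assumed to be generated for a protocol family. */
#define NETPROTO_ALIAS(proto) "net-pf-" STRINGIFY(proto)

/* Returns the alias under which the module loader would look for af_iucv. */
const char *iucv_module_alias(void)
{
    return NETPROTO_ALIAS(PF_IUCV);
}
```

With such an alias in place, a request for the protocol family on socket creation can be resolved to the af_iucv module without a manual modprobe.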
Description:
iucv: Locking problem with iucv_table_lock.
Symptom:
Hang when running af_iucv socket programs while the netiucv module is being loaded.
Problem:
iucv_register() takes the iucv_table_lock with spin_lock_irq. This conflicts with iucv_connect(), which needs to call smp_call_function while holding the iucv_table_lock.
Solution:
Use bh-disabling locking in iucv_register.
Problem-ID:
40451
Description:
kernel: pfault disabled.
Symptom:
Performance degradation if running many guests in z/VM.
Problem:
The parameter list that the kernel uses to enable the pfault mechanism must be double-word aligned. This is not necessarily the case, since the data structure has a packed attribute annotation. If the alignment is incorrect, a program check is generated and the kernel silently continues with the pfault feature disabled.
Solution:
Make sure the data structure is properly aligned.
Problem-ID:
40293
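
The alignment issue can be reproduced in user space with GCC-style attributes. The sketch below is not the kernel's parameter list: the struct and field names are hypothetical, and adding aligned(8) next to packed is just one possible way to guarantee double-word alignment, since the fix description does not say which method the patch uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameter block; packed alone reduces its alignment to 1. */
struct parm_packed {
    uint16_t code;
    uint16_t function;
    uint16_t length;
    uint16_t version;
    uint64_t token_addr;
} __attribute__((packed));

/* Same layout, but with an explicit double-word (8 byte) alignment. */
struct parm_aligned {
    uint16_t code;
    uint16_t function;
    uint16_t length;
    uint16_t version;
    uint64_t token_addr;
} __attribute__((packed, aligned(8)));

/* Alignment the compiler guarantees for each variant. */
size_t parm_packed_alignment(void)
{
    return _Alignof(struct parm_packed);
}

size_t parm_aligned_alignment(void)
{
    return _Alignof(struct parm_aligned);
}

/* Returns 1 if a static instance of the aligned variant sits on an 8 byte boundary. */
int aligned_instance_ok(void)
{
    static struct parm_aligned parm;

    return ((uintptr_t)&parm & 7) == 0;
}
```

With only the packed attribute, the compiler is free to place the structure at any byte address, so whether the parameter list ends up double-word aligned depends on the surrounding data.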
Description:
lcs: Channel errors drive lcs_recovery which leads to kernel panic.
When the lcs IRQ routine detects channel failures it drives device recovery. After this event the device is no longer usable for shutdown requests, because the lcs_irq routine may get wrong channel status info.
Symptom:
Kernel panic after lcs recovery.
Problem:
lcs works on the wrong buffer.
Solution:
The lcs_irq routine marks the channel as being in the "error" state. The channel state returns to "running" after the channel has been restarted.
Problem-ID:
41636
Description:
qeth: Kernel panic during online setting of OSA devices.
Symptom:
BUG in qeth_set_multicast_list / qeth_set_ip_addr_list.
Problem:
If qeth_set_multicast_list() runs on two CPUs in parallel, card->ip_list may end up corrupted in rare cases.
Solution:
In function __qeth_delete_all_mc(), remove the card->ip_list entry before invoking qeth_deregister_addr_entry(). Thus a second invocation of qeth_set_multicast_list() cannot try to remove the same entry twice.
Problem-ID:
40816
Description:
qeth: HiperSockets layer-3 interface drops packets that are neither IPv4 nor IPv6 packets.
HiperSockets infrastructure (layer-3 mode) supports only IPv4 and IPv6 packets.
Symptom:
Sending other packet types disturbs TCP/IP on z/VM, which issues messages about invalid packets.
Problem:
qeth sends packets over HiperSockets that are neither IPv4 nor IPv6.
Solution:
The qeth send routine detects the packet type when sending over a HiperSockets interface and drops non-IP packets. The error and drop counts of the interface are incremented.
Problem-ID:
38980
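
The filtering described in this fix can be sketched in user space as follows. This is an illustration only, not the qeth send routine: struct sim_stats and sim_hipersockets_xmit() are hypothetical, and the packet type is assumed to be available as an EtherType value (0x0800 for IPv4, 0x86DD for IPv6).

```c
#include <stdbool.h>
#include <stdint.h>

#define ETHERTYPE_IPV4 0x0800
#define ETHERTYPE_IPV6 0x86DD

/* Hypothetical per-interface counters. */
struct sim_stats {
    unsigned long tx_packets;
    unsigned long tx_errors;
    unsigned long tx_dropped;
};

/*
 * Simulated send over a layer-3 HiperSockets interface.
 * Returns true if the packet would be sent, false if it is dropped.
 * Anything that is neither IPv4 nor IPv6 is dropped and counted
 * in both the error and the drop counter.
 */
bool sim_hipersockets_xmit(struct sim_stats *stats, uint16_t ethertype)
{
    if (ethertype != ETHERTYPE_IPV4 && ethertype != ETHERTYPE_IPV6) {
        stats->tx_errors++;
        stats->tx_dropped++;
        return false;
    }
    stats->tx_packets++;
    return true;
}
```

In this model an ARP frame (EtherType 0x0806), for example, never reaches the HiperSockets infrastructure and shows up only in the interface counters.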
Description:
qeth: Revert copying of outgoing skbs (patch 40, Problem-ID 28120).
Symptom:
High CPU costs and lower network throughput.
Problem:
Technically it is correct to copy/unshare outgoing skbs before manipulating them, but this increases CPU costs.
Solution:
Since the packet socket support for the qeth driver is broken anyway for most outgoing frames, we decided to accept another broken packet socket frame rather than the high CPU costs.
Problem-ID:
41622
Description:
zfcp: Deadlock when adding invalid LUN.
Symptom:
After adding an invalid LUN to zfcp, this LUN cannot be removed again. The sysfs write for removal returns ENODEV.
Problem:
unit_add internally calls scsi_add_device, which calls the zfcp slave_destroy handler after determining that the unit does not exist. The zfcp slave_destroy handler waits for the scsi_add_device call to complete, which results in a deadlock.
Solution:
Remove the wait in the zfcp slave_destroy handler. It was not required anyway, and removing it resolves the deadlock.
Problem-ID:
40331
Description:
zfcp: Fix check for handles in abort handler.
Symptom:
When an abort command is running in parallel to a port or unit re-open, zfcp escalates the ERP without a reason.
Problem:
The response to the abort request provides two handles for a port or unit in the FSF status qualifier. The code that compares them was wrong.
Solution:
Fix the comparison of the port or unit handles.
Problem-ID:
41038
Description:
zfcp: Hold queue lock when checking handle for ELS command.
Symptom:
zfcp receives an error "port/unit handle invalid" and escalates the error recovery to "reopen adapter".
Problem:
When issuing an ELS command, zfcp first checks if the unit is unblocked, i.e. the handles are still valid. There is a possible race where this check says that the unit is OK, and then the status changes before the request is finally sent. If a unit/port close command has been sent in this window, the adapter will report "port/unit handle invalid" for the ELS command which will escalate the ERP.
Solution:
We need to hold the queue-lock when checking whether we still have a valid unit/port handle for the ELS command, i.e. whether we can issue this request for this unit/port. If the error recovery is about to close this unit/port, then it competes for the queue-lock. If the close request issued by the error recovery wins, then it is guaranteed that this unit/port has been blocked for other requests.
Problem-ID:
41145
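
The locking pattern behind this fix, which also applies to the FCP, abort and task management fixes below, can be sketched in user space with a mutex. This is a simplified model under stated assumptions, not zfcp code: struct sim_unit and the sim_* functions are hypothetical, a pthread mutex stands in for the queue lock, and a boolean stands in for the validity of the unit/port handle.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a unit/port whose handle ERP may invalidate. */
struct sim_unit {
    pthread_mutex_t queue_lock;     /* models the request queue lock */
    bool unblocked;                 /* true while the handle is valid */
    unsigned long sent;             /* requests handed to the "adapter" */
};

void sim_unit_init(struct sim_unit *unit)
{
    pthread_mutex_init(&unit->queue_lock, NULL);
    unit->unblocked = true;
    unit->sent = 0;
}

/* Error recovery path: block the unit while holding the queue lock. */
void sim_erp_close(struct sim_unit *unit)
{
    pthread_mutex_lock(&unit->queue_lock);
    unit->unblocked = false;
    pthread_mutex_unlock(&unit->queue_lock);
}

/*
 * Request path: the handle check and the send happen inside the same
 * critical section, so the status cannot change between the two steps.
 * Returns 0 if the request was sent, -1 if the unit is blocked.
 */
int sim_send_request(struct sim_unit *unit)
{
    int ret = -1;

    pthread_mutex_lock(&unit->queue_lock);
    if (unit->unblocked) {
        unit->sent++;
        ret = 0;
    }
    pthread_mutex_unlock(&unit->queue_lock);
    return ret;
}
```

Whichever path takes the lock first wins: either the request is sent while the handle is still valid, or the close comes first and the request path sees the unit as blocked.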
Description:
zfcp: Hold queue lock when checking handle for FCP command.
Symptom:
zfcp receives an error "port/unit handle invalid" and escalates the error recovery to "reopen adapter".
Problem:
When issuing an FCP command, zfcp first checks if the unit is unblocked, i.e. the handles are still valid. There is a possible race where this check says that the unit is OK, and then the status changes before the request is finally sent. If a unit/port close command has been sent in this window, the adapter will report "port/unit handle invalid" for the FCP command which will escalate the ERP.
Solution:
We need to hold the queue-lock when checking whether we still have a valid unit/port handle for the FCP command, i.e. whether we can issue this request for this unit/port. If the error recovery is about to close this unit/port, then it competes for the queue-lock. If the close request issued by the error recovery wins, then it is guaranteed that this unit/port has been blocked for other requests.
Problem-ID:
41146
Description:
zfcp: Hold queue lock when checking handle for abort command.
Symptom:
zfcp receives an error "port/unit handle invalid" and escalates the error recovery to "reopen adapter".
Problem:
When issuing an abort command, zfcp first checks if the unit is unblocked, i.e. the handles are still valid. There is a possible race where this check says that the unit is OK, and then the status changes before the request is finally sent. If a unit/port close command has been sent in this window, the adapter will report "port/unit handle invalid" for the abort command which will escalate the ERP.
Solution:
We need to hold the queue-lock when checking whether we still have a valid unit/port handle for the abort command, i.e. whether we can issue this request for this unit/port. If the error recovery is about to close this unit/port, then it competes for the queue-lock. If the close request issued by the error recovery wins, then it is guaranteed that this unit/port has been blocked for other requests.
Problem-ID:
41142
Description:
zfcp: Hold queue lock when checking handle for task management command.
Symptom:
zfcp receives an error "port/unit handle invalid" and escalates the error recovery to "reopen adapter".
Problem:
When issuing a task management command, zfcp first checks if the unit is unblocked, i.e. the handles are still valid. There is a possible race where this check says that the unit is OK, and then the status changes before the request is finally sent. If a unit/port close command has been sent in this window, the adapter will report "port/unit handle invalid" for the task management command which will escalate the ERP.
Solution:
We need to hold the queue-lock when checking whether we still have a valid unit/port handle for the task management command, i.e. whether we can issue this request for this unit/port. If the error recovery is about to close this unit/port, then it competes for the queue-lock. If the close request issued by the error recovery wins, then it is guaranteed that this unit/port has been blocked for other requests.
Problem-ID:
41147
Description:
zfcp: Oops when performing error injection on DS8000.
Symptom:
On error injection on the storage system while running I/O load, the Linux kernel stops with a panic_on_oops.
Problem:
The error injection leads to errors that trigger the SCSI midlayer error recovery, which uses a set of callbacks to zfcp. In the host_reset callback, zfcp dismisses all pending requests while the adapter is still active. It is possible that a request is dismissed here and the real response arrives afterwards. In this case a NULL pointer is accessed, leading to the oops and panic_on_oops.
Solution:
Do not dismiss the requests in the callback function.
When the host_reset callback triggers an adapter re-open, this re-open will clear the pending requests properly after closing the connection to the adapter.
Problem-ID:
40332
Description:
zfcp: Reduce flood on HBA trace.
Symptom:
The debug traces s390dbf/zfcp*hba contain "qual" entries for all successfully processed FSF commands. Errors that would be of interest are hard to find or have already been overwritten.
Problem:
The protocol to the adapter changed: a field that used to describe errors now contains measurement data by default. This measurement data triggers the tracing of all normal responses.
Solution:
The fix is to simply remove the "qual" tracing. Responses with an interesting status are also traced as "ferr" or "perr", and all responses can be traced as "norm" with a higher trace level.
Problem-ID:
40333
Description:
zfcp: Remove SCSI devices when removing complete adapter.
Symptom:
The sequence (1) chpid off, (2) wait for cio timeout, (3) chccwdev -d for a zfcp adapter leads to a hang in the cio kernel thread. After this scenario, cio is unusable.
Problem:
The above sequence calls the ccw device remove callback which indicates that the CCW device for the FCP adapter disappeared. The callback in zfcp then tries to remove all data structures. Since the zfcp units are still registered with the SCSI stack, the removal waits for the SCSI devices to be removed, which does not happen.
Solution:
When removing all zfcp data structures, first remove the unit registrations with the SCSI stack.
Problem-ID:
38981
Description:
zfcp: Dismiss actions in ready queue.
Symptom:
Ready actions are kept in the ready queue although superseding actions are scheduled.
Problem:
Actions in the ready queue are not dismissed.
Solution:
Dismiss actions in the ready queue as well.
Problem-ID:
40502
Description:
zfcp: Imbalance in erp_ready_sem usage.
Symptom:
Some ERP actions fail for no obvious reason.
Problem:
Imbalance in the usage of the erp_ready_sem.
Solution:
Correct the imbalance.
Problem-ID:
40504
Description:
zfcp: Invalid kernel pointer dereference if CONFIG_STATISTICS is not selected.
Symptom:
An Oops is triggered when the zfcp adapter is set online.
Problem:
A data structure is not initialized when CONFIG_STATISTICS is not selected in the kernel configuration.
Solution:
Return immediately from the function when CONFIG_STATISTICS is not configured.
Problem-ID:
40339

Everybody should apply this patch.

To create the complete linux kernel sources, the following patches need to be applied in sequence:

linux-2.6.5.tar.gz (see www.kernel.org/pub/linux/kernel/v2.6)
+ linux-2.6.5-s390-base-april2004.diff (IBM)
+ linux-2.6.5-s390-01-april2004.diff (IBM)
+ xipfs612 (see linuxvm.org/patches/index.html)
+ xipfs622 (see linuxvm.org/patches/index.html)
+ linux-2.6.5-s390-02-april2004.diff (IBM)
+ linux-2.6.5-s390-03-april2004.diff (IBM)
+ single threaded workqueue patch (see marc.theaimsgroup.com/?l=bk-commits-head&m=108305028322900&q=raw)
+ linux-2.6.5-s390-04-april2004.diff (IBM)
+ linux-2.6.5-s390-05-april2004.diff (IBM)
+ linux-2.6.5-s390-06-april2004.diff (IBM)
+ linux-2.6.5-s390-07-april2004.diff (IBM)
+ linux-2.6.5-s390-08-april2004.diff (IBM)
+ linux-2.6.5-s390-09-april2004.diff (IBM)
+ linux-2.6.5-s390-10-april2004.diff (IBM)
+ linux-2.6.5-s390-11-april2004.diff (IBM)
+ linux-2.6.5-s390-12-april2004.diff (IBM)
+ linux-2.6.5-s390-13-april2004.diff (IBM)
+ linux-2.6.5-s390-14-april2004.diff (IBM)
+ linux-2.6.5-s390-15-april2004.diff (IBM)
+ linux-2.6.5-s390-16-april2004.diff (IBM)
+ linux-2.6.5-s390-17-april2004.diff (IBM)
+ linux-2.6.5-s390-18-april2004.diff (IBM)
+ linux-2.6.5-s390-19-april2004.diff (IBM)
+ linux-2.6.5-s390-20-april2004.diff (IBM)
+ linux-2.6.5-s390-21-april2004.diff (IBM)
+ linux-2.6.5-s390-22-april2004.diff (IBM)
+ linux-2.6.5-s390-23-april2004.diff (IBM)
+ linux-2.6.5-s390-24-april2004.diff (IBM)
+ linux-2.6.5-s390-25-april2004.diff (IBM)
+ linux-2.6.5-s390-26-april2004.diff (IBM)
+ linux-2.6.5-s390-27-april2004.diff (IBM)
+ linux-2.6.5-s390-28-april2004.diff (IBM)
+ linux-2.6.5-s390-29-april2004.diff (IBM)
+ linux-2.6.5-s390-30-april2004.diff (IBM)
+ linux-2.6.5-s390-31-april2004.diff (IBM)
+ linux-2.6.5-s390-32-april2004.diff (IBM)
+ linux-2.6.5-s390-33-april2004.diff (IBM)
+ linux-2.6.5-s390-34-april2004.diff (IBM)
+ linux-2.6.5-s390-35-april2004.diff (IBM)
+ linux-2.6.5-s390-36-april2004.diff (IBM)
+ linux-2.6.5-s390-37-april2004.diff (IBM)
+ linux-2.6.5-s390-38-april2004.diff (IBM)
+ linux-2.6.5-s390-39-april2004.diff (IBM)
+ linux-2.6.5-s390-40-april2004.diff (IBM)
+ linux-2.6.5-s390-41-april2004.diff (IBM)
+ linux-2.6.5-s390-42-april2004.diff (IBM)
+ linux-2.6.5-s390-43-april2004.diff (IBM)
+ linux-2.6.5-s390-44-april2004.diff (IBM)
+ linux-2.6.5-s390-45-april2004.diff (IBM)
+ linux-2.6.5-s390-46-april2004.diff (IBM)
+ linux-2.6.5-s390-47-april2004.diff (IBM)
