2006-09-07 kernel 2.6.5 bug fix patch 39 ("April 2004")

If you download any software from this web site please be aware of the Warranty Disclaimer and Limitation of Liabilities.

linux-2.6.5-s390-39-april2004.tar.gz / MD5 ... accumulated patch, recommended (2006-09-07)

linux-2.6.5-s390-39-april2004-patches.tar.gz / MD5 ... per-problem-patches, recommended (2006-09-07)

These patches contain the following Linux kernel bug fixes:

Description:
cio: 5 minutes timeout after setting chpid offline.
Symptom:
After setting a path to a DASD offline at the SE, I/O hangs on that DASD for 5 minutes, then continues.
Problem:
I/O for which no interrupt will be reported after the channel path has been disabled was not terminated by the common I/O layer, so the DASD missing interrupt handler (MIH) only hit after 5 minutes.
Solution:
Be more aggressive in terminating I/O after setting a channel path offline. Also generate a fake irb if the device driver issues an I/O request after being notified of the killed I/O, and clear residual information from the irb before trying to start the delayed verification.
Problem-ID:
25507
Description:
cio: Disallow ccwgroup devices containing non-unique ccw devices.
Symptom:
After a user creates a ccwgroup device containing the same ccw device twice, oopses or VM abends occur.
Problem:
Treating the same device as two unique devices when performing operations leads to unpredictable results.
Solution:
Check for unique ccw devices in ccwgroup_create().
Problem-ID:
25802
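
The uniqueness check described for ccwgroup_create() can be pictured with a small user-space sketch. This is not the patch code: the helper name bus_ids_unique and the use of bus ID strings as the comparison key are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return true if no bus ID appears twice in ids[].
 * A quadratic scan is acceptable because a group holds only a few devices.
 */
bool bus_ids_unique(const char *const ids[], size_t count)
{
    for (size_t i = 0; i < count; i++) {
        for (size_t j = i + 1; j < count; j++) {
            if (strcmp(ids[i], ids[j]) == 0)
                return false;   /* same device listed twice: reject the group */
        }
    }
    return true;
}
```

A caller would run this check before building the group and refuse to create it when the function returns false.
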
Description:
cio: Fix some path grouping and path verification related problems.
Symptom:
Hangs when paths become unavailable and available.
Problem:
  1. Multipath devices for which SetPGID is not supported are not handled well.
  2. Only the first path is checked for a previously set PGID.
  3. PGIDs are not reset before re-boot.
Solution:
  1. Use NOP ccws for path verification (without path grouping) when SetPGID is not supported.
  2. Check for PGIDs already set with SensePGID on all paths and try to find a common one. If no common PGID can be found, print a warning and fall back to NOP verification. If no PGIDs have been set, use the css global PGID (as before).
  3. Immediately before re-boot, issue RESET CHANNEL PATH (rcp) on all chpids.
Problem-ID:
25511
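
The PGID selection in step 2 of the solution above can be sketched as a small decision function. This is a user-space illustration only. It assumes that a PGID fits into a 64-bit value, that 0 means "no PGID set on this path", and that paths without a PGID do not prevent the others from agreeing on a common one. The names choose_pgid and pgid_choice are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcome of inspecting the PGIDs sensed on all paths. */
enum pgid_choice {
    PGID_USE_GLOBAL,   /* nothing set anywhere: use the css global PGID */
    PGID_USE_COMMON,   /* all set paths agree: keep that PGID */
    PGID_USE_NOP       /* paths disagree: warn and verify with NOP ccws */
};

/*
 * sensed[i] is the PGID found on path i, or 0 if none is set (assumption).
 * On PGID_USE_COMMON the agreed value is stored in *common.
 */
enum pgid_choice choose_pgid(const uint64_t sensed[], size_t paths,
                             uint64_t *common)
{
    uint64_t found = 0;

    for (size_t i = 0; i < paths; i++) {
        if (sensed[i] == 0)
            continue;               /* no PGID set on this path */
        if (found == 0)
            found = sensed[i];      /* first PGID seen */
        else if (found != sensed[i])
            return PGID_USE_NOP;    /* conflicting PGIDs */
    }
    if (found == 0)
        return PGID_USE_GLOBAL;
    *common = found;
    return PGID_USE_COMMON;
}
```
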
Description:
cio: I/O stall due to lost interrupt after CHPID vary off/on cycle.
Symptom:
I/O on a CCW device may stall if a CHPID to that device is logically varied off/on. This may also apply to SE CHPID off/on cycles.
Problem:
A user I/O interrupt is misinterpreted as an interrupt for an internal path verification operation due to a missing check and is therefore never reported to the device driver.
Solution:
Correct check for pending interruptions before starting path verification.
Problem-ID:
26103
Description:
cio: Inconsistent values in channel measurement facility.
Symptom:
Values obtained from cmf do not form a consistent picture.
Problem:
Blocks copied from the hardware may not be consistent. avg_sample_interval was re-calculated on every read and did not contain values in nanoseconds.
Solution:
Move copying the hardware block into idle state and do it several times, if needed.
Store a timestamp when the last new block is received and use this value to calculate avg_sample_interval. Print avg_sample_interval in nanoseconds, as documented.
Problem-ID:
23427, 24855
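
One common way to obtain a consistent copy of a block that hardware may update concurrently is to copy it twice and accept the result only when both copies match, retrying a few times otherwise. The sketch below shows that idea together with an average interval computed from a stored timestamp. It is an illustration, not the cmf code: the block size, the retry limit, the reader callback, and all function names are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CMB_SIZE      32  /* assumed block size for this sketch */
#define CMB_MAX_TRIES 4   /* assumed retry limit */

/* Hypothetical reader that copies the current hardware block into dst. */
typedef void (*cmb_read_fn)(void *ctx, unsigned char dst[CMB_SIZE]);

/* Example reader for demonstration: ctx points at CMB_SIZE bytes of memory. */
void cmb_read_from_memory(void *ctx, unsigned char dst[CMB_SIZE])
{
    memcpy(dst, ctx, CMB_SIZE);
}

/* Copy the block twice; accept it only if both copies are identical. */
bool cmb_copy_consistent(cmb_read_fn read_hw, void *ctx,
                         unsigned char out[CMB_SIZE])
{
    unsigned char first[CMB_SIZE], second[CMB_SIZE];

    for (int attempt = 0; attempt < CMB_MAX_TRIES; attempt++) {
        read_hw(ctx, first);
        read_hw(ctx, second);
        if (memcmp(first, second, CMB_SIZE) == 0) {
            memcpy(out, second, CMB_SIZE);
            return true;
        }
    }
    return false;   /* block kept changing; caller keeps the old values */
}

/* Average interval in nanoseconds, based on the time of the last new block. */
uint64_t avg_sample_interval_ns(uint64_t start_ns, uint64_t last_update_ns,
                                uint64_t sample_count)
{
    if (sample_count == 0 || last_update_ns < start_ns)
        return 0;   /* no samples yet (return value is an assumption) */
    return (last_update_ns - start_ns) / sample_count;
}
```

Basing the interval on the timestamp of the last new block, rather than on the time of the read, keeps the value stable between two reads that see the same block.
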
Description:
cio: permanent subchannel busy conditions may cause I/O stall.
Symptom:
In special conditions where a subchannel rejects the HALT I/O instruction with a busy indication (cc 2), I/O may stall.
Problem:
The I/O request termination logic retries HALT I/O indefinitely because it expects HALT I/O to alter the subchannel status, which is not the case when cc 2 is returned.
Solution:
In case of a busy indication, try the CLEAR I/O instruction immediately.
Problem-ID:
25801
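
The changed termination logic can be sketched as follows: issue the halt once and, on a busy condition code, go straight to the clear instead of retrying the halt. This is a user-space model, not the cio code. The function-pointer interface, the fake subchannel, and the assumption that condition code 0 means success are all illustrative.

```c
/* cc 2 is the busy indication named above; cc 0 is assumed to mean success. */
enum { CC_OK = 0, CC_BUSY = 2 };

/* Hypothetical wrapper type for an instruction that returns a condition code. */
typedef int (*io_instr_fn)(void *subchannel);

/* Returns the condition code of the last instruction issued. */
int terminate_io(void *sch, io_instr_fn halt, io_instr_fn clear)
{
    int cc = halt(sch);

    if (cc == CC_BUSY)
        cc = clear(sch);    /* busy leaves the status unchanged: do not retry halt */
    return cc;
}

/* Stand-ins for demonstration: a fake subchannel that counts the calls. */
struct fake_sch {
    int halt_cc;    /* condition code the fake halt reports */
    int halts;
    int clears;
};

int fake_halt(void *p)
{
    struct fake_sch *s = p;

    s->halts++;
    return s->halt_cc;
}

int fake_clear(void *p)
{
    struct fake_sch *s = p;

    s->clears++;
    return CC_OK;
}
```
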
Description:
dasd: kernel BUG when setting a DASD device offline.
Symptom:
The message 'kernel BUG at drivers/ll_rw_block.c:2xxx' is printed.
Problem:
A request that should be ended is still enqueued.
Solution:
First dequeue the request and then call end_request when flushing the request queue.
Problem-ID:
26118
Description:
iucv: multiple interfaces with same peer established.
Symptom:
'cat /sys/class/net/iucv*/device/user' may show the same peer more than once.
Problem:
Handling of 'echo <peer> > /sys/bus/iucv/drivers/netiucv/connection' does not check whether a connection with <peer> already exists.
Solution:
Add such a check in subroutines conn_write() and user_write() and reject creation of a second interface with the same peer.
Problem-ID:
25799
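
The kind of check added here amounts to scanning the existing connections before accepting a new peer. A minimal user-space sketch, assuming a fixed-size table, exact string comparison, peer names of at most 8 characters, and -EEXIST as the rejection code (the error code used by the driver is not stated above):

```c
#include <errno.h>
#include <string.h>

#define MAX_CONNS 8   /* assumed table size for this sketch */
#define PEER_LEN  9   /* assumption: up to 8 characters plus the terminating NUL */

/* Hypothetical connection table; the driver keeps its own structures. */
struct conn_table {
    char peers[MAX_CONNS][PEER_LEN];
    int count;
};

/* Add a peer, rejecting a second connection to a peer that already exists. */
int conn_add(struct conn_table *t, const char *peer)
{
    if (strlen(peer) >= PEER_LEN)
        return -EINVAL;
    for (int i = 0; i < t->count; i++) {
        if (strcmp(t->peers[i], peer) == 0)
            return -EEXIST;     /* connection with this peer already exists */
    }
    if (t->count == MAX_CONNS)
        return -ENOSPC;
    strcpy(t->peers[t->count++], peer);   /* length was checked above */
    return 0;
}
```
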
Description:
qeth: Set routing for IPv6 invalid on HiperSockets.
Symptom:
Error message when a HiperSockets interface (hsi) is set online: 'Error 0001 while setting routing type on hsi.'
Problem:
During card setup for HiperSockets qeth issues IPA SETRTG with IPv6.
Solution:
Omit IPA SETRTG with IPv6 when card does not support IPv6.
Problem-ID:
26144
Description:
qeth: kernel panic under heavy UDP workload.
Symptom:
Addressing exception in qeth_hard_start_xmit / skb_realloc_headroom.
Problem:
When sending an skb, qeth might need to re-allocate or copy this skb before turning it over to the network device (for instance with bonding setups). Once this happens, qeth frees the original skb immediately and continues processing with its clone. Later on, qeth may find that all its output queue buffers are full due to high traffic load. It then gives up and returns an EBUSY condition to its caller. Typically the caller retries sending the same skb in this case. If qeth has already freed this skb, a kernel panic occurs.
Solution:
Keep original skb until it has been successfully turned over to the network device.
Problem-ID:
26068
Description:
qeth: qethconf not adding ipa entries.
Symptom:
dmesg shows "qeth: Invalid IP address format!"
Problem:
Incorrect / insufficient IP-address checking of qeth.
Solution:
Correct qeth IP-address checking (IPv4 and IPv6 addresses). Make storing of IP address attributes work with both "echo" and "echo -n", i.e. with strings terminated by \n or \0.
Problem-ID:
24645
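
Accepting input from both "echo" and "echo -n" comes down to tolerating one optional trailing newline before validating the address. The user-space sketch below illustrates this with the POSIX inet_pton() function. It is not the qeth parser, and the helper name parse_ip_attr is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: validate an address written to an attribute.
 * buf/len describe the written bytes, family is AF_INET or AF_INET6,
 * addr_out points at a struct in_addr or struct in6_addr respectively.
 */
bool parse_ip_attr(const char *buf, size_t len, int family, void *addr_out)
{
    char tmp[INET6_ADDRSTRLEN];

    /* "echo" appends a newline, "echo -n" does not: drop it if present */
    if (len > 0 && buf[len - 1] == '\n')
        len--;
    if (len == 0 || len >= sizeof(tmp))
        return false;

    memcpy(tmp, buf, len);
    tmp[len] = '\0';
    return inet_pton(family, tmp, addr_out) == 1;
}
```
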
Description:
qeth: race during setup of qeth device.
Symptom:
Kernel panic in qeth code (qeth_flush_buffers) when sending the first packet.
Problem:
The crash happened in qeth_main.c within qeth_flush_buffers() at the line
'queue->card->dev->trans_start = jiffies';
where queue->card is still 0.
queue->card is initialized in qeth_init_qdio_queues(), but card->state is set to CARD_STATE_SOFTSETUP before that (see __qeth_set_online()).
As a result, qeth_open() / qeth_hard_start_xmit() can already run even though queue->card might not yet be initialized.
Solution:
In __qeth_set_online() move setting of card->state to CARD_STATE_SOFTSETUP after invocation of qeth_init_qdio_queues().
Problem-ID:
25564
Description:
qeth: race when re-boot and recovery run concurrently.
Symptom:
Addressing exception: 0038 in qdio_shutdown.
Problem:
It may happen that a qeth re-boot event runs on one CPU while a qeth recovery starts on another CPU for the same OSA card. This causes duplicate invocations of qeth_qdio_clear_card(), which, for a specific interleaving, may result in a duplicate invocation of qdio_cleanup().
Solution:
Avoid duplicate invocation of qdio_cleanup() by changing card->qdio.state into an atomic variable.
Problem-ID:
26016
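
Turning the state into an atomic variable allows the "check state, then clean up" step to be a single compare-and-swap, so only one of two concurrent callers performs the cleanup. The sketch below shows the pattern with C11 atomics in user space. The state names, the structure, and the counter that stands in for the cleanup call are assumptions; the kernel uses its own atomic primitives.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state values for this sketch. */
enum {
    QDIO_UNINITIALIZED = 0,
    QDIO_ESTABLISHED   = 1,
    QDIO_CLEANING      = 2
};

struct card {
    atomic_int qdio_state;
    int cleanup_calls;      /* stands in for the real cleanup work */
};

/* Returns true only for the caller that actually performed the cleanup. */
bool clear_card(struct card *c)
{
    int expected = QDIO_ESTABLISHED;

    /* Only the caller that wins this transition may run the cleanup. */
    if (!atomic_compare_exchange_strong(&c->qdio_state, &expected,
                                        QDIO_CLEANING))
        return false;

    c->cleanup_calls++;
    atomic_store(&c->qdio_state, QDIO_UNINITIALIZED);
    return true;
}
```

A second caller, whether it arrives during or after the cleanup, finds a state other than QDIO_ESTABLISHED and returns without touching the queues.
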
Description:
qeth: stack trace with msg "inconsistent lock state".
Symptom:
message:
[ INFO: inconsistent lock state ]
inconsistent {in-softirq-W} -> {softirq-on-W} usage.
modprobe/758 [HC0[0]:SC0[0]:HE1:SE1] takes:
(&tbl->lock){-+-.}, at: [<000000008080c396>]
qeth_init+0x396/0x4e4 [qeth]
Problem:
With kernel 2.6.17-1.2473.el5, a new service called "Lock dependency validator" is activated, which detects this deficiency in the qeth code.
Solution:
Bottom halves (bhs) must be disabled when accessing neighbor tables.
Problem-ID:
26014
Description:
xpram: module parameter parsing.
Symptom:
Module parameters for xpram are not parsed or parsed in a wrong way.
Problem:
The xpram module uses the module_param_array directive with an int parameter which causes the kernel to automatically parse the passed numbers. This will cause errors if arguments are omitted or cause wrong results if arguments have size qualifiers.
Solution:
Use module_param_array with charp and parse the arguments later.
Problem-ID:
25393
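
Receiving the parameters as strings makes it possible to tell an omitted argument from a number and to handle size qualifiers explicitly. The user-space sketch below illustrates such a parser. It is not the xpram code: the function name parse_size, the accepted qualifiers (k, m, g as powers of 1024), and the treatment of an empty string as "omitted" are assumptions.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical parser for strings such as "512", "64k", "16M" or "1G".
 * An empty or missing string counts as an omitted argument.
 * Overflow is not checked in this sketch.
 */
bool parse_size(const char *s, unsigned long long *out, bool *omitted)
{
    char *end;
    unsigned long long v;

    *omitted = (s == NULL || *s == '\0');
    if (*omitted)
        return true;                    /* caller applies its default */

    if (!isdigit((unsigned char)*s))
        return false;                   /* rejects signs and stray text */

    v = strtoull(s, &end, 0);
    switch (*end) {
    case 'G': case 'g':
        v <<= 10;
        /* fall through */
    case 'M': case 'm':
        v <<= 10;
        /* fall through */
    case 'K': case 'k':
        v <<= 10;
        end++;
        break;
    default:
        break;
    }
    if (*end != '\0')
        return false;                   /* trailing garbage */

    *out = v;
    return true;
}
```
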
Description:
zfcp: ERP "deadlock" when registering a SCSI device.
Symptom:
I/O stall when SCSI commands fail during SCSI device registration.
Problem:
zfcp ERP waits for completion of SCSI device registration. The registration failed and an adapter re-open is triggered via scsi_eh. For the adapter re-open the ERP thread of zfcp is needed but this thread is blocked.
Solution:
Change adapter re-open for bus/host reset and scsi_er_timer to dismiss all FSF requests if ERP is already pending on the adapter. Thus the ERP thread will not block and can process further erp_actions.
Problem-ID:
25599
Description:
zfcp: duplicate usage of zfcp's scsi_er_timer.
Symptom:
I/O stall during error injection on Storage Subsystem.
Problem:
The scsi_er_timer for an adapter was used twice. One usage was for an abort or a unit reset. The second usage must have happened when the zfcp_scsi_eh_timed_out handler was called. The timer was deleted by one user, leaving the adapter stuck with no active timer to trigger an adapter re-open.
Solution:
Add a struct timer_list to struct zfcp_fsf_req, so that fsf_req->timer can be used whenever an FSF request needs a timer.
Problem-ID:
22069
Description:
zfcp: re-open adapter on do_QDIO error.
Symptom:
I/O stall during chpid off/on cycles within grace period.
Problem:
cio triggers shutdown of qdio queues but does not call the no-path notifier for zfcp. Hence the qdio queues for an adapter are down, but zfcp does not re-open them; instead, zfcp receives -EBUSY when calling do_QDIO.
Solution:
Re-open qdio queues for an adapter if do_QDIO returns an error.
Problem-ID:
19628

Everybody should apply this patch.

To create the complete linux kernel sources, the following patches need to be applied in sequence:

linux-2.6.5.tar.gz (see www.kernel.org/pub/linux/kernel/v2.6)
+ linux-2.6.5-s390-base-april2004.diff (IBM)
+ linux-2.6.5-s390-01-april2004.diff (IBM)
+ xipfs612 (see linuxvm.org/patches/index.html)
+ xipfs622 (see linuxvm.org/patches/index.html)
+ linux-2.6.5-s390-02-april2004.diff (IBM)
+ linux-2.6.5-s390-03-april2004.diff (IBM)
+ single threaded workqueue patch (see marc.theaimsgroup.com/?l=bk-commits-head&m=108305028322900&q=raw)
+ linux-2.6.5-s390-04-april2004.diff (IBM)
+ linux-2.6.5-s390-05-april2004.diff (IBM)
+ linux-2.6.5-s390-06-april2004.diff (IBM)
+ linux-2.6.5-s390-07-april2004.diff (IBM)
+ linux-2.6.5-s390-08-april2004.diff (IBM)
+ linux-2.6.5-s390-09-april2004.diff (IBM)
+ linux-2.6.5-s390-10-april2004.diff (IBM)
+ linux-2.6.5-s390-11-april2004.diff (IBM)
+ linux-2.6.5-s390-12-april2004.diff (IBM)
+ linux-2.6.5-s390-13-april2004.diff (IBM)
+ linux-2.6.5-s390-14-april2004.diff (IBM)
+ linux-2.6.5-s390-15-april2004.diff (IBM)
+ linux-2.6.5-s390-16-april2004.diff (IBM)
+ linux-2.6.5-s390-17-april2004.diff (IBM)
+ linux-2.6.5-s390-18-april2004.diff (IBM)
+ linux-2.6.5-s390-19-april2004.diff (IBM)
+ linux-2.6.5-s390-20-april2004.diff (IBM)
+ linux-2.6.5-s390-21-april2004.diff (IBM)
+ linux-2.6.5-s390-22-april2004.diff (IBM)
+ linux-2.6.5-s390-23-april2004.diff (IBM)
+ linux-2.6.5-s390-24-april2004.diff (IBM)
+ linux-2.6.5-s390-25-april2004.diff (IBM)
+ linux-2.6.5-s390-26-april2004.diff (IBM)
+ linux-2.6.5-s390-27-april2004.diff (IBM)
+ linux-2.6.5-s390-28-april2004.diff (IBM)
+ linux-2.6.5-s390-29-april2004.diff (IBM)
+ linux-2.6.5-s390-30-april2004.diff (IBM)
+ linux-2.6.5-s390-31-april2004.diff (IBM)
+ linux-2.6.5-s390-32-april2004.diff (IBM)
+ linux-2.6.5-s390-33-april2004.diff (IBM)
+ linux-2.6.5-s390-34-april2004.diff (IBM)
+ linux-2.6.5-s390-35-april2004.diff (IBM)
+ linux-2.6.5-s390-36-april2004.diff (IBM)
+ linux-2.6.5-s390-37-april2004.diff (IBM)
+ linux-2.6.5-s390-38-april2004.diff (IBM)
+ linux-2.6.5-s390-39-april2004.diff (IBM)
