2007-03-23 kernel 2.6.16 bug fix patch 16 ("October 2005")

If you download any software from this web site, please be aware of the Warranty Disclaimer and Limitation of Liabilities.

linux-2.6.16-s390-16-october2005.tar.gz / MD5 ... accumulated patch, recommended (2007-08-30)

linux-2.6.16-s390-16-october2005-patches.tar.gz / MD5 ... per-problem-patches, recommended (2007-08-30)

This patch contains the following linux kernel bug fixes:

Description:
cio: Memory leak in cm enable/disable.
Symptom:
When continuously activating and deactivating channel path measurement data, memory is leaking.
Problem:
Memory allocated on enable was not released on disable call.
Solution:
Make sure to release memory when cm disable is called.
Problem-ID:
28458
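The enable/disable pairing described above can be illustrated with a small user-space sketch. This is not the cio code: struct cm_state, cm_enable() and cm_disable() are hypothetical names, and calloc()/free() stand in for whatever allocator the kernel code uses. The point is only that every allocation made on enable has a matching release on disable.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for channel path measurement state. */
struct cm_state {
    void  *buffer;   /* measurement data area, NULL while disabled */
    size_t size;
};

/* Allocate the measurement buffer; returns 0 on success, -1 on failure. */
int cm_enable(struct cm_state *cm, size_t size)
{
    assert(cm != NULL);
    if (cm->buffer)              /* already enabled, nothing to allocate */
        return 0;
    cm->buffer = calloc(1, size);
    if (!cm->buffer)
        return -1;
    cm->size = size;
    return 0;
}

/* Release the buffer so repeated enable/disable cycles do not leak. */
void cm_disable(struct cm_state *cm)
{
    assert(cm != NULL);
    free(cm->buffer);            /* free(NULL) is a no-op */
    cm->buffer = NULL;
    cm->size = 0;
}
```

With this pairing, toggling measurement on and off in a loop leaves no allocation behind after the final disable.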
Description:
cio: Device becomes not-operational after device offline.
Symptom:
When trying to set a CCW device offline, the device enters the not-operational state.
Problem:
An error during device shutdown will put the device into a not-operational state. Note that this will only occur when there is no other hardware indication (e.g. machine check) of a device/CHPID malfunction.
Solution:
De-register CCW device when shutdown fails.
Problem-ID:
35129
Description:
cio/dasd: Forced online not possible for ECKD DASDs.
Symptom:
For a boxed ECKD DASD, forced online does not succeed if the other side holds on to the reserve.
Problem:
ccw_device_set_options() in dasd_generic_probe() unsets the CCWDEV_ALLOW_FORCE flag set in dasd_eckd_probe(). This leads to unconditional reserve not being allowed on ECKD DASDs from the online attribute.
Solution:
Set the needed flags via ccw_device_set_options() only in dasd_eckd_probe() and dasd_fba_probe() and not in dasd_generic_probe().
Problem-ID:
36049
Description:
monwriter: Serialization bug for multithreaded applications.
Symptom:
Writing to /dev/monwriter from multiple threads may lead to kernel data corruption.
Problem:
Synchronization over private file pointer data only works for processes, not for threads which are sharing the same file pointer.
Solution:
Protect write function with a mutex.
Problem-ID:
37170
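As a rough user-space analogy for the monwriter fix, the sketch below serializes writes with a mutex stored in the state that all threads share. The names (struct mon_private, mon_write(), mon_demo()) are made up for illustration and a POSIX mutex stands in for the kernel mutex; the actual driver code is not shown in these notes.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MON_BUF_SIZE 4096
#define MON_THREADS  4
#define MON_WRITES   1000

/* Hypothetical per-open state; all threads using one file pointer share it. */
struct mon_private {
    pthread_mutex_t lock;
    char   buf[MON_BUF_SIZE];
    size_t used;
};

/* Append len bytes; returns the byte count, or -1 if the buffer is full. */
long mon_write(struct mon_private *p, const char *data, size_t len)
{
    long ret = -1;

    pthread_mutex_lock(&p->lock);        /* serialize the whole update */
    if (len <= MON_BUF_SIZE - p->used) {
        memcpy(p->buf + p->used, data, len);
        p->used += len;
        ret = (long)len;
    }
    pthread_mutex_unlock(&p->lock);
    return ret;
}

static void *mon_writer(void *arg)
{
    struct mon_private *p = arg;

    for (int i = 0; i < MON_WRITES; i++)
        mon_write(p, "x", 1);
    return NULL;
}

/* Run several writers against one shared state; returns bytes stored. */
size_t mon_demo(void)
{
    static struct mon_private p;
    pthread_t tid[MON_THREADS];

    pthread_mutex_init(&p.lock, NULL);
    p.used = 0;
    for (int t = 0; t < MON_THREADS; t++) {
        int rc = pthread_create(&tid[t], NULL, mon_writer, &p);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < MON_THREADS; t++)
        pthread_join(tid[t], NULL);
    pthread_mutex_destroy(&p.lock);
    return p.used;
}
```

Because the length check, the copy and the offset update all happen under one lock, no write is lost or overlaps another, regardless of how the threads interleave.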
Description:
zcrypt: Add missing tasklet_kill() call.
Symptom:
A kernel panic might occur during AP module unload.
Problem:
ap_poll_all() might run after the AP module has been unloaded.
Solution:
Add missing tasklet_kill() call to AP bus module exit code.
Problem-ID:
36636
Description:
zcrypt: Fix ap_poll_requests counter.
Symptom:
appoll kernel thread might consume one CPU at 100%.
Problem:
In the unlikely event that an AP device loses requests, or that an AP device is removed while there are still outstanding requests, ap_poll_requests might not be updated correctly.
Solution:
Update ap_poll_requests counter accordingly.
Problem-ID:
36639
Description:
zcrypt: Fix timeout handling.
Symptom:
Under very high load crypto requests might fail with ETIME.
Problem:
Request timeouts are currently based on the total time a request spends within zcrypt. This can cause false request timeouts under high load conditions due to requests waiting on a large request queue to be submitted to the crypto adapter.
Solution:
Timeouts are now based on crypto adapter responses. A timeout occurs only if a crypto adapter does not respond within a given time frame to submitted requests.
Problem-ID:
36641
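The difference between the two timeout policies can be sketched as follows. This is an assumption-laden illustration, not the zcrypt implementation: struct adapter and its functions are hypothetical, and time is counted in abstract ticks. The clock covers only requests that have already been submitted to the adapter, and any response restarts it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical adapter bookkeeping; times are in arbitrary ticks. */
struct adapter {
    unsigned long last_activity; /* tick of first submit or last response */
    unsigned int  outstanding;   /* requests submitted, not yet answered */
    unsigned long timeout;       /* allowed silence, in ticks */
};

/* Called when a request leaves the software queue and reaches the adapter. */
void adapter_submit(struct adapter *a, unsigned long now)
{
    if (a->outstanding == 0)
        a->last_activity = now;  /* start the clock on the first submit */
    a->outstanding++;
}

/* Called for every response received from the adapter. */
void adapter_response(struct adapter *a, unsigned long now)
{
    assert(a->outstanding > 0);
    a->outstanding--;
    a->last_activity = now;      /* a response shows the adapter is alive */
}

/* True only if submitted work exists and the adapter has stayed silent. */
bool adapter_timed_out(const struct adapter *a, unsigned long now)
{
    return a->outstanding > 0 && now - a->last_activity > a->timeout;
}
```

Time a request spends waiting in a software queue before submission never enters this calculation, which is what removes the false ETIME results under load.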
Description:
zcrypt: Possible deadlock in AP bus module.
Symptom:
AP bus module might hang.
Problem:
AP bus module uses bus_for_each_dev() in software interrupt context to poll for completed requests which might cause deadlocks.
Solution:
Use private AP device list for polling in software interrupt context.
Problem-ID:
36637
Description:
zcrypt: Possible deadlock in AP bus module.
Symptom:
zcrypt device driver might hang during device un-registration.
Problem:
If an AP device is unconfigured, __ap_poll_all() calls device_unregister() in software interrupt context, which can cause deadlocks.
Solution:
The device will be marked as unconfigured; the device_unregister() call will be done later by either ap_scan_bus() or ap_queue_message() in process context.
Problem-ID:
36640
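The "mark now, unregister later" pattern from this fix can be shown in a few lines. The sketch below is a simplified model under stated assumptions: struct ap_dev_sketch, poll_device() and reap_device() are hypothetical, and a boolean stands in for the real device_unregister() call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much reduced view of an AP device. */
struct ap_dev_sketch {
    bool registered;
    bool unconfigured;
};

/* Poll path (software interrupt context in the kernel): record only. */
void poll_device(struct ap_dev_sketch *dev, bool hardware_present)
{
    assert(dev != NULL);
    if (!hardware_present)
        dev->unconfigured = true;   /* no unregister from this context */
}

/* Process-context path: the safe place for the actual unregister.
 * Returns true if the device was unregistered by this call. */
bool reap_device(struct ap_dev_sketch *dev)
{
    assert(dev != NULL);
    if (!dev->unconfigured || !dev->registered)
        return false;
    dev->registered = false;        /* stands in for device_unregister() */
    return true;
}
```

In the fix described above, the role of reap_device() is played by ap_scan_bus() or ap_queue_message(), both of which run in process context.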
Description:
zcrypt: Possible race when unloading zcrypt driver modules.
Symptom:
zcrypt might hang when unloading a zcrypt driver module.
Problem:
Missing reference count to zcrypt driver modules.
Solution:
Move try_module_get() call into spin protected block to prevent zcrypt driver module unload while submitting a request to driver.
Problem-ID:
36638
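Why the reference has to be taken inside the locked region can be seen in a user-space analogy. The sketch below does not reproduce try_module_get(); struct drv and its functions are hypothetical, and a POSIX mutex stands in for the spinlock. The reference is taken under the same lock that the unload path uses for its check, so the two cannot interleave.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical driver record; the mutex stands in for the kernel spinlock. */
struct drv {
    pthread_mutex_t lock;
    int  refcount;
    bool unloading;
};

void drv_init(struct drv *d)
{
    pthread_mutex_init(&d->lock, NULL);
    d->refcount = 0;
    d->unloading = false;
}

/* Take a reference before submitting a request; fails once unload began. */
bool drv_get(struct drv *d)
{
    bool ok = false;

    pthread_mutex_lock(&d->lock);
    if (!d->unloading) {
        d->refcount++;
        ok = true;
    }
    pthread_mutex_unlock(&d->lock);
    return ok;
}

/* Drop a reference after the request has completed. */
void drv_put(struct drv *d)
{
    pthread_mutex_lock(&d->lock);
    assert(d->refcount > 0);
    d->refcount--;
    pthread_mutex_unlock(&d->lock);
}

/* Unload succeeds only while nobody holds a reference. */
bool drv_try_unload(struct drv *d)
{
    bool ok = false;

    pthread_mutex_lock(&d->lock);
    if (d->refcount == 0) {
        d->unloading = true;
        ok = true;
    }
    pthread_mutex_unlock(&d->lock);
    return ok;
}
```

If the reference were taken after dropping the lock, an unload could slip in between the driver lookup and the increment, which is the race the fix closes.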
Description:
zfcp: Adapter fails after several hours chpid off/on
Symptom:
zfcp adapters go into failed state and cannot be recovered manually.
Problem:
zfcp error recovery waits for a successful return or a timeout of an exchange config data request. The timeout is not set up correctly. As a result, error recovery waits forever if the exchange config data request has been sent but the response could not be received because the adapter was disabled at that moment.
Solution:
Set up timer correctly (add missing add_timer call).
Problem-ID:
35232
Description:
zfcp: Avoid possible memory corruption.
Symptom:
Adapters go into failed state after SVC port bypass.
Sometimes random parts of memory are overwritten.
Problem:
zfcp has to register and un-register devices in newer SCSI stacks. Registration of SCSI devices leads to SCSI commands. If these SCSI commands fail there is a deadlock, because the zfcp ERP that would try recovery is already waiting for the registration to return. The old workaround was to occasionally dismiss all zfcp requests. However, the requests were still active in the adapter, so the hardware wrote the responses back into memory that was no longer valid, causing all kinds of strange errors.
Solution:
Do not dismiss the FSF requests while the queues are still active. The dismiss_all function can only be called safely while the queues are down.
Problem-ID:
35294
Description:
zfcp: I/O stall when performing FC cable pulls.
Symptom:
After performing 15s-off/120s-on cable pulls, all paths were marked as failed/faulty in multipath.
Problem:
Some adapter status flags have not been reset during adapter re-open.
Solution:
Correctly clear adapter flags during adapter re-open.
Problem-ID:
35237
Description:
zfcp: Locking for req_no and req_seq_no.
Symptom:
Kernel message "bug: Sequence number mismatch between driver (0x37768c) and adapter 0.0.b31a (0x3776c8). Restarting all operations on this adapter." zfcp triggers an adapter re-open in this case.
Problem:
req_no and req_seq_no are read without locking. There is a race condition when two threads read these counters at the same time and get the same id.
Solution:
Reorder the calls in zfcp_req_create: First call zfcp_fsf_req_sbal_get to get the queue lock, then read and increment req_no and read req_seq_no in zfcp_fsf_req_qtcb_init.
Problem-ID:
34405
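The ordering in this fix (take the queue lock first, then read the counters) can be sketched in user space. Everything named here is hypothetical: struct adapter, req_create() and count_duplicate_ids() are illustrative only, and a POSIX mutex stands in for the zfcp queue lock.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define THREADS    4
#define PER_THREAD 1000

/* Hypothetical adapter state; the mutex stands in for the queue lock. */
struct adapter {
    pthread_mutex_t queue_lock;
    unsigned long   req_no;
    unsigned long   req_seq_no;
};

struct request {
    unsigned long id;
    unsigned long seq_no;
};

/* Take the lock first, then read and increment req_no and read req_seq_no. */
void req_create(struct adapter *a, struct request *r)
{
    pthread_mutex_lock(&a->queue_lock);
    r->id = a->req_no++;
    r->seq_no = a->req_seq_no;
    pthread_mutex_unlock(&a->queue_lock);
}

struct worker_arg {
    struct adapter *adapter;
    unsigned long   ids[PER_THREAD];
};

static void *worker(void *arg)
{
    struct worker_arg *w = arg;

    for (int i = 0; i < PER_THREAD; i++) {
        struct request r;

        req_create(w->adapter, &r);
        w->ids[i] = r.id;
    }
    return NULL;
}

/* Create requests from several threads; returns the number of repeated ids. */
unsigned long count_duplicate_ids(void)
{
    struct adapter a = { .req_no = 0, .req_seq_no = 0 };
    static struct worker_arg args[THREADS];
    static unsigned char seen[THREADS * PER_THREAD];
    pthread_t tid[THREADS];
    unsigned long duplicates = 0;

    pthread_mutex_init(&a.queue_lock, NULL);
    memset(seen, 0, sizeof(seen));
    for (int t = 0; t < THREADS; t++) {
        args[t].adapter = &a;
        pthread_create(&tid[t], NULL, worker, &args[t]);
    }
    for (int t = 0; t < THREADS; t++)
        pthread_join(tid[t], NULL);
    for (int t = 0; t < THREADS; t++) {
        for (int i = 0; i < PER_THREAD; i++) {
            unsigned long id = args[t].ids[i];

            assert(id < THREADS * PER_THREAD);
            if (seen[id]++)
                duplicates++;
        }
    }
    pthread_mutex_destroy(&a.queue_lock);
    return duplicates;
}
```

With the read-and-increment inside the critical section, two threads can no longer observe the same req_no value, so no id is handed out twice.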
Description:
zfcp: Units are reported as BOXED.
Symptom:
zfcp units are reported as BOXED, although I/O is running without problems.
Problem:
During adapter re-open the BOXED flag is not cleared. Although the unit was boxed before the re-open, this is no longer the case after the re-open.
Solution:
Clear the BOXED flags in zfcp on adapter re-open to correctly report the unit status.
Problem-ID:
35239
Description:
zfcp: Adapter in failed state after successful recovery.
Symptom:
After an FSF request timeout the adapter is re-opened and recovers, but it remains in the failed state.
Problem:
The ZFCP_STATUS_COMMON_ERP_FAILED flag is not cleared during re-open.
Solution:
Clear the ZFCP_STATUS_COMMON_ERP_FAILED flag during re-open.
Problem-ID:
35244
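Several of the zfcp fixes in this patch come down to clearing stale status bits when an adapter is re-opened. A minimal sketch of that idea follows. The flag names and bit values are hypothetical and do not match the zfcp headers; only the masking technique is the point.

```c
#include <assert.h>

/* Hypothetical status bits; the real zfcp values are not given here. */
#define STATUS_ERP_FAILED 0x1u
#define STATUS_BOXED      0x2u
#define STATUS_RUNNING    0x4u
#define STATUS_ALL        (STATUS_ERP_FAILED | STATUS_BOXED | STATUS_RUNNING)

/* Bits describing a past failure that must not survive a re-open. */
#define STATUS_STALE_ON_REOPEN (STATUS_ERP_FAILED | STATUS_BOXED)

/* Return the status word with all stale failure bits removed. */
unsigned int status_after_reopen(unsigned int status)
{
    assert((status & ~STATUS_ALL) == 0);   /* only known bits expected */
    return status & ~STATUS_STALE_ON_REOPEN;
}
```

Collecting the stale bits in one mask makes it harder to forget a single flag, which is the kind of omission the BOXED and ERP_FAILED fixes address.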
Description:
zfcp: Fix incorrect usage of fsf_req_list_lock.
Symptom:
Kernel message "BUG: illegal lock usage!"
Problem:
Incorrect usage of fsf_req_list_lock.
Solution:
Fix incorrect usage of fsf_req_list_lock. It is used in tasklet context (irqs on) as well as in irq context. Therefore use the spin_lock_irqsave variant to avoid deadlocks.
Problem-ID:
35282
Description:
zfcp: Incorrect usage of erp_lock might lead to deadlocks.
Symptom:
Possible deadlock in zfcp.
Problem:
write_lock() instead of write_lock_irq() is used in zfcp_erp_adapter_strategy_open_fsf_xconf and zfcp_erp_adapter_strategy_open_fsf_xport.
Solution:
Replace write_lock() with write_lock_irq() calls.
Problem-ID:
35248
Description:
zfcp: Kernel panic during FCP chpid on / off.
Symptom:
Badness message and kernel panic during chpid on / off.
Problem:
Incorrect memory management in request lists leads to invalid pointer de-reference.
Solution:
Rework memory management for request lists.
Problem-ID:
35245
Description:
zfcp: message "LUN larger than allowed by the host adapter".
Symptom:
The kernel log shows a lot of messages with the text "LUN larger than allowed by the host adapter".
Problem:
zfcp reports the first allocated unit with LUN 0. This number is used only internally in the kernel. The Linux SCSI mid-layer assumes that LUN 0 is really the first unit in the SCSI device and issues a REPORT LUNS command. The command reports the FCP LUNs that are not expected.
Solution:
Do not create LUNs in zfcp, simply use the FCP LUN and report it to the SCSI mid-layer.
Problem-ID:
35236
Description:
zfcp: Might hang while registering devices with SCSI.
Symptom:
zfcp might hang while registering devices with the SCSI stack.
Problem:
A change in the SCSI stack requires low level drivers to register and un-register devices. For zfcp this leads to the situation where zfcp calls the SCSI stack, the SCSI stack tries to scan the new device and the scan SCSI command fails. This would require the zfcp ERP, but the ERP thread is already blocked in the register call.
Solution:
Keep the calls from the zfcp ERP to the SCSI stack, since that is the place where zfcp knows about device changes, but make sure that the calls do not block the zfcp ERP thread. In detail:

This avoids blocking and has the same end result.

Problem-ID:
35235

Everybody should apply this patch.

To create the complete linux kernel sources, the following patches need to be applied in sequence:

linux-2.6.16.tar.gz (from http://www.kernel.org/pub/linux/kernel/v2.6)
+ linux-2.6.16-s390-base-october2005.diff (IBM)
+ linux-2.6.16-s390-01-october2005.diff (IBM)
+ linux-2.6.16-s390-02-october2005.diff (IBM)
+ linux-2.6.16-s390-03-october2005.diff (IBM)
+ linux-2.6.16-s390-04-october2005.diff (IBM)
+ linux-2.6.16-s390-05-october2005.diff (IBM)
+ linux-2.6.16-s390-06-october2005.diff (IBM)
+ linux-2.6.16-s390-07-october2005.diff (IBM)
+ linux-2.6.16-s390-08-october2005.diff (IBM)
+ linux-2.6.16-s390-09-october2005.diff (IBM)
+ linux-2.6.16-s390-10-october2005.diff (IBM)
+ linux-2.6.16-s390-11-october2005.diff (IBM)
+ linux-2.6.16-s390-12-october2005.diff (IBM)
+ linux-2.6.16-s390-13-october2005.diff (IBM)
+ linux-2.6.16-s390-14-october2005.diff (IBM)
+ linux-2.6.16-s390-15-october2005.diff (IBM)
+ linux-2.6.16-s390-16-october2005.diff (IBM)

Contact the IBM team

If you want to contact the Linux on System z IBM team refer to the Contact the Linux on System z IBM team page.