To check if multipath is available we count the bits set in lpm,
which could change over time (via configure [on|off] of a path).
The following patch uses the pim (which is persistent) for this
decision.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Sometimes we change the pmcw configuration but don't call msch
to transmit these changes to the channel subsystem.
The patch fixes this by calling cio_commit_config in such cases.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
To change the configuration of a subchannel we alter the modifiable
bits of the subchannel's schib field and issue a modify subchannel.
There can be the case that not all changes were applied -or worse-
quietly overwritten by the hardware. With the next store subchannel
we obtain the current state of the hardware but lose our target
configuration.
With this patch we introduce a subchannel_config structure which
contains the target subchannel configuration. Additionally the msch
wrapper cio_modify is replaced with cio_commit_config which
copies the desired changes to a temporary schib. msch is then
called with the temporary schib. This schib is only written back
to the subchannel if all changes were applied.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
There is the chance that we get condition code 0 for a stsch but
the resulting schib is not vaild. In the current code there are
2 cases:
* we do a check for validity of the schib after stsch, but at this
time we have already stored the invaild schib in the subchannel
structure. This may lead to problems.
* we don't do a check for validity, which is not that good either.
The patch addresses both issues by introducing the stsch wrapper
cio_update_schib which performs stsch on a local schib. This schib
is only written back to the subchannel if it's valid.
side note: For some functions (chp_events) the return codes are
different now (-ENXIO vs -ENODEV) but this shouldn't do harm
since the caller doesn't check for _specific_ errors.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Check if a ccw device is registered via device_is_registered()
and not via the old kludge of checking the membership in driver
core internal klists.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Just put the cdev's reference count to give up our reference.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If we fail the probe for an I/O subchannel, we won't be able
to unregister it again since there are no sch_event()
callbacks for unbound subchannels. Just succeed the probe in
any case and schedule unregistering the subchannel.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
There is a race between io_subchannel_register() and
io_subchannel_sch_event() which may cause a subchannel to be
unregistered because it is no longer operational before
io_subchannel_register() had run. We need to check whether the
subchannel is still registered before the ccw device can be
registered and just bail out if it is not.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Subchannel refcounting was incorrect in some places, especially
a refcount was missing when ccw_device_call_sch_unregister()
was called and the refcount was not correctly switched after
moving devices.
Fix this by establishing the following rules:
- The ccw_device obtains a reference on its parent subchannel
when dev.parent is set and gives it up in its release
function. This is needed because we need a parent reference
for correct refcounting even before the ccw device is (if at
all) registered.
- When calling device_move(), obtain a reference on the new
subchannel before moving the ccw device and give up the
reference on the old parent after moving. This brings the
refcount in line with the first rule.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The current code attempts to get an extra reference count
for online devices by doing a get_device() in ccw_device_online()
and a put_device() in ccw_device_done(). However, this
- incorrectly obtains an extra reference for disconnected
devices becoming available again (since they are already
online)
- needs special checks for css_init_done in order to handle
the console device
- is not obvious and
- may incorretly drop a reference count in ccw_device_done() if
that function is called after path verification for a device
that just became not operational.
So let's just get the reference in ccw_device_set_online() and
drop it in ccw_device_set_offline(). (Unfortunately, we still
need the special case in io_subchannel_probe().)
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Ensure atomicity of ungroup operation to prevent concurrent ungroup
and online processing which may lead to use-after-release situations.
Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Due to former patches a comment and device id initialization were
split from the addressed function call in io_subchannel_probe.
Move it back to where it belongs.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Move cio_tpi() to the rest of the CONFIG_CCW_CONSOLE functions to
get rid of this one:
drivers/s390/cio/cio.c:115: warning: 'cio_tpi' defined but not used
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
__dasd_cleanup_cqr should be called with request_queue_lock held and
__dasd_block_process_erp with queue_lock
Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
SIM sense data are always 32 bit sense data so sense byte 27 bit 0
has not to be set.
Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
For a large number of I/O requests the values were shifted binary.
The shift was not transparent for the user because the shift value
was not displayed. To make this interface more human readable the
values are shifted decimal and the scale factor is displayed.
Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Register zfcp with the new /proc/service_level interface to report the
FCP microcode level. When the adapter goes offline or a channel path
disappears, zfcp unregisters, since the microcode version might change
and zfcp does not know about it.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add a new proc interface /proc/service_levels that allows any code
to report a relevant service level, e.g. the microcode level of
devices, the service level of the hypervisor, etc.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Hipersocket connections can encounter temporary busy conditions.
In case of the busy bit set we retry the SIGA operation immediatelly.
If the busy condition still persists after 100 ms we fail and report
the error to the upper layer. The second stage retry logic is removed.
In case of ongoing busy conditions the upper layer needs to reset the
connection.
The reporting of a SIGA error is now done synchronously to allow the
network driver to requeue the buffers. Also no error trace is created
for the temporary SIGA errors so the error message view is not flooded.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
- Use automatic acknowledgement of incoming buffers in QEBSM mode
- Move ACK for non-QEBSM mode always to the newest buffer to prevent
a race with qdio_stop_polling
- Remove the polling spinlock, the upper layer drivers return new buffers
in the same code path and could not run in parallel
- Don't flood the error log in case of no-target-buffer-empty
- In handle_inbound we check if we would overwrite an ACK'ed buffer, if so
advance the pointer to the oldest ACK'ed buffer so we don't overwrite an
empty buffer in qdio_stop_polling
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
- make qdio_trace a per device view
- remove s390dbf exceptions
- remove CONFIG_QDIO_DEBUG, not needed anymore if we check for the level
before calling sprintf
- use snprintf for dbf entries
- add start markers to see if the dbf view wrapped
- add a global error view for all queues
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The QEBSM instructions are only available for CONFIG_64BIT, they are not
used under 31 bit. Make compiler happy about the false positive:
drivers/s390/cio/qdio_main.c: In function ?qdio_inbound_q_done?:
drivers/s390/cio/qdio_main.c:532: warning: ?state? may be used uninitialized in this function
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add counters for the eqbs and sqbs instructions that indicate how often
we issued the instructions and how often the instructions returned with
less buffers than specified.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
qeth needs to get the port count information before
qdio has allocated a page for the chsc operation.
Extend qdio_get_ssqd_desc() to store the data in the
specified structure.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Changed some symbol names for a better and clearer code.
Signed-off-by: Christian Maaser <cmaaser@de.ibm.com>
Signed-off-by: Felix Beck <beckf@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When the machine supports AP adapter interrupts polling will be
switched off at module initialization and the driver will work in
interrupt mode.
Signed-off-by: Felix Beck <felix.beck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The vmcp driver uses the session->mutex for concurrent access of the data
structures. Therefore, the BKL in vmcp_open does not protect against any
other function in the driver.
The BLK in vmcp_open would protect concurrent access to the module init
but all necessary steps ave finished before misc_register is called.
We can safely remove the lock_kernel from vcmp.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The zfcp_scsi_queuecommand was not acting according to the standard
when the respective unit was not available. In this case an -EBUSY was
returned, which is not valid in itself, and in addition scsi_done
was called. This combination is not allowed and was leading to a
double finish of the request and therefor double decrement of the
host_busy counter.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Waiting for the ERP to be finished in a task running in the global
kernel work-queue is a bad idea, especially if the ERP needs to run
another job in this work-queue before it can finish. -> deadlock.
This patch removes the necessity to wait for a finished ERP from the
scan task and moves the job scheduling to the end of the ERP.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The check of having a valid pointer was performed before the
processing was secured by the lock. Between those two steps the
pointer can turn invalid. During further processing another value is
used (referenced by the pointer described above) as a function pointer
which is never verified to be valid either, resulting under some
circumstances in an invalid function call. This patch is fixing both
issues.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Prevent a SCSI target scan for a rport which have turned invalid
in the meantime.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Aborting a SCSI cmnd might requrie to send a abort_fsf_cmnd. If the
creation of this fsf_req fails an ERR_PTR is returned where a NULL
value would be expected as an error indicator. This ERR_PTR is
dereferenced as valid fsf_req in succeeding processing leading to
an error.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Running two wka_port_get calls in parallel could issue two open_port
requests, overwriting the port handle. Don't issue an open_port
for the state PORT_OPENING, and only read the data from GOOD
responses.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Acked-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
For an incoming RSCN it was checked by the ZFCP_STATUS_PORT_DID_DID
define to re-open a remote port or to test the connection. Since this
define was re-used it was also necessary to replace that define with
ZFCP_STATUS_PORT_PHYS_OPEN.
Signed-off-by: Martin Petermann <martin@linux.vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The logging of sense data for fatal errors was accidentally removed
during Hyper PAV implementation.
Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In ccw_device_move_to_orphanage(), a replacing ccw_device
is searched via get_{disc,orphaned}_ccwdev_by_dev_id()
which obtain a reference on the returned ccw_device.
This reference must be given up again after the device
has been moved to its new parent.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The current virtio model on s390 has the descriptor page above the main
memory. The guest virtio detection will oops if the mem= parameter is
used to reduce/change the memory size.
We have to use real_memory_size instead of max_pfn to detect the virtio
descriptor pages.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fix multiple problems found in the hexdump data:
- length calculation was wrong, traces were incomplete
- FC payloads were dumped in different record than the output
function tried to read
- minor fixes in output
- allow complete RSCN traces (up to 1024 bytes according to spec)
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
If an open port fsf request times out (in erp) the
corresponding erp_action member of the fsf
request need to set to NULL. If the port structure
will be removed later-on there will be still a
reference in the fsf request to the non existing
erp_action otherwise.
Signed-off-by: Martin Petermann <martin.petermann@de.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Attaching a unit immediately after setting the adapter online should
be possible. The problem right now is that the port_scan runs from a
workqueue and has not finished when the set_online call returns and
the sysfs structures for the ports are not available yet. Fix that by
waiting for the port scan to complete.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Fix leftover from last typecast patch:
drivers/s390/scsi/zfcp_aux.c: In function ‘zfcp_port_enqueue’:
drivers/s390/scsi/zfcp_aux.c:629: warning: format ‘%016llx’ expects
type ‘long long unsigned int’, but argument 3 has type ‘u64’
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Fix the handling of the request list in the error path:
- Use irqsave for the lock as in the good path.
- Before removing the request, check if it is still in the list, a
call to dismiss_all might have changed the list in between.
- zfcp_qdio_send does not change the queue counters on failure,
trying revert something is wrong, so remove this.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
When allocating fsf requests without qtcb, store the pointer to the
mempool in the fsf requests for later call to mempool_free. This
codepath is only used by the status_read requests.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The per adapter req_list_lock must be held with interrupts disabled, otherwise
we might end up with nice deadlocks as lockdep tells us (see below).
zfcp 0.0.1804: QDIO problem occurred.
=========================================================
[ INFO: possible irq lock inversion dependency detected ]
2.6.27-rc8-00035-g4a77035-dirty #86
---------------------------------------------------------
swapper/0 just changed the state of lock:
(&adapter->erp_lock){++..}, at: [<00000000002c82ae>] zfcp_erp_adapter_reopen+0x4e/0x8c
but this lock took another, hard-irq-unsafe lock in the past:
(&adapter->req_list_lock){-+..}
and interrupts could create inverse lock ordering between them.
[tons of backtraces, but only the interesting part follows]
the second lock's dependencies:
-> (&adapter->req_list_lock){-+..} ops: 2280627634176 {
initial-use at:
[<0000000000071f10>] __lock_acquire+0x504/0x18bc
[<000000000007335c>] lock_acquire+0x94/0xbc
[<00000000003d7224>] _spin_lock_irqsave+0x6c/0xb0
[<00000000002cf684>] zfcp_fsf_req_dismiss_all+0x50/0x140
[<00000000002c87ee>] zfcp_erp_adapter_strategy_generic+0x66/0x3d0
[<00000000002c9498>] zfcp_erp_thread+0x88c/0x1318
[<000000000001b0d2>] kernel_thread_starter+0x6/0xc
[<000000000001b0cc>] kernel_thread_starter+0x0/0xc
in-softirq-W at:
[<0000000000072172>] __lock_acquire+0x766/0x18bc
[<000000000007335c>] lock_acquire+0x94/0xbc
[<00000000003d7224>] _spin_lock_irqsave+0x6c/0xb0
[<00000000002ca73e>] zfcp_qdio_int_resp+0xbe/0x2ac
[<000000000027a1d6>] qdio_kick_inbound_handler+0x82/0xa0
[<000000000027daba>] tiqdio_inbound_processing+0x62/0xf8
[<0000000000047ba4>] tasklet_action+0x100/0x1f4
[<0000000000048b5a>] __do_softirq+0xae/0x154
[<0000000000021e4a>] do_softirq+0xea/0xf0
[<00000000000485de>] irq_exit+0xde/0xe8
[<0000000000268c64>] do_IRQ+0x160/0x1fc
[<00000000000261a2>] io_return+0x0/0x8
[<000000000001b8f8>] cpu_idle+0x17c/0x224
hardirq-on-W at:
[<0000000000072190>] __lock_acquire+0x784/0x18bc
[<000000000007335c>] lock_acquire+0x94/0xbc
[<00000000003d702c>] _spin_lock+0x5c/0x9c
[<00000000002caff6>] zfcp_fsf_req_send+0x3e/0x158
[<00000000002ce7fe>] zfcp_fsf_exchange_config_data+0x106/0x124
[<00000000002c8948>] zfcp_erp_adapter_strategy_generic+0x1c0/0x3d0
[<00000000002c98ea>] zfcp_erp_thread+0xcde/0x1318
[<000000000001b0d2>] kernel_thread_starter+0x6/0xc
[<000000000001b0cc>] kernel_thread_starter+0x0/0xc
}
... key at: [<0000000000e356c8>] __key.26629+0x0/0x8
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmit@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
It is possible that a remote port has a problem, the SCSI device gets
deleted after the rport timeout and then the timeout for pending SCSI
commands trigger an abort. For this case, don't delete the reference
from the SCSI device to the zfcp unit, so that we can still have the
reference to issue an abort request.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>