Add a new type of quirk for resetting devices at pci_dev_reset time.
This is necessary to handle devices with nonstandard reset procedures,
and is especially useful for guest drivers.
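A minimal sketch of the quirk machinery, assuming the pci_dev_reset_methods
table and pci_dev_specific_reset() names introduced here; pci_dev_reset()
would try this hook before falling back to the standard reset methods:

        struct pci_dev_reset_methods {
                u16 vendor;
                u16 device;
                int (*reset)(struct pci_dev *dev, int probe);
        };

        static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
                /* { PCI_VENDOR_ID_FOO, PCI_DEVICE_ID_FOO, foo_reset },  illustrative entry */
                { 0 }
        };

        static int pci_dev_specific_reset(struct pci_dev *dev, int probe)
        {
                const struct pci_dev_reset_methods *i;

                for (i = pci_dev_reset_methods; i->reset; i++) {
                        if ((i->vendor == dev->vendor ||
                             i->vendor == (u16)PCI_ANY_ID) &&
                            (i->device == dev->device ||
                             i->device == (u16)PCI_ANY_ID))
                                return i->reset(dev, probe);
                }

                return -ENOTTY;
        }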
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Prior to this patch, if pci_read_config_byte(dev, PCI_CACHE_LINE_SIZE, ...)
returns 0 for all dev, pci_cache_line_size ends up set to zero
(instead of pci_dfl_cache_line_size).
This patch ensures that pci_cache_line_size is set to
pci_dfl_cache_line_size in the above scenario.
This happens in the case of a kvm-88 guest (where, consequently, the
rtl8139 NIC failed to initialize).
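A sketch of the intended fallback (the surrounding loop and the 'cls'
variable name are assumptions):

        /* 'cls' holds the common cache line size read from the devices,
         * or 0 if every read returned 0 */
        if (!pci_cache_line_size)
                pci_cache_line_size = cls ? cls : pci_dfl_cache_line_size;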
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Csaba Henk <csaba@gluster.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Having read the PM part of the PCIe 2.0 specification more carefully,
I think that it was a mistake to restrict the wake-up enable
propagation to non-PCIe devices, because if we do not request
control of the root ports' PME registers via OSC, PCIe PME is
supposed to be handled by the platform, just like non-PCIe PME.
Even if we do that, the wake-up propagation is done to allow the
devices to wake up the system from sleep states, which involves the
platform anyway, so it won't hurt.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* 'acpica' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
ACPICA: Update version to 20091112.
ACPICA: Add additional module-level code support
ACPICA: Deploy new create integer interface where appropriate
ACPICA: New internal utility function to create Integer objects
ACPICA: Add repair for predefined methods that must return sorted lists
ACPICA: Fix possible fault if return Package objects contain NULL elements
ACPICA: Add post-order callback to acpi_walk_namespace
ACPICA: Change package length error message to an info message
ACPICA: Reduce severity of predefined repair messages, Warning to Info
ACPICA: Update version to 20091013
ACPICA: Fix possible memory leak for Scope ASL operator
ACPICA: Remove possibility of executing _REG methods twice
ACPICA: Add repair for bad _MAT buffers
ACPICA: Add repair for bad _BIF/_BIX packages
* 'timers-for-linus-hpet' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: hpet: Make WARN_ON understandable
x86: arch specific support for remapping HPET MSIs
intr-remap: generic support for remapping HPET MSIs
x86, hpet: Simplify the HPET code
x86, hpet: Disable per-cpu hpet timer if ARAT is supported
* 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (63 commits)
x86, Calgary IOMMU quirk: Find nearest matching Calgary while walking up the PCI tree
x86/amd-iommu: Remove amd_iommu_pd_table
x86/amd-iommu: Move reset_iommu_command_buffer out of locked code
x86/amd-iommu: Cleanup DTE flushing code
x86/amd-iommu: Introduce iommu_flush_device() function
x86/amd-iommu: Cleanup attach/detach_device code
x86/amd-iommu: Keep devices per domain in a list
x86/amd-iommu: Add device bind reference counting
x86/amd-iommu: Use dev->arch->iommu to store iommu related information
x86/amd-iommu: Remove support for domain sharing
x86/amd-iommu: Rearrange dma_ops related functions
x86/amd-iommu: Move some pte allocation functions in the right section
x86/amd-iommu: Remove iommu parameter from dma_ops_domain_alloc
x86/amd-iommu: Use get_device_id and check_device where appropriate
x86/amd-iommu: Move find_protection_domain to helper functions
x86/amd-iommu: Simplify get_device_resources()
x86/amd-iommu: Let domain_for_device handle aliases
x86/amd-iommu: Remove iommu specific handling from dma_ops path
x86/amd-iommu: Remove iommu parameter from __(un)map_single
x86/amd-iommu: Make alloc_new_range aware of multiple IOMMUs
...
Remove a stray space in pci_save_state().
Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Commit ae21ee65e8 "PCI: acs p2p upsteram
forwarding enabling" doesn't actually enable ACS.
Add a function to the PCI core to allow an IOMMU to request that ACS
be enabled. The existing mechanism of using iommu_found() in the PCI
core to know when ACS should be enabled doesn't actually work due to
initialization order; the IOMMU has only been detected, not initialized.
Have the Intel and AMD IOMMUs request ACS, and have Xen do so as well
during early init of dom0.
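A minimal sketch of the opt-in, assuming the pci_request_acs() name added
here; the flag is then consulted when the PCI core enables ACS on
downstream ports:

        static bool pci_acs_enable;

        /* called by the IOMMU drivers (and Xen dom0 early init) before
         * the PCI devices are enumerated */
        void pci_request_acs(void)
        {
                pci_acs_enable = true;
        }

        static void pci_enable_acs(struct pci_dev *dev)
        {
                if (!pci_acs_enable)
                        return;

                /* ... program P2P request redirect etc. in the ACS capability ... */
        }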
Cc: Allen Kay <allen.m.kay@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This problem happened when removing PCIe root port using PCI logical
hotplug operation.
The immediate cause of this problem is that a pointer to an invalid
data structure is passed to pcie_update_aspm_capable() by
pcie_aspm_exit_link_state(). When pcie_aspm_exit_link_state() receives
a pointer to a root port link, it first unconfigures the root port link
and frees its data structure. At this point, there are no links left to
configure under the root port, and the data structure for the root port
link has already been freed. So pcie_aspm_exit_link_state() must not call
pcie_update_aspm_capable() and pcie_config_aspm_path().
This patch fixes the problem by changing pcie_aspm_exit_link_state()
not to call pcie_update_aspm_capable() and pcie_config_aspm_path() if
the specified link is a root port link.
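A sketch of the resulting guard in pcie_aspm_exit_link_state() (variable
names are illustrative); the upstream reconfiguration is only done when
the removed link has a parent, i.e. it is not the root port link itself:

        /* Recheck latencies and configure upstream links only if the
         * removed link was not the root port link */
        if (parent_link) {
                pcie_update_aspm_capable(root);
                pcie_config_aspm_path(parent_link);
        }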
------------[ cut here ]------------
kernel BUG at drivers/pci/pcie/aspm.c:606!
invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/pci0000:40/0000:40:13.0/remove
CPU 1
Modules linked in: shpchp
Pid: 9345, comm: sysfsd Not tainted 2.6.32-rc5 #98 ProLiant DL785 G6
RIP: 0010:[<ffffffff811df69b>] [<ffffffff811df69b>] pcie_update_aspm_capable+0x15/0xbe
RSP: 0018:ffff88082a2f5ca0 EFLAGS: 00010202
RAX: 0000000000000e77 RBX: ffff88182cc3e000 RCX: ffff88082a33d006
RDX: 0000000000000001 RSI: ffffffff811dff4a RDI: ffff88182cc3e000
RBP: ffff88082a2f5cc0 R08: ffff88182cc3e000 R09: 0000000000000000
R10: ffff88182fc00180 R11: ffff88182fc00198 R12: ffff88182cc3e000
R13: 0000000000000000 R14: ffff88182cc3e000 R15: ffff88082a2f5e20
FS: 00007f259a64b6f0(0000) GS:ffff880864600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007feb53f73da0 CR3: 000000102cc94000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process sysfsd (pid: 9345, threadinfo ffff88082a2f4000, task ffff88082a33cf00)
Stack:
ffff88182cc3e000 ffff88182cc3e000 0000000000000000 ffff88082a33cf00
<0> ffff88082a2f5cf0 ffffffff811dff52 ffff88082a2f5cf0 ffff88082c525168
<0> ffff88402c9fd2f8 ffff88402c9fd2f8 ffff88082a2f5d20 ffffffff811d7db2
Call Trace:
[<ffffffff811dff52>] pcie_aspm_exit_link_state+0xf5/0x11e
[<ffffffff811d7db2>] pci_stop_bus_device+0x76/0x7e
[<ffffffff811d7d67>] pci_stop_bus_device+0x2b/0x7e
[<ffffffff811d7e4f>] pci_remove_bus_device+0x15/0xb9
[<ffffffff811dcb8c>] remove_callback+0x29/0x3a
[<ffffffff81135aeb>] sysfs_schedule_callback_work+0x15/0x6d
[<ffffffff81072790>] worker_thread+0x19d/0x298
[<ffffffff8107273b>] ? worker_thread+0x148/0x298
[<ffffffff81135ad6>] ? sysfs_schedule_callback_work+0x0/0x6d
[<ffffffff810765c0>] ? autoremove_wake_function+0x0/0x38
[<ffffffff810725f3>] ? worker_thread+0x0/0x298
[<ffffffff8107629e>] kthread+0x7d/0x85
[<ffffffff8102eafa>] child_rip+0xa/0x20
[<ffffffff8102e4bc>] ? restore_args+0x0/0x30
[<ffffffff81076221>] ? kthread+0x0/0x85
[<ffffffff8102eaf0>] ? child_rip+0x0/0x20
Code: 89 e5 8a 50 48 31 c0 c0 ea 03 83 e2 07 e8 b2 de fe ff c9 48 98 c3 55 48 89 e5 41 56 49 89 fe 41 55 41 54 53 48 83 7f 10 00 74 04 <0f> 0b eb fe 48 8b 05 da 7d 63 00 4c 8d 60 e8 4c 89 e1 eb 24 4c
RIP [<ffffffff811df69b>] pcie_update_aspm_capable+0x15/0xbe
RSP <ffff88082a2f5ca0>
---[ end trace 6ae0f65bdeab8555 ]---
Reported-by: Alex Chiang <achiang@hp.com>
Tested-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The pci_cleanup_aer_correct_error_status() function has been
#if 0'd out since 2.6.25. Time to remove the dead code.
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The current implementation of pci_cleanup_aer_uncorrect_error_status
only clears either fatal or non-fatal error status bits depending
on the state of the I/O channel. This implementation will then often
leave some bits set after PCI error recovery completes. The uncleared bit
settings will then be falsely reported the next time an AER interrupt is
generated for that hierarchy. An easy way to illustrate this issue is to
use the aer-inject module to simultaneously inject both an uncorrectable
non-fatal and uncorrectable fatal error. One of the errors will not be
cleared.
This patch resolves this issue by unconditionally clearing all bits in
the AER uncorrectable status register. All settings and corrective action
strategies are saved and determined before
pci_cleanup_aer_uncorrect_error_status is called, so this change should not
affect error handling functionality.
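A sketch of the unconditional clear, which simply writes back whatever
uncorrectable status bits are currently set:

        int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev)
        {
                int pos;
                u32 status;

                pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
                if (!pos)
                        return -EIO;

                /* clears both fatal and non-fatal bits, regardless of
                 * the I/O channel state */
                pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, &status);
                pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, status);

                return 0;
        }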
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Remove the 'port_type' field from struct pcie_port_data, because we can
get the port type information from struct pci_dev. With this change, this
patch also does the following:
- Remove struct pcie_port_data because it no longer has any fields.
- Remove portdrv private definitions about port type (PCIE_RC_PORT,
PCIE_SW_UPSTREAM_PORT and PCIE_SW_DOWNSTREAM_PORT), and use generic
definitions instead.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Add missing service irqs cleanup in the error code path of
pcie_port_device_register().
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Call pci_enable_device() before initializing the service irqs, because
the legacy interrupt is initialized in pci_enable_device() on some
architectures.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch cleans up the service irqs initialization as follows:
- Remove the 'irq_mode' field in pcie_port_data and related definitions,
which is not needed because we can get the same information from the
'is_msix', 'is_msi' and 'pin' fields in struct pci_dev.
- Change the name of the 'vectors' argument of assign_interrupt_mode() to
'irqs' because it actually holds IRQ numbers. People might confuse
it with CPU vectors or MSI/MSI-X vectors.
- Rename assign_interrupt_mode() to init_service_irqs(),
because we no longer have the 'irq_mode' data structure and the new
name is more straightforward (IMO).
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
No reason to check PME capability outside get_port_device_capability().
Do it in get_port_device_capability().
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The PCIe port type is already stored in the 'pcie_type' field of struct
pci_dev, so we don't need to get it from PCI configuration space.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
In the current port bus driver implementation, pcie_device allocation,
initialization and registration are done in separate functions. Doing
those in one function makes the code simpler and easier to read.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
We don't need pcie_port_device_probe() because we can get the PCI
device/port type using pci_is_pcie() and the 'pcie_type' field in struct
pci_dev. Remove pcie_port_device_probe().
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Prior to 1f82de10 we always initialized the upper 32bits of the
prefetchable memory window, regardless of the address range used.
Now we only touch it for a >32bit address, which means the upper32
registers remain whatever the BIOS initialized them to.
It's valid for the BIOS to set the upper32 base/limit to
0xffffffff/0x00000000, which makes us program prefetchable ranges
like 0xffffffffabc00000 - 0x00000000abc00000.
Revert the chunk of 1f82de10 that made this conditional so that we always
write the upper32 registers, and remove the now unused pref_mem64 variable.
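A sketch of the unconditional programming in pci_setup_bridge() (the
'region' variable is assumed to hold the translated window):

        /* always write the upper halves; for a 32-bit range these are 0,
         * so stale BIOS values like 0xffffffff/0x00000000 cannot survive */
        pci_write_config_dword(dev, PCI_PREF_BASE_UPPER32,
                               upper_32_bits(region.start));
        pci_write_config_dword(dev, PCI_PREF_LIMIT_UPPER32,
                               upper_32_bits(region.end));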
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The pcie_flr routine writes the device control register with the FLR bit
set, clearing all other fields for the duration of the FLR. Among other
fields, the Max_Payload_Size is also cleared, which can cause errors if
there are transactions lurking in the HW pipeline. The patch replaces the
blank write with a read-modify-write of the control register, keeping the
other fields intact.
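A sketch of the read-modify-write (with 'pos' assumed to be the PCIe
capability offset):

        u16 ctrl;

        /* keep Max_Payload_Size and the other control fields intact
         * while triggering the function level reset */
        pci_read_config_word(dev, pos + PCI_EXP_DEVCTL, &ctrl);
        ctrl |= PCI_EXP_DEVCTL_BCR_FLR;
        pci_write_config_word(dev, pos + PCI_EXP_DEVCTL, ctrl);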
Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
So we can catch if the driver sets an incorrect dma_mask.
Reviewed-by: Grant Grundler <grundler@google.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
If we stop the kthread, we may end up up'ing the sem twice, which seems
unintended.
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The existing interface only has a pre-order callback. This change
adds an additional parameter for a post-order callback which will
be more useful for bus scans. ACPICA BZ 779.
Also update the external calls to acpi_walk_namespace.
http://www.acpica.org/bugzilla/show_bug.cgi?id=779
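A sketch of a caller using the extended interface, with both callbacks
(the callback signature and walk parameter order are as assumed here):

        static acpi_status my_pre_visit(acpi_handle handle, u32 level,
                                        void *context, void **ret)
        {
                /* pre-order: called before the children of 'handle' are visited */
                return AE_OK;
        }

        static acpi_status my_post_visit(acpi_handle handle, u32 level,
                                         void *context, void **ret)
        {
                /* post-order: called after all children have been visited */
                return AE_OK;
        }

        static void my_bus_scan(void)
        {
                acpi_walk_namespace(ACPI_TYPE_DEVICE, ACPI_ROOT_OBJECT,
                                    ACPI_UINT32_MAX,
                                    my_pre_visit,       /* descending callback */
                                    my_post_visit,      /* new ascending callback */
                                    NULL, NULL);
        }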
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Enabling power fault detected event notification in the current pciehp
might cause a power fault interrupt storm on some machines. On those
machines, the power fault detected bit in the slot status register is
set again immediately when it is cleared in the interrupt service
routine, and the next power fault detected interrupt is notified again.
Therefore, disable power fault detected event notification for now.
This patch also removes the unnecessary handling for the power fault
cleared event because this event is not supported by the PCIe spec.
Tested-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Change for PCI hotplug to use pci_is_pcie() instead of checking
pci_dev->is_pcie.
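The conversion is mechanical; an illustration (do_pcie_setup() is a
made-up helper):

        /* before */
        if (dev->is_pcie)
                do_pcie_setup(dev);

        /* after */
        if (pci_is_pcie(dev))
                do_pcie_setup(dev);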
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Changes for PCIe AER driver to use pci_is_pcie() instead of checking
pci_dev->is_pcie.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Change for PCIe ASPM driver to use pci_is_pcie() instead of checking
pci_dev->is_pcie.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Change for PCI core to use pci_is_pcie() instead of checking
pci_dev->is_pcie.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Use pci_pcie_cap() instead of pci_find_capability() to get PCIe capability
offset in pciehp driver. This avoids unnecessary search in PCI
configuration space. This patch also removes the 'cap_base' field in
struct controller, which was used by pciehp itself to hold the PCIe
capability offset.
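An illustration of the conversion; pci_pcie_cap() returns the offset
cached at enumeration time instead of walking the capability list:

        int pos;

        /* before: searches config space on every call */
        pos = pci_find_capability(dev, PCI_CAP_ID_EXP);

        /* after: uses the cached offset */
        pos = pci_pcie_cap(dev);
        if (!pos)
                return -ENODEV;         /* not a PCIe device */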
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Use pci_pcie_cap() instead of pci_find_capability() to get PCIe capability
offset in PCI hotplug core. This avoids unnecessary search in PCI
configuration space.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Use pci_pcie_cap() instead of pci_find_capability() to get PCIe capability
offset in PCIe ASPM driver. This avoids unnecessary search in PCI
configuration space.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Use pci_pcie_cap() instead of pci_find_capability() to get PCIe capability
offset in PCI Express Port Bus driver. This avoids unnecessary search
in PCI configuration space.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Use pci_pcie_cap() instead of pci_find_capability() to get PCIe capability
offset in PCIe AER driver. This avoids unnecessary search in PCI
configuration space.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Use pci_pcie_cap() instead of pci_find_capability() to get PCIe capability
offset in PCI core code. This avoids unnecessary search in PCI
configuration space.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Commit 86cf898e1d ("intel-iommu: Check for
'DMAR at zero' BIOS error earlier.") was supposed to work by pretending
not to detect an IOMMU if it was actually being reported by the BIOS at
physical address zero.
However, the intel_iommu_init() function is called unconditionally, as
are the corresponding functions for other IOMMU hardware.
So the patch only worked if you have RAM above the 4GiB boundary. It
caused swiotlb to be initialised when no IOMMU was detected during early
boot, and thus the later IOMMU init would refuse to run.
But if you have less RAM than that, swiotlb wouldn't get set up and the
IOMMU _would_ still end up being initialised, even though we never
claimed to detect it.
This patch also sets the dmar_disabled flag when the error is detected
during the initial detection phase -- so that the later call to
intel_iommu_init() will return without doing anything, regardless of
whether swiotlb is used or not.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.infradead.org/users/dwmw2/iommu-2.6.32:
intel-iommu: Support PCIe hot-plug
intel-iommu: Obey coherent_dma_mask for alloc_coherent on passthrough
intel-iommu: Check for 'DMAR at zero' BIOS error earlier.
To support PCIe hot plug in the IOMMU, we register a notifier to respond
to device change actions.
When the notifier gets BUS_NOTIFY_UNBOUND_DRIVER, it removes the device
from its DMAR domain.
A hot-added device will be added into an IOMMU domain when it first does
an IOMMU op, so there is no need to add more code for hot add.
Without the patch, after a hot-remove, a hot-added device on the same
slot will not work.
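A sketch of the notifier, using the intel-iommu driver's internal helpers
(find_domain, domain_remove_one_dev_info) as assumed here; it is
registered once on the PCI bus type:

        static int device_notifier(struct notifier_block *nb,
                                   unsigned long action, void *data)
        {
                struct device *dev = data;
                struct pci_dev *pdev = to_pci_dev(dev);
                struct dmar_domain *domain;

                domain = find_domain(pdev);
                if (!domain)
                        return 0;

                if (action == BUS_NOTIFY_UNBOUND_DRIVER)
                        /* driver gone: take the device out of its DMAR domain */
                        domain_remove_one_dev_info(domain, pdev);

                return 0;
        }

        static struct notifier_block device_nb = {
                .notifier_call = device_notifier,
        };

        /* at init time: bus_register_notifier(&pci_bus_type, &device_nb); */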
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Tested-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The model for IOMMU passthrough is that decent devices that can cope
with DMA to all of memory get passthrough; crappy devices with a limited
dma_mask don't -- they get to use the IOMMU anyway.
This is done on the basis that IOMMU passthrough is usually wanted for
performance reasons, and it's only the decent PCI devices that you
really care about performance for, while the crappy 32-bit ones like
your USB controller can just use the IOMMU and you won't really care.
Unfortunately, the check for this was only looking at dev->dma_mask, not
at dev->coherent_dma_mask. And some devices have a 32-bit
coherent_dma_mask even though they have a full 64-bit dma_mask.
Even more unfortunately, fixing that simple oversight would upset
certain broken HP devices. Not only do they have a 32-bit
coherent_dma_mask, but they also have a tendency to do stray DMA to
unmapped addresses. And then they die when they take the DMA fault they
so richly deserve.
So if we do the 'correct' fix, it'll mean that affected users have to
disable IOMMU support completely on "a large percentage of servers from
a major vendor."
Personally, I have little sympathy -- given that this is the _same_
'major vendor' who is shipping machines which claim to have IOMMU
support but have obviously never _once_ booted a VT-d capable OS to do
any form of QA. But strictly speaking, it _would_ be a regression even
though it only ever worked by fluke.
For 2.6.33, we'll come up with a quirk which gives swiotlb support
for this particular device, and other devices with an inadequate
coherent_dma_mask will just get normal IOMMU mapping.
The simplest fix for 2.6.32, though, is just to jump through some hoops
to try to allocate coherent DMA memory for such devices in a place that
they can reach. We'd use dma_generic_alloc_coherent() for this if it
existed on IA64.
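A simplified sketch of the hoop-jumping in intel_alloc_coherent(); the
zone selection below is an approximation of the actual checks:

        /* for passthrough (identity-mapped) devices, steer the allocation
         * into a zone the device's coherent_dma_mask can actually reach */
        if (hwdev->coherent_dma_mask < DMA_BIT_MASK(64)) {
                if (hwdev->coherent_dma_mask < DMA_BIT_MASK(32))
                        flags |= GFP_DMA;
                else
                        flags |= GFP_DMA32;
        }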
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
I'm not entirely sure it needs to go into 32, but it's probably the right
thing to do. Another way of explaining the patch is:
- we currently pick the _first_ exactly matching bus resource entry, but
the _last_ inexactly matching one. Normally first/last shouldn't
matter, but bus resource entries aren't actually all created equal: in
a transparent bus, the last resources will be the parent resources,
which we should generally try to avoid unless we have no choice. So
"first matching" is the thing we should always aim for.
- the patch is a bit bigger than it needs to be, because I simplified the
logic at the same time. It used to be a fairly incomprehensible
if ((res->flags & IORESOURCE_PREFETCH) && !(r->flags & IORESOURCE_PREFETCH))
best = r; /* Approximating prefetchable by non-prefetchable */
and technically, all the patch did was to make that complex choice be
even more complex (it basically added a "&& !best" to say that if we
already found a non-prefetchable window for the prefetchable resource,
then we won't override an earlier one with that later one: remember
"first matching").
- So instead of that complex one with three separate conditionals in one,
I split it up a bit, and am taking advantage of the fact that we
already handled the exact case, so if 'res->flags' has the PREFETCH
bit, then we already know that 'r->flags' will _not_ have it. So the
simplified code drops the redundant test, and does the new '!best' test
separately. It also uses 'continue' as a way to ignore the bus
resource we know doesn't work (ie a prefetchable bus resource is _not_
acceptable for anything but an exact match), so it turns into:
/* We can't insert a non-prefetch resource inside a prefetchable parent .. */
if (r->flags & IORESOURCE_PREFETCH)
continue;
/* .. but we can put a prefetchable resource inside a non-prefetchable one */
if (!best)
best = r;
instead. With the comments, it's now six lines instead of two, but it's
conceptually simpler, and I _could_ have written it as two lines:
if ((res->flags & IORESOURCE_PREFETCH) && !best)
best = r; /* Approximating prefetchable by non-prefetchable */
but I thought that was too damn subtle.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
If HW IOMMU initialization fails (Intel VT-d often does this,
typically due to BIOS bugs), we fall back to nommu. That doesn't
work for the majority of systems, since nowadays we have more than 4GB of
memory and so must use swiotlb instead of nommu.
The problem is that it's too late to initialize swiotlb when HW
IOMMU initialization fails. We need to allocate swiotlb memory
earlier, from the bootmem allocator. Chris explained the issue in
detail:
http://marc.info/?l=linux-kernel&m=125657444317079&w=2
The current x86 IOMMU initialization sequence is too complicated,
and handling the above issue makes it even more hacky.
This patch changes the x86 IOMMU initialization sequence to handle
the above issue cleanly.
The new x86 IOMMU initialization sequence is as follows (a condensed
sketch follows the list):
1. we initialize the swiotlb (and set swiotlb to 1) in the case
of (max_pfn > MAX_DMA32_PFN && !no_iommu). dma_ops is set to
swiotlb_dma_ops or nommu_dma_ops. If swiotlb usage is forced by
the boot option, we finish here.
2. we call the detection functions of all the IOMMUs
3. the detection function sets x86_init.iommu.iommu_init to the
IOMMU initialization function (so we can avoid calling the
initialization functions of all the IOMMUs needlessly).
4. if the IOMMU initialization function doesn't need swiotlb,
it sets swiotlb to zero (e.g. the initialization was
successful).
5. if we find that swiotlb is set to zero, we free the swiotlb
resources.
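A condensed sketch of the two phases listed above; the hook and helper
names (pci_swiotlb_init, x86_init.iommu.iommu_init, swiotlb_free) follow
this series and are assumptions here:

        void __init pci_iommu_alloc(void)
        {
                /* step 1: bring up swiotlb from bootmem if we might need it */
                pci_swiotlb_init();

                /* step 2: detection only; each detect routine that finds its
                 * hardware installs its init function (step 3) */
                detect_intel_iommu();
                amd_iommu_detect();
        }

        static int __init pci_iommu_init(void)
        {
                /* steps 3-4: run the selected init; on success it clears 'swiotlb' */
                x86_init.iommu.iommu_init();

                /* step 5: nobody ended up needing swiotlb, so free the bootmem */
                if (!swiotlb)
                        swiotlb_free();

                return 0;
        }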
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
LKML-Reference: <1257849980-22640-10-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This changes detect_intel_iommu() to set the iommu_init hook to
intel_iommu_init() if detect_intel_iommu() finds the IOMMU.
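A sketch of the hook-up at the end of detect_intel_iommu() ('dmar_found'
is illustrative):

        #ifdef CONFIG_X86
                if (dmar_found)
                        x86_init.iommu.iommu_init = intel_iommu_init;
        #endif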
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
LKML-Reference: <1257849980-22640-6-git-send-email-fujita.tomonori@lab.ntt.co.jp>
[ -v2: build fix for the !CONFIG_DMAR case ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>