The current Intel IOMMU code assumes that both host page size and Intel
IOMMU page size are 4KiB. The first patch supports variable page size.
This provides support for IA64 which has multiple page sizes.
This patch also adds some other code hooks for IA64 platform including
DMAR_OPERATION_TIMEOUT definition.
[dwmw2: some cleanup]
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Now that we have DMA-remapping support for queued invalidation, we
can enable both DMA-remapping and interrupt-remapping at the same time.
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
If queued invalidation interface is available and enabled, queued invalidation
interface will be used instead of the register based interface.
According to Vt-d2 specification, when queued invalidation is enabled,
invalidation command submit works only through invalidation queue and not
through the command registers interface.
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Implement context cache invalidate and IOTLB invalidation using
queued invalidation interface. This interface will be used by
DMA remapping, when queued invalidation is supported.
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Next patch in the series will use queued invalidation interface
qi_submit_sync() for DMA-remapping aswell, which can be called from interrupt
context.
So use spin_lock_irqsave() instead of spin_lock() in qi_submit_sync().
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
I dunno how this missed Bjorn and his quest to use %pF in commit
c80cfb0406c01bb5da91bfe30f5cb1fd96831138 ("vsprintf: use new vsprintf
symbolic function pointer format"), but it did.
So use %pF in the two remaining places that still tried to print out
function pointers by hand.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'kvm-updates/2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (134 commits)
KVM: ia64: Add intel iommu support for guests.
KVM: ia64: add directed mmio range support for kvm guests
KVM: ia64: Make pmt table be able to hold physical mmio entries.
KVM: Move irqchip_in_kernel() from ioapic.h to irq.h
KVM: Separate irq ack notification out of arch/x86/kvm/irq.c
KVM: Change is_mmio_pfn to kvm_is_mmio_pfn, and make it common for all archs
KVM: Move device assignment logic to common code
KVM: Device Assignment: Move vtd.c from arch/x86/kvm/ to virt/kvm/
KVM: VMX: enable invlpg exiting if EPT is disabled
KVM: x86: Silence various LAPIC-related host kernel messages
KVM: Device Assignment: Map mmio pages into VT-d page table
KVM: PIC: enhance IPI avoidance
KVM: MMU: add "oos_shadow" parameter to disable oos
KVM: MMU: speed up mmu_unsync_walk
KVM: MMU: out of sync shadow core
KVM: MMU: mmu_convert_notrap helper
KVM: MMU: awareness of new kvm_mmu_zap_page behaviour
KVM: MMU: mmu_parent_walk
KVM: x86: trap invlpg
KVM: MMU: sync roots on mmu reload
...
The PCI core wants to reorder the devices in the bus list. So move this
functionality out of the pci core and into the driver core so that
anyone else can also do this if needed. This also lets us change how
struct device is attached to drivers in the future without messing with
the PCI core.
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This code is not ready, but we need to rip it out instead of rebasing
as we would lose the APIC/IO_APIC unification otherwise.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
An error return from create_irq_nr() is 0, but an error return from
create_irq() is -1.
Signed-off-by: Dean Nelson <dcn@sgi.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It is possible that,
instead of PCI endpoint/sub-hierarchy structures, only IO-APIC/HPET
devices may be reported under device scope structures. Fix the devices_cnt
error check, which cares about only PCI structures and removes the
dma-remapping unit structure (dmaru) when the devices_cnt is zero
and include_all flag is not set.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In dmar_dev_scope_init(), functions called under for_each_drhd_unit()/
for_each_rmrr_units() can delete the list entry under some error conditions.
So we should use list_for_each_entry_safe() for safe traversal.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
initialize the return value in dmar_parse_dev()
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Very early detection of the DMAR tables will setup fixmap mapping. For
parsing these tables later (while enabling dma and/or interrupt remapping),
early fixmap mapping shouldn't be used. Fix it by calling table detection
routines again, which will call generic apci_get_table() for setting up
the correct mapping.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In irq_2_iommu_alloc() and set_irte_irq(), irq_to_desc or
irq_2_iommu pointers may not be allocated. So use the routines
which will allocate them if they are not already allocated.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
also print out irq no in /proc/interrups and /proc/stat in hex, so could
tell bus/dev/func.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
when CONFIG_HAVE_SPARSE_IRQ
preallocate some irq_2_iommu entries, and use get_one_free_irq_2_iomm to
get new one and link to irq_desc if needed.
else will use dyn_array or static array.
v2: <= nr_irqs fix
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch extends the VT-d driver to support KVM
[Ben: fixed memory pinning]
[avi: move dma_remapping.h as well]
Signed-off-by: Kay, Allen M <allen.m.kay@intel.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Acked-by: Mark Gross <mgross@linux.intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
As of version 2.0, ACPI can return 64-bit integers. The current
acpi_evaluate_integer only supports 64-bit integers on 64-bit platforms.
Change the argument to take a pointer to an acpi_integer so we support
64-bit integers on all platforms.
lenb: replaced use of "acpi_integer" with "unsigned long long"
lenb: fixed bug in acpi_thermal_trips_update()
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
This is loosely based on a patch by Jesse Barnes to check the user-space
PCI mappings though the sysfs interfaces. Quoting Jesse's original
explanation:
It's fairly common for applications to map PCI resources through sysfs.
However, with the current implementation, it's possible for an application
to map far more than the range corresponding to the resourceN file it
opened. This patch plugs that hole by checking the range at mmap time,
similar to what is done on platforms like sparc64 in their lower level
PCI remapping routines.
It was initially put together to help debug the e1000e NVRAM corruption
problem, since we initially thought an X driver might be walking past the
end of one of its mappings and clobbering the NVRAM. It now looks like
that's not the case, but doing the check is still important for obvious
reasons.
and this version of the patch differs in that it uses a helper function
to clarify the code, and does all the checks in pages (instead of bytes)
in order to avoid overflows when doing "<< PAGE_SHIFT" etc.
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
.... so that they can be used by MTD map drivers. Lets us close#9420
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
IO resource and ioremap debugging uncovered this ioremap() done
by drivers/pci/hotplug/ibmphp_ebda.c:
initcall pci_hotplug_init+0x0/0x41 returned 0 after 3 msecs
calling ibmphp_init+0x0/0x360 @ 1
ibmphpd: IBM Hot Plug PCI Controller Driver version: 0.6
resource map sanity check conflict: 0x9f800 0xaf5e7 0x9f800 0x9ffff reserved
------------[ cut here ]------------
WARNING: at arch/x86/mm/ioremap.c:175 __ioremap_caller+0x5c/0x226()
Pid: 1, comm: swapper Not tainted 2.6.27-rc7-tip-00914-g347b10f-dirty #36038
[<c013a72d>] warn_on_slowpath+0x41/0x68
[<c0156f00>] ? __lock_acquire+0x9ba/0xa7f
[<c012158c>] ? do_flush_tlb_all+0x0/0x59
[<c015ac31>] ? smp_call_function_mask+0x74/0x17d
[<c012158c>] ? do_flush_tlb_all+0x0/0x59
[<c013b228>] ? printk+0x1a/0x1c
[<c013f302>] ? iomem_map_sanity_check+0x82/0x8c
[<c0a773e8>] ? _read_unlock+0x22/0x25
[<c013f302>] ? iomem_map_sanity_check+0x82/0x8c
[<c0154e17>] ? trace_hardirqs_off+0xb/0xd
[<c0127731>] __ioremap_caller+0x5c/0x226
[<c0156158>] ? trace_hardirqs_on+0xb/0xd
[<c012767d>] ? iounmap+0x9d/0xa5
[<c01279dd>] ioremap_nocache+0x15/0x17
[<c0403c42>] ? ioremap+0xd/0xf
[<c0403c42>] ioremap+0xd/0xf
[<c0f1928f>] ibmphp_access_ebda+0x60/0xa0e
[<c0f17f64>] ibmphp_init+0xb5/0x360
[<c0101057>] do_one_initcall+0x57/0x138
[<c0f17eaf>] ? ibmphp_init+0x0/0x360
[<c0156158>] ? trace_hardirqs_on+0xb/0xd
[<c0148d75>] ? __queue_work+0x2b/0x30
[<c0f17eaf>] ? ibmphp_init+0x0/0x360
[<c0f015a0>] kernel_init+0x17b/0x1e2
[<c0f01425>] ? kernel_init+0x0/0x1e2
[<c01178b3>] kernel_thread_helper+0x7/0x10
=======================
---[ end trace a7919e7f17c0a725 ]---
initcall ibmphp_init+0x0/0x360 returned -19 after 144 msecs
calling zt5550_init+0x0/0x6a @ 1
the problem is this code:
io_mem = ioremap (ebda_seg<<4, 65000);
it assumes that the EBDA is 65000 bytes. But BIOS EBDA pointers are
at most 1K large.
_if_ the Rio code truly extends upon the customary EBDA size it needs
to iounmap() this memory and ioremap() it larger, once it knows it from
the generic descriptors that a Rio system is around.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
dock's uevent reported itself, not ata. It might be difficult to find an
ata device just according to a dock. This patch introduces docking ops
for each device in a dock. when docking, dock driver can send device
specific uevent. This should help dock station too (not just bay)
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
pci_get_subsys() changed in 2.6.26 so that the from pointer is modified
when the call is being invoked, so fix up the 'const' marking of it that
the compiler is complaining about.
Reported-by: Rufus & Azrael <rufus-azrael@numericable.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
pcie_aspm=force did not work because aspm_force was being double negated
leading to the sanity check failing. Moving a bracket should fix this.
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
There's no good reason why a resource_size_t shouldn't just be a
physical address, so simply redefine it in terms of phys_addr_t.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Print out for device BAR values before the kernel tries to update them.
Also make related output use KERN_DEBUG.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch fixes an obvious bug (loop was never entered) caused by
commit 820943b6fc4781621dee52ba026106758a727dd3
(pciehp: cleanup pcie_poll_cmd).
Reported-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Commit fe99740cac117f208707488c03f3789cf4904957 (construct one
fakephp slot per PCI slot) introduced a regression, causing a
deadlock when removing a PCI device.
We also never actually removed the device from the PCI core.
So we:
- remove the device from the PCI core
- do not directly call remove_slot() to prevent deadlock
Yu Zhao reported and diagnosed this defect.
Signed-off-by: Alex Chiang <achiang@hp.com>
Acked-by: Yu Zhao <yu.zhao@intel.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Again, the cleaned up code introduced some resource warnings:
drivers/pci/setup-bus.c: In function 'pci_bus_dump_res':
drivers/pci/setup-bus.c:542: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'resource_size_t'
drivers/pci/setup-bus.c:542: warning: format '%llx' expects type 'long long unsigned int', but argument 6 has type 'resource_size_t'
Fix those up too.
Signed-off-by: Johann Felix Soden <johfel@users.sourceforge.net>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The cleaned up resource code in probe.c introduced some warnings:
drivers/pci/probe.c: In function 'pci_read_bridge_bases':
drivers/pci/probe.c:386: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'resource_size_t'
drivers/pci/probe.c:386: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'resource_size_t'
drivers/pci/probe.c:398: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'resource_size_t'
drivers/pci/probe.c:398: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'resource_size_t'
drivers/pci/probe.c:434: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'resource_size_t'
drivers/pci/probe.c:434: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'resource_size_t'
So fix them up.
Signed-off-by: Johann Felix Soden <johfel@users.sourceforge.net>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Some BIOSes (the Intel DG33BU, for example) wrongly claim to have DMAR
when they don't. Avoid the resulting crashes when it doesn't work as
expected.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Some BIOSes (the Intel DG33BU, for example) wrongly claim to have DMAR
when they don't. Avoid the resulting crashes when it doesn't work as
expected.
I'd still be grateful if someone could test it on a DG33BU with the old
BIOS though, since I've killed mine. I tested the DMI version, but not
this one.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit 884525655d07fdee9245716b998ecdc45cdd8007 ("PCI: clean up resource
alignment management") changed the resource handling to mark how a
resource was aligned on a per-resource basis.
Thus, instead of looking at the resource number to determine whether it
was a bridge resource or a regular resource (they have different
alignment rules), we should just ask the resource for its alignment
directly.
The reason this broke only cardbus resources was that for the other
types of resources, the old way of deciding alignment actually still
happened to work. But CardBus bridge resources had been changed by
commit 934b7024f0ed29003c95cef447d92737ab86dc4f ("Fix cardbus resource
allocation") to look more like regular resources than PCI bridge
resources from an alignment handling standpoint.
Reported-and-tested-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alex Chiang and Matthew Wilcox pointed out that pci_get_dev_by_id() does
not properly decrement the reference on the from pointer if it is
present, like the documentation for the function states it will.
It fixes a pretty bad leak in the hotplug core (we were leaking an
entire struct pci_dev for each function of each offlined card, the first
time around; subsequent onlines/offlines were ok).
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: stable <stable@kernel.org>
Tested-by: Alex Chiang <achiang@hp.com>
Acked-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Commit ef0ff95f136f0f2d035667af5d18b824609de320 (shpchp: fix slot name)
introduces the shpchp_slot_with_bus module parameter, which was intended
to help work around broken firmware that assigns the same name to multiple
slots.
Commit b3bd307c628af2f0a581c42d5d7e4bcdbbf64b6a (shpchp: add message about
shpchp_slot_with_bus option) tells the user to use the above parameter
in the event of a name collision.
This approach is sub-optimal because it requires too much work from
the user.
Instead, let's rename the slot on behalf of the user. If firmware
assigns the name N to multiple slots, then:
The first registered slot is assigned N
The second registered slot is assigned N-1
The third registered slot is assigned N-2
The Mth registered slot becomes N-M
In the event we overflow the slot->name parameter, we report an
error to the user.
This is a temporary fix until the entire PCI core can be reworked
such that individual drivers no longer have to manage their own
slot names.
Tested-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Acked-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>