In driver/xen/events.c, whether bind_pirq is shareable or not is
determined by desc->action is NULL or not. But in __setup_irq,
startup(irq) is invoked before desc->action is assigned with
new action. So desc->action in startup_irq is always NULL, and
bind_pirq is always not shareable. This results in pt_irq_create_bind
failure when passthrough a device which shares irq to other devices.
This patch doesn't use probing_irq to determine if pirq is shareable
or not, instead set shareable flag in irq_info according to trigger
mode in xen_allocate_pirq. Set level triggered interrupts shareable.
Thus use this flag to set bind_pirq flag accordingly.
[v2: arch/x86/xen/pci.c no more, so file skipped]
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The 'xen_poll_irq_timeout' provides a method to pass in
the poll timeout for IRQs if requested. We also export
those two poll functions as Xen PCI fronted uses them.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
In earlier Xen Linux kernels, the IRQ mapping was a straight 1:1 and the
find_unbound_irq started looking around 256 for open IRQs and up. IRQs
from 0 to 255 were reserved for PCI devices. Previous to this patch,
the 'find_unbound_irq' started looking at get_nr_hw_irqs() number.
For privileged domain where the ACPI information is available that
returns the upper-bound of what the GSIs. For non-privileged PV domains,
where ACPI is no-existent the get_nr_hw_irqs() reports the IRQ_LEGACY (16).
With PCI passthrough enabled, and with PCI cards that have IRQs pinned
to a higher number than 16 we collide with previously allocated IRQs.
Specifically the PCI IRQs collide with the IPI's for Xen functions
(as they are allocated earlier).
For example:
00:00.11 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller (prog-if 10 [OHCI])
...
Interrupt: pin A routed to IRQ 18
[root@localhost ~]# cat /proc/interrupts | head
CPU0 CPU1 CPU2
16: 38186 0 0 xen-dyn-virq timer0
17: 149 0 0 xen-dyn-ipi spinlock0
18: 962 0 0 xen-dyn-ipi resched0
and when the USB controller is loaded, the kernel reports:
IRQ handler type mismatch for IRQ 18
current handler: resched0
One way to fix this is to reverse the logic when looking for un-used
IRQ numbers and start with the highest available number. With that,
we would get:
CPU0 CPU1 CPU2
... snip ..
292: 35 0 0 xen-dyn-ipi callfunc0
293: 3992 0 0 xen-dyn-ipi resched0
294: 224 0 0 xen-dyn-ipi spinlock0
295: 57183 0 0 xen-dyn-virq timer0
NMI: 0 0 0 Non-maskable interrupts
.. snip ..
And interrupts for PCI cards are now accessible.
This patch also includes the fix, found by Ian Campbell, titled
"xen: fix off-by-one error in find_unbound_irq."
[v2: Added an explanation in the code]
[v3: Rebased on top of tip/irq/core]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Sometimes cpu_evtchn_mask_p can get used early, before it has been
allocated. Statically initialize it with an initdata version to catch
any early references.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Impact: cleanup
Make pirq show useful information in /proc/interrupts
[v2: Removed the parts for arch/x86/xen/pci.c ]
Signed-off-by: Gerd Hoffmann <kraxel@xeni.home.kraxel.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Dynamically allocate the irq_info and evtchn_to_irq arrays, so that
1) the irq_info array scales to the actual number of possible irqs,
and 2) we don't needlessly increase the static size of the kernel
when we aren't running under Xen.
Derived on patch from Mike Travis <travis@sgi.com>.
[Impact: reduce memory usage ]
[v2: Conflict in drivers/xen/events.c: Replaced alloc_bootmen with kcalloc ]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Impact: preserve compat with native
Reserve the lower irq range for use for hardware interrupts so we
can identity-map them.
[v2: Rebased on top tip/irq/core]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
A privileged PV Xen domain can get direct access to hardware. In
order for this to be useful, it must be able to get hardware
interrupts.
Being a PV Xen domain, all interrupts are delivered as event channels.
PIRQ event channels are bound to a pirq number and an interrupt
vector. When a IO APIC raises a hardware interrupt on that vector, it
is delivered as an event channel, which we can deliver to the
appropriate device driver(s).
This patch simply implements the infrastructure for dealing with pirq
event channels.
[ Impact: integrate hardware interrupts into Xen's event scheme ]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Impact: allow Xen control of bio merging
When running in Xen domain with device access, we need to make sure
the block subsystem doesn't merge requests across pages which aren't
machine physically contiguous. To do this, we define our own
BIOVEC_PHYS_MERGEABLE. When CONFIG_XEN isn't enabled, or we're not
running in a Xen domain, this has identical behaviour to the normal
implementation. When running under Xen, we also make sure the
underlying machine pages are the same or adjacent.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
There seems to be more cleanups possible, but that's left to the xen
experts :)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
irq_2_iommu is in struct irq_cfg, so we can do the irq_remapped check
based on irq_cfg instead of going through a lookup function. That's
especially interesting in the eoi_ioapic_irq() hotpath.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Switch the intr_remapping code to use the irq_2_iommu struct in
irg_cfg.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
That interrupt remapping code is x86 specific and tied to the io_apic
code. No need for separate allocator functions in the interrupt
remapping code. This allows to simplify the code and irq_2_iommu is
small (13 bytes on 64bit) so it's not a real problem even if interrupt
remapping is runtime disabled. If it's compile time disabled the
impact is zero.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
No need to dereference irq_desc.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
With SPARSE_IRQ=y the irte descriptors are dynamically allocated, but not
freed in free_irte().
That was ok as long as the sparse irq core was not freeing irq descriptors on
destroy_irq(). Now we leak the irte descriptor. Free it in free_irte().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Handing down irq_desc to msi just so that msi can access
irq_desc.irq_data.msi_desc is a pretty stupid idea. The calling code
can hand down a pointer to msi_desc so msi code does not need to know
about the irq descriptor at all.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Russell King <linux@arm.linux.org.uk>
Abusing irq stats in a driver for counting interrupts is a horrible
idea and not safe with shared interrupts. Replace it by a local
interrupt counter.
Noticed by the attempt to remove the irq stats export.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
The twl irqchip uses the dummy irq chip ack functions, which is NULL
now. Switch it over to use irq_ack.
Reported-and-tested-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The irq_data namespace is the preference for the generic irq
layer. Rename the union typedef in drivers/isdn/act2000/act2000.h
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
struct irq_data is the preferred name for the data associated to an
interrupt in the core code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
libata depends on scsi_host_template for module reference counting and
sht's should be owned by each low level driver. During libahci split,
the sht was left with libahci.ko leaving the actual low level drivers
not reference counted. This made ahci and ahci_platform always
unloadable even while they're being actively used.
Fix it by defining AHCI_SHT() macro in ahci.h and defining a sht for
each low level ahci driver.
stable: only applicable to 2.6.35.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Pedro Francisco <pedrogfrancisco@gmail.com>
Tested-by: Michael Tokarev <mjt@tls.msk.ru>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits)
tcp: Fix >4GB writes on 64-bit.
net/9p: Mount only matching virtio channels
de2104x: fix ethtool
tproxy: check for transparent flag in ip_route_newports
ipv6: add IPv6 to neighbour table overflow warning
tcp: fix TSO FACK loss marking in tcp_mark_head_lost
3c59x: fix regression from patch "Add ethtool WOL support"
ipv6: add a missing unregister_pernet_subsys call
s390: use free_netdev(netdev) instead of kfree()
sgiseeq: use free_netdev(netdev) instead of kfree()
rionet: use free_netdev(netdev) instead of kfree()
ibm_newemac: use free_netdev(netdev) instead of kfree()
smsc911x: Add MODULE_ALIAS()
net: reset skb queue mapping when rx'ing over tunnel
br2684: fix scheduling while atomic
de2104x: fix TP link detection
de2104x: fix power management
de2104x: disable autonegotiation on broken hardware
net: fix a lockdep splat
e1000e: 82579 do not gate auto config of PHY by hardware during nominal use
...
Commit e40cc4bdfd introduced
a build breakage if CONFIG_SMP is undefined. This commit
fixes the problem.
This fix is only a workaround. For a real fix, cpu_sibling_mask() should
be defined in UP include code, eg in linux/smp.h, and asm/smp.h should not be
included directly. This fix is currently not possible because asm/smp.h defines
cpu_sibling_mask() unconditionally and is included directly from many source
files.
Reported-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
The PKT_CTRL_CMD_STATUS device ioctl retrieves a pointer to a
pktcdvd_device from the global pkt_devs array. The index into this
array is provided directly by the user and is a signed integer, so the
comparison to ensure that it falls within the bounds of this array will
fail when provided with a negative index.
This can be used to read arbitrary kernel memory or cause a crash due to
an invalid pointer dereference. This can be exploited by users with
permission to open /dev/pktcdvd/control (on many distributions, this is
readable by group "cdrom").
Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
[ Rather than add a cast, just make the function take the right type -Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When the interface is up, using ethtool breaks it because:
a) link is put down but media_timer interval is not shortened to NO_LINK
b) rxtx is stopped but not restarted
Also manual 10baseT-HD (and probably FD too - untested) mode does not work -
the link is forced up, packets are transmitted but nothing is received.
Changing CSR14 value to match documentation (not disabling link check) fixes this.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* master.kernel.org:/home/rmk/linux-2.6-arm: (28 commits)
ARM: 6411/1: vexpress: set RAM latencies to 1 cycle for PL310 on ct-ca9x4 tile
ARM: 6409/1: davinci: map sram using MT_MEMORY_NONCACHED instead of MT_DEVICE
ARM: 6408/1: omap: Map only available sram memory
ARM: 6407/1: mmu: Setup MT_MEMORY and MT_MEMORY_NONCACHED L1 entries
ARM: pxa: remove pr_<level> uses of KERN_<level>
ARM: pxa168fb: clear enable bit when not active
ARM: pxa: fix cpu_is_pxa*() not expanding to zero when not configured
ARM: pxa168: fix corrected reset vector
ARM: pxa: Use PIO for PI2C communication on Palm27x
ARM: pxa: Fix Vpac270 gpio_power for MMC
ARM: 6401/1: plug a race in the alignment trap handler
ARM: 6406/1: at91sam9g45: fix i2c bus speed
leds: leds-ns2: fix locking
ARM: dove: fix __io() definition to use bus based offset
dmaengine: fix interrupt clearing for mv_xor
ARM: kirkwood: Unbreak PCIe I/O port
ARM: Fix build error when using KCONFIG_CONFIG
ARM: 6383/1: Implement phys_mem_access_prot() to avoid attributes aliasing
ARM: 6400/1: at91: fix arch_gettimeoffset fallout
ARM: 6398/1: add proc info for ARM11MPCore/Cortex-A9 from ARM
...
* git://git.infradead.org/iommu-2.6:
intel-iommu: Use symbolic values instead of magic numbers in Lenovo w/a
intel-iommu: Abort IOMMU setup for igfx if BIOS gave no shadow GTT space
This patch (commit 690a1f2002) added a
new call site for acpi_set_WOL() without checking that the function is
actually suitable to be called via
vortex_set_wol+0xcd/0xe0 [3c59x]
dev_ethtool+0xa5a/0xb70
dev_ioctl+0x2e0/0x4b0
T.961+0x49/0x50
sock_ioctl+0x47/0x290
do_vfs_ioctl+0x7f/0x340
sys_ioctl+0x80/0xa0
system_call_fastpath+0x16/0x1b
i.e. outside of code paths run when the device is not yet enabled or
already disabled. In particular, putting the device into D3hot is a
pretty bad idea when it was already brought up.
Furthermore, all prior callers of the function made sure they're
actually dealing with a PCI device, while the newly added one didn't.
In the same spirit, the .get_wol handler shouldn't indicate support
for WOL for non-PCI devices.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The HW by default has RX coalescing on. For iWARP connections, this
causes a 100ms delay in connection establishement due to the ingress
MPA Start message being stalled in HW. So explicitly turn RX
coalescing off when setting up iWARP connections.
This was causing very bad performance for NP64 gather operations using
Open MPI, due to the way it sets up connections on larger jobs.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
f4347553b3 removed the edac polling
mechanism in favor of using a notifier chain for conveying MCE
information to edac. However, the module removal path didn't test
whether the driver had setup the polling function workqueue at all and
the rmmod process was hanging in the kernel at try_to_del_timer_sync()
in the cancel_delayed_work() path, trying to cancel an uninitialized
work struct.
Fix that by adding a balancing check to the workqueue removal path.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Freeing netdev without free_netdev() leads to net, tx leaks.
I might lead to dereferencing freed pointer.
The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)
@@
struct net_device* dev;
@@
-kfree(dev)
+free_netdev(dev)
Signed-off-by: David S. Miller <davem@davemloft.net>
Freeing netdev without free_netdev() leads to net, tx leaks.
I might lead to dereferencing freed pointer.
The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)
@@
struct net_device* dev;
@@
-kfree(dev)
+free_netdev(dev)
Signed-off-by: David S. Miller <davem@davemloft.net>
Freeing netdev without free_netdev() leads to net, tx leaks.
I might lead to dereferencing freed pointer.
The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)
@@
struct net_device* dev;
@@
-kfree(dev)
+free_netdev(dev)
Signed-off-by: David S. Miller <davem@davemloft.net>
Freeing netdev without free_netdev() leads to net, tx leaks.
I might lead to dereferencing freed pointer.
The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)
@@
struct net_device* dev;
@@
-kfree(dev)
+free_netdev(dev)
Signed-off-by: David S. Miller <davem@davemloft.net>
This enables auto loading for the smsc911x ethernet driver.
Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net>
Signed-off-by: David S. Miller <davem@davemloft.net>