linux/arch
Brandon Philips ced5b697a7 x86: Avoid race condition in pci_enable_msix()
Keep chip_data in create_irq_nr and destroy_irq.

When two drivers are setting up MSI-X at the same time via
pci_enable_msix() there is a race.  See this dmesg excerpt:

[   85.170610] ixgbe 0000:02:00.1: irq 97 for MSI/MSI-X
[   85.170611]   alloc irq_desc for 99 on node -1
[   85.170613] igb 0000:08:00.1: irq 98 for MSI/MSI-X
[   85.170614]   alloc kstat_irqs on node -1
[   85.170616] alloc irq_2_iommu on node -1
[   85.170617]   alloc irq_desc for 100 on node -1
[   85.170619]   alloc kstat_irqs on node -1
[   85.170621] alloc irq_2_iommu on node -1
[   85.170625] ixgbe 0000:02:00.1: irq 99 for MSI/MSI-X
[   85.170626]   alloc irq_desc for 101 on node -1
[   85.170628] igb 0000:08:00.1: irq 100 for MSI/MSI-X
[   85.170630]   alloc kstat_irqs on node -1
[   85.170631] alloc irq_2_iommu on node -1
[   85.170635]   alloc irq_desc for 102 on node -1
[   85.170636]   alloc kstat_irqs on node -1
[   85.170639] alloc irq_2_iommu on node -1
[   85.170646] BUG: unable to handle kernel NULL pointer dereference at 0000000000000088

As you can see, igb and ixgbe alternate calls to create_irq_nr()
via pci_enable_msix() in their probe functions.

ixgbe: While looping through irq_desc_ptrs[] via create_irq_nr(), ixgbe
chooses irq_desc_ptrs[102], exits the loop, drops vector_lock and
calls dynamic_irq_init(), which sets irq_desc_ptrs[102]->chip_data =
NULL.

igb: Grabs the vector_lock now and starts looping over irq_desc_ptrs[]
via create_irq_nr(). It gets to irq_desc_ptrs[102] and does this:

	cfg_new = irq_desc_ptrs[102]->chip_data;
	if (cfg_new->vector != 0)
		continue;

This hits the NULL deref.
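
The interleaving above can be replayed deterministically in userspace.
What follows is only a minimal model, not kernel code: struct
cfg_model, struct desc_model, the model_*() helpers, NR_SLOTS and the
SCAN_* codes are all hypothetical names, and the scan reports
SCAN_FAULT at the point where the kernel would dereference a NULL
chip_data instead of actually faulting.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SLOTS   8      /* hypothetical table size for this model */
#define SCAN_NONE  (-1)   /* no free slot found */
#define SCAN_FAULT (-2)   /* the kernel would oops at this point */

struct cfg_model  { int vector; };                  /* stands in for irq_cfg */
struct desc_model { struct cfg_model *chip_data; }; /* stands in for irq_desc */

static struct cfg_model  cfgs[NR_SLOTS];
static struct desc_model descs[NR_SLOTS];

/* Start with every descriptor pointing at an unused cfg. */
void model_reset(void)
{
    for (int i = 0; i < NR_SLOTS; i++) {
        cfgs[i].vector = 0;
        descs[i].chip_data = &cfgs[i];
    }
}

/* Mark a slot as in use (the vector value itself is arbitrary here). */
void model_claim(int irq)
{
    assert(irq >= 0 && irq < NR_SLOTS);
    cfgs[irq].vector = irq + 1;
}

/* First driver, after dropping the lock: the init step clears chip_data. */
void model_init_clears_chip_data(int irq)
{
    assert(irq >= 0 && irq < NR_SLOTS);
    descs[irq].chip_data = NULL;
}

/* First driver, end of the race window: chip_data is restored. */
void model_restore_chip_data(int irq)
{
    assert(irq >= 0 && irq < NR_SLOTS);
    descs[irq].chip_data = &cfgs[irq];
}

/* Second driver: the same scan shape as the excerpt quoted above. */
int model_scan(void)
{
    for (int i = 0; i < NR_SLOTS; i++) {
        struct cfg_model *cfg_new = descs[i].chip_data;

        if (cfg_new == NULL)
            return SCAN_FAULT;  /* cfg_new->vector would be a NULL deref */
        if (cfg_new->vector != 0)
            continue;
        return i;
    }
    return SCAN_NONE;
}
```

Calling model_scan() between model_init_clears_chip_data() and
model_restore_chip_data() for the same slot returns SCAN_FAULT, which
corresponds to the window in which igb reads irq_desc_ptrs[102].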

Another possible race exists via pci_disable_msix() in a driver or in
any of the several error paths that call free_msi_irqs():

destroy_irq()
  dynamic_irq_cleanup() which sets desc->chip_data = NULL
  ...race window...
  desc->chip_data = cfg;

Remove the save and restore code for cfg in create_irq_nr() and
destroy_irq() and take the desc->lock when checking the irq_cfg.
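
The shape of that change can be sketched in userspace as follows. This
is an illustration under stated assumptions, not the patch itself:
pthread mutexes stand in for vector_lock and desc->lock, and struct
kcfg, struct kdesc, the fixed_*() helpers, init_keep_chip_data() and
FIXED_NR_SLOTS are hypothetical names. The point it shows is that
chip_data is never cleared during init or cleanup, and that the vector
check runs with the per-descriptor lock held.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define FIXED_NR_SLOTS 8            /* hypothetical table size */

struct kcfg  { int vector; };       /* stands in for irq_cfg */
struct kdesc {                      /* stands in for irq_desc */
    pthread_mutex_t lock;           /* models desc->lock */
    struct kcfg *chip_data;
    int status;                     /* some field that init/cleanup resets */
};

static pthread_mutex_t vector_lock_model = PTHREAD_MUTEX_INITIALIZER;
static struct kcfg  fixed_cfgs[FIXED_NR_SLOTS];
static struct kdesc fixed_descs[FIXED_NR_SLOTS];

/* Call once before use: every descriptor gets a lock and a cfg. */
void fixed_reset(void)
{
    for (int i = 0; i < FIXED_NR_SLOTS; i++) {
        pthread_mutex_init(&fixed_descs[i].lock, NULL);
        fixed_cfgs[i].vector = 0;
        fixed_descs[i].chip_data = &fixed_cfgs[i];
        fixed_descs[i].status = 0;
    }
}

/* Reset a descriptor's state but leave chip_data untouched. */
static void init_keep_chip_data(struct kdesc *d)
{
    pthread_mutex_lock(&d->lock);
    d->status = 0;
    pthread_mutex_unlock(&d->lock);
}

/* Allocate the first free slot; returns its index or -1 if none is free. */
int fixed_create_irq(void)
{
    int irq = -1;

    pthread_mutex_lock(&vector_lock_model);
    for (int i = 0; i < FIXED_NR_SLOTS && irq < 0; i++) {
        struct kdesc *d = &fixed_descs[i];

        pthread_mutex_lock(&d->lock);      /* check cfg under desc lock */
        assert(d->chip_data != NULL);      /* invariant: never cleared */
        if (d->chip_data->vector == 0) {
            d->chip_data->vector = i + 1;  /* arbitrary non-zero vector */
            irq = i;
        }
        pthread_mutex_unlock(&d->lock);
    }
    pthread_mutex_unlock(&vector_lock_model);

    if (irq >= 0)
        init_keep_chip_data(&fixed_descs[irq]);  /* no save/restore needed */
    return irq;
}

/* Release a slot; chip_data stays valid throughout. */
void fixed_destroy_irq(int irq)
{
    assert(irq >= 0 && irq < FIXED_NR_SLOTS);
    init_keep_chip_data(&fixed_descs[irq]);

    pthread_mutex_lock(&vector_lock_model);
    pthread_mutex_lock(&fixed_descs[irq].lock);
    fixed_descs[irq].chip_data->vector = 0;
    pthread_mutex_unlock(&fixed_descs[irq].lock);
    pthread_mutex_unlock(&vector_lock_model);
}

/* Helper for inspection: non-zero while chip_data is set. */
int fixed_chip_data_present(int irq)
{
    assert(irq >= 0 && irq < FIXED_NR_SLOTS);
    return fixed_descs[irq].chip_data != NULL;
}
```

Because no code path stores NULL into chip_data, there is no window in
which a concurrent scan can observe a half-initialised descriptor, so
the save and restore step becomes unnecessary in this model.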

Reported-and-analyzed-by: Brandon Philips <bphilips@suse.de>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-3-git-send-email-yinghai@kernel.org>
Signed-off-by: Brandon Philips <bphilips@suse.de>
Cc: stable@kernel.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-10 14:27:28 -08:00
..
alpha alpha: cpumask_of_node() should handle -1 as a node 2010-01-14 13:21:35 -05:00
arm Merge master.kernel.org:/home/rmk/linux-2.6-arm 2010-02-04 16:09:01 -08:00
avr32 avr32: clean up memory allocation in at32_add_device_mci 2009-12-28 12:33:00 +01:00
blackfin blackfin,kgdb: Do not put PC in gdb_regs into retx. 2010-01-07 11:58:37 -06:00
cris
frv FDPIC: Respect PT_GNU_STACK exec protection markings when creating NOMMU stack 2010-01-06 18:16:02 -08:00
h8300
ia64 [IA64] move fnptr definition inside #ifdef __KERNEL__ 2010-01-08 10:53:28 -08:00
m32r
m68k m68knommu: fix definitions of __pa() and __va() 2010-01-12 20:51:45 -08:00
m68knommu m68knommu: fix invalid flags on coldfire pit clocksource 2010-01-16 12:15:38 -08:00
microblaze microblaze: Invalidate dcache before enabling it 2010-02-08 11:39:18 +01:00
mips Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus 2010-02-02 12:45:33 -08:00
mn10300 mn10300: update the ASB2303 defconfig 2010-01-11 09:34:10 -08:00
parisc
powerpc powerpc: Fix address masking bug in hpte_need_flush() 2010-02-10 13:58:06 +11:00
s390 [S390] Fix struct _lowcore layout. 2010-02-09 09:46:23 +01:00
score mm: make totalhigh_pages unsigned long 2010-01-11 09:34:03 -08:00
sh sh: Remove superfluous setup_frame_reg call 2010-02-08 10:47:11 +09:00
sparc sparc: TIF_ABI_PENDING bit removal 2010-01-29 08:22:01 -08:00
um Unrot uml mconsole a bit 2010-01-14 09:05:26 -05:00
x86 x86: Avoid race condition in pci_enable_msix() 2010-02-10 14:27:28 -08:00
xtensa
.gitignore
Kconfig