* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Clear TS in irq_ts_save() when in an atomic section
x86: Detect use of extended APIC ID for AMD CPUs
x86: memtest: remove 64-bit division
x86, UV: Fix macros for multiple coherency domains
x86: Fix non-lazy GS handling in sys_vm86()
x86: Add quirk for reboot stalls on a Dell Optiplex 360
x86: Fix UV BAU activation descriptor init
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (22 commits)
x86: fix system without memory on node0
x86, mm: Fix node_possible_map logic
mm, x86: remove MEMORY_HOTPLUG_RESERVE related code
x86: make sparse mem work in non-NUMA mode
x86: process.c, remove useless headers
x86: merge process.c a bit
x86: use sparse_memory_present_with_active_regions() on UMA
x86: unify 64-bit UMA and NUMA paging_init()
x86: Allow 1MB of slack between the e820 map and SRAT, not 4GB
x86: Sanity check the e820 against the SRAT table using e820 map only
x86: clean up and and print out initial max_pfn_mapped
x86/pci: remove rounding quirk from e820_setup_gap()
x86, e820, pci: reserve extra free space near end of RAM
x86: fix typo in address space documentation
x86: 46 bit physical address support on 64 bits
x86, mm: fault.c, use printk_once() in is_errata93()
x86: move per-cpu mmu_gathers to mm/init.c
x86: move max_pfn_mapped and max_low_pfn_mapped to setup.c
x86: unify noexec handling
x86: remove (null) in /sys kernel_page_tables
...
* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, microcode: Simplify vfree() use
x86: microcode: use smp_call_function_single instead of set_cpus_allowed, cleanup of synchronization logic
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: cpu_debug: Remove model information to reduce encoding-decoding
x86: fixup numa_node information for AMD CPU northbridge functions
x86: k8 convert node_to_k8_nb_misc() from a macro to an inline function
x86: cacheinfo: complete L2/L3 Cache and TLB associativity field definitions
x86/docs: add description for cache_disable sysfs interface
x86: cacheinfo: disable L3 ECC scrubbing when L3 cache index is disabled
x86: cacheinfo: replace sysfs interface for cache_disable feature
x86: cacheinfo: use cached K8 NB_MISC devices instead of scanning for it
x86: cacheinfo: correct return value when cache_disable feature is not active
x86: cacheinfo: use L3 cache index disable feature only for CPUs that support it
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, nmi: Use predefined numbers instead of hardcoded one
x86: asm/processor.h: remove double declaration
x86, mtrr: replace MTRRdefType_MSR with msr-index's MSR_MTRRdefType
x86, mtrr: replace MTRRfix4K_C0000_MSR with msr-index's MSR_MTRRfix4K_C0000
x86, mtrr: remove mtrr MSRs double declaration
x86, mtrr: replace MTRRfix16K_80000_MSR with msr-index's MSR_MTRRfix16K_80000
x86, mtrr: replace MTRRfix64K_00000_MSR with msr-index's MSR_MTRRfix64K_00000
x86, mtrr: replace MTRRcap_MSR with msr-index's MSR_MTRRcap
x86: mce: remove duplicated #include
x86: msr-index.h remove duplicate MSR C001_0015 declaration
x86: clean up arch/x86/kernel/tsc_sync.c a bit
x86: use symbolic name for VM86_SIGNAL when used as vm86 default return
x86: added 'ifndef _ASM_X86_IOMAP_H' to iomap.h
x86: avoid multiple declaration of kstack_depth_to_print
x86: vdso/vma.c declare vdso_enabled and arch_setup_additional_pages before they get used
x86: clean up declarations and variables
x86: apic/x2apic_cluster.c x86_cpu_to_logical_apicid should be static
x86 early quirks: eliminate unused function
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, 64-bit: ifdef out struct thread_struct::ip
x86, 32-bit: ifdef out struct thread_struct::fs
x86: clean up alternative.h
* 'x86-kbuild-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (46 commits)
x86, boot: add new generated files to the appropriate .gitignore files
x86, boot: correct the calculation of ZO_INIT_SIZE
x86-64: align __PHYSICAL_START, remove __KERNEL_ALIGN
x86, boot: correct sanity checks in boot/compressed/misc.c
x86: add extension fields for bootloader type and version
x86, defconfig: update kernel position parameters
x86, defconfig: update to current, no material changes
x86: make CONFIG_RELOCATABLE the default
x86: default CONFIG_PHYSICAL_START and CONFIG_PHYSICAL_ALIGN to 16 MB
x86: document new bzImage fields
x86, boot: make kernel_alignment adjustable; new bzImage fields
x86, boot: remove dead code from boot/compressed/head_*.S
x86, boot: use LOAD_PHYSICAL_ADDR on 64 bits
x86, boot: make symbols from the main vmlinux available
x86, boot: determine compressed code offset at compile time
x86, boot: use appropriate rep string for move and clear
x86, boot: zero EFLAGS on 32 bits
x86, boot: set up the decompression stack as early as possible
x86, boot: straighten out ranges to copy/zero in compressed/head*.S
x86, boot: stylistic cleanups for boot/compressed/head_64.S
...
Fixed trivial conflict in arch/x86/configs/x86_64_defconfig manually
* 'irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (76 commits)
x86, apic: Fix dummy apic read operation together with broken MP handling
x86, apic: Restore irqs on fail paths
x86: Print real IOAPIC version for x86-64
x86: enable_update_mptable should be a macro
sparseirq: Allow early irq_desc allocation
x86, io-apic: Don't mark pin_programmed early
x86, irq: don't call mp_config_acpi_gsi() if update_mptable is not enabled
x86, irq: update_mptable needs pci_routeirq
x86: don't call read_apic_id if !cpu_has_apic
x86, apic: introduce io_apic_irq_attr
x86/pci: add 4 more return parameters to IO_APIC_get_PCI_irq_vector(), fix
x86: read apic ID in the !acpi_lapic case
x86: apic: Fixmap apic address even if apic disabled
x86: display extended apic registers with print_local_APIC and cpu_debug code
x86: read apic ID in the !acpi_lapic case
x86: clean up and fix setup_clear/force_cpu_cap handling
x86: apic: Check rev 3 fadt correctly for physical_apic bit
x86/pci: update pirq_enable_irq() to setup io apic routing
x86/acpi: move setup io apic routing out of CONFIG_ACPI scope
x86/pci: add 4 more return parameters to IO_APIC_get_PCI_irq_vector()
...
The e_powersaver driver for VIA's C7 CPUs needs to be marked as
DANGEROUS as it configures the CPU to power states that are out
of specification.
According to Centaur, all systems with C7 and Nano CPUs support
the ACPI p-state method. Thus, the acpi-cpufreq driver should
be used instead.
Signed-off-by: Harald Welte <HaraldWelte@viatech.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The VIA/Centaur C7, C7-M and Nano CPUs all support ACPI-based CPU p-states
using an MSR interface. The Linux driver just never made use of it, since in
addition to the check for the EST flag it also checked if the vendor is Intel.
Signed-off-by: Harald Welte <HaraldWelte@viatech.com>
[ Removed the vendor checks entirely - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Booting a 32-bit kernel on Magny-Cours results in the following panic:
...
Using APIC driver default
...
Overriding APIC driver with bigsmp
...
Getting VERSION: 80050010
Getting VERSION: 80050010
Getting ID: 10000000
Getting ID: ef000000
Getting LVT0: 700
Getting LVT1: 10000
Kernel panic - not syncing: Boot APIC ID in local APIC unexpected (16 vs 0)
Pid: 1, comm: swapper Not tainted 2.6.30-rcX #2
Call Trace:
[<c05194da>] ? panic+0x38/0xd3
[<c0743102>] ? native_smp_prepare_cpus+0x259/0x31f
[<c073b19d>] ? kernel_init+0x3e/0x141
[<c073b15f>] ? kernel_init+0x0/0x141
[<c020325f>] ? kernel_thread_helper+0x7/0x10
The reason is that default_get_apic_id() handles the extended local APIC
ID field only in the XAPIC case.
Thus for this AMD CPU, default_get_apic_id() returns 0 and
bigsmp_get_apic_id() returns 16 which leads to the respective kernel
panic.
This patch introduces a Linux specific feature flag to indicate
support for extended APIC id (8 bits instead of 4 bits width) and sets
the flag on AMD CPUs if applicable.
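A minimal sketch of the width difference, assuming the APIC ID occupies the
top byte of the ID register (as the "Getting ID: 10000000" line above
suggests). The helper name and the extended_id parameter are hypothetical
and only stand in for the new feature flag:

```c
#include <stdint.h>

/* Hypothetical helper: extract the APIC ID from the raw ID register.
 * extended_id stands in for the new feature flag (assumption). */
static unsigned int apic_id_from_reg(uint32_t reg, int extended_id)
{
	/* 8-bit field when the flag is set, 4-bit field otherwise. */
	uint32_t mask = extended_id ? 0xFFu : 0x0Fu;

	return (reg >> 24) & mask;
}
```

With the register value 0x10000000 from the log, the 4-bit mask yields 0 and
the 8-bit mask yields 16, which is the "16 vs 0" mismatch in the panic.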
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: <stable@kernel.org>
LKML-Reference: <20090608135509.GA12431@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
These are defined as static cpumask_var_t so if MAXSMP is not used,
they are cleared already. Avoid surprises when MAXSMP is enabled.
Signed-off-by: Yinghai Lu <yinghai.lu@kernel.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fix bug in the SGI UV macros that support systems with multiple
coherency domains. The macros used for referencing global MMR
(chipset registers) are failing to correctly "or" the NASID
(node identifier) bits that reside above M+N. These high bits
are supplied automatically by the chipset for memory accesses
coming from the processor socket.
However, the bits must be present for references to the special
global MMR space used to map chipset registers. (See uv_hub.h
for more details ...)
The bug results in references to invalid/incorrect nodes.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Cc: <stable@kernel.org>
LKML-Reference: <20090608154405.GA16395@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
vfree() does its own 'NULL' check, so there is no need for a check before
calling it.
In v2, remove the stray newline.
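As a userspace illustration of the same pattern, free() also accepts NULL,
so the guard can be dropped there too. The struct and function names below
are hypothetical and not taken from the microcode driver:

```c
#include <stdlib.h>

/* Hypothetical stand-in for an object that owns one buffer. */
struct ucode_holder {
	void *mc;
};

/* free(), like vfree(), is a no-op on NULL, so no "if (h->mc)" is needed. */
static void release_ucode(struct ucode_holder *h)
{
	free(h->mc);
	h->mc = NULL;
}
```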
[ Impact: cleanup ]
Signed-off-by: Figo.zhang <figo1802@gmail.com>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
LKML-Reference: <1244385036.3402.11.camel@myhost>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This fixes a stack corruption panic or null dereference oops
due to a bad GS in resume_userspace() when returning from
sys_vm86() and calling lockdep_sys_exit().
This is only a problem when CONFIG_LOCKDEP and CONFIG_CC_STACKPROTECTOR
are enabled.
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <1244384628.2323.4.camel@bimbo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar reported that read_apic is buggy nowadays:
[ 0.000000] Using APIC driver default
[ 0.000000] SMP: Allowing 1 CPUs, 0 hotplug CPUs
[ 0.000000] Local APIC disabled by BIOS -- you can enable it with "lapic"
[ 0.000000] APIC: disable apic facility
[ 0.000000] ------------[ cut here ]------------
[ 0.000000] WARNING: at arch/x86/kernel/apic/apic.c:254 native_apic_read_dummy+0x2d/0x3b()
[ 0.000000] Hardware name: HP OmniBook PC
Indeed, we still rely on the apic->read operation in SMP-compiled
kernels. Instead of disfiguring the SMP code with #ifdefs, we
allow apic->read to be called. To capture any unexpected results,
we check via WARN_ON_ONCE that apic->read is being called for a sane
reason, but(!) the check should use a logical AND instead of OR
(thanks Yinghai for spotting the root of the problem).
Along with that, we could have a bad MP table; fix that case so that
SMP is not started and there are no complaints about a BIOS bug if
the APIC was just disabled via the command line.
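The OR-versus-AND point can be sketched as two predicates. The operand
names (cpu_has_apic, disable_apic) are assumptions for illustration; the
message above only states that the operator changes from OR to AND:

```c
#include <stdbool.h>

/* AND form: a dummy read is only unexpected when the CPU has an APIC
 * and the APIC was not disabled. */
static bool dummy_read_unexpected_and(bool cpu_has_apic, bool disable_apic)
{
	return cpu_has_apic && !disable_apic;
}

/* OR form: also fires when only one of the two conditions holds. */
static bool dummy_read_unexpected_or(bool cpu_has_apic, bool disable_apic)
{
	return cpu_has_apic || !disable_apic;
}
```

Under these assumptions, the OR form warns both when the CPU has no APIC and
when the APIC was disabled on purpose, while the AND form stays quiet in
both cases.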
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20090607124840.GD4547@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The Dell Optiplex 360 hangs on reboot, just like the Optiplex 330, so
the same quirk is needed.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Steve Conklin <steve.conklin@canonical.com>
Cc: Leann Ogasawara <leann.ogasawara@canonical.com>
Cc: <stable@kernel.org>
LKML-Reference: <200906051202.38311.jdelvare@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove model information, encoding/decoding and reduce bookkeeping.
This, besides removing a lot of code and cleaning up the code, also
enables these features on many more CPUs than were enumerated before.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
LKML-Reference: <1244224637.8212.6.camel@ht.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The powernow-k8 driver checks to see that the Performance Control/Status
Registers are declared as FFH (functional fixed hardware) by the BIOS.
However, this check got broken in the commit:
0e64a0c982c06a6b8f5e2a7f29eb108fdf257b2f
[CPUFREQ] checkpatch cleanups for powernow-k8
Fix based on an original patch from Naga Chumbalkar.
Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Cc: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
The UV tlb shootdown code has a serious initialization error.
An array of structures [32*8] is initialized as if it were [32].
The array is indexed by (cpu number on the blade)*8, so the short
initialization works for up to 4 cpus on a blade.
But above that, we provide an invalid opcode to the hub's
broadcast assist unit.
This patch changes the allocation of the array to use its symbolic
dimensions for better clarity. And initializes all 32*8 entries.
Shortened 'UV_ACTIVATION_DESCRIPTOR_SIZE' to 'UV_ADP_SIZE' per Ingo's
recommendation.
Tested on the UV simulator.
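A small sketch of why the short initialization only covers four CPUs. All
names and the opcode values are hypothetical placeholders; only the 32 and 8
dimensions and the cpu*8 indexing come from the description above:

```c
#include <stddef.h>

enum { ADP_SIZE = 32, ITEMS_PER_DESC = 8 };  /* hypothetical names */
enum { OP_INVALID = 0, OP_VALID = 1 };       /* placeholder opcodes */

struct desc {
	int opcode;
};

/* Initialise the first n entries; the rest keep their old contents. */
static void init_descriptors(struct desc *d, size_t n)
{
	for (size_t i = 0; i < n; i++)
		d[i].opcode = OP_VALID;
}

/* A blade CPU uses the entry at index cpu * ITEMS_PER_DESC. */
static int opcode_for_cpu(const struct desc *d, int cpu)
{
	return d[cpu * ITEMS_PER_DESC].opcode;
}
```

Initialising only ADP_SIZE entries reaches indices 0..31, so CPUs 0-3
(indices 0, 8, 16, 24) see a valid opcode and CPU 4 (index 32) does not.
Initialising ADP_SIZE * ITEMS_PER_DESC entries covers every CPU.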
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: <stable@kernel.org>
LKML-Reference: <E1M6lZR-0007kV-Aq@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix the fact that the IOAPIC version number in the x86_64 code path always
gets assigned to 0, instead of the correct value.
Before the patch: (from "dmesg" output):
ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 8, version 0, address 0xfec00000, GSI 0-23 <---
After the patch:
ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23 <---
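For reference, a sketch of how the two numbers in that line relate to the
IOAPIC version register, assuming the usual layout (version in bits 0-7,
index of the highest redirection entry in bits 16-23). The helper names are
hypothetical and this is not the kernel's io_apic_get_version():

```c
#include <stdint.h>

/* Bits 0-7 of the version register hold the IOAPIC version. */
static unsigned int ioapic_version(uint32_t ver_reg)
{
	return ver_reg & 0xFFu;
}

/* Bits 16-23 hold the index of the highest redirection entry. */
static unsigned int ioapic_max_redir_entry(uint32_t ver_reg)
{
	return (ver_reg >> 16) & 0xFFu;
}
```

An illustrative register value of 0x00170020 would decode to version 32
(0x20) and a highest entry of 23, consistent with "version 32" and
"GSI 0-23" above.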
History:
io_apic_get_version() was compiled out of the x86_64 code path in the commit
f2c2cca3acef8b253a36381d9b469ad4fb08563a:
Author: Andi Kleen <ak@suse.de>
Date: Tue Sep 26 10:52:37 2006 +0200
[PATCH] Remove APIC version/cpu capability mpparse checking/printing
ACPI went to great trouble to get the APIC version and CPU capabilities
of different CPUs before passing them to the mpparser. But all
that data was used was to print it out. Actually it even faked some data
based on the boot cpu, not on the actual CPU being booted.
Remove all this code because it's not needed.
Cc: len.brown@intel.com
At the time, the IOAPIC version number was deliberately not printed
in the x86_64 code path. However, after the x86 and x86_64 files were
merged, the net result is that the IOAPIC version is printed incorrectly
in the x86_64 code path.
The patch below provides a fix. I have tested it with acpi, and with
acpi=off, and did not see any problems.
Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Acked-by: Yinghai Lu <yhlu.kernel@gmail.com>
LKML-Reference: <20090416014230.4885.94926.sendpatchset@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
*************************
Merge reason: irq/numa didn't build because this commit:
2759c32: x86: don't call read_apic_id if !cpu_has_apic
Had a dependency on x86/cpufeature changes. Pull in that
(small) branch to fix the dependency.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
arch/mips/sibyte/bcm1480/irq.c
arch/mips/sibyte/sb1250/irq.c
Merge reason: we gathered a few conflicts plus update to latest upstream fixes.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Slightly modified by trenn@suse.de -> only do this on fam 10h and fam 11h.
Currently powernow-k8 determines CPU frequency from ACPI PSS objects, but
according to AMD family 11h BKDG this frequency is just a rounded value:
"CoreFreq (MHz) = The CPU COF specified by MSRC001_00[6B:64][CpuFid]
rounded to the nearest 100 Mhz."
As a consequence powernow-k8 reports wrong CPU frequency on some systems,
e.g. on Turion X2 Ultra:
powernow-k8: Found 1 AMD Turion(tm)X2 Ultra DualCore Mobile ZM-82
processors (2 cpu cores) (version 2.20.00)
powernow-k8: 0 : pstate 0 (2200 MHz)
powernow-k8: 1 : pstate 1 (1100 MHz)
powernow-k8: 2 : pstate 2 (600 MHz)
But this is wrong as frequency for Pstate2 is 550 MHz. x86info reports it
correctly:
#x86info -a |grep Pstate
...
Pstate-0: fid=e, did=0, vid=24 (2200MHz)
Pstate-1: fid=e, did=1, vid=30 (1100MHz)
Pstate-2: fid=e, did=2, vid=3c (550MHz) (current)
Solution is to determine the frequency directly from Pstate MSRs instead
of using rounded values from ACPI table.
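A sketch of the arithmetic, not the driver code. The formula
100 * (fid + 8) / 2^did is an assumption chosen because it reproduces the
x86info values above for this family 11h part; the function names are
hypothetical:

```c
/* Assumed family 11h relation: core MHz = 100 * (fid + 8) / 2^did. */
static unsigned int fam11h_core_mhz(unsigned int fid, unsigned int did)
{
	return (100u * (fid + 8u)) >> did;
}

/* Rounding to the nearest 100 MHz, as the BKDG quote describes for PSS. */
static unsigned int round_to_100mhz(unsigned int mhz)
{
	return ((mhz + 50u) / 100u) * 100u;
}
```

With fid=0xe, did=0/1/2 this gives 2200, 1100 and 550 MHz, and rounding 550
to the nearest 100 gives the 600 MHz that the driver printed.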
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
- Make the message shorter and easier to grep for
- Use printk_once instead of WARN_ONCE (functionality of these was mixed)
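For readers unfamiliar with the pattern, a userspace imitation of a
print-once helper is sketched below. It only mimics the "print a single
line the first time" behaviour; WARN_ONCE additionally emits a warning with
a stack trace, which is not modelled here. The function name is
hypothetical:

```c
#include <stdbool.h>
#include <stdio.h>

/* Print msg the first time only; returns 1 if it printed, 0 otherwise. */
static int report_once(const char *msg)
{
	static bool printed;

	if (printed)
		return 0;
	printed = true;
	fprintf(stderr, "%s\n", msg);
	return 1;
}
```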
Signed-off-by: Thomas Renninger <trenn@suse.de>
Cc: Langsdorf, Mark <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
arch/x86/kernel/cpu/cpufreq/powernow-k7.c:172: warning: 'invalidate_entry' defined but not used
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Dave Jones <davej@redhat.com>
Some atom procs don't do freq scaling (such as the atom 330 on my own
littlefalls2 board). By adding the atom family here, we at least get
the benefit of passive cooling in a thermal emergency. Not sure how
to see that it's actually helping any, but the driver does bind and
claim it's functioning on my atom 330.
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Dave Jones <davej@redhat.com>
The remap percpu allocator has a subtle bug when combined with page
attribute changing. The remap percpu allocator aliases PMD pages for the
first chunk and, as pageattr doesn't know about the alias, it ends up
updating page attributes of the original mapping only, thus leaving the
aliases in an inconsistent state, which might lead to subtle data
corruption. Please read the following thread for more information:
http://thread.gmane.org/gmane.linux.kernel/835783
The following is the proposed fix which teaches pageattr about percpu
aliases.
http://thread.gmane.org/gmane.linux.kernel/837157
However, the above changes are deemed too pervasive for upstream
inclusion for 2.6.30 release, so this patch essentially disables
the remap allocator for the time being.
Signed-off-by: Tejun Heo <tj@kernel.org>
LKML-Reference: <4A1A0A27.4050301@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Introduce "noxsave" boot parameter which will disable the cpu's xsave/xrstor
capabilities. Useful for debugging and working around xsave related issues.
[ Impact: make it possible to debug problems in the field ]
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
x86: DMI match for the Sony VGN-Z540N as it needs BIOS reboot,
see:
http://bugzilla.kernel.org/show_bug.cgi?id=12901
[ Impact: fix hung reboot on certain systems ]
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Cc: Len Brown <lenb@kernel.org>
LKML-Reference: <1242963350.32574.53.camel@rzhang-dt>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter bisected that:
| commit b9c61b70075c87a8612624736faf4a2de5b1ed30
| Date: Wed May 6 10:10:06 2009 -0700
|
| x86/pci: update pirq_enable_irq() to setup io apic routing
|
| So we can set io apic routing only when enabling the device irq.
wrecked his opteron box, ata1 interrupts fail to get through.
ata1 is using irq 11:
[ 1.451839] sata_svw 0000:01:0e.0: version 2.3
[ 1.456333] sata_svw 0000:01:0e.0: PCI INT A -> GSI 11 (level, low) -> IRQ 11
[ 1.463639] scsi0 : sata_svw
[ 1.466949] scsi1 : sata_svw
[ 1.470022] scsi2 : sata_svw
[ 1.473090] scsi3 : sata_svw
[ 1.476112] ata1: SATA max UDMA/133 mmio m8192@0xff3fe000 port 0xff3fe000 irq 11
[ 1.483490] ata2: SATA max UDMA/133 mmio m8192@0xff3fe000 port 0xff3fe100 irq 11
[ 1.490870] ata3: SATA max UDMA/133 mmio m8192@0xff3fe000 port 0xff3fe200 irq 11
[ 1.498247] ata4: SATA max UDMA/133 mmio m8192@0xff3fe000 port 0xff3fe300 irq 11
That pin overlaps with the legacy ones.
We should not set bits in pin_programmed here, so that those bits can
be set later via io_apic_set_pci_routing().
[ Impact: fix boot hang on certain systems ]
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Yinghai Lu <yinghai.lu@kernel.org>
Tested-by: Peter Zijlstra <peterz@infradead.org>
Cc: Jack Steiner <steiner@sgi.com>
LKML-Reference: <4A119990.9020606@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tracing: Append prompt in /debug/tracing/README file
x86/function-graph: fix constraint for recording old return value
Len expressed concern that the update_mptable feature has
side-effects on the ACPI code.
Explicitly make sure that the code only ever gets called if
the (default disabled) update_mptable boot quirk option is
enabled.
[ Impact: isolate the update_mptable feature from ACPI code more ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Len Brown <lenb@kernel.org>
LKML-Reference: <4A0DC832.5090200@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
To get all device irq routing and to save them.
This is basically an implicit pci=routeirq enablement if (and only if)
the update_mptable boot option (which is off by default) has been
specified.
[ Impact: extend the update_mptable boot option's scope ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
LKML-Reference: <4A0DB7B4.4060702@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jack found a boot crash on a system which doesn't have memory on node0.
It turns out that with the recent per_cpu changes, node_number for the BSP
will always be 0, which is not consistent with cpu_to_node(), which might
already have set it to a different (nearer) node.
aka when numa_set_node() for node0 is called early before per_cpu area is
setup:
two places touched that per_cpu(node_number,):
1. in cpu/common.c::cpu_init() and it is not for BP
| #ifdef CONFIG_NUMA
| if (cpu != 0 && percpu_read(node_number) == 0 &&
| cpu_to_node(cpu) != NUMA_NO_NODE)
| percpu_write(node_number, cpu_to_node(cpu));
| #endif
for BP: traps_init ==> cpu_init
for AP: start_secondary ==> cpu_init
2. cpu/intel.c or amd.c::srat_detect_node via numa_set_node()
for BP: check_bugs ==> identify_boot_cpu ==> identify_cpu()
that is rather later before numa_node_id() is used for BP...
for AP: start_secondary => smp_callin => smp_store_cpu_info() =>
=> identify_secondary_cpu => identify_cpu()
so try to set that for BP earlier in setup_per_cpu_areas(), and
don't bother to set that for APs there (it will be updated later
and will be used later)
(and don't mess the 0 before the copying BP per_cpu data to APs)
[ Impact: fix boot crash on memoryless node-0 ]
Reported-and-tested-by: Jack Steiner <steiner@sgi.com>
Cc: Tejun Heo <htejun@gmail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4A0C4A02.7050401@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Merge reason: sync up to -rc6 which has changes to mm/ which we are
going to touch in the commits to follow as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
read_apic_id() should not be called if the APIC is disabled.
[ Impact: fix crash on certain UP configs ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
LKML-Reference: <4A09CCBB.2000306@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
According to Ingo, the io_apic irq-setup related functions have too many
parameters with a repetitive signature.
So reduce the parameter count of the related functions by passing a pointer
to a newly defined io_apic_irq_attr structure.
v2: io_apic_irq ==> irq_attr
triggering ==> trigger
v3: add set_io_apic_irq_attr
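A sketch of what such an attribute structure and setter could look like.
The field names and types are assumptions inferred from the v2/v3 notes
(irq_attr, trigger) and should be checked against the actual patch:

```c
/* Assumed layout: one structure instead of four loose parameters. */
struct io_apic_irq_attr {
	int ioapic;
	int ioapic_pin;
	int trigger;
	int polarity;
};

/* Fill all fields in one call so callers pass a single pointer around. */
static inline void set_io_apic_irq_attr(struct io_apic_irq_attr *irq_attr,
					int ioapic, int ioapic_pin,
					int trigger, int polarity)
{
	irq_attr->ioapic     = ioapic;
	irq_attr->ioapic_pin = ioapic_pin;
	irq_attr->trigger    = trigger;
	irq_attr->polarity   = polarity;
}
```

Functions that previously took the four values separately would then take
a single "struct io_apic_irq_attr *" argument.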
[ Impact: cleanup ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Len Brown <lenb@kernel.org>
LKML-Reference: <4A08ACD3.2070401@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Xiaohui Xin and some other folks at Intel have been looking into what's
behind the performance hit of paravirt_ops when running native.
It appears that the hit is entirely due to the paravirtualized
spinlocks introduced by:
| commit 8efcbab674de2bee45a2e4cdf97de16b8e609ac8
| Date: Mon Jul 7 12:07:51 2008 -0700
|
| paravirt: introduce a "lock-byte" spinlock implementation
The extra call/return in the spinlock path is somehow
causing an increase in the cycles/instruction of somewhere around 2-7%
(seems to vary quite a lot from test to test). The working theory is
that the CPU's pipeline is getting upset about the
call->call->locked-op->return->return, and seems to be failing to
speculate (though I haven't seen anything definitive about the precise
reasons). This doesn't entirely make sense, because the performance
hit is also visible on unlock and other operations which don't involve
locked instructions. But spinlock operations clearly swamp all the
other pvops operations, even though I can't imagine that they're
nearly as common (there's only a .05% increase in instructions
executed).
If I disable just the pv-spinlock calls, my tests show that pvops is
identical to non-pvops performance on native (my measurements show that
it is actually about .1% faster, but Xiaohui shows a .05% slowdown).
Summary of results, averaging 10 runs of the "mmperf" test, using a
no-pvops build as baseline:
                  nopv     Pv-nospin  Pv-spin
CPU cycles        100.00%   99.89%    102.18%
instructions      100.00%  100.10%    100.15%
CPI               100.00%   99.79%    102.03%
cache ref         100.00%  100.84%    100.28%
cache miss        100.00%   90.47%     88.56%
cache miss rate   100.00%   89.72%     88.31%
branches          100.00%   99.93%    100.04%
branch miss       100.00%  103.66%    107.72%
branch miss rt    100.00%  103.73%    107.67%
wallclock         100.00%   99.90%    102.20%
The clear effect here is that the 2% increase in CPI is
directly reflected in the final wallclock time.
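The CPI row can be cross-checked from the two rows above it, since relative
CPI is relative cycles divided by relative instructions. A small sketch
(the function name is hypothetical):

```c
/* Relative CPI in percent of baseline, from relative cycles and
 * relative instruction counts (both also in percent of baseline). */
static double relative_cpi(double cycles_pct, double instr_pct)
{
	return 100.0 * cycles_pct / instr_pct;
}
```

For Pv-nospin, 99.89 / 100.10 gives about 99.79%, and for Pv-spin,
102.18 / 100.15 gives about 102.03%, matching the table to within rounding.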
(The other interesting effect is that the more ops are
out of line calls via pvops, the lower the cache access
and miss rates. Not too surprising, but it suggests that
the non-pvops kernel is over-inlined. On the flipside,
the branch misses go up correspondingly...)
So, what's the fix?
Paravirt patching turns all the pvops calls into direct calls, so
_spin_lock etc do end up having direct calls. For example, the compiler
generated code for paravirtualized _spin_lock is:
<_spin_lock+0>: mov %gs:0xb4c8,%rax
<_spin_lock+9>: incl 0xffffffffffffe044(%rax)
<_spin_lock+15>: callq *0xffffffff805a5b30
<_spin_lock+22>: retq
The indirect call will get patched to:
<_spin_lock+0>: mov %gs:0xb4c8,%rax
<_spin_lock+9>: incl 0xffffffffffffe044(%rax)
<_spin_lock+15>: callq <__ticket_spin_lock>
<_spin_lock+20>: nop; nop /* or whatever 2-byte nop */
<_spin_lock+22>: retq
One possibility is to inline _spin_lock, etc, when building an
optimised kernel (ie, when there's no spinlock/preempt
instrumentation/debugging enabled). That will remove the outer
call/return pair, returning the instruction stream to a single
call/return, which will presumably execute the same as the non-pvops
case. The downsides are: 1) it will replicate the
preempt_disable/enable code at each lock/unlock callsite; this code is
fairly small, but not nothing; and 2) the spinlock definitions are
already a very heavily tangled mass of #ifdefs and other preprocessor
magic, and making any changes will be non-trivial.
The other obvious answer is to disable pv-spinlocks. Making them a
separate config option is fairly easy, and it would be trivial to
enable them only when Xen is enabled (as the only non-default user).
But it doesn't really address the common case of a distro build which
is going to have Xen support enabled, and leaves the open question of
whether the native performance cost of pv-spinlocks is worth the
performance improvement on a loaded Xen system (10% saving of overall
system CPU when guests block rather than spin). Still it is a
reasonable short-term workaround.
[ Impact: fix pvops performance regression when running native ]
Analysed-by: "Xin Xiaohui" <xiaohui.xin@intel.com>
Analysed-by: "Li Xin" <xin.li@intel.com>
Analysed-by: "Nakajima Jun" <jun.nakajima@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Xen-devel <xen-devel@lists.xensource.com>
LKML-Reference: <4A0B62F7.5030802@goop.org>
[ fixed the help text ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use the standard MSR declaration from msr-index.h; there is no need to declare it again.
[ Impact: cleanup, no object code change ]
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Use the standard MSR declaration from msr-index.h; there is no need to declare it again.
[ Impact: cleanup, no object code change ]
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Removed these MTRR MSRs from mtrr/mtrr.h as they are already declared in
msr-index.h and nobody is using them:
MTRRfix16K_A0000_MSR
MTRRfix4K_C8000_MSR
MTRRfix4K_D0000_MSR
MTRRfix4K_D8000_MSR
MTRRfix4K_E0000_MSR
MTRRfix4K_E8000_MSR
MTRRfix4K_F0000_MSR
MTRRfix4K_F8000_MSR
Use the standard MSR declaration from msr-index.h; there is no need to declare it again.
[ Impact: cleanup, no object code change ]
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Use the standard MSR declaration from msr-index.h; there is no need to declare it again.
[ Impact: cleanup, no object code change ]
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>