linux/arch/x86/kernel
Mel Gorman f98b7a772a x86: mm: change tlb_flushall_shift for IvyBridge
There was a large performance regression that was bisected to
commit 611ae8e3 ("x86/tlb: enable tlb flush range support for
x86").  This patch simply changes the default balance point
between a local and global flush for IvyBridge.

In the interest of allowing the tests to be reproduced, this
patch was tested using mmtests 0.15 with the following
configurations

	configs/config-global-dhp__tlbflush-performance
	configs/config-global-dhp__scheduler-performance
	configs/config-global-dhp__network-performance

Results are from two machines

Ivybridge   4 threads:  Intel(R) Core(TM) i3-3240 CPU @ 3.40GHz
Ivybridge   8 threads:  Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

Page fault microbenchmark showed nothing interesting.

Ebizzy was configured to run multiple iterations and threads.
Thread counts ranged from 1 to NR_CPUS*2. For each thread count,
it ran 100 iterations and each iteration lasted 10 seconds.

Ivybridge 4 threads
                    3.13.0-rc7            3.13.0-rc7
                       vanilla           altshift-v3
Mean   1     6395.44 (  0.00%)     6789.09 (  6.16%)
Mean   2     7012.85 (  0.00%)     8052.16 ( 14.82%)
Mean   3     6403.04 (  0.00%)     6973.74 (  8.91%)
Mean   4     6135.32 (  0.00%)     6582.33 (  7.29%)
Mean   5     6095.69 (  0.00%)     6526.68 (  7.07%)
Mean   6     6114.33 (  0.00%)     6416.64 (  4.94%)
Mean   7     6085.10 (  0.00%)     6448.51 (  5.97%)
Mean   8     6120.62 (  0.00%)     6462.97 (  5.59%)

Ivybridge 8 threads
                     3.13.0-rc7            3.13.0-rc7
                        vanilla           altshift-v3
Mean   1      7336.65 (  0.00%)     7787.02 (  6.14%)
Mean   2      8218.41 (  0.00%)     9484.13 ( 15.40%)
Mean   3      7973.62 (  0.00%)     8922.01 ( 11.89%)
Mean   4      7798.33 (  0.00%)     8567.03 (  9.86%)
Mean   5      7158.72 (  0.00%)     8214.23 ( 14.74%)
Mean   6      6852.27 (  0.00%)     7952.45 ( 16.06%)
Mean   7      6774.65 (  0.00%)     7536.35 ( 11.24%)
Mean   8      6510.50 (  0.00%)     6894.05 (  5.89%)
Mean   12     6182.90 (  0.00%)     6661.29 (  7.74%)
Mean   16     6100.09 (  0.00%)     6608.69 (  8.34%)

Ebizzy hits the worst case scenario for TLB range flushing every
time and it shows for these Ivybridge CPUs at least that the
default choice is a poor on.  The patch addresses the problem.

Next was a tlbflush microbenchmark written by Alex Shi at
http://marc.info/?l=linux-kernel&m=133727348217113 .  It
measures access costs while the TLB is being flushed.  The
expectation is that if there are always full TLB flushes that
the benchmark would suffer and it benefits from range flushing

There are 320 iterations of the test per thread count.  The
number of entries is randomly selected with a min of 1 and max
of 512.  To ensure a reasonably even spread of entries, the full
range is broken up into 8 sections and a random number selected
within that section.

iteration 1, random number between 0-64
iteration 2, random number between 64-128 etc

This is still a very weak methodology.  When you do not know
what are typical ranges, random is a reasonable choice but it
can be easily argued that the opimisation was for smaller ranges
and an even spread is not representative of any workload that
matters.  To improve this, we'd need to know the probability
distribution of TLB flush range sizes for a set of workloads
that are considered "common", build a synthetic trace and feed
that into this benchmark.  Even that is not perfect because it
would not account for the time between flushes but there are
limits of what can be reasonably done and still be doing
something useful.  If a representative synthetic trace is
provided then this benchmark could be revisited and the shift values retuned.

Ivybridge 4 threads
                        3.13.0-rc7            3.13.0-rc7
                           vanilla           altshift-v3
Mean       1       10.50 (  0.00%)       10.50 (  0.03%)
Mean       2       17.59 (  0.00%)       17.18 (  2.34%)
Mean       3       22.98 (  0.00%)       21.74 (  5.41%)
Mean       5       47.13 (  0.00%)       46.23 (  1.92%)
Mean       8       43.30 (  0.00%)       42.56 (  1.72%)

Ivybridge 8 threads
                         3.13.0-rc7            3.13.0-rc7
                            vanilla           altshift-v3
Mean       1         9.45 (  0.00%)        9.36 (  0.93%)
Mean       2         9.37 (  0.00%)        9.70 ( -3.54%)
Mean       3         9.36 (  0.00%)        9.29 (  0.70%)
Mean       5        14.49 (  0.00%)       15.04 ( -3.75%)
Mean       8        41.08 (  0.00%)       38.73 (  5.71%)
Mean       13       32.04 (  0.00%)       31.24 (  2.49%)
Mean       16       40.05 (  0.00%)       39.04 (  2.51%)

For both CPUs, average access time is reduced which is good as
this is the benchmark that was used to tune the shift values in
the first place albeit it is now known *how* the benchmark was
used.

The scheduler benchmarks were somewhat inconclusive.  They
showed gains and losses and makes me reconsider how stable those
benchmarks really are or if something else might be interfering
with the test results recently.

Network benchmarks were inconclusive.  Almost all results were
flat except for netperf-udp tests on the 4 thread machine.
These results were unstable and showed large variations between
reboots.  It is unknown if this is a recent problems but I've
noticed before that netperf-udp results tend to vary.

Based on these results, changing the default for Ivybridge seems
like a logical choice.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Davidlohr Bueso <davidlohr@hp.com>
Reviewed-by: Alex Shi <alex.shi@linaro.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-cqnadffh1tiqrshthRj3Esge@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-25 09:10:43 +01:00
..
acpi Merge branch 'acpica' 2013-11-07 19:21:11 +01:00
apic Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-19 10:48:19 -08:00
cpu x86: mm: change tlb_flushall_shift for IvyBridge 2014-01-25 09:10:43 +01:00
kprobes Merge branch 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-09-04 08:42:44 -07:00
.gitignore
alternative.c lockdep, x86/alternatives: Drop ancient lockdep fixup message 2013-09-28 10:09:41 +02:00
amd_gart_64.c x86, mm: use pfn_range_is_mapped() with gart 2012-11-17 11:59:10 -08:00
amd_nb.c x86/AMD/NB: Fix amd_set_subcaches() parameter type 2014-01-25 08:50:09 +01:00
apb_timer.c intel_mid: Renamed *mrst* to *intel_mid* 2013-10-17 16:40:47 -07:00
aperture_64.c x86/mm/gart: Drop unnecessary check 2013-04-16 10:54:40 +02:00
apm_32.c x86, asmlinkage, apm: Make APM data structure used from assembler visible 2013-08-06 14:20:20 -07:00
asm-offsets_32.c x86: Get rid of ->hard_math and all the FPU asm fu 2013-06-06 14:32:04 -07:00
asm-offsets_64.c x86, gdt, hibernate: Store/load GDT for hibernate path. 2013-05-02 11:27:35 -07:00
asm-offsets.c sched, x86: Provide a per-cpu preempt_count implementation 2013-09-25 14:07:57 +02:00
audit_64.c
bootflag.c
check.c
cpuid.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
crash_dump_32.c
crash_dump_64.c
crash.c x86/apic: Disable I/O APIC before shutdown of the local APIC 2013-11-07 10:12:37 +01:00
devicetree.c Merge remote-tracking branch 'grant/devicetree/next' into for-next 2013-11-07 10:34:46 -06:00
doublefault.c x86: Extend #DF debugging aid to 64-bit 2013-05-13 13:42:44 -07:00
dumpstack_32.c dump_stack: unify debug information printed by show_regs() 2013-04-30 17:04:02 -07:00
dumpstack_64.c dump_stack: unify debug information printed by show_regs() 2013-04-30 17:04:02 -07:00
dumpstack.c x86/dumpstack: Fix printk_address for direct addresses 2013-11-12 21:06:06 +01:00
e820.c x86: avoid remapping data in parse_setup_data() 2013-08-13 23:29:19 -07:00
early_printk.c Merge branch 'x86-intel-mid-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-12 11:12:22 +09:00
early-quirks.c x86/early quirk: use gen6 stolen detection for VLV 2013-11-14 09:32:11 +01:00
entry_32.S ftrace/x86: Load ftrace_ops in parameter not the variable holding it 2014-01-09 13:24:29 -08:00
entry_64.S ftrace/x86: Load ftrace_ops in parameter not the variable holding it 2014-01-09 13:24:29 -08:00
ftrace.c ftrace/x86: skip over the breakpoint for ftrace caller 2013-11-05 16:01:47 -05:00
head32.c intel_mid: Renamed *mrst* to *intel_mid* 2013-10-17 16:40:47 -07:00
head64.c x86, trace: Register exception handler to trace IDT 2013-11-08 14:15:45 -08:00
head_32.S Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-09-04 09:11:16 -07:00
head_64.S x86: Make sure IDT is page aligned 2013-07-16 15:14:48 -07:00
head.c x86: Make sure we can boot in the case the BDA contains pure garbage 2013-02-27 13:38:57 -08:00
hpet.c x86, hpet: Introduce x86_msi_ops.setup_hpet_msi 2013-01-28 10:48:30 +01:00
hw_breakpoint.c ptrace/x86: flush_ptrace_hw_breakpoint() shoule clear the virtual debug registers 2013-07-09 10:33:26 -07:00
i386_ksyms_32.c sched, x86: Optimize the preempt_schedule() call 2013-09-25 14:23:07 +02:00
i387.c x86: move fpu_counter into ARCH specific thread_struct 2013-11-13 12:09:13 +09:00
i8237.c
i8253.c
i8259.c x86/irq: Correct comment about i8259 initialization 2013-09-04 07:46:04 +02:00
io_delay.c
ioport.c x86: get rid of pt_regs argument of iopl(2) 2013-02-03 18:16:24 -05:00
irq_32.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-12 10:20:12 +09:00
irq_64.c irq: Consolidate do_softirq() arch overriden implementations 2013-10-01 12:53:25 +02:00
irq_work.c x86, asmlinkage: Make all interrupt handlers asmlinkage / __visible 2013-08-06 14:18:23 -07:00
irq.c x86: Add check for number of available vectors before CPU down 2014-01-15 22:24:02 -08:00
irqinit.c KVM: VMX: Register a new IPI for posted interrupt 2013-04-16 16:32:39 -03:00
jump_label.c x86/jump_label: expect default_nop if static_key gets enabled on boot-up 2013-10-19 19:45:35 -04:00
kdebugfs.c
kgdb.c kgdb,x86: fix warning about unused variable 2012-10-12 06:37:34 -05:00
kvm.c x86, trace: Register exception handler to trace IDT 2013-11-08 14:15:45 -08:00
kvmclock.c pvclock: detect watchdog reset at pvclock read 2013-11-06 09:48:43 +02:00
ldt.c
machine_kexec_32.c
machine_kexec_64.c x86, kexec, 64bit: Only set ident mapping for ram. 2013-01-29 15:26:35 -08:00
Makefile sched, x86: Optimize the preempt_schedule() call 2013-09-25 14:23:07 +02:00
microcode_amd_early.c x86, microcode, AMD: Fix early microcode loading 2013-08-12 18:32:45 +02:00
microcode_amd.c x86/microcode/amd: Tone down printk(), don't treat a missing firmware file as an error 2013-11-12 22:03:49 +01:00
microcode_core_early.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
microcode_core.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
microcode_intel_early.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
microcode_intel_lib.c x86/microcode_intel_lib.c: Early update ucode on Intel's CPU 2013-01-31 13:19:14 -08:00
microcode_intel.c x86/microcode_intel.h: Define functions and macros for early loading ucode 2013-01-31 13:18:50 -08:00
mmconf-fam10h_64.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
module.c mm/arch: use NUMA_NO_NODE 2013-11-13 12:09:05 +09:00
mpparse.c
msr.c x86, msr: Use file_inode(), not f_mapping->host 2013-10-08 12:01:29 -07:00
nmi_selftest.c
nmi.c perf/x86: Fix NMI measurements 2013-10-29 12:01:20 +01:00
paravirt_patch_32.c
paravirt_patch_64.c
paravirt-spinlocks.c x86, ticketlock: Add slowpath logic 2013-08-09 07:54:00 -07:00
paravirt.c x86, paravirt: Remove duplicate definition for DEF_NATIVE 2013-09-04 09:46:43 -07:00
pci-calgary_64.c
pci-dma.c x86/dma-debug: Bump PREALLOC_DMA_DEBUG_ENTRIES 2013-01-24 17:34:18 +01:00
pci-iommu_table.c
pci-nommu.c
pci-swiotlb.c
pcspeaker.c
perf_regs.c perf: Fix off by one test in perf_reg_value() 2012-09-19 17:08:40 +02:00
preempt.S sched, x86: Optimize the preempt_schedule() call 2013-09-25 14:23:07 +02:00
probe_roms.c x86/pci/probe_roms: Add missing __iomem annotation to pci_map_biosrom() 2012-09-05 10:52:25 +02:00
process_32.c x86: move fpu_counter into ARCH specific thread_struct 2013-11-13 12:09:13 +09:00
process_64.c Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-14 16:55:56 +09:00
process.c sched, idle: Fix the idle polling state logic 2013-09-25 13:53:10 +02:00
ptrace.c ptrace/x86: cleanup ptrace_set_debugreg() 2013-07-09 10:33:26 -07:00
pvclock.c hung_task: add method to reset detector 2013-11-06 09:49:02 +02:00
quirks.c x86/quirks: Add workaround for AMD F16h Erratum792 2014-01-25 08:44:19 +01:00
reboot_fixups_32.c
reboot.c x86/apic, doc: Justification for disabling IO APIC before Local APIC 2013-12-04 19:33:21 -08:00
relocate_kernel_32.S x86, asm, cleanup: Replace open-coded control register values with symbolic 2013-06-25 16:26:06 -07:00
relocate_kernel_64.S x86, reloc: Use xorl instead of xorq in relocate_kernel_64.S 2013-06-20 21:30:04 -07:00
resource.c
rtc.c Merge branch 'x86-intel-mid-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-12 11:12:22 +09:00
setup_percpu.c
setup.c x86, acpi, crash, kdump: do reserve_crashkernel() after SRAT is parsed. 2013-11-13 12:09:08 +09:00
signal.c Merge branch 'x86-smap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-09-04 11:08:32 -07:00
smp.c x86, asmlinkage: Make all interrupt handlers asmlinkage / __visible 2013-08-06 14:18:23 -07:00
smpboot.c x86: Add check for number of available vectors before CPU down 2014-01-15 22:24:02 -08:00
stacktrace.c
step.c ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL 2013-01-22 10:08:00 -08:00
sys_x86_64.c x86 get_unmapped_area: Access mmap_legacy_base through mm_struct member 2013-08-22 10:19:35 -07:00
syscall_32.c x86, asmlinkage: Make syscall tables visible 2013-08-06 14:20:18 -07:00
syscall_64.c x86, asmlinkage: Make syscall tables visible 2013-08-06 14:20:18 -07:00
sysfb_efi.c x86: sysfb: move EFI quirks from efifb to sysfb 2013-08-02 16:17:47 -07:00
sysfb_simplefb.c x86/simplefb: Mark framebuffer mem-resources as IORESOURCE_BUSY to avoid bootup warning 2013-10-03 07:51:11 +02:00
sysfb.c x86: sysfb: move EFI quirks from efifb to sysfb 2013-08-02 16:17:47 -07:00
tboot.c x86 / tboot / ACPI: Fail extended mode reduced hardware sleep 2013-07-31 14:25:51 +02:00
tce_64.c
test_nx.c
test_rodata.c
time.c
tls.c make SYSCALL_DEFINE<n>-generated wrappers do asmlinkage_protect 2013-03-03 22:58:33 -05:00
tls.h
topology.c hotplug, powerpc, x86: Remove cpu_hotplug_driver_lock() 2013-09-30 19:55:51 +02:00
trace_clock.c tracing,x86: Add a TSC trace_clock 2012-11-13 15:48:27 -05:00
tracepoint.c x86: Make sure IDT is page aligned 2013-07-16 15:14:48 -07:00
traps.c Merge branch 'x86-trace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-14 16:25:10 +09:00
tsc_sync.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
tsc.c perf/x86: Add ability to calculate TSC from perf sample timestamps 2013-07-23 12:17:45 +02:00
uprobes.c uretprobes/x86: Hijack return address 2013-04-13 15:31:55 +02:00
verify_cpu.S
vm86_32.c x86, vm86: fix VM86 syscalls: use SYSCALL_DEFINEx(...) 2013-05-02 20:36:32 -04:00
vmlinux.lds.S x86: intel-mid: Add section for sfi device table 2013-10-17 16:41:04 -07:00
vsmp_64.c
vsyscall_64.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
vsyscall_emu_64.S
vsyscall_trace.h
x86_init.c PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq() 2013-11-06 16:32:19 -07:00
x8664_ksyms_64.c sched, x86: Optimize the preempt_schedule() call 2013-09-25 14:23:07 +02:00
xsave.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00