linux

mirror of https://github.com/FEX-Emu/linux.git synced 2024-12-26 11:28:28 +00:00

Mel Gorman f98b7a772a x86: mm: change tlb_flushall_shift for IvyBridge

There was a large performance regression that was bisected to commit `611ae8e3` ("x86/tlb: enable tlb flush range support for x86"). This patch simply changes the default balance point between a local and global flush for IvyBridge.

In the interest of allowing the tests to be reproduced, this patch was tested using mmtests 0.15 with the following configurations

    configs/config-global-dhp__tlbflush-performance
    configs/config-global-dhp__scheduler-performance
    configs/config-global-dhp__network-performance

Results are from two machines

    Ivybridge 4 threads: Intel(R) Core(TM) i3-3240 CPU @ 3.40GHz
    Ivybridge 8 threads: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

Page fault microbenchmark showed nothing interesting.

Ebizzy was configured to run multiple iterations and threads. Thread counts ranged from 1 to NR_CPUS*2. For each thread count, it ran 100 iterations and each iteration lasted 10 seconds.

Ivybridge 4 threads
                     3.13.0-rc7             3.13.0-rc7
                        vanilla            altshift-v3
Mean   1      6395.44 (  0.00%)      6789.09 (  6.16%)
Mean   2      7012.85 (  0.00%)      8052.16 ( 14.82%)
Mean   3      6403.04 (  0.00%)      6973.74 (  8.91%)
Mean   4      6135.32 (  0.00%)      6582.33 (  7.29%)
Mean   5      6095.69 (  0.00%)      6526.68 (  7.07%)
Mean   6      6114.33 (  0.00%)      6416.64 (  4.94%)
Mean   7      6085.10 (  0.00%)      6448.51 (  5.97%)
Mean   8      6120.62 (  0.00%)      6462.97 (  5.59%)

Ivybridge 8 threads
                     3.13.0-rc7             3.13.0-rc7
                        vanilla            altshift-v3
Mean   1      7336.65 (  0.00%)      7787.02 (  6.14%)
Mean   2      8218.41 (  0.00%)      9484.13 ( 15.40%)
Mean   3      7973.62 (  0.00%)      8922.01 ( 11.89%)
Mean   4      7798.33 (  0.00%)      8567.03 (  9.86%)
Mean   5      7158.72 (  0.00%)      8214.23 ( 14.74%)
Mean   6      6852.27 (  0.00%)      7952.45 ( 16.06%)
Mean   7      6774.65 (  0.00%)      7536.35 ( 11.24%)
Mean   8      6510.50 (  0.00%)      6894.05 (  5.89%)
Mean  12      6182.90 (  0.00%)      6661.29 (  7.74%)
Mean  16      6100.09 (  0.00%)      6608.69 (  8.34%)

Ebizzy hits the worst case scenario for TLB range flushing every time and it shows, for these Ivybridge CPUs at least, that the default choice is a poor one.
The patch addresses the problem.

Next was a tlbflush microbenchmark written by Alex Shi at http://marc.info/?l=linux-kernel&m=133727348217113 . It measures access costs while the TLB is being flushed. The expectation is that if there are always full TLB flushes, the benchmark would suffer, and that it benefits from range flushing.

There are 320 iterations of the test per thread count. The number of entries is randomly selected with a min of 1 and max of 512. To ensure a reasonably even spread of entries, the full range is broken up into 8 sections and a random number selected within that section.

    iteration 1, random number between 0-64
    iteration 2, random number between 64-128 etc

This is still a very weak methodology. When you do not know what the typical ranges are, random is a reasonable choice, but it can be easily argued that the optimisation was for smaller ranges and an even spread is not representative of any workload that matters. To improve this, we'd need to know the probability distribution of TLB flush range sizes for a set of workloads that are considered "common", build a synthetic trace and feed that into this benchmark. Even that is not perfect because it would not account for the time between flushes, but there are limits to what can be reasonably done and still be doing something useful. If a representative synthetic trace is provided then this benchmark could be revisited and the shift values retuned.
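The sampling scheme described above can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual code: the function names are hypothetical, and it assumes that iterations cycle through the 8 sections in order (iteration 9 returns to the first section), which the description implies but does not state.

```python
import random

MAX_ENTRIES = 512   # upper bound on entries, from the description
SECTIONS = 8        # the full range is split into 8 sections
ITERATIONS = 320    # iterations per thread count


def entries_for_iteration(i, rng):
    """Pick the number of entries for 0-based iteration i.

    Assumption: sections are visited round-robin, so iteration i uses
    section i % SECTIONS.
    """
    width = MAX_ENTRIES // SECTIONS          # 64 entries per section
    low = (i % SECTIONS) * width             # 0, 64, 128, ...
    high = low + width                       # 64, 128, 192, ...
    # randint is inclusive at both ends; clamp to the stated minimum of 1.
    return max(1, rng.randint(low, high))


def build_schedule(seed=0):
    """Return the list of entry counts for one thread count."""
    rng = random.Random(seed)
    return [entries_for_iteration(i, rng) for i in range(ITERATIONS)]


if __name__ == "__main__":
    schedule = build_schedule()
    print(schedule[:8])
```

Under this assumption each section is sampled 40 times per thread count, which is what gives the even spread that the text goes on to criticise.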
Ivybridge 4 threads
                     3.13.0-rc7             3.13.0-rc7
                        vanilla            altshift-v3
Mean   1        10.50 (  0.00%)        10.50 (  0.03%)
Mean   2        17.59 (  0.00%)        17.18 (  2.34%)
Mean   3        22.98 (  0.00%)        21.74 (  5.41%)
Mean   5        47.13 (  0.00%)        46.23 (  1.92%)
Mean   8        43.30 (  0.00%)        42.56 (  1.72%)

Ivybridge 8 threads
                     3.13.0-rc7             3.13.0-rc7
                        vanilla            altshift-v3
Mean   1         9.45 (  0.00%)         9.36 (  0.93%)
Mean   2         9.37 (  0.00%)         9.70 ( -3.54%)
Mean   3         9.36 (  0.00%)         9.29 (  0.70%)
Mean   5        14.49 (  0.00%)        15.04 ( -3.75%)
Mean   8        41.08 (  0.00%)        38.73 (  5.71%)
Mean  13        32.04 (  0.00%)        31.24 (  2.49%)
Mean  16        40.05 (  0.00%)        39.04 (  2.51%)

For both CPUs, average access time is reduced, which is good as this is the benchmark that was used to tune the shift values in the first place, albeit it is now known *how* the benchmark was used.

The scheduler benchmarks were somewhat inconclusive. They showed gains and losses and make me reconsider how stable those benchmarks really are or if something else might be interfering with the test results recently.

Network benchmarks were inconclusive. Almost all results were flat except for netperf-udp tests on the 4 thread machine. These results were unstable and showed large variations between reboots. It is unknown if this is a recent problem but I've noticed before that netperf-udp results tend to vary.

Based on these results, changing the default for Ivybridge seems like a logical choice.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Davidlohr Bueso <davidlohr@hp.com>
Reviewed-by: Alex Shi <alex.shi@linaro.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-cqnadffh1tiqrshthRj3Esge@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

2014-01-25 09:10:43 +01:00
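For readers unfamiliar with the "balance point" that the commit message above refers to, here is a simplified model of how the shift appears to be used by the x86 flush path of that era: the TLB size is shifted right by tlb_flushall_shift to get a threshold, and a flush covering more pages than the threshold falls back to a full flush. The function name, the 512-entry TLB size and the shift values 1 and 2 are illustrative assumptions, not figures taken from the commit message.

```python
def flush_decision(nr_base_pages, tlb_entries, flushall_shift, total_vm_pages=None):
    """Return "full" or "range" for a flush covering nr_base_pages pages.

    Simplified model (assumption): a shift of -1 means range flushing is
    disabled; otherwise the threshold is tlb_entries >> flushall_shift,
    optionally capped by the size of the address space in pages.
    """
    if flushall_shift == -1:
        return "full"
    act_entries = tlb_entries >> flushall_shift
    if total_vm_pages is not None:
        act_entries = min(act_entries, total_vm_pages)
    # Above the threshold, one full flush is assumed cheaper than
    # invalidating the pages one at a time.
    return "full" if nr_base_pages > act_entries else "range"


if __name__ == "__main__":
    # Hypothetical 512-entry TLB, flushing 200 pages.
    for shift in (1, 2):
        print(shift, flush_decision(200, 512, shift))
```

In this model a larger shift lowers the threshold, so more flushes become full flushes; with the hypothetical values above, a 200-page flush is a range flush at shift 1 and a full flush at shift 2.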
..
mcheck	ACPI, APEI, CPER: Add UEFI 2.4 support for memory error	2013-10-23 10:10:20 -07:00
mtrr	mm, x86: Account for TLB flushes only when debugging	2014-01-25 09:10:41 +01:00
.gitignore
amd.c	x86, cpu, amd: Add workaround for family 16h, erratum 793	2014-01-14 16:39:07 -08:00
bugs_64.c
bugs.c	x86: Get rid of ->hard_math and all the FPU asm fu	2013-06-06 14:32:04 -07:00
centaur.c	x86/cpu: Track legacy CPU model data only on 32-bit kernels	2013-10-26 13:34:39 +02:00
common.c	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2013-11-12 10:46:43 +09:00
cpu.h	x86/cpu: Track legacy CPU model data only on 32-bit kernels	2013-10-26 13:34:39 +02:00
cyrix.c	x86: delete __cpuinit usage from all x86 files	2013-07-14 19:36:56 -04:00
hypervisor.c	x86: Correctly detect hypervisor	2013-08-05 06:35:33 -07:00
intel_cacheinfo.c	treewide: Fix common typo in "identify"	2013-10-14 15:31:06 +02:00
intel.c	x86: mm: change tlb_flushall_shift for IvyBridge	2014-01-25 09:10:43 +01:00
Makefile	perf/x86/amd: AMD IOMMU Performance Counter PERF uncore PMU implementation	2013-06-19 13:04:53 +02:00
match.c
mkcapflags.sh	mkcapflags.pl: convert to mkcapflags.sh	2013-04-29 15:54:27 -07:00
mshyperv.c	x86, hyperv: Move a variable to avoid an unused variable warning	2013-11-06 10:02:05 -08:00
perf_event_amd_ibs.c	x86: delete __cpuinit usage from all x86 files	2013-07-14 19:36:56 -04:00
perf_event_amd_iommu.c	perf/x86/amd: Do not print an error when the device is not present	2013-07-05 08:27:15 +02:00
perf_event_amd_iommu.h	perf/x86/amd: AMD IOMMU Performance Counter PERF uncore PMU implementation	2013-06-19 13:04:53 +02:00
perf_event_amd_uncore.c	x86: delete __cpuinit usage from all x86 files	2013-07-14 19:36:56 -04:00
perf_event_amd.c	perf: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node()	2013-09-02 08:42:49 +02:00
perf_event_intel_ds.c	perf: Fix arch_perf_out_copy_user default	2013-11-06 12:34:25 +01:00
perf_event_intel_lbr.c	perf: Fix arch_perf_out_copy_user default	2013-11-06 12:34:25 +01:00
perf_event_intel_uncore.c	perf/x86/intel: Add Ivy Bridge-EP uncore IRP box support	2013-11-06 12:34:31 +01:00
perf_event_intel_uncore.h	perf/x86/intel/uncore: Enable EV_SEL_EXT bit for PCU	2013-08-16 17:55:50 +02:00
perf_event_intel.c	perf/x86: Suppress duplicated abort LBR records	2013-10-04 10:06:16 +02:00
perf_event_knc.c
perf_event_p4.c	perf/x86/intel/P4: Robistify P4 PMU types	2013-04-26 09:31:41 +02:00
perf_event_p6.c
perf_event.c	perf: Fix arch_perf_out_copy_user default	2013-11-06 12:34:25 +01:00
perf_event.h	perf/x86: Fix constraint table end marker bug	2013-12-05 10:02:30 +01:00
perfctr-watchdog.c
powerflags.c	update AMD powerflags comments	2013-05-28 12:02:10 +02:00
proc.c	x86/cpu: Always print SMP information in /proc/cpuinfo	2013-11-06 08:13:56 +01:00
rdrand.c	x86: delete __cpuinit usage from all x86 files	2013-07-14 19:36:56 -04:00
scattered.c	treewide: Fix common typo in "identify"	2013-10-14 15:31:06 +02:00
topology.c	x86: delete __cpuinit usage from all x86 files	2013-07-14 19:36:56 -04:00
transmeta.c	x86: delete __cpuinit usage from all x86 files	2013-07-14 19:36:56 -04:00
umc.c	x86/cpu: Track legacy CPU model data only on 32-bit kernels	2013-10-26 13:34:39 +02:00
vmware.c	x86: Correctly detect hypervisor	2013-08-05 06:35:33 -07:00