linux/kernel/sched
Mel Gorman 32e839dda3 sched/fair: Use a recently used CPU as an idle candidate and the basis for SIS
The select_idle_sibling() (SIS) rewrite in commit:

  10e2f1acd0 ("sched/core: Rewrite and improve select_idle_siblings()")

... replaced a domain iteration with a search that broadly speaking
does a wrapped walk of the scheduler domain sharing a last-level-cache.

While this had a number of improvements, one consequence is that two tasks
that share a waker/wakee relationship push each other around a socket. Even
though two tasks may be active, all cores are evenly used. This is great from
a search perspective and spreads a load across individual cores, but it has
adverse consequences for cpufreq. As each CPU has relatively low utilisation,
cpufreq may decide the utilisation is too low to used a higher P-state and
overall computation throughput suffers.

While individual cpufreq and cpuidle drivers may compensate by artifically
boosting P-state (at c0) or avoiding lower C-states (during idle), it does
not help if hardware-based cpufreq (e.g. HWP) is used.

This patch tracks a recently used CPU based on what CPU a task was running
on when it last was a waker a CPU it was recently using when a task is a
wakee. During SIS, the recently used CPU is used as a target if it's still
allowed by the task and is idle.

The benefit may be non-obvious so consider an example of two tasks
communicating back and forth. Task A may be an application doing IO where
task B is a kworker or kthread like journald. Task A may issue IO, wake
B and B wakes up A on completion.  With the existing scheme this may look
like the following (potentially different IDs if SMT is in use but similar
principal applies).

 A (cpu 0)	wake	B (wakes on cpu 1)
 B (cpu 1)	wake	A (wakes on cpu 2)
 A (cpu 2)	wake	B (wakes on cpu 3)
 etc.

A careful reader may wonder why CPU 0 was not idle when B wakes A the
first time and it's simply due to the fact that A can be rescheduled to
another CPU and the pattern is that prev == target when B tries to wakeup A
and the information about CPU 0 has been lost.

With this patch, the pattern is more likely to be:

 A (cpu 0)	wake	B (wakes on cpu 1)
 B (cpu 1)	wake	A (wakes on cpu 0)
 A (cpu 0)	wake	B (wakes on cpu 1)
 etc

i.e. two communicating casts are more likely to use just two cores instead
of all available cores sharing a LLC.

The most dramatic speedup was noticed on dbench using the XFS filesystem on
UMA as clients interact heavily with workqueues in that configuration. Note
that a similar speedup is not observed on ext4 as the wakeup pattern
is different:

                          4.15.0-rc9             4.15.0-rc9
                           waprev-v1        biasancestor-v1
 Hmean      1      287.54 (   0.00%)      817.01 ( 184.14%)
 Hmean      2     1268.12 (   0.00%)     1781.24 (  40.46%)
 Hmean      4     1739.68 (   0.00%)     1594.47 (  -8.35%)
 Hmean      8     2464.12 (   0.00%)     2479.56 (   0.63%)
 Hmean     64     1455.57 (   0.00%)     1434.68 (  -1.44%)

The results can be less dramatic on NUMA where automatic balancing interferes
with the test. It's also known that network benchmarks running on localhost
also benefit quite a bit from this patch (roughly 10% on netperf RR for UDP
and TCP depending on the machine). Hackbench also seens small improvements
(6-11% depending on machine and thread count). The facebook schbench was also
tested but in most cases showed little or no different to wakeup latencies.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180130104555.4125-5-mgorman@techsingularity.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-06 10:20:37 +01:00
..
autogroup.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
autogroup.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
clock.c sched/clock, sched/cputime: Use lockdep to assert IRQs are disabled/enabled 2017-11-08 11:13:53 +01:00
completion.c locking/lockdep: Remove cross-release leftovers 2018-01-08 17:30:45 +01:00
core.c sched/fair: Use a recently used CPU as an idle candidate and the basis for SIS 2018-02-06 10:20:37 +01:00
cpuacct.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
cpudeadline.c sched/deadline: Change return value of cpudl_find() 2017-08-10 12:18:17 +02:00
cpudeadline.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
cpufreq_schedutil.c sched/cpufreq: Always consider all CPUs when deciding next freq 2018-01-10 12:53:34 +01:00
cpufreq.c cpufreq / sched: Pass flags to cpufreq_update_util() 2016-08-16 22:14:55 +02:00
cpupri.c sched/cpupri: Don't re-initialize 'struct cpupri' 2017-08-10 12:18:14 +02:00
cpupri.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
cputime.c Merge branch 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2017-11-15 14:29:44 -08:00
deadline.c sched/deadline: Make bandwidth enforcement scale-invariant 2018-01-10 12:53:35 +01:00
debug.c sched/fair: Propagate an effective runnable_load_avg 2017-09-29 19:35:15 +02:00
fair.c sched/fair: Use a recently used CPU as an idle candidate and the basis for SIS 2018-02-06 10:20:37 +01:00
features.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
idle_task.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
idle.c sched/idle: Micro-optimize the idle loop 2017-10-26 08:31:29 +02:00
isolation.c sched/isolation: Add basic isolcpus flags 2017-10-27 09:55:31 +02:00
loadavg.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
Makefile Merge branch 'linus' into sched/core, to pick up fixes 2017-11-08 10:17:15 +01:00
membarrier.c membarrier: Provide core serializing command, *_SYNC_CORE 2018-02-05 21:35:03 +01:00
rt.c sched/rt: Make update_curr_rt() more accurate 2018-02-06 10:20:34 +01:00
sched-pelt.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
sched.h sched/rt: Up the root domain ref count when passing it around via IPIs 2018-02-06 10:20:33 +01:00
stats.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
stats.h sched/core: Optimize update_stats_*() 2018-02-06 10:20:32 +01:00
stop_task.c Merge branch 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2017-11-15 14:29:44 -08:00
swait.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
topology.c sched/rt: Up the root domain ref count when passing it around via IPIs 2018-02-06 10:20:33 +01:00
wait_bit.c Pass mode to wait_on_atomic_t() action funcs and provide default actions 2017-11-13 15:38:16 +00:00
wait.c sched/wait: Fix add_wait_queue() behavioral change 2017-12-06 19:30:34 +01:00