linux/kernel
Tim Chen 333c5ae994 idle governor: Avoid lock acquisition to read pm_qos before entering idle
Thanks to the reviews and comments by Rafael, James, Mark and Andi.
Here's version 2 of the patch incorporating your comments and also some
update to my previous patch comments.

I noticed that before entering an idle state, the menu idle governor
looks up the current pm_qos target value according to the list of qos
requests received.  This lookup currently requires acquiring a lock to
access the list of qos requests and find the qos target value, which
slows down entry into the idle state because multiple cpus contend for
this list.  The contention is severe when there are a lot of cpus
waking and going idle.  For example, for a simple workload that has 32
pairs of processes ping-ponging messages to each other, where 64 cpu
cores are active in the test system, I see the following profile with
37.82% of cpu cycles spent in contention on pm_qos_lock:

-     37.82%          swapper  [kernel.kallsyms]          [k] _raw_spin_lock_irqsave
   - _raw_spin_lock_irqsave
      - 95.65% pm_qos_request
           menu_select
           cpuidle_idle_call
         - cpu_idle
              99.98% start_secondary

A better approach is to cache the updated pm_qos target value so that
reading it does not require lock acquisition, as in the patch below.
With this patch the contention for pm_qos_lock is removed and I saw a
2.2X increase in throughput for my message passing workload.
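
The caching idea described above can be illustrated with a small user-space
sketch. This is not the kernel patch itself: the names qos_object,
qos_add_request and qos_read_target are hypothetical, a pthread mutex stands
in for pm_qos_lock, and the target is assumed to be the minimum of all
requests. Writers recompute the target under the lock and publish it; readers
only load the cached value.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_REQUESTS 8            /* hypothetical fixed capacity */
#define QOS_DEFAULT  INT32_MAX    /* assumed "no constraint" value */

/* Hypothetical stand-in for a pm_qos object; not the kernel's struct. */
struct qos_object {
    pthread_mutex_t lock;             /* protects requests[] and nr_requests */
    int32_t requests[MAX_REQUESTS];
    int nr_requests;
    _Atomic int32_t target_value;     /* cached target, readable without lock */
};

static void qos_init(struct qos_object *o)
{
    pthread_mutex_init(&o->lock, NULL);
    o->nr_requests = 0;
    atomic_init(&o->target_value, QOS_DEFAULT);
}

/* Caller must hold o->lock. Returns the smallest request, or the default. */
static int32_t qos_recompute_locked(const struct qos_object *o)
{
    int32_t min = QOS_DEFAULT;

    for (int i = 0; i < o->nr_requests; i++)
        if (o->requests[i] < min)
            min = o->requests[i];
    return min;
}

/* Slow path: take the lock, update the list, publish the new target. */
static int qos_add_request(struct qos_object *o, int32_t value)
{
    int ret = -1;                     /* -1 means the request list is full */

    pthread_mutex_lock(&o->lock);
    if (o->nr_requests < MAX_REQUESTS) {
        o->requests[o->nr_requests++] = value;
        atomic_store_explicit(&o->target_value, qos_recompute_locked(o),
                              memory_order_release);
        ret = 0;
    }
    pthread_mutex_unlock(&o->lock);
    return ret;
}

/* Fast path (the idle-entry side in this sketch): no lock is taken. */
static int32_t qos_read_target(struct qos_object *o)
{
    return atomic_load_explicit(&o->target_value, memory_order_acquire);
}
```

In this sketch the cost of walking the request list is paid only when a
request changes, which is assumed to be rare compared with idle entry.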

cc: stable@kernel.org
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: James Bottomley <James.Bottomley@suse.de>
Acked-by: mark gross <markgross@thegnar.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-05-29 00:50:59 -04:00
..
debug Merge branch 'master' into for-next 2010-12-22 18:57:02 +01:00
gcov
irq genirq: Disable the SHIRQ_DEBUG call in request_threaded_irq for now 2011-02-19 12:11:13 +01:00
power Merge branch 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2011-02-18 12:36:06 -08:00
time clockevents: Prevent oneshot mode when broadcast device is periodic 2011-02-26 09:45:28 +01:00
trace blktrace: Remove blk_fill_rwbs_rq. 2011-03-03 10:53:20 -05:00
.gitignore
acct.c
async.c
audit_tree.c
audit_watch.c
audit.c
audit.h
auditfilter.c
auditsc.c
backtracetest.c
bounds.c
capability.c security: add cred argument to security_capable() 2011-02-11 17:41:58 +11:00
cgroup_freezer.c
cgroup.c Merge branch 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin 2011-01-14 09:08:29 -08:00
compat.c
configs.c
cpu.c Merge branches 'x86-alternatives-for-linus', 'x86-fpu-for-linus', 'x86-hwmon-for-linus', 'x86-paravirt-for-linus', 'core-locking-for-linus' and 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-01-06 11:11:50 -08:00
cpuset.c cpuset: add a missing unlock in cpuset_write_resmask() 2011-03-04 17:53:38 -08:00
cred.c CRED: Fix memory and refcount leaks upon security_prepare_creds() failure 2011-02-07 14:04:00 -08:00
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-01-11 11:02:13 -08:00
extable.c
fork.c thp: khugepaged 2011-01-13 17:32:43 -08:00
freezer.c Freezer: Fix a race during freezing of TASK_STOPPED tasks 2010-12-24 15:02:40 +01:00
futex_compat.c
futex.c Merge branches 'core-fixes-for-linus', 'x86-fixes-for-linus', 'timers-fixes-for-linus' and 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-01-15 12:45:00 -08:00
groups.c
hrtimer.c Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-01-13 10:05:56 -08:00
hung_task.c
hw_breakpoint.c perf: Dynamic pmu types 2010-12-16 11:36:43 +01:00
irq_work.c irq_work: Use per cpu atomics instead of regular atomics 2010-12-18 15:54:48 +01:00
itimer.c
jump_label.c
kallsyms.c Revert "kernel: make /proc/kallsyms mode 400 to reduce ease of attacking" 2010-11-19 11:54:40 -08:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kexec.c
kfifo.c
kmod.c
kprobes.c Merge branch 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2011-01-07 17:02:58 -08:00
ksysfs.c
kthread.c sched: Constify function scope static struct sched_param usage 2011-01-07 15:55:45 +01:00
latencytop.c fs/proc/base.c, kernel/latencytop.c: convert sprintf_symbol() to %ps 2011-01-13 08:03:16 -08:00
lockdep_internals.h
lockdep_proc.c
lockdep_states.h
lockdep.c lockdep: Move early boot local IRQ enable/disable status to init/main.c 2011-01-20 13:32:33 +01:00
Makefile kernel: clean up USE_GENERIC_SMP_HELPERS 2011-01-13 08:03:08 -08:00
module.c tracepoints: Fix section alignment using pointer array 2011-02-03 09:28:46 -05:00
mutex-debug.c
mutex-debug.h
mutex.c mutexes, sched: Introduce arch_mutex_cpu_relax() 2010-11-26 15:05:34 +01:00
mutex.h
notifier.c
ns_cgroup.c
nsproxy.c
padata.c
panic.c ACPI, APEI, Generic Hardware Error Source POLL/IRQ/NMI notification type support 2011-01-12 03:06:19 -05:00
params.c module: show version information for built-in modules in sysfs 2011-01-24 14:32:51 +10:30
perf_event.c perf: Fix throttle logic 2011-02-16 13:25:29 +01:00
pid_namespace.c
pid.c
pm_qos_params.c idle governor: Avoid lock acquisition to read pm_qos before entering idle 2011-05-29 00:50:59 -04:00
posix-cpu-timers.c
posix-timers.c
printk.c cap_syslog: accept CAP_SYS_ADMIN for now 2011-02-10 17:53:55 -08:00
profile.c
ptrace.c Mark ptrace_{traceme,attach,detach} static 2011-03-04 09:23:30 -08:00
range.c kernel/range.c: fix clean_sort_range() for the case of full array 2010-11-12 07:55:31 -08:00
rcupdate.c
rcutiny_plugin.h rcu: Distinguish between boosting and boosted 2010-11-29 22:01:56 -08:00
rcutiny.c rcu: avoid pointless blocked-task warnings 2011-01-14 04:58:08 -08:00
rcutorture.c
rcutree_plugin.h rcu: increase synchronize_sched_expedited() batching 2010-12-17 12:34:08 -08:00
rcutree_trace.c rcu,cleanup: simplify the code when cpu is dying 2010-11-29 22:01:58 -08:00
rcutree.c Merge branch 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2011-01-07 17:02:58 -08:00
rcutree.h rcu: limit rcu_node leaf-level fanout 2010-12-17 12:34:20 -08:00
relay.c
res_counter.c
resource.c resources: add arch hook for preventing allocation in reserved areas 2010-12-17 10:01:09 -08:00
rtmutex_common.h
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c
rtmutex.h
rwsem.c
sched_autogroup.c sched, autogroup: Fix CONFIG_RT_GROUP_SCHED sched_setscheduler() failure 2011-01-18 15:09:42 +01:00
sched_autogroup.h sched, autogroup: Fix CONFIG_RT_GROUP_SCHED sched_setscheduler() failure 2011-01-18 15:09:42 +01:00
sched_clock.c sched: Add some clock info to sched_debug 2010-11-23 10:29:08 +01:00
sched_cpupri.c
sched_cpupri.h
sched_debug.c sched: Replace rq->bkl_count with rq->rq_sched_info.bkl_count 2011-01-18 15:09:43 +01:00
sched_fair.c sched: Use rq->clock_task instead of rq->clock for correctly maintaining load averages 2011-01-26 12:31:03 +01:00
sched_features.h sched: Rewrite tg_shares_up) 2010-11-18 13:27:46 +01:00
sched_idletask.c
sched_rt.c sched: Fix sched rt group scheduling when hierachy is enabled 2011-03-04 11:03:18 +01:00
sched_stats.h
sched_stoptask.c sched: Fix cross-sched-class wakeup preemption 2010-11-11 14:37:23 +01:00
sched.c SUNRPC: Close a race in __rpc_wait_for_completion_task() 2011-03-10 15:04:52 -05:00
seccomp.c
semaphore.c
signal.c
smp.c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-01-20 18:30:37 -08:00
softirq.c kernel: clean up USE_GENERIC_SMP_HELPERS 2011-01-13 08:03:08 -08:00
spinlock.c
srcu.c rcu: demote SRCU_SYNCHRONIZE_DELAY from kernel-parameter status 2011-01-14 04:56:49 -08:00
stacktrace.c
stop_machine.c
sys_ni.c
sys.c Fix prlimit64 for suid/sgid processes 2011-01-31 13:01:27 +10:00
sysctl_binary.c Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-01-13 10:05:56 -08:00
sysctl_check.c
sysctl.c unfuck proc_sysctl ->d_compare() 2011-03-08 02:22:27 -05:00
taskstats.c taskstats: use better ifdef for alignment 2011-01-13 08:03:19 -08:00
test_kprobes.c
time.c Kill off a bunch of warning: ‘inline’ is not at beginning of declaration 2010-11-28 23:08:04 +01:00
timeconst.pl
timer.c Revert "lockdep, timer: Fix del_timer_sync() annotation" 2011-02-08 16:18:39 +01:00
tracepoint.c tracepoints: Fix section alignment using pointer array 2011-02-03 09:28:46 -05:00
tsacct.c
uid16.c
up.c
user_namespace.c user_ns: improve the user_ns on-the-slab packaging 2011-01-13 08:03:18 -08:00
user-return-notifier.c
user.c fix freeing user_struct in user cache 2010-12-29 11:31:38 -08:00
utsname_sysctl.c
utsname.c
wait.c
watchdog.c watchdog, nmi: Lower the severity of error messages 2011-02-10 13:21:59 +01:00
workqueue_sched.h
workqueue.c workqueue: make sure MAYDAY_INITIAL_TIMEOUT is at least 2 jiffies long 2011-02-16 18:10:19 +01:00