linux/kernel/time
Thomas Gleixner 332962f2c8 clocksource: Reselect clocksource when watchdog validated high-res capability
Up to commit 5d33b883a (clocksource: Always verify highres capability)
we had no sanity check when selecting a clocksource that would prevent
a non-highres-capable clocksource from being used when the system had
already switched to highres/nohz mode.

The new sanity check works as Alex and Tim found out. It prevents the
TSC from being used. This happens because on x86 the boot process
looks like this:

 tsc_start_frequency_validation(TSC);
 clocksource_register(HPET);
 clocksource_done_booting();
	clocksource_select()
		Selects HPET which is valid for high-res

 switch_to_highres();

 clocksource_register(TSC);
	TSC is not selected, because it is not yet
	flagged as VALID_HIGH_RES

 clocksource_watchdog()
	Validates TSC for highres, but that does not make TSC
	the current clocksource.

Before the sanity check was added, we installed TSC unvalidated which
worked most of the time. If the TSC was really detected as unstable,
then the unstable logic removed it and installed HPET again.

The sanity check is correct and needed. So the watchdog needs to kick
off a reselection of the clocksource when it qualifies TSC as a valid
highres clocksource.

To solve this, we mark the clocksource which got the flag
CLOCK_SOURCE_VALID_FOR_HRES set by the watchdog with a new flag
CLOCK_SOURCE_RESELECT and trigger the watchdog thread. The watchdog
thread evaluates the flag and invokes clocksource_select() when set.
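
The flag handshake described above can be sketched as a small userspace
model. This is not the kernel code: every model_* name, the flag bit
values and the ratings are assumptions made for illustration, and locking
is left out entirely. It only shows the idea that the validating side sets
a reselect flag and the thread side consumes it and reports that a
reselection is wanted.

```c
#include <stddef.h>

/* Flag names echo the commit message; the bit values are arbitrary here. */
#define MODEL_VALID_FOR_HRES 0x1u
#define MODEL_RESELECT       0x2u

/* Hypothetical, stripped-down stand-in for a clocksource. */
struct model_cs {
    const char *name;
    int rating;             /* higher is preferred (illustrative values) */
    unsigned int flags;
};

/*
 * Pick the best-rated source. In highres mode only sources already
 * validated for highres qualify, which is the sanity check in question.
 */
static const struct model_cs *model_select(const struct model_cs *list,
                                           size_t n, int highres)
{
    const struct model_cs *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (highres && !(list[i].flags & MODEL_VALID_FOR_HRES))
            continue;
        if (!best || list[i].rating > best->rating)
            best = &list[i];
    }
    return best;
}

/* Watchdog side: validate a source and request a reselection. */
static void model_watchdog_validate(struct model_cs *cs)
{
    cs->flags |= MODEL_VALID_FOR_HRES | MODEL_RESELECT;
}

/* Thread side: consume RESELECT flags, report whether any was set. */
static int model_kthread_needs_select(struct model_cs *list, size_t n)
{
    int select = 0;

    for (size_t i = 0; i < n; i++) {
        if (list[i].flags & MODEL_RESELECT) {
            list[i].flags &= ~MODEL_RESELECT;
            select = 1;
        }
    }
    return select;
}
```

With an already validated HPET entry and an unvalidated, higher-rated TSC
entry, model_select() in highres mode returns HPET. After
model_watchdog_validate() on the TSC entry, model_kthread_needs_select()
reports that a reselection is wanted, and the next model_select() returns
TSC.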

The clocksource_done_booting() code is about to install the first real
clocksource anyway, so it should not go through clocksource_select()
and tick_oneshot_notify() pointlessly. To avoid that, split out the
clocksource_watchdog_kthread() list walk code and invoke the
select/notify only when called from clocksource_watchdog_kthread().

So clocksource_done_booting() can utilize the same splitout code
without the select/notify invocation and the clocksource_mutex
unlock/relock dance.
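
The split can be illustrated with another userspace sketch. Again this is
a model under stated assumptions, not the kernel implementation: the
sketch_* names, the flag values and the call counter are hypothetical, and
the mutex and spinlock handling is omitted. The shared list walk only
reports whether a reselection is wanted; the thread entry acts on that
result, while the boot path ignores it and performs its one selection
itself.

```c
#include <stddef.h>

/* Arbitrary bit values for this sketch. */
#define SKETCH_UNSTABLE 0x1u
#define SKETCH_RESELECT 0x2u

struct sketch_cs {
    const char *name;
    unsigned int flags;
    int watched;            /* 1 while the source is on the watchdog list */
};

/* Counts select/notify rounds so the effect of the split is observable. */
static int sketch_select_calls;

static void sketch_select_and_notify(void)
{
    sketch_select_calls++;
}

/*
 * Shared list walk: drop unstable sources from the watched set, consume
 * RESELECT flags, and return nonzero if a reselection is wanted.
 */
static int sketch_walk(struct sketch_cs *list, size_t n)
{
    int select = 0;

    for (size_t i = 0; i < n; i++) {
        if ((list[i].flags & SKETCH_UNSTABLE) && list[i].watched) {
            list[i].watched = 0;
            select = 1;
        }
        if (list[i].flags & SKETCH_RESELECT) {
            list[i].flags &= ~SKETCH_RESELECT;
            select = 1;
        }
    }
    return select;
}

/* Thread entry: select/notify only when the walk asked for it. */
static void sketch_watchdog_kthread(struct sketch_cs *list, size_t n)
{
    if (sketch_walk(list, n))
        sketch_select_and_notify();
}

/* Boot path: walk, then install the first real source exactly once. */
static void sketch_done_booting(struct sketch_cs *list, size_t n)
{
    sketch_walk(list, n);   /* result ignored: selection follows anyway */
    sketch_select_and_notify();
}
```

In this model the boot path always ends up with exactly one selection, no
matter which flags were pending, and a later thread run with no pending
flags performs none.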

Reported-and-tested-by: Alex Shi <alex.shi@intel.com>
Cc: Hans Peter Anvin <hpa@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Tested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1307042239150.11637@ionos.tec.linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-07-05 11:09:28 +02:00
alarmtimer.c alarmtimer: Export symbols of functions declared in linux/alarmtimer.h 2013-06-12 14:02:12 -07:00
clockevents.c clockevents: Implement unbind functionality 2013-05-16 11:09:18 +02:00
clocksource.c clocksource: Reselect clocksource when watchdog validated high-res capability 2013-07-05 11:09:28 +02:00
jiffies.c
Kconfig Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-05-15 14:05:17 -07:00
Makefile sched_clock: Make ARM's sched_clock generic for all architectures 2013-06-12 14:02:13 -07:00
ntp_internal.h
ntp.c ntp: Remove unused variable flags in __hardpps 2013-05-28 13:45:19 -07:00
posix-clock.c
sched_clock.c ARM: sched_clock: Load cycle count after epoch stabilizes 2013-06-17 15:56:11 -07:00
tick-broadcast.c Merge branch 'timers/posix-cpu-timers-for-tglx' of 2013-07-04 23:11:22 +02:00
tick-common.c tick: Sanitize broadcast control logic 2013-07-02 14:26:45 +02:00
tick-internal.h clockevents: Define CS_NAME_LEN unconditionally 2013-05-28 09:28:02 +02:00
tick-oneshot.c
tick-sched.c nohz: Fix notifier return val that enforce timekeeping 2013-05-31 11:33:10 +02:00
timeconv.c
timekeeping_debug.c power: Add option to log time spent in suspend 2013-05-29 12:57:34 -07:00
timekeeping_internal.h power: Add option to log time spent in suspend 2013-05-29 12:57:34 -07:00
timekeeping.c Merge branch 'timers/posix-cpu-timers-for-tglx' of 2013-07-04 23:11:22 +02:00
timer_list.c timer_list: Convert timer list to be a proper seq_file 2013-04-17 20:51:02 +02:00
timer_stats.c