linux/kernel
Huang Ying 89081d17f7 kexec jump: save/restore device state
This patch implements devices state save/restore before after kexec.

This patch together with features in kexec_jump patch can be used for
following:

- A simple hibernation implementation without ACPI support.  You can kexec a
  hibernating kernel, save the memory image of original system and shutdown
  the system.  When resuming, you restore the memory image of original system
  via ordinary kexec load then jump back.

- Kernel/system debug through making system snapshot.  You can make system
  snapshot, jump back, do some thing and make another system snapshot.

- Cooperative multi-kernel/system.  With kexec jump, you can switch between
  several kernels/systems quickly without boot process except the first time.
  This appears like swap a whole kernel/system out/in.

- A general method to call program in physical mode (paging turning
  off). This can be used to invoke BIOS code under Linux.

The following user-space tools can be used with kexec jump:

- kexec-tools needs to be patched to support kexec jump. The patches
  and the precompiled kexec can be download from the following URL:
       source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2
       patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2
       binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10

- makedumpfile with patches are used as memory image saving tool, it
  can exclude free pages from original kernel memory image file. The
  patches and the precompiled makedumpfile can be download from the
  following URL:
       source: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile-src_cvs_kh10.tar.bz2
       patches: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile-patches_cvs_kh10.tar.bz2
       binary: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile_cvs_kh10

- An initramfs image can be used as the root file system of kexeced
  kernel. An initramfs image built with "BuildRoot" can be downloaded
  from the following URL:
       initramfs image: http://khibernation.sourceforge.net/download/release_v10/initramfs/rootfs_cvs_kh10.gz
  All user space tools above are included in the initramfs image.

Usage example of simple hibernation:

1. Compile and install patched kernel with following options selected:

CONFIG_X86_32=y
CONFIG_RELOCATABLE=y
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_PM=y
CONFIG_HIBERNATION=y
CONFIG_KEXEC_JUMP=y

2. Build an initramfs image contains kexec-tool and makedumpfile, or
   download the pre-built initramfs image, called rootfs.gz in
   following text.

3. Prepare a partition to save memory image of original kernel, called
   hibernating partition in following text.

4. Boot kernel compiled in step 1 (kernel A).

5. In the kernel A, load kernel compiled in step 1 (kernel B) with
   /sbin/kexec. The shell command line can be as follow:

   /sbin/kexec --load-preserve-context /boot/bzImage --mem-min=0x100000
     --mem-max=0xffffff --initrd=rootfs.gz

6. Boot the kernel B with following shell command line:

   /sbin/kexec -e

7. The kernel B will boot as normal kexec. In kernel B the memory
   image of kernel A can be saved into hibernating partition as
   follow:

   jump_back_entry=`cat /proc/cmdline | tr ' ' '\n' | grep kexec_jump_back_entry | cut -d '='`
   echo $jump_back_entry > kexec_jump_back_entry
   cp /proc/vmcore dump.elf

   Then you can shutdown the machine as normal.

8. Boot kernel compiled in step 1 (kernel C). Use the rootfs.gz as
   root file system.

9. In kernel C, load the memory image of kernel A as follow:

   /sbin/kexec -l --args-none --entry=`cat kexec_jump_back_entry` dump.elf

10. Jump back to the kernel A as follow:

   /sbin/kexec -e

   Then, kernel A is resumed.

Implementation point:

To support jumping between two kernels, before jumping to (executing)
the new kernel and jumping back to the original kernel, the devices
are put into quiescent state, and the state of devices and CPU is
saved. After jumping back from kexeced kernel and jumping to the new
kernel, the state of devices and CPU are restored accordingly. The
devices/CPU state save/restore code of software suspend is called to
implement corresponding function.

Known issues:

- Because the segment number supported by sys_kexec_load is limited,
  hibernation image with many segments may not be load. This is
  planned to be eliminated by adding a new flag to sys_kexec_load to
  make a image can be loaded with multiple sys_kexec_load invoking.

Now, only the i386 architecture is supported.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:04 -07:00
..
irq kernel/irq/manage.c: replace a printk + WARN_ON() to a WARN() 2008-07-25 10:53:29 -07:00
power kexec jump: save/restore device state 2008-07-26 12:00:04 -07:00
time Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2008-07-24 12:55:01 -07:00
trace markers: fix sparse integer as NULL pointer warning 2008-07-25 10:53:45 -07:00
.gitignore
acct.c bsdacct: fix and add comments around acct_process() 2008-07-25 10:53:47 -07:00
audit_tree.c
audit.c
audit.h
auditfilter.c
auditsc.c x86_64 syscall audit fast-path 2008-07-23 17:47:32 -07:00
backtracetest.c backtrace: replace timer with tasklet + completions 2008-06-27 18:09:16 +02:00
bounds.c
capability.c security: filesystem capabilities refactor kernel code 2008-07-24 10:47:22 -07:00
cgroup_debug.c
cgroup.c cgroup_clone: use pid of newly created task for new cgroup 2008-07-25 10:53:37 -07:00
compat.c
configs.c
cpu.c workqueues: make get_online_cpus() useable for work->func() 2008-07-25 10:53:40 -07:00
cpuset.c cpuset: two minor code-cleanups 2008-07-25 10:53:38 -07:00
delayacct.c per-task-delay-accounting: update taskstats for memory reclaim delay 2008-07-25 10:53:47 -07:00
dma.c
exec_domain.c remove CONFIG_KMOD from core kernel code 2008-07-22 19:24:31 +10:00
exit.c task IO accounting: provide distinct tgid/tid I/O statistics 2008-07-25 10:53:47 -07:00
extable.c
fork.c task IO accounting: provide distinct tgid/tid I/O statistics 2008-07-25 10:53:47 -07:00
futex_compat.c
futex.c
hrtimer.c Merge branch 'generic-ipi' into generic-ipi-for-linus 2008-07-15 21:55:59 +02:00
itimer.c
kallsyms.c kallsyms: fix potential overflow in binary search 2008-07-25 10:53:27 -07:00
Kconfig.hz sched: fix hrtick & generic-ipi dependency 2008-07-23 11:18:28 +02:00
Kconfig.preempt
kexec.c kexec jump: save/restore device state 2008-07-26 12:00:04 -07:00
kfifo.c
kgdb.c
kmod.c call_usermodehelper(): increase reliability 2008-07-25 10:53:28 -07:00
kprobes.c kprobes: remove redundant config check 2008-07-25 10:53:30 -07:00
ksysfs.c
kthread.c kthread: reduce stack pressure in create_kthread and kthreadd 2008-07-18 18:46:58 +02:00
latencytop.c
lockdep_internals.h
lockdep_proc.c
lockdep.c Merge branch 'core/locking' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2008-07-14 14:55:13 -07:00
Makefile build kernel/profile.o only when requested 2008-07-25 10:53:27 -07:00
marker.c markers: use rcu_barrier_sched() and call_rcu_sched() 2008-07-25 10:53:45 -07:00
module.c modules: Take a shortcut for checking if an address is in a module 2008-07-22 19:24:28 +10:00
mutex-debug.c
mutex-debug.h
mutex.c
mutex.h
notifier.c
ns_cgroup.c cgroup_clone: use pid of newly created task for new cgroup 2008-07-25 10:53:37 -07:00
nsproxy.c cgroup_clone: use pid of newly created task for new cgroup 2008-07-25 10:53:37 -07:00
panic.c Add a WARN() macro; this is WARN_ON() + printk arguments 2008-07-25 10:53:29 -07:00
params.c
pid_namespace.c bsdacct: switch from global bsd_acct_struct instance to per-pidns one 2008-07-25 10:53:47 -07:00
pid.c pidns: remove now unused find_pid function. 2008-07-25 10:53:45 -07:00
pm_qos_params.c pm_qos_params: BKL pushdown 2008-07-02 15:06:24 -06:00
posix-cpu-timers.c
posix-timers.c posix timers: release_posix_timer: kill the bogus put_task_struct(->it_process); 2008-07-25 10:53:38 -07:00
printk.c printk ratelimiting rewrite 2008-07-25 10:53:29 -07:00
profile.c build kernel/profile.o only when requested 2008-07-25 10:53:27 -07:00
ptrace.c ptrace children revamp 2008-07-16 18:02:33 -07:00
rcuclassic.c Merge branch 'linus' into cpus4096 2008-07-16 00:29:07 +02:00
rcupdate.c Merge branch 'generic-ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2008-07-15 14:12:03 -07:00
rcupreempt_trace.c
rcupreempt.c Merge branch 'linus' into cpus4096 2008-07-16 00:29:07 +02:00
rcutorture.c
relay.c
res_counter.c cgroup files: convert res_counter_write() to be a cgroups write_string() handler 2008-07-25 10:53:36 -07:00
resource.c
rtmutex_common.h
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c sysdev: Pass the attribute to the low level sysdev show/store function 2008-07-21 21:55:02 -07:00
rtmutex.c
rtmutex.h
rwsem.c
sched_clock.c Merge branch 'sched/clock' into sched/devel 2008-07-14 12:19:13 +02:00
sched_cpupri.c
sched_cpupri.h
sched_debug.c
sched_fair.c Merge branch 'sched/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2008-07-23 19:36:53 -07:00
sched_features.h sched: bias effective_load() error towards failing wake_affine(). 2008-06-27 14:31:47 +02:00
sched_idletask.c
sched_rt.c Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2008-07-24 12:53:51 -07:00
sched_stats.h sched: fix accounting in task delay accounting & migration 2008-07-04 12:50:23 +02:00
sched.c accounting: account for user time when updating memory integrals 2008-07-25 10:53:46 -07:00
seccomp.c
semaphore.c mmiotrace broken in linux-next (8-bit writes only) 2008-07-01 10:14:06 +02:00
signal.c pidns: remove now unused kill_proc function 2008-07-25 10:53:45 -07:00
smp.c generic ipi function calls: wait on alloc failure fallback 2008-07-15 14:12:20 -07:00
softirq.c Merge branch 'linus' into timers/nohz 2008-07-18 19:53:16 +02:00
softlockup.c softlockup: print a module list on being stuck 2008-07-05 08:51:24 +02:00
spinlock.c
srcu.c
stacktrace.c stacktrace: fix modular build, export print_stack_trace and save_stack_trace 2008-06-30 09:20:55 +02:00
stop_machine.c cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptr 2008-07-18 22:02:57 +02:00
sys_ni.c signalfd: fix undefined reference to `compat_sys_signalfd4' when !CONFIG_SIGNALFD 2008-07-25 11:35:41 -07:00
sys.c kexec jump 2008-07-26 12:00:04 -07:00
sysctl_check.c sysctl: check for bogus modes 2008-07-25 10:53:45 -07:00
sysctl.c printk ratelimiting rewrite 2008-07-25 10:53:29 -07:00
taskstats.c taskstats: remove initialization of static per-cpu variable 2008-07-25 10:53:47 -07:00
test_kprobes.c
time.c
timeconst.pl
timer.c Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm 2008-07-14 16:06:58 -07:00
tsacct.c tsacct: fix bacct_add_tsk()'s use of do_div() 2008-07-25 10:53:47 -07:00
uid16.c
user_namespace.c
user.c
utsname_sysctl.c
utsname.c
wait.c
workqueue.c workqueues: do CPU_UP_CANCELED if CPU_UP_PREPARE fails 2008-07-25 10:53:41 -07:00