With this patch, kdump uses the firmware soft-reset NMI for two purposes:
1) Initiate the kdump (take a crash dump) by issuing a soft-reset.
2) Break a CPU out of a deadlock condition that is detected during kdump
processing.
When a soft-reset is initiated each CPU will enter
system_reset_exception() and set its corresponding bit in the global
bit-array cpus_in_sr then call die(). When die() finds the CPU's bit set
in cpu_in_sr crash_kexec() is called to initiate a crash dump. The first
CPU to enter crash_kexec() is called the "crashing CPU". All other CPUs
are "secondary CPUs". The secondary CPU's pass through to
crash_kexec_secondary() and sleep. The crashing CPU waits for all CPUs
to enter via soft-reset then boots the kdump kernel (see
crash_soft_reset_check())
When the system crashes due to a panic or exception, crash_kexec() is
called by panic() or die(). The crashing CPU sends an IPI to all other
CPUs to notify them of the pending shutdown. If a CPU is in a deadlock
or hung state with interrupts disabled, the IPI will not be delivered.
The result being, that the kdump kernel is not booted. This problem is
solved with the use of a firmware generated soft-reset. After the
crashing_cpu has issued the IPI, it waits for 10 sec for all CPUs to
enter crash_ipi_callback(). A CPU signifies its entry to
crash_ipi_callback() by setting its corresponding bit in the
cpus_in_crash bit array. After 10 sec, if one or more CPUs have not set
their bit in cpus_in_crash we assume that the CPU(s) is deadlocked. The
operator is then prompted to generate a soft-reset to break the
deadlock. Each CPU enters the soft reset handler as described above.
Two conditions must be handled at this point:
1) The system crashed because the operator generated a soft-reset. See
2) The system had crashed before the soft-reset was generated ( in the
case of a Panic or oops).
The first CPU to enter crash_kexec() uses the state of the kexec_lock to
determine this state. If kexec_lock is already held then condition 2 is
true and crash_kexec_secondary() is called, else; this CPU is flagged as
the crashing CPU, the kexec_lock is acquired and crash_kexec() proceeds
as described above.
Each additional CPUs responding to the soft-reset will pass through
crash_kexec() to kexec_secondary(). All secondary CPUs call
crash_ipi_callback() readying them self's for the shutdown. When ready
they clear their bit in cpus_in_sr. The crashing CPU waits in
kexec_secondary() until all other CPUs have cleared their bits in
cpus_in_sr. The kexec kernel boot is then started.
Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The class zero interrupt handling for spus was confusing alignment and
error interrupts, so swap them.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
spufs_base.c calls __add_pages, which depends on CONFIG_MEMORY_HOTPLUG.
Moved the selection of CONFIG_MEMORY_HOTPLUG from CONFIG_SPUFS_MMAP
to CONFIG_SPU_FS.
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
In the context save/restore code, the SPU MFC command queue purge
code has a bug:
static inline void wait_purge_complete(struct spu_state *csa, struct
spu *spu)
{
struct spu_priv2 __iomem *priv2 = spu->priv2;
/* Save, Step 28:
* Poll MFC_CNTL[Ps] until value '11' is
* read
* (purge complete).
*/
POLL_WHILE_FALSE(in_be64(&priv2->mfc_control_RW)
& MFC_CNTL_PURGE_DMA_COMPLETE);
}
This will exit as soon as _one_ of the 2 bits that compose
MFC_CNTL_PURGE_DMA_COMPLETE is set, and one of them happens to be
"purge in progress"... which means that we'll happily continue
restoring the MFC while it's being purged at the same time.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This fixes a bug where we don't properly map SPE MMIO space as guarded,
causing various test cases to fail, probably due to write combining and other
niceties caused by the lack of the G bit.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Now that we have the udbg callbacks we can enable XMON by default.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Enable the RTAS udbg console on IBM Cell Blade, this allows xmon
to work.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add udbg hooks for the RTAS console, based on the RTAS put-term-char
and get-term-char calls. Along with my previous patches, this should
enable debugging as soon as early_init_dt_scan_rtas() is called.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Althought RTAS is instantiated when we enter the kernel, we can't actually
call into it until we know its entry point address. Currently we grab that
in rtas_initialize(), however that's quite late in the boot sequence.
To enable rtas_call() earlier, we can grab the RTAS entry etc. values while
we're scanning the flattened device tree. There's existing code to retrieve
the values from /chosen, however we don't store them there anymore, so remove
that code.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Move RTAS exports next to their declarations.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently it's unsafe to call rtas_call() prior to rtas_initialize(). This
is because the rtas.entry value hasn't been setup and so we don't know
where to enter, but we just try anyway.
We can't do anything intelligent without rtas.entry, so if it's not set, just
return. Code that calls rtas_call() early needs to be aware that the call
might fail.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
There's no need to set the boot cpu paca in asm, so do it in C so us
mere mortals can understand it.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
There's no reason kexec_setup() needs to be called explicitly from
setup_system(), it can just be a regular initcall.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
With the ppc_md htab pointers setup earlier, we can use ppc_md.hpte_insert
in htab_bolt_mapping(), rather than deciding which version to call by hand.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Initialise the ppc_md htab callbacks earlier, in the probe routines. This
allows us to call htab_finish_init() from htab_initialize(), and makes it
private to hash_utils_64.c. Move htab_finish_init() and make_bl() above
htab_initialize() to avoid forward declarations.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
If DEBUG is turned on in prom.c, export the flat device tree via debugfs.
This has been handy on several occasions.
To look at it:
# mount -t debugfs none /sys/kernel/debug
# od -a /sys/kernel/debug/powerpc/flat-device-tree
and/or
# dtc -fI dtb /sys/kernel/debug/powerpc/flat-device-tree -O dts
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
None of this seems to be necessary, so let's see if can remove it and not
break anything. Booted on iSeries & pSeries here.
NB. we don't remove the hvReleaseData, we just move it down so that the file
reads more clearly.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
During kdump boot, noticed some machines checkstop on dma protection
fault for ongoing DMA left in the first kernel. Instead of initializing
TCE entries in iommu_init() for the kdump boot, this patch fixes this
issue by walking through the each TCE table and checks whether the
entries are in use by the first kernel. If so, reserve those entries by
setting the corresponding bit in tbl->it_map such that these entries
will not be available for the kdump boot.
However it could be possible that all TCE entries might be used up due
to the driver bug that does continuous mapping. My observation is around
1700 TCE entries are used on some systems (Ex: P4) at some point of
time during kdump boot and saving dump (either write into the disk or
sending to remote machine). Hence, this patch will make sure that
minimum of 2048 entries will be available such that kdump boot could be
successful in some cases.
Signed-off-by: Haren Myneni <haren@us.ibm.com>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The following patch avoids accessing Hypervisor privilege HID
registers when running on a Hypervisor (MSR[HV]=0).
Signed-off-by: Jimi Xenidis <jimix@watson.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
typo fixes
Clean up 'inline is not at beginning' warnings for usb storage
Storage class should be first
i386: Trivial typo fixes
ixj: make ixj_set_tone_off() static
spelling fixes
fix paniced->panicked typos
Spelling fixes for Documentation/atomic_ops.txt
move acknowledgment for Mark Adler to CREDITS
remove the bouncing email address of David Campbell
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild: (40 commits)
kbuild: trivial fixes in Makefile
kbuild: adding symbols in Kconfig and defconfig to TAGS
kbuild: replace abort() with exit(1)
kbuild: support for %.symtypes files
kbuild: fix silentoldconfig recursion
kbuild: add option for stripping modules while installing them
kbuild: kill some false positives from modpost
kbuild: export-symbol usage report generator
kbuild: fix make -rR breakage
kbuild: append -dirty for updated but uncommited changes
kbuild: append git revision for all untagged commits
kbuild: fix module.symvers parsing in modpost
kbuild: ignore make's built-in rules & variables
kbuild: bugfix with initramfs
kbuild: modpost build fix
kbuild: check license compatibility when building modules
kbuild: export-type enhancement to modpost.c
kbuild: add dependency on kernel.release to the package targets
kbuild: `make kernelrelease' speedup
kconfig: KCONFIG_OVERWRITECONFIG
...
Change to using a configurable RAM setup in startup code. This cleans up
the whole RAM base/sizing issue, and removes a lot of board specific code.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the fixed RAM configurations for each board type from the
linker script. Replace with simple defines usng the flexible RAM
configuration options. This cleans out of lot of board specific
munging of addresses.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Reworked the way RAM regions are defined. Instead of coding all the
variations for each board type we now just configure RAM base and size
in the usual Kconfig setup. This much simplifies the code, and makes it
a lot more flexible when setting up new boards or board varients.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Clean out unused variable definitions from 68328 start up code.
Also use a more appropriate start address for the case of relocating
the kernel code to RAM (from ROM/flash).
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
__ramvec has been removed from the linker script. The vector base
address is defined as a configurable option, use that. Remove its
use from the 68328/pilot startup code.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Applies to git & 2.6.17-rc6 after CONFIG_DEBUG_STACKOVERFLOW patch
uses same stack-zeroing mechanism as on i386 to discover maximum stack
excursions.
Signed-off-by: Eric Sandeen <sandeen@sgi.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Take two, now without spurious whitespace :( Applies to git & 2.6.17-rc6
CONFIG_DEBUG_STACKOVERFLOW existed for x86_64 in 2.4, but seems to have gone AWOL in 2.6.
I've pretty much just copied this over from the 2.4 code, with
appropriate tweaks for the 2.6 kernel, plus a bugfix. I'd personally
rather see it printed out the way other arches do it, i.e.
bytes-remaining-until-overflow, rather than having to do the subtraction
yourself. Also, only 128 bytes remaining seems awfully late to issue a
warning. But I'll start here :)
Signed-off-by: Eric Sandeen <sandeen@sgi.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Intel now has support for Architectural Performance Monitoring Counters
( Refer to IA-32 Intel Architecture Software Developer's Manual
http://www.intel.com/design/pentium4/manuals/253669.htm ). This
feature is present starting from Intel Core Duo and Intel Core Solo processors.
What this means is, the performance monitoring counters and some performance
monitoring events are now defined in an architectural way (using cpuid).
And there will be no need to check for family/model etc for these architectural
events.
Below is the patch to use this performance counters in nmi watchdog driver.
Patch handles both i386 and x86-64 kernels.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On some i386/x86_64 systems, sending an NMI IPI as a broadcast will
reset the system. This seems to be a BIOS bug which affects machines
where one or more cpus are not under OS control. It occurs on HT
systems with a version of the OS that is not compiled without HT
support. It also occurs when a system is booted with max_cpus=n where
2 <= n < cpus known to the BIOS. The fix is to always send NMI IPI as
a mask instead of as a broadcast.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Appended patch fixes the "APIC error on CPUX: 00(40)" observed during bootup.
From SDM Vol-3A "Valid Interrupt Vectors" section:
"When an illegal vector value (0-15) is written to an LVT entry
and the delivery mode is Fixed, the APIC may signal an illegal
vector error, with out regard to whether the mask bit is set
or whether an interrupt is actually seen on input."
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Allow stack growth so the 'enter' instruction works. Also
fixes problem in compat_sys_kexec_load() which could allocate
more than 128 bytes using compat_alloc_user_space().
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Use tail call from clear_user to __clear_user to save some code size
- Use standard memcpy for forward memmove
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Only exports for assembler files are left in x8664_ksyms.c
Originally inspired by a patch from Al Viro
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
x86_64 and i386 behave inconsistently when sending an IPI on vector 2
(NMI_VECTOR). Make both behave the same, so IPI 2 is sent as NMI.
The crash code was abusing send_IPI_allbutself() by passing a code
instead of a vector, it only worked because crash knew about the
internal code of send_IPI_allbutself(). Change crash to use NMI_VECTOR
instead, and remove the comment about how crash was abusing the function.
This patch is a pre-requisite for fixing the problem where sending an
IPI as NMI would reboot some Dell Xeon systems. I cannot fix that
problem while crash continus to abuse send_IPI_allbutself().
It also removes the inconsistency between i386 and x86_64 for
NMI_VECTOR. That will simplify all the RAS code that needs to bring
all the cpus to a clean stop, even when one or more cpus are spinning
disabled.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It turned out that the following change is needed when the speaker is
compiled as a module.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes the no longer used sys32_ni_syscall()
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently CONFIG_REORDER uses -ffunction-sections for all code;
however, creating a separate section for each function is not useful
for modules and just adds bloat. Moving this option from CFLAGS to
CFLAGS_KERNEL shrinks module object files (e.g., the module tree for a
kernel built with most drivers as modules shrinked from 54M to 46M),
and decreases the number of sysfs files in /sys/module/*/sections/
directories.
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Defaulting to a value not evenly divisible by four makes little sense,
as four values are displayed per line (and hence the rest of the line
would otherwise be wasted).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With (significantly) more than 10 CPUs online, the column headings
drifted off the positions of the column contents with growing CPU
numbers.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The APIC ID returned by hard_smp_processor_id can be beyond
NR_CPUS and then overflow the x86_cpu_to_apic[] array.
Add a check for overflow. If it happens then the slow loop below
will catch.
Bug pointed out by Doug Thompson
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Getting phys_proc_id and cpu_core_id information to be printed at boot
time for AMD processors. Also matching the Node related boot time
information that gets printed for Intel and AMD processors for NUMA
configurations.
Signed-off-by: Rohit Seth <rohitseth@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>