632 Commits

Author SHA1 Message Date
Yinghai Lu
84b56fa46b x86, numa, 32-bit: make sure get we kva space
when 1/3 user/kernel split is used, and less memory is installed, or if
we have a big hole below 4g, max_low_pfn is still using 3g-128m

try to go down from max_low_pfn until we get it. otherwise will panic.

need to make 32-bit code to use register_e820_active_regions ... later.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 12:01:55 +02:00
Yinghai Lu
6af61a7614 x86: clean up max_pfn_mapped usage - 32-bit
on 32-bit in head_32.S after initial page table is done, we get initial
max_pfn_mapped, and then kernel_physical_mapping_init will give us
a final one.

We need to use that to make sure find_e820_area will get valid addresses
for boot_map and for NODE_DATA(0) on numa32.

XEN PV and lguest may need to assign max_pfn_mapped too.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:28 +02:00
Yinghai Lu
287572cb38 x86, numa, 32-bit: avoid clash between ramdisk and kva
use find_e820_area to get address space...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:28 +02:00
Yinghai Lu
e8c27ac919 x86, numa, 32-bit: print out debug info on all kvas
also fix the print out of node_remap_end_vaddr

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:26 +02:00
Yinghai Lu
0596152388 x86, 32-bit: change propagate_e820_map() back to find_max_pfn()
we don't need to call memory_present that early.
numa and sparse will call memory_present later and might
even fail, it will call memory_present for the full range.

also for sparse it will call alloc_bootmem ... before we set up bootmem.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:25 +02:00
Yinghai Lu
b66cd72073 x86: set node_remap_size[0] in fallback path
... otherwise alloc_remap will not get node_mem_map from kva area, and
alloc_node_mem_map has to alloc_bootmem_node to get mem_map.
It will use two low address copies ...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:25 +02:00
Yinghai Lu
ba924c81dd x86, numa, 32-bit: increase max_elements to 1024
so every element will represent 64M instead of 256M.

AMD opteron could have HW memory hole remapping, so could have
[0, 8g + 64M) on node0. Reduce element size to 64M to keep that on node 0

Later we need to use find_e820_area() to allocate memory_node_map like
on 64-bit. But need to move memory_present out of populate_mem_map...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:24 +02:00
Ingo Molnar
471b3c1b01 x86, numaq 32-bit: build fix
fix:

drivers/built-in.o: In function `acpi_numa_init':
: undefined reference to `acpi_numa_arch_fixup'

which can happen with ACPI && NUMAQ.
2008-06-03 12:49:06 +02:00
Ingo Molnar
cf3d0cb8a1 x86: 32-bit numa, build fix
on Summit it's possible to have:

 CONFIG_ACPI_SRAT=y
 CONFIG_HAVE_ARCH_PARSE_SRAT=y

in which case acpi.h defines the acpi_numa_slit_init() and
acpi_numa_processor_affinity_init() methods as a macro.
2008-06-03 11:45:09 +02:00
Ingo Molnar
b28852d670 x86: add dummy acpi_numa_processor_affinity_init() implementation on 32-bit
allow CONFIG_ACPI_NUMA builds to succeed.
2008-06-03 10:23:53 +02:00
Ingo Molnar
2772f54bf3 x86: add acpi_numa_slit_init() dummy implementation on 32-bit
allow CONFIG_ACPI_NUMA builds to succeed on 32-bit.
2008-06-03 10:23:48 +02:00
Yinghai Lu
a5481280b2 x86: extend e820 early_res support 32bit -fix #5
reserve early numa kva, so it will not clash with new RAMDISK

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-31 09:55:53 +02:00
Yinghai Lu
163872950d x86: extend e820 early_res support 32bit -fix #4
reserve_early pgdata for 32bit numa

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-31 09:55:50 +02:00
Paul Jackson
e9197bf011 x86 boot: remove some unused extern function declarations
Remove three extern declarations for routines
that don't exist.  Fix a typo in a comment.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:10 +02:00
Ingo Molnar
668a6c3654 - fix mmioftrace + rcu merge interaction
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 09:51:43 +02:00
Thomas Gleixner
48e2395722 x86: fixup the fallout of the bitops changes
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 08:58:36 +02:00
Jan Beulich
4e50e62ce5 x86: eliminate duplicate consistency checks in init_32.c
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 08:58:30 +02:00
Thomas Gleixner
6a1673ae22 x86: make memory_add_physaddr_to_nid depend on MEMORY_HOTPLUG
memory_add_physaddr_to_nid() is only used in the
CONFIG_MEMORY_HOTPLUG_SPARSE || CONFIG_ACPI_HOTPLUG_MEMORY case.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 08:58:29 +02:00
Thomas Gleixner
11034d5597 x86: init64.c include initrd.h
free_initrd_mem needs a prototype, which is in linux/initrd.h

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 08:58:27 +02:00
Thomas Gleixner
55d4f22abc x86: k8topology cleanup variable declarations
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 08:58:26 +02:00
Thomas Gleixner
d34c08958f x86: k8topology fix shadow variable
sparse mutters:
arch/x86/mm/k8topology_64.c:108:7: warning: symbol 'nodeid' shadows an earlier one

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 08:58:26 +02:00
Thomas Gleixner
0eafe234a2 x86: k8topology add missing header
k8_scan_nodes is global and needs a prototype. Add the header file
which contains it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 08:58:26 +02:00
Andrew Morton
46dd98a5c0 arch/x86/mm/pat.c: use boot_cpu_has()
arch/x86/mm/pat.c: In function 'phys_mem_access_prot_allowed':
arch/x86/mm/pat.c:526: warning: passing argument 2 of 'constant_test_bit' from incompatible pointer type
arch/x86/mm/pat.c:526: warning: passing argument 2 of 'variable_test_bit' from incompatible pointer type
arch/x86/mm/pat.c:527: warning: passing argument 2 of 'constant_test_bit' from incompatible pointer type
arch/x86/mm/pat.c:527: warning: passing argument 2 of 'variable_test_bit' from incompatible pointer type
arch/x86/mm/pat.c:528: warning: passing argument 2 of 'constant_test_bit' from incompatible pointer type
arch/x86/mm/pat.c:528: warning: passing argument 2 of 'variable_test_bit' from incompatible pointer type
arch/x86/mm/pat.c:529: warning: passing argument 2 of 'constant_test_bit' from incompatible pointer type
arch/x86/mm/pat.c:529: warning: passing argument 2 of 'variable_test_bit' from incompatible pointer type

Don't open-code test_bit() on a __u32

Cc: Andrea Arcangeli <andrea@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 08:50:25 +02:00
Pekka Paalanen
790e2a290b x86 mmiotrace: page level is unsigned
Fixes some sparse warnings.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-24 11:27:47 +02:00
Pekka Paalanen
a50445d76c mmiotrace: rename kmmio_probe::user_data to :private.
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-24 11:27:41 +02:00
Pekka Paalanen
dee310d0ad x86 mmiotrace: use resource_size_t for phys addresses
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-24 11:27:36 +02:00
Pekka Paalanen
87e547fe41 x86 mmiotrace: fix page-unaligned ioremaps
mmiotrace_ioremap() expects to receive the original unaligned map phys address
and size. Also fix {un,}register_kmmio_probe() to deal properly with
unaligned size.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-24 11:27:32 +02:00
Pekka Paalanen
970e6fa038 mmiotrace: code style cleanups
From c2da03771e29159627c5c7b9509ec70bce9f91ee Mon Sep 17 00:00:00 2001
From: Pekka Paalanen <pq@iki.fi>
Date: Mon, 28 Apr 2008 21:25:22 +0300

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-24 11:27:28 +02:00
Pekka Paalanen
7423d1115f x86 mmiotrace: dynamically disable non-boot CPUs
From 8979ee55cb6a429c4edd72ebec2244b849f6a79a Mon Sep 17 00:00:00 2001
From: Pekka Paalanen <pq@iki.fi>
Date: Sat, 12 Apr 2008 00:18:57 +0300

Mmiotrace is not reliable with multiple CPUs and may
miss events. Drop to single CPU when mmiotrace is activated.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-24 11:26:52 +02:00
Randy Dunlap
0663bb6cd9 mmiotrace: fix printk format
Fix gcc printk format warnings:

next-20080415/arch/x86/mm/mmio-mod.c: In function 'print_pte':
next-20080415/arch/x86/mm/mmio-mod.c:154: warning: format '%lx' expects type 'long unsigned int', but argument 3 has type 'pteval_t'
next-20080415/arch/x86/mm/mmio-mod.c:154: warning: format '%lx' expects type 'long unsigned int', but argument 4 has type 'pteval_t'
next-20080415/arch/x86/mm/mmio-mod.c: At top level:
next-20080415/arch/x86/mm/mmio-mod.c:403: warning: 'downed_cpus' defined but not used

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-24 11:26:26 +02:00
Pekka Paalanen
e4b37ee686 x86 mmiotrace: remove ISA_trace parameter.
This had become a no-op.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-24 11:25:44 +02:00
Pekka Paalanen
ff3a3e9ba5 x86 mmiotrace: move files into arch/x86/mm/.
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-24 11:25:37 +02:00
Pekka Paalanen
d61fc44853 x86: mmiotrace, preview 2
Kconfig.debug, Makefile and testmmiotrace.c style fixes.
Use real mutex instead of mutex.
Fix failure path in register probe func.
kmmio: RCU read-locked over single stepping.
Generate mapping id's.
Make mmio-mod.c built-in and rewrite its locking.
Add debugfs file to enable/disable mmiotracing.
kmmio: use irqsave spinlocks.
Lots of cleanups in mmio-mod.c
Marker file moved from /proc into debugfs.
Call mmiotrace entrypoints directly from ioremap.c.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-24 11:22:24 +02:00
Pekka Paalanen
0fd0e3da45 x86: mmiotrace full patch, preview 1
kmmio.c handles the list of mmio probes with callbacks, list of traced
pages, and attaching into the page fault handler and die notifier. It
arms, traps and disarms the given pages, this is the core of mmiotrace.

mmio-mod.c is a user interface, hooking into ioremap functions and
registering the mmio probes. It also decodes the required information
from trapped mmio accesses via the pre and post callbacks in each probe.
Currently, hooking into ioremap functions works by redefining the symbols
of the target (binary) kernel module, so that it calls the traced
versions of the functions.

The most notable changes done since the last discussion are:
- kmmio.c is a built-in, not part of the module
- direct call from fault.c to kmmio.c, removing all dynamic hooks
- prepare for unregistering probes at any time
- make kmmio re-initializable and accessible to more than one user
- rewrite kmmio locking to remove all spinlocks from page fault path

Can I abuse call_rcu() like I do in kmmio.c:unregister_kmmio_probe()
or is there a better way?

The function called via call_rcu() itself calls call_rcu() again,
will this work or break? There I need a second grace period for RCU
after the first grace period for page faults.

Mmiotrace itself (mmio-mod.c) is still a module, I am going to attack
that next. At some point I will start looking into how to make mmiotrace
a tracer component of ftrace (thanks for the hint, Ingo). Ftrace should
make the user space part of mmiotracing as simple as
'cat /debug/trace/mmio > dump.txt'.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-24 11:22:12 +02:00
Pekka Paalanen
10c43d2eb5 x86: explicit call to mmiotrace in do_page_fault()
The custom page fault handler list is replaced with a single function
pointer. All related functions and variables are renamed for
mmiotrace.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: pq@iki.fi
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-24 11:21:55 +02:00
Pekka Paalanen
75bb88350e x86 mmiotrace: use lookup_address()
Use lookup_address() from pageattr.c instead of doing the same
manually. Also had to EXPORT_SYMBOL_GPL(lookup_address) to make this
work for modules. This also fixes "undefined symbol 'init_mm'"
compile error for x86_32.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-24 11:21:32 +02:00
Pekka Paalanen
72b59d67f8 x86_64: fix kernel rodata NX setting
Without CONFIG_DYNAMIC_FTRACE, mark_rodata_ro() would mark a wrong
number of pages as no-execute. The bug was introduced in the patch
"ftrace: dont write protect kernel text". The symptom was machine reboot
after a CPU hotplug.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23 21:53:07 +02:00
Pekka Paalanen
86069782d6 x86: add a list for custom page fault handlers.
Provides kernel modules a way to register custom page fault handlers.
On every page fault this will call a list of registered functions. The
functions may handle the fault and force do_page_fault() to return
immediately.

This functionality is similar to the now removed page fault notifiers.
Custom page fault handlers are used by debugging and reverse engineering
tools. Mmiotrace is one such tool and a patch to add it into the tree
will follow.

The custom page fault handlers are called earlier in do_page_fault()
than the page fault notifiers were.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23 21:16:38 +02:00
Steven Rostedt
8f0f996e80 ftrace: dont write protect kernel text
Dynamic ftrace cant work when the kernel has its text write protected.
This patch keeps the kernel from being write protected when
dynamic ftrace is in place.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23 21:16:22 +02:00
Andy Whitcroft
84d6bd0e27 x86: cope with no remap space being allocated for a numa node
When allocating the pgdat's for numa nodes on x86_32 we attempt to place
them in the numa remap space for that node.  However should the node not
have any remap space allocated (such as due to having non-ram pages in
the remap location in the node) then we will incorrectly place the pgdat
at zero.  Check we have remap available, falling back to node 0 memory
where we do not.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-20 14:10:50 +02:00
Andy Whitcroft
b9ada4281c x86: reinstate numa remap for SPARSEMEM on x86 NUMA systems
Recent kernels have been panic'ing trying to allocate memory early in boot,
in __alloc_pages:

  BUG: unable to handle kernel paging request at 00001568
  IP: [<c10407b6>] __alloc_pages+0x33/0x2cc
  *pdpt = 00000000013a5001 *pde = 0000000000000000
  Oops: 0000 [#1] SMP
  Modules linked in:

  Pid: 1, comm: swapper Not tainted (2.6.25 #78)
  EIP: 0060:[<c10407b6>] EFLAGS: 00010246 CPU: 0
  EIP is at __alloc_pages+0x33/0x2cc
  EAX: 00001564 EBX: 000412d0 ECX: 00001564 EDX: 000005c3
  ESI: f78012a0 EDI: 00000001 EBP: 00001564 ESP: f7871e50
  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
  Process swapper (pid: 1, ti=f7870000 task=f786f670 task.ti=f7870000)
  Stack: 00000000 f786f670 00000010 00000000 0000b700 000412d0 f78012a0 00000001
         00000000 c105b64d 00000000 000412d0 f78012a0 f7803120 00000000 c105c1c5
         00000010 f7803144 000412d0 00000001 f7803130 f7803120 f78012a0 00000001
  Call Trace:
   [<c105b64d>] kmem_getpages+0x94/0x129
   [<c105c1c5>] cache_grow+0x8f/0x123
   [<c105c689>] ____cache_alloc_node+0xb9/0xe4
   [<c105c999>] kmem_cache_alloc_node+0x92/0xd2
   [<c1018929>] build_sched_domains+0x536/0x70d
   [<c100b63c>] do_flush_tlb_all+0x0/0x3f
   [<c100b63c>] do_flush_tlb_all+0x0/0x3f
   [<c10572d6>] interleave_nodes+0x23/0x5a
   [<c105c44f>] alternate_node_alloc+0x43/0x5b
   [<c1018b47>] arch_init_sched_domains+0x46/0x51
   [<c136e85e>] kernel_init+0x0/0x82
   [<c137ac19>] sched_init_smp+0x10/0xbb
   [<c136e8a1>] kernel_init+0x43/0x82
   [<c10035cf>] kernel_thread_helper+0x7/0x10

Debugging this showed that the NODE_DATA() for nodes other than node 0
were all NULL.  Tracing this back showed that the NODE_DATA() pointers
were being initialised to each nodes remap space.  However under
SPARSEMEM remap is disabled which leads to the pgdat's being placed
incorrectly at kernel virtual address 0.  Leading to the panic when
attempting to allocate memory from these nodes.

Numa remap was disabled in the commit below.  This occured while fixing
problems triggered when attempting to boot x86_32 NUMA SPARSEMEM kernels
on non-numa hardware.

	x86: make NUMA work on 32-bit
	commit 1b000a5dbeb2f34bc03d45ebdf3f6d24a60c3aed

The real problem is believed to be related to other alignment issues in
the regions blocked out from the bootmem allocator for small memory
systems, and has been fixed separately.  Therefore re-enable remap for
SPARSMEM, which fixes pgdat allocation issues.  Testing confirms that
SPARSMEM NUMA kernels will boot correctly with this part of the change
reverted.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-20 14:10:48 +02:00
Ingo Molnar
a8ac1ae3a2 Merge branch 'linus' into x86/pat 2008-05-19 09:23:07 +02:00
Thomas Gleixner
b4ef290d7c Merge branch 'linus' into x86/pat 2008-05-18 11:54:43 +02:00
Avi Kivity
31f4d870b0 x86: fix crash on cpu hotplug on pat-incapable machines
pat_disable() is __init, which means it goes away after booting is complete.
Unfortunately it is used by the hotplug code if the machine is not
pat-capable, causing a crash.

Fix by marking pat_disable() as __cpuinit.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-17 22:57:20 +02:00
Pranith Kumar
afc8534380 x86: arch/x86/mm/pat.c - fix warning
fix this warning:

 arch/x86/mm/pat.c: In function `phys_mem_access_prot_allowed':
 arch/x86/mm/pat.c:558: warning: long long unsigned int format, long
 unsigned int arg (arg 6)
 arch/x86/mm/pat.c: In function `map_devmem':
 arch/x86/mm/pat.c:580: warning: long long unsigned int format, long
 unsigned int arg (arg 6)

Signed-off-by: D Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-13 19:39:30 +02:00
Hugh Dickins
61165d7a03 x86: fix app crashes after SMP resume
After resume on a 2cpu laptop, kernel builds collapse with a sed hang,
sh or make segfault (often on 20295564), real-time signal to cc1 etc.

Several hurdles to jump, but a manually-assisted bisect led to -rc1's
d2bcbad5f3ad38a1c09861bca7e252dde7bb8259 x86: do not zap_low_mappings
in __smp_prepare_cpus.  Though the low mappings were removed at bootup,
they were left behind (with Global flags helping to keep them in TLB)
after resume or cpu online, causing the crashes seen.

Reinstate zap_low_mappings (with local __flush_tlb_all) for each cpu_up
on x86_32.  This used to be serialized by smp_commenced_mask: that's now
gone, but a low_mappings flag will do.  No need for native_smp_cpus_done
to repeat the zap: let mem_init zap BSP's low mappings just like on UP.

(In passing, fix error code from native_cpu_up: do_boot_cpu returns a
variety of diagnostic values, Dprintk what it says but convert to -EIO.
And save_pg_dir separately before zap_low_mappings: doesn't matter now,
but zapping twice in succession wiped out resume's swsusp_pg_dir.)

That worked well on the duo and one quad, but wouldn't boot 3rd or 4th
cpu on P4 Xeon, oopsing just after unlock_ipi_call_lock.  The TLB flush
IPI now being sent reveals a long-standing bug: the booting cpu has its
APIC readied in smp_callin at the top of start_secondary, but isn't put
into the cpu_online_map until just before that unlock_ipi_call_lock.

So native_smp_call_function_mask to online cpus would send_IPI_allbutself,
including the cpu just coming up, though it has been excluded from the
count to wait for: by the time it handles the IPI, the call data on
native_smp_call_function_mask's stack may well have been overwritten.

So fall back to send_IPI_mask while cpu_online_map does not match
cpu_callout_map: perhaps there's a better APICological fix to be
made at the start_secondary end, but I wouldn't know that.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-13 19:36:12 +02:00
Yinghai Lu
0327318445 x86_64: simplify the memtest parameter setting
use CONFIG_MEMTEST only. if it is set, will have memtest=0 (disabled)

need to have memtest=4 in command line to test more patterns.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:28:15 +02:00
Venki Pallipadi
77b52b4c5c x86: add "debugpat" boot option
enable debug messages by a boot option "debugpat".

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:28:07 +02:00
Thomas Gleixner
8d4a430085 x86: cleanup PAT cpu validation
Move the scattered checks for PAT support to a single function. Its
moved to addon_cpuid_features.c as this file is shared between 32 and
64 bit.

Remove the manipulation of the PAT feature bit and just disable PAT in
the PAT layer, based on the PAT bit provided by the CPU and the
current CPU version/model white list.

Change the boot CPU check so it works on Voyager somewhere in the
future as well :) Also panic, when a secondary has PAT disabled but
the primary one has alrady switched to PAT. We have no way to undo
that.

The white list is kept for now to ensure that we can rely on known to
work CPU types and concentrate on the software induced problems
instead of fighthing CPU erratas and subtle wreckage caused by not yet
verified CPUs. Once the PAT code has stabilized enough, we can remove
the white list and open the can of worms.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-08 15:43:51 +02:00
Hugh Dickins
aeed5fce37 x86: fix PAE pmd_bad bootup warning
Fix warning from pmd_bad() at bootup on a HIGHMEM64G HIGHPTE x86_32.

That came from 9fc34113f6880b215cbea4e7017fc818700384c2 x86: debug pmd_bad();
but we understand now that the typecasting was wrong for PAE in the previous
version: pagetable pages above 4GB looked bad and stopped Arjan from booting.

And revert that cded932b75ab0a5f9181ee3da34a0a488d1a14fd x86: fix pmd_bad
and pud_bad to support huge pages.  It was the wrong way round: we shouldn't
weaken every pmd_bad and pud_bad check to let huge pages slip through - in
part they check that we _don't_ have a huge page where it's not expected.

Put the x86 pmd_bad() and pud_bad() definitions back to what they have long
been: they can be improved (x86_32 should use PTE_MASK, to stop PAE thinking
junk in the upper word is good; and x86_64 should follow x86_32's stricter
comparison, to stop thinking any subset of required bits is good); but that
should be a later patch.

Fix Hans' good observation that follow_page() will never find pmd_huge()
because that would have already failed the pmd_bad test: test pmd_huge in
between the pmd_none and pmd_bad tests.  Tighten x86's pmd_huge() check?
No, once it's a hugepage entry, it can get quite far from a good pmd: for
example, PROT_NONE leaves it with only ACCESSED of the KERN_PGTABLE bits.

However... though follow_page() contains this and another test for huge
pages, so it's nice to keep it working on them, where does it actually get
called on a huge page?  get_user_pages() checks is_vm_hugetlb_page(vma) to
to call alternative hugetlb processing, as does unmap_vmas() and others.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Earlier-version-tested-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jeff Chua <jeff.chua.linux@gmail.com>
Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-06 13:08:58 -07:00