Commit Graph

128 Commits

Author SHA1 Message Date
Linus Torvalds
b278240839 Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (225 commits)
  [PATCH] Don't set calgary iommu as default y
  [PATCH] i386/x86-64: New Intel feature flags
  [PATCH] x86: Add a cumulative thermal throttle event counter.
  [PATCH] i386: Make the jiffies compares use the 64bit safe macros.
  [PATCH] x86: Refactor thermal throttle processing
  [PATCH] Add 64bit jiffies compares (for use with get_jiffies_64)
  [PATCH] Fix unwinder warning in traps.c
  [PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1
  [PATCH] x86: Move direct PCI scanning functions out of line
  [PATCH] i386/x86-64: Make all early PCI scans dependent on CONFIG_PCI
  [PATCH] Don't leak NT bit into next task
  [PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder
  [PATCH] Fix some broken white space in ia32_signal.c
  [PATCH] Initialize argument registers for 32bit signal handlers.
  [PATCH] Remove all traces of signal number conversion
  [PATCH] Don't synchronize time reading on single core AMD systems
  [PATCH] Remove outdated comment in x86-64 mmconfig code
  [PATCH] Use string instructions for Core2 copy/clear
  [PATCH] x86: - restore i8259A eoi status on resume
  [PATCH] i386: Split multi-line printk in oops output.
  ...
2006-09-26 13:07:55 -07:00
Dave McCracken
46a82b2d55 [PATCH] Standardize pxx_page macros
One of the changes necessary for shared page tables is to standardize the
pxx_page macros.  pte_page and pmd_page have always returned the struct
page associated with their entry, while pte_page_kernel and pmd_page_kernel
have returned the kernel virtual address.  pud_page and pgd_page, on the
other hand, return the kernel virtual address.

Shared page tables needs pud_page and pgd_page to return the actual page
structures.  There are very few actual users of these functions, so it is
simple to standardize their usage.

Since this is basic cleanup, I am submitting these changes as a standalone
patch.  Per Hugh Dickins' comments about it, I am also changing the
pxx_page_kernel macros to pxx_page_vaddr to clarify their meaning.

Signed-off-by: Dave McCracken <dmccr@us.ibm.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26 08:48:51 -07:00
Christoph Lameter
776ed98b84 [PATCH] reduce MAX_NR_ZONES: remove two strange uses of MAX_NR_ZONES
I keep seeing zones on various platforms that are never used and wonder why we
compile support for them into the kernel.  Counters show up for HIGHMEM and
DMA32 that are alway zero.

This patch allows the removal of ZONE_DMA32 for non x86_64 architectures and
it will get rid of ZONE_HIGHMEM for arches not using highmem (like 64 bit
architectures).  If an arch does not define CONFIG_HIGHMEM then ZONE_HIGHMEM
will not be defined.  Similarly if an arch does not define CONFIG_ZONE_DMA32
then ZONE_DMA32 will not be defined.

No current architecture uses all the 4 zones (DMA,DMA32,NORMAL,HIGH) that we
have now.  The patchset will reduce the number of zones for all platforms.

On many platforms that do not have DMA32 or HIGHMEM this will reduce the
number of zones by 50%.  F.e.  ia64 only uses DMA and NORMAL.

Large amounts of memory can be saved for larger systemss that may have a few
hundred NUMA nodes.

With ZONE_DMA32 and ZONE_HIGHMEM support optional MAX_NR_ZONES will be 2 for
many non i386 platforms and even for i386 without CONFIG_HIGHMEM set.

Tested on ia64, x86_64 and on i386 with and without highmem.

The patchset consists of 11 patches that are following this message.

One could go even further than this patchset and also make ZONE_DMA optional
because some platforms do not need a separate DMA zone and can do DMA to all
of memory.  This could reduce MAX_NR_ZONES to 1.  Such a patchset will
hopefully follow soon.

This patch:

Fix strange uses of MAX_NR_ZONES

Sometimes we use MAX_NR_ZONES - x to refer to a zone.  Make that explicit.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26 08:48:46 -07:00
Andi Kleen
0637a70a5d [PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1
Some buggy systems can machine check when config space accesses
happen for some non existent devices.  i386/x86-64 do some early
device scans that might trigger this. Allow pci=noearly to disable
this. Also when type 1 is disabling also don't do any early
accesses which are always type1.

This moves the pci= configuration parsing to be a early parameter.
I don't think this can break anything because it only changes
a single global that is only used by PCI.

Cc: gregkh@suse.de
Cc: Trammell Hudson <hudson@osresearch.net>

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:41 +02:00
Andi Kleen
5e6b0bfe5b [PATCH] Use proper accessors to change PSE bits in change_page_attr()
Use normal pte accessors in change_page_attr() to access the PSE
bits.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Andi Kleen
df992848f5 [PATCH] Fix pte_exec/mkexec and use it in change_page_attr()
Fix the pte_exec/mkexec page table accessor functions to really
use the NX bit. Previously they only checked the USER bit, but
weren't actually used for anything.

Then use them in change_page_attr() to manipulate the NX bit
properly.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Andi Kleen
d3cf7f0615 [PATCH] Remove bogus warning from early_ioremap
It is correct for its only caller right now, but not for possible
future others.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Andrew Morton
1164c9994f [PATCH] make numa_emulation() __init
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Keith Mannthey
6ad9165811 [PATCH] x86_64 kernel mapping fix
Fix for the x86_64 kernel mapping code.  Without this patch the update path
only inits one pmd_page worth of memory and tramples any entries on it.  now
the calling convention to phys_pmd_init and phys_init is to always pass a
[pmd/pud] page not an offset within a page.

Signed-off-by: Keith Mannthey<kmannth@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-09-26 10:52:36 +02:00
Andi Kleen
273819a2d9 [PATCH] make fault notifier unconditional and export it
It's needed for external debuggers and overhead is very small.

Also make the actual notifier chain they use static

Cc: jbeulich@novell.com

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:35 +02:00
Andi Kleen
dd2994f619 [PATCH] Add sparse annotations to quiet sparse in arch/x86_64/mm/fault.c
Fixes

linux/arch/x86_64/mm/fault.c:125:7: warning: incorrect type in argument 1 (different address spaces)
linux/arch/x86_64/mm/fault.c:125:7:    expected void [noderef] *<noident><asn:1>
linux/arch/x86_64/mm/fault.c:125:7:    got unsigned char *[assigned] instr
linux/arch/x86_64/mm/fault.c:163:8: warning: incorrect type in argument 1 (different address spaces)
linux/arch/x86_64/mm/fault.c:163:8:    expected void [noderef] *<noident><asn:1>
linux/arch/x86_64/mm/fault.c:163:8:    got unsigned char *[assigned] instr
linux/arch/x86_64/mm/fault.c:179:9: warning: incorrect type in argument 1 (different address spaces)
linux/arch/x86_64/mm/fault.c:179:9:    expected void [noderef] *<noident><asn:1>
linux/arch/x86_64/mm/fault.c:179:9:    got unsigned long *<noident>

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Andi Kleen
c31fbb1ad8 [PATCH] Clean up acpi_numa variable
Move it into srat.c No need to clutter up setup.c for it

And remove use in setup.c completely - it only guarded a printk
which can be done unconditionally.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Andi Kleen
2c8c0e6b8d [PATCH] Convert x86-64 to early param
Instead of hackish manual parsing

Requires earlier i386 patchkit, but also fixes i386 early_printk again.

I removed some obsolete really early parameters which didn't do anything useful.
Also made a few parameters that needed it early (mostly oops printing setup)

Also removed one panic check that wasn't visible without
early console anyways (the early console is now initialized after that
panic)

This cleans up a lot of code.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:32 +02:00
Jan Beulich
caff0710eb [PATCH] initialize end of memory variables as early as possible
While an earlier patch already did a small step into that direction,
this patch moves initialization of all memory end variables to as
early as possible, so that dependent code doesn't need to check
whether these variables have already been set.

Also, remove a misleading (perhaps just outdated) comment, and make
static a variable only used in a single file.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:31 +02:00
Ingo Molnar
3ac94932a2 [PATCH] lockdep: beautify x86_64 stacktraces
Beautify x86_64 stacktraces to be more readable.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:02 -07:00
Heiko Carstens
a581c2a469 [PATCH] add __[start|end]_rodata sections to asm-generic/sections.h
Add __start_rodata and __end_rodata to sections.h to avoid extern
declarations.  Needed by s390 code (see following patch).

[akpm@osdl.org: update architectures]
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:56:03 -07:00
Jörn Engel
6ab3d5624e Remove obsolete #include <linux/config.h>
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 19:25:36 +02:00
Adrian Bunk
80f7228b59 typo fixes: occuring -> occurring
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 18:27:16 +02:00
Randy Dunlap
c9cf55285e [PATCH] add poison.h and patch primary users
Localize poison values into one header file for better documentation and
easier/quicker debugging and so that the same values won't be used for
multiple purposes.

Use these constants in core arch., mm, driver, and fs code.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Matt Mackall <mpm@selenic.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:38 -07:00
Yasunori Goto
bc02af93dd [PATCH] pgdat allocation for new node add (specify node id)
Change the name of old add_memory() to arch_add_memory.  And use node id to
get pgdat for the node at NODE_DATA().

Note: Powerpc's old add_memory() is defined as __devinit. However,
      add_memory() is usually called only after bootup.
      I suppose it may be redundant. But, I'm not well known about powerpc.
      So, I keep it. (But, __meminit is better at least.)

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:35 -07:00
Linus Torvalds
81a07d7588 Merge branch 'x86-64'
* x86-64: (83 commits)
  [PATCH] x86_64: x86_64 stack usage debugging
  [PATCH] x86_64: (resend) x86_64 stack overflow debugging
  [PATCH] x86_64: msi_apic.c build fix
  [PATCH] x86_64: i386/x86-64 Add nmi watchdog support for new Intel CPUs
  [PATCH] x86_64: Avoid broadcasting NMI IPIs
  [PATCH] x86_64: fix apic error on bootup
  [PATCH] x86_64: enlarge window for stack growth
  [PATCH] x86_64: Minor string functions optimizations
  [PATCH] x86_64: Move export symbols to their C functions
  [PATCH] x86_64: Standardize i386/x86_64 handling of NMI_VECTOR
  [PATCH] x86_64: Fix modular pc speaker
  [PATCH] x86_64: remove sys32_ni_syscall()
  [PATCH] x86_64: Do not use -ffunction-sections for modules
  [PATCH] x86_64: Add cpu_relax to apic_wait_icr_idle
  [PATCH] x86_64: adjust kstack_depth_to_print default
  [PATCH] i386/x86-64: adjust /proc/interrupts column headings
  [PATCH] x86_64: Fix race in cpu_local_* on preemptible kernels
  [PATCH] x86_64: Fix fast check in safe_smp_processor_id
  [PATCH] x86_64: x86_64 setup.c - printing cmp related boottime information
  [PATCH] i386/x86-64/ia64: Move polling flag into thread_info_status
  ...

Manual resolve of trivial conflict in arch/i386/kernel/Makefile
2006-06-26 10:51:09 -07:00
Chuck Ebbert
03fdc2c277 [PATCH] x86_64: enlarge window for stack growth
Allow stack growth so the 'enter' instruction works.  Also
fixes problem in compat_sys_kexec_load() which could allocate
more than 128 bytes using compat_alloc_user_space().

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:22 -07:00
Andi Kleen
2ee60e1789 [PATCH] x86_64: Move export symbols to their C functions
Only exports for assembler files are left in x8664_ksyms.c

Originally inspired by a patch from Al Viro
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:22 -07:00
Jan Beulich
5f51e139d8 [PATCH] x86_64: miscellaneous mm/init.c fixes
- fix an off-by-one error in phys_pmd_init()
 - prevent phys_pmd_init() from removing mappings established earlier
 - fix the direct mapping early printk to in fact show the end of the range
 - remove an apparently orphan comment

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:20 -07:00
Jon Mason
0dc243ae10 [PATCH] x86_64: Calgary IOMMU - IOMMU abstractions
This patch creates a new interface for IOMMUs by adding a centralized
location for IOMMU allocation (for translation tables/apertures) and
IOMMU initialization.  In creating these, code was moved around for
abstraction, uniformity, and consiceness.

Take note of the move of the iommu_setup bootarg parsing code to
__setup.  This is enabled by moving back the location of the aperture
allocation/detection to mem init (which while ugly, was already the
location of the swiotlb_init).

While a slight departure from the previous patch, I belive this provides
the true intention of the previous versions of the patch which changed
this code.  It also makes the addition of the upcoming calgary code much
cleaner than previous patches.

[AK: Removed one broken change. iommu_setup still has to be called
early]

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:18 -07:00
Andi Kleen
d2ae5b5f6a [PATCH] x86_64: Get rid of pud_offset_k / __pud_offset_k
pud_offset_k() equivalent to pud_offset() now.  Pointed out by Jan Beulich
Similar for __pud_offset_ok, which needs a small change in the callers.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:18 -07:00
Gerd Hoffmann
d167a51877 [PATCH] x86_64: x86_64 version of the smp alternative patch.
Changes are largely identical to the i386 version:

 * alternative #define are moved to the new alternative.h file.
 * one new elf section with pointers to the lock prefixes which can be
   nop'ed out for non-smp.
 * two new elf sections simliar to the "classic" alternatives to
   replace SMP code with simpler UP code.
 * fixup headers to use alternative.h instead of defining their own
   LOCK / LOCK_PREFIX macros.

The patch reuses the i386 version of the alternatives code to avoid code
duplication.  The code in alternatives.c was shuffled around a bit to
reduce the number of #ifdefs needed.  It also got some tweaks needed for
x86_64 (vsyscall page handling) and new features (noreplacement option
which was x86_64 only up to now).  Debug printk's are changed from
compile-time to runtime.

Loosely based on a early version from Bastian Blank <waldi@debian.org>

Signed-off-by: Gerd Hoffmann <kraxel@suse.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:14 -07:00
Anil S Keshavamurthy
1bd858a507 [PATCH] Notify page fault call chain for x86_64
Currently in the do_page_fault() code path, we call notify_die(DIE_PAGE_FAULT,
...) to notify the page fault.  Since notify_die() is highly overloaded, this
page fault notification is currently being sent to all the components
registered with register_die_notification() which uses the same die_chain to
loop for all the registered components which is unnecessary.

In order to optimize the do_page_fault() code path, this critical page fault
notification is now moved to different call chain and the test results showed
great improvements.

And the kprobes which is interested in this notifications, now registers onto
this new call chain only when it need to, i.e Kprobes now registers for page
fault notification only when their are an active probes and unregisters from
this page fault notification when no probes are active.

I have incorporated all the feedback given by Ananth and Keith and everyone,
and thanks for all the review feedback.

This patch:

Overloading of page fault notification with the notify_die() has performance
issues(since the only interested components for page fault is kprobes and/or
kdb) and hence this patch introduces the new notifier call chain exclusively
for page fault notifications their by avoiding notifying unnecessary
components in the do_page_fault() code path.

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:22 -07:00
Yasunori Goto
762834e8bf [PATCH] Unify pxm_to_node() and node_to_pxm()
Consolidate the various arch-specific implementations of pxm_to_node() and
node_to_pxm() into a single generic version.

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:48 -07:00
Daniel Yeisley
0d01532451 [PATCH] x86_64: Handle empty node zero
From: Daniel Yeisley <dan.yeisley@unisys.com>

It is possible to boot a Unisys ES7000 with CPUs from multiple cells, and not
also include the memory from those cells.  This can create a scenario where
node 0 has cpus, but no associated memory.  The system will boot fine in a
configuration where node 0 has memory, but nodes 2 and 3 do not.

[AK: I rechecked the code and generic code seems to indeed handle that already.
Dan's original patch had a change for mm/slab.c that seems to be already in now.]

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-30 20:31:06 -07:00
Andi Kleen
fad7906d16 [PATCH] x86_64: Fix memory hotadd heuristics
This fixes some boot failures on Dell and Unisys systems with memory
hotadd added.

 - Set hotadd_percent to 0 by default.  This means anybody using hotadd
   memory needs to specify the value on the command line.  That's
   because there are lots of Intel boxes which have a bogus hotplug area
   in their SRAT and they would waste a lot of memory before.
 - Fix calculation of how much memory to use when the hotplug area
   exceeds hotadd_percent
 - Fix fallback when the
 - Fix fallback if memory hotadd is not compiled in.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-16 07:59:31 -07:00
Andy Whitcroft
3b5fd59fdd [PATCH] x86_64: sparsemem does not need node_mem_map
Seems we are trying to init the node_mem_map when we don't need to, for
example when SPARSEMEM is enabled.  This causes the error below during
compilation.  Use CONFIG_FLAT_NODE_MEM_MAP to gate allocation and init.

  arch/x86_64/mm/numa.c: In function `setup_node_zones':
  arch/x86_64/mm/numa.c:191: error: structure has no member
                                                  named `node_mem_map'

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-22 09:19:52 -07:00
Arjan van de Ven
eee5a9fa63 [PATCH] x86_64: Rename e820_mapped to e820_any_mapped
Rename e820_mapped to e820_any_mapped since it tests if any part of the
range is mapped according to the type.

Later steps will introduce e820_all_mapped which will check if the
entire range is mapped with the type.  Both have their merit.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-09 11:53:17 -07:00
Andi Kleen
a8062231d8 [PATCH] x86_64: Handle empty PXMs that only contain hotplug memory
The node setup code would try to allocate the node metadata in the node
itself, but that fails if there is no memory in there.

This can happen with memory hotplug when the hotplug area defines an so
far empty node.

Now use bootmem to try to allocate the mem_map in other nodes.

And if it fails don't panic, but just ignore the node.

To make this work I added a new __alloc_bootmem_nopanic function that
does what its name implies.

TBD should try to use nearby nodes here.  Currently we just use any.
It's hard to do it better because bootmem doesn't have proper fallback
lists yet.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-09 11:53:16 -07:00
Andi Kleen
68a3a7feb0 [PATCH] x86_64: Reserve SRAT hotadd memory on x86-64
From: Keith Mannthey, Andi Kleen

Implement memory hotadd without sparsemem. The memory in the SRAT
hotadd area is just preserved instead and can be activated later.

There are a few restrictions:
- Only one continuous hotadd area allowed per node

The main problem is dealing with the many buggy SRAT tables
that are out there. The strategy here is to reject anything
suspicious.

Originally from Keith Mannthey, with several hacks and changes by AK
and also contributions from Andrew Morton

[ TBD: Problems pointed out by KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>:

 1) Goto's rebuild_zonelist patch will not work if CONFIG_MEMORY_HOTPLUG=n.

    Rebuilding zonelist is necessary when the system has just memory <
    4G at boot, and hot add memory > 4G.  because x86_64 has DMA32,
    ZONE_NORAML is not included into zonelist at boot time if system
    doesn't have memory >4G at boot.

    [AK: should just force the higher zones at boot time when SRAT tells us]

 2) zone and node's spanned_pages and present_pages are not incremented.
    They should be.

    For example, our server (ia64/Fujitsu PrimeQuest) can equip memory
    from 4G to 1T(maybe 2T in future), and SRAT will *always* say we have
    possible 1T +memory.  (Microsoft requires "write all possible memory
    in SRAT") When we reserve memmap for possible 1T memory, Linux will
    not work well in +minimum 4G configuraion ;)

    [AK: needs limiting to 5-10% of max memory]
 ]

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-09 11:53:16 -07:00
Andi Kleen
9d99aaa31f [PATCH] x86_64: Support memory hotadd without sparsemem
Memory hotadd doesn't need SPARSEMEM, but can be handled by just preallocating
mem_maps. This only needs some untangling of ifdefs to enable the necessary
code even without SPARSEMEM.

Originally from Keith Mannthey, hacked by AK.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-09 11:53:16 -07:00
OGAWA Hirofumi
9b41046cd0 [PATCH] Don't pass boot parameters to argv_init[]
The boot cmdline is parsed in parse_early_param() and
parse_args(,unknown_bootoption).

And __setup() is used in obsolete_checksetup().

	start_kernel()
		-> parse_args()
			-> unknown_bootoption()
				-> obsolete_checksetup()

If __setup()'s callback (->setup_func()) returns 1 in
obsolete_checksetup(), obsolete_checksetup() thinks a parameter was
handled.

If ->setup_func() returns 0, obsolete_checksetup() tries other
->setup_func().  If all ->setup_func() that matched a parameter returns 0,
a parameter is seted to argv_init[].

Then, when runing /sbin/init or init=app, argv_init[] is passed to the app.
If the app doesn't ignore those arguments, it will warning and exit.

This patch fixes a wrong usage of it, however fixes obvious one only.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31 12:18:53 -08:00
KAMEZAWA Hiroyuki
ec936fc563 [PATCH] for_each_online_pgdat: renaming for_each_pgdat
Replace for_each_pgdat() with for_each_online_pgdat().

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:48 -08:00
KAMEZAWA Hiroyuki
dc8ecb4370 [PATCH] unify pfn_to_page: x86_64 pfn_to_page
x86_64 can use generic funcs.
For DISCONTIGMEM, CONFIG_OUT_OF_LINE_PFN_TO_PAGE is selected.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:44 -08:00
Eric Dumazet
dcf36bfa5d [PATCH] x86_64: group memnodemap and memnodeshift in a memnode structure
pfn_to_page() and others need to access both memnode_shift and the very
first bytes of memnodemap[]. If we force memnode_shift to be just before the
memnodemap array, we can reduce the memory footprint to one cache line
instead of two for most setups. This patch introduce a 'memnode' structure
where shift and map[] are carefully placed.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 09:14:38 -08:00
Andi Kleen
267b48014a [PATCH] x86_64: Try to allocate node memmap near the end of node
This fixes problems with very large nodes (over 128GB) filling up all of
the first 4GB with their mem_map and not leaving enough space for the
swiotlb.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 09:10:56 -08:00
Andi Kleen
5f44a66980 [PATCH] x86_64: Add __init to fixmap functions that are only called during boot
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 09:10:55 -08:00
Andi Kleen
f2d3efedbe [PATCH] x86_64: Implement early DMI scanning
There are more and more cases where we need to know DMI information
early to work around bugs.  i386 already had early DMI scanning, but
x86-64 didn't.  Implement this now.

This required some cleanup in the i386 code.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 09:10:55 -08:00
Arjan van de Ven
a9ba9a3b38 [PATCH] x86_64: prefetch the mmap_sem in the fault path
In a micro-benchmark that stresses the pagefault path, the down_read_trylock
on the mmap_sem showed up quite high on the profile. Turns out this lock is
bouncing between cpus quite a bit and thus is cache-cold a lot. This patch
prefetches the lock (for write) as early as possible (and before some other
somewhat expensive operations). With this patch, the down_read_trylock
basically fell out of the top of profile.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 09:10:54 -08:00
Jan Beulich
8c914cb704 [PATCH] x86_64: actively synchronize vmalloc area when registering certain callbacks
While the modular aspect of the respective i386 patch doesn't apply to
x86-64 (as the top level page directory entry is shared between modules
and the base kernel), handlers registered with register_die_notifier()
are still under similar constraints for touching ioremap()ed or
vmalloc()ed memory. The likelihood of this problem becoming visible is
of course significantly lower, as the assigned virtual addresses would
have to cross a 2**39 byte boundary. This is because the callback gets
invoked
(a) in the page fault path before the top level page table propagation
gets carried out (hence a fault to propagate the top level page table
entry/entries mapping to module's code/data would nest infinitly) and
(b) in the NMI path, where nested faults must absolutely not happen,
since otherwise the IRET from the nested fault re-enables NMIs,
potentially resulting in nested NMI occurences.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 09:10:53 -08:00
Andi Kleen
abe059e759 [PATCH] x86_64: Rename struct node in x86-64 NUMA code to struct bootnode
It conflicts with the struct node in node.h
Actually the x86-64 version was there first, but ..

Suggested by Jan Beulich

Cc: jbeulich@novell.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 09:10:52 -08:00
Jan Beulich
c7ea1a96ec [PATCH] x86_64: Use correct PUD for memory hotadd
Memory >39bits has a different PUD.

Cc: "Tolentino, Matthew E" <matthew.e.tolentino@intel.com>

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 09:10:52 -08:00
Nick Piggin
7835e98b2e [PATCH] remove set_page_count() outside mm/
set_page_count usage outside mm/ is limited to setting the refcount to 1.
Remove set_page_count from outside mm/, and replace those users with
init_page_count() and set_page_refcounted().

This allows more debug checking, and tighter control on how code is allowed
to play around with page->_count.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:02 -08:00
Nick Piggin
4fa4f53bf9 [PATCH] x86_64: pageattr remove __put_page
Remove page_count and __put_page from x86-64 pageattr

Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:01 -08:00
Nick Piggin
20aaffd6a6 [PATCH] x86_64: pageattr use single list
Use page->lru.next to implement the singly linked list of pages rather than
the struct deferred_page which needs to be allocated and freed for each
page.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:01 -08:00