Commit Graph

4794 Commits

Author SHA1 Message Date
Robert Richter
e2fee2761a OProfile: Rework oprofile_add_ibs_sample() function
Code looks much more cleaner now.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2008-10-15 20:47:31 +02:00
Xiantao Zhang
3de42dc094 KVM: Separate irq ack notification out of arch/x86/kvm/irq.c
Moving irq ack notification logic as common, and make
it shared with ia64 side.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 14:25:35 +02:00
Xiantao Zhang
8a98f6648a KVM: Move device assignment logic to common code
To share with other archs, this patch moves device assignment
logic to common parts.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 14:25:33 +02:00
Zhang xiantao
371c01b28e KVM: Device Assignment: Move vtd.c from arch/x86/kvm/ to virt/kvm/
Preparation for kvm/ia64 VT-d support.

Signed-off-by: Zhang xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 14:25:32 +02:00
Marcelo Tosatti
83dbc83a0d KVM: VMX: enable invlpg exiting if EPT is disabled
Manually disabling EPT via module option fails to re-enable INVLPG
exiting.

Reported-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 14:25:31 +02:00
Jan Kiszka
1b10bf31a5 KVM: x86: Silence various LAPIC-related host kernel messages
KVM-x86 dumps a lot of debug messages that have no meaning for normal
operation:
 - INIT de-assertion is ignored
 - SIPIs are sent and received
 - APIC writes are unaligned or < 4 byte long
   (Windows Server 2003 triggers this on SMP)

Degrade them to true debug messages, keeping the host kernel log clean
for real problems.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:30 +02:00
Weidong Han
e5fcfc821a KVM: Device Assignment: Map mmio pages into VT-d page table
Assigned device could DMA to mmio pages, so also need to map mmio pages
into VT-d page table.

Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:29 +02:00
Marcelo Tosatti
e48258009d KVM: PIC: enhance IPI avoidance
The PIC code makes little effort to avoid kvm_vcpu_kick(), resulting in
unnecessary guest exits in some conditions.

For example, if the timer interrupt is routed through the IOAPIC, IRR
for IRQ 0 will get set but not cleared, since the APIC is handling the
acks.

This means that everytime an interrupt < 16 is triggered, the priority
logic will find IRQ0 pending and send an IPI to vcpu0 (in case IRQ0 is
not masked, which is Linux's case).

Introduce a new variable isr_ack to represent the IRQ's for which the
guest has been signalled / cleared the ISR. Use it to avoid more than
one IPI per trigger-ack cycle, in addition to the avoidance when ISR is
set in get_priority().

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:28 +02:00
Marcelo Tosatti
582801a95d KVM: MMU: add "oos_shadow" parameter to disable oos
Subject says it all.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:27 +02:00
Marcelo Tosatti
0074ff63eb KVM: MMU: speed up mmu_unsync_walk
Cache the unsynced children information in a per-page bitmap.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:26 +02:00
Marcelo Tosatti
4731d4c7a0 KVM: MMU: out of sync shadow core
Allow guest pagetables to go out of sync.  Instead of emulating write
accesses to guest pagetables, or unshadowing them, we un-write-protect
the page table and allow the guest to modify it at will.  We rely on
invlpg executions to synchronize individual ptes, and will synchronize
the entire pagetable on tlb flushes.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:25 +02:00
Marcelo Tosatti
6844dec694 KVM: MMU: mmu_convert_notrap helper
Need to convert shadow_notrap_nonpresent -> shadow_trap_nonpresent when
unsyncing pages.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:24 +02:00
Marcelo Tosatti
0738541396 KVM: MMU: awareness of new kvm_mmu_zap_page behaviour
kvm_mmu_zap_page will soon zap the unsynced children of a page. Restart
list walk in such case.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:23 +02:00
Marcelo Tosatti
ad8cfbe3ff KVM: MMU: mmu_parent_walk
Introduce a function to walk all parents of a given page, invoking a handler.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:22 +02:00
Marcelo Tosatti
a7052897b3 KVM: x86: trap invlpg
With pages out of sync invlpg needs to be trapped. For now simply nuke
the entry.

Untested on AMD.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:21 +02:00
Marcelo Tosatti
0ba73cdadb KVM: MMU: sync roots on mmu reload
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:20 +02:00
Marcelo Tosatti
e8bc217aef KVM: MMU: mode specific sync_page
Examine guest pagetable and bring the shadow back in sync. Caller is responsible
for local TLB flush before re-entering guest mode.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:19 +02:00
Marcelo Tosatti
38187c830c KVM: MMU: do not write-protect large mappings
There is not much point in write protecting large mappings. This
can only happen when a page is shadowed during the window between
is_largepage_backed and mmu_lock acquision. Zap the entry instead, so
the next pagefault will find a shadowed page via is_largepage_backed and
fallback to 4k translations.

Simplifies out of sync shadow.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:18 +02:00
Marcelo Tosatti
a378b4e64c KVM: MMU: move local TLB flush to mmu_set_spte
Since the sync page path can collapse flushes.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:17 +02:00
Marcelo Tosatti
1e73f9dd88 KVM: MMU: split mmu_set_spte
Split the spte entry creation code into a new set_spte function.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:16 +02:00
Marcelo Tosatti
93a423e704 KVM: MMU: flush remote TLBs on large->normal entry overwrite
It is necessary to flush all TLB's when a large spte entry is
overwritten with a normal page directory pointer.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:15 +02:00
Harvey Harrison
a08546001c x86: pvclock: fix shadowed variable warning
arch/x86/kernel/pvclock.c:102:6: warning: symbol 'tsc_khz' shadows an earlier one
include/asm/tsc.h:18:21: originally declared here

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:14 +02:00
Gleb Natapov
af2152f545 KVM: don't enter guest after SIPI was received by a CPU
The vcpu should process pending SIPI message before entering guest mode again.
kvm_arch_vcpu_runnable() returns true if the vcpu is in SIPI state, so
we can't call it here.

Signed-off-by: Gleb Natapov <gleb@qumranet.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:09 +02:00
Harvey Harrison
2259e3a7a6 KVM: x86.c make kvm_load_realmode_segment static
Noticed by sparse:
arch/x86/kvm/x86.c:3591:5: warning: symbol 'kvm_load_realmode_segment' was not declared. Should it be static?

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:07 +02:00
Marcelo Tosatti
4c2155ce81 KVM: switch to get_user_pages_fast
Convert gfn_to_pfn to use get_user_pages_fast, which can do lockless
pagetable lookups on x86. Kernel compilation on 4-way guest is 3.7%
faster on VMX.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:06 +02:00
Amit Shah
bfadaded0d KVM: Device Assignment: Free device structures if IRQ allocation fails
When an IRQ allocation fails, we free up the device structures and
disable the device so that we can unregister the device in the
userspace and not expose it to the guest at all.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:04 +02:00
Ben-Ami Yassour
62c476c7c7 KVM: Device Assignment with VT-d
Based on a patch by: Kay, Allen M <allen.m.kay@intel.com>

This patch enables PCI device assignment based on VT-d support.
When a device is assigned to the guest, the guest memory is pinned and
the mapping is updated in the VT-d IOMMU.

[Amit: Expose KVM_CAP_IOMMU so we can check if an IOMMU is present
and also control enable/disable from userspace]

Signed-off-by: Kay, Allen M <allen.m.kay@intel.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
Signed-off-by: Amit Shah <amit.shah@qumranet.com>

Acked-by: Mark Gross <mgross@linux.intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 14:25:04 +02:00
Ingo Molnar
6b2ada8210 Merge branches 'core/softlockup', 'core/softirq', 'core/resources', 'core/printk' and 'core/misc' into core-v28-for-linus 2008-10-15 12:48:44 +02:00
Guillaume Thouvenin
aa3a816b6d KVM: x86 emulator: Use DstAcc for 'and'
For instruction 'and al,imm' we use DstAcc instead of doing
the emulation directly into the instruction's opcode.

Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:16:14 +02:00
Guillaume Thouvenin
8a9fee67fb KVM: x86 emulator: Add cmp al, imm and cmp ax, imm instructions (ocodes 3c, 3d)
Add decode entries for these opcodes; execution is already implemented.

Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:16:14 +02:00
Guillaume Thouvenin
9c9fddd0e7 KVM: x86 emulator: Add DstAcc operand type
Add DstAcc operand type. That means that there are 4 bits now for
DstMask.

"In the good old days cpus would have only one register that was able to
 fully participate in arithmetic operations, typically called A for
 Accumulator.  The x86 retains this tradition by having special, shorter
 encodings for the A register (like the cmp opcode), and even some
 instructions that only operate on A (like mul).

 SrcAcc and DstAcc would accommodate these instructions by decoding A
 into the corresponding 'struct operand'."
  -- Avi Kivity

Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:16:14 +02:00
Sheng Yang
defed7ed92 x86: Move FEATURE_CONTROL bits to msr-index.h
For MSR_IA32_FEATURE_CONTROL is already there.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:16:14 +02:00
Sheng Yang
9ea542facb KVM: VMX: Rename IA32_FEATURE_CONTROL bits
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:16:14 +02:00
Avi Kivity
ef46f18ea0 KVM: x86 emulator: fix jmp r/m64 instruction
jmp r/m64 doesn't require the rex.w prefix to indicate the operand size
is 64 bits.  Set the Stack attribute (even though it doesn't involve the
stack, really) to indicate this.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:27 +02:00
Jan Kiszka
4b92fe0c9d KVM: VMX: Cleanup stalled INTR_INFO read
Commit 1c0f4f5011829dac96347b5f84ba37c2252e1e08 left a useless access
of VM_ENTRY_INTR_INFO_FIELD in vmx_intr_assist behind. Clean this up.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:26 +02:00
Marcelo Tosatti
9c3e4aab5a KVM: x86: unhalt vcpu0 on reset
Since "KVM: x86: do not execute halted vcpus", HLT by vcpu0 before system
reset by the IO thread will hang the guest.

Mark vcpu as runnable in such case.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:26 +02:00
Mohammed Gamal
d19292e457 KVM: x86 emulator: Add call near absolute instruction (opcode 0xff/2)
Add call near absolute instruction.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:26 +02:00
Marcelo Tosatti
d76901750a KVM: x86: do not execute halted vcpus
Offline or uninitialized vcpu's can be executed if requested to perform
userspace work.

Follow Avi's suggestion to handle halted vcpu's in the main loop,
simplifying kvm_emulate_halt(). Introduce a new vcpu->requests bit to
indicate events that promote state from halted to running.

Also standardize vcpu wake sites.

Signed-off-by: Marcelo Tosatti <mtosatti <at> redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:26 +02:00
Mohammed Gamal
a6a3034cb9 KVM: x86 emulator: Add in/out instructions (opcodes 0xe4-0xe7, 0xec-0xef)
The patch adds in/out instructions to the x86 emulator.

The instruction was encountered while running the BIOS while using
the invalid guest state emulation patch.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:25 +02:00
Avi Kivity
fa89a81766 KVM: Add statistics for guest irq injections
These can help show whether a guest is making progress or not.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:25 +02:00
Sheng Yang
d40a1ee485 KVM: MMU: Modify kvm_shadow_walk.entry to accept u64 addr
EPT is 4 level by default in 32pae(48 bits), but the addr parameter
of kvm_shadow_walk->entry() only accept unsigned long as virtual
address, which is 32bit in 32pae. This result in SHADOW_PT_INDEX()
overflow when try to fetch level 4 index.

Fix it by extend kvm_shadow_walk->entry() to accept 64bit addr in
parameter.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:25 +02:00
Mohammed Gamal
fb4616f431 KVM: x86 emulator: Add std and cld instructions (opcodes 0xfc-0xfd)
This adds the std and cld instructions to the emulator.

Encountered while running the BIOS with invalid guest
state emulation enabled.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:25 +02:00
Joerg Roedel
a89c1ad270 KVM: add MC5_MISC msr read support
Currently KVM implements MC0-MC4_MISC read support. When booting Linux this
results in KVM warnings in the kernel log when the guest tries to read
MC5_MISC. Fix this warnings with this patch.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:24 +02:00
Avi Kivity
48d1503949 KVM: SVM: No need to unprotect memory during event injection when using npt
No memory is protected anyway.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:24 +02:00
Avi Kivity
3201b5d9f0 KVM: MMU: Fix setting the accessed bit on non-speculative sptes
The accessed bit was accidentally turned on in a random flag word, rather
than, the spte itself, which was lucky, since it used the non-EPT compatible
PT_ACCESSED_MASK.

Fix by turning the bit on in the spte and changing it to use the portable
accessed mask.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:24 +02:00
Avi Kivity
171d595d3b KVM: MMU: Flush tlbs after clearing write permission when accessing dirty log
Otherwise, the cpu may allow writes to the tracked pages, and we lose
some display bits or fail to migrate correctly.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:24 +02:00
Avi Kivity
2245a28fe2 KVM: MMU: Add locking around kvm_mmu_slot_remove_write_access()
It was generally safe due to slots_lock being held for write, but it wasn't
very nice.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:24 +02:00
Avi Kivity
bc2d429979 KVM: MMU: Account for npt/ept/realmode page faults
Now that two-dimensional paging is becoming common, account for tdp page
faults.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:23 +02:00
Mohammed Gamal
a5e2e82b8b KVM: x86 emulator: Add mov r, imm instructions (opcodes 0xb0-0xbf)
The emulator only supported one instance of mov r, imm instruction
(opcode 0xb8), this adds the rest of these instructions.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:23 +02:00
Avi Kivity
acee3c04e8 KVM: Allocate guest memory as MAP_PRIVATE, not MAP_SHARED
There is no reason to share internal memory slots with fork()ed instances.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:23 +02:00
Avi Kivity
abb9e0b8e3 KVM: MMU: Convert the paging mode shadow walk to use the generic walker
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:23 +02:00
Avi Kivity
140754bc80 KVM: MMU: Convert direct maps to use the generic shadow walker
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:23 +02:00
Avi Kivity
3d000db568 KVM: MMU: Add generic shadow walker
We currently walk the shadow page tables in two places: direct map (for
real mode and two dimensional paging) and paging mode shadow.  Since we
anticipate requiring a third walk (for invlpg), it makes sense to have
a generic facility for shadow walk.

This patch adds such a shadow walker, walks the page tables and calls a
method for every spte encountered.  The method can examine the spte,
modify it, or even instantiate it.  The walk can be aborted by returning
nonzero from the method.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:23 +02:00
Avi Kivity
6c41f428b7 KVM: MMU: Infer shadow root level in direct_map()
In all cases the shadow root level is available in mmu.shadow_root_level,
so there is no need to pass it as a parameter.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:22 +02:00
Avi Kivity
6e37d3dc3e KVM: MMU: Unify direct map 4K and large page paths
The two paths are equivalent except for one argument, which is already
available.  Merge the two codepaths.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:22 +02:00
Avi Kivity
135f8c2b07 KVM: MMU: Move SHADOW_PT_INDEX to mmu.c
It is not specific to the paging mode, so can be made global (and reusable).

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:22 +02:00
Avi Kivity
6eb06cb286 KVM: x86 emulator: remove bad ByteOp specifier from NEG descriptor
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:22 +02:00
roel kluin
41afa02587 KVM: x86 emulator: remove duplicate SrcImm
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:21 +02:00
Avi Kivity
f4bbd9aaaa KVM: Load real mode segments correctly
Real mode segments to not reference the GDT or LDT; they simply compute
base = selector * 16.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:21 +02:00
Avi Kivity
a16b20da87 KVM: VMX: Change segment dpl at reset to 3
This is more emulation friendly, if not 100% correct.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:21 +02:00
Avi Kivity
5706be0daf KVM: VMX: Change cs reset state to be a data segment
Real mode cs is a data segment, not a code segment.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:21 +02:00
Harvey Harrison
ee032c993e KVM: make irq ack notifier functions static
sparse says:

arch/x86/kvm/x86.c:107:32: warning: symbol 'kvm_find_assigned_dev' was not declared. Should it be static?
arch/x86/kvm/i8254.c:225:6: warning: symbol 'kvm_pit_ack_irq' was not declared. Should it be static?

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:21 +02:00
Amit Shah
29c8fa32c5 KVM: Use kvm_set_irq to inject interrupts
... instead of using the pic and ioapic variants

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:21 +02:00
Amit Shah
94c935a1ee KVM: SVM: Fix typo
Fix typo in as-yet unused macro definition.

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:20 +02:00
Mohammed Gamal
a89a8fb93b KVM: VMX: Modify mode switching and vmentry functions
This patch modifies mode switching and vmentry function in order to
drive invalid guest state emulation.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:20 +02:00
Mohammed Gamal
ea953ef0ca KVM: VMX: Add invalid guest state handler
This adds the invalid guest state handler function which invokes the x86
emulator until getting the guest to a VMX-friendly state.

[avi: leave atomic context if scheduling]
[guillaume: return to atomic context correctly]

Signed-off-by: Laurent Vivier <laurent.vivier@bull.net>
Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:20 +02:00
Mohammed Gamal
04fa4d3211 KVM: VMX: Add module parameter and emulation flag.
The patch adds the module parameter required to enable emulating invalid
guest state, as well as the emulation_required flag used to drive
emulation whenever needed.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:20 +02:00
Mohammed Gamal
648dfaa7df KVM: VMX: Add Guest State Validity Checks
This patch adds functions to check whether guest state is VMX compliant.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:20 +02:00
Amit Shah
6762b7299a KVM: Device assignment: Check for privileges before assigning irq
Even though we don't share irqs at the moment, we should ensure
regular user processes don't try to allocate system resources.

We check for capability to access IO devices (CAP_SYS_RAWIO) before
we request_irq on behalf of the guest.

Noticed by Avi.

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:20 +02:00
Avi Kivity
dc7404cea3 KVM: Handle spurious acks for PIT interrupts
Spurious acks can be generated, for example if the PIC is being reset.
Handle those acks gracefully rather than flooding the log with warnings.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:19 +02:00
Marcelo Tosatti
85428ac7c3 KVM: fix i8259 reset irq acking
The irq ack during pic reset has three problems:

- Ignores slave/master PIC, using gsi 0-8 for both.
- Generates an ACK even if the APIC is in control.
- Depends upon IMR being clear, which is broken if the irq was masked
at the time it was generated.

The last one causes the BIOS to hang after the first reboot of
Windows installation, since PIT interrupts stop.

[avi: fix check whether pic interrupts are seen by cpu]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:19 +02:00
Avi Kivity
ecfc79c700 KVM: VMX: Use interrupt queue for !irqchip_in_kernel
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:19 +02:00
Marcelo Tosatti
29415c37f0 KVM: set debug registers after "schedulable" section
The vcpu thread can be preempted after the guest_debug_pre() callback,
resulting in invalid debug registers on the new vcpu.

Move it inside the non-preemptable section.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:19 +02:00
Sheng Yang
464d17c8b7 KVM: VMX: Clean up magic number 0x66 in init_rmode_tss
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:19 +02:00
Dave Hansen
6ad18fba05 KVM: Reduce stack usage in kvm_pv_mmu_op()
We're in a hot path.  We can't use kmalloc() because
it might impact performance.  So, we just stick the buffer that
we need into the kvm_vcpu_arch structure.  This is used very
often, so it is not really a waste.

We also have to move the buffer structure's definition to the
arch-specific x86 kvm header.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:18 +02:00
Dave Hansen
b772ff362e KVM: Reduce stack usage in kvm_arch_vcpu_ioctl()
[sheng: fix KVM_GET_LAPIC using wrong size]

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:18 +02:00
Dave Hansen
f0d662759a KVM: Reduce kvm stack usage in kvm_arch_vm_ioctl()
On my machine with gcc 3.4, kvm uses ~2k of stack in a few
select functions.  This is mostly because gcc fails to
notice that the different case: statements could have their
stack usage combined.  It overflows very nicely if interrupts
happen during one of these large uses.

This patch uses two methods for reducing stack usage.
1. dynamically allocate large objects instead of putting
   on the stack.
2. Use a union{} member for all of the case variables. This
   tricks gcc into combining them all into a single stack
   allocation. (There's also a comment on this)

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:18 +02:00
Ben-Ami Yassour
4d5c5d0fe8 KVM: pci device assignment
Based on a patch from: Amit Shah <amit.shah@qumranet.com>

This patch adds support for handling PCI devices that are assigned to
the guest.

The device to be assigned to the guest is registered in the host kernel
and interrupt delivery is handled.  If a device is already assigned, or
the device driver for it is still loaded on the host, the device
assignment is failed by conveying a -EBUSY reply to the userspace.

Devices that share their interrupt line are not supported at the moment.

By itself, this patch will not make devices work within the guest.
The VT-d extension is required to enable the device to perform DMA.
Another alternative is PVDMA.

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:18 +02:00
Glauber Costa
0293615f3f x86: KVM guest: use paravirt function to calculate cpu khz
We're currently facing timing problems in guests that do
calibration under heavy load, and then the load vanishes.
This means we'll have a much lower lpj than we actually should,
and delays end up taking less time than they should, which is a
nasty bug.

Solution is to pass on the lpj value from host to guest, and have it
preset.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:17 +02:00
Glauber Costa
3807f345b2 x86: paravirt: factor out cpu_khz to common code
KVM intends to use paravirt code to calibrate khz. Xen
current code will do just fine. So as a first step, factor out
code to pvclock.c.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:17 +02:00
Marcelo Tosatti
3cf57fed21 KVM: PIT: fix injection logic and count
The PIT injection logic is problematic under the following cases:

1) If there is a higher priority vector to be delivered by the time
kvm_pit_timer_intr_post is invoked ps->inject_pending won't be set.
This opens the possibility for missing many PIT event injections (say if
guest executes hlt at this point).

2) ps->inject_pending is racy with more than two vcpus. Since there's no locking
around read/dec of pt->pending, two vcpu's can inject two interrupts for a single
pt->pending count.

Fix 1 by using an irq ack notifier: only reinject when the previous irq
has been acked. Fix 2 with appropriate locking around manipulation of
pending count and irq_ack by the injection / ack paths.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:17 +02:00
Marcelo Tosatti
f52447261b KVM: irq ack notification
Based on a patch from: Ben-Ami Yassour <benami@il.ibm.com>
which was based on a patch from: Amit Shah <amit.shah@qumranet.com>

Notify IRQ acking on PIC/APIC emulation. The previous patch missed two things:

- Edge triggered interrupts on IOAPIC
- PIC reset with IRR/ISR set should be equivalent to ack (LAPIC probably
needs something similar).

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
CC: Amit Shah <amit.shah@qumranet.com>
CC: Ben-Ami Yassour <benami@il.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:16 +02:00
Avi Kivity
564f15378f KVM: Add irq ack notifier list
This can be used by kvm subsystems that are interested in when
interrupts are acked, for example time drift compensation.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:16 +02:00
Alexander Graf
b5e2fec0eb KVM: Ignore DEBUGCTL MSRs with no effect
Netware writes to DEBUGCTL and reads from the DEBUGCTL and LAST*IP MSRs
without further checks and is really confused to receive a #GP during that.
To make it happy we should just make them stubs, which is exactly what SVM
already does.

Writes to DEBUGCTL that are vendor-specific are resembled to behave as if the
virtual CPU does not know them.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:15 +02:00
Avi Kivity
313dbd49dc KVM: VMX: Avoid vmwrite(HOST_RSP) when possible
Usually HOST_RSP retains its value across guest entries.  Take advantage
of this and avoid a vmwrite() when this is so.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:15 +02:00
Avi Kivity
80e31d4f61 KVM: SVM: Unify register save/restore across 32 and 64 bit hosts
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:14 +02:00
Avi Kivity
c801949ddf KVM: VMX: Unify register save/restore across 32 and 64 bit hosts
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:14 +02:00
Jan Kiszka
77ab6db0a1 KVM: VMX: Reinject real mode exception
As we execute real mode guests in VM86 mode, exception have to be
reinjected appropriately when the guest triggered them. For this purpose
the patch adopts the real-mode injection pattern used in vmx_inject_irq
to vmx_queue_exception, additionally taking care that the IP is set
correctly for #BP exceptions. Furthermore it extends
handle_rmode_exception to reinject all those exceptions that can be
raised in real mode.

This fixes the execution of himem.exe from FreeDOS and also makes its
debug.com work properly.

Note that guest debugging in real mode is broken now. This has to be
fixed by the scheduled debugging infrastructure rework (will be done
once base patches for QEMU have been accepted).

Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:14 +02:00
Jan Kiszka
19bd8afdc4 KVM: Consolidate XX_VECTOR defines
Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:14 +02:00
Avi Kivity
7edd0ce058 KVM: Consolidate PIC isr clearing into a function
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:14 +02:00
Mohammed Gamal
60bd83a125 KVM: VMX: Remove redundant check in handle_rmode_exception
Since checking for vcpu->arch.rmode.active is already done whenever we
call handle_rmode_exception(), checking it inside the function is redundant.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:13 +02:00
Avi Kivity
f7d9238f5d KVM: VMX: Move interrupt post-processing to vmx_complete_interrupts()
Instead of looking at failed injections in the vm entry path, move
processing to the exit path in vmx_complete_interrupts().  This simplifes
the logic and removes any state that is hidden in vmx registers.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:13 +02:00
Avi Kivity
937a7eaef9 KVM: Add a pending interrupt queue
Similar to the exception queue, this hold interrupts that have been
accepted by the virtual processor core but not yet injected.

Not yet used.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:13 +02:00
Avi Kivity
35920a3569 KVM: VMX: Fix pending exception processing
The vmx code assumes that IDT-Vectoring can only be set when an exception
is injected due to the exception in question.  That's not true, however:
if the exception is injected correctly, and later another exception occurs
but its delivery is blocked due to a fault, then we will incorrectly assume
the first exception was not delivered.

Fix by unconditionally dequeuing the pending exception, and requeuing it
(or the second exception) if we see it in the IDT-Vectoring field.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:13 +02:00
Avi Kivity
26eef70c3e KVM: Clear exception queue before emulating an instruction
If we're emulating an instruction, either it will succeed, in which case
any previously queued exception will be spurious, or we will requeue the
same exception.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:13 +02:00
Avi Kivity
668f612fa0 KVM: VMX: Move nmi injection failure processing to vm exit path
Instead of processing nmi injection failure in the vm entry path, move
it to the vm exit path (vm_complete_interrupts()).  This separates nmi
injection from nmi post-processing, and moves the nmi state from the VT
state into vcpu state (new variable nmi_injected specifying an injection
in progress).

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:13 +02:00
Avi Kivity
cf393f7566 KVM: Move NMI IRET fault processing to new vmx_complete_interrupts()
Currently most interrupt exit processing is handled on the entry path,
which is confusing.  Move the NMI IRET fault processing to a new function,
vmx_complete_interrupts(), which is called on the vmexit path.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:12 +02:00
Avi Kivity
5b5c6a5a60 KVM: MMU: Simplify kvm_mmu_zap_page()
The twisty maze of conditionals can be reduced.

[joerg: fix tlb flushing]

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:12 +02:00
Avi Kivity
31aa2b44af KVM: MMU: Separate the code for unlinking a shadow page from its parents
Place into own function, in preparation for further cleanups.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:12 +02:00
Amit Shah
867767a365 KVM: Introduce kvm_set_irq to inject interrupts in guests
This function injects an interrupt into the guest given the kvm struct,
the (guest) irq number and the interrupt level.

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:12 +02:00
Marcelo Tosatti
5fdbf9765b KVM: x86: accessors for guest registers
As suggested by Avi, introduce accessors to read/write guest registers.
This simplifies the ->cache_regs/->decache_regs interface, and improves
register caching which is important for VMX, where the cost of
vmcs_read/vmcs_write is significant.

[avi: fix warnings]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:13:57 +02:00
Sheng Yang
ca60dfbb69 KVM: VMX: Rename misnamed msr bits
MSR_IA32_FEATURE_LOCKED is just a bit in fact, which shouldn't be prefixed with
MSR_.  So is MSR_IA32_FEATURE_VMXON_ENABLED.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:13:57 +02:00
Bjorn Helgaas
758a7f7bb8 x86: register a platform RTC device if PNP doesn't describe it
Most if not all x86 platforms have an RTC device, but sometimes the RTC
is not exposed as a PNP0b00/PNP0b01/PNP0b02 device in PNPBIOS or ACPI:

    http://bugzilla.kernel.org/show_bug.cgi?id=11580
    https://bugzilla.redhat.com/show_bug.cgi?id=451188

It's best if we can discover the RTC via PNP because then we know
which flavor of device it is, where it lives, and which IRQ it uses.

But if we can't, we should register a platform device using the
compiled-in RTC_PORT/RTC_IRQ resource assumptions.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Reported-by: Rik Theys <rik.theys@esat.kuleuven.be>
Reported-by: shr_msn@yahoo.com.tw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-14 16:30:14 -07:00
Anders Kaseorg
8b27386a9c ftrace: make ftrace_test_p6nop disassembler-friendly
Commit 4c3dc21b136f8cb4b72afee16c3ba7e961656c0b in tip introduced the
5-byte NOP ftrace_test_p6nop:

   jmp . + 5
   .byte 0x00, 0x00, 0x00

This is not friendly to disassemblers because an odd number of 0x00s
ends in the middle of an instruction boundary.  This changes the 0x00s
to 1-byte NOPs (0x90).

Signed-off-by: Anders Kaseorg <andersk@mit.edu>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:39:29 +02:00
Frédéric Weisbecker
ac2b86fdef x86/ftrace: use uaccess in atomic context
With latest -tip I get this bug:

[   49.439988] in_atomic():0, irqs_disabled():1
[   49.440118] INFO: lockdep is turned off.
[   49.440118] Pid: 2814, comm: modprobe Tainted: G        W 2.6.27-rc7 #4
[   49.440118]  [<c01215e1>] __might_sleep+0xe1/0x120
[   49.440118]  [<c01148ea>] ftrace_modify_code+0x2a/0xd0
[   49.440118]  [<c01148a2>] ? ftrace_test_p6nop+0x0/0xa
[   49.440118]  [<c016e80e>] __ftrace_update_code+0xfe/0x2f0
[   49.440118]  [<c01148a2>] ? ftrace_test_p6nop+0x0/0xa
[   49.440118]  [<c016f190>] ftrace_convert_nops+0x50/0x80
[   49.440118]  [<c016f1d6>] ftrace_init_module+0x16/0x20
[   49.440118]  [<c015498b>] load_module+0x185b/0x1d30
[   49.440118]  [<c01767a0>] ? find_get_page+0x0/0xf0
[   49.440118]  [<c02463c0>] ? sprintf+0x0/0x30
[   49.440118]  [<c034e012>] ? mutex_lock_interruptible_nested+0x1f2/0x350
[   49.440118]  [<c0154eb3>] sys_init_module+0x53/0x1b0
[   49.440118]  [<c0352340>] ? do_page_fault+0x0/0x740
[   49.440118]  [<c0104012>] syscall_call+0x7/0xb
[   49.440118]  =======================

It is because ftrace_modify_code() calls copy_to_user and
copy_from_user.
These functions have been inserted after guessing that there
couldn't be any race condition but copy_[to/from]_user might
sleep and __ftrace_update_code is called with local_irq_saved.

These function have been inserted since this commit:
d5e92e8978fd2574e415dc2792c5eb592978243d:
"ftrace: x86 use copy from user function"

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:38:16 +02:00
Harvey Harrison
37a52f5ef1 x86: suppress trivial sparse signedness warnings
Could just as easily change the three casts to cast to the correct
type...this patch changes the type of ftrace_nop instead.

Supresses sparse warnings:

 arch/x86/kernel/ftrace.c:157:14: warning: incorrect type in assignment (different signedness)
 arch/x86/kernel/ftrace.c:157:14:    expected long *static [toplevel] ftrace_nop
 arch/x86/kernel/ftrace.c:157:14:    got unsigned long *<noident>
 arch/x86/kernel/ftrace.c:161:14: warning: incorrect type in assignment (different signedness)
 arch/x86/kernel/ftrace.c:161:14:    expected long *static [toplevel] ftrace_nop
 arch/x86/kernel/ftrace.c:161:14:    got unsigned long *<noident>
 arch/x86/kernel/ftrace.c:165:14: warning: incorrect type in assignment (different signedness)
 arch/x86/kernel/ftrace.c:165:14:    expected long *static [toplevel] ftrace_nop
 arch/x86/kernel/ftrace.c:165:14:    got unsigned long *<noident>

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:38:14 +02:00
Pekka Paalanen
4427414170 mmiotrace: remove left-over marker cruft
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:37:17 +02:00
Pekka Paalanen
9e57fb35d7 x86 mmiotrace: implement mmiotrace_printk()
Offer mmiotrace users a function to inject markers from inside the kernel.
This depends on the trace_vprintk() patch.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:37:11 +02:00
Pekka Paalanen
bbe5c7830c x86 mmiotrace: fix a rare memory leak
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:37:01 +02:00
Steven Rostedt
6f93fc076a ftrace: x86 use copy to and from user functions
The modification of code is performed either by kstop_machine, before
SMP starts, or on module code before the module is executed. There is
no reason to do the modifications from assembly. The copy to and from
user functions are sufficient and produces cleaner and easier to read
code.

Thanks to Benjamin Herrenschmidt for suggesting the idea.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:36:03 +02:00
Steven Rostedt
732f3ca7d4 ftrace: use only 5 byte nops for x86
Mathieu Desnoyers revealed a bug in the original code. The nop that is
used to relpace the mcount caller can be a two part nop. This runs the
risk where a process can be preempted after executing the first nop, but
before the second part of the nop.

The ftrace code calls kstop_machine to keep multiple CPUs from executing
code that is being modified, but it does not protect against a task preempting
in the middle of a two part nop.

If the above preemption happens and the tracer is enabled, after the
kstop_machine runs, all those nops will be calls to the trace function.
If the preempted process that was preempted between the two nops is executed
again, it will execute half of the call to the trace function, and this
might crash the system.

This patch instead uses what both the latest Intel and AMD spec suggests.
That is the P6_NOP5 sequence of "0x0f 0x1f 0x44 0x00 0x00".

Note, some older CPUs and QEMU might fault on this nop, so this nop
is executed with fault handling first. If it detects a fault, it will then
use the code "0x66 0x66 0x66 0x66 0x90". If that faults, it will then
default to a simple "jmp 1f; .byte 0x00 0x00 0x00; 1:". The jmp is
not optimal but will do if the first two can not be executed.

TODO: Examine the cpuid to determine the nop to use.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:35:01 +02:00
Steven Rostedt
0a37605c22 ftrace: x86 mcount stub
x86 now sets up the mcount locations through the build and no longer
needs to record the ip when the function is executed. This patch changes
the initial mcount to simply return. There's no need to do any other work.
If the ftrace start up test fails, the original mcount will be what everything
will use, so having this as fast as possible is a good thing.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:34:58 +02:00
Steven Rostedt
e4b2b88661 ftrace: enable using mcount recording on x86
Enable the use of the __mcount_loc infrastructure on x86_64 and i386.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:34:54 +02:00
Ingo Molnar
8b1fa1d7b2 ftrace: mark lapic_wd_event() notrace
it can be called in the NMI path:

[    0.645999] calling  ftrace_dynamic_init+0x0/0xd6
[    0.647521] ------------[ cut here ]------------
[    0.647521] WARNING: at kernel/trace/ftrace.c:348 ftrace_record_ip+0x4e/0x252()
[    0.647521] Modules linked in:
[    0.647521] Pid: 15, comm: kstop1 Not tainted 2.6.27-rc1-tip #22686
[    0.647521]
[    0.647521] Call Trace:
[    0.647521]  <NMI>  [<ffffffff8024593f>] warn_on_slowpath+0x5d/0x84
[    0.647521]  [<ffffffff80220b99>] ? lapic_wd_event+0xb/0x5c
[    0.647521]  [<ffffffff80287b3b>] ftrace_record_ip+0x4e/0x252
[    0.647521]  [<ffffffff80211274>] mcount_call+0x5/0x31
[    0.647521]  [<ffffffff80220b9e>] ? lapic_wd_event+0x10/0x5c
[    0.647521]  [<ffffffff8083f3ec>] nmi_watchdog_tick+0x19d/0x1ad
[    0.647521]  [<ffffffff8083e875>] default_do_nmi+0x75/0x1e3
[    0.647521]  [<ffffffff8083f0b3>] do_nmi+0x5d/0x94
[    0.647521]  [<ffffffff8083e2d2>] nmi+0xa2/0xc2
[    0.647521]  [<ffffffff802b48c3>] ? check_bytes_and_report+0x11/0xcc
[    0.647521]  <<EOE>>  [<ffffffff80211274>] ? mcount_call+0x5/0x31
[    0.647521]  [<ffffffff802b49df>] check_object+0x61/0x1b0
[    0.647521]  [<ffffffff802b502a>] __slab_free+0x169/0x2ae
[    0.647521]  [<ffffffff80242dbf>] ? __cleanup_sighand+0x25/0x27
[    0.647521]  [<ffffffff80242dbf>] ? __cleanup_sighand+0x25/0x27
[    0.647521]  [<ffffffff802b60cd>] kmem_cache_free+0x85/0xb9
[    0.647521]  [<ffffffff80242dbf>] __cleanup_sighand+0x25/0x27
[    0.647521]  [<ffffffff80247b3d>] release_task+0x256/0x339
[    0.647521]  [<ffffffff802490b4>] do_exit+0x764/0x7ef
[    0.647521]  [<ffffffff8027624c>] __xchg+0x0/0x38
[    0.647521]  [<ffffffff8027619a>] ? stop_cpu+0x0/0xb2
[    0.647521]  [<ffffffff8027619a>] ? stop_cpu+0x0/0xb2
[    0.647521]  [<ffffffff8025922f>] kthread+0x4e/0x7b
[    0.647521]  [<ffffffff80212979>] child_rip+0xa/0x11
[    0.647521]  [<ffffffff80211c17>] ? restore_args+0x0/0x30
[    0.647521]  [<ffffffff802283a5>] ? native_load_tls+0x14/0x2e
[    0.647521]  [<ffffffff802591e1>] ? kthread+0x0/0x7b
[    0.647521]  [<ffffffff8021296f>] ? child_rip+0x0/0x11
[    0.647521]
[    0.647521] ---[ end trace 4eaa2a86a8e2da22 ]---
[    0.672032] initcall ftrace_dynamic_init+0x0/0xd6 returned 0 after 19 msecs

also mark it no-kprobes while at it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:34:36 +02:00
Pekka Paalanen
611b159768 x86: fix mmiotrace 8-bit register decoding
When SIL, DIL, BPL or SPL registers were used in MMIO, the datum
was extracted from AH, BH, CH, or DH, which are incorrect.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Cc: "Vegard Nossum" <vegard.nossum@gmail.com>
Cc: "Steven Rostedt" <srostedt@redhat.com>
Cc: proski@gnu.org
Cc: "Pekka Enberg"
	<penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:33:50 +02:00
Andi Kleen
59512900ba oprofile: discover counters for op ppro too
Discover number of counters for all family 6 models even when not
in arch perfmon mode.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2008-10-13 19:25:11 +02:00
Andi Kleen
b991702884 oprofile: Implement Intel architectural perfmon support
Newer Intel CPUs (Core1+) have support for architectural
events described in CPUID 0xA. See the IA32 SDM Vol3b.18 for details.

The advantage of this is that it can be done without knowing about
the specific CPU, because the CPU describes by itself what
performance events are supported. This is only a fallback
because only a limited set of 6 events are supported.
This allows to do profiling on Nehalem and on Atom systems
(later not tested)

This patch implements support for that in oprofile's Intel
Family 6 profiling module. It also has the advantage of supporting
an arbitary number of events now as reported by the CPU.
Also allow arbitary counter widths >32bit while we're at it.

Requires a patched oprofile userland to support the new
architecture.

v2: update for latest oprofile tree
    remove force_arch_perfmon

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2008-10-13 19:25:09 +02:00
Andi Kleen
f645f64064 oprofile: Don't report Nehalem as core_2
This essentially reverts Linus' earlier 4b9f12a377
commit. Nehalem is not core_2, so it shouldn't be reported as such.
However with the earlier arch perfmon patch it will fall back to
arch perfmon mode now, so there is no need to fake it as core_2.
The only drawback is that Linus will need to patch the arch perfmon
support into his oprofile binary now, but I think he can do that.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2008-10-13 19:25:06 +02:00
Andi Kleen
5d4488027d oprofile: drop const in num counters field
allow to modify it at runtime

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2008-10-13 19:25:04 +02:00
Linus Torvalds
244dc4e54b Merge git://git.infradead.org/users/dwmw2/random-2.6
* git://git.infradead.org/users/dwmw2/random-2.6:
  Fix autoloading of MacBook Pro backlight driver.
  Automatic MODULE_ALIAS() for DMI match tables.
  Remove asm/a.out.h files for all architectures without a.out support.
  Introduce HAVE_AOUT symbol to remove hard-coded arch list for BINFMT_AOUT
  Remove redundant CONFIG_ARCH_SUPPORTS_AOUT
  S390: Update comments about why we don't use <asm-generic/statfs.h>
  SPARC: Use <asm-generic/statfs.h>
  PowerPC: Use <asm-generic/statfs.h>
  PARISC: Use <asm-generic/statfs.h>
  x86_64: Use <asm-generic/statfs.h>
  IA64: Use <asm-generic/statfs.h>
  ARM: Use <asm-generic/statfs.h>
  Make <asm-generic/statfs.h> suitable for 64-bit platforms.
  Define and use PCI_DEVICE_ID_MARVELL_88ALP01_CCIC for CAFÉ camera driver
  [MTD] [NAND] Define and use PCI_DEVICE_ID_MARVELL_88ALP01_NAND for CAFÉ
  Use PCI_DEVICE_ID_88ALP01 for CAFÉ chip, rather than PCI_DEVICE_ID_CAFE.
  EFS: Don't set f_fsid in statfs().
2008-10-13 09:59:14 -07:00
David Woodhouse
e758936e02 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	include/asm-x86/statfs.h
2008-10-13 17:13:56 +01:00
Ingo Molnar
3a1dfe6eef x86/mm: unify init task OOM handling
Linus noticed that the "again:" versus "survive:" OOM logic for
the init task was arbitrarily different.

The 64-bit codepath is the better one, because it correctly re-lookups
the vma after having dropped the ->mmap_sem.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-13 18:11:13 +02:00
Linus Torvalds
891cffbd6b x86/mm: do not trigger a kernel warning if user-space disables interrupts and generates a page fault
Arjan reported a spike in the following bug pattern in v2.6.27:

   http://www.kerneloops.org/searchweek.php?search=lock_page

which happens because hwclock started triggering warnings due to
a (correct) might_sleep() check in the MM code.

The warning occurs because hwclock uses this dubious sequence of
code to run "atomic" code:

  static unsigned long
  atomic(const char *name, unsigned long (*op)(unsigned long),
         unsigned long arg)
  {
    unsigned long v;
    __asm__ volatile ("cli");
    v = (*op)(arg);
    __asm__ volatile ("sti");
    return v;
  }

Then it pagefaults in that "atomic" section, triggering the warning.

There is no way the kernel could provide "atomicity" in this path,
a page fault is a cannot-continue machine event so the kernel has to
wait for the page to be filled in.

Even if it was just a minor fault we'd have to take locks and might have
to spend quite a bit of time with interrupts disabled - not nice to irq
latencies in general.

So instead just enable interrupts in the pagefault path unconditionally
if we come from user-space, and handle the fault.

Also, while touching this code, unify some trivial parts of the x86
VM paths at the same time.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 17:46:39 +02:00
Ingo Molnar
c00193f9f0 Merge branches 'oprofile-v2' and 'timers/hpet' into x86/core-v4 2008-10-13 14:18:42 +02:00
Ingo Molnar
accba5f396 Merge branch 'linus' into oprofile-v2
Conflicts:
	arch/x86/kernel/apic_32.c
	arch/x86/oprofile/nmi_int.c
	include/linux/pci_ids.h
2008-10-13 11:05:51 +02:00
Ingo Molnar
c493756e2a Merge branch 'linus' into oprofile
Conflicts:
	arch/x86/kernel/apic_32.c
	include/linux/pci_ids.h
2008-10-13 10:52:30 +02:00
Yinghai Lu
c1a2f4b108 x86: change early_ioremap to use slots instead of nesting
so we could remove the requirement that one needs to call
early_iounmap() in exactly reverse order of early_ioremap().

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:34:23 +02:00
Jan Beulich
79aa10dd9f x86: adjust dependencies for CONFIG_X86_CMOV
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:50 +02:00
Alexander van Heukelum
6a2ae2d9f9 dumpstack: x86: various small unification steps, fix
After "dumpstack: x86: various small unification steps", the
assembler gives the following compile error. The error is in
dumpstack_64.c.

{standard input}: Assembler messages:
{standard input}:720: Error: Incorrect register `%rbx' used with `l' suffix
{standard input}:1340: Error: Incorrect register `%r12' used with `l' suffix

Indeed the suffix in get_bp() was wrong.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:49 +02:00
Thomas Gleixner
cb48bb5999 x86: remove additional_cpus
remove remainder of additional_cpus logic. We now just listen to the
disabled_cpus value like we did for years. disabled_cpus is always >=
0 so no need for an extra check.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:48 +02:00
Ingo Molnar
b807305059 x86: remove additional_cpus configurability
additional_cpus=<x> parameter is dangerous and broken: for example
if we boot additional_cpus=-2 on a stock dual-core system it will
crash the box on bootup.

So reduce the maze of code a bit by removingthe user-configurability
angle.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:47 +02:00
Thomas Gleixner
649c6653fa x86: improve UP kernel when CPU-hotplug and SMP is enabled
num_possible_cpus() can be > 1 when disabled CPUs have been accounted.

Disabled CPUs are not in the cpu_present_map, so we can use
num_present_cpus() as a safe indicator to switch to UP alternatives.

Reported-by: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:46 +02:00
Alexander van Heukelum
8a541665b9 dumpstack: x86: various small unification steps
- define STACKSLOTS_PER_LINE and use it
 - define get_bp macro to hide the %%ebp/%%rbp difference
 - i386: check task==NULL in dump_trace, like x86_64
 - i386: show_trace(NULL, ...) uses current automatically
 - x86_64: use [#%d] for die_counter, like i386
 - whitespace and comments

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:45 +02:00
Alexander van Heukelum
802a67de0c dumpstack: i386: make kstack= an early boot-param and add oops=panic
- make kstack= and early_param
 - add oops=panic, setting panic_on_oops

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:44 +02:00
Alexander van Heukelum
ca0a816403 dumpstack: x86: use log_lvl and unify trace formatting
- x86: Write log_lvl strings if available
 - start raw stack dumps on new line
 - i386: Remove extra indentation for raw stack dumps

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:43 +02:00
Alexander van Heukelum
2ac53721f3 dumptrace: x86: consistently include loglevel, print stack switch
- i386 and x86_64: always printk the 'data' parameter
 - i386: announce stack switch (irq -> normal)
 - i386: check if there is a stack switch before announcing it

There is a warning that 'context' might come out corrupt in early
boot. If this is true it should be fixed, not worked around.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:42 +02:00
Alexander van Heukelum
3a18512db0 dumpstack: x86: add "end" parameter to valid_stack_ptr and print_context_stack
- Add "end" parameter to valid_stack_ptr and print_context_stack
 - use sizeof(long) as the size of a word on the stack

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:41 +02:00
Alexander van Heukelum
161827903b dumpstack: x86: make printk_address equal
- x86_64: use %p to print an address
 - make i386-version the same as the above

The result should be the same on x86_64; on i386 the
output only changes if CONFIG_KALLSYMS is turned off,
in which case the address is printed twice.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:40 +02:00
Alexander van Heukelum
dd6e4eba1c dumpstack: x86: move die_nmi to dumpstack_32.c
For some reason die_nmi is still defined in traps.c for
i386, but is found in dumpstack_64.c for x86_64. Move it
to dumpstack_32.c

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:39 +02:00
Alexander van Heukelum
8728861b4f traps: x86: finalize unification of traps.c
traps_32.c and traps_64.c are now equal. Move one to traps.c,
delete the other one and change the Makefile

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:29 +02:00
Alexander van Heukelum
081f75bbdc traps: x86: make traps_32.c and traps_64.c equal
Use CONFIG_X86_64/CONFIG_X86_32 to condtionally compile the
parts needed for x86_64 or i386 only.

Runs a small userspace for a number of minimal configurations
and boots the defconfigs.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:28 +02:00
Alexander van Heukelum
c1d518c842 traps: x86: various noop-changes preparing for unification of traps_xx.c
- reordering include files
 - whitespace changes
 - comment changes
 - removed unused bad_intr()
 - make default_do_nmi static

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:27 +02:00
Alexander van Heukelum
a5ae2330a5 traps: x86_64: use task_pid_nr(tsk) instead of tsk->pid in do_general_protection
Use task_pid_nr(tsk) instead of tsk->pid in do_general_protection.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:26 +02:00
Alexander van Heukelum
7970479c48 traps: i386: expand clear_mem_error and remove from mach_traps.h
This is the last user of clear_mem_error, which is defined
only on i386. Expand the inline function and remove it from
include/asm-x86/mach-default/mach_traps.h

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:25 +02:00
Alexander van Heukelum
1c9af8a9f4 traps: x86_64: make io_check_error equal to the one on i386
Make io_check_error equal to the one on i386.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:24 +02:00
Alexander van Heukelum
4915a35e35 traps: i386: use preempt_conditional_sti/cli in do_int3
Use preempt_conditional_sti/cli in do_int3, like on x86_64.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:23 +02:00
Alexander van Heukelum
091d30c8f7 traps: x86_64: make math_state_restore more like i386
- rename variable me -> tsk
 - get thread and tsk like i386
 - expand used_math()
 - copy comment

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:22 +02:00
Alexander van Heukelum
699d2937d4 traps: x86: converge trap_init functions
- set_system_gate on i386 is really set_system_trap_gate
 - set_system_gate on x86_64 is really set_system_intr_gate
 - ist=0 means no special stack switch is done:
	- introduce STACKFAULT_STACK, DOUBLEFAULT_STACK, NMI_STACK,
		DEBUG_STACK and MCE_STACK as on x86_64.
	- use the _ist variants with XXX_STACK set to zero
 - remove set_system_gate

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

traps: x86: correct copy/paste bug: a trap is a GATE_TRAP

Fix copy/paste/forgot-to-edit bug in desc.h.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:22 +02:00
Alexander van Heukelum
3d2a71a596 x86, traps: converge do_debug handlers
Make the x86_64-version and the i386-version of do_debug
more similar.

 - introduce preempt_conditional_sti/cli to i386. The preempt-count
	is now elevated during the trap handler, like on x86_64. It
	does not run on a separate stack, however.
 - replace an open-coded "send_sigtrap"
 - copy some comments

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:21 +02:00
Alexander van Heukelum
e407d62088 x86, traps: introduce dotraplinkage
Mark the exception handlers with "dotraplinkage" to hide the
calling convention differences between i386 and x86_64.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:20 +02:00
Alexander van Heukelum
ae82157b3d x86, traps, i386: factor out lazy io-bitmap copy
x86_64 does not do the lazy io-bitmap dance. Putting it in
its own function makes i386's do_general_protection look
much more like x86_64's.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:19 +02:00
Alexander van Heukelum
a28680b4b8 x86, traps: split out math_error and simd_math_error
Split out math_error from do_coprocessor_error and simd_math_error
from do_simd_coprocessor_error, like on i386. While at it, add the
"error_code" parameter to do_coprocessor_error, do_simd_coprocessor_error
and do_spurious_interrupt_bug.

This does not change the generated code, but brings the declarations in
line with all the other trap handlers.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:18 +02:00
Alexander van Heukelum
6fcbede3fd x86_64: split out dumpstack code from traps_64.c
The dumpstack code is logically quite independent from the
hardware traps. Split it out into its own file.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:17 +02:00
Alexander van Heukelum
2bc5f927d4 i386: split out dumpstack code from traps_32.c
The dumpstack code is logically quite independent from the
hardware traps. Split it out into its own file.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:16 +02:00
Vegard Nossum
af5c2bd16a x86: fix virt_addr_valid() with CONFIG_DEBUG_VIRTUAL=y, v2
virt_addr_valid() calls __pa(), which calls __phys_addr(). With
CONFIG_DEBUG_VIRTUAL=y, __phys_addr() will kill the kernel if the
address *isn't* valid. That's clearly wrong for virt_addr_valid().

We also incorporate the debugging checks into virt_addr_valid().

Signed-off-by: Vegard Nossum <vegardno@ben.ifi.uio.no>
Acked-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:15 +02:00
Chuck Ebbert
7f2f49a582 x86: allow number of additional hotplug CPUs to be set at compile time, V2
x86: allow number of additional hotplug CPUs to be set at compile time, V2

The default number of additional CPU IDs for hotplugging is determined
by asking ACPI or mptables how many "disabled" CPUs there are in the
system, but many systems get this wrong so that e.g. a uniprocessor
machine gets an extra CPU allocated and never switches to single CPU
mode.

And sometimes CPU hotplugging is enabled only for suspend/hibernate
anyway, so the additional CPU IDs are not wanted. Allow the number
to be set to zero at compile time.

Also, force the number of extra CPUs to zero if hotplugging is disabled
which allows removing some conditional code.

Tested on uniprocessor x86_64 that ACPI claims has a disabled processor,
with CPU hotplugging configured.

("After" has the number of additional CPUs set to 0)
Before: NR_CPUS: 512, nr_cpu_ids: 2, nr_node_ids 1
After: NR_CPUS: 512, nr_cpu_ids: 1, nr_node_ids 1

[Changed the name of the option and the prompt according to Ingo's
 suggestion.]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:14 +02:00
Krzysztof Helt
94f6bac105 x86: do not allow to optimize flag_is_changeable_p() (rev. 2)
The flag_is_changeable_p() is used by
has_cpuid_p() which can return different results
in the code sequence below:

 if (!have_cpuid_p())
      identify_cpu_without_cpuid(c);

  /* cyrix could have cpuid enabled via c_identify()*/
  if (!have_cpuid_p())
      return;

Otherwise, the gcc 3.4.6 optimizes these two calls
into one which make the code not working correctly.

Cyrix cpus have the CPUID instruction enabled before
the second call to the have_cpuid_p() but
it is not detected due to the gcc optimization.
Thus the ARR registers (mtrr like) are not detected
on such a cpu.

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:13 +02:00
Pekka Enberg
e2ce07c804 x86: __show_registers() and __show_regs() API unification
Currently the low-level function to dump user-passed registers on i386 is
called __show_registers() whereas on x86-64 it's called __show_regs(). Unify
the API to simplify porting of kmemcheck to x86-64.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:04 +02:00
Jack Steiner
1e0b5d00b2 x86, UV: new UV genapic functions for x2apic
Add functions that use the infrastructure added by the x2apic code. These
functions were originally stubbed out since the UV code went into the
tree prior to the x2apic code.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:22:53 +02:00
Chuck Ebbert
14adf855ba x86: move prefill_possible_map calling early, fix, V2
Commit 4a701737 ("x86: move prefill_possible_map calling early, fix")
is the wrong fix: prefill_possible_map() needs to be available
even when CONFIG_HOTPLUG_CPU is not set. A followon patch will do that.

Fix this correctly by making prefill_possible_map() available even when
CONFIG_HOTPLUG_CPU is not set. The function is needed so that
the number of possible CPUs can be determined.

Tested on uniprocessor machine with CPU hotplug disabled.

From boot log:
  Before: NR_CPUS: 512, nr_cpu_ids: 512, nr_node_ids 1
  After: NR_CPUS: 512, nr_cpu_ids: 1, nr_node_ids 1

Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:22:50 +02:00
Krzysztof Helt
69d45dd1c3 x86: merge winchip-2 and winchip-2a cpu choices
The Winchip-2 and Winchip-2A cpu choices select the
same options for kernel and compiler.

Merge them to save few bytes and reduce confusion.

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:22:48 +02:00
Jack Steiner
d2f904bb9a x86, uv: fix for size of hub mappings
Fix the size of the mappings of UV hub registers. Size must
be a function of the maximum node number within the SSI.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:22:46 +02:00
Yinghai Lu
9b658f6f8b x86: cleanup, remove extra ifdef
also change two functions to static.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:22:44 +02:00
Alexander van Heukelum
3c1326f8a6 traps: i386: make do_trap more like x86_64
This patch hardcodes which traps should be forwarded to
handle_vm86_trap in do_trap. This allows to remove the
vm86 parameter from the i386-version of do_trap, which
makes the DO_VM86_ERROR and DO_VM86_ERROR_INFO macros
unnecessary.

x86_64 part is whitespace only.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:22:34 +02:00
Alexander van Heukelum
69c89b5bf7 traps: x86: remove trace_hardirqs_fixup from pagefault handler
The last use of trace_hardirqs_fixup is unnecessary, because the
trap is taken with interrupt off on i386 as well as x86_64, and
the irq-tracer is notified of this from the assembly code.

trace_hardirqs_fixup and trace_hardirqs_fixup_flags are removed
from include/asm-x86/irqflags.h as they are no longer used.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:22:04 +02:00
Alexander van Heukelum
a491503e4d traps: x86_64: remove trace_hardirqs_fixup from debug handler
All exceptions are taken via interrupt gates. TRACE_IRQS_OFF
is called just before entering the C code, so the irq state
is known to the irq tracer at this point. No need to call
trace_hardirqs_fixup.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:22:02 +02:00
Alexander van Heukelum
8b1c870f19 traps: x86_64: remove trace_hardirqs_fixup from int3 handler
All exceptions are taken via interrupt gates. TRACE_IRQS_OFF
is called just before entering the C code, so the irq state
is known to the irq tracer at this point. No need to call
trace_hardirqs_fixup.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:22:00 +02:00
Alexander van Heukelum
4b986a3652 traps: x86_64: remove trace_hardirqs_fixup from DO_ERROR_INFO macro
All exceptions are taken via interrupt gates. TRACE_IRQS_OFF
is called just before entering the C code, so the irq state
is known to the irq tracer at this point. No need to call
trace_hardirqs_fixup.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:58 +02:00
Alexander van Heukelum
7e61a79324 traps: x86_64: add TRACE_IRQS_OFF in paranoidentry macro
Add TRACE_IRQS_OFF just before entering the C code.

All exceptions are taken via interrupt gates. If irq tracing is
enabled, it should be notified as soon as possible. Interrupts
are only (conditionally) re-enabled in C code.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:55 +02:00
Alexander van Heukelum
6b11d4ef3e traps: x86_64: add TRACE_IRQS_OFF in error_entry
Add TRACE_IRQS_OFF just before entering the C code.

All exceptions are taken via interrupt gates. If irq tracing is
enabled, it should be notified as soon as possible. Interrupts
are only (conditionally) re-enabled in C code.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:53 +02:00
Jack Steiner
2e42060c19 x86, uv: add early detection of UV system types
Portions of the ACPI code needs to know if a system is a UV system prior
to genapic initialization. This patch adds a call early_acpi_boot_init()
so that the apic type is discovered earlier.

V2 of the patch adding fixes from Yinghai Lu.
Much cleaner and smaller.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:51 +02:00
Glauber Costa
e04d645f32 x86: move vgetcpu mode probing to cpu detection
Take it out of time initialization and move it to
cpu detection time.

Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:49 +02:00
Glauber Costa
33c053d0ae x86: wrap MCA_bus test around an ifdef
Only test for MCA_bus if support for MCA is compiled in.
Also, for x86_64, write the code inside the conditional
for consistency with i386. It won't bite us, since it'll
probably never select CONFIG_MCA anyway.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:47 +02:00
Glauber Costa
2c460d0b68 x86: replace hardcoded number
Replace "4" in time_32.c code by sizeof(long).
This way, it can work on x86_64 too.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:44 +02:00
Glauber Costa
461ebd1095 x86: rename timer_event_interrupt to timer_interrupt
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:42 +02:00
Glauber Costa
780209af71 x86: make init_ISA_irqs nonstatic
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:40 +02:00
Glauber Costa
2f97435e57 x86: factor out irq initialization for x86_64
Provide apic_intr_init and smp_intr_init functions.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:38 +02:00
Glauber Costa
2ff298372d x86: bind irq0 irq data to cpu0
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:36 +02:00
Glauber Costa
8de0b8a7ea x86: use user_mode_vm instead of user_mode
For x86_64, it does not really matter. But makes the
code equal to i386.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:34 +02:00
Glauber Costa
3927fa9e4b x86: use frame pointer information on x86_64 profile_pc
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:29 +02:00
Glauber Costa
097a0788df x86: set bp field in pt_regs properly
Save rbp twice: One is for marking the stack frame, as usual (already
there), and the other, to fill pt_regs properly. This is because bx
comes right before the last saved register in that structure, and not
bp. If the base pointer were in the place bx is today, this would not
be needed.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:27 +02:00
Glauber Costa
2c44e66843 x86: coalesce tests
Coalesce v8086_mode and user_mode into a single
user_mode_vm() test.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:25 +02:00
Glauber Costa
cf4cfb225a x86: use user_mode macro
Instead of using SEGMENT_IS_KERNEL_CODE, use the
"user_mode" macro, which can play the same role. Delete
the former, since it now lacks any user.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:23 +02:00
Yinghai Lu
bd32a8cfa8 x86: cpu don't print duplicated vendor string
Some CPUs have vendor string in the middle of model_id instead of beginning

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:21 +02:00
Jan Beulich
606ee44dbb x86: make mm/gup.c more virtualization friendly
Since pte_flags() is much cheaper than pte_val() in some virtualized
environments (namely, Xen), use the former whereever possible.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: "Nick Piggin" <npiggin@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:18 +02:00
Jan Beulich
5e72d9e485 x86-64: fix combining of regions in init_memory_mapping()
When nr_range gets decremented, the same slot must be considered for
coalescing with its new successor again.

The issue is apparently pretty benign to native code, but surfaces as a
boot time crash in our forward ported Xen tree (where the page table
setup overall works differently than in native).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:16 +02:00
Cyrill Gorcunov
59ef48a58e x86: smpboot - check if we have ESR register in wakeup_secondary_cpu
We should check if we have ESR register before reading from it.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:14 +02:00
Ingo Molnar
3e6de5a393 x86: print out EBDA/lowmem address
it's useful for debugging purposes to know the location of the EBDA.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:10 +02:00
Yinghai Lu
a73aaedd95 x86: check dsdt before find oem table for es7000, v2
v2: use __acpi_unmap_table()

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:07 +02:00
Jeremy Fitzhardinge
a32ad46267 x86-64: don't check for map replacement
The check prevents flags on mappings from being changed, which is not
desireable.  There's no need to check for replacing a mapping, and
x86-32 does not do this check.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:05 +02:00
Jeremy Fitzhardinge
88b4c14696 x86: use early_memremap() in setup.c
The remappings in setup.c are all just ordinary memory, so use
early_memremap() rather than early_ioremap().

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:03 +02:00
Jeremy Fitzhardinge
1494177942 x86: add early_memremap()
early_ioremap() is also used to map normal memory when constructing
the linear memory mapping.  However, since we sometimes need to be able
to distinguish between actual IO mappings and normal memory mappings,
add a early_memremap() call, which maps with PAGE_KERNEL (as opposed
to PAGE_KERNEL_IO for early_ioremap()), and use it when constructing
pagetables.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:01 +02:00
Jeremy Fitzhardinge
be43d72835 x86: add _PAGE_IOMAP pte flag for IO mappings
Use one of the software-defined PTE bits to indicate that a mapping is
intended for an IO address.  On native hardware this is irrelevent,
since a physical address is a physical address.  But in a virtual
environment, physical addresses are also virtualized, so there needs
to be some way to distinguish between pseudo-physical addresses and
actual hardware addresses; _PAGE_IOMAP indicates this intent.

By default, __supported_pte_mask masks out _PAGE_IOMAP, so it doesn't
even appear in the final pagetable.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:56 +02:00
Alexander van Heukelum
07bb2f6236 i386: trace_hardirqs_fixup should now not be necessary: irqs are off.
The exception handlers in entry_32.S should now all call
TRACE_IRQS_OFF before calling the C code. The calls to
trace_hardirqs_fixup should now be unnecessary. Remove them.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:54 +02:00
Alexander van Heukelum
a790392faa i386: add TRACE_IRQS_OFF for the exception 3 (int3)
At this point interrupts are off, so let's inform the tracing
code of that fact before calling into C.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:52 +02:00
Alexander van Heukelum
e0c7317557 i386: add TRACE_IRQS_OFF for the nmi
At this point interrupts are off, so let's inform the tracing
code of that fact before calling into C.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:49 +02:00
Alexander van Heukelum
43024a8a5d i386: add TRACE_IRQS_OFF for exception 1 (debug)
At this point interrupts are off, so let's inform the tracing
code of that fact before calling into C.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:47 +02:00
Alexander van Heukelum
85cea51d7e i386: add TRACE_IRQS_OFF to entry_32.S in 'error_code'
Many exceptions use the same code path via the label 'error_code'
in entry_32.S. At this point interrupts are off, so let's inform
the tracing code of that fact before calling into C.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:45 +02:00
Alexander van Heukelum
f8e0870f58 i386: remove temporary DO_TRAP macros, expanding the last one used
Only one use of the DO_TRAP macros remains. Expand that one and
remove the macros now.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:43 +02:00
Alexander van Heukelum
b939bde278 i386: convert hardware exception 19 to an interrupt gate
Handle SIMD coprocessor exception with interrupt initially off.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:40 +02:00
Alexander van Heukelum
eb642f6208 i386: convert hardware exception 18 to an interrupt gate
Handle machine check exception with interrupt initially off.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:38 +02:00
Alexander van Heukelum
5feedfd401 i386: convert hardware exception 17 to an interrupt gate
Handle alignment check exception with interrupt initially off.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:36 +02:00
Alexander van Heukelum
252d28fe65 i386: convert hardware exception 16 to an interrupt gate
Handle coprocessor error exception with interrupt initially off.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:33 +02:00
Alexander van Heukelum
cf81978d5f i386: convert hardware exception 15 to an interrupt gate
Handle exception 15 with interrupt initially off.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:31 +02:00
Alexander van Heukelum
c6df0d71be i386: convert hardware exception 13 to an interrupt gate
Handle general protection exception with interrupt initially off.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:29 +02:00
Alexander van Heukelum
f5ca81878b i386: convert hardware exception 12 to an interrupt gate
Handle stack segment exception with interrupt initially off.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:27 +02:00
Alexander van Heukelum
36d936c798 i386: convert hardware exception 11 to an interrupt gate
Handle segment not present exception with interrupt initially off.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:24 +02:00
Alexander van Heukelum
6bf77bf939 i386: convert hardware exception 10 to an interrupt gate
Handle invalid TSS exception with interrupt initially off.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:22 +02:00
Alexander van Heukelum
51bc1ed606 i386: convert hardware exception 9 to an interrupt gate
Handle coprocessor segment overrun exception with interrupt initially off.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:20 +02:00
Alexander van Heukelum
7643e9b936 i386: convert hardware exception 7 to an interrupt gate
Handle no coprocessor exception with interrupt initially off.

device_not_available in entry_32.S calls either math_state_restore
or math_emulate. This patch adds an extra indirection to be
able to re-enable interrupts explicitly in traps_32.c

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:17 +02:00
Alexander van Heukelum
12394cf567 i386: convert hardware exception 6 to an interrupt gate
Handle invalid opcode exception with interrupt initially off.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:15 +02:00
Alexander van Heukelum
64f644c0b4 i386: convert hardware exception 5 to an interrupt gate
Handle bounds exception with interrupt initially off.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:13 +02:00
Alexander van Heukelum
8d6f9d69bd i386: convert hardware exception 4 to an interrupt gate
Handle overflow exception with interrupt initially off.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:11 +02:00
Alexander van Heukelum
b94da1e4b7 i386: expand exception 3 DO_TRAP macro
The int3 exception was already takes as an interrupt and
do_int3 does not fit in the new DO_ERROR macro. This patch
just expands the DO_TRAP macro and rearranges the code a
bit.

No functional changes intended.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:08 +02:00
Alexander van Heukelum
976382dcbe i386: convert hardware exception 0 to an interrupt gate
Handle divide error exception with interrupt initially off.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:06 +02:00
Alexander van Heukelum
61aef7d249 i386: prepare to convert exceptions to interrupts
There is some macro magic in traps_32.c to construct standard
exception dispatch functions. This patch renames the DO_ERROR-
like macros to DO_TRAP, and introduces new DO_ERROR ones that
conditionally reenable interrupts explicitly, like x86_64.

No code changes.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:04 +02:00
Alexander van Heukelum
762db43470 i386: remove kprobes' restore_interrupts in favour of conditional_sti
x86_64 uses a helper function conditional_sti in traps_64.c which
is equal to restore_interrupts in kprobes.h. The only user of
restore_interrupts is in traps_32.c. Introduce conditional_sti
for i386 and remove restore_interrupts.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:02 +02:00
Yinghai Lu
927604c759 x86: rename discontig_32.c to numa_32.c
name it in line with its purpose.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:19:59 +02:00
Manfred Spraul
0cefa5b9b0 arch/x86/kernel/smpboot.c: Clarify when irq processing begins.
Secondary cpus start with local interrupts disabled.
start_secondary() first initializes the new cpu, then it enables the
local interrupts. (although interrupts are enabled within smp_callin()
as well).

Right now, the local interrupts are enabled as a side effect of calling
ipi_call_lock_irq().

The attached patch clarifies when local interrupts are enabled.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:19:57 +02:00
Jan Beulich
295286a891 x86-64: slightly stream-line 32-bit syscall entry code
Avoid updating registers or memory twice as well as needlessly loading
or copying registers.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:19:54 +02:00
Ingo Molnar
8daf14cf56 Merge branches 'x86/xen', 'x86/build', 'x86/microcode', 'x86/mm-debug-v2', 'x86/memory-corruption-check', 'x86/early-printk', 'x86/xsave', 'x86/ptrace-v2', 'x86/quirks', 'x86/setup', 'x86/spinlocks' and 'x86/signal' into x86/core-v2 2008-10-12 15:50:02 +02:00
Ingo Molnar
1db5fff9ae x86: make processor type select depend on CONFIG_EMBEDDED
deselecting one of the CPU type CONFIG_CPU_SUP_* config options
can render a kernel unbootable. Make sure this option is only
available if CONFIG_EMBEDDED is enabled.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-12 15:44:07 +02:00
Ingo Molnar
b7b3a42533 x86: extend processor type select help text
extend the help text of the CONFIG_CPU_SUP_* config options to
express what it does and what effects it has.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-12 15:36:24 +02:00
Ingo Molnar
8a66712ba0 x86, amd-iommu: propagate PCI device enabling error
propagate an error in enabling the PCI device.

Also eliminates this warning:

 arch/x86/kernel/amd_iommu_init.c: In function ‘init_iommu_one’:
 arch/x86/kernel/amd_iommu_init.c:726: warning: ignoring return value of ‘pci_enable_device’, declared with attribute warn_unused_result

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-12 15:24:53 +02:00
Ingo Molnar
d562353a45 warnings: fix arch/x86/kernel/io_apic_64.c
fix:

 arch/x86/kernel/io_apic_64.c: In function ‘print_local_APIC’:
 arch/x86/kernel/io_apic_64.c:1284: warning: format ‘%08x’ expects type ‘unsigned int’, but argument 2 has type ‘long unsigned int’
 arch/x86/kernel/io_apic_64.c:1285: warning: format ‘%08x’ expects type ‘unsigned int’, but argument 2 has type ‘long unsigned int’

We want to print the two halves of 'icr' at 32 bit width.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-12 15:22:22 +02:00
Ingo Molnar
45e96f26f2 warnings: fix arch/x86/kernel/early_printk.c
fix warning:

  arch/x86/kernel/early_printk.c:993: warning: ‘enable_debug_console’ defined but not used

Eliminate dead code.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-12 15:19:36 +02:00
Ingo Molnar
9f482807a6 x86, fpu: check __clear_user() return value
fix warning:

  arch/x86/kernel/xsave.c: In function ‘save_i387_xstate’:
  arch/x86/kernel/xsave.c:98: warning: ignoring return value of ‘__clear_user’, declared with attribute warn_unused_result

check the return value and act on it. We should not be ignoring faults
at this point.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-12 15:17:39 +02:00
Ingo Molnar
620f2efcdc Merge branch 'linus' into x86/xsave 2008-10-12 15:17:14 +02:00
Ingo Molnar
46eaa67020 x86: memory corruption check - cleanup
Move the prototypes from the generic kernel.h header to the more
appropriate include/asm-x86/bios_ebda.h header file.

Also, remove the check from the power management code - this is a
pure x86 matter for now.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-12 15:09:23 +02:00
Ingo Molnar
a9b9e81c91 Merge branch 'linus' into x86/memory-corruption-check 2008-10-12 15:05:39 +02:00
Ingo Molnar
eceb138336 Merge branches 'core/signal' and 'x86/spinlocks' into x86/xen
Conflicts:
	include/asm-x86/spinlock.h
2008-10-12 13:20:25 +02:00
Ingo Molnar
84e9c95ad9 Merge branch 'x86/signal' into core/signal 2008-10-12 13:17:07 +02:00
Ingo Molnar
1389ac4b97 Merge branch 'linus' into x86/signal
Conflicts:
	arch/x86/kernel/signal_64.c
2008-10-12 12:49:27 +02:00
Ingo Molnar
acbaa41a78 Merge branch 'linus' into x86/quirks
Conflicts:
	arch/x86/kernel/early-quirks.c
2008-10-12 12:43:21 +02:00
Ingo Molnar
365d46dc9b Merge branch 'linus' into x86/xen
Conflicts:
	arch/x86/kernel/cpu/common.c
	arch/x86/kernel/process_64.c
	arch/x86/xen/enlighten.c
2008-10-12 12:37:32 +02:00
Roland McGrath
325af5fb14 x86: ioperm user_regset
This adds a user_regset type for the x86 io permissions bitmap.
This makes it appear in core dumps (when ioperm has been used).
It will also make it visible to debuggers in the future.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
[conflict resolutions: Signed-off-by: Ingo Molnar <mingo@elte.hu> ]
2008-10-12 12:05:55 +02:00
Ingo Molnar
206855c321 Merge branch 'x86/urgent' into core/signal
Conflicts:
	arch/x86/kernel/signal_64.c
2008-10-12 11:32:17 +02:00
Petr Vandrovec
cb58ffc388 x86: fix early panic on amd64 due to typo in supported CPU section
Do not crash when enumerating supported CPU architectures

SECURITY_INIT somehow ended up in x86_cpu_dev.init section.  That caused printk
in code which prints supported architectures to hit #GP due to non-canonical
address being used.

Signed-off-by: Petr Vandrovec <petr@vandrovec.name>
Cc: thomas.petazzoni@free-electrons.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-12 11:19:27 +02:00
Alan Cox
c613ec1a7f x86, early_ioremap: fix fencepost error
The x86 implementation of early_ioremap has an off by one error. If we get
an object which ends on the first byte of a page we undermap by one page and
this causes a crash on boot with the ASUS P5QL whose DMI table happens to fit
this alignment.

The size computation is currently

	last_addr = phys_addr + size - 1;
	npages = (PAGE_ALIGN(last_addr) - phys_addr)

(Consider a request for 1 byte at alignment 0...)

Closes #11693

Debugging work by Ian Campbell/Felix Geyer

Signed-off-by: Alan Cox <alan@rehat.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-12 11:19:04 +02:00
David Rientjes
e1e23bb051 x86: avoid dereferencing beyond stack + THREAD_SIZE
It's possible for get_wchan() to dereference past task->stack + THREAD_SIZE
while iterating through instruction pointers if fp equals the upper boundary,
causing a kernel panic.

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-12 11:18:59 +02:00
Linus Torvalds
ead9d23d80 Merge phase #4 (X2APIC, APIC unification, CPU identification unification) of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-v28-for-linus-phase4-D' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (186 commits)
  x86, debug: print more information about unknown CPUs
  x86 setup: handle more than 8 CPU flag words
  x86: cpuid, fix typo
  x86: move transmeta cap read to early_init_transmeta()
  x86: identify_cpu_without_cpuid v2
  x86: extended "flags" to show virtualization HW feature in /proc/cpuinfo
  x86: move VMX MSRs to msr-index.h
  x86: centaur_64.c remove duplicated setting of CONSTANT_TSC
  x86: intel.c put workaround for old cpus together
  x86: let intel 64-bit use intel.c
  x86: make intel_64.c the same as intel.c
  x86: make intel.c have 64-bit support code
  x86: little clean up of intel.c/intel_64.c
  x86: make 64 bit to use amd.c
  x86: make amd_64 have 32 bit code
  x86: make amd.c have 64bit support code
  x86: merge header in amd_64.c
  x86: add srat_detect_node for amd64
  x86: remove duplicated force_mwait
  x86: cpu make amd.c more like amd_64.c v2
  ...
2008-10-11 11:51:16 -07:00
Ingo Molnar
0afe2db213 Merge branch 'x86/unify-cpu-detect' into x86-v28-for-linus-phase4-D
Conflicts:
	arch/x86/kernel/cpu/common.c
	arch/x86/kernel/signal_64.c
	include/asm-x86/cpufeature.h
2008-10-11 20:23:20 +02:00
Ingo Molnar
d84705969f Merge branch 'x86/apic' into x86-v28-for-linus-phase4-B
Conflicts:
	arch/x86/kernel/apic_32.c
	arch/x86/kernel/apic_64.c
	arch/x86/kernel/setup.c
	drivers/pci/intel-iommu.c
	include/asm-x86/cpufeature.h
	include/asm-x86/dma-mapping.h
2008-10-11 20:17:36 +02:00
Linus Torvalds
bf6f51e3a4 Merge phase #3 (IOMMU) of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-v28-for-linus-phase3-B' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (74 commits)
  AMD IOMMU: use iommu_device_max_index, fix
  AMD IOMMU: use iommu_device_max_index
  x86: add PCI IDs for AMD Barcelona PCI devices
  x86/iommu: use __GFP_ZERO instead of memset for GART
  x86/iommu: convert GART need_flush to bool
  x86/iommu: make GART driver checkpatch clean
  x86 gart: remove unnecessary initialization
  x86: restore old GART alloc_coherent behavior
  revert "x86: make GART to respect device's dma_mask about virtual mappings"
  x86: export pci-nommu's alloc_coherent
  iommu: remove fullflush and nofullflush in IOMMU generic option
  x86: remove set_bit_string()
  iommu: export iommu_area_reserve helper function
  AMD IOMMU: use coherent_dma_mask in alloc_coherent
  add AMD IOMMU tree to MAINTAINERS file
  AMD IOMMU: use cmd_buf_size when freeing the command buffer
  AMD IOMMU: calculate IVHD size with a function
  AMD IOMMU: remove unnecessary cast to u64 in the init code
  AMD IOMMU: free domain bitmap with its allocation order
  AMD IOMMU: simplify dma_mask_to_pages
  ...
2008-10-11 11:03:12 -07:00
Linus Torvalds
ec8deffa33 Merge phase #2 (PAT updates) of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-v28-for-linus-phase2-B' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (27 commits)
  x86, cpa: make the kernel physical mapping initialization a two pass sequence, fix
  x86, pat: cleanups
  x86: fix pagetable init 64-bit breakage
  x86: track memtype for RAM in page struct
  x86, cpa: srlz cpa(), global flush tlb after splitting big page and before doing cpa
  x86, cpa: remove cpa pool code
  x86, cpa: no need to check alias for __set_pages_p/__set_pages_np
  x86, cpa: dont use large pages for kernel identity mapping with DEBUG_PAGEALLOC
  x86, cpa: make the kernel physical mapping initialization a two pass sequence
  x86, cpa: remove USER permission from the very early identity mapping attribute
  x86, cpa: rename PTE attribute macros for kernel direct mapping in early boot
  x86: make sure the CPA test code's use of _PAGE_UNUSED1 is obvious
  linux-next: fix x86 tree build failure
  x86: have set_memory_array_{uc,wb} coalesce memtypes, fix
  agp: enable optimized agp_alloc_pages methods
  x86: have set_memory_array_{uc,wb} coalesce memtypes.
  x86: {reverve,free}_memtype() take a physical address
  x86: fix pageattr-test
  agp: add agp_generic_destroy_pages()
  agp: generic_alloc_pages()
  ...
2008-10-11 11:02:56 -07:00
Linus Torvalds
098ef215b1 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] Fix BUG: using smp_processor_id() in preemptible code
  [CPUFREQ] Don't export governors for default governor
  [CPUFREQ][6/6] cpufreq: Add idle microaccounting in ondemand governor
  [CPUFREQ][5/6] cpufreq: Changes to get_cpu_idle_time_us(), used by ondemand governor
  [CPUFREQ][4/6] cpufreq_ondemand: Parameterize down differential
  [CPUFREQ][3/6] cpufreq: get_cpu_idle_time() changes in ondemand for idle-microaccounting
  [CPUFREQ][2/6] cpufreq: Change load calculation in ondemand for software coordination
  [CPUFREQ][1/6] cpufreq: Add cpu number parameter to __cpufreq_driver_getavg()
  [CPUFREQ] use deferrable delayed work init in conservative governor
  [CPUFREQ] drivers/cpufreq/cpufreq.c: Adjust error handling code involving cpufreq_cpu_put
  [CPUFREQ] add error handling for cpufreq_register_governor() error
  [CPUFREQ] acpi-cpufreq: add error handling for cpufreq_register_driver() error
  [CPUFREQ] Coding style fixes to arch/x86/kernel/cpu/cpufreq/powernow-k6.c
  [CPUFREQ] Coding style fixes to arch/x86/kernel/cpu/cpufreq/elanfreq.c
2008-10-11 08:49:34 -07:00
Yinghai Lu
ee29753327 ACPI: don't load acpi_cpufreq if acpi=off
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-10-10 23:45:38 -04:00
Matt Mackall
5000cadcf3 x86: trim ACPI sleep stack buffer
x86_64 SMP suspend to RAM uses a 10k temporary stack for saving the
kernel state, but only 4k of it is used. Shrink it to 4k.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-10-10 18:05:52 -04:00
Matt Mackall
d0d0f7432c x86: remove magic number from ACPI sleep stack buffer
x86_64 SMP suspend to RAM uses a 10k temporary stack for saving the
kernel state, but only 4k of it is used. Shrink it to 4k.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-10-10 18:05:51 -04:00
Linus Torvalds
b11ce8a26d Merge branch 'sched-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (38 commits)
  sched debug: add name to sched_domain sysctl entries
  sched: sync wakeups vs avg_overlap
  sched: remove redundant code in cpu_cgroup_create()
  sched_rt.c: resch needed in rt_rq_enqueue() for the root rt_rq
  cpusets: scan_for_empty_cpusets(), cpuset doesn't seem to be so const
  sched: minor optimizations in wake_affine and select_task_rq_fair
  sched: maintain only task entities in cfs_rq->tasks list
  sched: fixup buddy selection
  sched: more sanity checks on the bandwidth settings
  sched: add some comments to the bandwidth code
  sched: fixlet for group load balance
  sched: rework wakeup preemption
  CFS scheduler: documentation about scheduling policies
  sched: clarify ifdef tangle
  sched: fix list traversal to use _rcu variant
  sched: turn off WAKEUP_OVERLAP
  sched: wakeup preempt when small overlap
  kernel/cpu.c: create a CPU_STARTING cpu_chain notifier
  kernel/cpu.c: Move the CPU_DYING notifiers
  sched: fix __load_balance_iterator() for cfq with only one task
  ...
2008-10-10 12:42:31 -07:00