ept_update_paging_mode_cr4() accesses vcpu->arch.cr4 directly, which usually
needs to be accessed via kvm_read_cr4(). In this case, we can't, since cr4
is in the process of being updated. Instead of adding inane comments, fold
the function into its caller (vmx_set_cr4), so it can use the not-yet-committed
cr4 directly.
Signed-off-by: Avi Kivity <avi@redhat.com>
We make no use of cr4.pge if ept is enabled, but the guest does (to flush
global mappings, as with vmap()), so give the guest ownership of this bit.
Signed-off-by: Avi Kivity <avi@redhat.com>
Instead of specifying the bits which we want to trap on, specify the bits
which we allow the guest to change transparently. This is safer wrt future
changes to cr4.
Signed-off-by: Avi Kivity <avi@redhat.com>
Some bits of cr4 can be owned by the guest on vmx, so when we read them,
we copy them to the vcpu structure. In preparation for making the set of
guest-owned bits dynamic, use helpers to access these bits so we don't need
to know where the bit resides.
No changes to svm since all bits are host-owned there.
Signed-off-by: Avi Kivity <avi@redhat.com>
We don't support these instructions, but guest can execute them even if the
feature('monitor') haven't been exposed in CPUID. So we would trap and inject
a #UD if guest try this way.
Cc: stable@kernel.org
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
In the past we've had errors of single-bit in the other two cases; the
printk() may confirm it for the third case (many->many).
Signed-off-by: Avi Kivity <avi@redhat.com>
Windows 2003 uses task switch to triple fault and reboot (the other
exception being reserved pdptrs bits).
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Move Double-Fault generation logic out of page fault
exception generating function to cover more generic case.
Signed-off-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, uv: Remove recursion in uv_heartbeat_enable()
x86, uv: uv_global_gru_mmr_address() macro fix
x86, uv: Add serial number parameter to uv_bios_get_sn_info()
* 'x86-pci-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Enable NMI on all cpus on UV
vgaarb: Add user selectability of the number of GPUS in a system
vgaarb: Fix VGA arbiter to accept PCI domains other than 0
x86, uv: Update UV arch to target Legacy VGA I/O correctly.
pci: Update pci_set_vga_state() to call arch functions
* 'x86-setup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, setup: Don't skip mode setting for the standard VGA modes
x86-64, setup: Inhibit decompressor output if video info is invalid
x86, setup: When restoring the screen, update boot_params.screen_info
* 'x86-rwsem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86-64, rwsem: Avoid store forwarding hazard in __downgrade_write
x86-64, rwsem: 64-bit xadd rwsem implementation
x86: Fix breakage of UML from the changes in the rwsem system
x86-64: support native xadd rwsem implementation
x86: clean up rwsem type system
* 'x86-numa-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, numa: Remove configurable node size support for numa emulation
x86, numa: Add fixed node size option for numa emulation
x86, numa: Fix numa emulation calculation of big nodes
x86, acpi: Map hotadded cpu to correct node.
* 'x86-mtrr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Convert set_atomicity_lock to raw_spinlock
x86, mtrr: Kill over the top warn
x86, mtrr: Constify struct mtrr_ops
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, mm: Unify kernel_physical_mapping_init() API
x86, mm: Allow highmem user page tables to be disabled at boot time
x86: Do not reserve brk for DMI if it's not going to be used
x86: Convert tlbstate_lock to raw_spinlock
x86: Use the generic page_is_ram()
x86: Remove BIOS data range from e820
Move page_is_ram() declaration to mm.h
Generic page_is_ram: use __weak
resources: introduce generic page_is_ram()
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, cacheinfo: Enable L3 CID only on AMD
x86, cacheinfo: Remove NUMA dependency, fix for AMD Fam10h rev D1
x86, cpu: Print AMD virtualization features in /proc/cpuinfo
x86, cacheinfo: Calculate L3 indices
x86, cacheinfo: Add cache index disable sysfs attrs only to L3 caches
x86, cacheinfo: Fix disabling of L3 cache indices
intel-agp: Switch to wbinvd_on_all_cpus
x86, lib: Add wbinvd smp helpers
* 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (28 commits)
ftrace: Add function names to dangling } in function graph tracer
tracing: Simplify memory recycle of trace_define_field
tracing: Remove unnecessary variable in print_graph_return
tracing: Fix typo of info text in trace_kprobe.c
tracing: Fix typo in prof_sysexit_enable()
tracing: Remove CONFIG_TRACE_POWER from kernel config
tracing: Fix ftrace_event_call alignment for use with gcc 4.5
ftrace: Remove memory barriers from NMI code when not needed
tracing/kprobes: Add short documentation for HAVE_REGS_AND_STACK_ACCESS_API
s390: Add pt_regs register and stack access API
tracing/kprobes: Make Kconfig dependencies generic
tracing: Unify arch_syscall_addr() implementations
tracing: Add notrace to TRACE_EVENT implementation functions
ftrace: Allow to remove a single function from function graph filter
tracing: Add correct/incorrect to sort keys for branch annotation output
tracing: Simplify test for function_graph tracing start point
tracing: Drop the tr check from the graph tracing path
tracing: Add stack dump to trace_printk if stacktrace option is set
tracing: Use appropriate perl constructs in recordmcount.pl
tracing: optimize recordmcount.pl for offsets-handling
...
* 'oprofile-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
oprofile/x86: fix msr access to reserved counters
oprofile/x86: use kzalloc() instead of kmalloc()
oprofile/x86: fix perfctr nmi reservation for mulitplexing
oprofile/x86: add comment to counter-in-use warning
oprofile/x86: warn user if a counter is already active
oprofile/x86: implement randomization for IBS periodic op counter
oprofile/x86: implement lsfr pseudo-random number generator for IBS
oprofile/x86: implement IBS cpuid feature detection
oprofile/x86: remove node check in AMD IBS initialization
oprofile/x86: remove OPROFILE_IBS config option
oprofile: remove EXPERIMENTAL from the config option description
oprofile: remove tracing build dependency
* 'core-ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
generic-ipi: Optimize accesses by using DEFINE_PER_CPU_SHARED_ALIGNED for IPI data
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
plist: Fix grammar mistake, and c-style mistake
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
kprobes: Add mcount to the kprobes blacklist
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86_64: Print modules like i386 does
* 'x86-doc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Put 'nopat' in kernel-parameters
* 'x86-gpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86-64: Allow fbdev primary video code
* 'x86-rlimit-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Use helpers for rlimits
Enable NMI on all cpus in UV system and add an NMI handler
to dump_stack on each cpu.
By default on x86 all the cpus except the boot cpu have NMI
masked off. This patch enables NMI on all cpus in UV system
and adds an NMI handler to dump_stack on each cpu. This
way if a system hangs we can NMI the machine and get a
backtrace from all the cpus.
Version 2: Use x86_platform driver mechanism for nmi init, per
Ingo's suggestion.
Version 3: Clean up Ingo's nits.
Signed-off-by: Russ Anderson <rja@sgi.com>
LKML-Reference: <20100226164912.GA24439@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'kmemcheck-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
kmemcheck: Test the full object in kmemcheck_is_obj_initialized()
Split amd,p6,intel into separate files so that we can easily deal with
CONFIG_CPU_SUP_* things, needed to make things build now that perf_event.c
relies on symbols from amd.c
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
During switching virtual counters there is access to perfctr msrs. If
the counter is not available this fails due to an invalid
address. This patch fixes this.
Cc: stable@kernel.org
Signed-off-by: Robert Richter <robert.richter@amd.com>
Multiple virtual counters share one physical counter. The reservation
of virtual counters fails due to duplicate allocation of the same
counter. The counters are already reserved. Thus, virtual counter
reservation may removed at all. This also makes the code easier.
Cc: stable@kernel.org
Signed-off-by: Robert Richter <robert.richter@amd.com>
Currently, oprofile fails silently on platforms where a non-OS entity
such as the system firmware "enables" and uses a performance
counter. There is a warning in the code for this case.
The warning indicates an already running counter. If oprofile doesn't
collect data, then try using a different performance counter on your
platform to monitor the desired event. Delete the counter from the
desired event by editing the
/usr/share/oprofile/<cpu_type>/<cpu>/events
file. If the event cannot be monitored by any other counter, contact
your hardware or BIOS vendor.
Cc: Shashi Belur <shashi-kiran.belur@hp.com>
Cc: Tony Jones <tonyj@suse.de>
Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
This patch generates a warning if a counter is already active.
Implemented for AMD and P6 models. P4 is not supported.
Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Cc: Shashi Belur <shashi-kiran.belur@hp.com>
Cc: Tony Jones <tonyj@suse.de>
Signed-off-by: Robert Richter <robert.richter@amd.com>
IBS selects an op (execution operation) for sampling by counting
either cycles or dispatched ops. Better statistical samples can be
produced by adding a software generated random offset to the periodic
op counter value with each sample.
This patch adds software randomization to the IBS periodic op
counter. The lower 12 bits of the 20 bit counter are
randomized. IbsOpCurCnt is initialized with a 12 bit random value.
There is a work around if the hw can not write to IbsOpCurCnt. Then
the lower 8 bits of the 16 bit IbsOpMaxCnt [15:0] value are randomized
in the range of -128 to +127 by adding/subtracting an offset to the
maximum count (IbsOpMaxCnt).
The linear feedback shift register (LFSR) algorithm is used for
pseudo-random number generation to have low impact to the memory
system.
Signed-off-by: Robert Richter <robert.richter@amd.com>
This patch implements a linear feedback shift register (LFSR) for
pseudo-random number generation for IBS.
For IBS measurements it would be good to minimize memory traffic in
the interrupt handler since every access pollutes the data
caches. Computing a maximal period LFSR just needs shifts and ORs.
The LFSR method is good enough to randomize the ops at low
overhead. 16 pseudo-random bits are enough for the implementation and
it doesn't matter that the pattern repeats with a fairly short
cycle. It only needs to break up (hard) periodic sampling behavior.
The logic was designed by Paul Drongowski.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
This patch adds IBS feature detection using cpuid flags. An IBS
capability mask is introduced to test for certain IBS features. The
bit mask is the same as for IBS cpuid feature flags (Fn8000_001B_EAX),
but bit 0 is used to indicate the existence of IBS.
The patch also changes the handling of the IbsOpCntCtl bit (periodic
op counter count control). The oprofilefs file for this feature
(ibs_op/dispatched_ops) will be only exposed if the feature is
available, also the default for the bit is set to count clock cycles.
In general, the userland can detect the availability of a feature by
checking for the corresponding file in oprofilefs. If it exists, the
feature also exists. This may lead to a dynamic file layout depending
on the cpu type with that the userland has to deal with. Current
opcontrol is compatible.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Standard AMD systems have the same number of nodes as there are
northbridge devices. However, there may kernel configurations
(especially for 32 bit) or system setups exist, where the node number
is different or it can not be detected properly. Thus the check is not
reliable and may fail though IBS setup was fine. For this reason it is
better to remove the check.
Cc: stable <stable@kernel.org>
Signed-off-by: Robert Richter <robert.richter@amd.com>
OProfile support for IBS is now for several versions in the
kernel. The feature is stable now and the code can be activated
permanently.
As a side effect IBS now works also on nosmp configs.
Signed-off-by: Robert Richter <robert.richter@amd.com>
We re-program the event control register every time we reset the count,
this appears to be superflous, hence remove it.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arjan van de Ven <arjan@linux.intel.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Since the cpu argument to hw_perf_group_sched_in() is always
smp_processor_id(), simplify the code a little by removing this argument
and using the current cpu where needed.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1265890918.5396.3.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds correct AMD NorthBridge event scheduling.
NB events are events measuring L3 cache, Hypertransport traffic. They are
identified by an event code >= 0xe0. They measure events on the
Northbride which is shared by all cores on a package. NB events are
counted on a shared set of counters. When a NB event is programmed in a
counter, the data actually comes from a shared counter. Thus, access to
those counters needs to be synchronized.
We implement the synchronization such that no two cores can be measuring
NB events using the same counters. Thus, we maintain a per-NB allocation
table. The available slot is propagated using the event_constraint
structure.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4b703957.0702d00a.6bf2.7b7d@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In certain situations, the kernel may need to stop and start the same
event rapidly. The current PMU callbacks do not distinguish between stop
and release (i.e., stop + free the resource). Thus, a counter may be
released, then it will be immediately re-acquired. Event scheduling will
again take place with no guarantee to assign the same counter. On some
processors, this may event yield to failure to assign the event back due
to competion between cores.
This patch is adding a new pair of callback to stop and restart a counter
without actually release the underlying counter resource. On stop, the
counter is stopped, its values saved and that's it. On start, the value
is reloaded and counter is restarted (on x86, actual restart is delayed
until perf_enable()).
Signed-off-by: Stephane Eranian <eranian@google.com>
[ added fallback to ->enable/->disable for all other PMUs
fixed x86_pmu_start() to call x86_pmu.enable()
merged __x86_pmu_disable into x86_pmu_stop() ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4b703875.0a04d00a.7896.ffffb824@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch changes the 32-bit version of kernel_physical_mapping_init() to
return the last mapped address like the 64-bit one so that we can unify the
call-site in init_memory_mapping().
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <alpine.DEB.2.00.1002241703570.1180@melkki.cs.helsinki.fi>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
commit ff097ddd4 (x86/PCI: MMCONFIG: manage pci_mmcfg_region as a
list, not a table) introduced a nasty memory corruption when
pci_mmcfg_list is empty.
pci_mmcfg_check_end_bus_number() dereferences pci_mmcfg_list.prev even
when the list is empty. The following write hits some variable near to
pci_mmcfg_list.
Further down a similar problem exists, where cfg->list.next is
dereferenced unconditionally and a comparison with some variable near
to pci_mmcfg_list happens.
Add a check for the last element into the for_each_entry() loop and
remove all the other crappy logic which is just a leftover of the old
array based code which was replaced by the list conversion.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The code in stop_machine that modifies the kernel text has a bit
of logic to handle the case of NMIs. stop_machine does not prevent
NMIs from executing, and if an NMI were to trigger on another CPU
as the modifying CPU is changing the NMI text, a GPF could result.
To prevent the GPF, the NMI calls ftrace_nmi_enter() which may
modify the code first, then any other NMIs will just change the
text to the same content which will do no harm. The code that
stop_machine called must wait for NMIs to finish while it changes
each location in the kernel. That code may also change the text
to what the NMI changed it to. The key is that the text will never
change content while another CPU is executing it.
To make the above work, the call to ftrace_nmi_enter() must also
do a smp_mb() as well as atomic_inc(). But for applications like
perf that require a high number of NMIs for profiling, this can have
a dramatic effect on the system. Not only is it doing a full memory
barrier on both nmi_enter() as well as nmi_exit() it is also
modifying a global variable with an atomic operation. This kills
performance on large SMP machines.
Since the memory barriers are only needed when ftrace is in the
process of modifying the text (which is seldom), this patch
adds a "modifying_code" variable that gets set before stop machine
is executed and cleared afterwards.
The NMIs will check this variable and store it in a per CPU
"save_modifying_code" variable that it will use to check if it
needs to do the memory barriers and atomic dec on NMI exit.
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>