The help text for RANDOMIZE_BASE_MAX_OFFSET was confusing. This has been
clarified, and updated to be an export-only tunable.
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20131210202745.GA2961@www.outflux.net
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Currently we do a read, a dummy write and a final read to fetch
the error code. The value from the final read is taken.
This is not the recommended way and leads to corrupted/lost ESR
values.
Intel(c) 64 and IA-32 Architectures Software Developer's Manual,
Combined Volumes 1, 2ABC, 3ABC, Section 10.5.3 states:
Before attempt to read from the ESR, software should first
write to it. (The value written does not affect the values read
subsequently; only zero may be written in x2APIC mode.) This
write clears any previously logged errors and updates the ESR
with any errors detected since the last write to the ESR.
This write also rearms the APIC error interrupt triggering
mechanism.
This patch removes the first read such that we are conform with
the manual.
On my (very old) Pentium MMX SMP system this patch fixes the
issue that APIC errors:
a) are not always reported and
b) are reported with false error numbers.
Signed-off-by: Richard Weinberger <richard@nod.at>
Cc: seiji.aguchi@hds.com
Cc: rientjes@google.com
Cc: konrad.wilk@oracle.com
Cc: bp@alien8.de
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1389685487-20872-1-git-send-email-richard@nod.at
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We've grown a bunch of microcode loader files all prefixed with
"microcode_". They should be under cpu/ because this is strictly
CPU-related functionality so do that and drop the prefix since they're
in their own directory now which gives that prefix. :)
While at it, drop MICROCODE_INTEL_LIB config item and stash the
functionality under CONFIG_MICROCODE_INTEL as it was its only user.
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
The original idea to use the microcode cache for the APs doesn't pan out
because we do memory allocation there very early and with IRQs disabled
and we don't want to involve GFP_ATOMIC allocations. Not if it can be
helped.
Thus, extend the caching of the BSP patch approach to the APs and
iterate over the ucode in the initrd instead of using the cache. We
still save the relevant patches to it but later, right before we
jettison the initrd.
While at it, fix early ucode loading on 32-bit too.
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
We want to use those in AMD's early loading path too. Also, add a
native_wrmsrl variant.
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
The ramdisk can possibly get relocated if the whole image is not mapped.
And since we're going over it in the microcode loader and fishing out
the relevant microcode patches, we want access it at its new location.
Thus, export it.
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
With various drivers wanting to inject idle time; we get people
calling idle routines outside of the idle loop proper.
Therefore we need to be extra careful about not missing
TIF_NEED_RESCHED -> PREEMPT_NEED_RESCHED propagations.
While looking at this, I also realized there's a small window in the
existing idle loop where we can miss TIF_NEED_RESCHED; when it hits
right after the tif_need_resched() test at the end of the loop but
right before the need_resched() test at the start of the loop.
So move preempt_fold_need_resched() out of the loop where we're
guaranteed to have TIF_NEED_RESCHED set.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-x9jgh45oeayzajz2mjt0y7d6@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Use a ring-buffer like multi-version object structure which allows
always having a coherent object; we use this to avoid having to
disable IRQs while reading sched_clock() and avoids a problem when
getting an NMI while changing the cyc2ns data.
MAINLINE PRE POST
sched_clock_stable: 1 1 1
(cold) sched_clock: 329841 331312 257223
(cold) local_clock: 301773 310296 309889
(warm) sched_clock: 38375 38247 25280
(warm) local_clock: 100371 102713 85268
(warm) rdtsc: 27340 27289 24247
sched_clock_stable: 0 0 0
(cold) sched_clock: 382634 372706 301224
(cold) local_clock: 396890 399275 399870
(warm) sched_clock: 38194 38124 25630
(warm) local_clock: 143452 148698 129629
(warm) rdtsc: 27345 27365 24307
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-s567in1e5ekq2nlyhn8f987r@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fengguang Wu's 0day kernel build service reported the following build warning:
arch/x86/kernel/apic/io_apic.c:2211
smp_irq_move_cleanup_interrupt() warn: always true condition '(irq <= -1) => (0-u32max <= (-1))'
because irq is defined as an unsigned int instead of an int.
Fix this trivial error by redefining irq as a signed int. The
remaining consumers of the int are okay.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Link: http://lkml.kernel.org/r/1389620420-7110-1-git-send-email-prarit@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There are no __cycles_2_ns() users outside of arch/x86/kernel/tsc.c,
so move it there.
There are no cycles_2_ns() users.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-01lslnavfgo3kmbo4532zlcj@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add the syscalls needed for supporting scheduling algorithms
with extended scheduling parameters (e.g., SCHED_DEADLINE).
In general, it makes possible to specify a periodic/sporadic task,
that executes for a given amount of runtime at each instance, and is
scheduled according to the urgency of their own timing constraints,
i.e.:
- a (maximum/typical) instance execution time,
- a minimum interval between consecutive instances,
- a time constraint by which each instance must be completed.
Thus, both the data structure that holds the scheduling parameters of
the tasks and the system calls dealing with it must be extended.
Unfortunately, modifying the existing struct sched_param would break
the ABI and result in potentially serious compatibility issues with
legacy binaries.
For these reasons, this patch:
- defines the new struct sched_attr, containing all the fields
that are necessary for specifying a task in the computational
model described above;
- defines and implements the new scheduling related syscalls that
manipulate it, i.e., sched_setattr() and sched_getattr().
Syscalls are introduced for x86 (32 and 64 bits) and ARM only, as a
proof of concept and for developing and testing purposes. Making them
available on other architectures is straightforward.
Since no "user" for these new parameters is introduced in this patch,
the implementation of the new system calls is just identical to their
already existing counterpart. Future patches that implement scheduling
policies able to exploit the new data structure must also take care of
modifying the sched_*attr() calls accordingly with their own purposes.
Signed-off-by: Dario Faggioli <raistlin@linux.it>
[ Rewrote to use sched_attr. ]
Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
[ Removed sched_setscheduler2() for now. ]
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1383831828-15501-3-git-send-email-juri.lelli@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJS0miqAAoJEHm+PkMAQRiGbfgIAJSWEfo8ludknhPcHJabBtxu
75SQAKJlL3sBVnxEc58Rtt8gsKYQIrm4IY5Slunklsn04RxuDUIQMgFoAYR5gQwz
+Myqkw/HOqDe5VStGxtLYpWnfglxVwGDCd7ISfL9AOVy5adMWBxh4Tv+qqQc7aIZ
eF7dy+DD+C6Q3Z5OoV8s0FZDxse29vOf17Nki7+7t8WMqyegYwjoOqNeqocGKsPi
eHLrJgTl4T6jB4l9LKKC154DSKjKOTSwZMWgwK8mToyNLT/ufCiKgXloIjEvZZcY
VVKUtncdHiTf+iqVojgpGBzOEeB5DM83iiapFeDiJg8C9yBzvT8lBtA9aPb5Wgw=
=lEeV
-----END PGP SIGNATURE-----
Merge tag 'v3.13-rc8' into core/locking
Refresh the tree with the latest fixes, before applying new changes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
* acpica: (21 commits)
ACPICA: Update version to 20131218.
ACPICA: Utilities: Cleanup declarations of the acpi_gbl_debug_file global.
ACPICA: Linuxize: Cleanup spaces after special macro invocations.
ACPICA: Interpreter: Add additional debug info for an error case.
ACPICA: Update ACPI example code to make it an actual working program.
ACPICA: Add an error message if the Debugger fails initialization.
ACPICA: Conditionally define a local variable that is used for debug only.
ACPICA: Parser: Updates/fixes for debug output.
ACPICA: Enhance ACPI warning for memory/IO address conflicts.
ACPICA: Update several debug statements - no functional change.
ACPICA: Improve exception handling for GPE block installation.
ACPICA: Add helper macros to extract bus/segment numbers from HEST table.
ACPICA: Tables: Add full support for the PCCT table, update table definition.
ACPICA: Tables: Add full support for the DBG2 table.
ACPICA: Add option to favor 32-bit FADT addresses.
ACPICA: Cleanup the option of forcing the use of the RSDT.
ACPICA: Back port and refine validation of the XSDT root table.
ACPICA: Linux Header: Remove unused OSL prototypes.
ACPICA: Remove unused ACPI_FREE_BUFFER macro. No functional change.
ACPICA: Disassembler: Improve pathname support for emitted External() statements.
...
* acpi-cleanup: (22 commits)
ACPI / tables: Return proper error codes from acpi_table_parse() and fix comment.
ACPI / tables: Check if id is NULL in acpi_table_parse()
ACPI / proc: Include appropriate header file in proc.c
ACPI / EC: Remove unused functions and add prototype declaration in internal.h
ACPI / dock: Include appropriate header file in dock.c
ACPI / PCI: Include appropriate header file in pci_link.c
ACPI / PCI: Include appropriate header file in pci_slot.c
ACPI / EC: Mark the function acpi_ec_add_debugfs() as static in ec_sys.c
ACPI / NVS: Include appropriate header file in nvs.c
ACPI / OSL: Mark the function acpi_table_checksum() as static
ACPI / processor: initialize a variable to silence compiler warning
ACPI / processor: use ACPI_COMPANION() to get ACPI device
ACPI: correct minor typos
ACPI / sleep: Drop redundant acpi_disabled check
ACPI / dock: Drop redundant acpi_disabled check
ACPI / table: Replace '1' with specific error return values
ACPI: remove trailing whitespace
ACPI / IBFT: Fix incorrect <acpi/acpi.h> inclusion in iSCSI boot firmware module
ACPI / i915: Fix incorrect <acpi/acpi.h> inclusions via <linux/acpi_io.h>
SFI / ACPI: Fix warnings reported during builds with W=1
...
Conflicts:
drivers/acpi/nvs.c
drivers/hwmon/asus_atk0110.c
So mce_start_timer() has a 'cpu' argument which is supposed to mean to
start a timer on that cpu. However, the code currently starts a timer on
the *current* cpu the function runs on and causes the sanity-check in
mce_timer_fn to fire:
WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/mcheck/mce.c:1286 mce_timer_fn
because it is running on the wrong cpu.
This was triggered by Prarit Bhargava <prarit@redhat.com> by offlining
all the cpus in succession.
Then, we were fiddling with the CMCI storm settings when starting the
timer whereas there's no need for that - if there's storm happening
on this newly restarted cpu, we're going to be in normal CMCI mode
initially and then when the CMCI interrupt starts firing, we're going to
go to the polling mode with the timer real soon.
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Prarit Bhargava <prarit@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Reviewed-by: Chen, Gong <gong.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/1387722156-5511-1-git-send-email-prarit@redhat.com
During heavy CPU-hotplug operations the following spurious kernel warnings
can trigger:
do_IRQ: No ... irq handler for vector (irq -1)
[ See: https://bugzilla.kernel.org/show_bug.cgi?id=64831 ]
When downing a cpu it is possible that there are unhandled irqs
left in the APIC IRR register. The following code path shows
how the problem can occur:
1. CPU 5 is to go down.
2. cpu_disable() on CPU 5 executes with interrupt flag cleared
by local_irq_save() via stop_machine().
3. IRQ 12 asserts on CPU 5, setting IRR but not ISR because
interrupt flag is cleared (CPU unabled to handle the irq)
4. IRQs are migrated off of CPU 5, and the vectors' irqs are set
to -1. 5. stop_machine() finishes cpu_disable()
6. cpu_die() for CPU 5 executes in normal context.
7. CPU 5 attempts to handle IRQ 12 because the IRR is set for
IRQ 12. The code attempts to find the vector's IRQ and cannot
because it has been set to -1. 8. do_IRQ() warning displays
warning about CPU 5 IRQ 12.
I added a debug printk to output which CPU & vector was
retriggered and discovered that that we are getting bogus
events. I see a 100% correlation between this debug printk in
fixup_irqs() and the do_IRQ() warning.
This patchset resolves this by adding definitions for
VECTOR_UNDEFINED(-1) and VECTOR_RETRIGGERED(-2) and modifying
the code to use them.
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=64831
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Reviewed-by: Rui Wang <rui.y.wang@intel.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Yang Zhang <yang.z.zhang@Intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: janet.morgan@Intel.com
Cc: tony.luck@Intel.com
Cc: ruiv.wang@gmail.com
Link: http://lkml.kernel.org/r/1388938252-16627-1-git-send-email-prarit@redhat.com
[ Cleaned up the code a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A number of situations currently require the heavyweight smp_mb(),
even though there is no need to order prior stores against later
loads. Many architectures have much cheaper ways to handle these
situations, but the Linux kernel currently has no portable way
to make use of them.
This commit therefore supplies smp_load_acquire() and
smp_store_release() to remedy this situation. The new
smp_load_acquire() primitive orders the specified load against
any subsequent reads or writes, while the new smp_store_release()
primitive orders the specifed store against any prior reads or
writes. These primitives allow array-based circular FIFOs to be
implemented without an smp_mb(), and also allow a theoretical
hole in rcu_assign_pointer() to be closed at no additional
expense on most architectures.
In addition, the RCU experience transitioning from explicit
smp_read_barrier_depends() and smp_wmb() to rcu_dereference()
and rcu_assign_pointer(), respectively resulted in substantial
improvements in readability. It therefore seems likely that
replacing other explicit barriers with smp_load_acquire() and
smp_store_release() will provide similar benefits. It appears
that roughly half of the explicit barriers in core kernel code
might be so replaced.
[Changelog by PaulMck]
Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Victor Kaplansky <VICTORK@il.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20131213150640.908486364@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch adds support for the Intel RAPL energy counter
PP1 (Power Plane 1).
On client processors, it usually corresponds to the
energy consumption of the builtin graphic card. That
is why the sysfs event is called energy-gpu.
New event:
- name: power/energy-gpu/
- code: event=0x4
- unit: 2^-32 Joules
On processors without graphics, this should count 0.
The patch only enables this event on client processors.
Reviewed-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: ak@linux.intel.com
Cc: acme@redhat.com
Cc: jolsa@redhat.com
Cc: zheng.z.yan@intel.com
Cc: bp@alien8.de
Cc: vincent.weaver@maine.edu
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1389176153-3128-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus disliked the _no_lockdep() naming, so instead
use the more-consistent raw_* prefix to the non-lockdep
enabled seqcount methods.
This also adds raw_ methods for the write operations
as well, which will be utilized in a following patch.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Krzysztof Hałasa <khalasa@piap.pl>
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Willy Tarreau <w@1wt.eu>
Link: http://lkml.kernel.org/r/1388704274-5278-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Before we do an EMMS in the AMD FXSAVE information leak workaround we
need to clear any pending exceptions, otherwise we trap with a
floating-point exception inside this code.
Reported-by: halfdog <me@halfdog.net>
Tested-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/CA%2B55aFxQnY_PCG_n4=0w-VG=YLXL-yr7oMxyy0WU2gCBAf3ydg@mail.gmail.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* pci/resource:
PCI: Allocate 64-bit BARs above 4G when possible
PCI: Enforce bus address limits in resource allocation
PCI: Split out bridge window override of minimum allocation address
agp/ati: Use PCI_COMMAND instead of hard-coded 4
agp/intel: Use CPU physical address, not bus address, for ioremap()
agp/intel: Use pci_bus_address() to get GTTADR bus address
agp/intel: Use pci_bus_address() to get MMADR bus address
agp/intel: Support 64-bit GMADR
agp/intel: Rename gtt_bus_addr to gtt_phys_addr
drm/i915: Rename gtt_bus_addr to gtt_phys_addr
agp: Use pci_resource_start() to get CPU physical address for BAR
agp: Support 64-bit APBASE
PCI: Add pci_bus_address() to get bus address of a BAR
PCI: Convert pcibios_resource_to_bus() to take a pci_bus, not a pci_dev
PCI: Change pci_bus_region addresses to dma_addr_t
The usage of 'select' means it will enable the CONFIG
options without checking their dependencies. That meant
we would inadvertently turn on CONFIG_XEN_PVHM while its
core dependency (CONFIG_PCI) was turned off.
This patch fixes the warnings and compile failures:
warning: (XEN_PVH) selects XEN_PVHVM which has unmet direct
dependencies (HYPERVISOR_GUEST && XEN && PCI && X86_LOCAL_APIC)
Reported-by: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Function tracing callbacks expect to have the ftrace_ops that registered it
passed to them, not the address of the variable that holds the ftrace_ops
that registered it.
Use a mov instead of a lea to store the ftrace_ops into the parameter
of the function tracing callback.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Link: http://lkml.kernel.org/r/20131113152004.459787f9@gandalf.local.home
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> # v3.8+
Current Intel SOC cores use a MailBox Interface (MBI) to provide access to
configuration registers on devices (called units) connected to the system
fabric. This is a support driver that implements access to this interface on
those platforms that can enumerate the device using PCI. Initial support is for
BayTrail, for which port definitons are provided. This is a requirement for
implementing platform specific features (e.g. RAPL driver requires this to
perform platform specific power management using the registers in PUNIT).
Dependant modules should select IOSF_MBI in their respective Kconfig
configuraiton. Serialized access is handled by all exported routines with
spinlocks.
The API includes 3 functions for access to unit registers:
int iosf_mbi_read(u8 port, u8 opcode, u32 offset, u32 *mdr)
int iosf_mbi_write(u8 port, u8 opcode, u32 offset, u32 mdr)
int iosf_mbi_modify(u8 port, u8 opcode, u32 offset, u32 mdr, u32 mask)
port: indicating the unit being accessed
opcode: the read or write port specific opcode
offset: the register offset within the port
mdr: the register data to be read, written, or modified
mask: bit locations in mdr to change
Returns nonzero on error
Note: GPU code handles access to the GFX unit. Therefore access to that unit
with this driver is disallowed to avoid conflicts.
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Link: http://lkml.kernel.org/r/1389216471-734-1-git-send-email-david.e.box@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
After free_loaded_vmcs executes, the "loaded_vmcs" structure
is kfreed, and now vmx->loaded_vmcs points to a kfreed area.
Subsequent free_loaded_vmcs then attempts to manipulate
vmx->loaded_vmcs.
Switch the order to avoid the problem.
https://bugzilla.redhat.com/show_bug.cgi?id=1047892
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
fix the 'vcpi' typos when apic_debug is enabled.
Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
According to Table C-1 of Intel SDM 3C, a VM exit happens on an I/O instruction when
"use I/O bitmaps" VM-execution control was 0 _and_ the "unconditional I/O exiting"
VM-execution control was 1. So we can't just check "unconditional I/O exiting" alone.
This patch was improved by suggestion from Jan Kiszka.
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Zhihui Zhang <zzhsuny@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This change adds a runtime option that will force ACPICA to use the
RSDT instead of the XSDT. Although the ACPI spec requires that an XSDT
be used instead of the RSDT, the XSDT has been found to be corrupt or
ill-formed on some machines.
This option is already in the Linux kernel. When it is back ported to
ACPICA, code is re-written to follow ACPICA coding style. This patch
is the generation of the integration.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When allocating space for 32-bit BARs, we previously limited RESOURCE
addresses so they would fit in 32 bits. However, the BUS address need not
be the same as the resource address, and it's the bus address that must fit
in the 32-bit BAR.
This patch adds:
- pci_clip_resource_to_region(), which clips a resource so it contains
only the range that maps to the specified bus address region, e.g., to
clip a resource to 32-bit bus addresses, and
- pci_bus_alloc_from_region(), which allocates space for a resource from
the specified bus address region,
and changes pci_bus_alloc_resource() to allocate space for 64-bit BARs from
the entire bus address region, and space for 32-bit BARs from only the bus
address region below 4GB.
If we had this window:
pci_root HWP0002:0a: host bridge window [mem 0xf0180000000-0xf01fedfffff] (bus address [0x80000000-0xfedfffff])
we previously could not put a 32-bit BAR there, because the CPU addresses
don't fit in 32 bits. This patch fixes this, so we can use this space for
32-bit BARs.
It's also possible (though unlikely) to have resources with 32-bit CPU
addresses but bus addresses above 4GB. In this case the previous code
would allocate space that a 32-bit BAR could not map.
Remove PCIBIOS_MAX_MEM_32, which is no longer used.
[bhelgaas: reworked starting from http://lkml.kernel.org/r/1386658484-15774-3-git-send-email-yinghai@kernel.org]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Oddly enough it compiles for my ancient compiler but with
the supplied .config it does blow up. Fix is easy enough.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Reported-by: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
None of these files are actually using any __init type directives
and hence don't need to include <linux/init.h>. Most are just a
left over from __devinit and __cpuinit removal, or simply due to
code getting copied from one driver to the next.
[ hpa: undid incorrect removal from arch/x86/kernel/head_32.S ]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Link: http://lkml.kernel.org/r/1389054026-12947-1-git-send-email-paul.gortmaker@windriver.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Conflicts:
drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c
net/ipv6/ip6_tunnel.c
net/ipv6/ip6_vti.c
ipv6 tunnel statistic bug fixes conflicting with consolidation into
generic sw per-cpu net stats.
qlogic conflict between queue counting bug fix and the addition
of multiple MAC address support.
Signed-off-by: David S. Miller <davem@davemloft.net>
PVH allows PV linux guest to utilize hardware extended capabilities,
such as running MMU updates in a HVM container.
The Xen side defines PVH as (from docs/misc/pvh-readme.txt,
with modifications):
"* the guest uses auto translate:
- p2m is managed by Xen
- pagetables are owned by the guest
- mmu_update hypercall not available
* it uses event callback and not vlapic emulation,
* IDT is native, so set_trap_table hcall is also N/A for a PVH guest.
For a full list of hcalls supported for PVH, see pvh_hypercall64_table
in arch/x86/hvm/hvm.c in xen. From the ABI prespective, it's mostly a
PV guest with auto translate, although it does use hvm_op for setting
callback vector."
Use .ascii and .asciz to define xen feature string. Note, the PVH
string must be in a single line (not multiple lines with \) to keep the
assembler from putting null char after each string before \.
This patch allows it to be configured and enabled.
We also use introduce the 'XEN_ELFNOTE_SUPPORTED_FEATURES' ELF note to
tell the hypervisor that 'hvm_callback_vector' is what the kernel
needs. We can not put it in 'XEN_ELFNOTE_FEATURES' as older hypervisor
parse fields they don't understand as errors and refuse to load
the kernel. This work-around fixes the problem.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
In PVH the shared grant frame is the PFN and not MFN,
hence its mapped via the same code path as HVM.
The allocation of the grant frame is done differently - we
do not use the early platform-pci driver and have an
ioremap area - instead we use balloon memory and stitch
all of the non-contingous pages in a virtualized area.
That means when we call the hypervisor to replace the GMFN
with a XENMAPSPACE_grant_table type, we need to lookup the
old PFN for every iteration instead of assuming a flat
contingous PFN allocation.
Lastly, we only use v1 for grants. This is because PVHVM
is not able to use v2 due to no XENMEM_add_to_physmap
calls on the error status page (see commit
69e8f430e243d657c2053f097efebc2e2cd559f0
xen/granttable: Disable grant v2 for HVM domains.)
Until that is implemented this workaround has to
be in place.
Also per suggestions by Stefano utilize the PVHVM paths
as they share common functionality.
v2 of this patch moves most of the PVH code out in the
arch/x86/xen/grant-table driver and touches only minimally
the generic driver.
v3, v4: fixes us some of the code due to earlier patches.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
The 'xen_hvm_resume_frames' used to be an 'unsigned long'
and contain the virtual address of the grants. That was OK
for most architectures (PVHVM, ARM) were the grants are contiguous
in memory. That however is not the case for PVH - in which case
we will have to do a lookup for each virtual address for the PFN.
Instead of doing that, lets make it a structure which will contain
the array of PFNs, the virtual address and the count of said PFNs.
Also provide a generic functions: gnttab_setup_auto_xlat_frames and
gnttab_free_auto_xlat_frames to populate said structure with
appropriate values for PVHVM and ARM.
To round it off, change the name from 'xen_hvm_resume_frames' to
a more descriptive one - 'xen_auto_xlat_grant_frames'.
For PVH, in patch "xen/pvh: Piggyback on PVHVM for grant driver"
we will populate the 'xen_auto_xlat_grant_frames' by ourselves.
v2 moves the xen_remap in the gnttab_setup_auto_xlat_frames
and also introduces xen_unmap for gnttab_free_auto_xlat_frames.
Suggested-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v3: Based on top of 'asm/xen/page.h: remove redundant semicolon']
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
PVH is a PV guest with a twist - there are certain things
that work in it like HVM and some like PV. There is
a similar mode - PVHVM where we run in HVM mode with
PV code enabled - and this patch explores that.
The most notable PV interfaces are the XenBus and event channels.
We will piggyback on how the event channel mechanism is
used in PVHVM - that is we want the normal native IRQ mechanism
and we will install a vector (hvm callback) for which we
will call the event channel mechanism.
This means that from a pvops perspective, we can use
native_irq_ops instead of the Xen PV specific. Albeit in the
future we could support pirq_eoi_map. But that is
a feature request that can be shared with PVHVM.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
In xen_add_extra_mem() we can skip updating P2M as it's managed
by Xen. PVH maps the entire IO space, but only RAM pages need
to be repopulated.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
The VCPU bringup protocol follows the PV with certain twists.
From xen/include/public/arch-x86/xen.h:
Also note that when calling DOMCTL_setvcpucontext and VCPU_initialise
for HVM and PVH guests, not all information in this structure is updated:
- For HVM guests, the structures read include: fpu_ctxt (if
VGCT_I387_VALID is set), flags, user_regs, debugreg[*]
- PVH guests are the same as HVM guests, but additionally use ctrlreg[3] to
set cr3. All other fields not used should be set to 0.
This is what we do. We piggyback on the 'xen_setup_gdt' - but modify
a bit - we need to call 'load_percpu_segment' so that 'switch_to_new_gdt'
can load per-cpu data-structures. It has no effect on the VCPU0.
We also piggyback on the %rdi register to pass in the CPU number - so
that when we bootup a new CPU, the cpu_bringup_and_idle will have
passed as the first parameter the CPU number (via %rdi for 64-bit).
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
During early bootup we start life using the Xen provided
GDT, which means that we are running with %cs segment set
to FLAT_KERNEL_CS (FLAT_RING3_CS64 0xe033, GDT index 261).
But for PVH we want to be use HVM type mechanism for
segment operations. As such we need to switch to the HVM
one and also reload ourselves with the __KERNEL_CS:eip
to run in the proper GDT and segment.
For HVM this is usually done in 'secondary_startup_64' in
(head_64.S) but since we are not taking that bootup
path (we start in PV - xen_start_kernel) we need to do
that in the early PV bootup paths.
For good measure we also zero out the %fs, %ds, and %es
(not strictly needed as Xen has already cleared them
for us). The %gs is loaded by 'switch_to_new_gdt'.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
For PVHVM the shared_info structure is provided via the same way
as for normal PV guests (see include/xen/interface/xen.h).
That is during bootup we get 'xen_start_info' via the %esi register
in startup_xen. Then later we extract the 'shared_info' from said
structure (in xen_setup_shared_info) and start using it.
The 'xen_setup_shared_info' is all setup to work with auto-xlat
guests, but there are two functions which it calls that are not:
xen_setup_mfn_list_list and xen_setup_vcpu_info_placement.
This patch modifies the P2M code (xen_setup_mfn_list_list)
while the "Piggyback on PVHVM for event channels" modifies
the xen_setup_vcpu_info_placement.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
We also optimize one - the TLB flush. The native operation would
needlessly IPI offline VCPUs causing extra wakeups. Using the
Xen one avoids that and lets the hypervisor determine which
VCPU needs the TLB flush.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>