18734 Commits

Author SHA1 Message Date
Kees Cook
da2b6fb990 x86, kaslr: Clarify RANDOMIZE_BASE_MAX_OFFSET
The help text for RANDOMIZE_BASE_MAX_OFFSET was confusing. This has been
clarified, and updated to be an export-only tunable.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20131210202745.GA2961@www.outflux.net
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-01-14 10:45:56 -08:00
Wei Yongjun
19259943f0 x86, kaslr: Remove unused including <linux/version.h>
Remove including <linux/version.h> that don't need it.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Link: http://lkml.kernel.org/r/CAPgLHd-Fjx1RybjWFAu1vHRfTvhWwMLL3x46BouC5uNxHPjy1A@mail.gmail.com
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-01-14 10:45:56 -08:00
Richard Weinberger
60283df7ac x86/apic: Read Error Status Register correctly
Currently we do a read, a dummy write and a final read to fetch
the error code. The value from the final read is taken.
This is not the recommended way and leads to corrupted/lost ESR
values.

Intel(c) 64 and IA-32 Architectures Software Developer's Manual,
Combined Volumes 1, 2ABC, 3ABC, Section 10.5.3 states:

  Before attempt to read from the ESR, software should first
  write to it. (The value written does not affect the values read
  subsequently; only zero may be written in x2APIC mode.) This
  write clears any previously logged errors and updates the ESR
  with any errors detected since the last write to the ESR.
  This write also rearms the APIC error interrupt triggering
  mechanism.

This patch removes the first read such that we are conform with
the manual.

On my (very old) Pentium MMX SMP system this patch fixes the
issue that APIC errors:

  a) are not always reported and
  b) are reported with false error numbers.

Signed-off-by: Richard Weinberger <richard@nod.at>
Cc: seiji.aguchi@hds.com
Cc: rientjes@google.com
Cc: konrad.wilk@oracle.com
Cc: bp@alien8.de
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1389685487-20872-1-git-send-email-richard@nod.at
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-14 14:05:36 +01:00
Borislav Petkov
bad5fa631f x86, microcode: Move to a proper location
We've grown a bunch of microcode loader files all prefixed with
"microcode_". They should be under cpu/ because this is strictly
CPU-related functionality so do that and drop the prefix since they're
in their own directory now which gives that prefix. :)

While at it, drop MICROCODE_INTEL_LIB config item and stash the
functionality under CONFIG_MICROCODE_INTEL as it was its only user.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
2014-01-13 20:00:12 +01:00
Borislav Petkov
5335ba5cf4 x86, microcode, AMD: Fix early ucode loading
The original idea to use the microcode cache for the APs doesn't pan out
because we do memory allocation there very early and with IRQs disabled
and we don't want to involve GFP_ATOMIC allocations. Not if it can be
helped.

Thus, extend the caching of the BSP patch approach to the APs and
iterate over the ucode in the initrd instead of using the cache. We
still save the relevant patches to it but later, right before we
jettison the initrd.

While at it, fix early ucode loading on 32-bit too.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
2014-01-13 19:59:38 +01:00
Borislav Petkov
e1b43e3f13 x86, microcode: Share native MSR accessing variants
We want to use those in AMD's early loading path too. Also, add a
native_wrmsrl variant.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
2014-01-13 19:57:27 +01:00
Borislav Petkov
5aa3d718f2 x86, ramdisk: Export relocated ramdisk VA
The ramdisk can possibly get relocated if the whole image is not mapped.
And since we're going over it in the microcode loader and fishing out
the relevant microcode patches, we want access it at its new location.
Thus, export it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
2014-01-13 19:56:25 +01:00
Peter Zijlstra
8cb75e0c4e sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding
With various drivers wanting to inject idle time; we get people
calling idle routines outside of the idle loop proper.

Therefore we need to be extra careful about not missing
TIF_NEED_RESCHED -> PREEMPT_NEED_RESCHED propagations.

While looking at this, I also realized there's a small window in the
existing idle loop where we can miss TIF_NEED_RESCHED; when it hits
right after the tif_need_resched() test at the end of the loop but
right before the need_resched() test at the start of the loop.

So move preempt_fold_need_resched() out of the loop where we're
guaranteed to have TIF_NEED_RESCHED set.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-x9jgh45oeayzajz2mjt0y7d6@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 17:38:55 +01:00
Ingo Molnar
c9c8986847 Merge branch 'x86/idle' into sched/core
Merge these x86 specific bits - we are going to add generic bits as well.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 17:37:05 +01:00
Peter Zijlstra
10b033d434 sched/clock, x86: Avoid a runtime condition in native_sched_clock()
Use a static_key to avoid touching tsc_disabled and a runtime
condition in native_sched_clock() -- less cachelines touched is always
better.

                        MAINLINE   PRE       POST

    sched_clock_stable: 1          1         1
    (cold) sched_clock: 329841     215295    213039
    (cold) local_clock: 301773     220773    216084
    (warm) sched_clock: 38375      25659     25231
    (warm) local_clock: 100371     27242     27601
    (warm) rdtsc:       27340      24208     24203
    sched_clock_stable: 0          0         0
    (cold) sched_clock: 382634     237019    240055
    (cold) local_clock: 396890     294819    299942
    (warm) sched_clock: 38194      25609     25276
    (warm) local_clock: 143452     71232     73232
    (warm) rdtsc:       27345      24243     24244

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-hrz87bo37qke25bty6pnfy4b@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 15:13:17 +01:00
Peter Zijlstra
35af99e646 sched/clock, x86: Use a static_key for sched_clock_stable
In order to avoid the runtime condition and variable load turn
sched_clock_stable into a static_key.

Also provide a shorter implementation of local_clock() and
cpu_clock(int) when sched_clock_stable==1.

                        MAINLINE   PRE       POST

    sched_clock_stable: 1          1         1
    (cold) sched_clock: 329841     221876    215295
    (cold) local_clock: 301773     234692    220773
    (warm) sched_clock: 38375      25602     25659
    (warm) local_clock: 100371     33265     27242
    (warm) rdtsc:       27340      24214     24208
    sched_clock_stable: 0          0         0
    (cold) sched_clock: 382634     235941    237019
    (cold) local_clock: 396890     297017    294819
    (warm) sched_clock: 38194      25233     25609
    (warm) local_clock: 143452     71234     71232
    (warm) rdtsc:       27345      24245     24243

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-eummbdechzz37mwmpags1gjr@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 15:13:13 +01:00
Peter Zijlstra
20d1c86a57 sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs
Use a ring-buffer like multi-version object structure which allows
always having a coherent object; we use this to avoid having to
disable IRQs while reading sched_clock() and avoids a problem when
getting an NMI while changing the cyc2ns data.

                        MAINLINE   PRE        POST

    sched_clock_stable: 1          1          1
    (cold) sched_clock: 329841     331312     257223
    (cold) local_clock: 301773     310296     309889
    (warm) sched_clock: 38375      38247      25280
    (warm) local_clock: 100371     102713     85268
    (warm) rdtsc:       27340      27289      24247
    sched_clock_stable: 0          0          0
    (cold) sched_clock: 382634     372706     301224
    (cold) local_clock: 396890     399275     399870
    (warm) sched_clock: 38194      38124      25630
    (warm) local_clock: 143452     148698     129629
    (warm) rdtsc:       27345      27365      24307

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-s567in1e5ekq2nlyhn8f987r@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 15:13:06 +01:00
Prarit Bhargava
c7a730fa46 x86/irq: Fix kbuild warning in smp_irq_move_cleanup_interrupt()
Fengguang Wu's 0day kernel build service reported the following build warning:

  arch/x86/kernel/apic/io_apic.c:2211
  smp_irq_move_cleanup_interrupt() warn: always true condition '(irq <= -1) => (0-u32max <= (-1))'

because irq is defined as an unsigned int instead of an int.
Fix this trivial error by redefining irq as a signed int.  The
remaining consumers of the int are okay.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Link: http://lkml.kernel.org/r/1389620420-7110-1-git-send-email-prarit@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 15:08:37 +01:00
Peter Zijlstra
57c67da274 sched/clock, x86: Move some cyc2ns() code around
There are no __cycles_2_ns() users outside of arch/x86/kernel/tsc.c,
so move it there.

There are no cycles_2_ns() users.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-01lslnavfgo3kmbo4532zlcj@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 13:47:39 +01:00
Peter Zijlstra
5dd12c2152 sched/clock, x86: Use mul_u64_u32_shr() for native_sched_clock()
Use mul_u64_u32_shr() so that x86_64 can use a single 64x64->128 mul.

Before:

0000000000000560 <native_sched_clock>:
 560:   44 8b 1d 00 00 00 00    mov    0x0(%rip),%r11d        # 567 <native_sched_clock+0x7>
 567:   55                      push   %rbp
 568:   48 89 e5                mov    %rsp,%rbp
 56b:   45 85 db                test   %r11d,%r11d
 56e:   75 4f                   jne    5bf <native_sched_clock+0x5f>
 570:   0f 31                   rdtsc
 572:   89 c0                   mov    %eax,%eax
 574:   48 c1 e2 20             shl    $0x20,%rdx
 578:   48 c7 c1 00 00 00 00    mov    $0x0,%rcx
 57f:   48 09 c2                or     %rax,%rdx
 582:   48 c7 c7 00 00 00 00    mov    $0x0,%rdi
 589:   65 8b 04 25 00 00 00    mov    %gs:0x0,%eax
 590:   00
 591:   48 98                   cltq
 593:   48 8b 34 c5 00 00 00    mov    0x0(,%rax,8),%rsi
 59a:   00
 59b:   48 89 d0                mov    %rdx,%rax
 59e:   81 e2 ff 03 00 00       and    $0x3ff,%edx
 5a4:   48 c1 e8 0a             shr    $0xa,%rax
 5a8:   48 0f af 14 0e          imul   (%rsi,%rcx,1),%rdx
 5ad:   48 0f af 04 0e          imul   (%rsi,%rcx,1),%rax
 5b2:   5d                      pop    %rbp
 5b3:   48 03 04 3e             add    (%rsi,%rdi,1),%rax
 5b7:   48 c1 ea 0a             shr    $0xa,%rdx
 5bb:   48 01 d0                add    %rdx,%rax
 5be:   c3                      retq

After:

0000000000000550 <native_sched_clock>:
 550:   8b 3d 00 00 00 00       mov    0x0(%rip),%edi        # 556 <native_sched_clock+0x6>
 556:   55                      push   %rbp
 557:   48 89 e5                mov    %rsp,%rbp
 55a:   48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
 55e:   85 ff                   test   %edi,%edi
 560:   75 2c                   jne    58e <native_sched_clock+0x3e>
 562:   0f 31                   rdtsc
 564:   89 c0                   mov    %eax,%eax
 566:   48 c1 e2 20             shl    $0x20,%rdx
 56a:   48 09 c2                or     %rax,%rdx
 56d:   65 48 8b 04 25 00 00    mov    %gs:0x0,%rax
 574:   00 00
 576:   89 c0                   mov    %eax,%eax
 578:   48 f7 e2                mul    %rdx
 57b:   65 48 8b 0c 25 00 00    mov    %gs:0x0,%rcx
 582:   00 00
 584:   c9                      leaveq
 585:   48 0f ac d0 0a          shrd   $0xa,%rdx,%rax
 58a:   48 01 c8                add    %rcx,%rax
 58d:   c3                      retq

                        MAINLINE   POST

    sched_clock_stable: 1	   1
    (cold) sched_clock: 329841     331312
    (cold) local_clock: 301773     310296
    (warm) sched_clock: 38375      38247
    (warm) local_clock: 100371     102713
    (warm) rdtsc:       27340      27289
    sched_clock_stable: 0          0
    (cold) sched_clock: 382634     372706
    (cold) local_clock: 396890     399275
    (warm) sched_clock: 38194      38124
    (warm) local_clock: 143452     148698
    (warm) rdtsc:       27345      27365

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-piu203ses5y1g36bnyw2n16x@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 13:47:38 +01:00
Dario Faggioli
d50dde5a10 sched: Add new scheduler syscalls to support an extended scheduling parameters ABI
Add the syscalls needed for supporting scheduling algorithms
with extended scheduling parameters (e.g., SCHED_DEADLINE).

In general, it makes possible to specify a periodic/sporadic task,
that executes for a given amount of runtime at each instance, and is
scheduled according to the urgency of their own timing constraints,
i.e.:

 - a (maximum/typical) instance execution time,
 - a minimum interval between consecutive instances,
 - a time constraint by which each instance must be completed.

Thus, both the data structure that holds the scheduling parameters of
the tasks and the system calls dealing with it must be extended.
Unfortunately, modifying the existing struct sched_param would break
the ABI and result in potentially serious compatibility issues with
legacy binaries.

For these reasons, this patch:

 - defines the new struct sched_attr, containing all the fields
   that are necessary for specifying a task in the computational
   model described above;

 - defines and implements the new scheduling related syscalls that
   manipulate it, i.e., sched_setattr() and sched_getattr().

Syscalls are introduced for x86 (32 and 64 bits) and ARM only, as a
proof of concept and for developing and testing purposes. Making them
available on other architectures is straightforward.

Since no "user" for these new parameters is introduced in this patch,
the implementation of the new system calls is just identical to their
already existing counterpart. Future patches that implement scheduling
policies able to exploit the new data structure must also take care of
modifying the sched_*attr() calls accordingly with their own purposes.

Signed-off-by: Dario Faggioli <raistlin@linux.it>
[ Rewrote to use sched_attr. ]
Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
[ Removed sched_setscheduler2() for now. ]
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1383831828-15501-3-git-send-email-juri.lelli@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 13:41:04 +01:00
Ingo Molnar
1c62448e39 Linux 3.13-rc8
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJS0miqAAoJEHm+PkMAQRiGbfgIAJSWEfo8ludknhPcHJabBtxu
 75SQAKJlL3sBVnxEc58Rtt8gsKYQIrm4IY5Slunklsn04RxuDUIQMgFoAYR5gQwz
 +Myqkw/HOqDe5VStGxtLYpWnfglxVwGDCd7ISfL9AOVy5adMWBxh4Tv+qqQc7aIZ
 eF7dy+DD+C6Q3Z5OoV8s0FZDxse29vOf17Nki7+7t8WMqyegYwjoOqNeqocGKsPi
 eHLrJgTl4T6jB4l9LKKC154DSKjKOTSwZMWgwK8mToyNLT/ufCiKgXloIjEvZZcY
 VVKUtncdHiTf+iqVojgpGBzOEeB5DM83iiapFeDiJg8C9yBzvT8lBtA9aPb5Wgw=
 =lEeV
 -----END PGP SIGNATURE-----

Merge tag 'v3.13-rc8' into core/locking

Refresh the tree with the latest fixes, before applying new changes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 11:44:41 +01:00
Rafael J. Wysocki
3e7cc142c1 Merge branch 'acpica'
* acpica: (21 commits)
  ACPICA: Update version to 20131218.
  ACPICA: Utilities: Cleanup declarations of the acpi_gbl_debug_file global.
  ACPICA: Linuxize: Cleanup spaces after special macro invocations.
  ACPICA: Interpreter: Add additional debug info for an error case.
  ACPICA: Update ACPI example code to make it an actual working program.
  ACPICA: Add an error message if the Debugger fails initialization.
  ACPICA: Conditionally define a local variable that is used for debug only.
  ACPICA: Parser: Updates/fixes for debug output.
  ACPICA: Enhance ACPI warning for memory/IO address conflicts.
  ACPICA: Update several debug statements - no functional change.
  ACPICA: Improve exception handling for GPE block installation.
  ACPICA: Add helper macros to extract bus/segment numbers from HEST table.
  ACPICA: Tables: Add full support for the PCCT table, update table definition.
  ACPICA: Tables: Add full support for the DBG2 table.
  ACPICA: Add option to favor 32-bit FADT addresses.
  ACPICA: Cleanup the option of forcing the use of the RSDT.
  ACPICA: Back port and refine validation of the XSDT root table.
  ACPICA: Linux Header: Remove unused OSL prototypes.
  ACPICA: Remove unused ACPI_FREE_BUFFER macro. No functional change.
  ACPICA: Disassembler: Improve pathname support for emitted External() statements.
  ...
2014-01-12 23:45:43 +01:00
Rafael J. Wysocki
98feb7cc61 Merge branch 'acpi-cleanup'
* acpi-cleanup: (22 commits)
  ACPI / tables: Return proper error codes from acpi_table_parse() and fix comment.
  ACPI / tables: Check if id is NULL in acpi_table_parse()
  ACPI / proc: Include appropriate header file in proc.c
  ACPI / EC: Remove unused functions and add prototype declaration in internal.h
  ACPI / dock: Include appropriate header file in dock.c
  ACPI / PCI: Include appropriate header file in pci_link.c
  ACPI / PCI: Include appropriate header file in pci_slot.c
  ACPI / EC: Mark the function acpi_ec_add_debugfs() as static in ec_sys.c
  ACPI / NVS: Include appropriate header file in nvs.c
  ACPI / OSL: Mark the function acpi_table_checksum() as static
  ACPI / processor: initialize a variable to silence compiler warning
  ACPI / processor: use ACPI_COMPANION() to get ACPI device
  ACPI: correct minor typos
  ACPI / sleep: Drop redundant acpi_disabled check
  ACPI / dock: Drop redundant acpi_disabled check
  ACPI / table: Replace '1' with specific error return values
  ACPI: remove trailing whitespace
  ACPI / IBFT: Fix incorrect <acpi/acpi.h> inclusion in iSCSI boot firmware module
  ACPI / i915: Fix incorrect <acpi/acpi.h> inclusions via <linux/acpi_io.h>
  SFI / ACPI: Fix warnings reported during builds with W=1
  ...

Conflicts:
	drivers/acpi/nvs.c
	drivers/hwmon/asus_atk0110.c
2014-01-12 23:44:09 +01:00
Ingo Molnar
b769e014f3 SCI reporting for other error types not only correctable ones
+ APEI GHES cleanups
 + mce timer fix
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQIcBAABAgAGBQJS0qbXAAoJEBLB8Bhh3lVK0G4P/1fjGcBzCiC4qp0vUzhdvu1d
 CXVTtF20fdGP+hgwVU/1DLuIeUhyBLFM57rPgz0FxfvPo/bTYq92Lw9sElQpiVad
 trIRpw3bemjDY/8E91vR94SqLKTddoJifldK5ZTUpRJN9up06MwLky4IurdL2ixE
 QLaI20ZQIDjxe+pmh4wHtyV5qPdtkiXY2ICcdTXwAW7RdiG7pXXd5gLljL4oFUMF
 bp0QzM074szkDvhBgZg6KTAy1zfNzZVM9hUBOAesi1tfXqkT7pCEbUkYKZ6QFNVV
 haQNt+RtkVvYcFcZK+9TkRM32KHSy1MVuicv2W6eNvRDtauGFtSEwBBuvyq3Imjb
 PTY6Vd5g/hR/+o968ieYUYdFua4xaza2wqC3mwL6WuNoWGzzb/f4dk9+wU/x0Khu
 th1NMSwy7VfNqi8pbTTy13LG1yjqqorrMLAEAfsNAMjZPIPsBREFYu8J5kko4ird
 1aEsaENUdkzgoDEUxwO7vQEfAL3RBOC8kr4SrBGd5jQtbN4SYRyP0bYVIsQL3Psz
 fo0H+9sVLwkSxcx7MM0bboBcv0NYazvTS5aIiivSdwMq0uxDsRWRmUCr70AlJDp9
 h7EfyovynCpfOcAOIz212A9bRzBcfW46KnQm6xkcgVdKKhEbc5EWhx6Nz3kCMNQZ
 Or1B6ZtUe7Acdg2kRTRL
 =+eFf
 -----END PGP SIGNATURE-----

Merge tag 'ras_for_3.14_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras

Pull RAS updates from Borislav Petkov:

 " SCI reporting for other error types not only correctable ones
   + APEI GHES cleanups
   + mce timer fix
 "

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12 17:56:54 +01:00
Ingo Molnar
da4540757d Linux 3.13-rc8
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJS0miqAAoJEHm+PkMAQRiGbfgIAJSWEfo8ludknhPcHJabBtxu
 75SQAKJlL3sBVnxEc58Rtt8gsKYQIrm4IY5Slunklsn04RxuDUIQMgFoAYR5gQwz
 +Myqkw/HOqDe5VStGxtLYpWnfglxVwGDCd7ISfL9AOVy5adMWBxh4Tv+qqQc7aIZ
 eF7dy+DD+C6Q3Z5OoV8s0FZDxse29vOf17Nki7+7t8WMqyegYwjoOqNeqocGKsPi
 eHLrJgTl4T6jB4l9LKKC154DSKjKOTSwZMWgwK8mToyNLT/ufCiKgXloIjEvZZcY
 VVKUtncdHiTf+iqVojgpGBzOEeB5DM83iiapFeDiJg8C9yBzvT8lBtA9aPb5Wgw=
 =lEeV
 -----END PGP SIGNATURE-----

Merge tag 'v3.13-rc8' into x86/ras, to pick up fixes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12 17:56:29 +01:00
Richard Weinberger
106553aa4e um, x86: Fix vDSO build
Commit 663b55b9b39 ("x86: Delete non-required instances of include <linux/init.h>")
broke the UML build.

  arch/x86/um/vdso/vdso.S: Assembler messages:
  arch/x86/um/vdso/vdso.S:2: Error: no such instruction: `__initdata'
  arch/x86/um/vdso/vdso.S:9: Error: no such instruction: `__finit'

UML's vDSO needs linux/init.h.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: paul.gortmaker@windriver.com
Link: http://lkml.kernel.org/r/1389538341-31383-1-git-send-email-richard@nod.at
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12 16:47:31 +01:00
Borislav Petkov
4f75d84127 x86, mce: Fix mce_start_timer semantics
So mce_start_timer() has a 'cpu' argument which is supposed to mean to
start a timer on that cpu. However, the code currently starts a timer on
the *current* cpu the function runs on and causes the sanity-check in
mce_timer_fn to fire:

	WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/mcheck/mce.c:1286 mce_timer_fn

because it is running on the wrong cpu.

This was triggered by Prarit Bhargava <prarit@redhat.com> by offlining
all the cpus in succession.

Then, we were fiddling with the CMCI storm settings when starting the
timer whereas there's no need for that - if there's storm happening
on this newly restarted cpu, we're going to be in normal CMCI mode
initially and then when the CMCI interrupt starts firing, we're going to
go to the polling mode with the timer real soon.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Prarit Bhargava <prarit@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Reviewed-by: Chen, Gong <gong.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/1387722156-5511-1-git-send-email-prarit@redhat.com
2014-01-12 15:22:25 +01:00
Prarit Bhargava
9345005f4e x86/irq: Fix do_IRQ() interrupt warning for cpu hotplug retriggered irqs
During heavy CPU-hotplug operations the following spurious kernel warnings
can trigger:

  do_IRQ: No ... irq handler for vector (irq -1)

  [ See: https://bugzilla.kernel.org/show_bug.cgi?id=64831 ]

When downing a cpu it is possible that there are unhandled irqs
left in the APIC IRR register.  The following code path shows
how the problem can occur:

 1. CPU 5 is to go down.

 2. cpu_disable() on CPU 5 executes with interrupt flag cleared
    by local_irq_save() via stop_machine().

 3. IRQ 12 asserts on CPU 5, setting IRR but not ISR because
    interrupt flag is cleared (CPU unabled to handle the irq)

 4. IRQs are migrated off of CPU 5, and the vectors' irqs are set
    to -1. 5. stop_machine() finishes cpu_disable()

 6. cpu_die() for CPU 5 executes in normal context.

 7. CPU 5 attempts to handle IRQ 12 because the IRR is set for
    IRQ 12.  The code attempts to find the vector's IRQ and cannot
    because it has been set to -1. 8. do_IRQ() warning displays
    warning about CPU 5 IRQ 12.

I added a debug printk to output which CPU & vector was
retriggered and discovered that that we are getting bogus
events.  I see a 100% correlation between this debug printk in
fixup_irqs() and the do_IRQ() warning.

This patchset resolves this by adding definitions for
VECTOR_UNDEFINED(-1) and VECTOR_RETRIGGERED(-2) and modifying
the code to use them.

Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=64831
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Reviewed-by: Rui Wang <rui.y.wang@intel.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Yang Zhang <yang.z.zhang@Intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: janet.morgan@Intel.com
Cc: tony.luck@Intel.com
Cc: ruiv.wang@gmail.com
Link: http://lkml.kernel.org/r/1388938252-16627-1-git-send-email-prarit@redhat.com
[ Cleaned up the code a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12 13:13:02 +01:00
Peter Zijlstra
47933ad41a arch: Introduce smp_load_acquire(), smp_store_release()
A number of situations currently require the heavyweight smp_mb(),
even though there is no need to order prior stores against later
loads.  Many architectures have much cheaper ways to handle these
situations, but the Linux kernel currently has no portable way
to make use of them.

This commit therefore supplies smp_load_acquire() and
smp_store_release() to remedy this situation.  The new
smp_load_acquire() primitive orders the specified load against
any subsequent reads or writes, while the new smp_store_release()
primitive orders the specifed store against any prior reads or
writes.  These primitives allow array-based circular FIFOs to be
implemented without an smp_mb(), and also allow a theoretical
hole in rcu_assign_pointer() to be closed at no additional
expense on most architectures.

In addition, the RCU experience transitioning from explicit
smp_read_barrier_depends() and smp_wmb() to rcu_dereference()
and rcu_assign_pointer(), respectively resulted in substantial
improvements in readability.  It therefore seems likely that
replacing other explicit barriers with smp_load_acquire() and
smp_store_release() will provide similar benefits.  It appears
that roughly half of the explicit barriers in core kernel code
might be so replaced.

[Changelog by PaulMck]

Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Victor Kaplansky <VICTORK@il.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20131213150640.908486364@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12 10:37:17 +01:00
Stephane Eranian
f228c5b882 perf/x86/intel: Add Intel RAPL PP1 energy counter support
This patch adds support for the Intel RAPL energy counter
PP1 (Power Plane 1).

On client processors, it usually corresponds to the
energy consumption of the builtin graphic card. That
is why the sysfs event is called energy-gpu.

New event:
 - name: power/energy-gpu/
 - code: event=0x4
 - unit: 2^-32 Joules

On processors without graphics, this should count 0.
The patch only enables this event on client processors.

Reviewed-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: ak@linux.intel.com
Cc: acme@redhat.com
Cc: jolsa@redhat.com
Cc: zheng.z.yan@intel.com
Cc: bp@alien8.de
Cc: vincent.weaver@maine.edu
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1389176153-3128-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12 10:16:08 +01:00
John Stultz
0c3351d451 seqlock: Use raw_ prefix instead of _no_lockdep
Linus disliked the _no_lockdep() naming, so instead
use the more-consistent raw_* prefix to the non-lockdep
enabled seqcount methods.

This also adds raw_ methods for the write operations
as well, which will be utilized in a following patch.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Krzysztof Hałasa <khalasa@piap.pl>
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Willy Tarreau <w@1wt.eu>
Link: http://lkml.kernel.org/r/1388704274-5278-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12 10:13:59 +01:00
Linus Torvalds
26bef1318a x86, fpu, amd: Clear exceptions in AMD FXSAVE workaround
Before we do an EMMS in the AMD FXSAVE information leak workaround we
need to clear any pending exceptions, otherwise we trap with a
floating-point exception inside this code.

Reported-by: halfdog <me@halfdog.net>
Tested-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/CA%2B55aFxQnY_PCG_n4=0w-VG=YLXL-yr7oMxyy0WU2gCBAf3ydg@mail.gmail.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-01-11 19:15:52 -08:00
Bjorn Helgaas
96702be560 Merge branch 'pci/resource' into next
* pci/resource:
  PCI: Allocate 64-bit BARs above 4G when possible
  PCI: Enforce bus address limits in resource allocation
  PCI: Split out bridge window override of minimum allocation address
  agp/ati: Use PCI_COMMAND instead of hard-coded 4
  agp/intel: Use CPU physical address, not bus address, for ioremap()
  agp/intel: Use pci_bus_address() to get GTTADR bus address
  agp/intel: Use pci_bus_address() to get MMADR bus address
  agp/intel: Support 64-bit GMADR
  agp/intel: Rename gtt_bus_addr to gtt_phys_addr
  drm/i915: Rename gtt_bus_addr to gtt_phys_addr
  agp: Use pci_resource_start() to get CPU physical address for BAR
  agp: Support 64-bit APBASE
  PCI: Add pci_bus_address() to get bus address of a BAR
  PCI: Convert pcibios_resource_to_bus() to take a pci_bus, not a pci_dev
  PCI: Change pci_bus_region addresses to dma_addr_t
2014-01-10 14:23:15 -07:00
Konrad Rzeszutek Wilk
54d44eb3c7 xen/pvh: Use 'depend' instead of 'select'.
The usage of 'select' means it will enable the CONFIG
options without checking their dependencies. That meant
we would inadvertently turn on CONFIG_XEN_PVHM while its
core dependency (CONFIG_PCI) was turned off.

This patch fixes the warnings and compile failures:

warning: (XEN_PVH) selects XEN_PVHVM which has unmet direct
dependencies (HYPERVISOR_GUEST && XEN && PCI && X86_LOCAL_APIC)

Reported-by: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-01-10 10:45:35 -05:00
Steven Rostedt
1739f09e33 ftrace/x86: Load ftrace_ops in parameter not the variable holding it
Function tracing callbacks expect to have the ftrace_ops that registered it
passed to them, not the address of the variable that holds the ftrace_ops
that registered it.

Use a mov instead of a lea to store the ftrace_ops into the parameter
of the function tracing callback.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Link: http://lkml.kernel.org/r/20131113152004.459787f9@gandalf.local.home
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> # v3.8+
2014-01-09 13:24:29 -08:00
David E. Box
4618441536 arch: x86: New MailBox support driver for Intel SOC's
Current Intel SOC cores use a MailBox Interface (MBI) to provide access to
configuration registers on devices (called units) connected to the system
fabric. This is a support driver that implements access to this interface on
those platforms that can enumerate the device using PCI. Initial support is for
BayTrail, for which port definitons are provided. This is a requirement for
implementing platform specific features (e.g. RAPL driver requires this to
perform platform specific power management using the registers in PUNIT).
Dependant modules should select IOSF_MBI in their respective Kconfig
configuraiton. Serialized access is handled by all exported routines with
spinlocks.

The API includes 3 functions for access to unit registers:

int iosf_mbi_read(u8 port, u8 opcode, u32 offset, u32 *mdr)
int iosf_mbi_write(u8 port, u8 opcode, u32 offset, u32 mdr)
int iosf_mbi_modify(u8 port, u8 opcode, u32 offset, u32 mdr, u32 mask)

port:	indicating the unit being accessed
opcode:	the read or write port specific opcode
offset:	the register offset within the port
mdr:	the register data to be read, written, or modified
mask:	bit locations in mdr to change

Returns nonzero on error

Note: GPU code handles access to the GFX unit. Therefore access to that unit
with this driver is disallowed to avoid conflicts.

Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Link: http://lkml.kernel.org/r/1389216471-734-1-git-send-email-david.e.box@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
2014-01-08 14:36:29 -08:00
Marcelo Tosatti
26a865f4aa KVM: VMX: fix use after free of vmx->loaded_vmcs
After free_loaded_vmcs executes, the "loaded_vmcs" structure
is kfreed, and now vmx->loaded_vmcs points to a kfreed area.
Subsequent free_loaded_vmcs then attempts to manipulate
vmx->loaded_vmcs.

Switch the order to avoid the problem.

https://bugzilla.redhat.com/show_bug.cgi?id=1047892

Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-01-08 19:14:08 -02:00
Chen Fan
96893977b8 KVM: x86: Fix debug typo error in lapic
fix the 'vcpi' typos when apic_debug is enabled.

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-01-08 19:09:57 -02:00
Zhihui Zhang
2f0a6397dd KVM: VMX: check use I/O bitmap first before unconditional I/O exit
According to Table C-1 of Intel SDM 3C, a VM exit happens on an I/O instruction when
"use I/O bitmaps" VM-execution control was 0 _and_ the "unconditional I/O exiting"
VM-execution control was 1. So we can't just check "unconditional I/O exiting" alone.
This patch was improved by suggestion from Jan Kiszka.

Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Zhihui Zhang <zzhsuny@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-01-08 19:01:40 -02:00
Lv Zheng
fab4610583 ACPICA: Cleanup the option of forcing the use of the RSDT.
This change adds a runtime option that will force ACPICA to use the
RSDT instead of the XSDT. Although the ACPI spec requires that an XSDT
be used instead of the RSDT, the XSDT has been found to be corrupt or
ill-formed on some machines.

This option is already in the Linux kernel.  When it is back ported to
ACPICA, code is re-written to follow ACPICA coding style.  This patch
is the generation of the integration.

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-01-08 15:31:36 +01:00
Yinghai Lu
f75b99d5a7 PCI: Enforce bus address limits in resource allocation
When allocating space for 32-bit BARs, we previously limited RESOURCE
addresses so they would fit in 32 bits.  However, the BUS address need not
be the same as the resource address, and it's the bus address that must fit
in the 32-bit BAR.

This patch adds:

  - pci_clip_resource_to_region(), which clips a resource so it contains
    only the range that maps to the specified bus address region, e.g., to
    clip a resource to 32-bit bus addresses, and

  - pci_bus_alloc_from_region(), which allocates space for a resource from
    the specified bus address region,

and changes pci_bus_alloc_resource() to allocate space for 64-bit BARs from
the entire bus address region, and space for 32-bit BARs from only the bus
address region below 4GB.

If we had this window:

  pci_root HWP0002:0a: host bridge window [mem 0xf0180000000-0xf01fedfffff] (bus address [0x80000000-0xfedfffff])

we previously could not put a 32-bit BAR there, because the CPU addresses
don't fit in 32 bits.  This patch fixes this, so we can use this space for
32-bit BARs.

It's also possible (though unlikely) to have resources with 32-bit CPU
addresses but bus addresses above 4GB.  In this case the previous code
would allocate space that a 32-bit BAR could not map.

Remove PCIBIOS_MAX_MEM_32, which is no longer used.

[bhelgaas: reworked starting from http://lkml.kernel.org/r/1386658484-15774-3-git-send-email-yinghai@kernel.org]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-01-07 16:24:33 -07:00
Wei Yongjun
5602aba808 xen/pvh: remove duplicated include from enlighten.c
Remove duplicated include.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-01-07 09:59:49 -05:00
Konrad Rzeszutek Wilk
0869a64232 xen/pvh: Fix compile issues with xen_pvh_domain()
Oddly enough it compiles for my ancient compiler but with
the supplied .config it does blow up. Fix is easy enough.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Reported-by: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-01-07 09:59:28 -05:00
Paul Gortmaker
663b55b9b3 x86: Delete non-required instances of include <linux/init.h>
None of these files are actually using any __init type directives
and hence don't need to include <linux/init.h>.  Most are just a
left over from __devinit and __cpuinit removal, or simply due to
code getting copied from one driver to the next.

[ hpa: undid incorrect removal from arch/x86/kernel/head_32.S ]

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Link: http://lkml.kernel.org/r/1389054026-12947-1-git-send-email-paul.gortmaker@windriver.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-01-06 21:25:18 -08:00
David S. Miller
56a4342dfe Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c
	net/ipv6/ip6_tunnel.c
	net/ipv6/ip6_vti.c

ipv6 tunnel statistic bug fixes conflicting with consolidation into
generic sw per-cpu net stats.

qlogic conflict between queue counting bug fix and the addition
of multiple MAC address support.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 17:37:45 -05:00
Mukesh Rathor
4e903a20da xen/pvh: Support ParaVirtualized Hardware extensions (v3).
PVH allows PV linux guest to utilize hardware extended capabilities,
such as running MMU updates in a HVM container.

The Xen side defines PVH as (from docs/misc/pvh-readme.txt,
with modifications):

"* the guest uses auto translate:
 - p2m is managed by Xen
 - pagetables are owned by the guest
 - mmu_update hypercall not available
* it uses event callback and not vlapic emulation,
* IDT is native, so set_trap_table hcall is also N/A for a PVH guest.

For a full list of hcalls supported for PVH, see pvh_hypercall64_table
in arch/x86/hvm/hvm.c in xen.  From the ABI prespective, it's mostly a
PV guest with auto translate, although it does use hvm_op for setting
callback vector."

Use .ascii and .asciz to define xen feature string. Note, the PVH
string must be in a single line (not multiple lines with \) to keep the
assembler from putting null char after each string before \.
This patch allows it to be configured and enabled.

We also use introduce the 'XEN_ELFNOTE_SUPPORTED_FEATURES' ELF note to
tell the hypervisor that 'hvm_callback_vector' is what the kernel
needs. We can not put it in 'XEN_ELFNOTE_FEATURES' as older hypervisor
parse fields they don't understand as errors and refuse to load
the kernel. This work-around fixes the problem.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2014-01-06 10:44:24 -05:00
Konrad Rzeszutek Wilk
6926f6d610 xen/pvh: Piggyback on PVHVM for grant driver (v4)
In PVH the shared grant frame is the PFN and not MFN,
hence its mapped via the same code path as HVM.

The allocation of the grant frame is done differently - we
do not use the early platform-pci driver and have an
ioremap area - instead we use balloon memory and stitch
all of the non-contingous pages in a virtualized area.

That means when we call the hypervisor to replace the GMFN
with a XENMAPSPACE_grant_table type, we need to lookup the
old PFN for every iteration instead of assuming a flat
contingous PFN allocation.

Lastly, we only use v1 for grants. This is because PVHVM
is not able to use v2 due to no XENMEM_add_to_physmap
calls on the error status page (see commit
69e8f430e243d657c2053f097efebc2e2cd559f0
 xen/granttable: Disable grant v2 for HVM domains.)

Until that is implemented this workaround has to
be in place.

Also per suggestions by Stefano utilize the PVHVM paths
as they share common functionality.

v2 of this patch moves most of the PVH code out in the
arch/x86/xen/grant-table driver and touches only minimally
the generic driver.

v3, v4: fixes us some of the code due to earlier patches.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2014-01-06 10:44:21 -05:00
Konrad Rzeszutek Wilk
efaf30a335 xen/grant: Implement an grant frame array struct (v3).
The 'xen_hvm_resume_frames' used to be an 'unsigned long'
and contain the virtual address of the grants. That was OK
for most architectures (PVHVM, ARM) were the grants are contiguous
in memory. That however is not the case for PVH - in which case
we will have to do a lookup for each virtual address for the PFN.

Instead of doing that, lets make it a structure which will contain
the array of PFNs, the virtual address and the count of said PFNs.

Also provide a generic functions: gnttab_setup_auto_xlat_frames and
gnttab_free_auto_xlat_frames to populate said structure with
appropriate values for PVHVM and ARM.

To round it off, change the name from 'xen_hvm_resume_frames' to
a more descriptive one - 'xen_auto_xlat_grant_frames'.

For PVH, in patch "xen/pvh: Piggyback on PVHVM for grant driver"
we will populate the 'xen_auto_xlat_grant_frames' by ourselves.

v2 moves the xen_remap in the gnttab_setup_auto_xlat_frames
and also introduces xen_unmap for gnttab_free_auto_xlat_frames.

Suggested-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v3: Based on top of 'asm/xen/page.h: remove redundant semicolon']
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2014-01-06 10:44:20 -05:00
Mukesh Rathor
2771374d47 xen/pvh: Piggyback on PVHVM for event channels (v2)
PVH is a PV guest with a twist - there are certain things
that work in it like HVM and some like PV. There is
a similar mode - PVHVM where we run in HVM mode with
PV code enabled - and this patch explores that.

The most notable PV interfaces are the XenBus and event channels.

We will piggyback on how the event channel mechanism is
used in PVHVM - that is we want the normal native IRQ mechanism
and we will install a vector (hvm callback) for which we
will call the event channel mechanism.

This means that from a pvops perspective, we can use
native_irq_ops instead of the Xen PV specific. Albeit in the
future we could support pirq_eoi_map. But that is
a feature request that can be shared with PVHVM.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2014-01-06 10:44:15 -05:00
Mukesh Rathor
9103bb0f82 xen/pvh: Update E820 to work with PVH (v2)
In xen_add_extra_mem() we can skip updating P2M as it's managed
by Xen. PVH maps the entire IO space, but only RAM pages need
to be repopulated.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2014-01-06 10:44:13 -05:00
Mukesh Rathor
5840c84b16 xen/pvh: Secondary VCPU bringup (non-bootup CPUs)
The VCPU bringup protocol follows the PV with certain twists.
From xen/include/public/arch-x86/xen.h:

Also note that when calling DOMCTL_setvcpucontext and VCPU_initialise
for HVM and PVH guests, not all information in this structure is updated:

 - For HVM guests, the structures read include: fpu_ctxt (if
 VGCT_I387_VALID is set), flags, user_regs, debugreg[*]

 - PVH guests are the same as HVM guests, but additionally use ctrlreg[3] to
 set cr3. All other fields not used should be set to 0.

This is what we do. We piggyback on the 'xen_setup_gdt' - but modify
a bit - we need to call 'load_percpu_segment' so that 'switch_to_new_gdt'
can load per-cpu data-structures. It has no effect on the VCPU0.

We also piggyback on the %rdi register to pass in the CPU number - so
that when we bootup a new CPU, the cpu_bringup_and_idle will have
passed as the first parameter the CPU number (via %rdi for 64-bit).

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-01-06 10:44:12 -05:00
Mukesh Rathor
8d656bbe43 xen/pvh: Load GDT/GS in early PV bootup code for BSP.
During early bootup we start life using the Xen provided
GDT, which means that we are running with %cs segment set
to FLAT_KERNEL_CS (FLAT_RING3_CS64 0xe033, GDT index 261).

But for PVH we want to be use HVM type mechanism for
segment operations. As such we need to switch to the HVM
one and also reload ourselves with the __KERNEL_CS:eip
to run in the proper GDT and segment.

For HVM this is usually done in 'secondary_startup_64' in
(head_64.S) but since we are not taking that bootup
path (we start in PV - xen_start_kernel) we need to do
that in the early PV bootup paths.

For good measure we also zero out the %fs, %ds, and %es
(not strictly needed as Xen has already cleared them
for us). The %gs is loaded by 'switch_to_new_gdt'.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
2014-01-06 10:44:10 -05:00
Mukesh Rathor
4dd322bc3b xen/pvh: Setup up shared_info.
For PVHVM the shared_info structure is provided via the same way
as for normal PV guests (see include/xen/interface/xen.h).

That is during bootup we get 'xen_start_info' via the %esi register
in startup_xen. Then later we extract the 'shared_info' from said
structure (in xen_setup_shared_info) and start using it.

The 'xen_setup_shared_info' is all setup to work with auto-xlat
guests, but there are two functions which it calls that are not:
xen_setup_mfn_list_list and xen_setup_vcpu_info_placement.
This patch modifies the P2M code (xen_setup_mfn_list_list)
while the "Piggyback on PVHVM for event channels" modifies
the xen_setup_vcpu_info_placement.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-01-06 10:44:09 -05:00
Mukesh Rathor
76bcceff0b xen/pvh/mmu: Use PV TLB instead of native.
We also optimize one - the TLB flush. The native operation would
needlessly IPI offline VCPUs causing extra wakeups. Using the
Xen one avoids that and lets the hypervisor determine which
VCPU needs the TLB flush.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-01-06 10:44:07 -05:00