This patch does cleanup on all intel mid platform code that uses
gpio_get_by_name() function. From now on they should check for any error
code instead of only hardcoded -1.
There are no functional changes from this change.
Signed-off-by: David Cohen <david.a.cohen@linux.intel.com>
Link: http://lkml.kernel.org/r/1389913624-9149-3-git-send-email-david.a.cohen@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
When Intel MID finds a match between SFI table from FW and registered
SFI devices, it will always register a device regardless the platform
code was successful or not.
This patch adds an extra option for platform code to return error code
and abort device registration on SFI table parsing.
This patch does not contain any functional changes for current intel
mid platform code.
Signed-off-by: David Cohen <david.a.cohen@linux.intel.com>
Link: http://lkml.kernel.org/r/1389913624-9149-2-git-send-email-david.a.cohen@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
If we aren't going to use the local APIC anyway, we obviously don't
care about its timer frequency.
Link: http://lkml.kernel.org/r/tip-rgm7xmg7k6qnjlw3ynkcjsmh@git.kernel.org
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Bin Gao <bin.gao@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
On AMD family 10h we see following error messages while waking up from
S3 for all non-boot CPUs leading to a failed IBS initialization:
Enabling non-boot CPUs ...
smpboot: Booting Node 0 Processor 1 APIC 0x1
[Firmware Bug]: cpu 1, try to use APIC500 (LVT offset 0) for vector 0x400, but the register is already in use for vector 0xf9 on another cpu
perf: IBS APIC setup failed on cpu #1
process: Switch to broadcast mode on CPU1
CPU1 is up
...
ACPI: Waking up from system sleep state S3
Reason for this is that during suspend the LVT offset for the IBS
vector gets lost and needs to be reinialized while resuming.
The offset is read from the IBSCTL msr. On family 10h the offset needs
to be 1 as offset 0 is used for the MCE threshold interrupt, but
firmware assings it for IBS to 0 too. The kernel needs to reprogram
the vector. The msr is a readonly node msr, but a new value can be
written via pci config space access. The reinitialization is
implemented for family 10h in setup_ibs_ctl() which is forced during
IBS setup.
This patch fixes IBS setup after waking up from S3 by adding
resume/supend hooks for the boot cpu which does the offset
reinitialization.
Marking it as stable to let distros pick up this fix.
Signed-off-by: Robert Richter <rric@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org> v3.2..
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1389797849-5565-1-git-send-email-rric.net@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Waiman managed to trigger a PMI while in a emulate_vsyscall() fault,
the PMI in turn managed to trigger a fault while obtaining a stack
trace. This triggered the sig_on_uaccess_error recursive fault logic
and killed the process dead.
Fix this by explicitly excluding interrupts from the recursive fault
logic.
Reported-and-Tested-by: Waiman Long <waiman.long@hp.com>
Fixes: e00b12e64be9 ("perf/x86: Further optimize copy_from_user_nmi()")
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140110200603.GJ7572@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On SoCs that have the calibration MSRs available, either there is no
PIT, HPET or PMTIMER to calibrate against, or the PIT/HPET/PMTIMER is
driven from the same clock as the TSC, so calibration is redundant and
just slows down the boot.
TSC rate is caculated by this formula:
<maximum core-clock to bus-clock ratio> * <maximum resolved frequency>
The ratio and the resolved frequency ID can be obtained from MSR.
See Intel 64 and IA-32 System Programming Guid section 16.12 and 30.11.5
for details.
Signed-off-by: Bin Gao <bin.gao@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-rgm7xmg7k6qnjlw3ynkcjsmh@git.kernel.org
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64791
When a cpu is downed on a system, the irqs on the cpu are assigned to
other cpus. It is possible, however, that when a cpu is downed there
aren't enough free vectors on the remaining cpus to account for the
vectors from the cpu that is being downed.
This results in an interesting "overflow" condition where irqs are
"assigned" to a CPU but are not handled.
For example, when downing cpus on a 1-64 logical processor system:
<snip>
[ 232.021745] smpboot: CPU 61 is now offline
[ 238.480275] smpboot: CPU 62 is now offline
[ 245.991080] ------------[ cut here ]------------
[ 245.996270] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x246/0x250()
[ 246.005688] NETDEV WATCHDOG: p786p1 (ixgbe): transmit queue 0 timed out
[ 246.013070] Modules linked in: lockd sunrpc iTCO_wdt iTCO_vendor_support sb_edac ixgbe microcode e1000e pcspkr joydev edac_core lpc_ich ioatdma ptp mdio mfd_core i2c_i801 dca pps_core i2c_core wmi acpi_cpufreq isci libsas scsi_transport_sas
[ 246.037633] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.12.0+ #14
[ 246.044451] Hardware name: Intel Corporation S4600LH ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
[ 246.057371] 0000000000000009 ffff88081fa03d40 ffffffff8164fbf6 ffff88081fa0ee48
[ 246.065728] ffff88081fa03d90 ffff88081fa03d80 ffffffff81054ecc ffff88081fa13040
[ 246.074073] 0000000000000000 ffff88200cce0000 0000000000000040 0000000000000000
[ 246.082430] Call Trace:
[ 246.085174] <IRQ> [<ffffffff8164fbf6>] dump_stack+0x46/0x58
[ 246.091633] [<ffffffff81054ecc>] warn_slowpath_common+0x8c/0xc0
[ 246.098352] [<ffffffff81054fb6>] warn_slowpath_fmt+0x46/0x50
[ 246.104786] [<ffffffff815710d6>] dev_watchdog+0x246/0x250
[ 246.110923] [<ffffffff81570e90>] ? dev_deactivate_queue.constprop.31+0x80/0x80
[ 246.119097] [<ffffffff8106092a>] call_timer_fn+0x3a/0x110
[ 246.125224] [<ffffffff8106280f>] ? update_process_times+0x6f/0x80
[ 246.132137] [<ffffffff81570e90>] ? dev_deactivate_queue.constprop.31+0x80/0x80
[ 246.140308] [<ffffffff81061db0>] run_timer_softirq+0x1f0/0x2a0
[ 246.146933] [<ffffffff81059a80>] __do_softirq+0xe0/0x220
[ 246.152976] [<ffffffff8165fedc>] call_softirq+0x1c/0x30
[ 246.158920] [<ffffffff810045f5>] do_softirq+0x55/0x90
[ 246.164670] [<ffffffff81059d35>] irq_exit+0xa5/0xb0
[ 246.170227] [<ffffffff8166062a>] smp_apic_timer_interrupt+0x4a/0x60
[ 246.177324] [<ffffffff8165f40a>] apic_timer_interrupt+0x6a/0x70
[ 246.184041] <EOI> [<ffffffff81505a1b>] ? cpuidle_enter_state+0x5b/0xe0
[ 246.191559] [<ffffffff81505a17>] ? cpuidle_enter_state+0x57/0xe0
[ 246.198374] [<ffffffff81505b5d>] cpuidle_idle_call+0xbd/0x200
[ 246.204900] [<ffffffff8100b7ae>] arch_cpu_idle+0xe/0x30
[ 246.210846] [<ffffffff810a47b0>] cpu_startup_entry+0xd0/0x250
[ 246.217371] [<ffffffff81646b47>] rest_init+0x77/0x80
[ 246.223028] [<ffffffff81d09e8e>] start_kernel+0x3ee/0x3fb
[ 246.229165] [<ffffffff81d0989f>] ? repair_env_string+0x5e/0x5e
[ 246.235787] [<ffffffff81d095a5>] x86_64_start_reservations+0x2a/0x2c
[ 246.242990] [<ffffffff81d0969f>] x86_64_start_kernel+0xf8/0xfc
[ 246.249610] ---[ end trace fb74fdef54d79039 ]---
[ 246.254807] ixgbe 0000:c2:00.0 p786p1: initiating reset due to tx timeout
[ 246.262489] ixgbe 0000:c2:00.0 p786p1: Reset adapter
Last login: Mon Nov 11 08:35:14 from 10.18.17.119
[root@(none) ~]# [ 246.792676] ixgbe 0000:c2:00.0 p786p1: detected SFP+: 5
[ 249.231598] ixgbe 0000:c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[ 246.792676] ixgbe 0000:c2:00.0 p786p1: detected SFP+: 5
[ 249.231598] ixgbe 0000:c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
(last lines keep repeating. ixgbe driver is dead until module reload.)
If the downed cpu has more vectors than are free on the remaining cpus on the
system, it is possible that some vectors are "orphaned" even though they are
assigned to a cpu. In this case, since the ixgbe driver had a watchdog, the
watchdog fired and notified that something was wrong.
This patch adds a function, check_vectors(), to compare the number of vectors
on the CPU going down and compares it to the number of vectors available on
the system. If there aren't enough vectors for the CPU to go down, an
error is returned and propogated back to userspace.
v2: Do not need to look at percpu irqs
v3: Need to check affinity to prevent counting of MSIs in IOAPIC Lowest
Priority Mode
v4: Additional changes suggested by Gong Chen.
v5/v6/v7/v8: Updated comment text
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1389613861-3853-1-git-send-email-prarit@redhat.com
Reviewed-by: Gong Chen <gong.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Yang Zhang <yang.z.zhang@Intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Janet Morgan <janet.morgan@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Ruiv Wang <ruiv.wang@gmail.com>
Cc: Gong Chen <gong.chen@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>
Pull locking fixes from Ingo Molnar:
"Two fixes from lockdep coverage of seqlocks, which fix deadlocks on
lockdep-enabled ARM systems"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched_clock: Disable seqlock lockdep usage in sched_clock()
seqlock: Use raw_ prefix instead of _no_lockdep
At first Jakub Zawadzki noticed that some divisions by reciprocal_divide
were not correct. (off by one in some cases)
http://www.wireshark.org/~darkjames/reciprocal-buggy.c
He could also show this with BPF:
http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c
The reciprocal divide in linux kernel is not generic enough,
lets remove its use in BPF, as it is not worth the pain with
current cpus.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Cc: Mircea Gherzan <mgherzan@gmail.com>
Cc: Daniel Borkmann <dxchgb@gmail.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Matt Evans <matt@ozlabs.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
We want to support all Intel MID (Mobile Internet Device) platforms
with a single config selection. This patch removes deprecated
CONFIG_X86_MDFLD and X86_WANT_INTEL_MID options in favor of having
CONFIG_X86_INTEL_MID only.
Signed-off-by: David Cohen <david.a.cohen@linux.intel.com>
Link: http://lkml.kernel.org/r/1387244246-20714-1-git-send-email-david.a.cohen@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
This code was partially based on Mark Brown's previous work.
Signed-off-by: David Cohen <david.a.cohen@linux.intel.com>
Link: http://lkml.kernel.org/r/1387224459-25746-4-git-send-email-david.a.cohen@linux.intel.com
Signed-off-by: Fei Yang <fei.yang@intel.com>
Cc: Mark F. Brown <mark.f.brown@intel.com>
Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
This patch adds Clovertrail support on intel-mid and makes it more
flexible to support other SoCs.
Signed-off-by: David Cohen <david.a.cohen@linux.intel.com>
Link: http://lkml.kernel.org/r/1387224459-25746-3-git-send-email-david.a.cohen@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Fei Yang <fei.yang@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
In order make the driver more portable and support other Intel MID
(Mobile Internet Device) platforms we need to move Medfield code from
intel-mid.c core to its own mfld.c file.
This patch contains no functional changes.
Signed-off-by: David Cohen <david.a.cohen@linux.intel.com>
Link: http://lkml.kernel.org/r/1387224459-25746-2-git-send-email-david.a.cohen@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Make disabled_cpu_apicid static and read_mostly, and fix a couple of
typos.
Reported-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/20140115182511.GA22737@gmail.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Add disable_cpu_apicid kernel parameter. To use this kernel parameter,
specify an initial APIC ID of the corresponding CPU you want to
disable.
This is mostly used for the kdump 2nd kernel to disable BSP to wake up
multiple CPUs without causing system reset or hang due to sending INIT
from AP to BSP.
Kdump users first figure out initial APIC ID of the BSP, CPU0 in the
1st kernel, for example from /proc/cpuinfo and then set up this kernel
parameter for the 2nd kernel using the obtained APIC ID.
However, doing this procedure at each boot time manually is awkward,
which should be automatically done by user-land service scripts, for
example, kexec-tools on fedora/RHEL distributions.
This design is more flexible than disabling BSP in kernel boot time
automatically in that in kernel boot time we have no choice but
referring to ACPI/MP table to obtain initial APIC ID for BSP, meaning
that the method is not applicable to the systems without such BIOS
tables.
One assumption behind this design is that users get initial APIC ID of
the BSP in still healthy state and so BSP is uniquely kept in
CPU0. Thus, through the kernel parameter, only one initial APIC ID can
be specified.
In a comparison with disabled_cpu_apicid, we use read_apic_id(), not
boot_cpu_physical_apicid, because on some platforms, the variable is
modified to the apicid reported as BSP through MP table and this
function is executed with the temporarily modified
boot_cpu_physical_apicid. As a result, disabled_cpu_apicid kernel
parameter doesn't work well for apicids of APs.
Fixing the wrong handling of boot_cpu_physical_apicid requires some
reviews and tests beyond some platforms and it could take some
time. The fix here is a kind of workaround to focus on the main topic
of this patch.
Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Link: http://lkml.kernel.org/r/20140115064458.1545.38775.stgit@localhost6.localdomain6
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
After the previous patch from Marcelo, the comment before this write
became obsolete. In fact, the write is unnecessary. The calls to
kvm_write_tsc ultimately result in a master clock update as soon as
all TSCs agree and the master clock is re-enabled. This master
clock update will rewrite tsc_timestamp.
So, together with the comment, delete the dead write too.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
To fix a problem related to different resolution of TSC and system clock,
the offset in TSC units is approximated by
delta = vcpu->hv_clock.tsc_timestamp - vcpu->last_guest_tsc
(Guest TSC value at (Guest TSC value at last VM-exit)
the last kvm_guest_time_update
call)
Delta is then later scaled using mult,shift pair found in hv_clock
structure (which is correct against tsc_timestamp in that
structure).
However, if a frequency change is performed between these two points,
this delta is measured using different TSC frequencies, but scaled using
mult,shift pair for one frequency only.
The end result is an incorrect delta.
The bug which this code works around is not the only cause for
clock backwards events. The global accumulator is still
necessary, so remove the max_kernel_ns fix and rely on the
global accumulator for no clock backwards events.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit e66d2ae7c67bd moved the assignment
vcpu->arch.apic_base = value above a condition with
(vcpu->arch.apic_base ^ value), causing that check
to always fail. Use old_value, vcpu->arch.apic_base's
old value, in the condition instead.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Having u32 and struct cpuinfo_x86 * by the same name is not very smart,
although it was ok in this case due to the limited scope of u32 c and it
being used only once in there.
Fix this.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1389786735-16751-1-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Limit PIT timer frequency similarly to the limit applied by
LAPIC timer.
Cc: stable@kernel.org
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Rom Freiman <rom@stratoscale.com> notes other code paths vulnerable to
bug fixed by 989c6b34f6a9480e397b.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
We rename aesni-intel_avx.S to aesni-intel_avx-x86_64.S to indicate
that it is only used by x86_64 architecture.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This adds the workaround for erratum 793 as a precaution in case not
every BIOS implements it. This addresses CVE-2013-6885.
Erratum text:
[Revision Guide for AMD Family 16h Models 00h-0Fh Processors,
document 51810 Rev. 3.04 November 2013]
793 Specific Combination of Writes to Write Combined Memory Types and
Locked Instructions May Cause Core Hang
Description
Under a highly specific and detailed set of internal timing
conditions, a locked instruction may trigger a timing sequence whereby
the write to a write combined memory type is not flushed, causing the
locked instruction to stall indefinitely.
Potential Effect on System
Processor core hang.
Suggested Workaround
BIOS should set MSR
C001_1020[15] = 1b.
Fix Planned
No fix planned
[ hpa: updated description, fixed typo in MSR name ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20140114230711.GS29865@pd.tnic
Tested-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The help text for RANDOMIZE_BASE_MAX_OFFSET was confusing. This has been
clarified, and updated to be an export-only tunable.
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20131210202745.GA2961@www.outflux.net
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Currently we do a read, a dummy write and a final read to fetch
the error code. The value from the final read is taken.
This is not the recommended way and leads to corrupted/lost ESR
values.
Intel(c) 64 and IA-32 Architectures Software Developer's Manual,
Combined Volumes 1, 2ABC, 3ABC, Section 10.5.3 states:
Before attempt to read from the ESR, software should first
write to it. (The value written does not affect the values read
subsequently; only zero may be written in x2APIC mode.) This
write clears any previously logged errors and updates the ESR
with any errors detected since the last write to the ESR.
This write also rearms the APIC error interrupt triggering
mechanism.
This patch removes the first read such that we are conform with
the manual.
On my (very old) Pentium MMX SMP system this patch fixes the
issue that APIC errors:
a) are not always reported and
b) are reported with false error numbers.
Signed-off-by: Richard Weinberger <richard@nod.at>
Cc: seiji.aguchi@hds.com
Cc: rientjes@google.com
Cc: konrad.wilk@oracle.com
Cc: bp@alien8.de
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1389685487-20872-1-git-send-email-richard@nod.at
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We've grown a bunch of microcode loader files all prefixed with
"microcode_". They should be under cpu/ because this is strictly
CPU-related functionality so do that and drop the prefix since they're
in their own directory now which gives that prefix. :)
While at it, drop MICROCODE_INTEL_LIB config item and stash the
functionality under CONFIG_MICROCODE_INTEL as it was its only user.
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
The original idea to use the microcode cache for the APs doesn't pan out
because we do memory allocation there very early and with IRQs disabled
and we don't want to involve GFP_ATOMIC allocations. Not if it can be
helped.
Thus, extend the caching of the BSP patch approach to the APs and
iterate over the ucode in the initrd instead of using the cache. We
still save the relevant patches to it but later, right before we
jettison the initrd.
While at it, fix early ucode loading on 32-bit too.
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
We want to use those in AMD's early loading path too. Also, add a
native_wrmsrl variant.
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
The ramdisk can possibly get relocated if the whole image is not mapped.
And since we're going over it in the microcode loader and fishing out
the relevant microcode patches, we want access it at its new location.
Thus, export it.
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
With various drivers wanting to inject idle time; we get people
calling idle routines outside of the idle loop proper.
Therefore we need to be extra careful about not missing
TIF_NEED_RESCHED -> PREEMPT_NEED_RESCHED propagations.
While looking at this, I also realized there's a small window in the
existing idle loop where we can miss TIF_NEED_RESCHED; when it hits
right after the tif_need_resched() test at the end of the loop but
right before the need_resched() test at the start of the loop.
So move preempt_fold_need_resched() out of the loop where we're
guaranteed to have TIF_NEED_RESCHED set.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-x9jgh45oeayzajz2mjt0y7d6@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Use a ring-buffer like multi-version object structure which allows
always having a coherent object; we use this to avoid having to
disable IRQs while reading sched_clock() and avoids a problem when
getting an NMI while changing the cyc2ns data.
MAINLINE PRE POST
sched_clock_stable: 1 1 1
(cold) sched_clock: 329841 331312 257223
(cold) local_clock: 301773 310296 309889
(warm) sched_clock: 38375 38247 25280
(warm) local_clock: 100371 102713 85268
(warm) rdtsc: 27340 27289 24247
sched_clock_stable: 0 0 0
(cold) sched_clock: 382634 372706 301224
(cold) local_clock: 396890 399275 399870
(warm) sched_clock: 38194 38124 25630
(warm) local_clock: 143452 148698 129629
(warm) rdtsc: 27345 27365 24307
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-s567in1e5ekq2nlyhn8f987r@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fengguang Wu's 0day kernel build service reported the following build warning:
arch/x86/kernel/apic/io_apic.c:2211
smp_irq_move_cleanup_interrupt() warn: always true condition '(irq <= -1) => (0-u32max <= (-1))'
because irq is defined as an unsigned int instead of an int.
Fix this trivial error by redefining irq as a signed int. The
remaining consumers of the int are okay.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Link: http://lkml.kernel.org/r/1389620420-7110-1-git-send-email-prarit@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There are no __cycles_2_ns() users outside of arch/x86/kernel/tsc.c,
so move it there.
There are no cycles_2_ns() users.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-01lslnavfgo3kmbo4532zlcj@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add the syscalls needed for supporting scheduling algorithms
with extended scheduling parameters (e.g., SCHED_DEADLINE).
In general, it makes possible to specify a periodic/sporadic task,
that executes for a given amount of runtime at each instance, and is
scheduled according to the urgency of their own timing constraints,
i.e.:
- a (maximum/typical) instance execution time,
- a minimum interval between consecutive instances,
- a time constraint by which each instance must be completed.
Thus, both the data structure that holds the scheduling parameters of
the tasks and the system calls dealing with it must be extended.
Unfortunately, modifying the existing struct sched_param would break
the ABI and result in potentially serious compatibility issues with
legacy binaries.
For these reasons, this patch:
- defines the new struct sched_attr, containing all the fields
that are necessary for specifying a task in the computational
model described above;
- defines and implements the new scheduling related syscalls that
manipulate it, i.e., sched_setattr() and sched_getattr().
Syscalls are introduced for x86 (32 and 64 bits) and ARM only, as a
proof of concept and for developing and testing purposes. Making them
available on other architectures is straightforward.
Since no "user" for these new parameters is introduced in this patch,
the implementation of the new system calls is just identical to their
already existing counterpart. Future patches that implement scheduling
policies able to exploit the new data structure must also take care of
modifying the sched_*attr() calls accordingly with their own purposes.
Signed-off-by: Dario Faggioli <raistlin@linux.it>
[ Rewrote to use sched_attr. ]
Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
[ Removed sched_setscheduler2() for now. ]
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1383831828-15501-3-git-send-email-juri.lelli@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJS0miqAAoJEHm+PkMAQRiGbfgIAJSWEfo8ludknhPcHJabBtxu
75SQAKJlL3sBVnxEc58Rtt8gsKYQIrm4IY5Slunklsn04RxuDUIQMgFoAYR5gQwz
+Myqkw/HOqDe5VStGxtLYpWnfglxVwGDCd7ISfL9AOVy5adMWBxh4Tv+qqQc7aIZ
eF7dy+DD+C6Q3Z5OoV8s0FZDxse29vOf17Nki7+7t8WMqyegYwjoOqNeqocGKsPi
eHLrJgTl4T6jB4l9LKKC154DSKjKOTSwZMWgwK8mToyNLT/ufCiKgXloIjEvZZcY
VVKUtncdHiTf+iqVojgpGBzOEeB5DM83iiapFeDiJg8C9yBzvT8lBtA9aPb5Wgw=
=lEeV
-----END PGP SIGNATURE-----
Merge tag 'v3.13-rc8' into core/locking
Refresh the tree with the latest fixes, before applying new changes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
* acpica: (21 commits)
ACPICA: Update version to 20131218.
ACPICA: Utilities: Cleanup declarations of the acpi_gbl_debug_file global.
ACPICA: Linuxize: Cleanup spaces after special macro invocations.
ACPICA: Interpreter: Add additional debug info for an error case.
ACPICA: Update ACPI example code to make it an actual working program.
ACPICA: Add an error message if the Debugger fails initialization.
ACPICA: Conditionally define a local variable that is used for debug only.
ACPICA: Parser: Updates/fixes for debug output.
ACPICA: Enhance ACPI warning for memory/IO address conflicts.
ACPICA: Update several debug statements - no functional change.
ACPICA: Improve exception handling for GPE block installation.
ACPICA: Add helper macros to extract bus/segment numbers from HEST table.
ACPICA: Tables: Add full support for the PCCT table, update table definition.
ACPICA: Tables: Add full support for the DBG2 table.
ACPICA: Add option to favor 32-bit FADT addresses.
ACPICA: Cleanup the option of forcing the use of the RSDT.
ACPICA: Back port and refine validation of the XSDT root table.
ACPICA: Linux Header: Remove unused OSL prototypes.
ACPICA: Remove unused ACPI_FREE_BUFFER macro. No functional change.
ACPICA: Disassembler: Improve pathname support for emitted External() statements.
...
* acpi-cleanup: (22 commits)
ACPI / tables: Return proper error codes from acpi_table_parse() and fix comment.
ACPI / tables: Check if id is NULL in acpi_table_parse()
ACPI / proc: Include appropriate header file in proc.c
ACPI / EC: Remove unused functions and add prototype declaration in internal.h
ACPI / dock: Include appropriate header file in dock.c
ACPI / PCI: Include appropriate header file in pci_link.c
ACPI / PCI: Include appropriate header file in pci_slot.c
ACPI / EC: Mark the function acpi_ec_add_debugfs() as static in ec_sys.c
ACPI / NVS: Include appropriate header file in nvs.c
ACPI / OSL: Mark the function acpi_table_checksum() as static
ACPI / processor: initialize a variable to silence compiler warning
ACPI / processor: use ACPI_COMPANION() to get ACPI device
ACPI: correct minor typos
ACPI / sleep: Drop redundant acpi_disabled check
ACPI / dock: Drop redundant acpi_disabled check
ACPI / table: Replace '1' with specific error return values
ACPI: remove trailing whitespace
ACPI / IBFT: Fix incorrect <acpi/acpi.h> inclusion in iSCSI boot firmware module
ACPI / i915: Fix incorrect <acpi/acpi.h> inclusions via <linux/acpi_io.h>
SFI / ACPI: Fix warnings reported during builds with W=1
...
Conflicts:
drivers/acpi/nvs.c
drivers/hwmon/asus_atk0110.c
So mce_start_timer() has a 'cpu' argument which is supposed to mean to
start a timer on that cpu. However, the code currently starts a timer on
the *current* cpu the function runs on and causes the sanity-check in
mce_timer_fn to fire:
WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/mcheck/mce.c:1286 mce_timer_fn
because it is running on the wrong cpu.
This was triggered by Prarit Bhargava <prarit@redhat.com> by offlining
all the cpus in succession.
Then, we were fiddling with the CMCI storm settings when starting the
timer whereas there's no need for that - if there's storm happening
on this newly restarted cpu, we're going to be in normal CMCI mode
initially and then when the CMCI interrupt starts firing, we're going to
go to the polling mode with the timer real soon.
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Prarit Bhargava <prarit@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Reviewed-by: Chen, Gong <gong.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/1387722156-5511-1-git-send-email-prarit@redhat.com
During heavy CPU-hotplug operations the following spurious kernel warnings
can trigger:
do_IRQ: No ... irq handler for vector (irq -1)
[ See: https://bugzilla.kernel.org/show_bug.cgi?id=64831 ]
When downing a cpu it is possible that there are unhandled irqs
left in the APIC IRR register. The following code path shows
how the problem can occur:
1. CPU 5 is to go down.
2. cpu_disable() on CPU 5 executes with interrupt flag cleared
by local_irq_save() via stop_machine().
3. IRQ 12 asserts on CPU 5, setting IRR but not ISR because
interrupt flag is cleared (CPU unabled to handle the irq)
4. IRQs are migrated off of CPU 5, and the vectors' irqs are set
to -1. 5. stop_machine() finishes cpu_disable()
6. cpu_die() for CPU 5 executes in normal context.
7. CPU 5 attempts to handle IRQ 12 because the IRR is set for
IRQ 12. The code attempts to find the vector's IRQ and cannot
because it has been set to -1. 8. do_IRQ() warning displays
warning about CPU 5 IRQ 12.
I added a debug printk to output which CPU & vector was
retriggered and discovered that that we are getting bogus
events. I see a 100% correlation between this debug printk in
fixup_irqs() and the do_IRQ() warning.
This patchset resolves this by adding definitions for
VECTOR_UNDEFINED(-1) and VECTOR_RETRIGGERED(-2) and modifying
the code to use them.
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=64831
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Reviewed-by: Rui Wang <rui.y.wang@intel.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Yang Zhang <yang.z.zhang@Intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: janet.morgan@Intel.com
Cc: tony.luck@Intel.com
Cc: ruiv.wang@gmail.com
Link: http://lkml.kernel.org/r/1388938252-16627-1-git-send-email-prarit@redhat.com
[ Cleaned up the code a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>