The iommu pool patch has a bug where it would cause a crash when using
only one pool (based on the size of the DMA window).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, BOOKE watchdog code for checking "wdt" and "wdt_period" is
in setup_32.c, it cannot be used in 64-bit, so move it to a common place
setup-common.c, which will be shared by 32-bit and 64-bit.
Also, replace the simple_strtoul with kstrtol.
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Just like the module loader, ftrace needs to be updated to use r12
instead of r11 with newer gcc's.
Signed-off-by: Roger Blofeld <blofeldus@yahoo.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: stable@vger.kernel.org
If arch_validate_hwbkpt_settings() fails, bp->ctx won't be valid and the
kernel panics. Add a check to fix this.
Reported-by: Edjunior Barbosa Machado <emachado@linux.vnet.ibm.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We have a request for a fast method of getting CPU and NUMA node IDs
from userspace. This patch implements a getcpu VDSO function,
similar to x86.
Ben suggested we use SPRG3 which is userspace readable. SPRG3 can be
modified by a KVM guest, so we save the SPRG3 value in the paca and
restore it when transitioning from the guest to the host.
I have a glibc patch that implements sched_getcpu on top of this.
Testing on a POWER7:
baseline: 538 cycles
vdso: 30 cycles
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Purely for cosmetic purposes, otherwise it can appear that we are in
single_step_pSeries() which is slightly confusing.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When I "fixed" the CONFIG_TRACE_IRQFLAGS case on interrupt entry,
I screwed up a little bit with the test for user space vs. kernel.
The code is fine, there's just some dead code around it. I basically
removed the test and always create the added stack frame whether
coming from user or kernel since in any case we do need to save
a bunch of volatile registers or bad things would happen (we can
take page faults in the kernel for example).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
So that we can call it when improving SPE switch like book3e did for fp
switch.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Olivia Yin <hong-hua.yin@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Add the ability to inject IOMMU faults. We enable this per device
via a fail_iommu sysfs property, similar to fault injection on other
subsystems.
An example:
...
0003:01:00.1 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 02)
To inject one error to this device:
echo 1 > /sys/bus/pci/devices/0003:01:00.1/fail_iommu
echo 1 > /sys/kernel/debug/fail_iommu/probability
echo 1 > /sys/kernel/debug/fail_iommu/times
As feared, the first failure injected on the be3 results in an
unrecoverable error, taking down both functions of the card
permanently:
be2net 0003:01:00.1: Unrecoverable error in the card
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The DMA API debug code has hooks to verify all DMA entries have been
freed at time of hot unplug. We need to call dma_debug_add_bus for
this to work.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Similar to PCI, separate the bus probe from device probe. This allows
us to attach bus notifiers for DMA debug and IOMMU fault injection
before devices have been probed.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
During boot we see a number of these warnings:
vio 30000000: Warning: IOMMU dma not supported: mask 0xffffffffffffffff, table unavailable
The reason for this is that we set IOMMU properties for all VIO
devices even if they are not DMA capable.
Only set DMA ops, table and mask for devices with a DMA window.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Some macros use RA where when RA=R0 the values is 0, so make this
the enforced mnemonic in the macro.
Idea suggested by Andreas Schwab.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Enforce the use of R0-R31 in macros where possible now we have all the
fixes in.
R0-R31 macros are removed here so that can't be used anymore. They
should not be defined anywhere.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
These macros are using integers where they could be using logical
names since they take registers.
We are going to enforce this soon, so fix these up now.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anything that uses a constructed instruction (ie. from ppc-opcode.h),
need to use the new R0 macro, as %r0 is not going to work.
Also convert usages of macros where we are just determining an offset
(usually for a load/store), like:
std r14,STK_REG(r14)(r1)
Can't use STK_REG(r14) as %r14 doesn't work in the STK_REG macro since
it's just calculating an offset.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There was a typo, checking for CONFIG_TRACE_IRQFLAG instead of
CONFIG_TRACE_IRQFLAGS causing some useful debug code to not be
built
This in turns causes a build error on BookE 64-bit due to incorrect
semicolons at the end of a couple of macros, so let's fix that too
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: stable@vger.kernel.org [v3.4]
Looks like we still have issues with pSeries and Cell idle code
vs. the lazy irq state. In fact, the reset fixes that went upstream
are exposing the problem more by causing BUG_ON() to trigger (which
this patch turns into a WARN_ON instead).
We need to be careful when using a variant of low power state that
has the side effect of turning interrupts back on, to properly set
all the SW & lazy state to look as if everything is enabled before
we enter the low power state with MSR:EE off as we will return with
MSR:EE on. If not, we have a discrepancy of state which can cause
things to go very wrong later on.
This patch moves the logic into a helper and uses it from the
pseries and cell idle code. The power4/970 idle code already got
things right (in assembly even !) so I'm not touching it. The power7
"bare metal" idle code is subtly different and correct. Remains PA6T
and some hypervisor based Cell platforms which have questionable
code in there, but they are mostly dead platforms so I'll fix them
when I manage to get final answers from the respective maintainers
about how the low power state actually works on them.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: stable@vger.kernel.org [v3.4]
The pattern (np ? np->full_name : "<none>") is rather common in the
kernel, but can also make for quite long lines. This patch adds a new
inline function, of_node_full_name() so that the test for a valid node
pointer doesn't need to be open coded at all call sites.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
* pci/myron-pcibios_setup:
xtensa/PCI: factor out pcibios_setup()
x86/PCI: adjust section annotations for pcibios_setup()
unicore32/PCI: adjust section annotations for pcibios_setup()
tile/PCI: factor out pcibios_setup()
sparc/PCI: factor out pcibios_setup()
sh/PCI: adjust section annotations for pcibios_setup()
sh/PCI: factor out pcibios_setup()
powerpc/PCI: factor out pcibios_setup()
parisc/PCI: factor out pcibios_setup()
MIPS/PCI: adjust section annotations for pcibios_setup()
MIPS/PCI: factor out pcibios_setup()
microblaze/PCI: factor out pcibios_setup()
ia64/PCI: factor out pcibios_setup()
cris/PCI: factor out pcibios_setup()
alpha/PCI: factor out pcibios_setup()
PCI: pull pcibios_setup() up into core
The PCI core provides a generic pcibios_setup() routine. Drop this
architecture-specific version in favor of that.
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Warning(arch/powerpc/kernel/pci_of_scan.c:210): Excess function parameter 'node' description in 'of_scan_pci_bridge'
Warning(arch/powerpc/kernel/vio.c:636): No description found for parameter 'desired'
Warning(arch/powerpc/kernel/vio.c:636): Excess function parameter 'new_desired' description in 'vio_cmo_set_dev_desired'
Warning(arch/powerpc/kernel/vio.c:1270): No description found for parameter 'viodrv'
Warning(arch/powerpc/kernel/vio.c:1270): Excess function parameter 'drv' description in '__vio_register_driver'
Warning(arch/powerpc/kernel/vio.c:1289): No description found for parameter 'viodrv'
Warning(arch/powerpc/kernel/vio.c:1289): Excess function parameter 'driver' description in 'vio_unregister_driver'
Signed-off-by: Wanpeng Li <liwp@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
At the moment all queues in a multiqueue adapter will serialise
against the IOMMU table lock. This is proving to be a big issue,
especially with 10Gbit ethernet.
This patch creates 4 pools and tries to spread the load across
them. If the table is under 1GB in size we revert back to the
original behaviour of 1 pool and 1 largealloc pool.
We create a hash to map CPUs to pools. Since we prefer interrupts to
be affinitised to primary CPUs, without some form of hashing we are
very likely to end up using the same pool. As an example, POWER7
has 4 way SMT and with 4 pools all primary threads will map to the
same pool.
The largealloc pool is reduced from 1/2 to 1/4 of the space to
partially offset the overhead of breaking the table up into pools.
Some performance numbers were obtained with a Chelsio T3 adapter on
two POWER7 boxes, running a 100 session TCP round robin test.
Performance improved 69% with this patch applied.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
In preparation for IOMMU pools, push the spinlock into
iommu_range_alloc and __iommu_free.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch moves tce_free outside of the lock in iommu_free.
Some performance numbers were obtained with a Chelsio T3 adapter on
two POWER7 boxes, running a 100 session TCP round robin test.
Performance improved 25% with this patch applied.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We currently hold the IOMMU spinlock around tce_build and tce_flush.
This causes our spinlock hold times to be much higher than required
and can impact multiqueue adapters.
This patch moves tce_build and tce_flush outside of the lock in
iommu_alloc, and tce_flush outside of the lock in iommu_free.
Some performance numbers were obtained with a Chelsio T3 adapter on
two POWER7 boxes, running a 100 session TCP round robin test.
Performance improved 32% with this patch applied.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
While creating the PCI root bus through function pci_create_root_bus()
of PCI core, it should have assigned the secondary bus number for the
newly created PCI root bus. Thus we needn't do the explicit assignment
for the secondary bus number again in pcibios_scan_phb().
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
mtmsrd is an expensive instruction, we save a few cycles by
doing it once instead of twice.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
1) call_function.lock used in smp_call_function_many() is just to protect
call_function.queue and &data->refs, cpu_online_mask is outside of the
lock. And it's not necessary to protect cpu_online_mask,
because data->cpumask is pre-calculate and even if a cpu is brougt up
when calling arch_send_call_function_ipi_mask(), it's harmless because
validation test in generic_smp_call_function_interrupt() will take care
of it.
2) For cpu down issue, stop_machine() will guarantee that no concurrent
smp_call_fuction() is processing.
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The patch_instruction() interface is made to modify kernel text. It is
safer to use that then the probe_kernel_write() when modifying kernel
code.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
PowerPC does not have the synchronization issues that x86 has with
modifying code on one CPU while another CPU is executing it.
The other CPU will either see the old or new code without any
issues, unlike x86 which may issue a GPF.
Instead of calling the heavy stop_machine, just update the code.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
As I was adding code that affects all archs, I started testing function
tracer against PPC64 and found that it currently locks up with 3.4
kernel. I figured it was due to tracing a function that shouldn't be, so
I went through the following process to bisect to find the culprit:
cat /debug/tracing/available_filter_functions > t
num=`wc -l t`
sed -ne "1,${num}p" t > t1
let num=num+1
sed -ne "${num},$p" t > t2
cat t1 > /debug/tracing/set_ftrace_filter
echo function /debug/tracing/current_tracer
<failed? bisect t1, if not bisect t2>
It finally came down to this function: restore_interrupts()
I'm not sure why this locks up the system. It just seems to prevent
scheduling from occurring. Interrupts seem to still work, as I can ping
the box. But all user processes freeze.
When restore_interrupts() is not traced, function tracing works fine.
Cc: stable@kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patches tries to fix a couple of Section mismatch warnings like
following one:
WARNING: arch/powerpc/kernel/built-in.o(.text+0x2923c): Section mismatch
in reference from the function .prom_query_opal() to the
function .init.text:.call_prom()
The function .prom_query_opal() references
the function __init .call_prom().
This is often because .prom_query_opal lacks a __init
annotation or the annotation of .call_prom is wrong.
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
In entry_64.S version of ret_from_except_lite, you'll notice that
in the !preempt case, after we've checked MSR_PR we test for any
TIF flag in _TIF_USER_WORK_MASK to decide whether to go to do_work
or not. However, in the preempt case, we do a convoluted trick to
test SIGPENDING only if PR was set and always test NEED_RESCHED ...
but we forget to test any other bit of _TIF_USER_WORK_MASK !!! So
that means that with preempt, we completely fail to test for things
like single step, syscall tracing, etc...
This should be fixed as the following path:
- Test PR. If not set, go to resume_kernel, else continue.
- If go resume_kernel, to do that original do_work.
- If else, then always test for _TIF_USER_WORK_MASK to decide to do
that original user_work, else restore directly.
Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add the host bridge bus number aperture to the resource list.
Like the MMIO and I/O port apertures, this is used when assigning
resources to hot-added devices or in the case of conflicts.
[bhelgaas: changelog]
CC: Paul Mackerras <paulus@samba.org>
CC: linuxppc-dev@lists.ozlabs.org
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Replace the struct pci_bus secondary/subordinate members with the
struct resource busn_res. Later we'll build a resource tree of these
bus numbers.
[bhelgaas: changelog]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
This fixes a problem which can causes kernel oopses while loading
a kernel module.
According to the PowerPC EABI specification, GPR r11 is assigned
the dedicated function to point to the previous stack frame.
In the powerpc-specific kernel module loader, do_plt_call()
(in arch/powerpc/kernel/module_32.c), GPR r11 is also used
to generate trampoline code.
This combination crashes the kernel, in the case where the compiler
chooses to use a helper function for saving GPRs on entry, and the
module loader has placed the .init.text section far away from the
.text section, meaning that it has to generate a trampoline for
functions in the .init.text section to call the GPR save helper.
Because the trampoline trashes r11, references to the stack frame
using r11 can cause an oops.
The fix just uses GPR r12 instead of GPR r11 for generating the
trampoline code. According to the statements from Freescale, this is
safe from an EABI perspective.
I've tested the fix for kernel 2.6.33 on MPC8541.
Cc: stable@vger.kernel.org
Signed-off-by: Steffen Rumler <steffen.rumler.ext@nsn.com>
[paulus@samba.org: reworded the description]
Signed-off-by: Paul Mackerras <paulus@samba.org>
This reverts 68568add2c ("powerpc/time: Remove unnecessary sanity check
of decrementer expiration"). We do need to check whether we have reached
the expiration time of the next event, because we sometimes get an early
decrementer interrupt, most notably when we set the decrementer to 1 in
arch_irq_work_raise(). The effect of not having the sanity check is that
if timer_interrupt() gets called early, we leave the decrementer set to
its maximum value, which means we then don't get any more decrementer
interrupts for about 4 seconds (or longer, depending on timebase
frequency). I saw these pauses as a consequence of getting a stray
hypervisor decrementer interrupt left over from exiting a KVM guest.
This isn't quite a straight revert because of changes to the surrounding
code, but it restores the same algorithm as was previously used.
Cc: stable@vger.kernel.org
Acked-by: Anton Blanchard <anton@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Alex says:
"Changes this time include:
- Generalize KVM_GUEST support to overall ePAPR code
- Fix reset for Book3S HV
- Fix machine check deferral when CONFIG_KVM_GUEST=y
- Add support for BookE register DECAR"
* 'for-upstream' of git://github.com/agraf/linux-2.6:
KVM: PPC: Not optimizing MSR_CE and MSR_ME with paravirt.
KVM: PPC: booke: Added DECAR support
KVM: PPC: Book3S HV: Make the guest hash table size configurable
KVM: PPC: Factor out guest epapr initialization
Signed-off-by: Avi Kivity <avi@redhat.com>
Does block_sigmask() + tracehook_signal_handler(); called when
sigframe has been successfully built. All architectures converted
to it; block_sigmask() itself is gone now (merged into this one).
I'm still not too happy with the signature, but that's a separate
story (IMO we need a structure that would contain signal number +
siginfo + k_sigaction, so that get_signal_to_deliver() would fill one,
signal_delivered(), handle_signal() and probably setup...frame() -
take one).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Only 3 out of 63 do not. Renamed the current variant to __set_current_blocked(),
added set_current_blocked() that will exclude unblockable signals, switched
open-coded instances to it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
replace boilerplate "should we use ->saved_sigmask or ->blocked?"
with calls of obvious inlined helper...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
first fruits of ..._restore_sigmask() helpers: now we can take
boilerplate "signal didn't have a handler, clear RESTORE_SIGMASK
and restore the blocked mask from ->saved_mask" into a common
helper. Open-coded instances switched...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull second pile of signal handling patches from Al Viro:
"This one is just task_work_add() series + remaining prereqs for it.
There probably will be another pull request from that tree this
cycle - at least for helpers, to get them out of the way for per-arch
fixes remaining in the tree."
Fix trivial conflict in kernel/irq/manage.c: the merge of Andrew's pile
had brought in commit 97fd75b7b8 ("kernel/irq/manage.c: use the
pr_foo() infrastructure to prefix printks") which changed one of the
pr_err() calls that this merge moves around.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
keys: kill task_struct->replacement_session_keyring
keys: kill the dummy key_replace_session_keyring()
keys: change keyctl_session_to_parent() to use task_work_add()
genirq: reimplement exit_irq_thread() hook via task_work_add()
task_work_add: generic process-context callbacks
avr32: missed _TIF_NOTIFY_RESUME on one of do_notify_resume callers
parisc: need to check NOTIFY_RESUME when exiting from syscall
move key_repace_session_keyring() into tracehook_notify_resume()
TIF_NOTIFY_RESUME is defined on all targets now
If there is pending critical or machine check interrupt then guest
would like to capture it when guest enable MSR.CE and MSR_ME respectively.
Also as mostly MSR_CE and MSR_ME are updated with rfi/rfci/rfmii
which anyway traps so removing the the paravirt optimization for MSR.CE
and MSR.ME.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
epapr paravirtualization support is now a Kconfig
selectable option
Signed-off-by: Liu Yu <yu.liu@freescale.com>
[stuart.yoder@freescale.com: misc minor fixes, description update]
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This is much the same as for SPARC except that we can do the find_zero()
function more efficiently using the count-leading-zeroes instructions.
Tested on 32-bit and 64-bit PowerPC.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull KVM changes from Avi Kivity:
"Changes include additional instruction emulation, page-crossing MMIO,
faster dirty logging, preventing the watchdog from killing a stopped
guest, module autoload, a new MSI ABI, and some minor optimizations
and fixes. Outside x86 we have a small s390 and a very large ppc
update.
Regarding the new (for kvm) rebaseless workflow, some of the patches
that were merged before we switch trees had to be rebased, while
others are true pulls. In either case the signoffs should be correct
now."
Fix up trivial conflicts in Documentation/feature-removal-schedule.txt
arch/powerpc/kvm/book3s_segment.S and arch/x86/include/asm/kvm_para.h.
I suspect the kvm_para.h resolution ends up doing the "do I have cpuid"
check effectively twice (it was done differently in two different
commits), but better safe than sorry ;)
* 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (125 commits)
KVM: make asm-generic/kvm_para.h have an ifdef __KERNEL__ block
KVM: s390: onereg for timer related registers
KVM: s390: epoch difference and TOD programmable field
KVM: s390: KVM_GET/SET_ONEREG for s390
KVM: s390: add capability indicating COW support
KVM: Fix mmu_reload() clash with nested vmx event injection
KVM: MMU: Don't use RCU for lockless shadow walking
KVM: VMX: Optimize %ds, %es reload
KVM: VMX: Fix %ds/%es clobber
KVM: x86 emulator: convert bsf/bsr instructions to emulate_2op_SrcV_nobyte()
KVM: VMX: unlike vmcs on fail path
KVM: PPC: Emulator: clean up SPR reads and writes
KVM: PPC: Emulator: clean up instruction parsing
kvm/powerpc: Add new ioctl to retreive server MMU infos
kvm/book3s: Make kernel emulated H_PUT_TCE available for "PR" KVM
KVM: PPC: bookehv: Fix r8/r13 storing in level exception handler
KVM: PPC: Book3S: Enable IRQs during exit handling
KVM: PPC: Fix PR KVM on POWER7 bare metal
KVM: PPC: Fix stbux emulation
KVM: PPC: bookehv: Use lwz/stw instead of PPC_LL/PPC_STL for 32-bit fields
...
Pull first series of signal handling cleanups from Al Viro:
"This is just the first part of the queue (about a half of it);
assorted fixes all over the place in signal handling.
This one ends with all sigsuspend() implementations switched to
generic one (->saved_sigmask-based).
With this, a bunch of assorted old buglets are fixed and most of the
missing bits of NOTIFY_RESUME hookup are in place. Two more fixes sit
in arm and um trees respectively, and there's a couple of broken ones
that need obvious fixes - parisc and avr32 check TIF_NOTIFY_RESUME
only on one of two codepaths; fixes for that will happen in the next
series"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (55 commits)
unicore32: if there's no handler we need to restore sigmask, syscall or no syscall
xtensa: add handling of TIF_NOTIFY_RESUME
microblaze: drop 'oldset' argument of do_notify_resume()
microblaze: handle TIF_NOTIFY_RESUME
score: add handling of NOTIFY_RESUME to do_notify_resume()
m68k: add TIF_NOTIFY_RESUME and handle it.
sparc: kill ancient comment in sparc_sigaction()
h8300: missing checks of __get_user()/__put_user() return values
frv: missing checks of __get_user()/__put_user() return values
cris: missing checks of __get_user()/__put_user() return values
powerpc: missing checks of __get_user()/__put_user() return values
sh: missing checks of __get_user()/__put_user() return values
sparc: missing checks of __get_user()/__put_user() return values
avr32: struct old_sigaction is never used
m32r: struct old_sigaction is never used
xtensa: xtensa_sigaction doesn't exist
alpha: tidy signal delivery up
score: don't open-code force_sigsegv()
cris: don't open-code force_sigsegv()
blackfin: don't open-code force_sigsegv()
...
Pull fpu state cleanups from Ingo Molnar:
"This tree streamlines further aspects of FPU handling by eliminating
the prepare_to_copy() complication and moving that logic to
arch_dup_task_struct().
It also fixes the FPU dumps in threaded core dumps, removes and old
(and now invalid) assumption plus micro-optimizes the exit path by
avoiding an FPU save for dead tasks."
Fixed up trivial add-add conflict in arch/sh/kernel/process.c that came
in because we now do the FPU handling in arch_dup_task_struct() rather
than the legacy (and now gone) prepare_to_copy().
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, fpu: drop the fpu state during thread exit
x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state()
coredump: ensure the fpu state is flushed for proper multi-threaded core dump
fork: move the real prepare_to_copy() users to arch_dup_task_struct()
Pull powerpc updates from Benjamin Herrenschmidt:
"Here are the powerpc goodies for 3.5. Main highlights are:
- Support for the NX crypto engine in Power7+
- A bunch of Anton goodness, including some micro optimization of our
syscall entry on Power7
- I converted a pile of our thermal control drivers to the new i2c
APIs (essentially turning the old therm_pm72 into a proper set of
windfarm drivers). That's one more step toward removing the
deprecated i2c APIs, there's still a few drivers to fix, but we are
getting close
- kexec/kdump support for 47x embedded cores
The big missing thing here is no updates from Freescale. Not sure
what's up here, but with Kumar not working for them anymore things are
a bit in a state of flux in that area."
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (71 commits)
powerpc: Fix irq distribution
Revert "powerpc/hw-breakpoint: Use generic hw-breakpoint interfaces for new PPC ptrace flags"
powerpc: Fixing a cputhread code documentation
powerpc/crypto: Enable the PFO-based encryption device
powerpc/crypto: Build files for the nx device driver
powerpc/crypto: debugfs routines and docs for the nx device driver
powerpc/crypto: SHA512 hash routines for nx encryption
powerpc/crypto: SHA256 hash routines for nx encryption
powerpc/crypto: AES-XCBC mode routines for nx encryption
powerpc/crypto: AES-GCM mode routines for nx encryption
powerpc/crypto: AES-ECB mode routines for nx encryption
powerpc/crypto: AES-CTR mode routines for nx encryption
powerpc/crypto: AES-CCM mode routines for nx encryption
powerpc/crypto: AES-CBC mode routines for nx encryption
powerpc/crypto: nx driver code supporting nx encryption
powerpc/pseries: Enable the PFO-based RNG accelerator
powerpc/pseries/hwrng: PFO-based hwrng driver
powerpc/pseries: Add PFO support to the VIO bus
powerpc/pseries: Add pseries update notifier for OFDT prop changes
powerpc/pseries: Add new hvcall constants to support PFO
...
setting CONFIG_IRQ_ALL_CPUS distributes IRQs to CPUs only when
the number of online CPUs equals NR_CPUS. See commit
280ff97494 "sparc64: fix and
optimize irq distribution" for more details.
Using the online mask fixes IRQ-to-CPU distribution on systems
that boot with less than NR_CPUS.
Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This reverts commit 1b788400bb.
It causes oopses when passed incorrect arguments and has a
design fault using IPIs with interrupts disabled.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
guts of saved_sigmask-based sigsuspend/rt_sigsuspend. Takes
kernel sigset_t *.
Open-coded instances replaced with calling it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull security subsystem updates from James Morris:
"New notable features:
- The seccomp work from Will Drewry
- PR_{GET,SET}_NO_NEW_PRIVS from Andy Lutomirski
- Longer security labels for Smack from Casey Schaufler
- Additional ptrace restriction modes for Yama by Kees Cook"
Fix up trivial context conflicts in arch/x86/Kconfig and include/linux/filter.h
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (65 commits)
apparmor: fix long path failure due to disconnected path
apparmor: fix profile lookup for unconfined
ima: fix filename hint to reflect script interpreter name
KEYS: Don't check for NULL key pointer in key_validate()
Smack: allow for significantly longer Smack labels v4
gfp flags for security_inode_alloc()?
Smack: recursive tramsmute
Yama: replace capable() with ns_capable()
TOMOYO: Accept manager programs which do not start with / .
KEYS: Add invalidation support
KEYS: Do LRU discard in full keyrings
KEYS: Permit in-place link replacement in keyring list
KEYS: Perform RCU synchronisation on keys prior to key destruction
KEYS: Announce key type (un)registration
KEYS: Reorganise keys Makefile
KEYS: Move the key config into security/keys/Kconfig
KEYS: Use the compat keyctl() syscall wrapper on Sparc64 for Sparc32 compat
Yama: remove an unused variable
samples/seccomp: fix dependencies on arch macros
Yama: add additional ptrace scopes
...
Pull smp hotplug cleanups from Thomas Gleixner:
"This series is merily a cleanup of code copied around in arch/* and
not changing any of the real cpu hotplug horrors yet. I wish I'd had
something more substantial for 3.5, but I underestimated the lurking
horror..."
Fix up trivial conflicts in arch/{arm,sparc,x86}/Kconfig and
arch/sparc/include/asm/thread_info_32.h
* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (79 commits)
um: Remove leftover declaration of alloc_task_struct_node()
task_allocator: Use config switches instead of magic defines
sparc: Use common threadinfo allocator
score: Use common threadinfo allocator
sh-use-common-threadinfo-allocator
mn10300: Use common threadinfo allocator
powerpc: Use common threadinfo allocator
mips: Use common threadinfo allocator
hexagon: Use common threadinfo allocator
m32r: Use common threadinfo allocator
frv: Use common threadinfo allocator
cris: Use common threadinfo allocator
x86: Use common threadinfo allocator
c6x: Use common threadinfo allocator
fork: Provide kmemcache based thread_info allocator
tile: Use common threadinfo allocator
fork: Provide weak arch_release_[task_struct|thread_info] functions
fork: Move thread info gfp flags to header
fork: Remove the weak insanity
sh: Remove cpu_idle_wait()
...
Historical prepare_to_copy() is mostly a no-op, duplicated for majority of
the architectures and the rest following the x86 model of flushing the extended
register state like fpu there.
Remove it and use the arch_dup_task_struct() instead.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1336692811-30576-1-git-send-email-suresh.b.siddha@intel.com
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
This patch adds the cas bits to advertise support for the Platform
Facilities Option (PFO) based encryption accelerator device. The nx
device driver provides support for this hardware feature.
Signed-off-by: Kent Yoder <key@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch adds the cas bits to advertise support for the Platform
Facilities Option (PFO) based random number generator accerator.
The pseries-rng driver provides support for this hardware feature.
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Kent Yoder <key@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add support for the Platform Facilities Option (PFO) to the VIO bus.
These devices have a separate root node in OpenFirmware which
requires additional parsing to map into the existing VIO device
structure fields. This adds the interface for PFO device drivers to
make synchronous hypervisor calls.
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Kent Yoder <key@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
PPC_PTRACE_GETHWDBGINFO, PPC_PTRACE_SETHWDEBUG and PPC_PTRACE_DELHWDEBUG are
PowerPC specific ptrace flags that use the watchpoint register. While they are
targeted primarily towards BookE users, user-space applications such as GDB
have started using them for BookS too. This patch enables the use of generic
hardware breakpoint interfaces for these new flags.
Apart from the usual benefits of using generic hw-breakpoint interfaces, these
changes allow debuggers (such as GDB) to use a common set of ptrace flags for
their watchpoint needs and allow more precise breakpoint specification (length
of the variable can be specified).
Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch changes the architecture vector to advertise support for a
lower minimum virtual processor entitled capacity. The default
minimum without this patch is 10%, this patch specifies 1%.
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
So we have another case of paca->irq_happened getting out of
sync with the HW irq state. This can happen when a perfmon
interrupt occurs while soft disabled, as it will return to a
soft disabled but hard enabled context while leaving a stale
PACA_IRQ_HARD_DIS flag set.
This patch fixes it, and also adds a test for the condition
of those flags being out of sync in arch_local_irq_restore()
when CONFIG_TRACE_IRQFLAGS is enabled.
This helps catching those gremlins faster (and so far I
can't seem see any anymore, so that's good news).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Alignment was the last user of the ENABLE_INTS macro, which we can
now remove. All non-syscall exceptions now disable interrupts on
entry, they get re-enabled conditionally from C code. Don't
unconditionally re-enable in program check either, check the
original context.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We had a case where we could turn on hard interrupts while
leaving the PACA_IRQ_HARD_DIS bit set in the PACA. This can
in turn cause a BUG_ON() to hit in __check_irq_replay() due
to interrupt state getting out of sync.
The assembly code was also way too convoluted. Instead, we
now leave it to the C code to do the right thing which ends
up being smaller and more readable.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The core now has a threadinfo allocator which uses a kmemcache when
THREAD_SIZE < PAGE_SIZE.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Link: http://lkml.kernel.org/r/20120505150142.059161130@linutronix.de
cpuidle uses a generic function now. Remove the cruft.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Link: http://lkml.kernel.org/r/20120507175652.330322737@linutronix.de
commit 771dae818 (powerpc/cpuidle: Add cpu_idle_wait() to allow
switching of idle routines) implemented cpu_idle_wait() for powerpc.
The changelog says:
"The equivalent routine for x86 is in arch/x86/kernel/process.c
but the powerpc implementation is different.":
Unfortunately the changelog is completely useless as it does not tell
_WHY_ it is different.
Aside of being different the implementation is patently wrong.
The rescheduling IPI is async. That means that there is no guarantee,
that the other cores have executed the IPI when cpu_idle_wait()
returns. But that's the whole purpose of this function: to guarantee
that no CPU uses the old idle handler anymore.
Use the smp_functional_call() based implementation, which fulfils the
requirements.
[ This code is going to replaced by a core version to remove all the
pointless copies in arch/*, but this one should go to stable ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Cc: Trinabh Gupta <g.trinabh@gmail.com>
Cc: Arun R Bharadwaj <arun.r.bharadwaj@gmail.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Link: http://lkml.kernel.org/r/20120507175651.980164748@linutronix.de
Cc: stable@vger.kernel.org
This is necessary for qemu to be able to pass the right information
to the guest, such as the supported page sizes and corresponding
encodings in the SLB and hash table, which can vary depending
on the processor type, the type of KVM used (PR vs HV) and the
version of KVM
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[agraf: fix compilation on hv, adjust for newer ioctl numbers]
Signed-off-by: Alexander Graf <agraf@suse.de>
Time for which the hrtimer is started for decrementer emulation is calculated
using tb_ticks_per_usec. While hrtimer uses the clockevent for DEC
reprogramming (if needed) and which calculate timebase ticks using the
multiplier and shifter mechanism implemented within clockevent layer.
It was observed that this conversion (timebase->time->timebase) are not
correct because the mechanism are not consistent.
In our setup it adds 2% jitter.
With this patch clockevent multiplier and shifter mechanism are used when
starting hrtimer for decrementer emulation. Now the jitter is < 0.5%.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iQEcBAABAgAGBQJPnb50AAoJEHm+PkMAQRiGAE0H/A4zFZIUGmF3miKPDYmejmrZ
oVDYxVAu6JHjHWhu8E3VsinvyVscowjV8dr15eSaQzmDmRkSHAnUQ+dB7Di7jLC2
MNopxsWjwyZ8zvvr3rFR76kjbWKk/1GYytnf7GPZLbJQzd51om2V/TY/6qkwiDSX
U8Tt7ihSgHAezefqEmWp2X/1pxDCEt+VFyn9vWpkhgdfM1iuzF39MbxSZAgqDQ/9
JJrBHFXhArqJguhENwL7OdDzkYqkdzlGtS0xgeY7qio2CzSXxZXK4svT6FFGA8Za
xlAaIvzslDniv3vR2ZKd6wzUwFHuynX222hNim3QMaYdXm012M+Nn1ufKYGFxI0=
=4d4w
-----END PGP SIGNATURE-----
Merge tag 'v3.4-rc5' into next
Linux 3.4-rc5
Merge to pull in prerequisite change for Smack:
86812bb0de
Requested by Casey.
This patch adds support for creating 1:1 mapping for the PPC_47x during
a KEXEC. The implementation is similar to that of the PPC440x which is
described here :
http://patchwork.ozlabs.org/patch/104323/
PPC_47x MMU :
The 47x uses Unified TLB 1024 entries, with 4-way associative mapping
(4 x 256 entries). The index to be used is calculated by the MMU by
hashing the PID, EPN and TS. The software can choose to specify the way
by setting bit 0(enable way select) and the way in bits 1-2 in the TLB
Word 0.
Implementation:
The patch erases all the UTLB entries which includes the tlb covering
the mapping for our code. The shadow TLB caches the mapping for the
running code which helps us to continue the execution until we do
isync/rfi. We then create a tmp mapping for the current code in the
other address space (TS) and switch to it.
Then we create a 1:1 mapping(EPN=RPN) for 0-2GiB in the original
address space and switch to the new mapping.
TODO: Add SMP support.
Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Signed-off-by: Josh Boyer <jwboyer@gmail.com>
Initialize the PID register with kernel pid (0) before we start
setting the TLB mapping for KEXEC. Also set the MMUCR[TID] to kernel
PID.
This was spotted while testing the kexec on ISS for 47x. ISS doesn't
return a successful tlbsx for a kernel address with PID set to a user PID.
Though the hardware/qemu/simics work fine.
This patch is harmless and initializes the PID to 0 (kernel PID) which
is usually the case during a normal kernel boot. This would fix the kexec
on ISS for 440. I have tested this patch on sequoia board.
Signed-off-by: Suzuki K Poulose <suzuki@in.ibm.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: Josh Boyer <jwboyer@gmail.com>
PowerPC has non standard getregs calls that only dump the GPRs or
FPRs and have their arguments reversed. commit e17666ba48 (ptrace
updates & new, better requests) in 2.6.3 deprecated them and introduced
more standard versions.
It's been about 5 years and I know of no users of the old calls so
lets remove them.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Remove CONFIG_POWER4_ONLY, the option is badly named and only does two
things:
- It wraps the MMU segment table code. With feature fixups there is
little downside to compiling this in.
- It uses the newer mtocrf instruction in various assembly functions.
Instead of making this a compile option just do it at runtime via
a feature fixup.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add two optimisations to enable_kernel_altivec:
- enable_kernel_altivec has already determined if we need to
save the previous task's state but we call giveup_altivec
in both cases, requiring an extra branch in giveup_altivec. Create
giveup_altivec_notask which only turns on the VMX bit in the
MSR.
- We write the VMX MSR bit each time we call enable_kernel_altivec
even it was already set. Check the bit and branch out if we have
already set it. The classic case for this is vectored IO
where we have to copy multiple buffers to or from userspace.
The following testcase was used to confirm this patch improves
performance:
http://ozlabs.org/~anton/junkcode/copy_to_user.c
Since the current breakpoint for using VMX in copy_tofrom_user is
4096 bytes, I'm using buffers of 4096 + 1 cacheline (4224) bytes.
A benchmark of 16 entry readvs (-s 16):
time copy_to_user -l 4224 -s 16 -i 1000000
completes 5.2% faster on a POWER7 PS700.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Use an empty inline instead of an empty function to implement
giveup_altivec on book3e CPUs, similar to flush_altivec_to_thread.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Remove all the iseries specific fields in the lppaca.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
At the moment system call entry looks like:
crclr so
...
mfcr r9
...
std r9,_CCR(r1)
commit bd19c8994a ([POWERPC] system call micro optimisation) put
some space between the crclr and mfcr in order to avoid a stall.
There is still a stall seen between the mfcr and std. We can avoid
the crclr by doing it in a GPR with rlwinm which gives us more room
to better schedule the sequence.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The count register is volatile so we don't need to preserve it.
Store zero to the entry in the exception frame.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The XER is a volatile register so there is no need to save and restore
it over a system call - zero it out in the exception stack frame
instead.
This should fix a 5 cycle stall of the mfxer/std seen on POWER7.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
syscall_dotrace_cont and syscall_error_cont tend to complicate perf
output so make them local.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The switch from using irq_map to irq_alloc_desc*() for managing irq
number allocations introduced new bugs in some of the powerpc
interrupt code. Several functions rely on the value of NR_IRQS to
determine the maximum irq number that could get allocated. However,
with sparse_irq and using irq_alloc_desc*() the maximum possible irq
number is now specified with 'nr_irqs' which may be a number larger
than NR_IRQS. This has caused breakage on powermac when
CONFIG_NR_IRQS is set to 32.
This patch removes most of the direct references to NR_IRQS in the
powerpc code and replaces them with either a nr_irqs reference or by
using the common for_each_irq_desc() macro. The powerpc-specific
for_each_irq() macro is removed at the same time.
Also, the Cell axon_msi driver is refactored to remove the global
build assumption on the size of NR_IRQS and instead add a limit to the
maximum irq number when calling irq_domain_add_nomap().
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Link: http://lkml.kernel.org/r/20120420124557.311212868@linutronix.de
Preparatory patch to make the idle thread allocation for secondary
cpus generic.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20120420124556.964170564@linutronix.de
Merge reason: development work has dependency on kvm patches merged
upstream.
Conflicts:
Documentation/feature-removal-schedule.txt
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>