The discard limits that get established for a thin-pool or thin device
may be incompatible with the pool's data device. Avoid this by checking
the discard limits of the pool's data device. If an incompatibility is
found then the pool's 'discard passdown' feature is disabled.
Change thin_io_hints to ensure that a thin device always uses the same
queue limits as its pool device.
Introduce requested_pf to track whether or not the table line originally
contained the no_discard_passdown flag and use this directly for table
output. We prepare the correct setting for discard_passdown directly in
bind_control_target (called from pool_io_hints) and store it in
adjusted_pf rather than waiting until we have access to pool->pf in
pool_preresume.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
A little thin discard code refactoring to make the next patch (dm thin:
fix discard support for data devices) more readable.
Pull out a couple of functions (and uses bools instead of unsigned for
features).
No functional changes.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Add a safety net that will re-use the DM device's existing limits in the
event that DM device has a temporary table that doesn't have any
component devices. This is to reduce the chance that requests not
respecting the hardware limits will reach the device.
DM recalculates queue limits based only on devices which currently exist
in the table. This creates a problem in the event all devices are
temporarily removed such as all paths being lost in multipath. DM will
reset the limits to the maximum permissible, which can then assemble
requests which exceed the limits of the paths when the paths are
restored. The request will fail the blk_rq_check_limits() test when
sent to a path with lower limits, and will be retried without end by
multipath. This became a much bigger issue after v3.6 commit fe86cdcef
("block: do not artificially constrain max_sectors for stacking
drivers").
Reported-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Always clear QUEUE_FLAG_ADD_RANDOM if any underlying device does not
have it set. Otherwise devices with predictable characteristics may
contribute entropy.
QUEUE_FLAG_ADD_RANDOM specifies whether or not queue IO timings
contribute to the random pool.
For bio-based targets this flag is always 0 because such devices have no
real queue.
For request-based devices this flag was always set to 1 by default.
Now set it according to the flags on underlying devices. If there is at
least one device which should not contribute, set the flag to zero: If a
device, such as fast SSD storage, is not suitable for supplying entropy,
a request-based queue stacked over it will not be either.
Because the checking logic is exactly same as for the rotational flag,
share the iteration function with device_is_nonrot().
Signed-off-by: Milan Broz <mbroz@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
The access beyond the end of device BUG_ON that was introduced to
dm_request_fn via commit 29e4013de7 ("dm: implement
REQ_FLUSH/FUA support for request-based dm") was an overly
drastic (but simple) response to this situation.
I have received a report that this BUG_ON was hit and now think
it would be better to use dm_kill_unmapped_request() to fail the clone
and original request with -EIO.
map_request() will assign the valid target returned by
dm_table_find_target to tio->ti. But when the target
isn't valid tio->ti is never assigned (because map_request isn't
called); so add a check for tio->ti != NULL to dm_done().
Reported-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: stable@vger.kernel.org # v2.6.37+
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
When there are no paths and multipath receives an ioctl, it waits until
a path becomes available. This behaviour is incorrect if the
"queue_if_no_path" setting was not specified, as then the ioctl should
be rejected immediately, which this patch now does.
commit 35991652b ("dm mpath: allow ioctls to trigger pg init") should
have checked if queue_if_no_path was configured before queueing IO.
Checking for the queue_if_no_path feature, like is done in map_io(),
allows the following table load to work without blocking in the
multipath_ioctl retry loop:
echo "0 1024 multipath 0 0 0 0" | dmsetup create mpath_nodevs
Without this fix the multipath_ioctl will block with the following stack
trace:
blkid D 0000000000000002 0 23936 1 0x00000000
ffff8802b89e5cd8 0000000000000082 ffff8802b89e5fd8 0000000000012440
ffff8802b89e4010 0000000000012440 0000000000012440 0000000000012440
ffff8802b89e5fd8 0000000000012440 ffff88030c2aab30 ffff880325794040
Call Trace:
[<ffffffff814ce099>] schedule+0x29/0x70
[<ffffffff814cc312>] schedule_timeout+0x182/0x2e0
[<ffffffff8104dee0>] ? lock_timer_base+0x70/0x70
[<ffffffff814cc48e>] schedule_timeout_uninterruptible+0x1e/0x20
[<ffffffff8104f840>] msleep+0x20/0x30
[<ffffffffa0000839>] multipath_ioctl+0x109/0x170 [dm_multipath]
[<ffffffffa06bfb9c>] dm_blk_ioctl+0xbc/0xd0 [dm_mod]
[<ffffffff8122a408>] __blkdev_driver_ioctl+0x28/0x30
[<ffffffff8122a79e>] blkdev_ioctl+0xce/0x730
[<ffffffff811970ac>] block_ioctl+0x3c/0x40
[<ffffffff8117321c>] do_vfs_ioctl+0x8c/0x340
[<ffffffff81166293>] ? sys_newfstat+0x33/0x40
[<ffffffff81173571>] sys_ioctl+0xa1/0xb0
[<ffffffff814d70a9>] system_call_fastpath+0x16/0x1b
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.5+
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
The dm thin pool target claims to support the zeroing of discarded
data areas. This turns out to be incorrect when processing discards
that do not exactly cover a complete number of blocks, so the target
must always set discard_zeroes_data_unsupported.
The thin pool target will zero blocks when they are allocated if the
skip_block_zeroing feature is not specified. The block layer
may send a discard that only partly covers a block. If a thin pool
block is partially discarded then there is no guarantee that the
discarded data will get zeroed before it is accessed again.
Due to this, thin devices cannot claim discards will always zero data.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Cc: stable@vger.kernel.org # 3.4+
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Not event_format->name, that doesn't contains the sys: part.
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-v9x3q9rv4caxtox7wtjpchq5@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit d97b46a64 ("syscalls, x86: add __NR_kcmp syscall" ) added a new
syscall to support checkpoint restore. It is currently x86-only, but
that restriction will be removed in a subsequent patch. Unfortunately,
the kernel checksyscalls script had a bug which suppressed any warning
to other architectures that the kcmp syscall was not implemented. A
patch to checksyscalls is being tested in linux-next and other
architectures are seeing warnings about kcmp being unimplemented.
This patch adds __NR_kcmp to <asm-generic/unistd.h> so that kcmp is
wired in for architectures using the generic syscall list.
Signed-off-by: Mark Salter <msalter@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
The ELSE IF statements do not work as expected if another ELSE statement
follows. This is because the $if_set is not set. If the ELSE IF
condition is true, the following ELSE should be ignored. But because the
$if_set is not set, the following ELSE will also be executed.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
To apply a filter to all the evsels in an evlist.
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-v9x3q9rv4caxtox7wtjpchq5@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Because that is what it really does, i.e. it applies the filters that
were parsed from the command line and stashed into the evsels they refer
to.
We'll need the set_filter method name to actually apply a filter to all
the evsels in an evlist, for instance, to ask that a syswide tracer
doesn't trace itself.
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-v9x3q9rv4caxtox7wtjpchq5@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It will set up a syscall open tracepoint event, generate an open with
invalid flags, then check those flags were the ones reported in the
tracepoint fired.
For the filename we need vfs:getname, but that will go thru some more
iterations as the vfs getname codebase is going thru changes lately.
When that is in I'll just check that the perf_evsel__newtp constructor
is not bailing out and then add it to the evlist, catch the event and
check the filename against the one used in the 'open' call used to
trigger the event.
Cc: David Ahern <dsahern@gmail.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-p5w9aq0jcbb91ghzqomowm16@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We were relying on the info in pevent, but since we have it in
perf_evsel, set up by the perf_session routine if read from a perf.data
file or by whoever creates the evsels, use it.
New 'perf test' entries will use it to parse locally generated events,
in a non perf.data centered workflow.
As well as use byteswap.h to get per arch optimized swap routines, like
other parts of perf (header, perf_evsel__parse_sample, symbol, etc)
already do.
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-8tjuxk09mlsfmh7macgkxsip@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Instead of passing it around for parsing as an explicit parameter, will
help with reading tracepoint fields when not using a perf session or
pevent structure, i.e. for non perf.data centered workflows.
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-qa67ikv2sm49cwa7dyjhhp6g@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Or one with cpu_map->map[0] == -1.
Reducing the boilerplate in setting up an evlist by nor requiring a
cpu_map to be created at all.
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-rnaqn3dtnsfo1wlbbf3fhx00@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It needs to properly set the sample_type, sample_period and the KVM
related perf_event_attr fields.
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-v9x3q9rv4caxtox7wtjpchq5@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Checking "user" before "is_idle_task()" allows better optimizations
in cases where inlining is possible. Also, "bool" should be passed
"true" or "false" rather than "1" or "0". This commit therefore makes
these changes, as noted in Josh's review.
Reported-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Provide a config option that enables the userspace
RCU extended quiescent state on every CPUs by default.
This is for testing purpose.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
do_notify_resume() may be called on irq or exception
exit. But at that time the exception has already called
rcu_user_enter() and the irq has already called rcu_irq_exit().
Since it can use RCU read side critical section, we must call
rcu_user_exit() before doing anything there. Then we must call
back rcu_user_enter() after this function because we know we are
going to userspace from there.
This complete support for userspace RCU extended quiescent state
in x86-64.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
This way we can exit the RCU extended quiescent state before
we schedule a new task from irq/exception exit.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
When exceptions or irq are about to resume userspace, if
the task needs to be rescheduled, the arch low level code
calls schedule() directly.
If we call it, it is because we have the TIF_RESCHED flag:
- It can be set after random local calls to set_need_resched()
(RCU, drm, ...)
- A wake up happened and the CPU needs preemption. This can
happen in several ways:
* Remotely: the remote waking CPU has set TIF_RESCHED and send the
wakee an IPI to schedule the new task.
* Remotely enqueued: the remote waking CPU sends an IPI to the target
and the wake up is made by the target.
* Locally: waking CPU == wakee CPU and the wakeup is done locally.
set_need_resched() is called without IPI.
In the case of local and remotely enqueued wake ups, the tick can
be restarted when we enqueue the new task and RCU can exit the
extended quiescent state at the same time. Then by the time we reach
irq exit path and we call schedule, we are not in RCU user mode.
But if we call schedule() only because something called set_need_resched(),
RCU may still be in user mode when we reach schedule.
Also if a wake up is done remotely, the CPU might see the TIF_RESCHED
flag and call schedule while the IPI has not yet happen to restart the
tick and exit RCU user mode.
We need to manually protect against these corner cases.
Create a new API schedule_user() that calls schedule() inside
rcu_user_exit()-rcu_user_enter() in order to protect it. Archs
will need to rely on it now to implement user preemption safely.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
When an exception or an irq exits, and we are going to resume into
interrupted kernel code, the low level architecture code calls
preempt_schedule_irq() if there is a need to reschedule.
If the interrupt/exception occured between a call to rcu_user_enter()
(from syscall exit, exception exit, do_notify_resume exit, ...) and
a real resume to userspace (iret,...), preempt_schedule_irq() can be
called whereas RCU thinks we are in userspace. But preempt_schedule_irq()
is going to run kernel code and may be some RCU read side critical
section. We must exit the userspace extended quiescent state before
we call it.
To solve this, just call rcu_user_exit() in the beginning of
preempt_schedule_irq().
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Add necessary hooks to x86 exception for userspace
RCU extended quiescent state support.
This includes traps, page fault, debug exceptions, etc...
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There is some unnatural label based layout in this function.
Convert the unnecessary goto to readable conditional blocks.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Add syscall slow path hooks to notify syscall entry
and exit on CPUs that want to support userspace RCU
extended quiescent state.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Clear the syscalls hook of a task when it's scheduled out so that if
the task migrates, it doesn't run the syscall slow path on a CPU
that might not need it.
Also set the syscalls hook on the next task if needed.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
By default we don't want to enter into RCU extended quiescent
state while in userspace because doing this produces some overhead
(eg: use of syscall slowpath). Set it off by default and ready to
run when some feature like adaptive tickless need it.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Allow calls to rcu_user_enter() even if we are already
in userspace (as seen by RCU) and allow calls to rcu_user_exit()
even if we are already in the kernel.
This makes the APIs more flexible to be called from architectures.
Exception entries for example won't need to know if they come from
userspace before calling rcu_user_exit().
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Create a new config option under the RCU menu that put
CPUs under RCU extended quiescent state (as in dynticks
idle mode) when they run in userspace. This require
some contribution from architectures to hook into kernel
and userspace boundaries.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
The current implementation of RCU_FAST_NO_HZ tries reasonably hard to rid
the current CPU of RCU callbacks. This is appropriate when the CPU is
entering idle, where it doesn't have much useful to do anyway, but is most
definitely not what you want when transitioning to user-mode execution.
This commit therefore detects the adaptive-tick case, and refrains from
burning CPU time getting rid of RCU callbacks in that case.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
In some cases, it is necessary to enter or exit userspace-RCU-idle mode
from an interrupt handler, for example, if some other CPU sends this
CPU a resched IPI. In this case, the current CPU would enter the IPI
handler in userspace-RCU-idle mode, but would need to exit the IPI handler
after having exited that mode.
To allow this to work, this commit adds two new APIs to TREE_RCU:
- rcu_user_enter_after_irq(). This must be called from an interrupt between
rcu_irq_enter() and rcu_irq_exit(). After the irq calls rcu_irq_exit(),
the irq handler will return into an RCU extended quiescent state.
In theory, this interrupt is never a nested interrupt, but in practice
it might interrupt softirq, which looks to RCU like a nested interrupt.
- rcu_user_exit_after_irq(). This must be called from a non-nesting
interrupt, interrupting an RCU extended quiescent state, also
between rcu_irq_enter() and rcu_irq_exit(). After the irq calls
rcu_irq_exit(), the irq handler will return in an RCU non-quiescent
state.
[ Combined with "Allow calls to rcu_exit_user_irq from nesting irqs." ]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
RCU currently insists that only idle tasks can enter RCU idle mode, which
prohibits an adaptive tickless kernel (AKA nohz cpusets), which in turn
would mean that usermode execution would always take scheduling-clock
interrupts, even when there is only one task runnable on the CPU in
question.
This commit therefore adds rcu_user_enter() and rcu_user_exit(), which
allow non-idle tasks to enter RCU idle mode. These are quite similar
to rcu_idle_enter() and rcu_idle_exit(), respectively, except that they
omit the idle-task checks.
[ Updated to use "user" flag rather than separate check functions. ]
[ paulmck: Updated to drop exports of new functions based on Josh's patch
getting rid of the need for them. ]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cleanup the label maze in this function. Having a
seperate function to first handle the traps that don't
generate a signal makes it easier to convert into
more readable conditional paths.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1348577479-2564-1-git-send-email-fweisbec@gmail.com
[ Fixed 32-bit build failure. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
GAS in binutils(2.16.91) could not parse parentheses within
macro parameters unless fully parenthesized, and this is a
workaround to make old gas work without generating below errors:
arch/x86/kernel/entry_64.S: Assembler messages:
arch/x86/kernel/entry_64.S:387: Error: too many positional arguments
arch/x86/kernel/entry_64.S:389: Error: too many positional arguments
[...]
Signed-off-by: Tao Guo <glorioustao@gmail.com>
Reluctantly-Acked-by: Jan Beulich <jbeulich@novell.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1348648102-12653-1-git-send-email-glorioustao@gmail.com
[ Jan argues that these old GAS versions are fragile - which is so, but lets give them a chance. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit 31d2092eb0 ("x86: move
mp_register_lapic_address to boot.c") renamed mp_register_lapic
to acpi_register_lapic. But mp_register_lapic remains in a
comment. So the patch rename it.
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Len Brown <lenb@kernel.org>
Link: http://lkml.kernel.org/r/50625239.3050403@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Otherwise when X starts we commonly get a black screen scanning
out nothing, its wierd dpms on/off from userspace brings it back,
With this on F18, multi-seat works again with my 1920x1200 monitor
which is above the sku limit for the device I have.
Cc: stable@vger.kernel.org
Reviewed-by: Alex Deucher <alexander.deucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We don't allocate enough data for this struct. As soon as we start
modifying event->event on the next lines, then we're going beyond the
end of the memory we allocated.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@gmail.com>
These just silence some printks that we are seeing that we shouldn't
* 'drm-nouveau-fixes' of git://anongit.freedesktop.org/git/nouveau/linux-2.6:
drm/nvc0/ltcg: mask off intr 0x10
drm/nouveau: silence a debug message triggered by newer userspace
Pull v3.7 RCU commits from Paul E. McKenney:
"
0. A fix for a latent bug that has been in RCU ever since the
addition of CPU stall warnings. This bug results in
false-positive stall warnings, but thus far only on embedded
systems with severely cut-down userspace configurations.
This fix is located on an rcu/urgent branch, with the rest
of the commits based on top of it. This commit CCs stable.
Given that the merge window is coming quite soon and given
the small number of affected users, I do -not- recommend
pushing it to 3.6, but the separate branch makes it easy to
find if someone needs it.
1. Further reductions in latency spikes for huge systems, along
with additional boot-time adaptation to the actual hardware.
This is a large change, as it moves RCU grace-period
initialization and cleanup, along with quiescent-state forcing,
from softirq to a kthread. However, it appears to be in
quite good shape (famous last words). Posted to LKML at
https://lkml.org/lkml/2012/9/20/427.
2. Updates to documentation and rcutorture, the latter category
including keeping statistics on CPU-hotplug latencies and
fixing some initialization-time races. Posted to LKML at
https://lkml.org/lkml/2012/8/30/193.
3. Miscellaneous fixes and improvements, posted to LKML at
https://lkml.org/lkml/2012/8/30/199.
4. CPU-hotplug fixes and improvements, posted to LKML at
https://lkml.org/lkml/2012/8/30/292 for first three and at
https://lkml.org/lkml/2012/8/3/416.
5. Idle-loop fixes that were omitted on an earlier submission,
posted to LKML at https://lkml.org/lkml/2012/8/30/251.
"
Signed-off-by: Ingo Molnar <mingo@kernel.org>
. The new perf_evsel__tp_sched_test 'perf test' broke the build by setting the
'ret' variable but not using it, caught by newer gcc
-Werror=unused-but-set-variable, fix from Namhyung Kim.
. pevent_parse_event should return a proper PEVENT_ERRNO__ and call
pevent_free_format on its failure path, fixes from Namhyung Kim.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
iQIcBAABAgAGBQJQYa9kAAoJENZQFvNTUqpAcqcP/RsWR9t2DRzX5H2ICkVOxwzA
rf3LxYZPzd/4b8hNbqcLoS12ttg6a2vrUE02lGwuTtx2aX4a3SgnlmbkehAUJNjA
iDy6+lfp6nZWJF23krFkl/sAte976IBCoiTscyX/8doiSMelwhz5toxwtalLNIl4
cO4k+3pMSEWjiE30CAqcxR7OF3B6SJM8gAe9B6ZhCwLTaYEBlH4mdUDI3gztx6bc
G0lOPegDDC1dGAv8msBC3WwWxSgQaOGHetkC1cvD77CUL1YM1KzkxC5TEN8gT8qu
5Pz5rXQxQvyVzmFzhkOiC52PmDfpjNRgmIIvpGSaNfngHEWfFBFTnOcLpW7E9JyW
ipmHRp1jw2hjPKgGR29pXMlZetTg0+QLUAlD1Thg9xiasnKDPmM5zFEcidloIFWk
K5WNO2Y6ezQ4vd75LolIXA85mS8TPfU4FwKcLmQwmIk4fxIF4BrRXP+z69K3esIu
yxS1rhj3qQY4LBOqqy3gE7KfthaKz1xsNpesAW6n2KDsqEO9I6lb0yVGMAhrDZMv
o17u2lIlB3amVDZPO1vBWtU6OSlVb1i1dD1laUFt+nGVo7EeETMrckG8rxhVtTLj
HAIipZzMsxcSWagiLlocVg4qGmnZ4aNQIvGjvHLlVDrBgZBC8QzZyWh9EfzKxbUe
qkWZSi3dov59gwoWqNnV
=EqiG
-----END PGP SIGNATURE-----
Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core fixes from Arnaldo Carvalho de Melo:
* The new perf_evsel__tp_sched_test 'perf test' broke the build by setting the
'ret' variable but not using it, caught by newer gcc
-Werror=unused-but-set-variable, fix from Namhyung Kim.
* pevent_parse_event should return a proper PEVENT_ERRNO__ and call
pevent_free_format on its failure path, fixes from Namhyung Kim.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
NVIDIA do that at startup too on Fermi, so perhaps the heap of 0x10
intrs we receive are normal and we can ignore them.
On Kepler NVIDIA *don't* do this, but the hardware appears to come up
with the bit masked off by default - so that's probably why :)
This should silence some interrupt spam seen on Fermi+ boards.
Backported patch from reworked nouveau kernel tree.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Commit v2.6.19-rc1~1272^2~41 tells us that r->cost != 0 can happen when
a running state is saved to userspace and then reinstated from there.
Make sure that private xt_limit area is initialized with correct values.
Otherwise, random matchings due to use of uninitialized memory.
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>