linux

mirror of https://github.com/FEX-Emu/linux.git synced 2025-01-19 16:17:40 +00:00

Author	SHA1	Message	Date
Dmitry Torokhov	3ee2c40b7a	rtc: switch to using is_visible() to control sysfs attributes Instead of creating wakealarm attribute manually, after the device has been registered, let's rely on facilities provided by the attribute groups to control which attributes are visible and which are not. This allows to create all needed attributes at once, at the same time that we register RTC class device. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:07 +02:00
Dmitry Torokhov	a17ccd1c6a	rtc: switch wakealarm attribute to DEVICE_ATTR_RW Instead of using older style DEVICE_ATTR for wakealarm attribute let's switch to using DEVICE_ATTR_RW that ensures consistent across the kernel permissions on the attribute. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:07 +02:00
Dmitry Torokhov	df100c017e	rtc: make rtc_does_wakealarm() return boolean Users of rtc_does_wakealarm() return value treat it as boolean so let's change the signature accordingly. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:07 +02:00
Henri Roosen	f2284f9c90	rtc: rx8025: remove obsolete local_irq_disable() and local_irq_enable() for rtc_update_irq() Since commit e6229bec25be ("rtc: make rtc_update_irq callable with irqs enabled") rtc_update_irq() is callable with irqs enabled. Signed-off-by: Henri Roosen <henriroosen@gmail.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:07 +02:00
Octavian Purdila	0d9030a2c3	rtc: fix drivers that consider 0 as a valid IRQ in client->irq Since dab472eb931b ("i2c / ACPI: Use 0 to indicate that device does not have interrupt assigned"), 0 is not a valid i2c client irq anymore, so change all driver's checks accordingly. The same issue occurs when the device is instantiated via device tree with no IRQ, or from the i2c sysfs interface, even before the patch above. Signed-off-by: Octavian Purdila <octavian.purdila@intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:07 +02:00
Dmitry Torokhov	1e4cd62558	rtc: dev: properly manage lifetime of dev and cdev in rtc device struct rtc embeds both struct dev and struct cdev. Unfortunately character device structure may outlive the parent rtc structure unless we set it up as parent of character device so that it will stay pinned until character device is freed. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:07 +02:00
Dmitry Torokhov	c3b399a4b6	rtc: class: remove unnecessary device_get() in rtc_device_unregister Technically the address of rtc->dev can never be NULL, so get_device() can never fail. Also caller of rtc_device_unregister() supposed to be the owner of the device and thus have a valid reference. Therefore call to get_device() is not needed here. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:07 +02:00
Dmitry Torokhov	6706664d92	rtc: class: fix double free in rtc_register_device() error path Commit 59cca865f21e ("drivers/rtc/class.c: fix device_register() error handling") correctly noted that naked kfree() should not be used after failed device_register() call, however, while it added the needed put_device() it forgot to remove the original kfree() causing double-free. Cc: Vasiliy Kulikov <segooon@gmail.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:07 +02:00
Guo Zeng	dfe6c04aa2	rtc: sirfsoc: move to regmap APIs from platform-specific APIs The current codes use CSR platform specific API exported by machine codes to read/write RTC registers. they are: sirfsoc_rtc_iobrg_readl() sirfsoc_rtc_iobrg_writel() commit b1999477ed91 ("ARM: prima2: move to use REGMAP APIs for rtciobrg") moves to regmap support, now we can move to use regmap APIs in RTC driver. Signed-off-by: Guo Zeng <guo.zeng@csr.com> Signed-off-by: Barry Song <Baohua.Song@csr.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:06 +02:00
Vaibhav Jain	f4a2eecb3f	rtc: opal: Enable alarms only when opal supports tpo rtc-opal driver provides support for rtc alarms via timed-power-on(tpo). However some Power platforms like BML use a fake rtc clock and don't support tpo. Such platforms are indicated by the missing 'has-tpo' property in the device tree. Current implementation however enables callback for rtc_class_ops.read/set alarm irrespective of the tpo support from the platform. This results in a failed opal call when kernel tries to read an existing alarms via opal_get_tpo_time during rtc device registration. This patch fixes this issue by setting opal_rtc_ops.read/set_alarm callback pointers only when tpo is supported. Acked-by: Michael Neuling <mikey@neuling.org> Acked-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com> Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:06 +02:00
Joachim Eastwood	c28b42e3ae	rtc: add rtc-lpc24xx driver Add driver for the RTC found on NXP LPC178x/18xx/408x/43xx devices. The RTC provides calendar and clock functionality together with alarm interrupt support. Signed-off-by: Joachim Eastwood <manabian@gmail.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:06 +02:00
Joachim Eastwood	dcb9372b34	doc: dt: add documentation for nxp,lpc1788-rtc Document NXP LPC178x/18xx/408x/43xx bindings Signed-off-by: Joachim Eastwood <manabian@gmail.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:06 +02:00
Krzysztof Kozlowski	045c6fdd37	rtc: Drop owner assignment from platform_driver platform_driver does not need to set an owner because platform_driver_register() will set it. Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:06 +02:00
Krzysztof Kozlowski	b28845433e	rtc: Drop owner assignment from i2c_driver i2c_driver does not need to set an owner because i2c_register_driver() will set it. Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:06 +02:00
Andrea Scian	653ebd75e9	rtc: pcf2127: use OFS flag to detect unreliable date and warn the user The PCF2127 datasheet states that it's wrong to say that the date in unreliable if BLF (battery low flag) is set but instead, OSF (seconds register) should be used to check if oscillator, for any reason, stopped. Battery may be low (usually below 2V5 threshold) but the date may be anyway correct (typically date is unreliable when input voltage is below 1V2). Signed-off-by: Andrea Scian <andrea.scian@dave.eu> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:06 +02:00
Andrea Scian	821f51c4da	rtc: use rtc_valid_tm() error code when reading date/time There's a wrong comment in some RTC drivers that say it's better to ignore rtc_valid_tm() when reading RTC timestamp. However this is wrong and is better to return to the userspace the error if timestamp is not valid. Signed-off-by: Andrea Scian <andrea.scian@dave.eu> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:06 +02:00
Vaibhav Hiremath	4ab8210313	rtc: 88pm80x: add device tree support Along with DT support, this patch also cleans up the unnecessary code around 'rtc_wakeup' initialization. Signed-off-by: Chao Xie <chao.xie@marvell.com> Signed-off-by: Vaibhav Hiremath <vaibhav.hiremath@linaro.org> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:05 +02:00
Maninder Singh	617f6f7ef5	rtc: bq32k: remove redundant check removing below static analysis error: (error) Possible null pointer dereference: client if (!i2c_check_functionality(client->adapter, I2C_FUNC_I2C)) ^^^^^^^ Error comes because client is dereferenced before NULL check. So probably NULL this check is not required. Signed-off-by: Maninder Singh <maninder1.s@samsung.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:05 +02:00
Vaishali Thakkar	508db592e2	rtc: ds1685: Use module_platform_driver Use module_platform_driver for drivers whose init and exit functions only register and unregister, respectively. A simplified version of the Coccinelle semantic patch that performs this transformation is as follows: @a@ identifier f, x; @@ -static f(...) { return platform_driver_register(&x); } @b depends on a@ identifier e, a.x; @@ -static e(...) { platform_driver_unregister(&x); } @c depends on a && b@ identifier a.f; declarer name module_init; @@ -module_init(f); @d depends on a && b && c@ identifier b.e, a.x; declarer name module_exit; declarer name module_platform_driver; @@ -module_exit(e); +module_platform_driver(x); Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:05 +02:00
Nishanth Menon	7abea617a4	rtc: ds1307: Support optional wakeup interrupt source With the recent pinctrl-single changes, SoCs such as Texas Instrument's OMAP processors can treat wake-up events from deeper idle states as interrupts. Let's add support for the optional second interrupt for wake-up using the generic wakeirq support added in commit 4990d4fe327b ("PM / Wakeirq: Add automated device wake IRQ handling") Finally, to pass the wake-up interrupt in the dts file, interrupts-extended property needs to be passed. This is similar in approach to commit 2a0b965cfb6e ("serial: omap: Add support for optional wake-up") + ee83bd3b6483 ("serial: omap: Switch wake-up interrupt to generic wakeirq") Signed-off-by: Nishanth Menon <nm@ti.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Tony Lindgren <tony@atomide.com> Acked-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:05 +02:00
Nishanth Menon	eac7237fd8	rtc: ds1307: Sort the headers It is always a good practice to keep the #includes sorted Signed-off-by: Nishanth Menon <nm@ti.com> Acked-by: Felipe Balbi <balbi@ti.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:05 +02:00
Nishanth Menon	c598319136	rtc: ds1307: Switch to managed irq allocation Since we are not doing anything fancy in remove function that requires us to sequence IRQ free operation, we might as well switch over to devm_ equivalent of managed IRQ allocation and remove the explicit free_irq since it'd be done automatically at remove. Signed-off-by: Nishanth Menon <nm@ti.com> Acked-by: Felipe Balbi <balbi@ti.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:05 +02:00
Felipe Balbi	2fb07a10e0	rtc: ds1307: Convert to threaded IRQ The driver currently emulates the concept of threaded IRQ using a workqueue, which it really does not need to. Instead, switch over to threaded_irq handlers which is meant precisely for the same purpose. Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Nishanth Menon <nm@ti.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:05 +02:00
NeilBrown	e89c6fdf9e	Merge linux-block/for-4.3/core into md/for-linux There were a few conflicts that are fairly easy to resolve. Signed-off-by: NeilBrown <neilb@suse.com>	2015-09-05 11:08:32 +02:00
Nicholas Krause	559ec2f8fd	mm/hugetlb.c: make vma_has_reserves() return bool This makes vma_has_reserves() return bool due to this particular function only returning either one or zero as its return value. Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Nicholas Krause	1ecef9ed0f	mm/madvise.c: make madvise_behaviour_valid() return bool This makes the madvise_bahaviour_valid() function return bool due to this particular function always returning the value of either one or zero as its return value. Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Nicholas Krause	ca1d6c7d9d	mm/memory.c: make tlb_next_batch() return bool This makes the tlb_next_batch() bool due to this particular function only ever returning either one or zero as its return value. Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Nicholas Krause	d9e7e37b4d	mm/dmapool.c: change is_page_busy() return from int to bool This makes the function is_page_busy() return bool rather then an int now due to this particular function's single return statement only ever evaulating to either one or zero. Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
minkyung88.kim	4e6dab4233	mm: remove struct node_active_region struct node_active_region is not used anymore. Remove it. Signed-off-by: minkyung88.kim <minkyung88.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Oleg Nesterov	9943242ca4	mremap: simplify the "overlap" check in mremap_to() Minor, but this check is overcomplicated. Two half-intervals do NOT overlap if END1 <= START2 \|\| END2 <= START1, mremap_to() just needs to negate this check. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Hugh Dickins <hughd@google.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Oleg Nesterov	1d39168697	mremap: don't do uneccesary checks if new_len == old_len The "new_len > old_len" branch in vma_to_resize() looks very confusing. It only covers the VM_DONTEXPAND/pgoff checks but everything below is equally unneeded if new_len == old_len. Change this code to return if "new_len == old_len", new_len < old_len is not possible, otherwise the code below is wrong anyway. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Hugh Dickins <hughd@google.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Oleg Nesterov	d456fb9e52	mremap: don't do mm_populate(new_addr) on failure move_vma() sets *locked even if move_page_tables() or ->mremap() fails, change sys_mremap() to check "ret & ~PAGE_MASK". I think we should simply remove the VM_LOCKED code in move_vma(), that is why this patch doesn't change move_vma(). But this needs more cleanups. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Hugh Dickins <hughd@google.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Oleg Nesterov	5477e70a64	mm: move ->mremap() from file_operations to vm_operations_struct vma->vm_ops->mremap() looks more natural and clean in move_vma(), and this way ->mremap() can have more users. Say, vdso. While at it, s/aio_ring_remap/aio_ring_mremap/. Note: this is the minimal change before ->mremap() finds another user in file_operations; this method should have more arguments, and it can be used to kill arch_remap(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Hugh Dickins <hughd@google.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Oleg Nesterov	df1eab303c	mremap: don't leak new_vma if f_op->mremap() fails move_vma() can't just return if f_op->mremap() fails, we should unmap the new vma like we do if move_page_tables() fails. To avoid the code duplication this patch moves the "move entries back" under the new "if (err)" branch. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Hugh Dickins <hughd@google.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Nicholas Krause	31aafb45f4	mm/hugetlb.c: make vma_shareable() return bool This makes vma_shareable() return bool now due to this particular function only ever returning either one or zero as its return value. Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Kirill A. Shutemov	1027e4436b	mm: make GUP handle pfn mapping unless FOLL_GET is requested With DAX, pfn mapping becoming more common. The patch adjusts GUP code to cover pfn mapping for cases when we don't need struct page to proceed. To make it possible, let's change follow_page() code to return -EEXIST error code if proper page table entry exists, but no corresponding struct page. __get_user_page() would ignore the error code and move to the next page frame. The immediate effect of the change is working MAP_POPULATE and mlock() on DAX mappings. [akpm@linux-foundation.org: fix arm64 build] Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Toshi Kani <toshi.kani@hp.com> Acked-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Kirill A. Shutemov	d899844e9c	mm: fix status code which move_pages() returns for zero page The manpage for move_pages(2) specifies that status code for zero page is supposed to be -EFAULT. Currently kernel return -ENOENT in this case. follow_page() can do it for us, if we would ask for FOLL_DUMP. The use of FOLL_DUMP also means that the upper layer page tables pages are no longer allocated. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Christoph Lameter <cl@linux.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Sebastian Andrzej Siewior	ce9ce6659a	mm: memcontrol: bring back the VM_BUG_ON() in mem_cgroup_swapout() Clark stumbled over a VM_BUG_ON() in -RT which was then was removed by Johannes in commit f371763a79d ("mm: memcontrol: fix false-positive VM_BUG_ON() on -rt"). The comment before that patch was a tiny bit better than it is now. While the patch claimed to fix a false-postive on -RT this was not the case. None of the -RT folks ACKed it and it was not a false positive report. That was a real problem. This patch updates the comment that is improper because it refers to "disabled preemption" as a consequence of that lock being taken. A spin_lock() disables preemption, true, but in this case the code relies on the fact that the lock _also_ disables interrupts once it is acquired. And this is the important detail (which was checked the VM_BUG_ON()) which needs to be pointed out. This is the hint one needs while looking at the code. It was explained by Johannes on the list that the per-CPU variables are protected by local_irq_save(). The BUG_ON() was helpful. This code has been workarounded in -RT in the meantime. I wouldn't mind running into more of those if the code in question uses special kind of locking since now there is no verification (in terms of lockdep or BUG_ON()) and therefore I bring the VM_BUG_ON() check back in. The two functions after the comment could also have a "local_irq_save()" dance around them in order to serialize access to the per-CPU variables. This has been avoided because the interrupts should be off. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Clark Williams <williams@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Vladimir Zapolskiy	c98c36355d	genalloc: add support of multiple gen_pools per device This change fills devm_gen_pool_create()/gen_pool_get() "name" argument stub with contents and extends of_gen_pool_get() functionality on this basis. If there is no associated platform device with a device node passed to of_gen_pool_get(), the function attempts to get a label property or device node name (= repeats MTD OF partition standard) and seeks for a named gen_pool registered by device of the parent device node. The main idea of the change is to allow registration of independent gen_pools under the same umbrella device, say "partitions" on "storage device", the original functionality of one "partition" per "storage device" is untouched. [akpm@linux-foundation.org: fix constness in devres_find()] [dan.carpenter@oracle.com: freeing const data pointers] Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com> Cc: Philipp Zabel <p.zabel@pengutronix.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Cc: Shawn Guo <shawnguo@kernel.org> Cc: Sascha Hauer <kernel@pengutronix.de> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Vladimir Zapolskiy	7385817359	genalloc: add name arg to gen_pool_get() and devm_gen_pool_create() This change modifies gen_pool_get() and devm_gen_pool_create() client interfaces adding one more argument "name" of a gen_pool object. Due to implementation gen_pool_get() is capable to retrieve only one gen_pool associated with a device even if multiple gen_pools are created, fortunately right at the moment it is sufficient for the clients, hence provide NULL as a valid argument on both producer devm_gen_pool_create() and consumer gen_pool_get() sides. Because only one created gen_pool per device is addressable, explicitly add a restriction to devm_gen_pool_create() to create only one gen_pool per device, this implies two possible error codes returned by the function, account it on client side (only misc/sram). This completes client side changes related to genalloc updates. [akpm@linux-foundation.org: gen_pool_get() cleanup] Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com> Cc: Philipp Zabel <p.zabel@pengutronix.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Cc: Shawn Guo <shawnguo@kernel.org> Cc: Sascha Hauer <kernel@pengutronix.de> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Wei Yang	c0a2949883	mm/memblock: WARN_ON when nid differs from overlap region Each memblock_region has nid to indicates the Node ID of this range. For the overlap case, memblock_add_range() inserts the lower part and leave the upper part as indicated in the overlapped region. If the nid of the new range differs from the overlapped region, the information recorded is not correct. This patch adds a WARN_ON when the nid of the new range differs from the overlapped region. Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Mel Gorman	c7e1e3ccfb	Documentation/features/vm: add feature description and arch support status for batched TLB flush after unmap Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Mel Gorman	d950c9477d	mm: defer flush of writable TLB entries If a PTE is unmapped and it's dirty then it was writable recently. Due to deferred TLB flushing, it's best to assume a writable TLB cache entry exists. With that assumption, the TLB must be flushed before any IO can start or the page is freed to avoid lost writes or data corruption. This patch defers flushing of potentially writable TLBs as long as possible. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Mel Gorman	72b252aed5	mm: send one IPI per CPU to TLB flush all entries after unmapping pages An IPI is sent to flush remote TLBs when a page is unmapped that was potentially accesssed by other CPUs. There are many circumstances where this happens but the obvious one is kswapd reclaiming pages belonging to a running process as kswapd and the task are likely running on separate CPUs. On small machines, this is not a significant problem but as machine gets larger with more cores and more memory, the cost of these IPIs can be high. This patch uses a simple structure that tracks CPUs that potentially have TLB entries for pages being unmapped. When the unmapping is complete, the full TLB is flushed on the assumption that a refill cost is lower than flushing individual entries. Architectures wishing to do this must give the following guarantee. If a clean page is unmapped and not immediately flushed, the architecture must guarantee that a write to that linear address from a CPU with a cached TLB entry will trap a page fault. This is essentially what the kernel already depends on but the window is much larger with this patch applied and is worth highlighting. The architecture should consider whether the cost of the full TLB flush is higher than sending an IPI to flush each individual entry. An additional architecture helper called flush_tlb_local is required. It's a trivial wrapper with some accounting in the x86 case. The impact of this patch depends on the workload as measuring any benefit requires both mapped pages co-located on the LRU and memory pressure. The case with the biggest impact is multiple processes reading mapped pages taken from the vm-scalability test suite. The test case uses NR_CPU readers of mapped files that consume 10*RAM. Linear mapped reader on a 4-node machine with 64G RAM and 48 CPUs 4.2.0-rc1 4.2.0-rc1 vanilla flushfull-v7 Ops lru-file-mmap-read-elapsed 159.62 ( 0.00%) 120.68 ( 24.40%) Ops lru-file-mmap-read-time_range 30.59 ( 0.00%) 2.80 ( 90.85%) Ops lru-file-mmap-read-time_stddv 6.70 ( 0.00%) 0.64 ( 90.38%) 4.2.0-rc1 4.2.0-rc1 vanilla flushfull-v7 User 581.00 611.43 System 5804.93 4111.76 Elapsed 161.03 122.12 This is showing that the readers completed 24.40% faster with 29% less system CPU time. From vmstats, it is known that the vanilla kernel was interrupted roughly 900K times per second during the steady phase of the test and the patched kernel was interrupts 180K times per second. The impact is lower on a single socket machine. 4.2.0-rc1 4.2.0-rc1 vanilla flushfull-v7 Ops lru-file-mmap-read-elapsed 25.33 ( 0.00%) 20.38 ( 19.54%) Ops lru-file-mmap-read-time_range 0.91 ( 0.00%) 1.44 (-58.24%) Ops lru-file-mmap-read-time_stddv 0.28 ( 0.00%) 0.47 (-65.34%) 4.2.0-rc1 4.2.0-rc1 vanilla flushfull-v7 User 58.09 57.64 System 111.82 76.56 Elapsed 27.29 22.55 It's still a noticeable improvement with vmstat showing interrupts went from roughly 500K per second to 45K per second. The patch will have no impact on workloads with no memory pressure or have relatively few mapped pages. It will have an unpredictable impact on the workload running on the CPU being flushed as it'll depend on how many TLB entries need to be refilled and how long that takes. Worst case, the TLB will be completely cleared of active entries when the target PFNs were not resident at all. [sasha.levin@oracle.com: trace tlb flush after disabling preemption in try_to_unmap_flush] Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Mel Gorman	5b74283ab2	x86, mm: trace when an IPI is about to be sent When unmapping pages it is necessary to flush the TLB. If that page was accessed by another CPU then an IPI is used to flush the remote CPU. That is a lot of IPIs if kswapd is scanning and unmapping >100K pages per second. There already is a window between when a page is unmapped and when it is TLB flushed. This series increases the window so multiple pages can be flushed using a single IPI. This should be safe or the kernel is hosed already. Patch 1 simply made the rest of the series easier to write as ftrace could identify all the senders of TLB flush IPIS. Patch 2 tracks what CPUs potentially map a PFN and then sends an IPI to flush the entire TLB. Patch 3 tracks when there potentially are writable TLB entries that need to be batched differently Patch 4 increases SWAP_CLUSTER_MAX to further batch flushes The performance impact is documented in the changelogs but in the optimistic case on a 4-socket machine the full series reduces interrupts from 900K interrupts/second to 60K interrupts/second. This patch (of 4): It is easy to trace when an IPI is received to flush a TLB but harder to detect what event sent it. This patch makes it easy to identify the source of IPIs being transmitted for TLB flushes on x86. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Andrea Arcangeli	c47174fc36	userfaultfd: selftest This test allocates two virtual areas and bounces the physical memory across the two virtual areas using only userfaultfd. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Shuah Khan <shuah.kh@samsung.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Andrea Arcangeli	2c5b7e1be7	userfaultfd: avoid missing wakeups during refile in userfaultfd_read During the refile in userfaultfd_read both waitqueues could look empty to the lockless wake_userfault(). Use a seqcount to prevent this false negative that could leave an userfault blocked. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Andrea Arcangeli	230c92a879	userfaultfd: propagate the full address in THP faults The THP faults were not propagating the original fault address. The latest version of the API with uffd.arg.pagefault.address is supposed to propagate the full address through THP faults. This was not a kernel crashing bug and it wouldn't risk to corrupt user memory, but it would cause a SIGBUS failure because the wrong page was being copied. For various reasons this wasn't easily reproducible in the qemu workload, but the strestest exposed the problem immediately. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Andrea Arcangeli	dfa37dc3fc	userfaultfd: allow signals to interrupt a userfault This is only simple to achieve if the userfault is going to return to userland (not to the kernel) because we can avoid returning VM_FAULT_RETRY despite we temporarily released the mmap_sem. The fault would just be retried by userland then. This is safe at least on x86 and powerpc (the two archs with the syscall implemented so far). Hint to verify for which archs this is safe: after handle_mm_fault returns, no access to data structures protected by the mmap_sem must be done by the fault code in arch/*/mm/fault.c until up_read(&mm->mmap_sem) is called. This has two main benefits: signals can run with lower latency in production (signals aren't blocked by userfaults and userfaults are immediately repeated after signal processing) and gdb can then trivially debug the threads blocked in this kind of userfaults coming directly from userland. On a side note: while gdb has a need to get signal processed, coredumps always worked perfectly with userfaults, no matter if the userfault is triggered by GUP a kernel copy_user or directly from userland. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Andrea Arcangeli	e6485a47b7	userfaultfd: require UFFDIO_API before other ioctls UFFDIO_API was already forced before read/poll could work. This makes the code more strict to force it also for all other ioctls. All users would already have been required to call UFFDIO_API before invoking other ioctls but this makes it more explicit. This will ensure we can change all ioctls (all but UFFDIO_API/struct uffdio_api) with a bump of uffdio_api.api. There's no actual plan or need to change the API or the ioctl, the current API already should cover fine even the non cooperative usage, but this is just for the longer term future just in case. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00

1 2 3 4 5 ...

545250 Commits