linux

mirror of https://github.com/FEX-Emu/linux.git synced 2024-12-18 06:50:08 +00:00

Author	SHA1	Message	Date
Tejun Heo	1d3650f713	cfq-iosched: implement hierarchy-ready cfq_group charge scaling Currently, cfqg charges are scaled directly according to cfqg->weight. Regardless of the number of active cfqgs or the amount of active weights, a given weight value always scales charge the same way. This works fine as long as all cfqgs are treated equally regardless of their positions in the hierarchy, which is what cfq currently implements. It can't work in hierarchical settings because the interpretation of a given weight value depends on where the weight is located in the hierarchy. This patch reimplements cfqg charge scaling so that it can be used to support hierarchy properly. The scheme is fairly simple and light-weight. * When a cfqg is added to the service tree, v(disktime)weight is calculated. It walks up the tree to root calculating the fraction it has in the hierarchy. At each level, the fraction can be calculated as cfqg->weight / parent->level_weight By compounding these, the global fraction of vdisktime the cfqg has claim to - vfraction - can be determined. * When the cfqg needs to be charged, the charge is scaled inversely proportionally to the vfraction. The new scaling scheme uses the same CFQ_SERVICE_SHIFT for fixed point representation as before; however, the smallest scaling factor is now 1 (ie. 1 << CFQ_SERVICE_SHIFT). This is different from before where 1 was for CFQ_WEIGHT_DEFAULT and higher weight would result in smaller scaling factor. While this shifts the global scale of vdisktime a bit, it doesn't change the relative relationships among cfqgs and the scheduling result isn't different. cfq_group_notify_queue_add uses fixed CFQ_IDLE_DELAY when appending new cfqg to the service tree. The specific value of CFQ_IDLE_DELAY didn't have any relevance to vdisktime before and is unlikely to cause any visible behavior difference now especially as the scale shift isn't that large. 
As the new scheme now makes proper distinction between cfqg->weight and ->leaf_weight, reverse the weight aliasing for root cfqgs. For root, both weights are now mapped to ->leaf_weight instead of the other way around. Because we're still using cfqg_flat_parent(), this patch shouldn't change the scheduling behavior in any noticeable way. v2: Beefed up comments on vfraction as requested by Vivek. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>	2013-01-09 08:05:11 -08:00
Tejun Heo	7918ffb5b8	cfq-iosched: implement cfq_group->nr_active and ->children_weight To prepare for blkcg hierarchy support, add cfqg->nr_active and ->children_weight. cfqg->nr_active counts the number of active cfqgs at the cfqg's level and ->children_weight is sum of weights of those cfqgs. The level covers itself (cfqg->leaf_weight) and immediate children. The two values are updated when a cfqg enters and leaves the group service tree. Unless the hierarchy is very deep, the added overhead should be negligible. Currently, the parent is determined using cfqg_flat_parent() which makes the root cfqg the parent of all other cfqgs. This is to make the transition to hierarchy-aware scheduling gradual. Scheduling logic will be converted to use cfqg->children_weight without actually changing the behavior. When everything is ready, blkcg_weight_parent() will be replaced with proper parent function. This patch doesn't introduce any behavior change. v2: s/cfqg->level_weight/cfqg->children_weight/ as per Vivek. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>	2013-01-09 08:05:11 -08:00
Tejun Heo	e71357e118	cfq-iosched: add leaf_weight cfq blkcg is about to grow proper hierarchy handling, where a child blkg's weight would nest inside the parent's. This makes tasks in a blkg compete against both tasks in the sibling blkgs and the tasks of child blkgs. We're gonna use the existing weight as the group weight which decides the blkg's weight against its siblings. This patch introduces a new weight - leaf_weight - which decides the weight of a blkg against the child blkgs. It's named leaf_weight because another way to look at it is that each internal blkg node has a hidden child leaf node which contains all its tasks and leaf_weight is the weight of the leaf node and handled the same as the weight of the child blkgs. This patch only adds leaf_weight fields and exposes it to userland. The new weight isn't actually used anywhere yet. Note that cfq-iosched currently officially supports only single level hierarchy and root blkgs compete with the first level blkgs - ie. root weight is basically being used as leaf_weight. For root blkgs, the two weights are kept in sync for backward compatibility. v2: cfqd->root_group->leaf_weight initialization was missing from cfq_init_queue() causing divide by zero when !CONFIG_CFQ_GROUP_SCHED. Fix it. Reported by Fengguang. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Fengguang Wu <fengguang.wu@intel.com>	2013-01-09 08:05:10 -08:00
Tejun Heo	3c54786590	blkcg: make blkcg_gq's hierarchical Currently a child blkg (blkcg_gq) can be created even if its parent doesn't exist. ie. Given a blkg, it's not guaranteed that its ancestors will exist. This makes it difficult to implement proper hierarchy support for blkcg policies. Always create blkgs recursively and make a child blkg hold a reference to its parent. blkg->parent is added so that finding the parent is easy. blkcg_parent() is also added in the process. This change can be visible to userland. e.g. while issuing IO in a nested cgroup didn't affect the ancestors at all, now it will initialize all ancestor blkgs and zero stats for the request_queue will always appear on them. While this is userland visible, this shouldn't cause any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>	2013-01-09 08:05:10 -08:00
Tejun Heo	93e6d5d8f5	blkcg: cosmetic updates to blkg_create() * Rename out_* labels to err_. Do ERR_PTR() conversion once in the error return path. This patch is cosmetic and to prepare for the hierarchy support. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>	2013-01-09 08:05:10 -08:00
Tejun Heo	86cde6b623	blkcg: reorganize blkg_lookup_create() and friends Reorganize such that * __blkg_lookup() takes bool param @update_hint to determine whether to update hint. * __blkg_lookup_create() no longer performs lookup before trying to create. Renamed to blkg_create(). * blkg_lookup_create() now performs lookup and then invokes blkg_create() if lookup fails. * root_blkg creation in blkcg_activate_policy() updated accordingly. Note that blkcg_activate_policy() no longer updates lookup hint if root_blkg already exists. Except for the last lookup hint bit which is immaterial, this is pure reorganization and doesn't introduce any visible behavior change. This is to prepare for proper hierarchy support. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>	2013-01-09 08:05:10 -08:00
Tejun Heo	356d2e5810	blkcg: fix minor bug in blkg_alloc() blkg_alloc() was mistakenly checking blkcg_policy_enabled() twice. The latter test should have been on whether pol->pd_init_fn() exists. This doesn't cause actual problems because both blkcg policies implement pol->pd_init_fn(). Fix it. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>	2013-01-09 08:05:10 -08:00
Vivek Goyal	b226e5c411	cfq-iosched: Print sync-noidle information in blktrace messages Currently we attach a character "S" or "A" to the cfqq<pid>, to represent whether the queue is sync or async. Add one more character "N" to represent whether it is a sync-noidle queue or a sync queue. So now the three different types of queues will look as follows. cfq1234S --> sync queue cfq1234SN --> sync noidle queue cfq1234A --> Async queue Previously S/A classification was being printed only if group scheduling was enabled. This patch also makes sure that this classification is displayed even if group idling is disabled. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-01-09 08:05:09 -08:00
Vivek Goyal	1f23f12151	cfq-iosched: Get rid of unnecessary local variable Use of local variable "n" seems to be unnecessary. Remove it. This brings it in line with function __cfq_group_st_add(), which is also doing a similar operation of adding a group to a rb tree. No functionality change here. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-01-09 08:05:09 -08:00
Vivek Goyal	6d816ec7c8	cfq-iosched: Rename few functions related to selecting workload choose_service_tree() selects/sets both wl_class and wl_type. Rename it to choose_wl_class_and_type() to make it very clear. cfq_choose_wl() only selects and sets wl_type. It is easy to confuse it with choose_st(). So rename it to cfq_choose_wl_type() to make it clear what it does. Just renaming. No functionality change. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-01-09 08:05:09 -08:00
Vivek Goyal	34b98d03bd	cfq-iosched: Rename "service_tree" to "st" at some places At quite a few places we use the keyword "service_tree". At some places, especially local variables, I have abbreviated it to "st". Also at couple of places moved binary operator "+" from beginning of line to end of previous line, as per Tejun's feedback. v2: Reverted most of the service tree name change based on Jeff Moyer's feedback. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-01-09 08:05:09 -08:00
Vivek Goyal	4d2ceea4cb	cfq-iosched: More renaming to better represent wl_class and wl_type Some more renaming. Again making the code uniform w.r.t use of wl_class/class to represent IO class (RT, BE, IDLE) and using wl_type/type to represent subclass (SYNC, SYNC-IDLE, ASYNC). At places this patch shortens the string "workload" to "wl". Renamed "saved_workload" to "saved_wl_type". Renamed "saved_serving_class" to "saved_wl_class". For uniformity with "saved_wl_*" variables, renamed "serving_class" to "serving_wl_class" and renamed "serving_type" to "serving_wl_type". Again, just trying to improve upon code uniformity and improve readability. No functional change. v2: - Restored the usage of keyword "service" based on Jeff Moyer's feedback. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-01-09 08:05:09 -08:00
Vivek Goyal	3bf10fea3b	cfq-iosched: Properly name all references to IO class Currently CFQ has three IO classes, RT, BE and IDLE. In many places we refer to workloads belonging to these classes as "prio". This gets very confusing as one starts to associate it with ioprio. So this patch just does a bunch of renaming so that reading code becomes easier. All references to RT, BE and IDLE workload are done using keyword "class" and all references to subclass, SYNC, SYNC-IDLE, ASYNC are made using keyword "type". This makes me feel much better while I am reading the code. There is no functionality change due to this patch. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-01-09 08:05:08 -08:00
Linus Torvalds	d1c3ed669a	Linux 3.8-rc2	2013-01-02 18:13:21 -08:00
Linus Torvalds	d50403dcc5	Merge branch 'fixes-for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds Pull LED fix from Bryan Wu. * 'fixes-for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds: leds: leds-gpio: set devm_gpio_request_one() flags param correctly	2013-01-02 18:12:35 -08:00
Javier Martinez Canillas	2d7c22f67d	leds: leds-gpio: set devm_gpio_request_one() flags param correctly commit `a99d76f` leds: leds-gpio: use gpio_request_one changed the leds-gpio driver to use gpio_request_one() instead of gpio_request() + gpio_direction_output() Unfortunately, it also made a semantic change that breaks the leds-gpio driver. The gpio_request_one() flags parameter was set to: GPIOF_DIR_OUT | (led_dat->active_low ^ state) Since GPIOF_DIR_OUT is 0, the final flags value will just be the XOR'ed value of led_dat->active_low and state. This value was used to distinguish between HIGH/LOW output initial level and call gpio_direction_output() accordingly. With this new semantic gpio_request_one() will take the flags value of 1 as a configuration of input direction (GPIOF_DIR_IN) and will call gpio_direction_input() instead of gpio_direction_output(). int gpio_request_one(unsigned gpio, unsigned long flags, const char *label) { .. if (flags & GPIOF_DIR_IN) err = gpio_direction_input(gpio); else err = gpio_direction_output(gpio, (flags & GPIOF_INIT_HIGH) ? 1 : 0); .. } The right semantic is to evaluate led_dat->active_low ^ state and set the output initial level explicitly. Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk> Reported-by: Arnaud Patard <arnaud.patard@rtp-net.org> Tested-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: Bryan Wu <cooloney@gmail.com>	2013-01-02 17:58:41 -08:00
Linus Torvalds	ef05e9b960	Merge git://www.linux-watchdog.org/linux-watchdog Pull watchdog fixes from Wim Van Sebroeck: "This fixes some small errors in the new da9055 driver, eliminates a compiler warning and adds DT support for the twl4030_wdt driver (so that we can have multiple watchdogs with DT on the omap platforms)." * git://www.linux-watchdog.org/linux-watchdog: watchdog: twl4030_wdt: add DT support watchdog: omap_wdt: eliminate unused variable and a compiler warning watchdog: da9055: Don't update wdt_dev->timeout in da9055_wdt_set_timeout error path watchdog: da9055: Fix invalid free of devm_ allocated data	2013-01-02 17:46:14 -08:00
Linus Torvalds	080a62e2ce	Merge tag '3.8-pci-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "Some fixes for v3.8. They include a fix for the new SR-IOV sysfs management support, an expanded quirk for Ricoh SD card readers, a Stratus DMI quirk fix, and a PME polling fix." * tag '3.8-pci-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: Reduce Ricoh 0xe822 SD card reader base clock frequency to 50MHz PCI/PM: Do not suspend port if any subordinate device needs PME polling PCI: Add PCIe Link Capability link speed and width names PCI: Work around Stratus ftServer broken PCIe hierarchy (fix DMI check) PCI: Remove spurious error for sriov_numvfs store and simplify flow	2013-01-02 17:44:29 -08:00
David Howells	8a7eab2b54	UAPI: Strip _UAPI prefix on header install no matter the whitespace Commit `56c176c9ca` ("UAPI: strip the _UAPI prefix from header guards during header installation") strips the _UAPI prefix from header guards, but only if there's a single space between the cpp directive and the label. Make it more flexible and able to handle tabs and multiple white space characters. Signed-off-by: David Howells <dhowell@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-01-02 17:36:10 -08:00
David Howells	3d33fcc11b	UAPI: Remove empty Kbuild files Empty files can get deleted by the patch program, so remove empty Kbuild files and their links from the parent Kbuilds. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-01-02 17:36:10 -08:00
Linus Torvalds	007f6c3a63	Merge tag 'ecryptfs-3.8-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs Pull ecryptfs fixes from Tyler Hicks: "Two self-explanatory fixes and a third patch which improves performance: when overwriting a full page in the eCryptfs page cache, skip reading in and decrypting the corresponding lower page." * tag 'ecryptfs-3.8-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs: fs/ecryptfs/crypto.c: make ecryptfs_encode_for_filename() static eCryptfs: fix to use list_for_each_entry_safe() when delete items eCryptfs: Avoid unnecessary disk read and data decryption during writing	2013-01-02 17:33:50 -08:00
Linus Torvalds	58890c0669	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fixes from Sage Weil: "Two of Alex's patches deal with a race when resetting server connections for open RBD images, one demotes some non-fatal BUGs to WARNs, and my patch fixes a protocol feature bit failure path." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: libceph: fix protocol feature mismatch failure path libceph: WARN, don't BUG on unexpected connection states libceph: always reset osds when kicking libceph: move linger requests sooner in kick_requests()	2013-01-02 17:32:49 -08:00
Mel Gorman	42288fe366	mm: mempolicy: Convert shared_policy mutex to spinlock Sasha was fuzzing with trinity and reported the following problem: BUG: sleeping function called from invalid context at kernel/mutex.c:269 in_atomic(): 1, irqs_disabled(): 0, pid: 6361, name: trinity-main 2 locks held by trinity-main/6361: #0: (&mm->mmap_sem){++++++}, at: [<ffffffff810aa314>] __do_page_fault+0x1e4/0x4f0 #1: (&(&mm->page_table_lock)->rlock){+.+...}, at: [<ffffffff8122f017>] handle_pte_fault+0x3f7/0x6a0 Pid: 6361, comm: trinity-main Tainted: G W 3.7.0-rc2-next-20121024-sasha-00001-gd95ef01-dirty #74 Call Trace: __might_sleep+0x1c3/0x1e0 mutex_lock_nested+0x29/0x50 mpol_shared_policy_lookup+0x2e/0x90 shmem_get_policy+0x2e/0x30 get_vma_policy+0x5a/0xa0 mpol_misplaced+0x41/0x1d0 handle_pte_fault+0x465/0x6a0 This was triggered by a different version of automatic NUMA balancing but in theory the current version is vulnerable to the same problem. do_numa_page -> numa_migrate_prep -> mpol_misplaced -> get_vma_policy -> shmem_get_policy It's very unlikely this will happen as shared pages are not marked pte_numa -- see the page_mapcount() check in change_pte_range() -- but it is possible. To address this, this patch restores sp->lock as originally implemented by Kosaki Motohiro. In the path where get_vma_policy() is called, it should not be calling sp_alloc() so it is not necessary to treat the PTL specially. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-01-02 17:32:13 -08:00
Linus Torvalds	5439ca6b8f	Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 bug fixes from Ted Ts'o: "Various bug fixes for ext4. Perhaps the most serious bug fixed is one which could cause file system corruptions when performing file punch operations." * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: avoid hang when mounting non-journal filesystems with orphan list ext4: lock i_mutex when truncating orphan inodes ext4: do not try to write superblock on ro remount w/o journal ext4: include journal blocks in df overhead calcs ext4: remove unaligned AIO warning printk ext4: fix an incorrect comment about i_mutex ext4: fix deadlock in journal_unmap_buffer() ext4: split off ext4_journalled_invalidatepage() jbd2: fix assertion failure in jbd2_journal_flush() ext4: check dioread_nolock on remount ext4: fix extent tree corruption caused by hole punch	2013-01-02 09:57:34 -08:00
Hugh Dickins	a7a88b2373	mempolicy: remove arg from mpol_parse_str, mpol_to_str Remove the unused argument (formerly no_context) from mpol_parse_str() and from mpol_to_str(). Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-01-02 09:27:10 -08:00
Hugh Dickins	f2a07f40db	tmpfs mempolicy: fix /proc/mounts corrupting memory Recently I suggested using "mount -o remount,mpol=local /tmp" in NUMA mempolicy testing. Very nasty. Reading /proc/mounts, /proc/pid/mounts or /proc/pid/mountinfo may then corrupt one bit of kernel memory, often in a page table (causing "Bad swap" or "Bad page map" warning or "Bad pagetable" oops), sometimes in a vm_area_struct or rbnode or somewhere worse. "mpol=prefer" and "mpol=prefer:Node" are equally toxic. Recent NUMA enhancements are not to blame: this dates back to 2.6.35, when commit `e17f74af35` "mempolicy: don't call mpol_set_nodemask() when no_context" skipped mpol_parse_str()'s call to mpol_set_nodemask(), which used to initialize v.preferred_node, or set MPOL_F_LOCAL in flags. With slab poisoning, you can then rely on mpol_to_str() to set the bit for node 0x6b6b, probably in the next page above the caller's stack. mpol_parse_str() is only called from shmem_parse_options(): no_context is always true, so call it unused for now, and remove !no_context code. Set v.nodes or v.preferred_node or MPOL_F_LOCAL as mpol_to_str() might expect. Then mpol_to_str() can ignore its no_context argument also, the mpol being appropriately initialized whether contextualized or not. Rename its no_context unused too, and let subsequent patch remove them (that's not needed for stable backporting, which would involve rejects). I don't understand why MPOL_LOCAL is described as a pseudo-policy: it's a reasonable policy which suffers from a confusing implementation in terms of MPOL_PREFERRED with MPOL_F_LOCAL. I believe this would be much more robust if MPOL_LOCAL were recognized in switch statements throughout, MPOL_F_LOCAL deleted, and MPOL_PREFERRED use the (possibly empty) nodes mask like everyone else, instead of its preferred_node variant (I presume an optimization from the days before MPOL_LOCAL). But that would take me too long to get right and fully tested. 
Signed-off-by: Hugh Dickins <hughd@google.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-01-02 09:27:10 -08:00
Eric Wong	128dd1759d	epoll: prevent missed events on EPOLL_CTL_MOD EPOLL_CTL_MOD sets the interest mask before calling f_op->poll() to ensure events are not missed. Since the modifications to the interest mask are not protected by the same lock as ep_poll_callback, we need to ensure the change is visible to other CPUs calling ep_poll_callback. We also need to ensure f_op->poll() has an up-to-date view of past events which occurred before we modified the interest mask. So this barrier also pairs with the barrier in wq_has_sleeper(). This should guarantee either ep_poll_callback or f_op->poll() (or both) will notice the readiness of a recently-ready/modified item. This issue was encountered by Andreas Voellmy and Junchang(Jason) Wang in: http://thread.gmane.org/gmane.linux.kernel/1408782/ Signed-off-by: Eric Wong <normalperson@yhbt.net> Cc: Hans Verkuil <hans.verkuil@cisco.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andreas Voellmy <andreas.voellmy@yale.edu> Tested-by: "Junchang(Jason) Wang" <junchang.wang@yale.edu> Cc: netdev@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-01-02 09:16:43 -08:00
Aaro Koskinen	8899b8d93e	watchdog: twl4030_wdt: add DT support Add DT support for twl4030_wdt. This is needed to get twl4030_wdt to probe when booting with DT. Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>	2013-01-02 12:07:05 +01:00
Aaro Koskinen	412b3729dd	watchdog: omap_wdt: eliminate unused variable and a compiler warning We forgot to delete this in the commit `4f4753d9` (watchdog: omap_wdt: convert to devm_ functions), and as a result the following compilation warning was introduced: drivers/watchdog/omap_wdt.c: In function 'omap_wdt_remove': drivers/watchdog/omap_wdt.c:299:19: warning: unused variable 'res' [-Wunused-variable] Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Reviewed-by: Paul Walmsley <paul@pwsan.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>	2013-01-02 12:06:58 +01:00
Axel Lin	98e4a29389	watchdog: da9055: Don't update wdt_dev->timeout in da9055_wdt_set_timeout error path Otherwise, WDIOC_GETTIMEOUT returns wrong value if set_timeout fails. This patch also removes unnecessary ret variable in da9055_wdt_ping function. Signed-off-by: Axel Lin <axel.lin@ingics.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>	2013-01-02 12:06:49 +01:00
Axel Lin	ee8c94adff	watchdog: da9055: Fix invalid free of devm_ allocated data It is not required to free devm_ allocated data. Since kref_put needs a valid release function, da9055_wdt_release_resources() is not deleted. Signed-off-by: Axel Lin <axel.lin@ingics.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>	2013-01-02 12:06:43 +01:00
Linus Torvalds	4a490b78cb	Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux Pull DRM update from Dave Airlie: "This is a bit larger due to me not bothering to do anything since before Xmas, and other people working too hard after I had clearly given up. It's got the 3 main x86 driver fixes pulls, and a bunch of tegra fixes, doesn't fix the Ironlake bug yet, but that does seem to be getting closer. - radeon: gpu reset fixes and userspace packet support - i915: watermark fixes, workarounds, i830/845 fix, - nouveau: nvd9/kepler microcode fixes, accel is now enabled and working, gk106 support - tegra: misc fixes." * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (34 commits) Revert "drm: tegra: protect DC register access with mutex" drm: tegra: program only one window during modeset drm: tegra: clean out old gem prototypes drm: tegra: remove redundant tegra2_tmds_config entry drm: tegra: protect DC register access with mutex drm: tegra: don't leave clients host1x member uninitialized drm: tegra: fix front_porch <-> back_porch mixup drm/nve0/graph: fix fuc, and enable acceleration on all known chipsets drm/nvc0/graph: fix fuc, and enable acceleration on GF119 drm/nouveau/bios: cache ramcfg strap on later chipsets drm/nouveau/mxm: silence output if no bios data drm/nouveau/bios: parse/display extra version component drm/nouveau/bios: implement opcode 0xa9 drm/nouveau/bios: update gpio parsing apis to match current design drm/nouveau: initial support for GK106 drm/radeon: add WAIT_UNTIL to evergreen VM safe reg list drm/i915: disable shrinker lock stealing for create_mmap_offset drm/i915: optionally disable shrinker lock stealing drm/i915: fix flags in dma buf exporting drm/radeon: add support for MEM_WRITE packet ...	2012-12-30 10:00:37 -08:00
Linus Torvalds	8d91a42e54	Merge tag 'omap-late-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull late ARM cleanups for omap from Olof Johansson: "From Tony Lindgren: Here are few more patches to finish the omap changes for multiplatform conversion that are not strictly fixes, but were too complex to do with the dependencies during the merge window. Those are to move of serial-omap.h to platform_data, and the removal of remaining cpu_is_omap macro usage outside mach-omap2. Then there are several trivial fixes for typos and few minimal omap2plus_defconfig updates." 
* tag 'omap-late-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: arch/arm/mach-omap2/dpll3xxx.c: drop if around WARN_ON OMAP2: Fix a typo - replace regist with register. ARM/omap: use module_platform_driver macro ARM: OMAP2+: PMU: Remove unused header ARM: OMAP4: remove duplicated include from omap_hwmod_44xx_data.c ARM: OMAP2+: omap2plus_defconfig: enable twl4030 SoC audio ARM: OMAP2+: omap2plus_defconfig: Add tps65217 support ARM: OMAP2+: enable devtmpfs and devtmpfs automount ARM: OMAP2+: omap_twl: Change TWL4030_MODULE_PM_RECEIVER to TWL_MODULE_PM_RECEIVER ARM: OMAP2+: Drop plat/cpu.h for omap2plus ARM: OMAP: Split fb.c to remove last remaining cpu_is_omap usage MAINTAINERS: Add an entry for omap related .dts files	2012-12-30 09:59:21 -08:00
Linus Torvalds	4fe2dfabe4	ARM: arm-soc: fixes for -rc2 It's been quiet over the holidays, but we have had a couple of trivial fixes coming in for the newly introduced sunxi platform; one to add it to the multiplatform defconfig for build coverage, and one fixup for device tree strings. Merge tag 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Olof Johansson: "It's been quiet over the holidays, but we have had a couple of trivial fixes coming in for the newly introduced sunxi platform; one to add it to the multiplatform defconfig for build coverage, and one fixup for device tree strings." * tag 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: sunxi: Change the machine compatible string. ARM: multi_v7_defconfig: Add ARCH_SUNXI	2012-12-30 09:58:36 -08:00
Dave Airlie	d5757dbe79	Revert "drm: tegra: protect DC register access with mutex" This reverts commit `83c0bcb694`. Lucas pointed out this was a mistake, and I missed the discussion, so just revert it out to save a rebase. Signed-off-by: Dave Airlie <airlied@redhat.com>	2012-12-30 21:58:20 +10:00
Lucas Stach	500df2e5d8	drm: tegra: program only one window during modeset The intention is to program exactly WIN_A, not WIN_A and possibly others. Signed-off-by: Lucas Stach <dev@lynxeye.de> Signed-off-by: Dave Airlie <airlied@redhat.com>	2012-12-30 14:01:35 +10:00
Lucas Stach	e39250aa52	drm: tegra: clean out old gem prototypes There is no gem.c anymore, those functions are implemented by the drm_cma_helpers now. Signed-off-by: Lucas Stach <dev@lynxeye.de> Signed-off-by: Dave Airlie <airlied@redhat.com>	2012-12-30 14:01:34 +10:00
Lucas Stach	fa416ddc0a	drm: tegra: remove redundant tegra2_tmds_config entry The 720p and 1080p entries are completely redundant, as we are matching the table entries against <=pclk. Also generalize the comment, as we are using those table entries even when driving other modes than the standard TV ones. Signed-off-by: Lucas Stach <dev@lynxeye.de> Signed-off-by: Dave Airlie <airlied@redhat.com>	2012-12-30 14:01:33 +10:00
Lucas Stach	83c0bcb694	drm: tegra: protect DC register access with mutex Window properties are programmed through a shared aperture and have to happen atomically. Also we do the read-update-write dance on some of the shared regs. To make sure that different functions don't stumble over each other protect the register access with a mutex. Signed-off-by: Lucas Stach <dev@lynxeye.de> Signed-off-by: Dave Airlie <airlied@redhat.com>	2012-12-30 14:01:33 +10:00
Lucas Stach	4026bfb39a	drm: tegra: don't leave clients host1x member uninitialized No real problem for now, as nothing is using this, but leaving it uninitialized is asking for trouble later on. Signed-off-by: Lucas Stach <dev@lynxeye.de> Signed-off-by: Dave Airlie <airlied@redhat.com>	2012-12-30 14:01:32 +10:00
Lucas Stach	4049508988	drm: tegra: fix front_porch <-> back_porch mixup Fixes wrong picture offset observed when using HDMI output with a Technisat HD TV. Signed-off-by: Lucas Stach <dev@lynxeye.de> Acked-by: Mark Zhang <markz@nvidia.com> Tested-by: Mark Zhang <markz@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2012-12-30 14:01:31 +10:00
Dave Airlie	8be0e5c427	Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into drm-next Some fixes for 3.8: - Watermark fixups from Chris Wilson (4 pieces). - 2 snb workarounds, seem to be recently added to our internal DB. - workaround for the infamous i830/i845 hang, seems now finally solid! Based on Chris' fix for SNA, now also for UXA/mesa&old SNA. - Some more fixlets for shrinker-pulls-the-rug issues (Chris&me). - Fix dma-buf flags when exporting (you). - Disable the VGA plane if it's enabled on lid open - similar fix in spirit to the one I've sent you last week, BIOSes really like to mess with the display when closing the lid (awesome debug work from Krzysztof Mazur). * 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel: drm/i915: disable shrinker lock stealing for create_mmap_offset drm/i915: optionally disable shrinker lock stealing drm/i915: fix flags in dma buf exporting i915: ensure that VGA plane is disabled drm/i915: Preallocate the drm_mm_node prior to manipulating the GTT drm_mm manager drm: Export routines for inserting preallocated nodes into the mm manager drm/i915: don't disable disconnected outputs drm/i915: Implement workaround for broken CS tlb on i830/845 drm/i915: Implement WaSetupGtModeTdRowDispatch drm/i915: Implement WaDisableHiZPlanesWhenMSAAEnabled drm/i915: Prefer CRTC 'active' rather than 'enabled' during WM computations drm/i915: Clear self-refresh watermarks when disabled drm/i915: Double the cursor self-refresh latency on Valleyview drm/i915: Fixup cursor latency used for IVB lp3 watermarks	2012-12-30 13:54:12 +10:00
Dave Airlie	b1d778b970	Merge branch 'drm-fixes-3.8' of git://people.freedesktop.org/~agd5f/linux into drm-next Misc fixes for reset and new packets for userspace usage. * 'drm-fixes-3.8' of git://people.freedesktop.org/~agd5f/linux: drm/radeon: add WAIT_UNTIL to evergreen VM safe reg list drm/radeon: add support for MEM_WRITE packet drm/radeon: restore modeset late in GPU reset path drm/radeon: avoid deadlock in pm path when waiting for fence drm/radeon: don't leave fence blocked process on failed GPU reset	2012-12-30 13:02:48 +10:00
Dave Airlie	344f9067d5	Merge branch 'drm-nouveau-fixes-3.8' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-next Fixes the accel support for nvd9 + kepler chipsets, also fixes GK106 support. * 'drm-nouveau-fixes-3.8' of git://anongit.freedesktop.org/git/nouveau/linux-2.6: drm/nve0/graph: fix fuc, and enable acceleration on all known chipsets drm/nvc0/graph: fix fuc, and enable acceleration on GF119 drm/nouveau/bios: cache ramcfg strap on later chipsets drm/nouveau/mxm: silence output if no bios data drm/nouveau/bios: parse/display extra version component drm/nouveau/bios: implement opcode 0xa9 drm/nouveau/bios: update gpio parsing apis to match current design drm/nouveau: initial support for GK106	2012-12-30 13:01:52 +10:00
Zlatko Calusic	ecccd1248d	mm: fix null pointer dereference in wait_iff_congested() An unintended consequence of commit `4ae0a48b5e` ("mm: modify pgdat_balanced() so that it also handles order-0") is that wait_iff_congested() can now be called with NULL 'struct zone *' producing kernel oops like this: BUG: unable to handle kernel NULL pointer dereference IP: [<ffffffff811542d9>] wait_iff_congested+0x59/0x140 This trivial patch fixes it. Reported-by: Zhouping Liu <zliu@redhat.com> Reported-and-tested-by: Sedat Dilek <sedat.dilek@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Zlatko Calusic <zlatko.calusic@iskon.hr> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-28 08:42:39 -08:00
Olof Johansson	2e376799b2	Fixes for the sunxi core to be merged in 3.8-rc2 Merge tag 'sunxi-fixes-for-3.8-rc2' of git://github.com/mripard/linux into fixes From Maxime Ripard: Fixes for the sunxi core to be merged in 3.8-rc2 * tag 'sunxi-fixes-for-3.8-rc2' of git://github.com/mripard/linux: sunxi: Change the machine compatible string. ARM: multi_v7_defconfig: Add ARCH_SUNXI	2012-12-28 08:53:01 +01:00
Sage Weil	0fa6ebc600	libceph: fix protocol feature mismatch failure path We should not set con->state to CLOSED here; that happens in ceph_fault() in the caller, where it first asserts that the state is not yet CLOSED. Avoids a BUG when the features don't match. Since the fail_protocol() has become a trivial wrapper, replace calls to it with direct calls to reset_connection(). Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-12-27 20:27:04 -06:00
Alex Elder	122070a2ff	libceph: WARN, don't BUG on unexpected connection states A number of assertions in the ceph messenger are implemented with BUG_ON(), killing the system if connection's state doesn't match what's expected. At this point our state model is (evidently) not well understood enough for these assertions to trigger a BUG(). Convert all BUG_ON(con->state...) calls to be WARN_ON(con->state...) so we learn about these issues without killing the machine. We now recognize that a connection fault can occur due to a socket closure at any time, regardless of the state of the connection. So there is really nothing we can assert about the state of the connection at that point so eliminate that assertion. Reported-by: Ugis <ugis22@gmail.com> Tested-by: Ugis <ugis22@gmail.com> Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2012-12-27 20:27:04 -06:00
Alex Elder	e6d50f67a6	libceph: always reset osds when kicking When ceph_osdc_handle_map() is called to process a new osd map, kick_requests() is called to ensure all affected requests are updated if necessary to reflect changes in the osd map. This happens in two cases: whenever an incremental map update is processed; and when a full map update (or the last one if there is more than one) gets processed. In the former case, the kick_requests() call is followed immediately by a call to reset_changed_osds() to ensure any connections to osds affected by the map change are reset. But for full map updates this isn't done. Both cases should be doing this osd reset. Rather than duplicating the reset_changed_osds() call, move it into the end of kick_requests(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2012-12-27 20:27:04 -06:00
Alex Elder	ab60b16d3c	libceph: move linger requests sooner in kick_requests() The kick_requests() function is called by ceph_osdc_handle_map() when an osd map change has been indicated. Its purpose is to re-queue any request whose target osd is different from what it was when it was originally sent. It is structured as two loops, one for incomplete but registered requests, and a second for handling completed linger requests. As a special case, in the first loop if a request marked to linger has not yet completed, it is moved from the request list to the linger list. This is a quick and dirty way to have the second loop handle sending the request along with all the other linger requests. Because of the way it's done now, however, this quick and dirty solution can result in these incomplete linger requests never getting re-sent as desired. The problem lies in the fact that the second loop only arranges for a linger request to be sent if it appears its target osd has changed. This is the proper handling for completed linger requests (it avoids issuing the same linger request twice to the same osd). But although the linger requests added to the list in the first loop may have been sent, they have not yet completed, so they need to be re-sent regardless of whether their target osd has changed. The first required fix is we need to avoid calling __map_request() on any incomplete linger request. Otherwise the subsequent __map_request() call in the second loop will find the target osd has not changed and will therefore not re-send the request. Second, we need to be sure that a sent but incomplete linger request gets re-sent. If the target osd is the same with the new osd map as it was when the request was originally sent, this won't happen. This can be fixed through careful handling when we move these requests from the request list to the linger list, by unregistering the request before it is registered as a linger request. 
This works because a side-effect of unregistering the request is to make the request's r_osd pointer be NULL, and that will ensure the second loop actually re-sends the linger request. Processing of such a request is done at that point, so continue with the next one once it's been moved. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2012-12-27 20:27:04 -06:00
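
The two-loop logic described in the last commit message above (ab60b16d3c) can be sketched as a small userspace model. This is only an illustration of the described behaviour, not the libceph code: `struct request`, its fields, `NO_OSD`, and the simplified `map_request()` are all hypothetical stand-ins, and the new target osd is assumed to be precomputed in `target`.

```c
#include <stdbool.h>
#include <stddef.h>

#define NO_OSD (-1)            /* hypothetical stand-in for a NULL r_osd pointer */

enum req_list { REQ_LIST, LINGER_LIST };

/* Hypothetical, heavily simplified request; not the real ceph structure. */
struct request {
    int           target;      /* osd the new map assigns (assumed precomputed) */
    int           osd;         /* osd the request is registered with, or NO_OSD */
    bool          linger;      /* request is marked to linger */
    enum req_list list;        /* which list the request currently sits on */
    int           sends;       /* how many times kick_requests() re-sent it */
};

/* Returns true (and updates osd) only when the target osd has changed. */
static bool map_request(struct request *r)
{
    if (r->osd == r->target)
        return false;
    r->osd = r->target;
    return true;
}

static void kick_requests(struct request *reqs, size_t n)
{
    /* First loop: registered but incomplete requests. */
    for (size_t i = 0; i < n; i++) {
        struct request *r = &reqs[i];

        if (r->list != REQ_LIST)
            continue;
        if (r->linger) {
            /*
             * Incomplete linger request: unregister it (which clears
             * its osd), move it to the linger list, and skip
             * map_request() so the second loop sees a changed target.
             */
            r->osd = NO_OSD;
            r->list = LINGER_LIST;
            continue;
        }
        if (map_request(r))
            r->sends++;
    }

    /* Second loop: linger requests, re-sent only if the target changed. */
    for (size_t i = 0; i < n; i++) {
        struct request *r = &reqs[i];

        if (r->list == LINGER_LIST && map_request(r))
            r->sends++;
    }
}
```

In this model an incomplete linger request is re-sent even when its target osd is unchanged, while a completed linger request with an unchanged target is left alone, which is the distinction the commit message draws.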

348149 Commits