... and simplify alloc_balloon_pages() interface by removing redundant
alloc_error from it.
If we happen to enter balloon_up() with balloon_wrk.num_pages = 0 we will enter
infinite 'while (!done)' loop as alloc_balloon_pages() will be always returning
0 and not setting alloc_error. We will also be sending a meaningless message to
the host on every iteration.
The 'alloc_unit == 1 && alloc_error -> num_ballooned == 0' change and
alloc_error elimination requires a special comment. We do alloc_balloon_pages()
with 2 different alloc_unit values and there are 4 different
alloc_balloon_pages() results, let's check them all.
alloc_unit = 512:
1) num_ballooned = 0, alloc_error = 0: we do 'alloc_unit=1' and retry pre- and
post-patch.
2) num_ballooned > 0, alloc_error = 0: we check 'num_ballooned == num_pages'
and act accordingly, pre- and post-patch.
3) num_ballooned > 0, alloc_error > 0: we report this chunk and remain within
the loop, no changes here.
4) num_ballooned = 0, alloc_error > 0: we do 'alloc_unit=1' and retry pre- and
post-patch.
alloc_unit = 1:
1) num_ballooned = 0, alloc_error = 0: this can happen in two cases: when we
passed 'num_pages=0' to alloc_balloon_pages() or when there was no space in
bl_resp to place a single response. The second option is not possible as
bl_resp is of PAGE_SIZE size and single response 'union dm_mem_page_range' is
8 bytes, but the first one is (in theory, I think that Hyper-V host never
places such requests). Pre-patch code loops forever, post-patch code sends
a reply with more_pages = 0 and finishes.
2) num_ballooned > 0, alloc_error = 0: we ran out of space in bl_resp, we
report partial success and remain within the loop, no changes pre- and
post-patch.
3) num_ballooned > 0, alloc_error > 0: pre-patch code finishes, post-patch code
does one more try and if there is no progress (we finish with
'num_ballooned = 0') we finish. So we try a bit harder with this patch.
4) num_ballooned = 0, alloc_error > 0: both pre- and post-patch code enter
'more_pages = 0' branch and finish.
So this patch has two real effects:
1) We reply with an empty response to 'num_pages=0' request.
2) We try a bit harder on alloc_unit=1 allocations (and reply with an empty
tail reply in case we fail).
An empty reply should be supported by host as we were able to send it even with
pre-patch code when we were not able to allocate a single page.
Suggested-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 79208c57da ("Drivers: hv: hv_balloon: Make adjustments in computing
the floor") was inacurate as it introduced a jump in our piecewiese linear
'floor' function:
At 2048MB we have:
Left limit:
104 + 2048/8 = 360
Right limit:
256 + 2048/16 = 384 (so the right value is 232)
We now have to make an adjustment at 8192 boundary:
232 + 8192/16 = 744
512 + 8192/32 = 768 (so the right value is 488)
Suggested-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently we add memory in 128Mb blocks but the request from host can be
aligned differently. In such case we add a partially backed block and
when this block goes online we skip onlining pages which are not backed
(hv_online_page() callback serves this purpose). When we receive next
request for the same host add region we online pages which were not backed
before with hv_bring_pgs_online(). However, we don't check if the the block
in question was onlined and online this tail unconditionally. This is bad as
we avoid all online_pages() logic: these pages are not accounted, we don't
send notifications (and hv_balloon is not the only receiver of them),...
And, first of all, nobody asked as to online these pages. Solve the issue by
checking if the last previously backed page was onlined and onlining the tail
only in case it was.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It's not necessary any longer, since we can safely run the blocking
message handlers in vmbus_connection.work_queue now.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since the 2 fucntions can safely run in vmbus_connection.work_queue without
hang, we don't need to schedule new work items into the per-channel workqueue.
Actally we can even remove the per-channel workqueue now -- we'll do it
in the next patch.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A work item in vmbus_connection.work_queue can sleep, waiting for a new
host message (usually it is some kind of "completion" message). Currently
the new message will be handled in the same workqueue, but since work items
in the workqueue is serialized, we actually have no chance to handle
the new message if the current work item is sleeping -- as as result, the
current work item will hang forever.
K. Y. has posted the below fix to resolve the issue:
Drivers: hv: vmbus: Perform device register in the per-channel work element
Actually we can simplify the fix by directly running non-blocking message
handlers in the dispatch tasklet (inspired by K. Y.).
This patch is the fundamental change. The following 2 patches will simplify
the message offering and rescind-offering handling a lot.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Don't wait after sending request for offers to the host. This wait is
unnecessary and simply adds 5 seconds to the boot time.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Handle the case when the write to the ringbuffer fails. In this case,
unconditionally signal the host. Since we may have deferred signalling
the host based on the kick_q parameter, signalling the host
unconditionally in this case deals with the issue.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Export the vmbus_sendpacket_pagebuffer_ctl() interface. This export will be
used by the netvsc driver.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a channel has been rescinded, the close operation is a noop.
Restructure the code so we deal with the rescind condition after
we properly cleanup the channel. I would like to thank
Dexuan Cui <decui@microsoft.com> for observing this problem.
The current code leaks memory when the channel is rescinded.
The current char-next branch is broken and this patch fixes
the bug.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The indenting makes it clear that there were curly braces intended here.
Fixes: 2dd37cb815 ('Drivers: hv: vmbus: Handle both rescind and offer messages in the same context')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
HV_CRASH_CTL_CRASH_NOTIFY is a 64 bit number. Depending on the usage context,
the value may be truncated. This patch is in response from the following
email from Wu Fengguang <fengguang.wu@intel.com>:
From: Wu Fengguang <fengguang.wu@intel.com>
Subject: [char-misc:char-misc-testing 25/45] drivers/hv/vmbus_drv.c:67:9: sparse:
constant 0x8000000000000000 is so big it is unsigned long
tree: git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git char-misc-testing
head: b3de8e3719
commit: 96c1d0581d [25/45] Drivers: hv: vmbus: Add support for VMBus panic notifier handler
reproduce:
# apt-get install sparse
git checkout 96c1d0581d
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__
sparse warnings: (new ones prefixed by >>)
drivers/hv/vmbus_drv.c:67:9: sparse: constant 0x8000000000000000 is so big it is unsigned long
...
Signed-off-by: Nick Meier <nmeier@microsoft.com>
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Memory blocks can be onlined in random order. When this order is not natural
some memory pages are not onlined because of the redundant check in
hv_online_page().
Here is a real world scenario:
1) Host tries to hot-add the following (process_hot_add):
pg_start=rg_start=0x48000, pfn_cnt=111616, rg_size=262144
2) This results in adding 4 memory blocks:
[ 109.057866] init_memory_mapping: [mem 0x48000000-0x4fffffff]
[ 114.102698] init_memory_mapping: [mem 0x50000000-0x57ffffff]
[ 119.168039] init_memory_mapping: [mem 0x58000000-0x5fffffff]
[ 124.233053] init_memory_mapping: [mem 0x60000000-0x67ffffff]
The last one is incomplete but we have special has->covered_end_pfn counter to
avoid onlining non-backed frames and hv_bring_pgs_online() function to bring
them online later on.
3) Now we have 4 offline memory blocks: /sys/devices/system/memory/memory9-12
$ for f in /sys/devices/system/memory/memory*/state; do echo $f `cat $f`; done | grep -v onlin
/sys/devices/system/memory/memory10/state offline
/sys/devices/system/memory/memory11/state offline
/sys/devices/system/memory/memory12/state offline
/sys/devices/system/memory/memory9/state offline
4) We bring them online in non-natural order:
$grep MemTotal /proc/meminfo
MemTotal: 966348 kB
$echo online > /sys/devices/system/memory/memory12/state && grep MemTotal /proc/meminfo
MemTotal: 1019596 kB
$echo online > /sys/devices/system/memory/memory11/state && grep MemTotal /proc/meminfo
MemTotal: 1150668 kB
$echo online > /sys/devices/system/memory/memory9/state && grep MemTotal /proc/meminfo
MemTotal: 1150668 kB
As you can see memory9 block gives us zero additional memory. We can also
observe a huge discrepancy between host- and guest-reported memory sizes.
The root cause of the issue is the redundant pg >= covered_start_pfn check (and
covered_start_pfn advancing) in hv_online_page(). When upper memory block in
being onlined before the lower one (memory12 and memory11 in the above case) we
advance the covered_start_pfn pointer and all memory9 pages do not pass the
check. If the assumption that host always gives us requests in sequential order
and pg_start always equals rg_start when the first request for the new HA
region is received (that's the case in my testing) is correct than we can get
rid of covered_start_pfn and pg >= start_pfn check in hv_online_page() is
sufficient.
The current char-next branch is broken and this patch fixes
the bug.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When add_memory() fails the following BUG is observed:
[ 743.646107] hv_balloon: hot_add memory failed error is -17
[ 743.679973]
[ 743.680930] =====================================
[ 743.680930] [ BUG: bad unlock balance detected! ]
[ 743.680930] 3.19.0-rc5_bug1131426+ #552 Not tainted
[ 743.680930] -------------------------------------
[ 743.680930] kworker/0:2/255 is trying to release lock (&dm_device.ha_region_mutex) at:
[ 743.680930] [<ffffffff81aae5fe>] mutex_unlock+0xe/0x10
[ 743.680930] but there are no more locks to release!
This happens as we don't acquire ha_region_mutex and hot_add_req() expects us
to as it does unconditional mutex_unlock(). Acquire the lock on the error path.
The current char-next branch is broken and this patch fixes
the bug.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch is a continuation of the rescind handling cleanup work. We cannot
block in the global message handling work context especially if we are blocking
waiting for the host to wake us up. I would like to thank
Dexuan Cui <decui@microsoft.com> for observing this problem.
The current char-next branch is broken and this patch fixes
the bug.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
drivers/hv/vmbus_drv.c:51:5: sparse: symbol 'hyperv_panic_event' was not declared. Should it be static?
drivers/hv/vmbus_drv.c:51:5: sparse: symbol 'hyperv_panic_event' was not declared. Should it be static?
drivers/hv/vmbus_drv.c:51:5: sparse: symbol 'hyperv_panic_event' was not declared. Should it be static?
drivers/hv/vmbus_drv.c:51:5: sparse: symbol 'hyperv_panic_event' was not declared. Should it be static?
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Implement an API that gives additional control on the what VMBUS flags will be
set as well as if the host needs to be signalled. This API will be
useful for clients that want to batch up requests to the host.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Implement an API for sending pagebuffers that gives more control to the client
in terms of setting the vmbus flags as well as deciding when to
notify the host. This will be useful for enabling batch processing.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current algorithm for picking an outgoing channel was not distributing
the load well. Implement a simple round-robin scheme to ensure good
distribution of the outgoing traffic.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Hyper-V allows a guest to notify the Hyper-V host that a panic
condition occured. This notification can include up to five 64
bit values. These 64 bit values are written into crash MSRs.
Once the data has been written into the crash MSRs, the host is
then notified by writing into a Crash Control MSR. On the Hyper-V
host, the panic notification data is captured in the Windows Event
log as a 18590 event.
Crash MSRs are defined in appendix H of the Hypervisor Top Level
Functional Specification. At the time of this patch, v4.0 is the
current functional spec. The URL for the v4.0 document is:
http://download.microsoft.com/download/A/B/4/AB43A34E-BDD0-4FA6-BDEF-79EEF16E880B/Hypervisor Top Level Functional Specification v4.0.docx
Signed-off-by: Nick Meier <nmeier@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When host asks us to balloon up we need to be sure we're not committing suicide
by overballooning. Use already existent 'floor' metric as our lowest possible
value for free ram.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When hot-added memory pages are not brought online or when some memory blocks
are sent offline the subsequent ballooning process kills the guest with OOM
killer. This happens as we don't report these pages as neither used nor free
and apparently host algorithm considers them as being unused. Keep track of
all online/offline operations and report all currently offline pages as being
used so host won't try to balloon them out.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When many memory regions are being added and automatically onlined the
following lockup is sometimes observed:
INFO: task udevd:1872 blocked for more than 120 seconds.
...
Call Trace:
[<ffffffff816ec0bc>] schedule_timeout+0x22c/0x350
[<ffffffff816eb98f>] wait_for_common+0x10f/0x160
[<ffffffff81067650>] ? default_wake_function+0x0/0x20
[<ffffffff816eb9fd>] wait_for_completion+0x1d/0x20
[<ffffffff8144cb9c>] hv_memory_notifier+0xdc/0x120
[<ffffffff816f298c>] notifier_call_chain+0x4c/0x70
...
When several memory blocks are going online simultaneously we got several
hv_memory_notifier() trying to acquire the ha_region_mutex. When this mutex is
being held by hot_add_req() all these competing acquire_region_mutex() do
mutex_trylock, fail, and queue themselves into wait_for_completion(..). However
when we do complete() from release_region_mutex() only one of them wakes up.
This could be solved by changing complete() -> complete_all() memory onlining
can be delayed as well, in that case we can still get several
hv_memory_notifier() runners at the same time trying to grab the mutex.
Only one of them will succeed and the others will hang for forever as
complete() is not being called. We don't see this issue often because we have
5sec onlining timeout in hv_mem_hot_add() and usually all udev events arrive
in this time frame.
Get rid of the trylock path, waiting on the mutex is supposed to provide the
required serialization.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently we log messages when either we are not able to map an ID to a
channel or when the channel does not have a callback associated
(in the channel interrupt handling path). These messages don't add
any value, get rid of them.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the offer is rescinded, vmbus_close() can free up the channel;
deinitialize the service before closing the channel.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Properly rollback state in vmbus_pocess_offer() in the failure paths.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Execute both ressind and offer messages in the same work context. This serializes these
operations and naturally addresses the various corner cases.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In response to a rescind message, we need to remove the channel and the
corresponding device. Cleanup this code path by factoring out the code
to remove a channel.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Handle the case when the device may be removed when the device has no driver
attached to it.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
NetworkDirect is a service that supports guest RDMA.
Define the GUID for this service.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Correctly rollback state if the failure occurs after we have handed over
the ownership of the buffer to the host.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
return type of wait_for_completion_timeout is unsigned long not int, this
patch changes the type of t from int to unsigned long.
Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
return type of wait_for_completion_timeout is unsigned long not int, this
patch changes the type of t from int to unsigned long.
Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
return type of wait_for_completion_timeout is unsigned long not int, this
patch changes the type of t from int to unsigned long.
Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Without this patch, the state is put to CHANNEL_OPENING_STATE, and when
the driver is loaded next time, vmbus_open() will fail immediately due to
newchannel->state != CHANNEL_OPEN_STATE.
CC: "K. Y. Srinivasan" <kys@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
I got HV_STATUS_INVALID_CONNECTION_ID on Hyper-V 2008 R2 when keeping running
"rmmod hv_netvsc; modprobe hv_netvsc; rmmod hv_utils; modprobe hv_utils"
in a Linux guest. Looks the host has some kind of throttling mechanism if
some kinds of hypercalls are sent too frequently.
Without the patch, the driver can occasionally fail to load.
Also let's retry HV_STATUS_INSUFFICIENT_MEMORY, though we didn't get it
before.
Removed 'case -ENOMEM', since the hypervisor doesn't return this.
CC: "K. Y. Srinivasan" <kys@microsoft.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Before the line vmbus_open() returns, srv->util_cb can be already running
and the variables, like util_fw_version, are needed by the srv->util_cb.
So we have to make sure the variables are initialized before the vmbus_open().
CC: "K. Y. Srinivasan" <kys@microsoft.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Newly introduced clockevent devices made it impossible to unload hv_vmbus
module as clockevents_config_and_register() takes additional reverence to
the module. To make it possible again we do the following:
- avoid setting dev->owner for clockevent devices;
- implement hv_synic_clockevents_cleanup() doing clockevents_unbind_device();
- call it from vmbus_exit().
In theory hv_synic_clockevents_cleanup() can be merged with hv_synic_cleanup(),
however, we call hv_synic_cleanup() from smp_call_function_single() and this
doesn't work for clockevents_unbind_device() as it does such call on its own. I
opted for a separate function.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
SynIC has to be switched off when we unload the module, otherwise registered
memory pages can get corrupted after (as Hyper-V host still writes there) and
we see the following crashes for random processes:
[ 89.116774] BUG: Bad page map in process sh pte:4989c716 pmd:36f81067
[ 89.159454] addr:0000000000437000 vm_flags:00000875 anon_vma: (null) mapping:ffff88007bba55a0 index:37
[ 89.226146] vma->vm_ops->fault: filemap_fault+0x0/0x410
[ 89.257776] vma->vm_file->f_op->mmap: generic_file_mmap+0x0/0x60
[ 89.297570] CPU: 0 PID: 215 Comm: sh Tainted: G B 3.19.0-rc5_bug923184+ #488
[ 89.353738] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006 05/23/2012
[ 89.409138] 0000000000000000 000000004e083d7b ffff880036e9fa18 ffffffff81a68d31
[ 89.468724] 0000000000000000 0000000000437000 ffff880036e9fa68 ffffffff811a1e3a
[ 89.519233] 000000004989c716 0000000000000037 ffffea0001edc340 0000000000437000
[ 89.575751] Call Trace:
[ 89.591060] [<ffffffff81a68d31>] dump_stack+0x45/0x57
[ 89.625164] [<ffffffff811a1e3a>] print_bad_pte+0x1aa/0x250
[ 89.667234] [<ffffffff811a2c95>] vm_normal_page+0x55/0xa0
[ 89.703818] [<ffffffff811a3105>] unmap_page_range+0x425/0x8a0
[ 89.737982] [<ffffffff811a3601>] unmap_single_vma+0x81/0xf0
[ 89.780385] [<ffffffff81184320>] ? lru_deactivate_fn+0x190/0x190
[ 89.820130] [<ffffffff811a4131>] unmap_vmas+0x51/0xa0
[ 89.860168] [<ffffffff811ad12c>] exit_mmap+0xac/0x1a0
[ 89.890588] [<ffffffff810763c3>] mmput+0x63/0x100
[ 89.919205] [<ffffffff811eba48>] flush_old_exec+0x3f8/0x8b0
[ 89.962135] [<ffffffff8123b5bb>] load_elf_binary+0x32b/0x1260
[ 89.998581] [<ffffffff811a14f2>] ? get_user_pages+0x52/0x60
hv_synic_cleanup() function exists but noone calls it now. Do the following:
- call hv_synic_cleanup() on each cpu from vmbus_exit();
- write global disable bit through MSR;
- use hv_synic_free_cpu() to avoid memory leask and code duplication.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We need to destroy hv_vmbus_con on module shutdown, otherwise the following
crash is sometimes observed:
[ 76.569845] hv_vmbus: Hyper-V Host Build:9600-6.3-17-0.17039; Vmbus version:3.0
[ 82.598859] BUG: unable to handle kernel paging request at ffffffffa0003480
[ 82.599287] IP: [<ffffffffa0003480>] 0xffffffffa0003480
[ 82.599287] PGD 1f34067 PUD 1f35063 PMD 3f72d067 PTE 0
[ 82.599287] Oops: 0010 [#1] SMP
[ 82.599287] Modules linked in: [last unloaded: hv_vmbus]
[ 82.599287] CPU: 0 PID: 26 Comm: kworker/0:1 Not tainted 3.19.0-rc5_bug923184+ #488
[ 82.599287] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v1.0 11/26/2012
[ 82.599287] Workqueue: hv_vmbus_con 0xffffffffa0003480
[ 82.599287] task: ffff88007b6ddfa0 ti: ffff88007f8f8000 task.ti: ffff88007f8f8000
[ 82.599287] RIP: 0010:[<ffffffffa0003480>] [<ffffffffa0003480>] 0xffffffffa0003480
[ 82.599287] RSP: 0018:ffff88007f8fbe00 EFLAGS: 00010202
...
To avoid memory leaks we need to free monitor_pages and int_page for
vmbus_connection. Implement vmbus_disconnect() function by separating cleanup
path from vmbus_connect().
As we use hv_vmbus_con to release channels (see free_channel() in channel_mgmt.c)
we need to make sure the work was done before we remove the queue, do that with
drain_workqueue(). We also need to avoid handling messages which can (potentially)
create new channels, so set vmbus_connection.conn_state = DISCONNECTED at the very
beginning of vmbus_exit() and check for that in vmbus_onmessage_work().
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On driver shutdown device_obj is being freed twice:
1) In vmbus_free_channels()
2) vmbus_device_release() (which is being triggered by device_unregister() in
vmbus_device_unregister().
This double kfree leads to the following sporadic crash on driver unload:
[ 23.469876] general protection fault: 0000 [#1] SMP
[ 23.470036] Modules linked in: hv_vmbus(-)
[ 23.470036] CPU: 2 PID: 213 Comm: rmmod Not tainted 3.19.0-rc5_bug923184+ #488
[ 23.470036] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006 05/23/2012
[ 23.470036] task: ffff880036ef1cb0 ti: ffff880036ce8000 task.ti: ffff880036ce8000
[ 23.470036] RIP: 0010:[<ffffffff811d2e1b>] [<ffffffff811d2e1b>] __kmalloc_node_track_caller+0xdb/0x1e0
[ 23.470036] RSP: 0018:ffff880036cebcc8 EFLAGS: 00010246
...
When this crash does not happen on driver unload the similar one is expected if
we try to load hv_vmbus again.
Remove kfree from vmbus_free_channels() as freeing it from
vmbus_device_release() seems right.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
All channel work queues are named 'hv_vmbus_ctl', this makes them
indistinguishable in ps output and makes it hard to link to the corresponding
vmbus device. Rename them to hv_vmbus_ctl/N and make vmbus device names match,
e.g. now vmbus_1 device is served by hv_vmbus_ctl/1 work queue.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When an SMP Hyper-V guest is running on top of 2012R2 Server and secondary
cpus are sent offline (with echo 0 > /sys/devices/system/cpu/cpu$cpu/online)
the system freeze is observed. This happens due to the fact that on newer
hypervisors (Win8, WS2012R2, ...) vmbus channel handlers are distributed
across all cpus (see init_vp_index() function in drivers/hv/channel_mgmt.c)
and on cpu offlining nobody reassigns them to CPU0. Prevent cpu offlining
when vmbus is loaded until the issue is fixed host-side.
This patch also disables hibernation but it is OK as it is also broken (MCE
error is hit on resume). Suspend still works.
Tested with WS2008R2 and WS2012R2.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Here's the big char/misc driver update for 3.20-rc1.
Lots of little things in here, all described in the changelog. Nothing
major or unusual, except maybe the binder selinux stuff, which was all
acked by the proper selinux people and they thought it best to come
through this tree.
All of this has been in linux-next with no reported issues for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlTgs80ACgkQMUfUDdst+yn86gCeMLbxANGExVLd+PR46GNsAUQb
SJ4AmgIqrkIz+5LCwZWM02ldbYhPeBVf
=lfmM
-----END PGP SIGNATURE-----
Merge tag 'char-misc-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc patches from Greg KH:
"Here's the big char/misc driver update for 3.20-rc1.
Lots of little things in here, all described in the changelog.
Nothing major or unusual, except maybe the binder selinux stuff, which
was all acked by the proper selinux people and they thought it best to
come through this tree.
All of this has been in linux-next with no reported issues for a while"
* tag 'char-misc-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (90 commits)
coresight: fix function etm_writel_cp14() parameter order
coresight-etm: remove check for unknown Kconfig macro
coresight: fixing CPU hwid lookup in device tree
coresight: remove the unnecessary function coresight_is_bit_set()
coresight: fix the debug AMBA bus name
coresight: remove the extra spaces
coresight: fix the link between orphan connection and newly added device
coresight: remove the unnecessary replicator property
coresight: fix the replicator subtype value
pdfdocs: Fix 'make pdfdocs' failure for 'uio-howto.tmpl'
mcb: Fix error path of mcb_pci_probe
virtio/console: verify device has config space
ti-st: clean up data types (fix harmless memory corruption)
mei: me: release hw from reset only during the reset flow
mei: mask interrupt set bit on clean reset bit
extcon: max77693: Constify struct regmap_config
extcon: adc-jack: Release IIO channel on driver remove
extcon: Remove duplicated include from extcon-class.c
Drivers: hv: vmbus: hv_process_timer_expiration() can be static
Drivers: hv: vmbus: serialize Offer and Rescind offer
...
struct acpi_resource_address and struct acpi_resource_extended_address64 share substracts
just at different offsets. To unify the parsing functions, OSPMs like Linux
need a new ACPI_ADDRESS64_ATTRIBUTE as their substructs, so they can
extract the shared data.
This patch also synchronizes the structure changes to the Linux kernel.
The usages are searched by matching the following keywords:
1. acpi_resource_address
2. acpi_resource_extended_address
3. ACPI_RESOURCE_TYPE_ADDRESS
4. ACPI_RESOURCE_TYPE_EXTENDED_ADDRESS
And we found and fixed the usages in the following files:
arch/ia64/kernel/acpi-ext.c
arch/ia64/pci/pci.c
arch/x86/pci/acpi.c
arch/x86/pci/mmconfig-shared.c
drivers/xen/xen-acpi-memhotplug.c
drivers/acpi/acpi_memhotplug.c
drivers/acpi/pci_root.c
drivers/acpi/resource.c
drivers/char/hpet.c
drivers/pnp/pnpacpi/rsparser.c
drivers/hv/vmbus_drv.c
Build tests are passed with defconfig/allnoconfig/allyesconfig and
defconfig+CONFIG_ACPI=n.
Original-by: Thomas Gleixner <tglx@linutronix.de>
Original-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
drivers/hv/vmbus_drv.c:582:6: sparse: symbol 'hv_process_timer_expiration' was not declared. Should it be static?
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 4b2f9abea5 ("staging: hv: convert channel_mgmt.c to not call
osd_schedule_callback")' was written under an assumption that we never receive
Rescind offer while we're still processing the initial Offer request. However,
the issue we fixed in 04a258c162 could be caused by this assumption not
always being true.
In particular, we need to protect against the following:
1) Receiving a Rescind offer after we do queue_work() for processing an Offer
request and before we actually enter vmbus_process_offer(). work.func points
to vmbus_process_offer() at this moment and in vmbus_onoffer_rescind() we do
another queue_work() without a check so we'll enter vmbus_process_offer()
twice.
2) Receiving a Rescind offer after we enter vmbus_process_offer() and
especially after we set >state = CHANNEL_OPEN_STATE. Many things can go
wrong in that case, e.g. we can call free_channel() while we're still using
it.
Implement the required protection by changing work->func at the very end of
vmbus_process_offer() and checking work->func in vmbus_onoffer_rescind(). In
case we receive rescind offer during or before vmbus_process_offer() is done
we set rescind flag to true and we check it at the end of vmbus_process_offer()
so such offer will not get lost.
Suggested-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sc_lock spinlock in struct vmbus_channel is being used to not only protect the
sc_list field, e.g. vmbus_open() function uses it to implement test-and-set
access to the state field. Rename it to the more generic 'lock' and add the
description.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
vmbus_device_create() result is not being checked in vmbus_process_offer() and
it can fail if kzalloc() fails. Add the check and do minor cleanup to avoid
additional duplication of "free_channel(); return;" block.
Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the case the user-space daemon crashes, hangs or is killed, we
need to down the semaphore, otherwise, after the daemon starts next
time, the obsolete data in fcopy_transaction.message or
fcopy_transaction.fcopy_msg will be used immediately.
Cc: Jason Wang <jasowang@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>