Commit Graph

243 Commits

Author SHA1 Message Date
Vitaly Kuznetsov 797f88c987 Drivers: hv: hv_balloon: correctly handle num_pages>INT_MAX case
balloon_wrk.num_pages is __u32 and it comes from host in struct dm_balloon
where it is also __u32. We, however, use 'int' in balloon_up() and in case
we happen to receive num_pages>INT_MAX request we'll end up allocating zero
pages as 'num_pages < alloc_unit' check in alloc_balloon_pages() will pass.
Change num_pages type to unsigned int.

In real life ballooning request come with num_pages in [512, 32768] range so
this is more a future-proof/cleanup.

Reported-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-03 16:20:12 +02:00
Vitaly Kuznetsov ba0c444153 Drivers: hv: hv_balloon: correctly handle val.freeram<num_pages case
'Drivers: hv: hv_balloon: refuse to balloon below the floor' fix does not
correctly handle the case when val.freeram < num_pages as val.freeram is
__kernel_ulong_t and the 'val.freeram - num_pages' value will be a huge
positive value instead of being negative.

Usually host doesn't ask us to balloon more than val.freeram but in case
he have a memory hog started after we post the last pressure report we
can get into troubles.

Suggested-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-03 16:20:12 +02:00
Haiyang Zhang e1c0d82dab hv_vmbus: Add gradually increased delay for retries in vmbus_post_msg()
Most of the retries can be done within a millisecond successfully, so we
sleep 1ms before the first retry, then gradually increase the retry
interval to 2^n with max value of 2048ms. Doing so, we will have shorter
overall delay time, because most of the cases succeed within 1-2 attempts.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-03 16:18:02 +02:00
Vitaly Kuznetsov 0a1a86ac04 Drivers: hv: hv_balloon: survive ballooning request with num_pages=0
... and simplify alloc_balloon_pages() interface by removing redundant
alloc_error from it.

If we happen to enter balloon_up() with balloon_wrk.num_pages = 0 we will enter
infinite 'while (!done)' loop as alloc_balloon_pages() will be always returning
0 and not setting alloc_error. We will also be sending a meaningless message to
the host on every iteration.

The 'alloc_unit == 1 && alloc_error -> num_ballooned == 0' change and
alloc_error elimination requires a special comment. We do alloc_balloon_pages()
with 2 different alloc_unit values and there are 4 different
alloc_balloon_pages() results, let's check them all.

alloc_unit = 512:
1) num_ballooned = 0, alloc_error = 0: we do 'alloc_unit=1' and retry pre- and
  post-patch.
2) num_ballooned > 0, alloc_error = 0: we check 'num_ballooned == num_pages'
  and act accordingly,  pre- and post-patch.
3) num_ballooned > 0, alloc_error > 0: we report this chunk and remain within
  the loop, no changes here.
4) num_ballooned = 0, alloc_error > 0: we do 'alloc_unit=1' and retry pre- and
  post-patch.

alloc_unit = 1:
1) num_ballooned = 0, alloc_error = 0: this can happen in two cases: when we
  passed 'num_pages=0' to alloc_balloon_pages() or when there was no space in
  bl_resp to place a single response. The second option is not possible as
  bl_resp is of PAGE_SIZE size and single response 'union dm_mem_page_range' is
  8 bytes, but the first one is (in theory, I think that Hyper-V host never
  places such requests). Pre-patch code loops forever, post-patch code sends
  a reply with more_pages = 0 and finishes.
2) num_ballooned > 0, alloc_error = 0: we ran out of space in bl_resp, we
  report partial success and remain within the loop, no changes pre- and
  post-patch.
3) num_ballooned > 0, alloc_error > 0: pre-patch code finishes, post-patch code
  does one more try and if there is no progress (we finish with
  'num_ballooned = 0') we finish. So we try a bit harder with this patch.
4) num_ballooned = 0, alloc_error > 0: both pre- and post-patch code enter
 'more_pages = 0' branch and finish.

So this patch has two real effects:
1) We reply with an empty response to 'num_pages=0' request.
2) We try a bit harder on alloc_unit=1 allocations (and reply with an empty
   tail reply in case we fail).

An empty reply should be supported by host as we were able to send it even with
pre-patch code when we were not able to allocate a single page.

Suggested-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-03 16:18:02 +02:00
Vitaly Kuznetsov 7fb0e1a650 Drivers: hv: hv_balloon: eliminate jumps in piecewiese linear floor function
Commit 79208c57da ("Drivers: hv: hv_balloon: Make adjustments in computing
the floor") was inacurate as it introduced a jump in our piecewiese linear
'floor' function:

At 2048MB we have:
Left limit:
104 + 2048/8 = 360
Right limit:
256 + 2048/16 = 384 (so the right value is 232)

We now have to make an adjustment at 8192 boundary:
232 + 8192/16 = 744
512 + 8192/32 = 768 (so the right value is 488)

Suggested-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-03 16:18:02 +02:00
Vitaly Kuznetsov d6cbd2c3a3 Drivers: hv: hv_balloon: do not online pages in offline blocks
Currently we add memory in 128Mb blocks but the request from host can be
aligned differently. In such case we add a partially backed block and
when this block goes online we skip onlining pages which are not backed
(hv_online_page() callback serves this purpose). When we receive next
request for the same host add region we online pages which were not backed
before with hv_bring_pgs_online(). However, we don't check if the the block
in question was onlined and online this tail unconditionally. This is bad as
we avoid all online_pages() logic: these pages are not accounted, we don't
send notifications (and hv_balloon is not the only receiver of them),...
And, first of all, nobody asked as to online these pages. Solve the issue by
checking if the last previously backed page was onlined and onlining the tail
only in case it was.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-03 16:18:02 +02:00
Dexuan Cui aadc3780f3 hv: remove the per-channel workqueue
It's not necessary any longer, since we can safely run the blocking
message handlers in vmbus_connection.work_queue now.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-03 16:18:02 +02:00
Dexuan Cui d43e2fe7da hv: don't schedule new works in vmbus_onoffer()/vmbus_onoffer_rescind()
Since the 2 fucntions can safely run in vmbus_connection.work_queue without
hang, we don't need to schedule new work items into the per-channel workqueue.

Actally we can even remove the per-channel workqueue now -- we'll do it
in the next patch.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-03 16:18:02 +02:00
Dexuan Cui 652594c7df hv: run non-blocking message handlers in the dispatch tasklet
A work item in vmbus_connection.work_queue can sleep, waiting for a new
host message (usually it is some kind of "completion" message). Currently
the new message will be handled in the same workqueue, but since work items
in the workqueue is serialized, we actually have no chance to handle
the new message if the current work item is sleeping -- as as result, the
current work item will hang forever.

K. Y. has posted the below fix to resolve the issue:
Drivers: hv: vmbus: Perform device register in the per-channel work element

Actually we can simplify the fix by directly running non-blocking message
handlers in the dispatch tasklet (inspired by K. Y.).

This patch is the fundamental change. The following 2 patches will simplify
the message offering and rescind-offering handling a lot.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-03 16:18:01 +02:00
K. Y. Srinivasan 73cffdb65e Drivers: hv: vmbus: Don't wait after requesting offers
Don't wait after sending request for offers to the host. This wait is
unnecessary and simply adds 5 seconds to the boot time.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-26 23:51:14 +01:00
K. Y. Srinivasan 5f5cc81733 Drivers: hv: vmbus: Fix a siganlling host signalling issue
Handle the case when the write to the ringbuffer fails. In this case,
unconditionally signal the host. Since we may have deferred signalling
the host based on the kick_q parameter, signalling the host
unconditionally in this case deals with the issue.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-25 11:53:55 +01:00
K. Y. Srinivasan b3a19b36ad Drivers: hv: vmbus: Export the vmbus_sendpacket_pagebuffer_ctl()
Export the vmbus_sendpacket_pagebuffer_ctl() interface. This export will be
used by the netvsc driver.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-25 11:53:55 +01:00
K. Y. Srinivasan f2eddbc9f1 Drivers: hv: vmbus: Fix a bug in rescind processing in vmbus_close_internal()
When a channel has been rescinded, the close operation is a noop.
Restructure the code so we deal with the rescind condition after
we properly cleanup the channel. I would like to thank
Dexuan Cui <decui@microsoft.com> for observing this problem.
The current code leaks memory when the channel is rescinded.

The current char-next branch is broken and this patch fixes
the bug.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-25 11:53:54 +01:00
Dan Carpenter 177757423f hv: vmbus: missing curly braces in vmbus_process_offer()
The indenting makes it clear that there were curly braces intended here.

Fixes: 2dd37cb815 ('Drivers: hv: vmbus: Handle both rescind and offer messages in the same context')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-25 11:53:54 +01:00
Nick Meier 5ef5b6927f Drivers: hv: vmbus: Correcting truncation error for constant HV_CRASH_CTL_CRASH_NOTIFY
HV_CRASH_CTL_CRASH_NOTIFY is a 64 bit number.  Depending on the usage context,
the value may be truncated. This patch is in response from the following
email from Wu Fengguang <fengguang.wu@intel.com>:

    From: Wu Fengguang <fengguang.wu@intel.com>
    Subject:  [char-misc:char-misc-testing 25/45] drivers/hv/vmbus_drv.c:67:9: sparse:
              constant 0x8000000000000000 is so big it is unsigned long

    tree:   git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git char-misc-testing
    head:   b3de8e3719
    commit: 96c1d0581d [25/45] Drivers: hv: vmbus: Add support for VMBus panic notifier handler
    reproduce:
      # apt-get install sparse
      git checkout 96c1d0581d
      make ARCH=x86_64 allmodconfig
      make C=1 CF=-D__CHECK_ENDIAN__

    sparse warnings: (new ones prefixed by >>)

    drivers/hv/vmbus_drv.c:67:9: sparse: constant 0x8000000000000000 is so big it is unsigned long
    ...

Signed-off-by: Nick Meier <nmeier@microsoft.com>
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-25 11:53:54 +01:00
Vitaly Kuznetsov 5abbbb75d7 Drivers: hv: hv_balloon: don't lose memory when onlining order is not natural
Memory blocks can be onlined in random order. When this order is not natural
some memory pages are not onlined because of the redundant check in
hv_online_page().

Here is a real world scenario:
1) Host tries to hot-add the following (process_hot_add):
  pg_start=rg_start=0x48000, pfn_cnt=111616, rg_size=262144

2) This results in adding 4 memory blocks:
[  109.057866] init_memory_mapping: [mem 0x48000000-0x4fffffff]
[  114.102698] init_memory_mapping: [mem 0x50000000-0x57ffffff]
[  119.168039] init_memory_mapping: [mem 0x58000000-0x5fffffff]
[  124.233053] init_memory_mapping: [mem 0x60000000-0x67ffffff]
The last one is incomplete but we have special has->covered_end_pfn counter to
avoid onlining non-backed frames and hv_bring_pgs_online() function to bring
them online later on.

3) Now we have 4 offline memory blocks: /sys/devices/system/memory/memory9-12
$ for f in /sys/devices/system/memory/memory*/state; do echo $f `cat $f`; done | grep -v onlin
/sys/devices/system/memory/memory10/state offline
/sys/devices/system/memory/memory11/state offline
/sys/devices/system/memory/memory12/state offline
/sys/devices/system/memory/memory9/state offline

4) We bring them online in non-natural order:
$grep MemTotal /proc/meminfo
MemTotal:         966348 kB
$echo online > /sys/devices/system/memory/memory12/state && grep MemTotal /proc/meminfo
MemTotal:        1019596 kB
$echo online > /sys/devices/system/memory/memory11/state && grep MemTotal /proc/meminfo
MemTotal:        1150668 kB
$echo online > /sys/devices/system/memory/memory9/state && grep MemTotal /proc/meminfo
MemTotal:        1150668 kB

As you can see memory9 block gives us zero additional memory. We can also
observe a huge discrepancy between host- and guest-reported memory sizes.

The root cause of the issue is the redundant pg >= covered_start_pfn check (and
covered_start_pfn advancing) in hv_online_page(). When upper memory block in
being onlined before the lower one (memory12 and memory11 in the above case) we
advance the covered_start_pfn pointer and all memory9 pages do not pass the
check. If the assumption that host always gives us requests in sequential order
and pg_start always equals rg_start when the first request for the new HA
region is received (that's the case in my testing) is correct than we can get
rid of covered_start_pfn and pg >= start_pfn check in hv_online_page() is
sufficient.

The current char-next branch is broken and this patch fixes
the bug.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-25 11:53:54 +01:00
Vitaly Kuznetsov f3f6eb8087 Drivers: hv: hv_balloon: keep locks balanced on add_memory() failure
When add_memory() fails the following BUG is observed:
[  743.646107] hv_balloon: hot_add memory failed error is -17
[  743.679973]
[  743.680930] =====================================
[  743.680930] [ BUG: bad unlock balance detected! ]
[  743.680930] 3.19.0-rc5_bug1131426+ #552 Not tainted
[  743.680930] -------------------------------------
[  743.680930] kworker/0:2/255 is trying to release lock (&dm_device.ha_region_mutex) at:
[  743.680930] [<ffffffff81aae5fe>] mutex_unlock+0xe/0x10
[  743.680930] but there are no more locks to release!

This happens as we don't acquire ha_region_mutex and hot_add_req() expects us
to as it does unconditional mutex_unlock(). Acquire the lock on the error path.

The current char-next branch is broken and this patch fixes
the bug.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-25 11:53:54 +01:00
K. Y. Srinivasan fde25d25db Drivers: hv: vmbus: Perform device register in the per-channel work element
This patch is a continuation of the rescind handling cleanup work. We cannot
block in the global message handling work context especially if we are blocking
waiting for the host to wake us up. I would like to thank
Dexuan Cui <decui@microsoft.com> for observing this problem.

The current char-next branch is broken and this patch fixes
the bug.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-25 11:53:53 +01:00
kbuild test robot ed99d846f1 mei: bus: () can be static
drivers/hv/vmbus_drv.c:51:5: sparse: symbol 'hyperv_panic_event' was not declared. Should it be static?
drivers/hv/vmbus_drv.c:51:5: sparse: symbol 'hyperv_panic_event' was not declared. Should it be static?
drivers/hv/vmbus_drv.c:51:5: sparse: symbol 'hyperv_panic_event' was not declared. Should it be static?
drivers/hv/vmbus_drv.c:51:5: sparse: symbol 'hyperv_panic_event' was not declared. Should it be static?

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01 21:43:37 -08:00
K. Y. Srinivasan e9395e3f89 Drivers: hv: vmbus: Suport an API to send packet with additional control
Implement an API that gives additional control on the what VMBUS flags will be
set as well as if the host needs to be signalled. This API will be
useful for clients that want to batch up requests to the host.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01 19:31:47 -08:00
K. Y. Srinivasan 87e93d6170 Drivers: hv: vmbus: Suport an API to send pagebuffers with additional control
Implement an API for sending pagebuffers that gives more control to the client
in terms of setting the vmbus flags as well as deciding when to
notify the host. This will be useful for enabling batch processing.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01 19:31:47 -08:00
K. Y. Srinivasan a13e8bbe85 Drivers: hv: vmbus: Use a round-robin algorithm for picking the outgoing channel
The current algorithm for picking an outgoing channel was not distributing
the load well. Implement a simple round-robin scheme to ensure good
distribution of the outgoing traffic.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01 19:31:47 -08:00
Nick Meier 96c1d0581d Drivers: hv: vmbus: Add support for VMBus panic notifier handler
Hyper-V allows a guest to notify the Hyper-V host that a panic
condition occured.  This notification can include up to five 64
bit values.  These 64 bit values are written into crash MSRs.
Once the data has been written into the crash MSRs, the host is
then notified by writing into a Crash Control MSR.  On the Hyper-V
host, the panic notification data is captured in the Windows Event
log as a 18590 event.

Crash MSRs are defined in appendix H of the Hypervisor Top Level
Functional Specification.  At the time of this patch, v4.0 is the
current functional spec.  The URL for the v4.0 document is:

http://download.microsoft.com/download/A/B/4/AB43A34E-BDD0-4FA6-BDEF-79EEF16E880B/Hypervisor Top Level Functional Specification v4.0.docx

Signed-off-by: Nick Meier <nmeier@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01 19:31:47 -08:00
Vitaly Kuznetsov 530d15b907 Drivers: hv: hv_balloon: refuse to balloon below the floor
When host asks us to balloon up we need to be sure we're not committing suicide
by overballooning. Use already existent 'floor' metric as our lowest possible
value for free ram.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01 19:31:47 -08:00
Vitaly Kuznetsov 549fd280b1 Drivers: hv: hv_balloon: report offline pages as being used
When hot-added memory pages are not brought online or when some memory blocks
are sent offline the subsequent ballooning process kills the guest with OOM
killer. This happens as we don't report these pages as neither used nor free
and apparently host algorithm considers them as being unused. Keep track of
all online/offline operations and report all currently offline pages as being
used so host won't try to balloon them out.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01 19:31:47 -08:00