In win8 we have a feature that allows for interrupt driven flow management
for host/guest communication. For instance, if the host were blocked because
there was no space available in the ringbuffer, the host could request that the
guest send an interrupt when space becomes available in the ringbuffer (when
the guest drains the ringbuffer).
While this feature was implemented in the guest a while ago, we had not
advertised that the guest supported this feature. This patch advertises
the support to the host.
For pre-win8 hosts, this has no effect since the size of the ringbuffer
control structure has not changed and all changes have been backward
compatible - unused/reserved space has been used to implement this
feature.
In this version of the patch I have cleaned up the commit log based on
feedback from Greg KH.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Disable preemption when sampling current processor ID when preemption
is otherwise possible.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Minimize failures in this function by pre-allocating the buffer
for posting messages. The hypercall for posting the message can fail
for a number of reasons:
1. Transient resource related issues
2. Buffer alignment
3. Buffer cannot span a page boundry
We address issues 2 and 3 by preallocating a per-cpu page for the buffer.
Transient resource related failures are handled by retrying by the callers
of this function.
This patch is based on the investigation
done by Dexuan Cui <decui@microsoft.com>.
I would like to thank Sitsofe Wheeler <sitsofe@yahoo.com>
for reporting the issue and helping in debuggging.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Cc: <stable@vger.kernel.org>
Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eliminate calls to BUG_ON() in vmbus_close_internal().
We have chosen to potentially leak memory, than crash the guest
in case of failures.
In this version of the patch I have addressed comments from
Dan Carpenter (dan.carpenter@oracle.com).
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eliminate the call to BUG_ON() by waiting for the host to respond. We are
trying to reclaim the ownership of memory that was given to the host and so
we will have to wait until the host responds.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eliminate calls to BUG_ON() by properly handling errors. In cases where
rollback is possible, we will return the appropriate error to have the
calling code decide how to rollback state. In the case where we are
transferring ownership of the guest physical pages to the host,
we will wait for the host to respond.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Posting messages to the host can fail because of transient resource
related failures. Correctly deal with these failures and increase the
number of attempts to post the message before giving up.
In this version of the patch, I have normalized the error code to
Linux error code.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull char / misc driver patches from Greg KH:
"Here's the big driver misc / char pull request for 3.17-rc1.
Lots of things in here, the thunderbolt support for Apple laptops,
some other new drivers, testing fixes, and other good things. All
have been in linux-next for a long time"
* tag 'char-misc-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (119 commits)
misc: bh1780: Introduce the use of devm_kzalloc
Lattice ECP3 FPGA: Correct endianness
drivers/misc/ti-st: Load firmware from ti-connectivity directory.
dt-bindings: extcon: Add support for SM5502 MUIC device
extcon: sm5502: Change internal hardware switch according to cable type
extcon: sm5502: Detect cable state after completing platform booting
extcon: sm5502: Add support new SM5502 extcon device driver
extcon: arizona: Get MICVDD against extcon device
extcon: Remove unnecessary OOM messages
misc: vexpress: Fix sparse non static symbol warnings
mei: drop unused hw dependent fw status functions
misc: bh1770glc: Use managed functions
pcmcia: remove DEFINE_PCI_DEVICE_TABLE usage
misc: remove DEFINE_PCI_DEVICE_TABLE usage
ipack: Replace DEFINE_PCI_DEVICE_TABLE macro use
drivers/char/dsp56k.c: drop check for negativity of unsigned parameter
mei: fix return value on disconnect timeout
mei: don't schedule suspend in pm idle
mei: start disconnect request timer consistently
mei: reset client connection state on timeout
...
We should schedule the 5s "timer work" before starting the data transfer,
otherwise, the data transfer code may finish so fast on another
virtual cpu that when the code(fcopy_write()) trying to cancel the 5s
"timer work" can occasionally fail because the "timer work" may haven't
been scheduled yet and as a result the fcopy process will be aborted
wrongly by fcopy_work_func() in 5s.
Thank Liz Zhang <lizzha@microsoft.com> for the initial investigation
on the bug.
This addresses https://bugzilla.redhat.com/show_bug.cgi?id=1118123
Tested-by: Liz Zhang <lizzha@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This resolves a number of merge issues with changes in this tree and
Linus's tree at the same time.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add code to poll the channel since we process only one message
at a time and the host may not interrupt us. Also increase the
receive buffer size since some KVP messages are close to 8K bytes in size.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Starting with Win8, we have implemented several optimizations to improve the
scalability and performance of the VMBUS transport between the Host and the
Guest. Some of the non-performance critical services cannot leverage these
optimization since they only read and process one message at a time.
Make adjustments to the callback dispatch code to account for the way
non-performance critical drivers handle reading of the channel.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
All its callers depends on the return value of -ENOBUFS to reallocate a
bigger buffer and retry the receiving. So there's no need to call
pr_err() here since it was not a real issue, otherwise syslog will be
flooded by this false warning.
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull networking updates from David Miller:
1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov.
2) Multiqueue support in xen-netback and xen-netfront, from Andrew J
Benniston.
3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn
Mork.
4) BPF now has a "random" opcode, from Chema Gonzalez.
5) Add more BPF documentation and improve test framework, from Daniel
Borkmann.
6) Support TCP fastopen over ipv6, from Daniel Lee.
7) Add software TSO helper functions and use them to support software
TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia.
8) Support software TSO in fec driver too, from Nimrod Andy.
9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli.
10) Handle broadcasts more gracefully over macvlan when there are large
numbers of interfaces configured, from Herbert Xu.
11) Allow more control over fwmark used for non-socket based responses,
from Lorenzo Colitti.
12) Do TCP congestion window limiting based upon measurements, from Neal
Cardwell.
13) Support busy polling in SCTP, from Neal Horman.
14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru.
15) Bridge promisc mode handling improvements from Vlad Yasevich.
16) Don't use inetpeer entries to implement ID generation any more, it
performs poorly, from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits)
rtnetlink: fix userspace API breakage for iproute2 < v3.9.0
tcp: fixing TLP's FIN recovery
net: fec: Add software TSO support
net: fec: Add Scatter/gather support
net: fec: Increase buffer descriptor entry number
net: fec: Factorize feature setting
net: fec: Enable IP header hardware checksum
net: fec: Factorize the .xmit transmit function
bridge: fix compile error when compiling without IPv6 support
bridge: fix smatch warning / potential null pointer dereference
via-rhine: fix full-duplex with autoneg disable
bnx2x: Enlarge the dorq threshold for VFs
bnx2x: Check for UNDI in uncommon branch
bnx2x: Fix 1G-baseT link
bnx2x: Fix link for KR with swapped polarity lane
sctp: Fix sk_ack_backlog wrap-around problem
net/core: Add VF link state control policy
net/fsl: xgmac_mdio is dependent on OF_MDIO
net/fsl: Make xgmac_mdio read error message useful
net_sched: drr: warn when qdisc is not work conserving
...
We try to free two pages when only one has been allocated.
Cleanup path is unlikely, so I haven't found any trace that would fit,
but I hope that free_pages_prepare() does catch it.
Cc: stable@vger.kernel.org
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Amos Kong <akong@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current code posts periodic memory pressure status from a dedicated thread.
Under some conditions, especially when we are releasing a lot of memory into
the guest, we may not send timely pressure reports back to the host. Fix this
issue by reporting pressure in all contexts that can be active in this driver.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
pfncount is of type u32 and thus can never be smaller than 0.
Found by the coverity scanner, CID 143213.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently the mapping of the relID to channel is done under the protection of a
single spin lock. Starting with ws2012, each channel is bound to a specific VCPU
in the guest. Use this binding to eliminate the spin lock by setting up
per-cpu state for mapping relId to the channel.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
By ensuring that we set the callback handler to NULL in the channel close
path on the same CPU that the channel is bound to, we can eliminate this lock
acquisition and release in a performance critical path.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Only ws2012r2 hosts support the ability to reconnect to the host on VMBUS. This functionality
is needed by kexec in Linux. To use this functionality we need to negotiate version 3.0 of the
VMBUS protocol.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org> [3.9+]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull char/misc driver patches from Greg KH:
"Here's the big char/misc driver updates for 3.15-rc1.
Lots of various things here, including the new mcb driver subsystem.
All of these have been in linux-next for a while"
* tag 'char-misc-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (118 commits)
extcon: Move OF helper function to extcon core and change function name
extcon: of: Remove unnecessary function call by using the name of device_node
extcon: gpio: Use SIMPLE_DEV_PM_OPS macro
extcon: palmas: Use SIMPLE_DEV_PM_OPS macro
mei: don't use deprecated DEFINE_PCI_DEVICE_TABLE macro
mei: amthif: fix checkpatch error
mei: client.h fix checkpatch errors
mei: use cl_dbg where appropriate
mei: fix Unnecessary space after function pointer name
mei: report consistently copy_from/to_user failures
mei: drop pr_fmt macros
mei: make me hw headers private to me hw.
mei: fix memory leak of pending write cb objects
mei: me: do not reset when less than expected data is received
drivers: mcb: Fix build error discovered by 0-day bot
cs5535-mfgpt: Simplify dependencies
spmi: pm: drop bus-level PM suspend/resume routines
spmi: pmic_arb: make selectable on ARCH_QCOM
Drivers: hv: vmbus: Increase the limit on the number of pfns we can handle
pch_phub: Report error writing MAC back to user
...
Compiling last minute changes without setting the proper config
options is not really clever.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The vmbus/hyperv interrupt handling is another complete trainwreck and
probably the worst of all currently in tree.
If CONFIG_HYPERV=y then the interrupt delivery to the vmbus happens
via the direct HYPERVISOR_CALLBACK_VECTOR. So far so good, but:
The driver requests first a normal device interrupt. The only reason
to do so is to increment the interrupt stats of that device
interrupt. For no reason it also installs a private flow handler.
We have proper accounting mechanisms for direct vectors, but of
course it's too much effort to add that 5 lines of code.
Aside of that the alloc_intr_gate() is not protected against
reallocation which makes module reload impossible.
Solution to the problem is simple to rip out the whole mess and
implement it correctly.
First of all move all that code to arch/x86/kernel/cpu/mshyperv.c and
merily install the HYPERVISOR_CALLBACK_VECTOR with proper reallocation
protection and use the proper direct vector accounting mechanism.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linuxdrivers <devel@linuxdriverproject.org>
Cc: x86 <x86@kernel.org>
Link: http://lkml.kernel.org/r/20140223212739.028307673@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>