Commit Graph

85 Commits

Author SHA1 Message Date
Asias He
22fa90c7fb vhost: Remove custom vhost rcu usage
Now, vq->private_data is always accessed under vq mutex. No need to play
the vhost rcu trick.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-07-11 15:38:40 +03:00
Asias He
2e26af79b7 vhost-net: Always access vq->private_data under vq mutex
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-07-11 15:36:45 +03:00
Asias He
0a1febf7ba vhost: Make local function static
$ make C=1 M=drivers/vhost

drivers/vhost/net.c:168:5: warning: symbol 'vhost_net_set_ubuf_info' was not declared. Should it be static?
drivers/vhost/net.c:194:6: warning: symbol 'vhost_net_vq_reset' was not declared. Should it be static?
drivers/vhost/scsi.c:219:6: warning: symbol 'tcm_vhost_done_inflight' was not declared. Should it be static?

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-07-07 17:34:07 +03:00
Michael S. Tsirkin
c38e39c378 vhost-net: fix use-after-free in vhost_net_flush
vhost_net_ubuf_put_and_wait has a confusing name:
it will actually also free it's argument.
Thus since commit 1280c27f8e
    "vhost-net: flush outstanding DMAs on memory change"
vhost_net_flush tries to use the argument after passing it
to vhost_net_ubuf_put_and_wait, this results
in use after free.
To fix, don't free the argument in vhost_net_ubuf_put_and_wait,
add an new API for callers that want to free ubufs.

Acked-by: Asias He <asias@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-07-07 14:06:22 +03:00
Michael S. Tsirkin
288cfe78c8 vhost: fix ubuf_info cleanup
vhost_net_clear_ubuf_info didn't clear ubuf_info
after kfree, this could trigger double free.
Fix this and simplify this code to make it more robust: make sure
ubuf info is always freed through vhost_net_clear_ubuf_info.

Reported-by: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-11 02:46:21 -07:00
Michael S. Tsirkin
05c0535194 vhost: check owner before we overwrite ubuf_info
If device has an owner, we shouldn't touch ubuf_info
since it might be in use.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-11 02:46:21 -07:00
Jason Wang
4364d5f96e vhost_net: clear msg.control for non-zerocopy case during tx
When we decide not use zero-copy, msg.control should be set to NULL otherwise
macvtap/tap may set zerocopy callbacks which may decrease the kref of ubufs
wrongly.

Bug were introduced by commit cedb9bdce0
(vhost-net: skip head management if no outstanding).

This solves the following warnings:

WARNING: at include/linux/kref.h:47 handle_tx+0x477/0x4b0 [vhost_net]()
Modules linked in: vhost_net macvtap macvlan tun nfsd exportfs bridge stp llc openvswitch kvm_amd kvm bnx2 megaraid_sas [last unloaded: tun]
CPU: 5 PID: 8670 Comm: vhost-8668 Not tainted 3.10.0-rc2+ #1566
Hardware name: Dell Inc. PowerEdge R715/00XHKG, BIOS 1.5.2 04/19/2011
ffffffffa0198323 ffff88007c9ebd08 ffffffff81796b73 ffff88007c9ebd48
ffffffff8103d66b 000000007b773e20 ffff8800779f0000 ffff8800779f43f0
ffff8800779f8418 000000000000015c 0000000000000062 ffff88007c9ebd58
Call Trace:
[<ffffffff81796b73>] dump_stack+0x19/0x1e
[<ffffffff8103d66b>] warn_slowpath_common+0x6b/0xa0
[<ffffffff8103d6b5>] warn_slowpath_null+0x15/0x20
[<ffffffffa0197627>] handle_tx+0x477/0x4b0 [vhost_net]
[<ffffffffa0197690>] handle_tx_kick+0x10/0x20 [vhost_net]
[<ffffffffa019541e>] vhost_worker+0xfe/0x1a0 [vhost_net]
[<ffffffffa0195320>] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net]
[<ffffffffa0195320>] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net]
[<ffffffff81061f46>] kthread+0xc6/0xd0
[<ffffffff81061e80>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff817a1aec>] ret_from_fork+0x7c/0xb0
[<ffffffff81061e80>] ? kthread_freezable_should_stop+0x70/0x70

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-10 14:31:45 -07:00
Asias He
fe729a57c8 vhost-net: Cleanup vhost_ubuf and vhost_zcopy
- Rename vhost_ubuf to vhost_net_ubuf
- Rename vhost_zcopy_mask to vhost_net_zcopy_mask
- Make funcs static

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-05-06 13:25:47 +03:00
Asias He
8570a6e72c vhost: Move VHOST_NET_FEATURES to net.c
vhost.h should not depend on device specific marcos like
VHOST_NET_F_VIRTIO_NET_HDR and VIRTIO_NET_F_MRG_RXBUF.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-05-06 13:21:00 +03:00
Asias He
b1ad8496c9 vhost-net: Free ubuf when vhost_dev_set_owner fails
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-05-06 12:57:54 +03:00
Michael S. Tsirkin
150b9e51ae vhost: fix error handling in RESET_OWNER ioctl
RESET_OWNER ioctl would leave the fd in a bad state if
memory allocation failed: device is stopped
but owner is not reset. Make state changes
after allocating memory, such that a failed
ioctl has no effect.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-05-01 10:02:54 +03:00
Michael S. Tsirkin
81f95a5580 vhost: move per-vq net specific fields out to net
This will remove the need for vhost scsi to pull
in virtio-net.h.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-05-01 10:02:53 +03:00
Asias He
2839400f8f vhost: move vhost-net zerocopy fields to net.c
On top of 'vhost: Allow device specific fields per vq', we can move device
specific fields to device virt queue from vhost virt queue.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-05-01 10:02:52 +03:00
Asias He
3ab2e420ec vhost: Allow device specific fields per vq
This is useful for any device who wants device specific fields per vq.
For example, tcm_vhost wants a per vq field to track requests which are
in flight on the vq. Also, on top of this we can add patches to move
things like ubufs from vhost.h out to net.c.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-05-01 10:02:45 +03:00
Jason Wang
70181d5120 vhost_net: remove tx polling state
After commit 2b8b328b61 (vhost_net: handle polling
errors when setting backend), we in fact track the polling state through
poll->wqh, so there's no need to duplicate the work with an extra
vhost_net_polling_state. So this patch removes this and make the code simpler.

This patch also removes the all tx starting/stopping code in tx path according
to Michael's suggestion.

Netperf test shows almost the same result in stream test, but gets improvements
on TCP_RR tests (both zerocopy or copy) especially on low load cases.

Tested between multiqueue kvm guest and external host with two direct
connected 82599s.

zerocopy disabled:

sessions|transaction rates|normalize|
before/after/+improvements
1 | 9510.24/11727.29/+23.3%    | 693.54/887.68/+28.0%   |
25| 192931.50/241729.87/+25.3% | 2376.80/2771.70/+16.6% |
50| 277634.64/291905.76/+5%    | 3118.36/3230.11/+3.6%  |

zerocopy enabled:

sessions|transaction rates|normalize|
before/after/+improvements
1 | 7318.33/11929.76/+63.0%    | 521.86/843.30/+61.6%   |
25| 167264.88/242422.15/+44.9% | 2181.60/2788.16/+27.8% |
50| 272181.02/294347.04/+8.1%  | 3071.56/3257.85/+6.1%  |

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-11 16:16:22 -04:00
Michael S. Tsirkin
46aa92d1ba vhost/net: fix heads usage of ubuf_info
ubuf info allocator uses guest controlled head as an index,
so a malicious guest could put the same head entry in the ring twice,
and we will get two callbacks on the same value.
To fix use upend_idx which is guaranteed to be unique.

Reported-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-17 14:28:54 -04:00
Jason Wang
2b8b328b61 vhost_net: handle polling errors when setting backend
Currently, the polling errors were ignored, which can lead following issues:

- vhost remove itself unconditionally from waitqueue when stopping the poll,
  this may crash the kernel since the previous attempt of starting may fail to
  add itself to the waitqueue
- userspace may think the backend were successfully set even when the polling
  failed.

Solve this by:

- check poll->wqh before trying to remove from waitqueue
- report polling errors in vhost_poll_start(), tx_poll_start(), the return value
  will be checked and returned when userspace want to set the backend

After this fix, there still could be a polling failure after backend is set, it
will addressed by the next patch.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 15:43:03 -05:00
Jason Wang
692a998b90 vhost_net: correct error handling in vhost_net_set_backend()
Currently, when vhost_init_used() fails the sock refcnt and ubufs were
leaked. Correct this by calling vhost_init_used() before assign ubufs and
restore the oldsock when it fails.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 15:43:03 -05:00
Michael S. Tsirkin
f9611c43ab vhost-net: enable zerocopy tx by default
Zero copy TX has been around for a while now.
We seem to be down to eliminating theoretical bugs
and performance tuning at this point:
it's probably time to enable it by default so that
most users get the benefit.

Keep the flag around meanwhile so users can experiment
with disabling this if they experience regressions.
I expect that we will remove it in the future.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-12-06 17:09:18 +02:00
Michael S. Tsirkin
cedb9bdce0 vhost-net: skip head management if no outstanding
For short packets zerocopy mode adds overhead
of managing heads which isn't necessary: we
could simly update used ring directly
same as with zerocopy disabled.

Things seem to run a bit faster if we detect
and bypass head management when zcopy isn't used.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-12-06 17:09:18 +02:00
Michael S. Tsirkin
1280c27f8e vhost-net: flush outstanding DMAs on memory change
When memory map changes, we need to flush outstanding
DMAs as they might in theory reference old memory addresses.
To do this simply stop initiating new DMAs
and wait for ubufs ref count to drop to 0.
Afterwards reset the count back to 1.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-12-06 17:09:18 +02:00
Michael S. Tsirkin
935cdee7ee vhost: avoid backend flush on vring ops
vring changes already do a flush internally where appropriate, so we do
not need a second flush.

It's currently not very expensive but a follow-up patch makes flush more
heavy-weight, so remove the extra flush here to avoid regressing
performance if call or kick fds are changed on data path.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-12-06 17:09:18 +02:00
Michael S. Tsirkin
64e9a9b8a0 vhost-net: initialize zcopy packet counters
These packet counters are used to drive the zercopy
selection heuristic so nothing too bad happens if they are off a bit -
and they are also reset once in a while.
But it's cleaner to clear them when backend is set so that
we start in a known state.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-03 13:42:17 -05:00
David S. Miller
d4185bbf62 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c

Minor conflict between the BCM_CNIC define removal in net-next
and a bug fix added to net.  Based upon a conflict resolution
patch posted by Stephen Rothwell.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-10 18:32:51 -05:00
Michael S. Tsirkin
24eb21a148 vhost-net: reduce vq polling on tx zerocopy
It seems that to avoid deadlocks it is enough to poll vq before
 we are going to use the last buffer.  This is faster than
c70aa540c7.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-02 21:29:58 -04:00