Commit Graph

51 Commits

Author SHA1 Message Date
Marcelo Tosatti
84a139a985 virtio: fix suspend when using virtio_balloon
Break out of wait_event_interruptible() if freezing has been requested,
in the vballoon thread. Without this change vballoon refuses to stop and
the system can't suspend.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
2009-04-19 23:14:01 +09:30
Rusty Russell
c5f841f178 virtio: more neatening of virtio_ring macros.
Impact: cleanup

Roel Kluin drew attention to these macros with his patch: here I
neaten them a little further:
1) Add a comment on what START_USE and END_USE are checking,
2) Brackets around _vq in BAD_RING,
3) Neaten formatting for START_USE so it's less than 80 cols.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-30 21:55:23 +10:30
Roel Kluin
3a35ce7dce virtio: fix BAD_RING, START_US and END_USE macros
Impact: cleanup

fix BAD_RING, START_US and END_USE macros

When these macros aren't called with a variable named vq as first
argument, this would result in a build failure.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-30 21:55:22 +10:30
Mark McLoughlin
3fff0179e3 virtio-pci: do not oops on config change if driver not loaded
The host really shouldn't be notifying us of config changes
before the device status is VIRTIO_CONFIG_S_DRIVER or
VIRTIO_CONFIG_S_DRIVER_OK.

However, if we do happen to be interrupted while we're not
attached to a driver, we really shouldn't oops. Prevent
this simply by checking that device->driver is non-NULL
before trying to notify the driver of config changes.

Problem observed by doing a "set_link virtio.0 down" with
QEMU before the net driver had been loaded.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-02 19:17:56 -08:00
Mark McLoughlin
63d1255670 virtio: do not statically allocate root device
We shouldn't be statically allocating the root device object,
so dynamically allocate it using root_device_register()
instead.

Also avoids this warning from 'rmmod virtio_pci':

  Device 'virtio-pci' does not have a release() function, it is broken and must be fixed

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-06 10:44:34 -08:00
Mark McLoughlin
29f9f12ec7 virtio: add PCI device release() function
Add a release() function for virtio_pci devices so as to avoid:

  Device 'virtio0' does not have a release() function, it is broken and must be fixed

Move the code to free the resources associated with the device
from virtio_pci_remove() into this new function. virtio_pci_remove()
now merely unregisters the device which should cause the final
ref to be dropped and virtio_pci_release_dev() to be called.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-12-30 09:26:10 +10:30
Hollis Blanchard
1b4aa2faec virtio: avoid implicit use of Linux page size in balloon interface
Make the balloon interface always use 4K pages, and convert Linux pfns if
necessary. This patch assumes that Linux's PAGE_SHIFT will never be less than
12.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (modified)
2008-12-30 09:26:04 +10:30
Rusty Russell
87c7d57c17 virtio: hand virtio ring alignment as argument to vring_new_virtqueue
This allows each virtio user to hand in the alignment appropriate to
their virtio_ring structures.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
2008-12-30 09:26:03 +10:30
Rusty Russell
498af14783 virtio: Don't use PAGE_SIZE for vring alignment in virtio_pci.
That doesn't work for non-4k guests which are now appearing.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-12-30 09:25:58 +10:30
Rusty Russell
480daab42c virtio: Don't use PAGE_SIZE in virtio_pci.c
The virtio PCI devices don't depend on the guest page size.  This matters
now PowerPC virtio is gaining ground (they like 64k pages).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-12-30 09:25:57 +10:30
Kay Sievers
99e0b6c8e3 virtio: struct device - replace bus_id with dev_name(), dev_set_name()
This patch is part of a larger patch series which will remove
the "char bus_id[20]" name string from struct device. The device
name is managed in the kobject anyway, and without any size
limitation, and just needlessly copied into "struct device".

To set and read the device name dev_name(dev) and dev_set_name(dev)
must be used. If your code uses static kobjects, which it shouldn't
do, "const char *init_name" can be used to statically provide the
name the registered device should have. At registration time, the
init_name field is cleared, to enforce the use of dev_name(dev) to
access the device name at a later time.

We need to get rid of all occurrences of bus_id in the entire tree
to be able to enable the new interface. Please apply this patch,
and possibly convert any remaining remaining occurrences of bus_id.

We want to submit a patch to -next, which will remove bus_id from
"struct device", to find the remaining pieces to convert, and finally
switch over to the new api, which will remove the 20 bytes array
and does no longer have a size limitation.

Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-12-30 09:25:56 +10:30
Hollis Blanchard
13b1eb333b virtio-pci queue allocation not page-aligned
kzalloc() does not guarantee page alignment, and in fact this broke when
I enabled CONFIG_SLUB_DEBUG_ON.

(Thanks to Anthony Liguori for spotting the missing kfree sub)

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (fixed kfree)
Tested-by: Anthony Liguori <aliguori@us.ibm.com>
2008-12-30 09:25:56 +10:30
Anthony Liguori
532a6086e3 virtio_balloon: fix towards_target when deflating balloon
Both v and vb->num_pages are u32 and unsigned int respectively.  If v is less
than vb->num_pages (and it is, when deflating the balloon), the result is a
very large 32-bit number.  Since we're returning a s64, instead of getting the
same negative number we desire, we get a very large positive number.

This handles the case where v < vb->num_pages and ensures we get a small,
negative, s64 as the result.

Rusty: please push this for 2.6.27-rc4.  It's probably appropriate for the
stable tree too as it will cause an unexpected OOM when ballooning.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (simplified)
2008-08-26 00:19:25 +10:00
Rusty Russell
e34f872567 virtio: Add transport feature handling stub for virtio_ring.
To prepare for virtio_ring transport feature bits, hook in a call in
all the users to manipulate them.  This currently just clears all the
bits, since it doesn't understand any features.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-25 12:06:14 +10:00
Rusty Russell
c624896e48 virtio: Rename set_features to finalize_features
Rather than explicitly handing the features to the lower-level, we just
hand the virtio_device and have it set the features.  This make it clear
that it has the chance to manipulate the features of the device at this
point (and that all feature negotiation is already done).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-25 12:06:12 +10:00
Rusty Russell
dd7c7bc462 virtio: Formally reserve bits 28-31 to be 'transport' features.
We assign feature bits as required, but it makes sense to reserve some
for the particular transport, rather than the particular device.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-25 12:06:07 +10:00
Mark McLoughlin
e962fa660d virtio: Use bus_type probe and remove methods
Hook up to the probe() and remove() methods in bus_type
rather than device_driver. The latter has been preferred
since 2.6.16.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-25 12:06:05 +10:00
Rusty Russell
44653eae14 virtio: don't always force a notification when ring is full
We force notification when the ring is full, even if the host has
indicated it doesn't want to know.  This seemed like a good idea at
the time: if we fill the transmit ring, we should tell the host
immediately.

Unfortunately this logic also applies to the receiving ring, which is
refilled constantly.  We should introduce real notification thesholds
to replace this logic.  Meanwhile, removing the logic altogether breaks
the heuristics which KVM uses, so we use a hack: only notify if there are
outgoing parts of the new buffer.

Here are the number of exits with lguest's crappy network implementation:
Before:
	network xmit 7859051 recv 236420
After:
	network xmit 7858610 recv 118136

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-25 12:06:04 +10:00
Mark McLoughlin
b92dea67cc virtio: Complete feature negotation before updating status
lguest (in rusty's use-tun-ringfd patch) assumes that the
guest has updated its feature bits before setting its status
to VIRTIO_CONFIG_S_DRIVER_OK.

That's pretty reasonable, so let's make it so.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-15 13:46:16 -07:00
Rusty Russell
b4f68be6c5 virtio: force callback on empty.
virtio allows drivers to suppress callbacks (ie. interrupts) for
efficiency (no locking, it's just an optimization).

There's a similar mechanism for the host to suppress notifications
coming from the guest: in that case, we ignore the suppression if the
ring is completely full.

It turns out that life is simpler if the host similarly ignores
callback suppression when the ring is completely empty: the network
driver wants to free up old packets in a timely manner, and otherwise
has to use a timer to poll.

We have to remove the code which ignores interrupts when the driver
has disabled them (again, it had no locking and hence was unreliable
anyway).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-30 15:09:46 +10:00
Christian Borntraeger
52a3a05f3a virtio_net: another race with virtio_net and enable_cb
Hello Rusty,

seems that we still have a problem with virtio_net and the enable_cb callback.
During a long running network stress tests with virtio and got the following
oops:

------------[ cut here ]------------
kernel BUG at drivers/virtio/virtio_ring.c:230!
illegal operation: 0001 [#1] SMP
Modules linked in:
CPU: 0 Not tainted 2.6.26-rc2-kvm-00436-gc94c08b-dirty #34
Process netserver (pid: 2582, task: 000000000fbc4c68, ksp: 000000000f42b990)
Krnl PSW : 0704c00180000000 00000000002d0ec8 (vring_enable_cb+0x1c/0x60)
           R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
Krnl GPRS: 0000000000000000 0000000000000000 000000000ef3d000 0000000010009800
           0000000000000000 0000000000419ce0 0000000000000080 000000000000007b
           000000000adb5538 000000000ef40900 000000000ef40000 000000000ef40920
           0000000000000000 0000000000000005 000000000029c1b0 000000000fea7d18
Krnl Code: 00000000002d0ebc: a7110001           tmll    %r1,1
           00000000002d0ec0: a7740004           brc     7,2d0ec8
           00000000002d0ec4: a7f40001           brc     15,2d0ec6
          >00000000002d0ec8: a517fffe           nill    %r1,65534
           00000000002d0ecc: 40103000           sth     %r1,0(%r3)
           00000000002d0ed0: 07f0               bcr     15,%r0
           00000000002d0ed2: e31020380004       lg      %r1,56(%r2)
           00000000002d0ed8: a7480000           lhi     %r4,0
Call Trace:
([<000000000029c0fc>] virtnet_poll+0x290/0x3b8)
 [<0000000000333fb8>] net_rx_action+0x9c/0x1b8
 [<00000000001394bc>] __do_softirq+0x74/0x108
 [<000000000010d16a>] do_softirq+0x92/0xac
 [<0000000000139826>] irq_exit+0x72/0xc8
 [<000000000010a7b6>] do_extint+0xe2/0x104
 [<0000000000110508>] ext_no_vtime+0x16/0x1a
Last Breaking-Event-Address:
 [<00000000002d0ec4>] vring_enable_cb+0x18/0x60

I looked into the virtio_net code for some time and I think the following
scenario happened. Please look at virtnet_poll:
[...]
        /* Out of packets? */
        if (received < budget) {
                netif_rx_complete(vi->dev, napi);
                if (unlikely(!vi->rvq->vq_ops->enable_cb(vi->rvq))
                    && napi_schedule_prep(napi)) {
                        vi->rvq->vq_ops->disable_cb(vi->rvq);
                        __netif_rx_schedule(vi->dev, napi);
                        goto again;
                }
        }

If an interrupt arrives after netif_rx_complete, a second poll routine can run
on a different cpu. The second check for napi_schedule_prep would prevent any
harm in the network stack, but we have called enable_cb possibly after the
disable_cb in skb_recv_done.

static void skb_recv_done(struct virtqueue *rvq)
{
        struct virtnet_info *vi = rvq->vdev->priv;
        /* Schedule NAPI, Suppress further interrupts if successful. */
        if (netif_rx_schedule_prep(vi->dev, &vi->napi)) {
                rvq->vq_ops->disable_cb(rvq);
                __netif_rx_schedule(vi->dev, &vi->napi);
        }
}

That means that the second poll routine runs with interrupts enabled, which is
ok, since we can handle additional interrupts. The problem is now that the
second poll routine might also call enable_cb, triggering the BUG.

The only solution I can come up with, is to remove the BUG statement in
enable_cb - similar to disable_cb. Opinions or better ideas where the oops
could come from?

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-30 15:09:45 +10:00
Rusty Russell
b769f57908 virtio: set device index in common code.
Anthony Liguori points out that three different transports use the virtio code,
but each one keeps its own counter to set the virtio_device's index field.  In
theory (though not in current practice) this means that names could be
duplicated, and that risk grows as more transports are created.

So we move the selection of the unique virtio_device.index into the common code
in virtio.c, which has the side-benefit of removing duplicate code.

The only complexity is that lguest and S/390 use the index to uniquely identify
the device in case of catastrophic failure before register_virtio_device() is
called: now we use the offset within the descriptor page as a unique identifier
for the printks.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chris Lalancette <clalance@redhat.com>
Cc: Anthony Liguori <anthony@codemonkey.ws>
2008-05-30 15:09:42 +10:00
Rusty Russell
5610bd1524 virtio: virtio_pci should not set bus_id.
The common virtio code sets the bus_id, overriding anything virtio_pci
sets anyway.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chris Lalancette <clalance@redhat.com>
Cc: Anthony Liguori <anthony@codemonkey.ws>
2008-05-30 15:09:42 +10:00
Rusty Russell
2ad3cfbac5 virtio: bus_id for devices should contain 'virtio'
Chris Lalancette <clalance@redhat.com> points out that virtio.c sets all device
names to '0', '1', etc, which looks silly in /proc/interrupts.  We change this
from '%d' to 'virtio%d'.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chris Lalancette <clalance@redhat.com>
Cc: Anthony Liguori <anthony@codemonkey.ws>
2008-05-30 15:09:42 +10:00
Rusty Russell
c45a6816c1 virtio: explicit advertisement of driver features
A recent proposed feature addition to the virtio block driver revealed
some flaws in the API: in particular, we assume that feature
negotiation is complete once a driver's probe function returns.

There is nothing in the API to require this, however, and even I
didn't notice when it was violated.

So instead, we require the driver to specify what features it supports
in a table, we can then move the feature negotiation into the virtio
core.  The intersection of device and driver features are presented in
a new 'features' bitmap in the struct virtio_device.

Note that this highlights the difference between Linux unsigned-long
bitmaps where each unsigned long is in native endian, and a
straight-forward little-endian array of bytes.

Drivers can still remove feature bits in their probe routine if they
really have to.

API changes:
- dev->config->feature() no longer gets and acks a feature.
- drivers should advertise their features in the 'feature_table' field
- use virtio_has_feature() for extra sanity when checking feature bits

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-02 21:50:50 +10:00