The sysfs files for virtio produce the wrong format and are missing
the required newline. The output for virtio bus vendor/device should
have the same format as the corresponding entries for PCI devices.
Although this technically changes the ABI for sysfs, these files were
broken to start with!
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We can't rely on indirect buffers for capacity
calculations because they need a memory allocation
which might fail. In particular, virtio_net can get
into this situation under stress, and it drops packets
and performs badly.
So return the number of buffers we can guarantee users.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reported-By: Krishna Kumar2 <krkumar2@in.ibm.com>
virtio ring was changed to return an error code on OOM,
but one caller was missed and still checks for vq->vring.num.
The fix is just to check for <0 error code.
Long term it might make sense to change goto add_head to
just return an error on oom instead, but let's apply
a minimal fix for 2.6.35.
Reported-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Tested-by: Chris Mason <chris.mason@oracle.com>
Cc: stable@kernel.org # .34.x
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
virtio-pci resets the device at startup by writing to the status
register, but this does not clear the pci config space,
specifically msi enable status which affects register
layout.
This breaks things like kdump when they try to use e.g. virtio-blk.
Fix by forcing msi off at startup. Since pci.c already has
a routine to do this, we export and use it instead of duplicating code.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
* 'virtio' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (27 commits)
drivers/char: Eliminate use after free
virtio: console: Accept console size along with resize control message
virtio: console: Store each console's size in the console structure
virtio: console: Resize console port 0 on config intr only if multiport is off
virtio: console: Add support for nonblocking write()s
virtio: console: Rename wait_is_over() to will_read_block()
virtio: console: Don't always create a port 0 if using multiport
virtio: console: Use a control message to add ports
virtio: console: Move code around for future patches
virtio: console: Remove config work handler
virtio: console: Don't call hvc_remove() on unplugging console ports
virtio: console: Return -EPIPE to hvc_console if we lost the connection
virtio: console: Let host know of port or device add failures
virtio: console: Add a __send_control_msg() that can send messages without a valid port
virtio: Revert "virtio: disable multiport console support."
virtio: add_buf_gfp
trans_virtio: use virtqueue_xxx wrappers
virtio-rng: use virtqueue_xxx wrappers
virtio_ring: remove a level of indirection
virtio_net: use virtqueue_xxx wrappers
...
Fix up conflicts in drivers/net/virtio_net.c due to new virtqueue_xxx
wrappers changes conflicting with some other cleanups.
Add an add_buf variant that gets gfp parameter. Use that
to allocate indirect buffers.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We have a single virtqueue_ops implementation,
and it seems unlikely we'll get another one
at this point. So let's remove an unnecessary
level of indirection: it would be very easy to
re-add it if another implementation surfaces.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Switch virtio_balloon to new virtqueue_xxx wrappers.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The virtio balloon driver can dig into the reservation pools of the OS
to satisfy a balloon request. This is not advisable and other balloon
drivers (drivers/xen/balloon.c) avoid this as well.
The patch also adds changes to avoid printing a warning if allocation
fails, since we retry after sometime anyway.
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: kvm <kvm@vger.kernel.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
As all virtio devices perform DMA, we
must enable bus mastering for them to be
spec compliant.
This patch fixes hotplug of virtio devices
with Linux guests and qemu 0.11-0.12.
Tested-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
I have observed the following error on virtio-net module unload:
------------[ cut here ]------------
WARNING: at kernel/irq/manage.c:858 __free_irq+0xa0/0x14c()
Hardware name: Bochs
Trying to free already-free IRQ 0
Modules linked in: virtio_net(-) virtio_blk virtio_pci virtio_ring
virtio af_packet e1000 shpchp aacraid uhci_hcd ohci_hcd ehci_hcd [last
unloaded: scsi_wait_scan]
Pid: 1957, comm: rmmod Not tainted 2.6.33-rc8-vhost #24
Call Trace:
[<ffffffff8103e195>] warn_slowpath_common+0x7c/0x94
[<ffffffff8103e204>] warn_slowpath_fmt+0x41/0x43
[<ffffffff810a7a36>] ? __free_pages+0x5a/0x70
[<ffffffff8107cc00>] __free_irq+0xa0/0x14c
[<ffffffff8107cceb>] free_irq+0x3f/0x65
[<ffffffffa0081424>] vp_del_vqs+0x81/0xb1 [virtio_pci]
[<ffffffffa0091d29>] virtnet_remove+0xda/0x10b [virtio_net]
[<ffffffffa0075200>] virtio_dev_remove+0x22/0x4a [virtio]
[<ffffffff812709ee>] __device_release_driver+0x66/0xac
[<ffffffff81270ab7>] driver_detach+0x83/0xa9
[<ffffffff8126fc66>] bus_remove_driver+0x91/0xb4
[<ffffffff81270fcf>] driver_unregister+0x6c/0x74
[<ffffffffa0075418>] unregister_virtio_driver+0xe/0x10 [virtio]
[<ffffffffa0091c4d>] fini+0x15/0x17 [virtio_net]
[<ffffffff8106997b>] sys_delete_module+0x1c3/0x230
[<ffffffff81007465>] ? old_ich_force_enable_hpet+0x117/0x164
[<ffffffff813bb720>] ? do_page_fault+0x29c/0x2cc
[<ffffffff81028e58>] sysenter_dispatch+0x7/0x27
---[ end trace 15e88e4c576cc62b ]---
The bug is in virtio-pci: we use msix_vector as array index to get irq
entry, but some vqs do not have a dedicated vector so this causes an out
of bounds access. By chance, we seem to often get 0 value, which
results in this error.
Fix by verifying that vector is legal before using it as index.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Anthony Liguori <aliguori@us.ibm.com>
Acked-by: Shirley Ma <xma@us.ibm.com>
Acked-by: Amit Shah <amit.shah@redhat.com>
vq operations depend on vq->data[i] being NULL to figure out if the vq
entry is in use (since the previous patch).
We have to initialize them to NULL to ensure we don't work with junk
data and trigger false BUG_ONs.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Shirley Ma <xma@us.ibm.com>
There's currently no way for a virtio driver to ask for unused
buffers, so it has to keep a list itself to reclaim them at shutdown.
This is redundant, since virtio_ring stores that information. So
add a new hook to do this.
Signed-off-by: Shirley Ma <xma@us.ibm.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
virtio is communicating with a virtual "device" that actually runs on
another host processor. Thus SMP barriers can be used to control
memory access ordering.
Where possible, we should use SMP barriers which are more lightweight than
mandatory barriers, because mandatory barriers also control MMIO effects on
accesses through relaxed memory I/O windows (which virtio does not use)
(compare specifically smp_rmb and rmb on x86_64).
We can't just use smp_mb and friends though, because
we must force memory ordering even if guest is UP since host could be
running on another CPU, but SMP barriers are defined to barrier() in
that configuration. So, for UP fall back to mandatory barriers instead.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
With DEBUG defined, we add an ->in_use flag to detect if the caller
invokes two virtio methods in parallel. The barriers attempt to ensure
timely update of the ->in_use flag.
But they're voodoo: if we need these barriers it implies that the
calling code doesn't have sufficient synchronization to ensure the
code paths aren't invoked at the same time anyway, and we want to
detect it.
Also, adding barriers changes timing, so turning on debug has more
chance of hiding real problems.
Thanks to MST for drawing my attention to this code...
CC: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a fix for my earlier patch: "virtio: Add memory statistics reporting to
the balloon driver (V4)".
I discovered that all_vm_events() can sleep and therefore stats collection
cannot be done in interrupt context. One solution is to handle the interrupt
by noting that stats need to be collected and waking the existing vballoon
kthread which will complete the work via stats_handle_request(). Rusty, is
this a saner way of doing business?
There is one issue that I would like a broader opinion on. In stats_request, I
update vb->need_stats_update and then wake up the kthread. The kthread uses
vb->need_stats_update as a condition variable. Do I need a memory barrier
between the update and wake_up to ensure that my kthread sees the correct
value? My testing suggests that it is not needed but I would like some
confirmation from the experts.
Signed-off-by: Adam Litke <agl@us.ibm.com>
To: Rusty Russell <rusty@rustcorp.com.au>
Cc: Anthony Liguori <aliguori@linux.vnet.ibm.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changes since V3:
- Do not do endian conversions as they will be done in the host
- Report stats that reference a quantity of memory in bytes
- Minor coding style updates
Changes since V2:
- Increase stat field size to 64 bits
- Report all sizes in kb (not pages)
- Drop anon_pages stat and fix endianness conversion
Changes since V1:
- Use a virtqueue instead of the device config space
When using ballooning to manage overcommitted memory on a host, a system for
guests to communicate their memory usage to the host can provide information
that will minimize the impact of ballooning on the guests. The current method
employs a daemon running in each guest that communicates memory statistics to a
host daemon at a specified time interval. The host daemon aggregates this
information and inflates and/or deflates balloons according to the level of
host memory pressure. This approach is effective but overly complex since a
daemon must be installed inside each guest and coordinated to communicate with
the host. A simpler approach is to collect memory statistics in the virtio
balloon driver and communicate them directly to the hypervisor.
This patch enables the guest-side support by adding stats collection and
reporting to the virtio balloon driver.
Signed-off-by: Adam Litke <agl@us.ibm.com>
Cc: Anthony Liguori <anthony@codemonkey.ws>
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (minor fixes)
This is needed to compile with CONFIG_VIRTIO_PCI=y,
because virtio_pci_remove is marked __devexit.
Signed-off-by: Jamie Lokier <jamie@shareable.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fix fixes the following warnings by renaming the driver structures to be
suffixed with _driver.
WARNING: drivers/virtio/virtio_balloon.o(.data+0x88): Section mismatch in reference from the variable virtio_balloon to the function .devexit.text:virtballoon_remove()
WARNING: drivers/char/hw_random/virtio-rng.o(.data+0x88): Section mismatch in reference from the variable virtio_rng to the function .devexit.text:virtrng_remove()
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On SMP guests, reads from the ring might bypass used index reads. This
causes guest crashes because host writes to used index to signal ring
data readiness. Fix this by inserting rmb before used ring reads.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org