Commit Graph

165 Commits

Author SHA1 Message Date
Rusty Russell 1842f23c05 lguest and virtio: cleanup struct definitions to Linux style.
I've been doing this for years, and akpm picked me up on it about 12
months ago.  lguest partly serves as example code, so let's do it Right.

Also, remove two unused fields in struct vblk_info in the example launcher.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@redhat.com>
2009-07-30 16:03:46 +09:30
Rusty Russell a91d74a3c4 lguest: update commentry
Every so often, after code shuffles, I need to go through and unbitrot
the Lguest Journey (see drivers/lguest/README).  Since we now use RCU in
a simple form in one place I took the opportunity to expand that explanation.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
2009-07-30 16:03:46 +09:30
Rusty Russell 2e04ef7691 lguest: fix comment style
I don't really notice it (except to begrudge the extra vertical
space), but Ingo does.  And he pointed out that one excuse of lguest
is as a teaching tool, it should set a good example.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@redhat.com>
2009-07-30 16:03:45 +09:30
Dan Carpenter f294526279 lguest: dereferencing freed mem in add_eventfd()
"new" was freed and then dereferenced.  Also the return value wasn't being
used so I modified the caller as well.

Compile tested only.  Found by smatch (http://repo.or.cz/w/smatch.git).

regards,
dan carpenter

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-07-30 16:03:43 +09:30
Davide Libenzi 27de22d03d lguest: remove unnecessary forward struct declaration
While fixing lg.h to drop the fwd declaration, I noticed
there's another one ;)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-07-17 21:47:44 +09:30
Davide Libenzi 133890103b eventfd: revised interface and cleanups
Change the eventfd interface to de-couple the eventfd memory context, from
the file pointer instance.

Without such change, there is no clean way to racely free handle the
POLLHUP event sent when the last instance of the file* goes away.  Also,
now the internal eventfd APIs are using the eventfd context instead of the
file*.

This patch is required by KVM's IRQfd code, which is still under
development.

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Gregory Haskins <ghaskins@novell.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Avi Kivity <avi@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-30 18:55:58 -07:00
Linus Torvalds 7f3591cfac Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-lguest
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-lguest: (31 commits)
  lguest: add support for indirect ring entries
  lguest: suppress notifications in example Launcher
  lguest: try to batch interrupts on network receive
  lguest: avoid sending interrupts to Guest when no activity occurs.
  lguest: implement deferred interrupts in example Launcher
  lguest: remove obsolete LHREQ_BREAK call
  lguest: have example Launcher service all devices in separate threads
  lguest: use eventfds for device notification
  eventfd: export eventfd_signal and eventfd_fget for lguest
  lguest: allow any process to send interrupts
  lguest: PAE fixes
  lguest: PAE support
  lguest: Add support for kvm_hypercall4()
  lguest: replace hypercall name LHCALL_SET_PMD with LHCALL_SET_PGD
  lguest: use native_set_* macros, which properly handle 64-bit entries when PAE is activated
  lguest: map switcher with executable page table entries
  lguest: fix writev returning short on console output
  lguest: clean up length-used value in example launcher
  lguest: Segment selectors are 16-bit long. Fix lg_cpu.ss1 definition.
  lguest: beyond ARRAY_SIZE of cpu->arch.gdt
  ...
2009-06-12 09:32:26 -07:00
Rusty Russell 5dac051bc6 lguest: remove obsolete LHREQ_BREAK call
We no longer need an efficient mechanism to force the Guest back into
host userspace, as each device is serviced without bothering the main
Guest process (aka. the Launcher).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 22:27:11 +09:30
Rusty Russell df60aeef4f lguest: use eventfds for device notification
Currently, when a Guest wants to perform I/O it calls LHCALL_NOTIFY with
an address: the main Launcher process returns with this address, and figures
out what device to run.

A far nicer model is to let processes bind an eventfd to an address: if we
find one, we simply signal the eventfd.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Davide Libenzi <davidel@xmailserver.org>
2009-06-12 22:27:10 +09:30
Rusty Russell 9f155a9b3d lguest: allow any process to send interrupts
We currently only allow the Launcher process to send interrupts, but it
as we already send interrupts from the hrtimer, it's a simple matter of
extracting that code into a common set_interrupt routine.

As we switch to a thread per virtqueue, this avoids a bottleneck through the
main Launcher process.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 22:27:09 +09:30
Rusty Russell 92b4d8df84 lguest: PAE fixes
1) j wasn't initialized in setup_pagetables, so they weren't set up for me
   causing immediate guest crashes.

2) gpte_addr should not re-read the pmd from the Guest.  Especially
   not BUG_ON() based on the value.  If we ever supported SMP guests,
   they could trigger that.  And the Launcher could also trigger it
   (tho currently root-only).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 22:27:08 +09:30
Matias Zabaljauregui acdd0b6292 lguest: PAE support
This version requires that host and guest have the same PAE status.
NX cap is not offered to the guest, yet.

Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 22:27:08 +09:30
Matias Zabaljauregui ebe0ba84f5 lguest: replace hypercall name LHCALL_SET_PMD with LHCALL_SET_PGD
replace LHCALL_SET_PMD with LHCALL_SET_PGD hypercall name
(That's really what it is, and the confusion gets worse with PAE support)

Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reported-by: Jeremy Fitzhardinge <jeremy@goop.org>
2009-06-12 22:27:07 +09:30
Matias Zabaljauregui 90603d15fa lguest: use native_set_* macros, which properly handle 64-bit entries when PAE is activated
Some cleanups and replace direct assignment with native_set_* macros which properly handle 64-bit entries when PAE is activated

Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 22:27:06 +09:30
Matias Zabaljauregui ed1dc77810 lguest: map switcher with executable page table entries
Map switcher with executable page table entries.
(This bug didn't matter before PAE and hence NX support -- RR)

Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 22:27:06 +09:30
Matias Zabaljauregui f086122bb6 lguest: Segment selectors are 16-bit long. Fix lg_cpu.ss1 definition.
If GDT_ENTRIES were every > 256, this could become a problem.

Signed-off-by: Matias Zabaljauregui <zabaljauregui at gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 22:27:04 +09:30
Roel Kluin 81b79b01d0 lguest: beyond ARRAY_SIZE of cpu->arch.gdt
Do not go beyond ARRAY_SIZE of cpu->arch.gdt

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 22:27:04 +09:30
Rusty Russell a32a8813d0 lguest: improve interrupt handling, speed up stream networking
lguest never checked for pending interrupts when enabling interrupts, and
things still worked.  However, it makes a significant difference to TCP
performance, so it's time we fixed it by introducing a pending_irq flag
and checking it on irq_restore and irq_enable.

These two routines are now too big to patch into the 8/10 bytes
patch space, so we drop that code.

Note: The high latency on interrupt delivery had a very curious
effect: once everything else was optimized, networking without GSO was
faster than networking with GSO, since more interrupts were sent and
hence a greater chance of one getting through to the Guest!

Note2: (Almost) Closing the same loophole for iret doesn't have any
measurable effect, so I'm leaving that patch for the moment.

Before:
	1GB tcpblast Guest->Host:		30.7 seconds
	1GB tcpblast Guest->Host (no GSO):	76.0 seconds

After:
	1GB tcpblast Guest->Host:		6.8 seconds
	1GB tcpblast Guest->Host (no GSO):	27.8 seconds

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 22:27:03 +09:30
Rusty Russell abd41f037e lguest: fix race in halt code
When the Guest does the LHCALL_HALT hypercall, we go to sleep, expecting
that a timer or the Waker will wake_up_process() us.

But we do it in a stupid way, leaving a classic missing wakeup race.

So split maybe_do_interrupt() into interrupt_pending() and
try_deliver_interrupt(), and check maybe_do_interrupt() and the
"break_out" flag before calling schedule.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 22:27:02 +09:30
Rusty Russell a6c372de6e lguest: fix lguest wake on guest clock tick, or fd activity
The Launcher could be inside the Guest on another CPU; wake_up_process
will do nothing because it is "running".  kick_process will knock it
back into our kernel in this case, otherwise we'll miss it until the
next guest exit.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 22:27:01 +09:30
Michael S. Tsirkin d2a7ddda9f virtio: find_vqs/del_vqs virtio operations
This replaces find_vq/del_vq with find_vqs/del_vqs virtio operations,
and updates all drivers. This is needed for MSI support, because MSI
needs to know the total number of vectors upfront.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (+ lguest/9p compile fixes)
2009-06-12 22:16:36 +09:30
Rusty Russell 9499f5e7ed virtio: add names to virtqueue struct, mapping from devices to queues.
Add a linked list of all virtqueues for a virtio device: this helps for
debugging and is also needed for upcoming interface change.

Also, add a "name" field for clearer debug messages.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 22:16:36 +09:30
Rusty Russell 564346224d lguest: fix on Intel when KVM loaded (unhandled trap 13)
When KVM is loaded, and hence VT set up, the vmcall instruction in an
lguest guest causes a #GP, not #UD.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-26 12:13:11 -07:00
Rusty Russell a489f0b555 lguest: fix guest crash on non-linear addresses in gdt pvops
Fixes guest crash 'lguest: bad read address 0x4800000 len 256'

The new per-cpu allocator ends up handing a non-linear address to
write_gdt_entry.  We do __pa() on it, and hand it to the host, which
kills us.

I've long wanted to make the hypercall "LOAD_GDT_ENTRY" to match the IDT
code, but had no pressing reason until now.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: lguest@ozlabs.org
2009-04-19 23:14:01 +09:30
Matias Zabaljauregui 88df781afb lguest: fix crash on vmlinux images
Typical message: 'lguest: unhandled trap 6 at 0x418726 (0x0)'

vmlinux guests were broken by 4cd8b5e2a1
'lguest: use KVM hypercalls', which rewrites guest text from kvm hypercalls
to trap 31.

The Launcher mmaps the kernel image.  The Guest executes and
immediately faults in the first text page (read-only).  Then it hits a
hypercall, and we rewrite that hypercall, causing a copy-on-write.
But the Guest pagetables still refer to the old page: we fault again,
but as Host we see the hypercall already rewritten, and pass the fault
back to the Guest.  The Guest hasn't set up an IDT yet, so we kill it.

This doesn't happen with bzImages: they unpack themselves and so the
text pages are already read-write.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Tested-by: Patrick McHardy <kaber@trash.net>
2009-04-19 23:14:00 +09:30