Impact: remove lots of lguest boot WARN_ON() when CONFIG_SPARSE_IRQ=y
We now need to call irq_to_desc_alloc_cpu() before
set_irq_chip_and_handler_name(), but we can't do that from init_IRQ (no
kmalloc available).
So do it as we use interrupts instead. Also means we only alloc for
irqs we use, which was the intent of CONFIG_SPARSE_IRQ anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@redhat.com>
Fix a memory leak identified by Rusty Russell during LCA09 by
kfree'ing the lg object instead of just clearing it when the
launcher closes.
Signed-off-by: Mark Wallis <mwallis@serialmonkey.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We shouldn't be statically allocating the root device object,
so dynamically allocate it using root_device_register()
instead.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits)
x86: export vector_used_by_percpu_irq
x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and()
sched: nominate preferred wakeup cpu, fix
x86: fix lguest used_vectors breakage, -v2
x86: fix warning in arch/x86/kernel/io_apic.c
sched: fix warning in kernel/sched.c
sched: move test_sd_parent() to an SMP section of sched.h
sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0
sched: activate active load balancing in new idle cpus
sched: bias task wakeups to preferred semi-idle packages
sched: nominate preferred wakeup cpu
sched: favour lower logical cpu number for sched_mc balance
sched: framework for sched_mc/smt_power_savings=N
sched: convert BALANCE_FOR_xx_POWER to inline functions
x86: use possible_cpus=NUM to extend the possible cpus allowed
x86: fix cpu_mask_to_apicid_and to include cpu_online_mask
x86: update io_apic.c to the new cpumask code
x86: Introduce topology_core_cpumask()/topology_thread_cpumask()
x86: xen: use smp_call_function_many()
x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c
...
Fixed up trivial conflict in kernel/time/tick-sched.c manually
This patch moves the initial guest page table creation code to the host,
so the launcher keeps working with PAE enabled configs.
Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This allows each virtio user to hand in the alignment appropriate to
their virtio_ring structures.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
This doesn't really matter, since lguest is i386 only at the moment,
but we could actually choose a different value. (lguest doesn't have
a guarenteed ABI).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Impact: fix lguest, clean up
32-bit lguest used used_vectors to record vectors, but that model of
allocating vectors changed and got broken, after we changed vector
allocation to a per_cpu array.
Try enable that for 64bit, and the array is used for all vectors that
are not managed by vector_irq per_cpu array.
Also kill system_vectors[], that is now a duplication of the
used_vectors bitmap.
[ merged in cpus4096 due to io_apic.c cpumask changes. ]
[ -v2, fix build failure ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Using a simple page table thrashing program I measure a slight
improvement. The program creates five processes. Each touches 1000
pages then schedules the next process. We repeat this 1000 times. As
lguest only caches 4 cr3 values, this rebuilds a lot of shadow page
tables requiring virt->phys mappings.
Before: 5.93 seconds
After: 5.40 seconds
(Counts of slow vs fastpath in this usage are 6092 and 2852462 respectively.)
And more importantly for lguest, the code is simpler.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
map_switcher allocates the array, unmap_switcher has to free it
accordingly.
Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Ron Minnich noticed that guest userspace gets a GPF when it tries to int3:
we need to copy the privilege level from the guest-supplied IDT to the real
IDT. int3 is the only common case where guest userspace expects to invoke
an interrupt, so that's the symptom of failing to do this.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
To prepare for virtio_ring transport feature bits, hook in a call in
all the users to manipulate them. This currently just clears all the
bits, since it doesn't understand any features.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rather than explicitly handing the features to the lower-level, we just
hand the virtio_device and have it set the features. This make it clear
that it has the chance to manipulate the features of the device at this
point (and that all feature negotiation is already done).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I am able to reproduce the oops reported by Simon in __switch_to() with
lguest.
My debug showed that there is at least one lguest specific
issue (which should be present in 2.6.25 and before aswell) and it got
exposed with a kernel oops with the recent fpu dynamic allocation patches.
In addition to the previous possible scenario (with fpu_counter), in the
presence of lguest, it is possible that the cpu's TS bit it still set and the
lguest launcher task's thread_info has TS_USEDFPU still set.
This is because of the way the lguest launcher handling the guest's TS bit.
(look at lguest_set_ts() in lguest_arch_run_guest()). This can result
in a DNA fault while doing unlazy_fpu() in __switch_to(). This will
end up causing a DNA fault in the context of new process thats
getting context switched in (as opossed to handling DNA fault in the context
of lguest launcher/helper process).
This is wrong in both pre and post 2.6.25 kernels. In the recent
2.6.26-rc series, this is showing up as NULL pointer dereferences or
sleeping function called from atomic context(__switch_to()), as
we free and dynamically allocate the FPU context for the newly
created threads. Older kernels might show some FPU corruption for processes
running inside of lguest.
With the appended patch, my test system is running for more than 50 mins
now. So atleast some of your oops (hopefully all!) should get fixed.
Please give it a try. I will spend more time with this fix tomorrow.
Reported-by: Simon Holm Thøgersen <odie@cs.aau.dk>
Reported-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Anthony Liguori points out that three different transports use the virtio code,
but each one keeps its own counter to set the virtio_device's index field. In
theory (though not in current practice) this means that names could be
duplicated, and that risk grows as more transports are created.
So we move the selection of the unique virtio_device.index into the common code
in virtio.c, which has the side-benefit of removing duplicate code.
The only complexity is that lguest and S/390 use the index to uniquely identify
the device in case of catastrophic failure before register_virtio_device() is
called: now we use the offset within the descriptor page as a unique identifier
for the printks.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chris Lalancette <clalance@redhat.com>
Cc: Anthony Liguori <anthony@codemonkey.ws>
Thanks to Jon Corbet & LWN. Only took me a day to join the dots.
Host->Guest netcat before (with unnecessily large receive buffers):
1073741824 bytes (1.1 GB) copied, 24.7528 seconds, 43.4 MB/s
After:
1073741824 bytes (1.1 GB) copied, 17.6369 seconds, 60.9 MB/s
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>