The SIE instruction on s390 uses the 2nd half of the page table page to
virtualize the storage keys of a guest. This patch offers the s390_enable_sie
function, which reorganizes the page tables of a single-threaded process to
reserve space in the page table:
s390_enable_sie makes sure that the process is single threaded and then uses
dup_mm to create a new mm with reorganized page tables. The old mm is freed
and the process has now a page status extended field after every page table.
Code that wants to exploit pgstes should SELECT CONFIG_PGSTE.
This patch has a small common code hit, namely making dup_mm non-static.
Edit (Carsten): I've modified Martin's patch, following Jeremy Fitzhardinge's
review feedback. Now we do have the prototype for dup_mm in
include/linux/sched.h. Following Martin's suggestion, s390_enable_sie() does now
call task_lock() to prevent race against ptrace modification of mm_users.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This emulates the x86 hardware task switch mechanism in software, as it is
unsupported by either vmx or svm. It allows operating systems which use it,
like freedos, to run as kvm guests.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
When mmu_set_spte() checks if a page related to spte should be release as
dirty or clean, it check if the shadow pte was writeble, but in case
rmap_write_protect() is called called it is possible for shadow ptes that were
writeble to become readonly and therefor mmu_set_spte will release the pages
as clean.
This patch fix this issue by marking the page as dirty inside
rmap_write_protect().
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
If we populate a shadow pte due to a fault (and not speculatively due to a
pte write) then we can set the accessed bit on it, as we know it will be
set immediately on the next guest instruction. This saves a read-modify-write
operation.
Signed-off-by: Avi Kivity <avi@qumranet.com>
include/linux/kvm.h defines struct kvm_dirty_log to
[...]
union {
void __user *dirty_bitmap; /* one bit per page */
__u64 padding;
};
__user requires compiler.h to compile. Currently, this works on x86
only coincidentally due to other include files. This patch makes
kvm.h compile in all cases.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch writes 0 (actually, what really matters is that the
LSB is cleared) to the system time msr before shutting down
the machine for kexec.
Without it, we can have a random memory location being written
when the guest comes back
It overrides the functions shutdown, used in the path of kernel_kexec() (sys.c)
and crash_shutdown, used in the path of crash_kexec() (kexec.c)
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
it will allow external users to call it. It is mainly
useful for routines that will override its machine_ops
field for its own special purposes, but want to call the
normal shutdown routine after they're done
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch a llows machine_crash_shutdown to
be replaced, just like any of the other functions
in machine_ops
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Hypercall based pte updates are faster than faults, and also allow use
of the lazy MMU mode to batch operations.
Don't report the feature if two dimensional paging is enabled.
[avi:
- guest/host split
- fix 32-bit truncation issues
- adjust to mmu_op
- adjust to ->release_*() renamed
- add ->release_pud()]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Hypercall based pte updates are faster than faults, and also allow use
of the lazy MMU mode to batch operations.
Don't report the feature if two dimensional paging is enabled.
[avi:
- one mmu_op hypercall instead of one per op
- allow 64-bit gpa on hypercall
- don't pass host errors (-ENOMEM) to guest]
[akpm: warning fix on i386]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The patch moves the PIT model from userspace to kernel, and increases
the timer accuracy greatly.
[marcelo: make last_injected_time per-guest]
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Tested-and-Acked-by: Alex Davis <alex14641@yahoo.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Most Intel hosts have a stable tsc, and playing with the offset only
reduces accuracy. By limiting tsc offset adjustment only to forward updates,
we effectively disable tsc offset adjustment on these hosts.
Signed-off-by: Avi Kivity <avi@qumranet.com>