The code to use the ECTG instruction to calculate the cputime for the
current thread is currently used only for the per-thread CPU-clock
with the clockid -2 (PID=0, VIRT=1). Use the same code for the clockid
CLOCK_THREAD_CPUTIME_ID to speed up the more common clockid as well.
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Switch to the improved update_vsyscall interface that provides
sub-nanosecond precision for gettimeofday and clock_gettime.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Pull KVM fixes from Paolo Bonzini:
"On the x86 side, there are some optimizations and documentation
updates. The big ARM/KVM change for 3.11, support for AArch64, will
come through Catalin Marinas's tree. s390 and PPC have misc cleanups
and bugfixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (87 commits)
KVM: PPC: Ignore PIR writes
KVM: PPC: Book3S PR: Invalidate SLB entries properly
KVM: PPC: Book3S PR: Allow guest to use 1TB segments
KVM: PPC: Book3S PR: Don't keep scanning HPTEG after we find a match
KVM: PPC: Book3S PR: Fix invalidation of SLB entry 0 on guest entry
KVM: PPC: Book3S PR: Fix proto-VSID calculations
KVM: PPC: Guard doorbell exception with CONFIG_PPC_DOORBELL
KVM: Fix RTC interrupt coalescing tracking
kvm: Add a tracepoint write_tsc_offset
KVM: MMU: Inform users of mmio generation wraparound
KVM: MMU: document fast invalidate all mmio sptes
KVM: MMU: document fast invalidate all pages
KVM: MMU: document fast page fault
KVM: MMU: document mmio page fault
KVM: MMU: document write_flooding_count
KVM: MMU: document clear_spte_count
KVM: MMU: drop kvm_mmu_zap_mmio_sptes
KVM: MMU: init kvm generation close to mmio wrap-around value
KVM: MMU: add tracepoint for check_mmio_spte
KVM: MMU: fast invalidate all mmio sptes
...
Copy the interrupt parameters from the lowcore to the pt_regs structure
in entry[64].S and reduce the arguments of the low level interrupt handler
to the pt_regs pointer only. In addition move the test-pending-interrupt
loop from do_IRQ to entry[64].S to make sure that interrupt information
is always delivered via pt_regs.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Lets provide functions to prevent KVM from reentering SIE and
to kick cpus out of SIE. We cannot use the common kvm_vcpu_kick code,
since we need to kick out guests in places that hold architecture
specific locks (e.g. pgste lock) which might be necessary on the
other cpus - so no waiting possible.
So lets provide a bit in a private field of the sie control block
that acts as a gate keeper, after we claimed we are in SIE.
Please note that we do not reuse prog0c, since we want to access
that bit without atomic ops.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Lets track in a private bit if the sie control block is active.
We want to track this as closely as possible, so we also have to
instrument the interrupt and program check handler. Lets use the
existing HANDLE_SIE_INTERCEPT macro.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Add a pointer to the system call table to the thread_info structure.
The TIF_31BIT bit is set or cleared by SET_PERSONALITY exactly once
for the lifetime of a process. With the pointer to the correct system
call table in thread_info the system call code in entry64.S path can
drop the check for TIF_31BIT which saves a couple of instructions.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Allow user-space processes to use transactional execution (TX).
If the TX facility is available user space programs can use
transactions for fine-grained serialization based on the data
objects that are referenced during a transaction. This is
useful for lockless data structures and speculative compiler
optimizations.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The current virtual timer interface is inherently per-cpu and hard to
use. The sole user of the interface is appldata which uses it to execute
a function after a specific amount of cputime has been used over all cpus.
Rework the virtual timer interface to hook into the cputime accounting.
This makes the interface independent from the CPU timer interrupts, and
makes the virtual timers global as opposed to per-cpu.
Overall the code is greatly simplified. The downside is that the accuracy
is not as good as the original implementation, but it is still good enough
for appldata.
Reviewed-by: Jan Glauber <jang@linux.vnet.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Setting the cpu restart parameters is done in three different fashions:
- directly setting the four parameters individually
- copying the four parameters with memcpy (using 4 * sizeof(long))
- copying the four parameters using a private structure
In addition code in entry*.S relies on a certain order of the restart
members of struct _lowcore.
Make all of this more robust to future changes by adding a
mem_absolute_assign(dest, val) define, which assigns val to dest
using absolute addressing mode. Also the load multiple instructions
in entry*.S have been split into separate load instruction so the
order of the struct _lowcore members doesn't matter anymore.
In addition move the prototypes of memcpy_real/absolute from uaccess.h
to processor.h. These memcpy* variants are not related to uaccess at all.
string.h doesn't seem to match as well, so lets use processor.h.
Also replace the eight byte array in struct _lowcore which represents a
misaliged u64 with a u64. The compiler will always create code that
handles the misaligned u64 correctly.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Whenever the cpu loads an enabled wait PSW it will appear as idle to the
underlying host system. The code in default_idle calls vtime_stop_cpu
which does the necessary voodoo to get the cpu time accounting right.
The udelay code just loads an enabled wait PSW. To correct this rework
the vtime_stop_cpu/vtime_start_cpu logic and move the difficult parts
to entry[64].S, vtime_stop_cpu can now be called from anywhere and
vtime_start_cpu is gone. The correction of the cpu time during wakeup
from an enabled wait PSW is done with a critical section in entry[64].S.
As vtime_start_cpu is gone, s390_idle_check can be removed as well.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Define struct pcpu and merge some of the NR_CPUS arrays into it, including
__cpu_logical_map, current_set and smp_cpu_state. Split smp related
functions to those operating on physical cpus and the functions operating
on a logical cpu number. Make the functions for physical cpus use a
pointer to a struct pcpu. This hides the knowledge about cpu addresses in
smp.c, entry[64].S and swsusp_asm64.S, thus remove the sigp.h header.
The PSW restart mechanism is used to start secondary cpus, calling a
function on an online cpu, calling a function on the ipl cpu, and for
the nmi signal. Replace the different assembler functions with a
single function restart_int_handler. The new entry point calls a function
whose pointer is stored in the lowcore of the target cpu and it can wait
for the source cpu to stop. This covers all existing use cases.
Overall the code is now simpler and there are ~380 lines less code.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The 16 bit value at the lowcore location with offset 0x84 is the
cpu address that is associated with an external interrupt. Rename
the field from cpu_addr to ext_cpu_addr to make that clear.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Move the program interruption code and the translation exception identifier
to the pt_regs structure as 'int_code' and 'int_parm_long' and make the
first level interrupt handler in entry[64].S store the two values. That
makes it possible to drop 'prot_addr' and 'trap_no' from the thread_struct
and to reduce the number of arguments to a lot of functions. Finally
un-inline do_trap. Overall this saves 5812 bytes in the .text section of
the 64 bit kernel.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Another round of cleanup for entry[64].S, in particular the program check
handler looks more reasonable now. The code size for the 31 bit kernel
has been reduced by 616 byte and by 528 byte for the 64 bit version.
Even better the code is a bit faster as well.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
There is no reason for the cpu-measurement-facility host id constant to
reside in the lowcore where space is precious. Use an entry in the literal
pool in HANDLE_SIE_INTERCEPT and a stack slot in sie64a.
While we are at it replace the id -1 with 0 to indicate host execution.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
For a ERESTARTNOHAND/ERESTARTSYS/ERESTARTNOINTR restarting system call
do_signal will prepare the restart of the system call with a rewind of
the PSW before calling get_signal_to_deliver (where the debugger might
take control). For A ERESTART_RESTARTBLOCK restarting system call
do_signal will set -EINTR as return code.
There are two issues with this approach:
1) strace never sees ERESTARTNOHAND, ERESTARTSYS, ERESTARTNOINTR or
ERESTART_RESTARTBLOCK as the rewinding already took place or the
return code has been changed to -EINTR
2) if get_signal_to_deliver does not return with a signal to deliver
the restart via the repeat of the svc instruction is left in place.
This opens a race if another signal is made pending before the
system call instruction can be reexecuted. The original system call
will be restarted even if the second signal would have ended the
system call with -EINTR.
These two issues can be solved by dropping the early rewind of the
system call before get_signal_to_deliver has been called and by using
the TIF_RESTART_SVC magic to do the restart if no signal has to be
delivered. The only situation where the system call restart via the
repeat of the svc instruction is appropriate is when a SA_RESTART
signal is delivered to user space.
Unfortunately this breaks inferior calls by the debugger again. The
system call number and the length of the system call instruction is
lost over the inferior call and user space will see ERESTARTNOHAND/
ERESTARTSYS/ERESTARTNOINTR/ERESTART_RESTARTBLOCK. To correct this a
new ptrace interface is added to save/restore the system call number
and system call instruction length.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove the save_area_64 field from the 0xe00 - 0xf00 area in the lowcore.
Use a free slot in the save_area array instead.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
598841ca99 ([S390] use gmap address
spaces for kvm guest images) changed kvm to use a separate address
space for kvm guests. This address space was switched in __vcpu_run
In some cases (preemption, page fault) there is the possibility that
this address space switch is lost.
The typical symptom was a huge amount of validity intercepts or
random guest addressing exceptions.
Fix this by doing the switch in sie_loop and sie_exit and saving the
address space in the gmap structure itself. Also use the preempt
notifier.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Because of readability reasons we ignore the 80 character line limit
in asm offsets. Just one line per define, nothing else.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
With this patch a new S390 shutdown trigger "restart" is added. If under
z/VM "systerm restart" is entered or under the HMC the "PSW restart" button
is pressed, the PSW located at 0 (31 bit) or 0x1a0 (64 bit) bit is loaded.
Now we execute do_restart() that processes the restart action that is
defined under /sys/firmware/shutdown_actions/on_restart. Currently the
following actions are possible: reipl (default), stop, vmcmd, dump, and
dump_reipl.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Add code that allows KVM to control the virtual memory layout that
is seen by a guest. The guest address space uses a second page table
that shares the last level pte-tables with the process page table.
If a page is unmapped from the process page table it is automatically
unmapped from the guest page table as well.
The guest address space mapping starts out empty, KVM can map any
individual 1MB segments from the process virtual memory to any 1MB
aligned location in the guest virtual memory. If a target segment in
the process virtual memory does not exist or is unmapped while a
guest mapping exists the desired target address is stored as an
invalid segment table entry in the guest page table.
The population of the guest page table is fault driven.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
On cpu hot remove a PFAULT CANCEL command is sent to the hypervisor
which in turn will cancel all outstanding pfault requests that have
been issued on that cpu (the same happens with a SIGP cpu reset).
The result is that we end up with uninterruptible processes where
the interrupt that would wake up these processes never arrives.
In order to solve this all processes which wait for a pfault
completion interrupt get woken up after a cpu hot remove. The worst
case that could happen is that they fault again and in turn need to
wait again.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The noexec support on s390 does not rely on a bit in the page table
entry but utilizes the secondary space mode to distinguish between
memory accesses for instructions vs. data. The noexec code relies
on the assumption that the cpu will always use the secondary space
page table for data accesses while it is running in the secondary
space mode. Up to the z9-109 class machines this has been the case.
Unfortunately this is not true anymore with z10 and later machines.
The load-relative-long instructions lrl, lgrl and lgfrl access the
memory operand using the same addressing-space mode that has been
used to fetch the instruction.
This breaks the noexec mode for all user space binaries compiled
with march=z10 or later. The only option is to remove the current
noexec support.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>