Commit Graph

61269 Commits

Author SHA1 Message Date
Christian Borntraeger 388186bc92 [S390] kvm: Handle diagnose 0x10 (release pages)
Linux on System z uses a ballooner based on diagnose 0x10. (aka as
collaborative memory management). This patch implements diagnose
0x10 on the guest address space.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:45 +01:00
Carsten Otte 499069e1a4 [S390] take mmap_sem when walking guest page table
gmap_fault needs to walk the guest page table. However, parts of
that may change if some other thread does munmap. In that case
gmap_unmap_notifier will also unmap the corresponding parts from
the guest page table. We need to take mmap_sem in order to serialize
these operations.
do_exception now calls __gmap_fault with mmap_sem held which does
not get exported to modules. The exported function, which is called
from KVM, now takes mmap_sem.

Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:45 +01:00
Carsten Otte cc772456ac [S390] fix list corruption in gmap reverse mapping
This introduces locking via mm->page_table_lock to protect
the rmap list for guest mappings from being corrupted by concurrent
operations.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:44 +01:00
Carsten Otte a9162f238a [S390] fix possible deadlock in gmap_map_segment
Fix possible deadlock reported by lockdep:
qemu-system-s39/2963 is trying to acquire lock:
(&mm->mmap_sem){++++++}, at: gmap_alloc_table+0x9c/0x120
but task is already holding lock:
(&mm->mmap_sem){++++++}, at: gmap_map_segment+0xa6/0x27c

Actually gmap_alloc_table is the only called in gmap_map_segment with
mmap_sem held, thus it's safe to simply remove the inner lock.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:44 +01:00
Carsten Otte 69ba974366 [S390] load user asce on sie_fault
On sie_fault we need to switch back to user ASCE. Otherwise we get
interresting effects when exiting to "userspace" while the guest
space is still active.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:44 +01:00
Martin Schwidefsky d98e19ccef [S390] smp: external call vs. emergency signal
Use a sigp sense running to decide which signal processor order to use
for an ipi. If the target cpu is running use external call, if the target
cpu is not running use emergency signal.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:44 +01:00
Sebastian Ott 65b4e403ac [S390] chsc_sch: add support for irq statistics
Add support for CHSC I/O interrupt statistics in /proc/interrupts.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:44 +01:00
Martin Schwidefsky d4e81b35b8 [S390] allow all addressing modes
The user space program can change its addressing mode between the
24-bit, 31-bit and the 64-bit mode if the kernel is 64 bit. Currently
the kernel always forces the standard amode on signal delivery and
signal return and on ptrace: 64-bit for a 64-bit process, 31-bit for
a compat process and 31-bit kernels. Change the signal and ptrace code
to allow the full range of addressing modes. Signal handlers are
run in the standard addressing mode for the process.

One caveat is that even an 31-bit compat process can switch to the
64-bit mode. The next signal will switch back into the 31-bit mode
and there is no room in the 31-bit compat signal frame to store the
information that the program came from the 64-bit mode.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:43 +01:00
Martin Schwidefsky b50511e41a [S390] cleanup psw related bits and pieces
Split out addressing mode bits from PSW_BASE_BITS, rename PSW_BASE_BITS
to PSW_MASK_BASE, get rid of psw_user32_bits, remove unused function
enabled_wait(), introduce PSW_MASK_USER, and drop PSW_MASK_MERGE macros.
Change psw_kernel_bits / psw_user_bits to contain only the bits that
are always set in the respective mode.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:43 +01:00
Martin Schwidefsky b6ef5bb3d9 [S390] add TIF_SYSCALL thread flag
Add an explicit TIF_SYSCALL bit that indicates if a task is inside
a system call. The svc_code in the pt_regs structure is now only
valid if TIF_SYSCALL is set. With this definition TIF_RESTART_SVC
can be replaced with TIF_SYSCALL. Overall do_signal is a bit more
readable and it saves a few lines of code.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:43 +01:00
Martin Schwidefsky ccf45cafb0 [S390] addressing mode limits and psw address wrapping
An instruction with an address right below the adress limit for the
current addressing mode will wrap. The instruction restart logic in
the protection fault handler and the signal code need to follow the
wrapping rules to find the correct instruction address.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:43 +01:00
Martin Schwidefsky 20b40a794b [S390] signal race with restarting system calls
For a ERESTARTNOHAND/ERESTARTSYS/ERESTARTNOINTR restarting system call
do_signal will prepare the restart of the system call with a rewind of
the PSW before calling get_signal_to_deliver (where the debugger might
take control). For A ERESTART_RESTARTBLOCK restarting system call
do_signal will set -EINTR as return code.
There are two issues with this approach:
1) strace never sees ERESTARTNOHAND, ERESTARTSYS, ERESTARTNOINTR or
   ERESTART_RESTARTBLOCK as the rewinding already took place or the
   return code has been changed to -EINTR
2) if get_signal_to_deliver does not return with a signal to deliver
   the restart via the repeat of the svc instruction is left in place.
   This opens a race if another signal is made pending before the
   system call instruction can be reexecuted. The original system call
   will be restarted even if the second signal would have ended the
   system call with -EINTR.

These two issues can be solved by dropping the early rewind of the
system call before get_signal_to_deliver has been called and by using
the TIF_RESTART_SVC magic to do the restart if no signal has to be
delivered. The only situation where the system call restart via the
repeat of the svc instruction is appropriate is when a SA_RESTART
signal is delivered to user space.

Unfortunately this breaks inferior calls by the debugger again. The
system call number and the length of the system call instruction is
lost over the inferior call and user space will see ERESTARTNOHAND/
ERESTARTSYS/ERESTARTNOINTR/ERESTART_RESTARTBLOCK. To correct this a
new ptrace interface is added to save/restore the system call number
and system call instruction length.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:43 +01:00
Hendrik Brueckner 3ee49c8f12 [S390] defconfig: switch on CONFIG_DEVTMPFS
Switching on the DEVTMPFS kernel option helpes to maintain a /dev
file system early in the boot process, especially, in limited
environments.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:42 +01:00
Martin Schwidefsky 0edc8faa76 [S390] lowcore cleanup
Remove the save_area_64 field from the 0xe00 - 0xf00 area in the lowcore.
Use a free slot in the save_area array instead.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:42 +01:00
Michael Holzheu dab7a7b153 [S390] Add architecture code for unmapping crashkernel memory
This patch implements the crash_map_pages() function for s390.
KEXEC_CRASH_MEM_ALIGN is set to HPAGE_SIZE, in order to support
kernel mappings that use large pages. We also use HPAGE_SIZE alignment
for CONFIG_HUGETLB_PAGE=n in order to have the same 1 MiB alignment on
all s390 systems.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:42 +01:00
Michael Holzheu d38593f938 [S390] Export vmcoreinfo note
This patch defines for s390 an ABI defined pointer to the vmcoreinfo note at
a well known address. With this patch tools are able to find this information
in dumps created by stand-alone or hypervisor dump tools.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:42 +01:00
Michael Holzheu 60a0c68df2 [S390] kdump backend code
This patch provides the architecture specific part of the s390 kdump
support.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:42 +01:00
Michael Holzheu 7f0bf656c6 [S390] Add real memory access functions
Add access function for real memory needed by s390 kdump backend.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:42 +01:00
Michael Holzheu 1943f53c9c [S390] Force PSW restart on online CPU
PSW restart can be triggered on offline CPUs. If this happens, currently
the PSW restart code fails, because functions like smp_processor_id()
do not work on offline CPUs. This patch fixes this as follows:

If PSW restart is triggered on an offline CPU, the PSW restart (sigp restart)
is done a second time on another CPU that is online and the old CPU is
stopped afterwards.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:41 +01:00
Tejun Heo 80853a8ab4 [S390] fix _TIF_SINGLE_STEP definition
_TIF_SINGLE_STEP is incorrectly defined as 1<<TIF_FREEZE.  Fix it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:41 +01:00
Jan Glauber 017ec18360 [S390] use ENTRY macro for sys_setns_wrapper
Use the ENTRY macro for the system call wrapper sys_setns_wrapper
similarly to the other wrappers.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:16 +01:00
Martin Schwidefsky e73b7fffe4 [S390] memory leak with RCU_TABLE_FREE
The rcu page table free code uses a couple of bits in the page table
pointer passed to tlb_remove_table to discern the different page table
types. __tlb_remove_table extracts the type with an incorrect mask which
leads to memory leaks. The correct mask is ((FRAG_MASK << 4) | FRAG_MASK).

Cc: stable@kernel.org
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:15 +01:00
Martin Schwidefsky a45aff5285 [S390] user per registers vs. ptrace single stepping
git commit 5e9a2692 "[S390] ptrace cleanup" introduced a regression
for the case when both a user PER set (e.g. a storage alteration trace) and
PTRACE_SINGLESTEP are active. The new code will overrule the user PER set
with a instruction-fetch PER set over the whole address space for ptrace
single stepping. The inferior process will be stopped after each instruction
with an instruction fetch event. Any other events that may have occurred
concurrently are not reported (e.g. storage alteration event) because the
control bits for them are not set. The solution is to merge the PER control
bits of the user PER set with the PER_EVENT_IFETCH control bit for
PTRACE_SINGLESTEP.

Cc: stable@kernel.org
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:15 +01:00
Sebastian Ott caa04f69df [S390] topology: fix alloc_masks annotation
Fix this warning:
WARNING: vmlinux.o(.text+0x199b6): Section mismatch in reference from
the function alloc_masks() to the function .init.text:__alloc_bootmem()

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:15 +01:00
Martin Schwidefsky dd4a5a31fc [S390] avoid warning in show_cpuinfo
The .start function and indirectly the .next function of the show_cpuinfo
sequential operation uses NR_CPUS as limit instead of nr_cpu_ids.
This can cause warnings like this:

WARNING: at /usr/src/linux/include/linux/cpumask.h:107
Process lscpu (pid: 575, task: 000000007deb4338, ksp: 000000007794f588)
Krnl PSW : 0704000180000000 0000000000106db4 (show_cpuinfo+0x108/0x234)
           R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 EA:3
Krnl GPRS: 0000000000000003 0000000000791988 000000000071b478 0000000000000004
           0000000000000001 0000000000000000 000000007d139500 0000000000000400
           0000000000000000 000000000070e24c 000000007d48d600 0000000000000005
           000000007d48d600 00000000004dfa10 0000000000106cf8 000000007794fcc0
Krnl Code: 0000000000106da8: 95001000           cli     0(%r1),0
           0000000000106dac: a774ffac           brc     7,106d04
           0000000000106db0: a7f40001           brc     15,106db2
          >0000000000106db4: 92011000           mvi     0(%r1),1
           0000000000106db8: a7f4ffa6           brc     15,106d04
           0000000000106dbc: c0e5000065b4       brasl   %r14,113924
           0000000000106dc2: c09000303a45       larl    %r9,70e24c
           0000000000106dc8: c020001eefd4       larl    %r2,4e4d70

Replacing NR_CPUS with nr_cpu_ids fixes it.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:15 +01:00