Add a new cmpxchg interface:
bool try_cmpxchg(u{8,16,32,64} *ptr, u{8,16,32,64} *val, u{8,16,32,64} new);
Where the boolean returns the result of the compare; and thus if the
exchange happened; and in case of failure, the new value of *ptr is
returned in *val.
This allows simplification/improvement of loops like:
for (;;) {
new = val $op $imm;
old = cmpxchg(ptr, val, new);
if (old == val)
break;
val = old;
}
into:
do {
} while (!try_cmpxchg(ptr, &val, val $op $imm));
while also generating better code (GCC6 and onwards).
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If a PER_CPU struct which contains a spin_lock is statically initialized
via:
DEFINE_PER_CPU(struct foo, bla) = {
.lock = __SPIN_LOCK_UNLOCKED(bla.lock)
};
then lockdep assigns a seperate key to each lock because the logic for
assigning a key to statically initialized locks is to use the address as
the key. With per CPU locks the address is obvioulsy different on each CPU.
That's wrong, because all locks should have the same key.
To solve this the following modifications are required:
1) Extend the is_kernel/module_percpu_addr() functions to hand back the
canonical address of the per CPU address, i.e. the per CPU address
minus the per CPU offset.
2) Check the lock address with these functions and if the per CPU check
matches use the returned canonical address as the lock key, so all per
CPU locks have the same key.
3) Move the static_obj(key) check into look_up_lock_class() so this check
can be avoided for statically initialized per CPU locks. That's
required because the canonical address fails the static_obj(key) check
for obvious reasons.
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ Merged Dan's fixups for !MODULES and !SMP into this patch. ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Murphy <dmurphy@ti.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170227143736.pectaimkjkan5kow@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull s390 fixes from Martin Schwidefsky:
- four patches to get the new cputime code in shape for s390
- add the new statx system call
- a few bug fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: wire up statx system call
KVM: s390: Fix guest migration for huge guests resulting in panic
s390/ipl: always use load normal for CCW-type re-IPL
s390/timex: micro optimization for tod_to_ns
s390/cputime: provide archicture specific cputime_to_nsecs
s390/cputime: reset all accounting fields on fork
s390/cputime: remove last traces of cputime_t
s390: fix in-kernel program checks
s390/crypt: fix missing unlock in ctr_paes_crypt on error path
Pull x86 fixes from Thomas Gleixner:
- a fix for the kexec/purgatory regression which was introduced in the
merge window via an innocent sparse fix. We could have reverted that
commit, but on deeper inspection it turned out that the whole
machinery is neither documented nor robust. So a proper cleanup was
done instead
- the fix for the TLB flush issue which was discovered recently
- a simple typo fix for a reboot quirk
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tlb: Fix tlb flushing when lguest clears PGE
kexec, x86/purgatory: Unbreak it and clean it up
x86/reboot/quirks: Fix typo in ASUS EeeBook X205TA reboot quirk
Pull irq fixes from Thomas Gleixner:
- a workaround for a GIC erratum
- a missing stub function for CONFIG_IRQDOMAIN=n
- fixes for a couple of type inconsistencies
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/crossbar: Fix incorrect type of register size
irqchip/gicv3-its: Add workaround for QDF2400 ITS erratum 0065
irqdomain: Add empty irq_domain_check_msi_remap
irqchip/crossbar: Fix incorrect type of local variables
Fengguang reported random corruptions from various locations on x86-32
after commits d2852a2240 ("arch: add ARCH_HAS_SET_MEMORY config") and
9d876e79df ("bpf: fix unlocking of jited image when module ronx not set")
that uses the former. While x86-32 doesn't have a JIT like x86_64, the
bpf_prog_lock_ro() and bpf_prog_unlock_ro() got enabled due to
ARCH_HAS_SET_MEMORY, whereas Fengguang's test kernel doesn't have module
support built in and therefore never had the DEBUG_SET_MODULE_RONX setting
enabled.
After investigating the crashes further, it turned out that using
set_memory_ro() and set_memory_rw() didn't have the desired effect, for
example, setting the pages as read-only on x86-32 would still let
probe_kernel_write() succeed without error. This behavior would manifest
itself in situations where the vmalloc'ed buffer was accessed prior to
set_memory_*() such as in case of bpf_prog_alloc(). In cases where it
wasn't, the page attribute changes seemed to have taken effect, leading to
the conclusion that a TLB invalidate didn't happen. Moreover, it turned out
that this issue reproduced with qemu in "-cpu kvm64" mode, but not for
"-cpu host". When the issue occurs, change_page_attr_set_clr() did trigger
a TLB flush as expected via __flush_tlb_all() through cpa_flush_range(),
though.
There are 3 variants for issuing a TLB flush: invpcid_flush_all() (depends
on CPU feature bits X86_FEATURE_INVPCID, X86_FEATURE_PGE), cr4 based flush
(depends on X86_FEATURE_PGE), and cr3 based flush. For "-cpu host" case in
my setup, the flush used invpcid_flush_all() variant, whereas for "-cpu
kvm64", the flush was cr4 based. Switching the kvm64 case to cr3 manually
worked fine, and further investigating the cr4 one turned out that
X86_CR4_PGE bit was not set in cr4 register, meaning the
__native_flush_tlb_global_irq_disabled() wrote cr4 twice with the same
value instead of clearing X86_CR4_PGE in the first write to trigger the
flush.
It turned out that X86_CR4_PGE was cleared from cr4 during init from
lguest_arch_host_init() via adjust_pge(). The X86_FEATURE_PGE bit is also
cleared from there due to concerns of using PGE in guest kernel that can
lead to hard to trace bugs (see bff672e630 ("lguest: documentation V:
Host") in init()). The CPU feature bits are cleared in dynamic
boot_cpu_data, but they never propagated to __flush_tlb_all() as it uses
static_cpu_has() instead of boot_cpu_has() for testing which variant of TLB
flushing to use, meaning they still used the old setting of the host
kernel.
Clearing via setup_clear_cpu_cap(X86_FEATURE_PGE) so this would propagate
to static_cpu_has() checks is too late at this point as sections have been
patched already, so for now, it seems reasonable to switch back to
boot_cpu_has(X86_FEATURE_PGE) as it was prior to commit c109bf9599
("x86/cpufeature: Remove cpu_has_pge"). This lets the TLB flush trigger via
cr3 as originally intended, properly makes the new page attributes visible
and thus fixes the crashes seen by Fengguang.
Fixes: c109bf9599 ("x86/cpufeature: Remove cpu_has_pge")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: bp@suse.de
Cc: Kees Cook <keescook@chromium.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: lkp@01.org
Cc: Laura Abbott <labbott@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernrl.org/r/20170301125426.l4nf65rx4wahohyl@wfg-t540p.sh.intel.com
Link: http://lkml.kernel.org/r/25c41ad9eca164be4db9ad84f768965b7eb19d9e.1489191673.git.daniel@iogearbox.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull KVM fixes from Radim Krčmář:
"ARM updates from Marc Zyngier:
- vgic updates:
- Honour disabling the ITS
- Don't deadlock when deactivating own interrupts via MMIO
- Correctly expose the lact of IRQ/FIQ bypass on GICv3
- I/O virtualization:
- Make KVM_CAP_NR_MEMSLOTS big enough for large guests with many
PCIe devices
- General bug fixes:
- Gracefully handle exception generated with syndroms that the host
doesn't understand
- Properly invalidate TLBs on VHE systems
x86:
- improvements in emulation of VMCLEAR, VMX MSR bitmaps, and VCPU
reset
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: nVMX: do not warn when MSR bitmap address is not backed
KVM: arm64: Increase number of user memslots to 512
KVM: arm/arm64: Remove KVM_PRIVATE_MEM_SLOTS definition that are unused
KVM: arm/arm64: Enable KVM_CAP_NR_MEMSLOTS on arm/arm64
KVM: Add documentation for KVM_CAP_NR_MEMSLOTS
KVM: arm/arm64: VGIC: Fix command handling while ITS being disabled
arm64: KVM: Survive unknown traps from guests
arm: KVM: Survive unknown traps from guests
KVM: arm/arm64: Let vcpu thread modify its own active state
KVM: nVMX: reset nested_run_pending if the vCPU is going to be reset
kvm: nVMX: VMCLEAR should not cause the vCPU to shut down
KVM: arm/arm64: vgic-v3: Don't pretend to support IRQ/FIQ bypass
arm64: KVM: VHE: Clear HCR_TGE when invalidating guest TLBs
Pull extable.h fix from Paul Gortmaker:
"Fixup for arch/score after extable.h introduction.
It seems that Guenter is the only one on the planet doing builds for
arch/score -- we don't have compile coverage for it in linux-next or
in the kbuild-bot either. Guenter couldn't even recall where he got
his toolchain, but was kind enough to share it with me so I could
validate this change and also add arch/score to my build coverage.
I sat on this a bit in case there was any other fallout in other arch
dirs, but since this still seems to be the only one, I might as well
send it on its way"
* tag 'extable-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
score: Fix implicit includes now failing build after extable change
Pull random updates from Ted Ts'o:
"Change get_random_{int,log} to use the CRNG used by /dev/urandom and
getrandom(2). It's faster and arguably more secure than cut-down MD5
that we had been using.
Also do some code cleanup"
* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
random: move random_min_urandom_seed into CONFIG_SYSCTL ifdef block
random: convert get_random_int/long into get_random_u32/u64
random: use chacha20 for get_random_int/long
random: fix comment for unused random_min_urandom_seed
random: remove variable limit
random: remove stale urandom_init_wait
random: remove stale maybe_reseed_primary_crng
After changing from module.h to extable.h, score builds fail with:
arch/score/kernel/traps.c: In function 'do_ri':
arch/score/kernel/traps.c:248:4: error: implicit declaration of function 'user_disable_single_step'
arch/score/mm/extable.c: In function 'fixup_exception':
arch/score/mm/extable.c:32:38: error: dereferencing pointer to incomplete type
arch/score/mm/extable.c:34:24: error: dereferencing pointer to incomplete type
because extable.h doesn't drag in the same amount of headers as the
module.h did. Add in the headers which were implicitly expected.
Fixes: 90858794c9 ("module.h: remove extable.h include now users have migrated")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
[PG: tweak commit log; refresh for sched header refactoring.]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>