When disabling LPIs (for example on reset) at the redistributor
level, it is expected that LPIs that was pending in the CPU
interface are eventually retired.
Currently, this is not what is happening, and these LPIs will
stay in the ap_list, eventually being acknowledged by the vcpu
(which didn't quite expect this behaviour).
The fix is thus to retire these LPIs from the list of pending
interrupts as we disable LPIs.
Reported-by: Heyi Guo <guoheyi@huawei.com>
Tested-by: Heyi Guo <guoheyi@huawei.com>
Fixes: 0e4e82f154 ("KVM: arm64: vgic-its: Enable ITS emulation as a virtual MSI controller")
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Recently the generic timer test of kvm-unit-tests failed to complete
(stalled) when a physical timer is being used. This issue is caused
by incorrect update of CNTP_CVAL when CNTP_TVAL is being accessed,
introduced by 'Commit 84135d3d18 ("KVM: arm/arm64: consolidate arch
timer trap handlers")'. According to Arm ARM, the read/write behavior
of accesses to the TVAL registers is expected to be:
* READ: TimerValue = (CompareValue – (Counter - Offset)
* WRITE: CompareValue = ((Counter - Offset) + Sign(TimerValue)
This patch fixes the TVAL read/write code path according to the
specification.
Fixes: 84135d3d18 ("KVM: arm/arm64: consolidate arch timer trap handlers")
Signed-off-by: Wei Huang <wei@redhat.com>
[maz: commit message tidy-up]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Some comments in virt/kvm/arm/mmu.c are outdated. Update them to
reflect the current state of the code.
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
[maz: commit message tidy-up]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Fix sparse warnings:
arch/arm64/kvm/../../../virt/kvm/arm/vgic/vgic-its.c:1732:5: warning:
symbol 'vgic_its_has_attr_regs' was not declared. Should it be static?
arch/arm64/kvm/../../../virt/kvm/arm/vgic/vgic-its.c:1753:5: warning:
symbol 'vgic_its_attr_regs_access' was not declared. Should it be static?
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
[maz: fixed subject]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
We rely on the mmu_notifier call backs to handle the split/merge
of huge pages and thus we are guaranteed that, while creating a
block mapping, either the entire block is unmapped at stage2 or it
is missing permission.
However, we miss a case where the block mapping is split for dirty
logging case and then could later be made block mapping, if we cancel the
dirty logging. This not only creates inconsistent TLB entries for
the pages in the the block, but also leakes the table pages for
PMD level.
Handle this corner case for the huge mappings at stage2 by
unmapping the non-huge mapping for the block. This could potentially
release the upper level table. So we need to restart the table walk
once we unmap the range.
Fixes : ad361f093c ("KVM: ARM: Support hugetlbfs backed huge pages")
Reported-by: Zheng Xiang <zhengxiang9@huawei.com>
Cc: Zheng Xiang <zhengxiang9@huawei.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
commit 6794ad5443 ("KVM: arm/arm64: Fix unintended stage 2 PMD mappings")
made the checks to skip huge mappings, stricter. However it introduced
a bug where we still use huge mappings, ignoring the flag to
use PTE mappings, by not reseting the vma_pagesize to PAGE_SIZE.
Also, the checks do not cover the PUD huge pages, that was
under review during the same period. This patch fixes both
the issues.
Fixes : 6794ad5443 ("KVM: arm/arm64: Fix unintended stage 2 PMD mappings")
Reported-by: Zenghui Yu <yuzenghui@huawei.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
When halting a guest, QEMU flushes the virtual ITS caches, which
amounts to writing to the various tables that the guest has allocated.
When doing this, we fail to take the srcu lock, and the kernel
shouts loudly if running a lockdep kernel:
[ 69.680416] =============================
[ 69.680819] WARNING: suspicious RCU usage
[ 69.681526] 5.1.0-rc1-00008-g600025238f51-dirty #18 Not tainted
[ 69.682096] -----------------------------
[ 69.682501] ./include/linux/kvm_host.h:605 suspicious rcu_dereference_check() usage!
[ 69.683225]
[ 69.683225] other info that might help us debug this:
[ 69.683225]
[ 69.683975]
[ 69.683975] rcu_scheduler_active = 2, debug_locks = 1
[ 69.684598] 6 locks held by qemu-system-aar/4097:
[ 69.685059] #0: 0000000034196013 (&kvm->lock){+.+.}, at: vgic_its_set_attr+0x244/0x3a0
[ 69.686087] #1: 00000000f2ed935e (&its->its_lock){+.+.}, at: vgic_its_set_attr+0x250/0x3a0
[ 69.686919] #2: 000000005e71ea54 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[ 69.687698] #3: 00000000c17e548d (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[ 69.688475] #4: 00000000ba386017 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[ 69.689978] #5: 00000000c2c3c335 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[ 69.690729]
[ 69.690729] stack backtrace:
[ 69.691151] CPU: 2 PID: 4097 Comm: qemu-system-aar Not tainted 5.1.0-rc1-00008-g600025238f51-dirty #18
[ 69.691984] Hardware name: rockchip evb_rk3399/evb_rk3399, BIOS 2019.04-rc3-00124-g2feec69fb1 03/15/2019
[ 69.692831] Call trace:
[ 69.694072] lockdep_rcu_suspicious+0xcc/0x110
[ 69.694490] gfn_to_memslot+0x174/0x190
[ 69.694853] kvm_write_guest+0x50/0xb0
[ 69.695209] vgic_its_save_tables_v0+0x248/0x330
[ 69.695639] vgic_its_set_attr+0x298/0x3a0
[ 69.696024] kvm_device_ioctl_attr+0x9c/0xd8
[ 69.696424] kvm_device_ioctl+0x8c/0xf8
[ 69.696788] do_vfs_ioctl+0xc8/0x960
[ 69.697128] ksys_ioctl+0x8c/0xa0
[ 69.697445] __arm64_sys_ioctl+0x28/0x38
[ 69.697817] el0_svc_common+0xd8/0x138
[ 69.698173] el0_svc_handler+0x38/0x78
[ 69.698528] el0_svc+0x8/0xc
The fix is to obviously take the srcu lock, just like we do on the
read side of things since bf308242ab. One wonders why this wasn't
fixed at the same time, but hey...
Fixes: bf308242ab ("KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock")
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The normal interrupt flow is not to enable the vgic when no virtual
interrupt is to be injected (i.e. the LRs are empty). But when a guest
is likely to use GICv4 for LPIs, we absolutely need to switch it on
at all times. Otherwise, VLPIs only get delivered when there is something
in the LRs, which doesn't happen very often.
Reported-by: Nianyao Tang <tangnianyao@huawei.com>
Tested-by: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
We've become very cautious to now always reset the vcpu when nothing
is loaded on the physical CPU. To do so, we now disable preemption
and do a kvm_arch_vcpu_put() to make sure we have all the state
in memory (and that it won't be loaded behind out back).
This now causes issues with resetting the PMU, which calls into perf.
Perf itself uses mutexes, which clashes with the lack of preemption.
It is worth realizing that the PMU is fully emulated, and that
no PMU state is ever loaded on the physical CPU. This means we can
perfectly reset the PMU outside of the non-preemptible section.
Fixes: e761a927bc ("KVM: arm/arm64: Reset the VCPU without preemption and vcpu state loaded")
Reported-by: Julien Grall <julien.grall@arm.com>
Tested-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Pull more Kbuild updates from Masahiro Yamada:
- add more Build-Depends to Debian source package
- prefix header search paths with $(srctree)/
- make modpost show verbose section mismatch warnings
- avoid hard-coded CROSS_COMPILE for h8300
- fix regression for Debian make-kpkg command
- add semantic patch to detect missing put_device()
- fix some warnings of 'make deb-pkg'
- optimize NOSTDINC_FLAGS evaluation
- add warnings about redundant generic-y
- clean up Makefiles and scripts
* tag 'kbuild-v5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig: remove stale lxdialog/.gitignore
kbuild: force all architectures except um to include mandatory-y
kbuild: warn redundant generic-y
Revert "modsign: Abort modules_install when signing fails"
kbuild: Make NOSTDINC_FLAGS a simply expanded variable
kbuild: deb-pkg: avoid implicit effects
coccinelle: semantic code search for missing put_device()
kbuild: pkg: grep include/config/auto.conf instead of $KCONFIG_CONFIG
kbuild: deb-pkg: introduce is_enabled and if_enabled_echo to builddeb
kbuild: deb-pkg: add CONFIG_ prefix to kernel config options
kbuild: add workaround for Debian make-kpkg
kbuild: source include/config/auto.conf instead of ${KCONFIG_CONFIG}
unicore32: simplify linker script generation for decompressor
h8300: use cc-cross-prefix instead of hardcoding h8300-unknown-linux-
kbuild: move archive command to scripts/Makefile.lib
modpost: always show verbose warning for section mismatch
ia64: prefix header search path with $(srctree)/
libfdt: prefix header search paths with $(srctree)/
deb-pkg: generate correct build dependencies
Pull x86 asm updates from Thomas Gleixner:
"Two cleanup patches removing dead conditionals and unused code"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm: Remove unused __constant_c_x_memset() macro and inlines
x86/asm: Remove dead __GNUC__ conditionals
Pull perf fixes from Thomas Gleixner:
"Three fixes for the fallout from the TSX errata workaround:
- Prevent memory corruption caused by a unchecked out of bound array
index.
- Two trivial fixes to address compiler warnings"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Make dev_attr_allow_tsx_force_abort static
perf/x86: Fixup typo in stub functions
perf/x86/intel: Fix memory corruption
Pull xen fix from Juergen Gross:
"A fix for a Xen bug introduced by David's series for excluding
ballooned pages in vmcores"
* tag 'for-linus-5.1b-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/balloon: Fix mapping PG_offline pages to user space
Pull 9p updates from Dominique Martinet:
"Here is a 9p update for 5.1; there honestly hasn't been much.
Two fixes (leak on invalid mount argument and possible deadlock on
i_size update on 32bit smp) and a fall-through warning cleanup"
* tag '9p-for-5.1' of git://github.com/martinetd/linux:
9p/net: fix memory leak in p9_client_create
9p: use inode->i_lock to protect i_size_write() under 32-bit
9p: mark expected switch fall-through
When this .gitignore was added, lxdialog was an independent hostprogs-y.
Now that all objects in lxdialog/ are directly linked to mconf, the
lxdialog is no longer generated.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Currently, every arch/*/include/uapi/asm/Kbuild explicitly includes
the common Kbuild.asm file. Factor out the duplicated include directives
to scripts/Makefile.asm-generic so that no architecture would opt out
of the mandatory-y mechanism.
um is not forced to include mandatory-y since it is a very exceptional
case which does not support UAPI.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
The generic-y is redundant under the following condition:
- arch has its own implementation
- the same header is added to generated-y
- the same header is added to mandatory-y
If a redundant generic-y is found, the warning like follows is displayed:
scripts/Makefile.asm-generic:20: redundant generic-y found in arch/arm/include/asm/Kbuild: timex.h
I fixed up arch Kbuild files found by this.
Suggested-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
During a simple no-op (nothing changed) build I saw 39 invocations of
the C compiler with the argument "-print-file-name=include". We don't
need to call the C compiler 39 times for this--one time will suffice.
Let's change NOSTDINC_FLAGS to a simply expanded variable to avoid
this since there doesn't appear to be any reason it should be
recursively expanded.
On my build this shaved ~400 ms off my "no-op" build.
Note that the recursive expansion seems to date back to the (really
old) commit e8f5bdb02c ("[PATCH] Makefile include path ordering").
It's a little unclear to me if the point of that patch was to switch
the variable to be recursively expanded (which it did) or to avoid
directly assigning to NOSTDINC_FLAGS (AKA to switch to +=) because
someone else (out of tree?) was setting it. I presume later since if
the only goal was to switch to recursive expansion the patch would
have just removed the ":".
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
* The man page for dpkg-source(1) notes:
> -b, --build directory [format-specific-parameters]
> Build a source package (--build since dpkg 1.17.14).
> <...>
>
> dpkg-source will build the source package with the first
> format found in this ordered list: the format indicated
> with the --format command line option, the format
> indicated in debian/source/format, “1.0”. The fallback
> to “1.0” is deprecated and will be removed at some point
> in the future, you should always document the desired
> source format in debian/source/format. See section
> SOURCE PACKAGE FORMATS for an extensive description of
> the various source package formats.
Thus it would be more foolproof to explicitly use 1.0 (as we always
did) than to rely on dpkg-source's defaults.
* In a similar vein, debian/rules is not made executable by mkdebian,
and dpkg-source warns about that but still silently fixes the file.
Let's be explicit once again.
Signed-off-by: Arseny Maslennikov <ar@cs.msu.ru>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
The of_find_device_by_node() takes a reference to the underlying device
structure, we should release that reference.
The implementation of this semantic code search is:
In a function, for a local variable returned by calling
of_find_device_by_node(),
a, if it is released by a function such as
put_device()/of_dev_put()/platform_device_put() after the last use,
it is considered that there is no reference leak;
b, if it is passed back to the caller via
dev_get_drvdata()/platform_get_drvdata()/get_device(), etc., the
reference will be released in other functions, and the current function
also considers that there is no reference leak;
c, for the rest of the situation, the current function should release the
reference by calling put_device, this code search will report the
corresponding error message.
By using this semantic code search, we have found some object reference leaks,
such as:
commit 11907e9d35 ("ASoC: fsl-asoc-card: fix object reference leaks in
fsl_asoc_card_probe")
commit a12085d139 ("mtd: rawnand: atmel: fix possible object reference leak")
commit 11493f2685 ("mtd: rawnand: jz4780: fix possible object reference leak")
There are still dozens of reference leaks in the current kernel code.
Further, for the case of b, the object returned to other functions may also
have a reference leak, we will continue to develop other cocci scripts to
further check the reference leak.
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Reviewed-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: Markus Elfring <Markus.Elfring@web.de>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Pull pidfd system call from Christian Brauner:
"This introduces the ability to use file descriptors from /proc/<pid>/
as stable handles on struct pid. Even if a pid is recycled the handle
will not change. For a start these fds can be used to send signals to
the processes they refer to.
With the ability to use /proc/<pid> fds as stable handles on struct
pid we can fix a long-standing issue where after a process has exited
its pid can be reused by another process. If a caller sends a signal
to a reused pid it will end up signaling the wrong process.
With this patchset we enable a variety of use cases. One obvious
example is that we can now safely delegate an important part of
process management - sending signals - to processes other than the
parent of a given process by sending file descriptors around via scm
rights and not fearing that the given process will have been recycled
in the meantime. It also allows for easy testing whether a given
process is still alive or not by sending signal 0 to a pidfd which is
quite handy.
There has been some interest in this feature e.g. from systems
management (systemd, glibc) and container managers. I have requested
and gotten comments from glibc to make sure that this syscall is
suitable for their needs as well. In the future I expect it to take on
most other pid-based signal syscalls. But such features are left for
the future once they are needed.
This has been sitting in linux-next for quite a while and has not
caused any issues. It comes with selftests which verify basic
functionality and also test that a recycled pid cannot be signaled via
a pidfd.
Jon has written about a prior version of this patchset. It should
cover the basic functionality since not a lot has changed since then:
https://lwn.net/Articles/773459/
The commit message for the syscall itself is extensively documenting
the syscall, including it's functionality and extensibility"
* tag 'pidfd-v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
selftests: add tests for pidfd_send_signal()
signal: add pidfd_send_signal() syscall