Recent rework of the virtio_mmio probe/remove paths balanced a
devm_ioremap() with an iounmap() rather than its devm variant. This ends
up corrupting the devm datastructures, and results in the following
boot-time splat on arm64 under QEMU 2.9.0:
[ 3.450397] ------------[ cut here ]------------
[ 3.453822] Trying to vfree() nonexistent vm area (00000000c05b4844)
[ 3.460534] WARNING: CPU: 1 PID: 1 at mm/vmalloc.c:1525 __vunmap+0x1b8/0x220
[ 3.475898] Kernel panic - not syncing: panic_on_warn set ...
[ 3.475898]
[ 3.493933] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.15.0-rc3 #1
[ 3.513109] Hardware name: linux,dummy-virt (DT)
[ 3.525382] Call trace:
[ 3.531683] dump_backtrace+0x0/0x368
[ 3.543921] show_stack+0x20/0x30
[ 3.547767] dump_stack+0x108/0x164
[ 3.559584] panic+0x25c/0x51c
[ 3.569184] __warn+0x29c/0x31c
[ 3.576023] report_bug+0x1d4/0x290
[ 3.586069] bug_handler.part.2+0x40/0x100
[ 3.597820] bug_handler+0x4c/0x88
[ 3.608400] brk_handler+0x11c/0x218
[ 3.613430] do_debug_exception+0xe8/0x318
[ 3.627370] el1_dbg+0x18/0x78
[ 3.634037] __vunmap+0x1b8/0x220
[ 3.648747] vunmap+0x6c/0xc0
[ 3.653864] __iounmap+0x44/0x58
[ 3.659771] devm_ioremap_release+0x34/0x68
[ 3.672983] release_nodes+0x404/0x880
[ 3.683543] devres_release_all+0x6c/0xe8
[ 3.695692] driver_probe_device+0x250/0x828
[ 3.706187] __driver_attach+0x190/0x210
[ 3.717645] bus_for_each_dev+0x14c/0x1f0
[ 3.728633] driver_attach+0x48/0x78
[ 3.740249] bus_add_driver+0x26c/0x5b8
[ 3.752248] driver_register+0x16c/0x398
[ 3.757211] __platform_driver_register+0xd8/0x128
[ 3.770860] virtio_mmio_init+0x1c/0x24
[ 3.782671] do_one_initcall+0xe0/0x398
[ 3.791890] kernel_init_freeable+0x594/0x660
[ 3.798514] kernel_init+0x18/0x190
[ 3.810220] ret_from_fork+0x10/0x18
To fix this, we can simply rip out the explicit cleanup that the devm
infrastructure will do for us when our probe function returns an error
code, or when our remove function returns.
We only need to ensure that we call put_device() if a call to
register_virtio_device() fails in the probe path.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Fixes: 7eb781b1bb ("virtio_mmio: add cleanup for virtio_mmio_probe")
Fixes: 25f32223bc ("virtio_mmio: add cleanup for virtio_mmio_remove")
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: weiping zhang <zhangweiping@didichuxing.com>
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
The function virtqueue_get_buf_ctx() could return NULL, the return
value 'buf' need to be checked with NULL, not value 'ctx'.
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
As mentioned at drivers/base/core.c:
/*
* NOTE: _Never_ directly free @dev after calling this function, even
* if it returned an error! Always use put_device() to give up the
* reference initialized in this function instead.
*/
so we don't free vm_dev until vm_dev.dev.release be called.
Signed-off-by: weiping zhang <zhangweiping@didichuxing.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
On driver remove(), all objects created during probe() should be
removed, but sysfs qemu_fw_cfg/rev file was left. Also reorder
functions to match probe() error cleanup code.
Cc: stable@vger.kernel.org
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The vhost_vsock->guest_cid field is uninitialized when /dev/vhost-vsock
is opened until the VHOST_VSOCK_SET_GUEST_CID ioctl is called.
kvmalloc(..., GFP_KERNEL | __GFP_RETRY_MAYFAIL) does not zero memory.
All other vhost_vsock fields are initialized explicitly so just
initialize this field too.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
During access_ok checks, addr increases as we iterate over the data
structure, thus addr + len - 1 will point beyond the end of region we
are translating. Harmless since we then verify that the region covers
addr, but let's not waste cpu cycles.
Reported-by: Koichiro Den <den@klaipeden.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Koichiro Den <den@klaipeden.com>
The following patch changed the behavior which originally did safe
iteration. Make it safe as it was.
12bdcbd539
vhost/scsi: Don't reinvent the wheel but use existing llist API
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
fill_balloon doing memory allocations under balloon_lock
can cause a deadlock when leak_balloon is called from
virtballoon_oom_notify and tries to take same lock.
To fix, split page allocation and enqueue and do allocations outside the lock.
Here's a detailed analysis of the deadlock by Tetsuo Handa:
In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
serialize against fill_balloon(). But in fill_balloon(),
alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE]
implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, despite __GFP_NORETRY
is specified, this allocation attempt might indirectly depend on somebody
else's __GFP_DIRECT_RECLAIM memory allocation. And such indirect
__GFP_DIRECT_RECLAIM memory allocation might call leak_balloon() via
virtballoon_oom_notify() via blocking_notifier_call_chain() callback via
out_of_memory() when it reached __alloc_pages_may_oom() and held oom_lock
mutex. Since vb->balloon_lock mutex is already held by fill_balloon(), it
will cause OOM lockup.
Thread1 Thread2
fill_balloon()
takes a balloon_lock
balloon_page_enqueue()
alloc_page(GFP_HIGHUSER_MOVABLE)
direct reclaim (__GFP_FS context) takes a fs lock
waits for that fs lock alloc_page(GFP_NOFS)
__alloc_pages_may_oom()
takes the oom_lock
out_of_memory()
blocking_notifier_call_chain()
leak_balloon()
tries to take that balloon_lock and deadlocks
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Pull x86 fixes from Thomas Gleixner:
"A set of small fixes:
- make KGDB work again which got broken by the conversion of WARN()
to #UD. The WARN fixup needs to run before the notifier callchain,
otherwise KGDB tries to handle it and crashes.
- disable KASAN in the ORC unwinder to prevent false positive KASAN
warnings
- prevent default mapping above 47bit when 5 level page tables are
enabled
- make the delay calibration optimization work correctly, which had
the conditionals the wrong way around and was operating on data
which was not yet updated.
- remove the bogus X86_TRAP_BP trap init from the default IDT init
table, which broke 32bit int3 handling by overwriting the correct
int3 setup.
- replace this_cpu* with boot_cpu_data access in the preemptible
oprofile init code"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/debug: Handle warnings before the notifier chain, to fix KGDB crash
x86/mm: Fix ELF_ET_DYN_BASE for 5-level paging
x86/idt: Remove X86_TRAP_BP initialization in idt_setup_traps()
x86/oprofile/ppro: Do not use __this_cpu*() in preemptible context
x86/unwind: Disable KASAN checking in the ORC unwinder
x86/smpboot: Make optimization of delay calibration work correctly
Pull perf tool fixes from Thomas Gleixner:
"A small set of fixes for perf tool:
- synchronize the i915 drm header to avoid the 'out of date' warning
- make sure that perf trace cleans up its temporary files on exit
- unbreak the build with newer flex versions
- add missing braces in the eBPF parsing rules"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tooling/headers: Sync the tools/include/uapi/drm/i915_drm.h UAPI header
perf trace: Call machine__exit() at exit
perf tools: Fix eBPF event specification parsing
perf tools: Add "reject" option for parse-events.l
Pull networking fixes from David Miller:
1) Use after free in vlan, from Cong Wang.
2) Handle NAPI poll with a zero budget properly in mlx5 driver, from
Saeed Mahameed.
3) If DMA mapping fails in mlx5 driver, NULL out page, from Inbar
Karmy.
4) Handle overrun in RX FIFO of sun4i CAN driver, from Gerhard
Bertelsmann.
5) Missing return in mdb and vlan prepare phase of DSA layer, from
Vivien Didelot.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
vlan: fix a use-after-free in vlan_device_event()
net: dsa: return after vlan prepare phase
net: dsa: return after mdb prepare phase
can: ifi: Fix transmitter delay calculation
tcp: fix tcp_fastretrans_alert warning
tcp: gso: avoid refcount_t warning from tcp_gso_segment()
can: peak: Add support for new PCIe/M2 CAN FD interfaces
can: sun4i: handle overrun in RX FIFO
can: c_can: don't indicate triple sampling support for D_CAN
net/mlx5e: Increase Striding RQ minimum size limit to 4 multi-packet WQEs
net/mlx5e: Set page to null in case dma mapping fails
net/mlx5e: Fix napi poll with zero budget
net/mlx5: Cancel health poll before sending panic teardown command
net/mlx5: Loop over temp list to release delay events
rds: ib: Fix NULL pointer dereference in debug code
Marc Kleine-Budde says:
====================
pull-request: can 2017-11-10
this is a pull request for net/master.
The first patch by Richard Schütz for the c_can driver removes the false
indication to support triple sampling for d_can. Gerhard Bertelsmann's
patch for the sun4i driver improves the RX overrun handling. The patch
by Stephane Grosjean for the peak_canfd driver adds the PCI ids for
various new PCIe/M2 interfaces. Marek Vasut's patch for the ifi driver
fix transmitter delay calculation.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2017-11-08
The following series includes some fixes for mlx5 core and etherent
driver.
Sorry for the late submission but as you can see i have some very
critical fixes below that i would like them merged into this RC.
Please pull and let me know if there is any problem.
For -stable:
('net/mlx5e: Set page to null in case dma mapping fails') kernels >= 4.13
('net/mlx5: FPGA, return -EINVAL if size is zero') kernels >= 4.13
('net/mlx5: Cancel health poll before sending panic teardown command') kernels >= 4.13
V1->V2:
- Fix Reviewed-by tag of the 2nd patch.
- Drop the FPGA 0 size fix, it needs some more change log info.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
After refcnt reaches zero, vlan_vid_del() could free
dev->vlan_info via RCU:
RCU_INIT_POINTER(dev->vlan_info, NULL);
call_rcu(&vlan_info->rcu, vlan_info_rcu_free);
However, the pointer 'grp' still points to that memory
since it is set before vlan_vid_del():
vlan_info = rtnl_dereference(dev->vlan_info);
if (!vlan_info)
goto out;
grp = &vlan_info->grp;
Depends on when that RCU callback is scheduled, we could
trigger a use-after-free in vlan_group_for_each_dev()
right following this vlan_vid_del().
Fix it by moving vlan_vid_del() before setting grp. This
is also symmetric to the vlan_vid_add() we call in
vlan_device_event().
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Fixes: efc73f4bbc ("net: Fix memory leak - vlan_info struct")
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Girish Moodalbail <girish.moodalbail@oracle.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Girish Moodalbail <girish.moodalbail@oracle.com>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Last minute upstream update to one of the UAPI headers - sync it with tooling,
to address this warning:
Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h'
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The current code does not return after successfully preparing the VLAN
addition on every ports member of a it. Fix this.
Fixes: 1ca4aa9cd4 ("net: dsa: check VLAN capability of every switch")
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current code does not return after successfully preparing the MDB
addition on every ports member of a multicast group. Fix this.
Fixes: a1a6b7ea7f ("net: dsa: add cross-chip multicast support")
Reported-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull ceph gix from Ilya Dryomov:
"Memory allocation flags fix, marked for stable"
* tag 'ceph-for-4.14-rc9' of git://github.com/ceph/ceph-client:
rbd: use GFP_NOIO for parent stat and data requests