This is in preparation for having another tag set available. Cleanup
the parameters, and allow passing in of tags for blk_mq_put_tag().
Signed-off-by: Jens Axboe <axboe@fb.com>
[hch: even more cleanups]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
It's only used in blk-mq, kill it from the main exported header
and kill the symbol export as well.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Prep patch for adding MQ ops as well, since doing anon unions with
named initializers doesn't work on older compilers.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
If a GUID Partition Table claims to have more than 2**25 entries, the
calculation of the partition table size in alloc_read_gpt_entries() will
overflow a 32-bit integer and not enough space will be allocated for the
table.
Nothing seems to get written out of bounds, but later efi_partition() will
read up to 32768 bytes from a 128 byte buffer, possibly OOPSing or exposing
information to /proc/partitions and uevents.
The problem exists on both 64-bit and 32-bit platforms.
Fix the overflow and also print a meaningful debug message if the table
size is too large.
Signed-off-by: Alden Tondettar <alden.tondettar@gmail.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
The old maintainers email is bouncing and I've rewritten most of this
driver in the recent months. Also add linux-block to the mailinglist
and remove the old tree, I will send patches through the linux-block
tree. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
If the last bvec of the 1st bio and the 1st bvec of the next
bio are physically contigious, and the latter can be merged
to last segment of the 1st bio, we should think they don't
violate sg gap(or virt boundary) limit.
Both Vitaly and Dexuan reported lots of unmergeable small bios
are observed when running mkfs on Hyper-V virtual storage, and
performance becomes quite low. This patch fixes that performance
issue.
The same issue should exist on NVMe, since it sets virt boundary too.
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reported-by: Dexuan Cui <decui@microsoft.com>
Tested-by: Dexuan Cui <decui@microsoft.com>
Cc: Keith Busch <keith.busch@intel.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The raw_cmd_copyin() function does a kmalloc() with GFP_USER, although the
allocated structure is obviously not mapped to userspace, just copied from/to.
In this case GFP_KERNEL is more appropriate, so let's use it, although in the
current implementation this does not manifest as any error.
Reported-by: Matthew Wilcox <mawilcox@linuxonhyperv.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Pull USB fixes from Greg KH:
"Here are a bunch of USB fixes for 4.10-rc3. Yeah, it's a lot, an
artifact of the holiday break I think.
Lots of gadget and the usual XHCI fixups for reported issues (one day
that driver will calm down...) Also included are a bunch of usb-serial
driver fixes, and for good measure, a number of much-reported MUSB
driver issues have finally been resolved.
All of these have been in linux-next with no reported issues"
* tag 'usb-4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (72 commits)
USB: fix problems with duplicate endpoint addresses
usb: ohci-at91: use descriptor-based gpio APIs correctly
usb: storage: unusual_uas: Add JMicron JMS56x to unusual device
usb: hub: Move hub_port_disable() to fix warning if PM is disabled
usb: musb: blackfin: add bfin_fifo_offset in bfin_ops
usb: musb: fix compilation warning on unused function
usb: musb: Fix trying to free already-free IRQ 4
usb: musb: dsps: implement clear_ep_rxintr() callback
usb: musb: core: add clear_ep_rxintr() to musb_platform_ops
USB: serial: ti_usb_3410_5052: fix NULL-deref at open
USB: serial: spcp8x5: fix NULL-deref at open
USB: serial: quatech2: fix sleep-while-atomic in close
USB: serial: pl2303: fix NULL-deref at open
USB: serial: oti6858: fix NULL-deref at open
USB: serial: omninet: fix NULL-derefs at open and disconnect
USB: serial: mos7840: fix misleading interrupt-URB comment
USB: serial: mos7840: remove unused write URB
USB: serial: mos7840: fix NULL-deref at open
USB: serial: mos7720: remove obsolete port initialisation
USB: serial: mos7720: fix parallel probe
...
Pull char/misc fixes from Greg KH:
"Here are a few small char/misc driver fixes for 4.10-rc3.
Two MEI driver fixes, and three NVMEM patches for reported issues, and
a new Hyper-V driver MAINTAINER update. Nothing major at all, all have
been in linux-next with no reported issues"
* tag 'char-misc-4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
hyper-v: Add myself as additional MAINTAINER
nvmem: fix nvmem_cell_read() return type doc
nvmem: imx-ocotp: Fix wrong register size
nvmem: qfprom: Allow single byte accesses for read/write
mei: move write cb to completion on credentials failures
mei: bus: fix mei_cldev_enable KDoc
Pull staging/IIO fixes from Greg KH:
"Here are some staging and IIO driver fixes for 4.10-rc3.
Most of these are minor IIO fixes of reported issues, along with one
network driver fix to resolve an issue. And a MAINTAINERS update with
a new mailing list. All of these, except the MAINTAINERS file update,
have been in linux-next with no reported issues (the MAINTAINERS patch
happened on Friday...)"
* tag 'staging-4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
MAINTAINERS: add greybus subsystem mailing list
staging: octeon: Call SET_NETDEV_DEV()
iio: accel: st_accel: fix LIS3LV02 reading and scaling
iio: common: st_sensors: fix channel data parsing
iio: max44000: correct value in illuminance_integration_time_available
iio: adc: TI_AM335X_ADC should depend on HAS_DMA
iio: bmi160: Fix time needed to sleep after command execution
iio: 104-quad-8: Fix active level mismatch for the preset enable option
iio: 104-quad-8: Fix off-by-one errors when addressing IOR
iio: 104-quad-8: Fix index control configuration
Several people report seeing warnings about inconsistent radix tree
nodes followed by crashes in the workingset code, which all looked like
use-after-free access from the shadow node shrinker.
Dave Jones managed to reproduce the issue with a debug patch applied,
which confirmed that the radix tree shrinking indeed frees shadow nodes
while they are still linked to the shadow LRU:
WARNING: CPU: 2 PID: 53 at lib/radix-tree.c:643 delete_node+0x1e4/0x200
CPU: 2 PID: 53 Comm: kswapd0 Not tainted 4.10.0-rc2-think+ #3
Call Trace:
delete_node+0x1e4/0x200
__radix_tree_delete_node+0xd/0x10
shadow_lru_isolate+0xe6/0x220
__list_lru_walk_one.isra.4+0x9b/0x190
list_lru_walk_one+0x23/0x30
scan_shadow_nodes+0x2e/0x40
shrink_slab.part.44+0x23d/0x5d0
shrink_node+0x22c/0x330
kswapd+0x392/0x8f0
This is the WARN_ON_ONCE(!list_empty(&node->private_list)) placed in the
inlined radix_tree_shrink().
The problem is with 14b468791f ("mm: workingset: move shadow entry
tracking to radix tree exceptional tracking"), which passes an update
callback into the radix tree to link and unlink shadow leaf nodes when
tree entries change, but forgot to pass the callback when reclaiming a
shadow node.
While the reclaimed shadow node itself is unlinked by the shrinker, its
deletion from the tree can cause the left-most leaf node in the tree to
be shrunk. If that happens to be a shadow node as well, we don't unlink
it from the LRU as we should.
Consider this tree, where the s are shadow entries:
root->rnode
|
[0 n]
| |
[s ] [sssss]
Now the shadow node shrinker reclaims the rightmost leaf node through
the shadow node LRU:
root->rnode
|
[0 ]
|
[s ]
Because the parent of the deleted node is the first level below the
root and has only one child in the left-most slot, the intermediate
level is shrunk and the node containing the single shadow is put in
its place:
root->rnode
|
[s ]
The shrinker again sees a single left-most slot in a first level node
and thus decides to store the shadow in root->rnode directly and free
the node - which is a leaf node on the shadow node LRU.
root->rnode
|
s
Without the update callback, the freed node remains on the shadow LRU,
where it causes later shrinker runs to crash.
Pass the node updater callback into __radix_tree_delete_node() in case
the deletion causes the left-most branch in the tree to collapse too.
Also add warnings when linked nodes are freed right away, rather than
wait for the use-after-free when the list is scanned much later.
Fixes: 14b468791f ("mm: workingset: move shadow entry tracking to radix tree exceptional tracking")
Reported-by: Dave Chinner <david@fromorbit.com>
Reported-by: Hugh Dickins <hughd@google.com>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-and-tested-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chris Leech <cleech@redhat.com>
Cc: Lee Duncan <lduncan@suse.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@linuxonhyperv.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4.10-rc loadtest (even on x86, and even without THPCache) fails with
"fork: Cannot allocate memory" or some such; and /proc/meminfo shows
PageTables growing.
Commit 953c66c2b2 ("mm: THP page cache support for ppc64") that got
merged in rc1 removed the freeing of an unused preallocated pagetable
after do_fault_around() has called map_pages().
This is usually a good optimization, so that the followup doesn't have
to reallocate one; but it's not sufficient to shift the freeing into
alloc_set_pte(), since there are failure cases (most commonly
VM_FAULT_RETRY) which never reach finish_fault().
Check and free it at the outer level in do_fault(), then we don't need
to worry in alloc_set_pte(), and can restore that to how it was (I
cannot find any reason to pte_free() under lock as it was doing).
And fix a separate pagetable leak, or crash, introduced by the same
change, that could only show up on some ppc64: why does do_set_pmd()'s
failure case attempt to withdraw a pagetable when it never deposited
one, at the same time overwriting (so leaking) the vmf->prealloc_pte?
Residue of an earlier implementation, perhaps? Delete it.
Fixes: 953c66c2b2 ("mm: THP page cache support for ppc64")
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull kbuild fix from Michal Marek:
"The asm-prototypes.h file added in the last merge window results in
invalid code with CONFIG_KMEMCHECK=y. The net result is that genksyms
segfaults.
This pull request fixes the header, the genksyms fix is in my kbuild
branch for 4.11"
* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
asm-prototypes: Clear any CPP defines before declaring the functions
Pull sound fixes from Takashi Iwai:
"Nothing particular stands out, only a few small fixes for USB-audio,
HD-audio and Firewire. The USB-audio fix is the respin of the previous
race fix after a revert due to the regression"
* tag 'sound-4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
Revert "ALSA: firewire-lib: change structure member with proper type"
ALSA: usb-audio: test EP_FLAG_RUNNING at urb completion
ALSA: usb-audio: Fix irq/process data synchronization
ALSA: hda - Apply asus-mode8 fixup to ASUS X71SL
ALSA: hda - Fix up GPIO for ASUS ROG Ranger
ALSA: firewire-lib: change structure member with proper type
ALSA: firewire-tascam: Fix to handle error from initialization of stream data
ALSA: fireworks: fix asymmetric API call at unit removal
Pull clk fixes from Stephen Boyd:
"One fix for a broken driver on Renesas RZ/A1 SoCs with bootloaders
that don't turn all the clks on and another fix for stm32f4 SoCs where
we have multiple drivers attaching to the same DT node"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: stm32f4: Use CLK_OF_DECLARE_DRIVER initialization method
clk: renesas: mstp: Support 8-bit registers for r7s72100
Pull VFIO fixes from Alex Williamson:
- Add mtty sample driver properly into build system (Alex Williamson)
- Restore type1 mapping performance after mdev (Alex Williamson)
- Fix mdev device race (Alex Williamson)
- Cleanups to the mdev ABI used by vendor drivers (Alex Williamson)
- Build fix for old compilers (Arnd Bergmann)
- Fix sample driver error path (Dan Carpenter)
- Handle pci_iomap() error (Arvind Yadav)
- Fix mdev ioctl return type (Paul Gortmaker)
* tag 'vfio-v4.10-rc3' of git://github.com/awilliam/linux-vfio:
vfio-mdev: fix non-standard ioctl return val causing i386 build fail
vfio-pci: Handle error from pci_iomap
vfio-mdev: fix some error codes in the sample code
vfio-pci: use 32-bit comparisons for register address for gcc-4.5
vfio-mdev: Make mdev_device private and abstract interfaces
vfio-mdev: Make mdev_parent private
vfio-mdev: de-polute the namespace, rename parent_device & parent_ops
vfio-mdev: Fix remove race
vfio/type1: Restore mapping performance with mdev support
vfio-mdev: Fix mtty sample driver building
Pull swiotlb fixes from Konrad Rzeszutek Wilk:
"This has one fix to make i915 work when using Xen SWIOTLB, and a
feature from Geert to aid in debugging of devices that can't do DMA
outside the 32-bit address space.
The feature from Geert is on top of v4.10 merge window commit
(specifically you pulling my previous branch), as his changes were
dependent on the Documentation/ movement patches.
I figured it would just easier than me trying than to cherry-pick the
Documentation patches to satisfy git.
The patches have been soaking since 12/20, albeit I updated the last
patch due to linux-next catching an compiler error and adding an
Tested-and-Reported-by tag"
* 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
swiotlb: Export swiotlb_max_segment to users
swiotlb: Add swiotlb=noforce debug option
swiotlb: Convert swiotlb_force from int to enum
x86, swiotlb: Simplify pci_swiotlb_detect_override()