For symmetry with badblocks_init() make it clear that this path only
destroys incremental allocations of a badblocks instance, and does not
free the badblocks instance itself.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
During region creation, perform Address Range Scrubs (ARS) for the SPA
(System Physical Address) ranges to retrieve known poison locations from
firmware. Add a new data structure 'nd_poison' which is used as a list
in nvdimm_bus to store these poison locations.
When creating a pmem namespace, if there is any known poison associated
with its physical address space, convert the poison ranges to bad sectors
that are exposed using the badblocks interface.
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In preparation for getting a poison list using ARS DSMs, enable DSMs for
all manufactured NFITs supplied by the test framework. Also, supply
valid response data for ars_status.
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Retain badblocks as part of rdev, but use the accessor functions from
include/linux/badblocks for all manipulation.
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
NVDIMM devices, which can behave more like DRAM rather than block
devices, may develop bad cache lines, or 'poison'. A block device
exposed by the pmem driver can then consume poison via a read (or
write), and cause a machine check. On platforms without machine
check recovery features, this would mean a crash.
The block device maintaining a runtime list of all known sectors that
have poison can directly avoid this, and also provide a path forward
to enable proper handling/recovery for DAX faults on such a device.
Use the new badblock management interfaces to add a badblocks list to
gendisks.
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Take the core badblocks implementation from md, and make it generally
available. This follows the same style as kernel implementations of
linked lists, rb-trees etc, where you can have a structure that can be
embedded anywhere, and accessor functions to manipulate the data.
The only changes in this copy of the code are ones to generalize
function/variable names from md-specific ones. Also add init and free
functions.
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When tearing down a block device early in its lifetime, userspace may
still be performing discovery actions like blkdev_ioctl() to re-read
partitions.
The nvdimm_revalidate_disk() implementation depends on
disk->driverfs_dev to be valid at entry. However, it is set to NULL in
del_gendisk() and fatally this is happening *before* the disk device is
deleted from userspace view.
There's no reason for del_gendisk() to clear ->driverfs_dev. That
device is the parent of the disk. It is guaranteed to not be freed
until the disk, as a child, drops its ->parent reference.
We could also fix this issue locally in nvdimm_revalidate_disk() by
using disk_to_dev(disk)->parent, but lets fix it globally since
->driverfs_dev follows the lifetime of the parent. Longer term we
should probably just add a @parent parameter to add_disk(), and stop
carrying this pointer in the gendisk.
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa00340a8>] nvdimm_revalidate_disk+0x18/0x90 [libnvdimm]
CPU: 2 PID: 538 Comm: systemd-udevd Tainted: G O 4.4.0-rc5 #2257
[..]
Call Trace:
[<ffffffff8143e5c7>] rescan_partitions+0x87/0x2c0
[<ffffffff810f37f9>] ? __lock_is_held+0x49/0x70
[<ffffffff81438c62>] __blkdev_reread_part+0x72/0xb0
[<ffffffff81438cc5>] blkdev_reread_part+0x25/0x40
[<ffffffff8143982d>] blkdev_ioctl+0x4fd/0x9c0
[<ffffffff811246c9>] ? current_kernel_time64+0x69/0xd0
[<ffffffff812916dd>] block_ioctl+0x3d/0x50
[<ffffffff81264c38>] do_vfs_ioctl+0x308/0x560
[<ffffffff8115dbd1>] ? __audit_syscall_entry+0xb1/0x100
[<ffffffff810031d6>] ? do_audit_syscall_entry+0x66/0x70
[<ffffffff81264f09>] SyS_ioctl+0x79/0x90
[<ffffffff81902672>] entry_SYSCALL_64_fastpath+0x12/0x76
Reported-by: Robert Hu <robert.hu@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
If an application wants exclusive access to all of the persistent memory
provided by an NVDIMM namespace it can use this raw-block-dax facility
to forgo establishing a filesystem. This capability is targeted
primarily to hypervisors wanting to provision persistent memory for
guests. It can be disabled / enabled dynamically via the new BLKDAXSET
ioctl.
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Reviewed-by: Jan Kara <jack@suse.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This effectively promotes IORESOURCE_BUSY to IORESOURCE_EXCLUSIVE
semantics by default. If userspace really believes it is safe to access
the memory region it can also perform the extra step of disabling an
active driver. This protects device address ranges with read side
effects and otherwise directs userspace to use the driver.
Persistent memory presents a large "mistake surface" to /dev/mem as now
accidental writes can corrupt a filesystem.
In general if a device driver is busily using a memory region it already
informs other parts of the kernel to not touch it via
request_mem_region(). /dev/mem should honor the same safety restriction
by default. Debugging a device driver from userspace becomes more
difficult with this enabled. Any application using /dev/mem or mmap of
sysfs pci resources will now need to perform the extra step of either:
1/ Disabling the driver, for example:
echo <device id> > /dev/bus/<parent bus>/drivers/<driver name>/unbind
2/ Rebooting with "iomem=relaxed" on the command line
3/ Recompiling with CONFIG_IO_STRICT_DEVMEM=n
Traditional users of /dev/mem like dosemu are unaffected because the
first 1MB of memory is not subject to the IO_STRICT_DEVMEM restriction.
Legacy X configurations use /dev/mem to talk to graphics hardware, but
that functionality has since moved to kernel graphics drivers.
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Jan Stancek reported that I wrecked things for him by fixing things for
Vladimir :/
His report was due to an UNINTERRUPTIBLE wait getting -EINTR, which
should not be possible, however my previous patch made this possible by
unconditionally checking signal_pending().
We cannot use current->state as was done previously, because the
instruction after the store to that variable it can be changed. We must
instead pass the initial state along and use that.
Fixes: 68985633bc ("sched/wait: Fix signal handling in bit wait helpers")
Reported-by: Jan Stancek <jstancek@redhat.com>
Reported-by: Chris Mason <clm@fb.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Chris Mason <clm@fb.com>
Reviewed-by: Paul Turner <pjt@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: tglx@linutronix.de
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: hpa@zytor.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull timer fixlets from Thomas Gleixner:
"Two trivial fixes which add missing header fileas and forward
declarations so the code will compile even when the magic include
chains are different"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3: Add missing include for barrier.h
irqchip/gic-v3: Add missing struct device_node declaration
Pull timer fix from Thomas Gleixner:
"A single fix to unbreak a clocksource driver which has more than 32bit
counter width"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource: Mmio: remove artificial 32bit limitation
Pull fpga driver fixes from Greg KH:
"Only two small fpga driver fixes here, both have been in linux-next
for a while, and resolve some reported issues"
* tag 'char-misc-4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
fpga manager: Fix firmware resource leak on error
fpga manager: remove label
Pull staging driver fixes from Greg KH:
"Here are a few staging and IIO driver fixes for 4.4-rc5.
All of them resolve reported problems and have been in linux-next for
a while. Nothing major here, just small fixes where needed"
* tag 'staging-4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: lustre: echo_copy.._lsm() dereferences userland pointers directly
iio: adc: spmi-vadc: add missing of_node_put
iio: fix some warning messages
iio: light: apds9960: correct ->last_busy count
iio: lidar: return -EINVAL on invalid signal
staging: iio: dummy: complete IIO events delivery to userspace
Pull USB driver fixes from Greg KH:
"Here are a number of small USB fixes for 4.4-rc5. All of them have
been in linux-next. The majority are gadget and phy issues, with a
few new quirks and device ids added as well"
* tag 'usb-4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (32 commits)
USB: add quirk for devices with broken LPM
xhci: fix usb2 resume timing and races.
usb: musb: fail with error when no DMA controller set
usb: gadget: uvc: fix permissions of configfs attributes
usb: musb: core: Fix pm runtime for deferred probe
usb: phy: msm: fix a possible NULL dereference
USB: host: ohci-at91: fix a crash in ohci_hcd_at91_overcurrent_irq
usb: Quiet down false peer failure messages
usb: xhci: fix config fail of FS hub behind a HS hub with MTT
xhci: Fix memory leak in xhci_pme_acpi_rtd3_enable()
usb: Use the USB_SS_MULT() macro to decode burst multiplier for log message
USB: whci-hcd: add check for dma mapping error
usb: core : hub: Fix BOS 'NULL pointer' kernel panic
USB: quirks: Apply ALWAYS_POLL to all ELAN devices
usb-storage: Fix scsi-sd failure "Invalid field in cdb" for USB adapter JMicron
USB: quirks: Fix another ELAN touchscreen
usb: dwc3: gadget: don't prestart interrupt endpoints
USB: serial: Another Infineon flash loader USB ID
USB: cdc_acm: Ignore Infineon Flash Loader utility
USB: cp210x: Remove CP2110 ID from compatibility list
...
Pull ARM SoC fixes from Arnd Bergmann:
"Here are a bunch of small bug fixes for various ARM platforms, nothing
really sticks out this week, most of either fixes bugs in code that
was just added in 4.4, or that has been broken for many years without
anyone noticing.
at91/sama5d2:
- fix sama5de hardware setup of sd/mmc interface
- proper selection of pinctrl drivers. PIO4 is necessary for sama5d2
berlin:
- fix incorrect clock input for SDIO
exynos:
- Fix potential NULL pointer dereference in Exynos PMU driver.
imx:
- Fix vf610 SAI clock configuration bug which is discovered by the
newly added master mode support in SAI audio driver.
- Fix buggy L2 cache latency values in vf610 device trees, which may
cause system hang when cpu runs at a higher frequency.
ixp4xx:
- fix prototypes for readl/writel functions
ls2080a:
- use little-endian register access for GPIO and SDHCI
omap:
- Fix clock source for ARM TWD and global timers on am437x
- Always select REGULATOR_FIXED_VOLTAGE for omap2+ instead of when
MACH_OMAP3_PANDORA is selected
- Fix SPI DMA handles for dm816x as only some were mapped
- Fix up mbox cells for dm816x to make mailbox usable
pxa:
- use PWM lookup table for all ezx machines
s3c24xx:
- Remove incorrect __init annotation from s3c24xx cpufreq driver
structures.
versatile:
- fix PCI IRQ mapping on Versatile PB"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ls2080a/dts: Add little endian property for GPIO IP block
dt-bindings: define little-endian property for QorIQ GPIO
ARM64: dts: ls2080a: fix eSDHC endianness
ARM: dts: vf610: use reset values for L2 cache latencies
ARM: pxa: use PWM lookup table for all machines
ARM: dts: berlin: add 2nd clock for BG2Q sdhci0 and sdhci1
ARM: dts: berlin: correct BG2Q's sdhci2 2nd clock
ARM: dts: am4372: fix clock source for arm twd and global timers
ARM: at91: fix pinctrl driver selection
ARM: at91/dt: add always-on to 1.8V regulator
ARM: dts: vf610: fix clock definition for SAI2
ARM: imx: clk-vf610: fix SAI clock tree
ARM: ixp4xx: fix read{b,w,l} return types
irqchip/versatile-fpga: Fix PCI IRQ mapping on Versatile PB
ARM: OMAP2+: enable REGULATOR_FIXED_VOLTAGE
ARM: dts: add dm816x missing spi DT dma handles
ARM: dts: add dm816x missing #mbox-cells
cpufreq: s3c24xx: Do not mark s3c2410_plls_add as __init
ARM: EXYNOS: Fix potential NULL pointer access in exynos_sys_powerdown_conf
Pull powerpc fixes from Michael Ellerman:
- opal-irqchip: Fix double endian conversion from Alistair Popple
- cxl: Set endianess of kernel contexts from Frederic Barrat
- sbc8641: drop bogus PHY IRQ entries from DTS file from Paul Gortmaker
- Revert "powerpc/eeh: Don't unfreeze PHB PE after reset" from Andrew
Donnellan
* tag 'powerpc-4.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
Revert "powerpc/eeh: Don't unfreeze PHB PE after reset"
powerpc/sbc8641: drop bogus PHY IRQ entries from DTS file
cxl: Set endianess of kernel contexts
powerpc/opal-irqchip: Fix double endian conversion
Merge misc fixes from Andrew Morton:
"17 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
MIPS: fix DMA contiguous allocation
sh64: fix __NR_fgetxattr
ocfs2: fix SGID not inherited issue
mm/oom_kill.c: avoid attempting to kill init sharing same memory
drivers/base/memory.c: prohibit offlining of memory blocks with missing sections
tmpfs: fix shmem_evict_inode() warnings on i_blocks
mm/hugetlb.c: fix resv map memory leak for placeholder entries
mm: hugetlb: call huge_pte_alloc() only if ptep is null
kernel: remove stop_machine() Kconfig dependency
mm: kmemleak: mark kmemleak_init prototype as __init
mm: fix kerneldoc on mem_cgroup_replace_page
osd fs: __r4w_get_page rely on PageUptodate for uptodate
MAINTAINERS: make Vladimir co-maintainer of the memory controller
mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress
mm: fix swapped Movable and Reclaimable in /proc/pagetypeinfo
memcg: fix memory.high target
mm: hugetlb: fix hugepage memory leak caused by wrong reserve count
Pull parisc fixes from Helge Deller:
"Fix the boot crash on Mako machines with Huge Pages, prevent a panic
with SATA controllers (and others) by correctly calculating the IOMMU
space, hook up the mlock2 syscall and drop unneeded code in the parisc
pci code"
* 'parisc-4.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Disable huge pages on Mako machines
parisc: Wire up mlock2 syscall
parisc: Remove unused pcibios_init_bus()
parisc iommu: fix panic due to trying to allocate too large region
Pull block layer fixes from Jens Axboe:
"A set of fixes for the current series. This contains:
- A bunch of fixes for lightnvm, should be the last round for this
series. From Matias and Wenwei.
- A writeback detach inode fix from Ilya, also marked for stable.
- A block (though it says SCSI) fix for an OOPS in SCSI runtime power
management.
- Module init error path fixes for null_blk from Minfei"
* 'for-linus' of git://git.kernel.dk/linux-block:
null_blk: Fix error path in module initialization
lightnvm: do not compile in debugging by default
lightnvm: prevent gennvm module unload on use
lightnvm: fix media mgr registration
lightnvm: replace req queue with nvmdev for lld
lightnvm: comments on constants
lightnvm: check mm before use
lightnvm: refactor spin_unlock in gennvm_get_blk
lightnvm: put blks when luns configure failed
lightnvm: use flags in rrpc_get_blk
block: detach bdev inode from its wb in __blkdev_put()
SCSI: Fix NULL pointer dereference in runtime PM