Commit Graph

1359 Commits

Author SHA1 Message Date
Linus Torvalds
1249b571ba Merge tag 'powerpc-4.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
 "Nine small fixes, really nothing that stands out.

  A work-around for a spurious MCE on Power9. A CXL fault handling fix,
  some fixes to the new XIVE code, and a fix to the new 32-bit
  STRICT_KERNEL_RWX code.

  Fixes for old code/stable: an fix to an incorrect TLB flush on boot
  but not on any current machines, a compile error on 4xx and a fix to
  memory hotplug when using radix (Power9).

  Thanks to: Anton Blanchard, Cédric Le Goater, Christian Lamparter,
  Christophe Leroy, Christophe Lombard, Guenter Roeck, Jeremy Kerr,
  Michael Neuling, Nicholas Piggin"

* tag 'powerpc-4.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/powernv: Increase memory block size to 1GB on radix
  powerpc/mm: Call flush_tlb_kernel_range with interrupts enabled
  powerpc/xive: Clear XIVE internal structures when a CPU is removed
  powerpc/xive: Fix IPI reset
  powerpc/4xx: Fix compile error with 64K pages on 40x, 44x
  powerpc: Fix action argument for cpufeatures-based TLB flush
  cxl: Fix memory page not handled
  powerpc: Fix workaround for spurious MCE on POWER9
  powerpc: Handle MCE on POWER9 with only DSISR bit 30 set
2017-10-06 08:47:21 -07:00
Cédric Le Goater
cc56939802 powerpc/xive: Clear XIVE internal structures when a CPU is removed
Commit eac1e731b5 ("powerpc/xive: guest exploitation of the XIVE
interrupt controller") introduced support for the XIVE exploitation
mode of the P9 interrupt controller on the pseries platform.

At that time, support for CPU removal was not complete on PowerVM and
CPU hot unplug remained untested. It appears that some cleanups of the
XIVE internal structures are required before releasing the CPU,
without which the kernel crashes in a RTAS call doing the CPU
isolation.

These changes fix the crash by deconfiguring the IPI interrupt source
and clearing the event queues of the CPU when it is removed.

Fixes: eac1e731b5 ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04 22:01:57 +11:00
Cédric Le Goater
74f1282114 powerpc/xive: Fix IPI reset
When resetting an IPI, hw_ipi should also be set to zero.

Fixes: eac1e731b5 ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04 22:01:15 +11:00
Ioan Nicu
31d1e130f4 rapidio: remove global irq spinlocks from the subsystem
Locking of config and doorbell operations should be done only if the
underlying hardware requires it.

This patch removes the global spinlocks from the rapidio subsystem and
moves them to the mport drivers (fsl_rio and tsi721), only to the
necessary places.  For example, local config space read and write
operations (lcread/lcwrite) are atomic in all existing drivers, so there
should be no need for locking, while the cread/cwrite operations which
generate maintenance transactions need to be synchronized with a lock.

Later, each driver could chose to use a per-port lock instead of a
global one, or even more granular locking.

Link: http://lkml.kernel.org/r/20170824113023.GD50104@nokia.com
Signed-off-by: Ioan Nicu <ioan.nicu.ext@nokia.com>
Signed-off-by: Frank Kunz <frank.kunz@nokia.com>
Acked-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-03 17:54:24 -07:00
Alexey Dobriyan
9b130ad5bb treewide: make "nr_cpu_ids" unsigned
First, number of CPUs can't be negative number.

Second, different signnnedness leads to suboptimal code in the following
cases:

1)
	kmalloc(nr_cpu_ids * sizeof(X));

"int" has to be sign extended to size_t.

2)
	while (loff_t *pos < nr_cpu_ids)

MOVSXD is 1 byte longed than the same MOV.

Other cases exist as well. Basically compiler is told that nr_cpu_ids
can't be negative which can't be deduced if it is "int".

Code savings on allyesconfig kernel: -3KB

	add/remove: 0/0 grow/shrink: 25/264 up/down: 261/-3631 (-3370)
	function                                     old     new   delta
	coretemp_cpu_online                          450     512     +62
	rcu_init_one                                1234    1272     +38
	pci_device_probe                             374     399     +25

				...

	pgdat_reclaimable_pages                      628     556     -72
	select_fallback_rq                           446     369     -77
	task_numa_find_cpu                          1923    1807    -116

Link: http://lkml.kernel.org/r/20170819114959.GA30580@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:48 -07:00
Linus Torvalds
a0725ab0c7 Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe:
 "This is the first pull request for 4.14, containing most of the code
  changes. It's a quiet series this round, which I think we needed after
  the churn of the last few series. This contains:

   - Fix for a registration race in loop, from Anton Volkov.

   - Overflow complaint fix from Arnd for DAC960.

   - Series of drbd changes from the usual suspects.

   - Conversion of the stec/skd driver to blk-mq. From Bart.

   - A few BFQ improvements/fixes from Paolo.

   - CFQ improvement from Ritesh, allowing idling for group idle.

   - A few fixes found by Dan's smatch, courtesy of Dan.

   - A warning fixup for a race between changing the IO scheduler and
     device remova. From David Jeffery.

   - A few nbd fixes from Josef.

   - Support for cgroup info in blktrace, from Shaohua.

   - Also from Shaohua, new features in the null_blk driver to allow it
     to actually hold data, among other things.

   - Various corner cases and error handling fixes from Weiping Zhang.

   - Improvements to the IO stats tracking for blk-mq from me. Can
     drastically improve performance for fast devices and/or big
     machines.

   - Series from Christoph removing bi_bdev as being needed for IO
     submission, in preparation for nvme multipathing code.

   - Series from Bart, including various cleanups and fixes for switch
     fall through case complaints"

* 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits)
  kernfs: checking for IS_ERR() instead of NULL
  drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set
  drbd: Fix allyesconfig build, fix recent commit
  drbd: switch from kmalloc() to kmalloc_array()
  drbd: abort drbd_start_resync if there is no connection
  drbd: move global variables to drbd namespace and make some static
  drbd: rename "usermode_helper" to "drbd_usermode_helper"
  drbd: fix race between handshake and admin disconnect/down
  drbd: fix potential deadlock when trying to detach during handshake
  drbd: A single dot should be put into a sequence.
  drbd: fix rmmod cleanup, remove _all_ debugfs entries
  drbd: Use setup_timer() instead of init_timer() to simplify the code.
  drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
  drbd: new disk-option disable-write-same
  drbd: Fix resource role for newly created resources in events2
  drbd: mark symbols static where possible
  drbd: Send P_NEG_ACK upon write error in protocol != C
  drbd: add explicit plugging when submitting batches
  drbd: change list_for_each_safe to while(list_first_entry_or_null)
  drbd: introduce drbd_recv_header_maybe_unplug
  ...
2017-09-07 11:59:42 -07:00
Cédric Le Goater
265601f034 powerpc/xive: Fix section __init warning
xive_spapr_init() is called from a __init routine and calls __init
routines.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-04 19:38:07 +10:00
Cédric Le Goater
5f121292f0 powerpc/xive: improve debugging macros
Having the CPU identifier in the debug logs is helpful when tracking
issues. Also add some more logging and fix a compile issue in
xive_do_source_eoi().

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-02 21:02:38 +10:00
Cédric Le Goater
bed81ee181 powerpc/xive: introduce H_INT_ESB hcall
The H_INT_ESB hcall() is used to issue a load or store to the ESB page
instead of using the MMIO pages. This can be used as a workaround on
some HW issues. The OS knows that this hcall should be used on an
interrupt source when the ESB hcall flag is set to 1 in the hcall
H_INT_GET_SOURCE_INFO.

To maintain the frontier between the xive frontend and backend, we
introduce a new xive operation 'esb_rw' to be used in the routines
doing memory accesses on the ESBs.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-02 21:02:37 +10:00
Cédric Le Goater
c58a14a9cc powerpc/xive: add the HW IRQ number under xive_irq_data
It will be required later by the H_INT_ESB hcall.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-02 21:02:37 +10:00
Cédric Le Goater
99f122573e powerpc/xive: introduce xive_esb_write()
Some source support MMIO stores on the ESB page to perform EOI. Let's
introduce a specific routine for this case even if this should be the
only use of it.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-02 21:02:36 +10:00
Cédric Le Goater
59fc2724e4 powerpc/xive: rename xive_poke_esb() in xive_esb_read()
xive_poke_esb() is performing a load/read so it is better named as
xive_esb_read() as we will need to introduce a xive_esb_write()
routine. Also use the XIVE_ESB_LOAD_EOI offset when EOI'ing LSI
interrupts.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-02 21:02:36 +10:00
Cédric Le Goater
eac1e731b5 powerpc/xive: guest exploitation of the XIVE interrupt controller
This is the framework for using XIVE in a PowerVM guest. The support
is very similar to the native one in a much simpler form.

Each source is associated with an Event State Buffer (ESB). This is a
two bit state machine which is used to trigger events. The bits are
named "P" (pending) and "Q" (queued) and can be controlled by MMIO.
The Guest OS registers event (or notifications) queues on which the HW
will post event data for a target to notify.

Instead of OPAL calls, a set of Hypervisors call are used to configure
the interrupt sources and the event/notification queues of the guest:

 - H_INT_GET_SOURCE_INFO

   used to obtain the address of the MMIO page of the Event State
   Buffer (PQ bits) entry associated with the source.

 - H_INT_SET_SOURCE_CONFIG

   assigns a source to a "target".

 - H_INT_GET_SOURCE_CONFIG

   determines to which "target" and "priority" is assigned to a source

 - H_INT_GET_QUEUE_INFO

   returns the address of the notification management page associated
   with the specified "target" and "priority".

 - H_INT_SET_QUEUE_CONFIG

   sets or resets the event queue for a given "target" and "priority".
   It is also used to set the notification config associated with the
   queue, only unconditional notification for the moment.  Reset is
   performed with a queue size of 0 and queueing is disabled in that
   case.

 - H_INT_GET_QUEUE_CONFIG

   returns the queue settings for a given "target" and "priority".

 - H_INT_RESET

   resets all of the partition's interrupt exploitation structures to
   their initial state, losing all configuration set via the hcalls
   H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.

 - H_INT_SYNC

   issue a synchronisation on a source to make sure sure all
   notifications have reached their queue.

As for XICS, the XIVE interface for the guest is described in the
device tree under the "interrupt-controller" node. A couple of new
properties are specific to XIVE :

 - "reg"

   contains the base address and size of the thread interrupt
   managnement areas (TIMA), also called rings, for the User level and
   for the Guest OS level. Only the Guest OS level is taken into
   account today.

 - "ibm,xive-eq-sizes"

   the size of the event queues. One cell per size supported, contains
   log2 of size, in ascending order.

 - "ibm,xive-lisn-ranges"

   the interrupt numbers ranges assigned to the guest. These are
   allocated using a simple bitmap.

and also :

 - "/ibm,plat-res-int-priorities"

   contains a list of priorities that the hypervisor has reserved for
   its own use.

Tested with a QEMU XIVE model for pseries and with the Power hypervisor.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-02 21:02:35 +10:00
Cédric Le Goater
994ea2f419 powerpc/xive: introduce a common routine xive_queue_page_alloc()
This routine will be used in the spapr backend. Also introduce a short
xive_alloc_order() helper.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-02 21:02:34 +10:00
Markus Elfring
fdbb9457b4 axonram: Return directly after a failed kzalloc() in axon_ram_probe()
* Return directly after a call of the function "kzalloc" failed
  at the beginning.

* Delete a repeated check for the local variable "bank"
  which became unnecessary with this refactoring.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-01 16:42:56 +10:00
Markus Elfring
a1bddf3991 axonram: Improve a size determination in axon_ram_probe()
Replace the specification of a data structure by a pointer dereference
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-01 16:42:56 +10:00
Markus Elfring
c86a93971e axonram: Delete an error message for a failed memory allocation in axon_ram_probe()
Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Link: http://events.linuxfoundation.org/sites/events/files/slides/LCJ16-Refactor_Strings-WSang_0.pdf
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-01 16:42:55 +10:00
Cédric Le Goater
a9dadc1c51 powerpc/xive: Fix the size of the cpumask used in xive_find_target_in_mask()
When called from xive_irq_startup(), the size of the cpumask can be
larger than nr_cpu_ids. This can result in a WARN_ON such as:

  WARNING: CPU: 10 PID: 1 at ../arch/powerpc/sysdev/xive/common.c:476 xive_find_target_in_mask+0x110/0x2f0
  ...
  NIP [c00000000008a310] xive_find_target_in_mask+0x110/0x2f0
  LR [c00000000008a2e4] xive_find_target_in_mask+0xe4/0x2f0
  Call Trace:
    xive_find_target_in_mask+0x74/0x2f0 (unreliable)
    xive_pick_irq_target.isra.1+0x200/0x230
    xive_irq_startup+0x60/0x180
    irq_startup+0x70/0xd0
    __setup_irq+0x7bc/0x880
    request_threaded_irq+0x14c/0x2c0
    request_event_sources_irqs+0x100/0x180
    __machine_initcall_pseries_init_ras_IRQ+0x104/0x134
    do_one_initcall+0x68/0x1d0
    kernel_init_freeable+0x290/0x374
    kernel_init+0x24/0x170
    ret_from_kernel_thread+0x5c/0x74

This happens because we're being called with our affinity mask set to
irq_default_affinity. That in turn was populated using
cpumask_setall(), which sets NR_CPUs worth of bits, not nr_cpu_ids
worth. Finally cpumask_weight() will return > nr_cpu_ids when passed a
mask which has > nr_cpu_ids bits set.

Fix it by limiting the value returned by cpumask_weight().

Signed-off-by: Cédric Le Goater <clg@kaod.org>
[mpe: Add change log details on actual cause]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-24 15:20:18 +10:00
Christoph Hellwig
74d46992e0 block: replace bi_bdev with a gendisk pointer and partitions index
This way we don't need a block_device structure to submit I/O.  The
block_device has different life time rules from the gendisk and
request_queue and is usually only available when the block device node
is open.  Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).

For the actual I/O path all that we need is the gendisk, which exists
once per block device.  But given that the block layer also does
partition remapping we additionally need a partition index, which is
used for said remapping in generic_make_request.

Note that all the block drivers generally want request_queue or
sometimes the gendisk, so this removes a layer of indirection all
over the stack.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-23 12:49:55 -06:00
Rob Herring
b7c670d673 powerpc: Convert to using %pOF instead of full_name
Now that we have a custom printf format specifier, convert users of
full_name to use %pOF instead. This is preparation to remove storing
of the full path string for each node.

Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Anatolij Gustschin <agust@denx.de>
Cc: Scott Wood <oss@buserror.net>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linuxppc-dev@lists.ozlabs.org
Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23 22:27:04 +10:00
Bhumika Goyal
8bfa42ab84 powerpc: Add const to bin_attribute structures
Declare bin_attribute structures as const as they are only passed as an
argument to the function sysfs_create_bin_file. This argument is of
type const, so declare the structure as const.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-17 21:56:26 +10:00
Naveen N. Rao
e12d94f806 powerpc/xmon: Exclude all of xmon from ftrace
Exclude core xmon files from ftrace (along with an xmon xive helper
outside of xmon/) to minimize impact of ftrace while within xmon.

Before:
  /sys/kernel/debug/tracing# grep -ci xmon available_filter_functions
  26

After:
  /sys/kernel/debug/tracing# grep -ci xmon available_filter_functions
  0

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
[mpe: Use $(subst ..) on KBUILD_CFLAGS rather than CFLAGS_REMOVE_xxx]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15 15:30:17 +10:00
Michael Ellerman
df4c798318 powerpc/xive: Fix section mismatch warnings
Both xive_core_init() and xive_native_init() are called from and call
__init routines, so they should also be __init.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 23:41:02 +10:00
Christophe Leroy
de41ef6e4d powerpc/8xx: Move mpc8xx_pic.c from sysdev to platform/8xx
mpc8xx_pic.c is dedicated to the 8xx, so move it to platform/8xx

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 23:32:07 +10:00
Christophe Leroy
ea16e83aec powerpc/cpm1: link to CONFIG_CPM1 instead of CONFIG_8xx
To remain consistent with what is done with CPM2, let's link
CPM1 related parts to CONFIG_CPM1 instead of CONFIG_8xx

When something depends on both CPM1 and CPM2 we associate it
with CONFIG_CPM

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 23:32:05 +10:00