Commit Graph

277 Commits

Author SHA1 Message Date
Linus Torvalds ca520cab25 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking and atomic updates from Ingo Molnar:
 "Main changes in this cycle are:

   - Extend atomic primitives with coherent logic op primitives
     (atomic_{or,and,xor}()) and deprecate the old partial APIs
     (atomic_{set,clear}_mask())

     The old ops were incoherent with incompatible signatures across
     architectures and with incomplete support.  Now every architecture
     supports the primitives consistently (by Peter Zijlstra)

   - Generic support for 'relaxed atomics':

       - _acquire/release/relaxed() flavours of xchg(), cmpxchg() and {add,sub}_return()
       - atomic_read_acquire()
       - atomic_set_release()

     This came out of porting qwrlock code to arm64 (by Will Deacon)

   - Clean up the fragile static_key APIs that were causing repeat bugs,
     by introducing a new one:

       DEFINE_STATIC_KEY_TRUE(name);
       DEFINE_STATIC_KEY_FALSE(name);

     which define a key of different types with an initial true/false
     value.

     Then allow:

       static_branch_likely()
       static_branch_unlikely()

     to take a key of either type and emit the right instruction for the
     case.  To be able to know the 'type' of the static key we encode it
     in the jump entry (by Peter Zijlstra)

   - Static key self-tests (by Jason Baron)

   - qrwlock optimizations (by Waiman Long)

   - small futex enhancements (by Davidlohr Bueso)

   - ... and misc other changes"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (63 commits)
  jump_label/x86: Work around asm build bug on older/backported GCCs
  locking, ARM, atomics: Define our SMP atomics in terms of _relaxed() operations
  locking, include/llist: Use linux/atomic.h instead of asm/cmpxchg.h
  locking/qrwlock: Make use of _{acquire|release|relaxed}() atomics
  locking/qrwlock: Implement queue_write_unlock() using smp_store_release()
  locking/lockref: Remove homebrew cmpxchg64_relaxed() macro definition
  locking, asm-generic: Add _{relaxed|acquire|release}() variants for 'atomic_long_t'
  locking, asm-generic: Rework atomic-long.h to avoid bulk code duplication
  locking/atomics: Add _{acquire|release|relaxed}() variants of some atomic operations
  locking, compiler.h: Cast away attributes in the WRITE_ONCE() magic
  locking/static_keys: Make verify_keys() static
  jump label, locking/static_keys: Update docs
  locking/static_keys: Provide a selftest
  jump_label: Provide a self-test
  s390/uaccess, locking/static_keys: employ static_branch_likely()
  x86, tsc, locking/static_keys: Employ static_branch_likely()
  locking/static_keys: Add selftest
  locking/static_keys: Add a new static_key interface
  locking/static_keys: Rework update logic
  locking/static_keys: Add static_key_{en,dis}able() helpers
  ...
2015-09-03 15:46:07 -07:00
Vineet Gupta 9b28829d6d ARCv2: perf: Finally introduce HS perf unit
With all features in place, the ARC HS pct block can now be effectively
allowed to be probed/used

Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-27 14:59:07 +05:30
Alexey Brodkin e6b1d126bb ARCv2: perf: implement exclusion of event counting in user or kernel mode
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-27 14:58:14 +05:30
Alexey Brodkin 36481cf7fb ARCv2: perf: Support sampling events using overflow interrupts
In times of ARC 700 performance counters didn't have support of
interrupt an so for ARC we only had support of non-sampling events.

Put simply only "perf stat" was functional.

Now with ARC HS we have support of interrupts in performance counters
which this change introduces support of.

ARC performance counters act in the following way in regard of
interrupts generation.
 [1] A counter counts starting from value set in PCT_COUNT register pair
 [2] Once counter reaches value set in PCT_INT_CNT interrupt is raised

Basic setup look like this:
 [1] PCT_COUNT = 0;
 [2] PCT_INT_CNT = __limit_value__;
 [3] Enable interrupts for that counter and let it run
 [4] Let counter reach its limit
 [5] Handle interrupt when it happens

Note that PCT HW block is build in CPU core and so ints interrupt
line (which is basically OR of all counters IRQs) is wired directly to
top-level IRQC. That means do de-assert PCT interrupt it's required to
reset IRQs from all counters that have reached their limit values.

Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-27 14:57:43 +05:30
Vineet Gupta fb7c572551 ARC: perf: cap the number of counters to hardware max of 32
The number of counters in PCT can never be more than 32 (while
countable conditions could be 100+) for both ARCompact and ARCv2

And while at it update copyright dates.

Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-27 14:57:03 +05:30
Vineet Gupta 090749502f ARC: add/fix some comments in code - no functional change
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-20 19:05:49 +05:30
Yuriy Kolerov 6de6066c0d ARC: change some branchs to jumps to resolve linkage errors
When kernel's binary becomes large enough (32M and more) errors
may occur during the final linkage stage. It happens because
the build system uses short relocations for ARC  by default.
This problem may be easily resolved by passing -mlong-calls
option to GCC to use long absolute jumps (j) instead of short
relative branchs (b).

But there are fragments of pure assembler code exist which use
branchs in inappropriate places and cause a linkage error because
of relocations overflow.

First of these fragments is .fixup insertion in futex.h and
unaligned.c. It inserts a code in the separate section (.fixup)
with branch instruction. It leads to the linkage error when
kernel becomes large.

Second of these fragments is calling scheduler's functions
(common kernel code) from entry.S of ARC's code. When kernel's
binary becomes large it may lead to the linkage error because
scheduler may occur far enough from ARC's code in the final
binary.

Signed-off-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-20 18:53:15 +05:30
Vineet Gupta eb2cd8b72b ARC: ensure futex ops are atomic in !LLSC config
W/o hardware assisted atomic r-m-w the best we can do is to disable
preemption.

Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Michel Lespinasse <walken@google.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-20 18:16:01 +05:30
Vineet Gupta 882a95ae0a ARC: make futex_atomic_cmpxchg_inatomic() return bimodal
Callers of cmpxchg_futex_value_locked() in futex code expect bimodal
return value:
  !0 (essentially -EFAULT as failure)
   0 (success)

Before this patch, the success return value was old value of futex,
which could very well be non zero, causing caller to possibly take the
failure path erroneously.

Fix that by returning 0 for success

(This fix was done back in 2011 for all upstream arches, which ARC
obviously missed)

Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Michel Lespinasse <walken@google.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-20 18:16:00 +05:30
Vineet Gupta ed574e2bbd ARC: futex cosmetics
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Michel Lespinasse <walken@google.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-20 18:16:00 +05:30
Vineet Gupta 31d30c8208 ARC: add barriers to futex code
The atomic ops on futex need to provide the full barrier just like
regular atomics in kernel.

Also remove pagefault_enable/disable in futex_atomic_cmpxchg_inatomic()
as core code already does that

Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Michel Lespinasse <walken@google.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-20 18:15:59 +05:30
Alexey Brodkin f2b0b25a37 ARCv2: Support IO Coherency and permutations involving L1 and L2 caches
In case of ARCv2 CPU there're could be following configurations
that affect cache handling for data exchanged with peripherals
via DMA:
 [1] Only L1 cache exists
 [2] Both L1 and L2 exist, but no IO coherency unit
 [3] L1, L2 caches and IO coherency unit exist

Current implementation takes care of [1] and [2].
Moreover support of [2] is implemented with run-time check
for SLC existence which is not super optimal.

This patch introduces support of [3] and rework of DMA ops
usage. Instead of doing run-time check every time a particular
DMA op is executed we'll have 3 different implementations of
DMA ops and select appropriate one during init.

As for IOC support for it we need:
 [a] Implement empty DMA ops because IOC takes care of cache
     coherency with DMAed data
 [b] Route dma_alloc_coherent() via dma_alloc_noncoherent()
     This is required to make IOC work in first place and also
     serves as optimization as LD/ST to coherent buffers can be
     srviced from caches w/o going all the way to memory

Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
[vgupta:
  -Added some comments about IOC gains
  -Marked dma ops as static,
  -Massaged changelog a bit]
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-20 18:11:17 +05:30
Vineet Gupta 1097163870 ARCv2: spinlock/rwlock/atomics: reduce 1 instruction in exponential backoff
The increment of delay counter was 2 instructions:
Arithmatic Shfit Left (ASL) + set to 1 on overflow

This can be done in 1 using ROtate Left (ROL)

Suggested-by: Nigel Topham <ntopham@synopsys.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-07 13:56:16 +05:30
Vineet Gupta 87ce62802f ARC: Make pt_regs regs unsigned
KGDB fails to build after f51e2f1911 ("ARC: make sure instruction_pointer()
returns unsigned value")

The hack to force one specific reg to unsigned backfired. There's no
reason to keep the regs signed after all.

|  CC      arch/arc/kernel/kgdb.o
|../arch/arc/kernel/kgdb.c: In function 'kgdb_trap':
| ../arch/arc/kernel/kgdb.c:180:29: error: lvalue required as left operand of assignment
|   instruction_pointer(regs) -= BREAK_INSTR_SIZE;

Reported-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com>
Fixes: f51e2f1911 ("ARC: make sure instruction_pointer() returns unsigned value")
Cc: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-05 11:48:21 +05:30
Vineet Gupta b89aa12c17 ARCv2: spinlock/rwlock: Reset retry delay when starting a new spin-wait cycle
The previous commit for delayed retry of SCOND needs some fine tuning
for spin locks.

The backoff from delayed retry in conjunction with spin looping of lock
itself can potentially cause the delay counter to reach high values.
So to provide fairness to any lock operation, after a lock "seems"
available (i.e. just before first SCOND try0, reset the delay counter
back to starting value of 1

Essentially reset delay to 1 for a new spin-wait-loop-acquire cycle.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-04 09:26:35 +05:30
Vineet Gupta e78fdfef84 ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff
This is to workaround the llock/scond livelock

HS38x4 could get into a LLOCK/SCOND livelock in case of multiple overlapping
coherency transactions in the SCU. The exclusive line state keeps rotating
among contenting cores leading to a never ending cycle. So break the cycle
by deferring the retry of failed exclusive access (SCOND). The actual delay
needed is function of number of contending cores as well as the unrelated
coherency traffic from other cores. To keep the code simple, start off with
small delay of 1 which would suffice most cases and in case of contention
double the delay. Eventually the delay is sufficient such that the coherency
pipeline is drained, thus a subsequent exclusive access would succeed.

Link: http://lkml.kernel.org/r/1438612568-28265-1-git-send-email-vgupta@synopsys.com
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-04 09:26:34 +05:30
Vineet Gupta 69cbe630f5 ARC: LLOCK/SCOND based rwlock
With LLOCK/SCOND, the rwlock counter can be atomically updated w/o need
for a guarding spin lock.

This in turn elides the EXchange instruction based spinning which causes
the cacheline transition to exclusive state and concurrent spinning
across cores would cause the line to keep bouncing around.
LLOCK/SCOND based implementation is superior as spinning on LLOCK keeps
the cacheline in shared state.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-04 09:26:33 +05:30
Vineet Gupta ae7eae9e03 ARC: LLOCK/SCOND based spin_lock
Current spin_lock uses EXchange instruction to implement the atomic test
and set of lock location (reads orig value and ST 1). This however forces
the cacheline into exclusive state (because of the ST) and concurrent
loops in multiple cores will bounce the line around between cores.

Instead, use LLOCK/SCOND to implement the atomic test and set which is
better as line is in shared state while lock is spinning on LLOCK

The real motivation of this change however is to make way for future
changes in atomics to implement delayed retry (with backoff).
Initial experiment with delayed retry in atomics combined with orig
EX based spinlock was a total disaster (broke even LMBench) as
struct sock has a cache line sharing an atomic_t and spinlock. The
tight spinning on lock, caused the atomic retry to keep backing off
such that it would never finish.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-04 09:26:33 +05:30
Vineet Gupta 8ac0665fb6 ARC: refactor atomic inline asm operands with symbolic names
This reduces the diff in forth-coming patches and also helps understand
better the incremental changes to inline asm.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-04 09:26:32 +05:30
Vineet Gupta f5959cb0c3 Revert "ARCv2: STAR 9000837815 workaround hardware exclusive transactions livelock"
Extended testing of quad core configuration revealed that this fix was
insufficient. Specifically LTP open posix shm_op/23-1 would cause the
hardware livelock in llock/scond loop in update_cpu_load_active()

So remove this and make way for a proper workaround

This reverts commit a5c8b52abe.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-04 09:26:31 +05:30
Vineet Gupta e13c42ecbe ARCv2: Fix the peripheral address space detection
With HS 2.1 release, the peripheral space register no longer contains
the uncached space specifics, causing the kernel to panic early on.
So read the newer NON VOLATILE AUX register to get that info.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-03 19:34:07 +05:30
Peter Zijlstra de9e432cb5 atomic: Collapse all atomic_{set,clear}_mask definitions
Move the now generic definitions of atomic_{set,clear}_mask() into
linux/atomic.h to avoid endless and pointless repetition.

Also, provide an atomic_andnot() wrapper for those few archs that can
implement that.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-27 14:06:24 +02:00
Peter Zijlstra e6942b7de2 atomic: Provide atomic_{or,xor,and}
Implement atomic logic ops -- atomic_{or,xor,and}.

These will replace the atomic_{set,clear}_mask functions that are
available on some archs.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-27 14:06:24 +02:00
Peter Zijlstra cda7e4137a arc: Provide atomic_{or,xor,and}
Implement atomic logic ops -- atomic_{or,xor,and}.

These will replace the atomic_{set,clear}_mask functions that are
available on some archs.

Acked-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-27 14:06:21 +02:00
Laurent Dufour f2abeef9fd mm: clean up per architecture MM hook header files
Commit 2ae416b142 ("mm: new mm hook framework") introduced an empty
header file (mm-arch-hooks.h) for every architecture, even those which
doesn't need to define mm hooks.

As suggested by Geert Uytterhoeven, this could be cleaned through the use
of a generic header file included via each per architecture
asm/include/Kbuild file.

The PowerPC architecture is not impacted here since this architecture has
to defined the arch_remap MM hook.

Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-07-17 16:39:53 -07:00