Commit Graph

315 Commits

Author SHA1 Message Date
Jiri Kosina
ec527c3180 x86/power: Fix 'nosmt' vs hibernation triple fault during resume
As explained in

	0cc3cd2165 ("cpu/hotplug: Boot HT siblings at least once")

we always, no matter what, have to bring up x86 HT siblings during boot at
least once in order to avoid first MCE bringing the system to its knees.

That means that whenever 'nosmt' is supplied on the kernel command-line,
all the HT siblings are as a result sitting in mwait or cpudile after
going through the online-offline cycle at least once.

This causes a serious issue though when a kernel, which saw 'nosmt' on its
commandline, is going to perform resume from hibernation: if the resume
from the hibernated image is successful, cr3 is flipped in order to point
to the address space of the kernel that is being resumed, which in turn
means that all the HT siblings are all of a sudden mwaiting on address
which is no longer valid.

That results in triple fault shortly after cr3 is switched, and machine
reboots.

Fix this by always waking up all the SMT siblings before initiating the
'restore from hibernation' process; this guarantees that all the HT
siblings will be properly carried over to the resumed kernel waiting in
resume_play_dead(), and acted upon accordingly afterwards, based on the
target kernel configuration.

Symmetricaly, the resumed kernel has to push the SMT siblings to mwait
again in case it has SMT disabled; this means it has to online all
the siblings when resuming (so that they come out of hlt) and offline
them again to let them reach mwait.

Cc: 4.19+ <stable@vger.kernel.org> # v4.19+
Debugged-by: Thomas Gleixner <tglx@linutronix.de>
Fixes: 0cc3cd2165 ("cpu/hotplug: Boot HT siblings at least once")
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Pavel Machek <pavel@ucw.cz>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-06-03 12:02:03 +02:00
Linus Torvalds
a0e928ed7c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Ingo Molnar:
 "This cycle had the following changes:

   - Timer tracing improvements (Anna-Maria Gleixner)

   - Continued tasklet reduction work: remove the hrtimer_tasklet
     (Thomas Gleixner)

   - Fix CPU hotplug remove race in the tick-broadcast mask handling
     code (Thomas Gleixner)

   - Force upper bound for setting CLOCK_REALTIME, to fix ABI
     inconsistencies with handling values that are close to the maximum
     supported and the vagueness of when uptime related wraparound might
     occur. Make the consistent maximum the year 2232 across all
     relevant ABIs and APIs. (Thomas Gleixner)

   - various cleanups and smaller fixes"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tick: Fix typos in comments
  tick/broadcast: Fix warning about undefined tick_broadcast_oneshot_offline()
  timekeeping: Force upper bound for setting CLOCK_REALTIME
  timer/trace: Improve timer tracing
  timer/trace: Replace deprecated vsprintf pointer extension %pf by %ps
  timer: Move trace point to get proper index
  tick/sched: Update tick_sched struct documentation
  tick: Remove outgoing CPU from broadcast masks
  timekeeping: Consistently use unsigned int for seqcount snapshot
  softirq: Remove tasklet_hrtimer
  xfrm: Replace hrtimer tasklet with softirq hrtimer
  mac80211_hwsim: Replace hrtimer tasklet with softirq hrtimer
2019-05-06 14:50:46 -07:00
Linus Torvalds
5a2bf1abbf Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull CPU hotplug updates from Ingo Molnar:
 "Two changes in this cycle:

   - Make the /sys/devices/system/cpu/smt/* files available on all
     arches, so user space has a consistent way to detect whether SMT is
     enabled.

   - Sparse annotation fix"

* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  smpboot: Place the __percpu annotation correctly
  cpu/hotplug: Create SMT sysfs interface for all arches
2019-05-06 14:44:49 -07:00
Linus Torvalds
e00d413575 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Make nohz housekeeping processing more permissive and less
     intrusive to isolated CPUs

   - Decouple CPU-bound workqueue acconting from the scheduler and move
     it into the workqueue code.

   - Optimize topology building

   - Better handle quota and period overflows

   - Add more RCU annotations

   - Comment updates, misc cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  nohz_full: Allow the boot CPU to be nohz_full
  sched/isolation: Require a present CPU in housekeeping mask
  kernel/cpu: Allow non-zero CPU to be primary for suspend / kexec freeze
  power/suspend: Add function to disable secondaries for suspend
  sched/core: Allow the remote scheduler tick to be started on CPU0
  sched/nohz: Run NOHZ idle load balancer on HK_FLAG_MISC CPUs
  sched/debug: Fix spelling mistake "logaritmic" -> "logarithmic"
  sched/topology: Update init_sched_domains() comment
  cgroup/cpuset: Update stale generate_sched_domains() comments
  sched/core: Check quota and period overflow at usec to nsec conversion
  sched/core: Handle overflow in cpu_shares_write_u64
  sched/rt: Check integer overflow at usec to nsec conversion
  sched/core: Fix typo in comment
  sched/core: Make some functions static
  sched/core: Unify p->on_rq updates
  sched/core: Remove ttwu_activate()
  sched/core, workqueues: Distangle worker accounting from rq lock
  sched/fair: Remove unneeded prototype of capacity_of()
  sched/topology: Skip duplicate group rewrites in build_sched_groups()
  sched/topology: Fix build_sched_groups() comment
  ...
2019-05-06 14:31:50 -07:00
Linus Torvalds
0a499fc5c3 Merge branch 'core-speculation-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull speculation mitigation update from Ingo Molnar:
 "This adds the "mitigations=" bootline option, which offers a
  cross-arch set of options that will work on x86, PowerPC and s390 that
  will map to the arch specific option internally"

* 'core-speculation-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  s390/speculation: Support 'mitigations=' cmdline option
  powerpc/speculation: Support 'mitigations=' cmdline option
  x86/speculation: Support 'mitigations=' cmdline option
  cpu/speculation: Add 'mitigations=' cmdline option
2019-05-06 13:01:16 -07:00
Nicholas Piggin
9ca12ac04b kernel/cpu: Allow non-zero CPU to be primary for suspend / kexec freeze
This patch provides an arch option, ARCH_SUSPEND_NONZERO_CPU, to
opt-in to allowing suspend to occur on one of the housekeeping CPUs
rather than hardcoded CPU0.

This will allow CPU0 to be a nohz_full CPU with a later change.

It may be possible for platforms with hardware/firmware restrictions
on suspend/wake effectively support this by handing off the final
stage to CPU0 when kernel housekeeping is no longer required. Another
option is to make housekeeping / nohz_full mask dynamic at runtime,
but the complexity could not be justified at this time.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lkml.kernel.org/r/20190411033448.20842-4-npiggin@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-03 19:42:58 +02:00
Josh Poimboeuf
98af845294 cpu/speculation: Add 'mitigations=' cmdline option
Keeping track of the number of mitigations for all the CPU speculation
bugs has become overwhelming for many users.  It's getting more and more
complicated to decide which mitigations are needed for a given
architecture.  Complicating matters is the fact that each arch tends to
have its own custom way to mitigate the same vulnerability.

Most users fall into a few basic categories:

a) they want all mitigations off;

b) they want all reasonable mitigations on, with SMT enabled even if
   it's vulnerable; or

c) they want all reasonable mitigations on, with SMT disabled if
   vulnerable.

Define a set of curated, arch-independent options, each of which is an
aggregation of existing options:

- mitigations=off: Disable all mitigations.

- mitigations=auto: [default] Enable all the default mitigations, but
  leave SMT enabled, even if it's vulnerable.

- mitigations=auto,nosmt: Enable all the default mitigations, disabling
  SMT if needed by a mitigation.

Currently, these options are placeholders which don't actually do
anything.  They will be fleshed out in upcoming patches.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz> (on x86)
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-arch@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Phil Auld <pauld@redhat.com>
Link: https://lkml.kernel.org/r/b07a8ef9b7c5055c3a4637c87d07c296d5016fe0.1555085500.git.jpoimboe@redhat.com
2019-04-17 21:37:28 +02:00
Josh Poimboeuf
de7b77e5bb cpu/hotplug: Create SMT sysfs interface for all arches
Make the /sys/devices/system/cpu/smt/* files available on all arches, so
user space has a consistent way to detect whether SMT is enabled.

The 'control' file now shows 'notimplemented' for architectures which
don't yet have CONFIG_HOTPLUG_SMT.

[ tglx: Make notimplemented a real state ]

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Kosina <jikos@kernel.org>
Link: https://lkml.kernel.org/r/469c2b98055f2c41e75748e06447d592a64080c9.1553635520.git.jpoimboe@redhat.com
2019-04-02 12:36:56 +02:00
Thomas Gleixner
206b92353c cpu/hotplug: Prevent crash when CPU bringup fails on CONFIG_HOTPLUG_CPU=n
Tianyu reported a crash in a CPU hotplug teardown callback when booting a
kernel which has CONFIG_HOTPLUG_CPU disabled with the 'nosmt' boot
parameter.

It turns out that the SMP=y CONFIG_HOTPLUG_CPU=n case has been broken
forever in case that a bringup callback fails. Unfortunately this issue was
not recognized when the CPU hotplug code was reworked, so the shortcoming
just stayed in place.

When a bringup callback fails, the CPU hotplug code rolls back the
operation and takes the CPU offline.

The 'nosmt' command line argument uses a bringup failure to abort the
bringup of SMT sibling CPUs. This partial bringup is required due to the
MCE misdesign on Intel CPUs.

With CONFIG_HOTPLUG_CPU=y the rollback works perfectly fine, but
CONFIG_HOTPLUG_CPU=n lacks essential mechanisms to exercise the low level
teardown of a CPU including the synchronizations in various facilities like
RCU, NOHZ and others.

As a consequence the teardown callbacks which must be executed on the
outgoing CPU within stop machine with interrupts disabled are executed on
the control CPU in interrupt enabled and preemptible context causing the
kernel to crash and burn. The pre state machine code has a different
failure mode which is more subtle and resulting in a less obvious use after
free crash because the control side frees resources which are still in use
by the undead CPU.

But this is not a x86 only problem. Any architecture which supports the
SMP=y HOTPLUG_CPU=n combination suffers from the same issue. It's just less
likely to be triggered because in 99.99999% of the cases all bringup
callbacks succeed.

The easy solution of making HOTPLUG_CPU mandatory for SMP is not working on
all architectures as the following architectures have either no hotplug
support at all or not all subarchitectures support it:

 alpha, arc, hexagon, openrisc, riscv, sparc (32bit), mips (partial).

Crashing the kernel in such a situation is not an acceptable state
either.

Implement a minimal rollback variant by limiting the teardown to the point
where all regular teardown callbacks have been invoked and leave the CPU in
the 'dead' idle state. This has the following consequences:

 - the CPU is brought down to the point where the stop_machine takedown
   would happen.

 - the CPU stays there forever and is idle

 - The CPU is cleared in the CPU active mask, but not in the CPU online
   mask which is a legit state.

 - Interrupts are not forced away from the CPU

 - All facilities which only look at online mask would still see it, but
   that is the case during normal hotplug/unplug operations as well. It's
   just a (way) longer time frame.

This will expose issues, which haven't been exposed before or only seldom,
because now the normally transient state of being non active but online is
a permanent state. In testing this exposed already an issue vs. work queues
where the vmstat code schedules work on the almost dead CPU which ends up
in an unbound workqueue and triggers 'preemtible context' warnings. This is
not a problem of this change, it merily exposes an already existing issue.
Still this is better than crashing fully without a chance to debug it.

This is mainly thought as workaround for those architectures which do not
support HOTPLUG_CPU. All others should enforce HOTPLUG_CPU for SMP.

Fixes: 2e1a3483ce ("cpu/hotplug: Split out the state walk into functions")
Reported-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Konrad Wilk <konrad.wilk@oracle.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Mukesh Ojha <mojha@codeaurora.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Rik van Riel <riel@surriel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Micheal Kelley <michael.h.kelley@microsoft.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190326163811.503390616@linutronix.de
2019-03-28 13:34:58 +01:00
Thomas Gleixner
1b72d43237 tick: Remove outgoing CPU from broadcast masks
Valentin reported that unplugging a CPU occasionally results in a warning
in the tick broadcast code which is triggered when an offline CPU is in the
broadcast mask.

This happens because the outgoing CPU is not removing itself from the
broadcast masks, especially not from the broadcast_force_mask. The removal
happens on the control CPU after the outgoing CPU is dead. It's a long
standing issue, but the warning is harmless.

Rework the hotplug mechanism so that the outgoing CPU removes itself from
the broadcast masks after disabling interrupts and removing itself from the
online mask.

Reported-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1903211540180.1784@nanos.tec.linutronix.de
2019-03-23 18:26:43 +01:00
Ingo Molnar
31fe3cbbf2 Merge tag 'v5.0-rc5' into locking/core to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-04 08:57:24 +01:00
Josh Poimboeuf
b284909aba cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM
With the following commit:

  73d5e2b472 ("cpu/hotplug: detect SMT disabled by BIOS")

... the hotplug code attempted to detect when SMT was disabled by BIOS,
in which case it reported SMT as permanently disabled.  However, that
code broke a virt hotplug scenario, where the guest is booted with only
primary CPU threads, and a sibling is brought online later.

The problem is that there doesn't seem to be a way to reliably
distinguish between the HW "SMT disabled by BIOS" case and the virt
"sibling not yet brought online" case.  So the above-mentioned commit
was a bit misguided, as it permanently disabled SMT for both cases,
preventing future virt sibling hotplugs.

Going back and reviewing the original problems which were attempted to
be solved by that commit, when SMT was disabled in BIOS:

  1) /sys/devices/system/cpu/smt/control showed "on" instead of
     "notsupported"; and

  2) vmx_vm_init() was incorrectly showing the L1TF_MSG_SMT warning.

I'd propose that we instead consider #1 above to not actually be a
problem.  Because, at least in the virt case, it's possible that SMT
wasn't disabled by BIOS and a sibling thread could be brought online
later.  So it makes sense to just always default the smt control to "on"
to allow for that possibility (assuming cpuid indicates that the CPU
supports SMT).

The real problem is #2, which has a simple fix: change vmx_vm_init() to
query the actual current SMT state -- i.e., whether any siblings are
currently online -- instead of looking at the SMT "control" sysfs value.

So fix it by:

  a) reverting the original "fix" and its followup fix:

     73d5e2b472 ("cpu/hotplug: detect SMT disabled by BIOS")
     bc2d8d262c ("cpu/hotplug: Fix SMT supported evaluation")

     and

  b) changing vmx_vm_init() to query the actual current SMT state --
     instead of the sysfs control value -- to determine whether the L1TF
     warning is needed.  This also requires the 'sched_smt_present'
     variable to exported, instead of 'cpu_smt_control'.

Fixes: 73d5e2b472 ("cpu/hotplug: detect SMT disabled by BIOS")
Reported-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Joe Mario <jmario@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kvm@vger.kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/e3a85d585da28cc333ecbc1e78ee9216e6da9396.1548794349.git.jpoimboe@redhat.com
2019-01-30 19:27:00 +01:00
Zhenzhong Duan
34d66caf25 x86/speculation: Remove redundant arch_smt_update() invocation
With commit a74cfffb03 ("x86/speculation: Rework SMT state change"),
arch_smt_update() is invoked from each individual CPU hotplug function.

Therefore the extra arch_smt_update() call in the sysfs SMT control is
redundant.

Fixes: a74cfffb03 ("x86/speculation: Rework SMT state change")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <konrad.wilk@oracle.com>
Cc: <dwmw@amazon.co.uk>
Cc: <bp@suse.de>
Cc: <srinivas.eeda@oracle.com>
Cc: <peterz@infradead.org>
Cc: <hpa@zytor.com>
Link: https://lkml.kernel.org/r/e2e064f2-e8ef-42ca-bf4f-76b612964752@default
2019-01-29 22:20:24 +01:00
Valentin Schneider
ce48c457b9 cpu/hotplug: Mute hotplug lockdep during init
Since we've had:

  commit cb538267ea ("jump_label/lockdep: Assert we hold the hotplug lock for _cpuslocked() operations")

we've been getting some lockdep warnings during init, such as on HiKey960:

[    0.820495] WARNING: CPU: 4 PID: 0 at kernel/cpu.c:316 lockdep_assert_cpus_held+0x3c/0x48
[    0.820498] Modules linked in:
[    0.820509] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G S                4.20.0-rc5-00051-g4cae42a #34
[    0.820511] Hardware name: HiKey960 (DT)
[    0.820516] pstate: 600001c5 (nZCv dAIF -PAN -UAO)
[    0.820520] pc : lockdep_assert_cpus_held+0x3c/0x48
[    0.820523] lr : lockdep_assert_cpus_held+0x38/0x48
[    0.820526] sp : ffff00000a9cbe50
[    0.820528] x29: ffff00000a9cbe50 x28: 0000000000000000
[    0.820533] x27: 00008000b69e5000 x26: ffff8000bff4cfe0
[    0.820537] x25: ffff000008ba69e0 x24: 0000000000000001
[    0.820541] x23: ffff000008fce000 x22: ffff000008ba70c8
[    0.820545] x21: 0000000000000001 x20: 0000000000000003
[    0.820548] x19: ffff00000a35d628 x18: ffffffffffffffff
[    0.820552] x17: 0000000000000000 x16: 0000000000000000
[    0.820556] x15: ffff00000958f848 x14: 455f3052464d4d34
[    0.820559] x13: 00000000769dde98 x12: ffff8000bf3f65a8
[    0.820564] x11: 0000000000000000 x10: ffff00000958f848
[    0.820567] x9 : ffff000009592000 x8 : ffff00000958f848
[    0.820571] x7 : ffff00000818ffa0 x6 : 0000000000000000
[    0.820574] x5 : 0000000000000000 x4 : 0000000000000001
[    0.820578] x3 : 0000000000000000 x2 : 0000000000000001
[    0.820582] x1 : 00000000ffffffff x0 : 0000000000000000
[    0.820587] Call trace:
[    0.820591]  lockdep_assert_cpus_held+0x3c/0x48
[    0.820598]  static_key_enable_cpuslocked+0x28/0xd0
[    0.820606]  arch_timer_check_ool_workaround+0xe8/0x228
[    0.820610]  arch_timer_starting_cpu+0xe4/0x2d8
[    0.820615]  cpuhp_invoke_callback+0xe8/0xd08
[    0.820619]  notify_cpu_starting+0x80/0xb8
[    0.820625]  secondary_start_kernel+0x118/0x1d0

We've also had a similar warning in sched_init_smp() for every
asymmetric system that would enable the sched_asym_cpucapacity static
key, although that was singled out in:

  commit 40fa3780ba ("sched/core: Take the hotplug lock in sched_init_smp()")

Those warnings are actually harmless, since we cannot have hotplug
operations at the time they appear. Instead of starting to sprinkle
useless hotplug lock operations in the init codepaths, mute the
warnings until they start warning about real problems.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: cai@gmx.us
Cc: daniel.lezcano@linaro.org
Cc: dietmar.eggemann@arm.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: longman@redhat.com
Cc: marc.zyngier@arm.com
Cc: mark.rutland@arm.com
Link: https://lkml.kernel.org/r/1545243796-23224-2-git-send-email-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-01-21 11:18:53 +01:00
Thomas Gleixner
a74cfffb03 x86/speculation: Rework SMT state change
arch_smt_update() is only called when the sysfs SMT control knob is
changed. This means that when SMT is enabled in the sysfs control knob the
system is considered to have SMT active even if all siblings are offline.

To allow finegrained control of the speculation mitigations, the actual SMT
state is more interesting than the fact that siblings could be enabled.

Rework the code, so arch_smt_update() is invoked from each individual CPU
hotplug function, and simplify the update function while at it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Casey Schaufler <casey.schaufler@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Waiman Long <longman9394@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Stewart <david.c.stewart@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185004.521974984@linutronix.de
2018-11-28 11:57:07 +01:00
Linus Torvalds
d82924c3b8 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 pti updates from Ingo Molnar:
 "The main changes:

   - Make the IBPB barrier more strict and add STIBP support (Jiri
     Kosina)

   - Micro-optimize and clean up the entry code (Andy Lutomirski)

   - ... plus misc other fixes"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/speculation: Propagate information about RSB filling mitigation to sysfs
  x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation
  x86/speculation: Apply IBPB more strictly to avoid cross-process data leak
  x86/speculation: Add RETPOLINE_AMD support to the inline asm CALL_NOSPEC variant
  x86/CPU: Fix unused variable warning when !CONFIG_IA32_EMULATION
  x86/pti/64: Remove the SYSCALL64 entry trampoline
  x86/entry/64: Use the TSS sp2 slot for SYSCALL/SYSRET scratch space
  x86/entry/64: Document idtentry
2018-10-23 18:43:04 +01:00
Linus Torvalds
42f52e1c59 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes are:

   - Migrate CPU-intense 'misfit' tasks on asymmetric capacity systems,
     to better utilize (much) faster 'big core' CPUs. (Morten Rasmussen,
     Valentin Schneider)

   - Topology handling improvements, in particular when CPU capacity
     changes and related load-balancing fixes/improvements (Morten
     Rasmussen)

   - ... plus misc other improvements, fixes and updates"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
  sched/completions/Documentation: Add recommendation for dynamic and ONSTACK completions
  sched/completions/Documentation: Clean up the document some more
  sched/completions/Documentation: Fix a couple of punctuation nits
  cpu/SMT: State SMT is disabled even with nosmt and without "=force"
  sched/core: Fix comment regarding nr_iowait_cpu() and get_iowait_load()
  sched/fair: Remove setting task's se->runnable_weight during PELT update
  sched/fair: Disable LB_BIAS by default
  sched/pelt: Fix warning and clean up IRQ PELT config
  sched/topology: Make local variables static
  sched/debug: Use symbolic names for task state constants
  sched/numa: Remove unused numa_stats::nr_running field
  sched/numa: Remove unused code from update_numa_stats()
  sched/debug: Explicitly cast sched_feat() to bool
  sched/core: Disable SD_PREFER_SIBLING on asymmetric CPU capacity domains
  sched/fair: Don't move tasks to lower capacity CPUs unless necessary
  sched/fair: Set rq->rd->overload when misfit
  sched/fair: Wrap rq->rd->overload accesses with READ/WRITE_ONCE()
  sched/core: Change root_domain->overload type to int
  sched/fair: Change 'prefer_sibling' type to bool
  sched/fair: Kick nohz balance if rq->misfit_task_load
  ...
2018-10-23 15:00:03 +01:00
Borislav Petkov
d0e7d14455 cpu/SMT: State SMT is disabled even with nosmt and without "=force"
When booting with "nosmt=force" a message is issued into dmesg to
confirm that SMT has been force-disabled but such a message is not
issued when only "nosmt" is on the kernel command line.

Fix that.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181004172227.10094-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-05 10:20:31 +02:00
Jiri Kosina
53c613fe63 x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation
STIBP is a feature provided by certain Intel ucodes / CPUs. This feature
(once enabled) prevents cross-hyperthread control of decisions made by
indirect branch predictors.

Enable this feature if

- the CPU is vulnerable to spectre v2
- the CPU supports SMT and has SMT siblings online
- spectre_v2 mitigation autoselection is enabled (default)

After some previous discussion, this leaves STIBP on all the time, as wrmsr
on crossing kernel boundary is a no-no. This could perhaps later be a bit
more optimized (like disabling it in NOHZ, experiment with disabling it in
idle, etc) if needed.

Note that the synchronization of the mask manipulation via newly added
spec_ctrl_mutex is currently not strictly needed, as the only updater is
already being serialized by cpu_add_remove_lock, but let's make this a
little bit more future-proof.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc:  "WoodhouseDavid" <dwmw@amazon.co.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc:  "SchauflerCasey" <casey.schaufler@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251438240.15880@cbobk.fhfr.pm
2018-09-26 14:26:52 +02:00
Peter Zijlstra
cb92173d1f locking/lockdep, cpu/hotplug: Annotate AP thread
Anybody trying to assert the cpu_hotplug_lock is held (lockdep_assert_cpus_held())
from AP callbacks will fail, because the lock is held by the BP.

Stick in an explicit annotation in cpuhp_thread_fun() to make this work.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-tip-commits@vger.kernel.org
Fixes: cb538267ea ("jump_label/lockdep: Assert we hold the hotplug lock for _cpuslocked() operations")
Link: http://lkml.kernel.org/r/20180911095127.GT24082@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-09-11 20:01:03 +02:00
Thomas Gleixner
69fa6eb7d6 cpu/hotplug: Prevent state corruption on error rollback
When a teardown callback fails, the CPU hotplug code brings the CPU back to
the previous state. The previous state becomes the new target state. The
rollback happens in undo_cpu_down() which increments the state
unconditionally even if the state is already the same as the target.

As a consequence the next CPU hotplug operation will start at the wrong
state. This is easily to observe when __cpu_disable() fails.

Prevent the unconditional undo by checking the state vs. target before
incrementing state and fix up the consequently wrong conditional in the
unplug code which handles the failure of the final CPU take down on the
control CPU side.

Fixes: 4dddfb5faa ("smp/hotplug: Rewrite AP state machine core")
Reported-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: josh@joshtriplett.org
Cc: peterz@infradead.org
Cc: jiangshanlai@gmail.com
Cc: dzickus@redhat.com
Cc: brendan.jackman@arm.com
Cc: malat@debian.org
Cc: sramana@codeaurora.org
Cc: linux-arm-msm@vger.kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1809051419580.1416@nanos.tec.linutronix.de

----
2018-09-06 15:21:38 +02:00
Neeraj Upadhyay
f8b7530aa0 cpu/hotplug: Adjust misplaced smb() in cpuhp_thread_fun()
The smp_mb() in cpuhp_thread_fun() is misplaced. It needs to be after the
load of st->should_run to prevent reordering of the later load/stores
w.r.t. the load of st->should_run.

Fixes: 4dddfb5faa ("smp/hotplug: Rewrite AP state machine core")
Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infraded.org>
Cc: josh@joshtriplett.org
Cc: peterz@infradead.org
Cc: jiangshanlai@gmail.com
Cc: dzickus@redhat.com
Cc: brendan.jackman@arm.com
Cc: malat@debian.org
Cc: mojha@codeaurora.org
Cc: sramana@codeaurora.org
Cc: linux-arm-msm@vger.kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1536126727-11629-1-git-send-email-neeraju@codeaurora.org
2018-09-06 15:21:37 +02:00
Mukesh Ojha
6fb86d9720 cpu/hotplug: Remove skip_onerr field from cpuhp_step structure
When notifiers were there, `skip_onerr` was used to avoid calling
particular step startup/teardown callbacks in the CPU up/down rollback
path, which made the hotplug asymmetric.

As notifiers are gone now after the full state machine conversion, the
`skip_onerr` field is no longer required.

Remove it from the structure and its usage.

Signed-off-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1535439294-31426-1-git-send-email-mojha@codeaurora.org
2018-08-31 14:13:03 +02:00
Abel Vesa
269777aa53 cpu/hotplug: Non-SMP machines do not make use of booted_once
Commit 0cc3cd2165 ("cpu/hotplug: Boot HT siblings at least once")
breaks non-SMP builds.

[ I suspect the 'bool' fields should just be made to be bitfields and be
  exposed regardless of configuration, but that's a separate cleanup
  that I'll leave to the owners of this file for later.   - Linus ]

Fixes: 0cc3cd2165 ("cpu/hotplug: Boot HT siblings at least once")
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Abel Vesa <abelvesa@linux.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-14 15:00:00 -07:00
Linus Torvalds
b018fc9800 Merge tag 'pm-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
 "These add a new framework for CPU idle time injection, to be used by
  all of the idle injection code in the kernel in the future, fix some
  issues and add a number of relatively small extensions in multiple
  places.

  Specifics:

   - Add a new framework for CPU idle time injection (Daniel Lezcano).

   - Add AVS support to the armada-37xx cpufreq driver (Gregory
     CLEMENT).

   - Add support for current CPU frequency reporting to the ACPI CPPC
     cpufreq driver (George Cherian).

   - Rework the cooling device registration in the imx6q/thermal driver
     (Bastian Stender).

   - Make the pcc-cpufreq driver refuse to work with dynamic scaling
     governors on systems with many CPUs to avoid scalability issues
     with it (Rafael Wysocki).

   - Fix the intel_pstate driver to report different maximum CPU
     frequencies on systems where they really are different and to
     ignore the turbo active ratio if hardware-managend P-states (HWP)
     are in use; make it use the match_string() helper (Xie Yisheng,
     Srinivas Pandruvada).

   - Fix a minor deferred probe issue in the qcom-kryo cpufreq driver
     (Niklas Cassel).

   - Add a tracepoint for the tracking of frequency limits changes (from
     Andriod) to the cpufreq core (Ruchi Kandoi).

   - Fix a circular lock dependency between CPU hotplug and sysfs
     locking in the cpufreq core reported by lockdep (Waiman Long).

   - Avoid excessive error reports on driver registration failures in
     the ARM cpuidle driver (Sudeep Holla).

   - Add a new device links flag to the driver core to make links go
     away automatically on supplier driver removal (Vivek Gautam).

   - Eliminate potential race condition between system-wide power
     management transitions and system shutdown (Pingfan Liu).

   - Add a quirk to save NVS memory on system suspend for the ASUS 1025C
     laptop (Willy Tarreau).

   - Make more systems use suspend-to-idle (instead of ACPI S3) by
     default (Tristian Celestin).

   - Get rid of stack VLA usage in the low-level hibernation code on
     64-bit x86 (Kees Cook).

   - Fix error handling in the hibernation core and mark an expected
     fall-through switch in it (Chengguang Xu, Gustavo Silva).

   - Extend the generic power domains (genpd) framework to support
     attaching a device to a power domain by name (Ulf Hansson).

   - Fix device reference counting and user limits initialization in the
     devfreq core (Arvind Yadav, Matthias Kaehlcke).

   - Fix a few issues in the rk3399_dmc devfreq driver and improve its
     documentation (Enric Balletbo i Serra, Lin Huang, Nick Milner).

   - Drop a redundant error message from the exynos-ppmu devfreq driver
     (Markus Elfring)"

* tag 'pm-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (35 commits)
  PM / reboot: Eliminate race between reboot and suspend
  PM / hibernate: Mark expected switch fall-through
  cpufreq: intel_pstate: Ignore turbo active ratio in HWP
  cpufreq: Fix a circular lock dependency problem
  cpu/hotplug: Add a cpus_read_trylock() function
  x86/power/hibernate_64: Remove VLA usage
  cpufreq: trace frequency limits change
  cpufreq: intel_pstate: Show different max frequency with turbo 3 and HWP
  cpufreq: pcc-cpufreq: Disable dynamic scaling on many-CPU systems
  cpufreq: qcom-kryo: Silently error out on EPROBE_DEFER
  cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC
  cpufreq: armada-37xx: Add AVS support
  dt-bindings: marvell: Add documentation for the Armada 3700 AVS binding
  PM / devfreq: rk3399_dmc: Fix duplicated opp table on reload.
  PM / devfreq: Init user limits from OPP limits, not viceversa
  PM / devfreq: rk3399_dmc: fix spelling mistakes.
  PM / devfreq: rk3399_dmc: do not print error when get supply and clk defer.
  dt-bindings: devfreq: rk3399_dmc: move interrupts to be optional.
  PM / devfreq: rk3399_dmc: remove wait for dcf irq event.
  dt-bindings: clock: add rk3399 DDR3 standard speed bins.
  ...
2018-08-14 13:12:24 -07:00