Commit Graph

271 Commits

Author SHA1 Message Date
Waiman Long
aa1567a7e6 intel_idle: Add ibrs_off module parameter to force-disable IBRS
Commit bf5835bcdb ("intel_idle: Disable IBRS during long idle")
disables IBRS when the cstate is 6 or lower. However, there are
some use cases where a customer may want to use max_cstate=1 to
lower latency. Such use cases will suffer from the performance
degradation caused by the enabling of IBRS in the sibling idle thread.
Add a "ibrs_off" module parameter to force disable IBRS and the
CPUIDLE_FLAG_IRQ_ENABLE flag if set.

In the case of a Skylake server with max_cstate=1, this new ibrs_off
option will likely increase the IRQ response latency as IRQ will now
be disabled.

When running SPECjbb2015 with cstates set to C1 on a Skylake system.

First test when the kernel is booted with: "intel_idle.ibrs_off":

  max-jOPS = 117828, critical-jOPS = 66047

Then retest when the kernel is booted without the "intel_idle.ibrs_off"
added:

  max-jOPS = 116408, critical-jOPS = 58958

That means booting with "intel_idle.ibrs_off" improves performance by:

  max-jOPS:      +1.2%, which could be considered noise range.
  critical-jOPS: +12%,  which is definitely a solid improvement.

The admin-guide/pm/intel_idle.rst file is updated to add a description
about the new "ibrs_off" module parameter.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20230727184600.26768-5-longman@redhat.com
2023-10-07 11:33:28 +02:00
Waiman Long
7506203089 intel_idle: Use __update_spec_ctrl() in intel_idle_ibrs()
When intel_idle_ibrs() is called, it modifies the SPEC_CTRL MSR to 0
in order disable IBRS. However, the new MSR value isn't reflected in
x86_spec_ctrl_current which is at odd with the other code that keep track
of its state in that percpu variable.  Use the new __update_spec_ctrl()
to have the x86_spec_ctrl_current percpu value properly updated.

Since spec-ctrl.h includes both msr.h and nospec-branch.h, we can remove
those from the include file list.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20230727184600.26768-4-longman@redhat.com
2023-10-07 11:33:28 +02:00
Linus Torvalds
1a7c611546 Merge tag 'perf-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event updates from Ingo Molnar:

 - AMD IBS improvements

 - Intel PMU driver updates

 - Extend core perf facilities & the ARM PMU driver to better handle ARM big.LITTLE events

 - Micro-optimize software events and the ring-buffer code

 - Misc cleanups & fixes

* tag 'perf-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/uncore: Remove unnecessary ?: operator around pcibios_err_to_errno() call
  perf/x86/intel: Add Crestmont PMU
  x86/cpu: Update Hybrids
  x86/cpu: Fix Crestmont uarch
  x86/cpu: Fix Gracemont uarch
  perf: Remove unused extern declaration arch_perf_get_page_size()
  perf: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
  arm_pmu: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
  perf/x86: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
  arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE capability
  perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src
  perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA
  perf/mem: Introduce PERF_MEM_LVLNUM_UNC
  perf/ring_buffer: Use local_try_cmpxchg in __perf_output_begin
  locking/arch: Avoid variable shadowing in local_try_cmpxchg()
  perf/core: Use local64_try_cmpxchg in perf_swevent_set_period
  perf/x86: Use local64_try_cmpxchg
  perf/amd: Prevent grouping of IBS events
2023-08-28 16:35:01 -07:00
Peter Zijlstra
882cdb06b6 x86/cpu: Fix Gracemont uarch
Alderlake N is an E-core only product using Gracemont
micro-architecture. It fits the pre-existing naming scheme perfectly
fine, adhere to it.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230807150405.686834933@infradead.org
2023-08-09 21:51:06 +02:00
Rafael J. Wysocki
5534f44627 Revert "intel_idle: Add support for using intel_idle in a VM guest using just hlt"
This reverts commit 2f3d08f074 ("intel_idle: Add support for using
intel_idle in a VM guest using just hlt"), because it causes functional
issues to appear and it is not really useful without a related commit
that got reverted previously.

Link: https://lore.kernel.org/linux-pm/5c7de6d5-7706-c4a5-7c41-146db1269aff@intel.com
Reported-by: Xiaoyao Li <xiaoyao.li@intel.com>
Requested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-07-19 20:10:03 +02:00
Rafael J. Wysocki
a5155c023d Revert "intel_idle: Add a "Long HLT" C1 state for the VM guest mode"
This reverts commit 0fac214bb7 ("intel_idle: Add a "Long HLT" C1 state
for the VM guest mode"), because there is a coding mistake in it and its
validity is questioned.

Link: https://lore.kernel.org/all/20230711132553.GN3062772@hirez.programming.kicks-ass.net
Requested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-07-19 20:05:43 +02:00
Rafael J. Wysocki
d46b0a05bd Revert "intel_idle: Add __init annotation to matchup_vm_state_with_baremetal()"
This reverts commit b2918089d5 ("intel_idle: Add __init annotation to
matchup_vm_state_with_baremetal()"), because the commit fixed by it will
be reverted.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-07-19 19:56:08 +02:00
Rafael J. Wysocki
b2918089d5 intel_idle: Add __init annotation to matchup_vm_state_with_baremetal()
The caller of (recently added) matchup_vm_state_with_baremetal() is an
__init function and it uses some __initdata data structures, so add the
__init annotation to it for consistency.

This addresses the following build warnings:

WARNING: modpost: vmlinux: section mismatch in reference: matchup_vm_state_with_baremetal+0x51 (section: .text) -> intel_idle_max_cstate_reached (section: .init.text)
WARNING: modpost: vmlinux: section mismatch in reference: matchup_vm_state_with_baremetal+0x62 (section: .text) -> cpuidle_state_table (section: .init.data)
WARNING: modpost: vmlinux: section mismatch in reference: matchup_vm_state_with_baremetal+0x79 (section: .text) -> icpu (section: .init.data)

Fixes: 0fac214bb7 ("intel_idle: Add a "Long HLT" C1 state for the VM guest mode")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
2023-06-28 19:09:55 +02:00
Arjan van de Ven
0fac214bb7 intel_idle: Add a "Long HLT" C1 state for the VM guest mode
intel_idle will, for the bare metal case, usually have one or more deep
power states that have the CPUIDLE_FLAG_TLB_FLUSHED flag set. When
a state with this flag is selected by the cpuidle framework, it will also
flush the TLBs as part of entering this state. The benefit of doing this is
that the kernel does not need to wake the cpu out of this deep power state
just to flush the TLBs... for which the latency can be very high due to
the exit latency of deep power states.

In a VM guest currently, this benefit of avoiding the wakeup does not exist,
while the problem (long exit latency) is even more severe. Linux will need
to wake up a vCPU (causing the host to either come out of a deep C state,
or the VMM to have to deschedule something else to schedule the vCPU) which
can take a very long time.. adding a lot of latency to tlb flush operations
(including munmap and others).

To solve this, add a "Long HLT" C state to the state table for the VM guest
case that has the CPUIDLE_FLAG_TLB_FLUSHED flag set.  The result of that is
that for long idle periods (where the VMM is likely to do things that cause
large latency) the cpuidle framework will flush the TLBs (and avoid the
wakeups), while for short/quick idle durations, the existing behavior is
retained.

Now, there is still only "hlt" available in the guest, but for long idle,
the host can go to a deeper state (say C6).  There is a reasonable debate
one can have to what to set for the exit_latency and break even point for
this "Long HLT" state.  The good news is that intel_idle has these values
available for the underlying CPU (even when mwait is not exposed).  The
solution thus is to just use the latency and break even of the deepest state
from the bare metal CPU.  This is under the assumption that this is a pretty
reasonable estimate of what the VMM would do to cause latency.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-21 19:46:58 +02:00
Arjan van de Ven
2f3d08f074 intel_idle: Add support for using intel_idle in a VM guest using just hlt
In a typical VM guest, the mwait instruction is not available, leaving
only the 'hlt' instruction (which causes a VMEXIT to the host).

So for this common case, intel_idle will detect the lack of mwait, and
fail to initialize (after which another idle method would step in which
will just use hlt always).

Other (non-common) cases exist; the table below shows the before/after
for these:

+------------+--------------------------+-------------------------+
| Hypervisor | Idle method before patch | Idle method after patch |
| exposes    |                          |                         |
+============+==========================+=========================+
| nothing    | default_idle fallback    | intel_idle VM table     |
| (common)   | (straight "hlt")         |                         |
+------------+--------------------------+-------------------------+
| mwait      | intel_idle mwait table   | intel_idle mwait table  |
+------------+--------------------------+-------------------------+
| ACPI       | ACPI C1 state ("hlt")    | intel_idle VM table     |
+------------+--------------------------+-------------------------+

This is only applicable to CPUs known by intel_idle. For the bare metal
case, unknown CPU models will use the ACPI tables (when available) to
get estimates for latency and break even point for longer idle states.
In guests, the common case is that ACPI tables are not available, but
even when they are available, they can't and don't provide the latency
information for the longer (mwait based) states. For this scenario
(unknown CPU model), the default_idle mode (no ACPI) or ACPI C1 (ACPI
avaible) will be used.

By providing capability to do this with the intel_idle driver, we can
do better than the fallback or ACPI table methods. While this current
change only gets us to the existing behavior, later patches in this
series will add new capabilities such as optimized TLB flushing.

In order to do this, a simplified version of the initialization
function for VM guests is created, and this will be called if the CPU
is recognized, but mwait is not supported, and we're in a VM guest.

One thing to note is that the max latency (and break even) of this C1
state is higher than the typical bare metal C1 state. Because hlt causes
a vmexit, and the cost of vmexit + hypervisor overhead + vmenter is
typically in the order of upto 5 microseconds... even if the hypervisor
does not actually goes into a hardware power saving state.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
[ rjw: Dropped redundant checks from should_verify_mwait() ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-16 19:12:36 +02:00
Arjan van de Ven
7826c069c8 intel_idle: clean up the (new) state_update_enter_method function
Now that the logic for state_update_enter_method() is in its own
function, the long if .. else if .. else if .. else if chain
can be simplified by just returning from the function
at the various places. This does not change functionality,
but it makes the logic much simpler to read or modify later.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-12 20:07:22 +02:00
Arjan van de Ven
4622ba923e intel_idle: refactor state->enter manipulation into its own function
Since the 6.4 kernel, the logic for updating a state's enter method
based on "environmental conditions" (command line options, cpu sidechannel
workarounds etc etc) has gotten pretty complex.
This patch refactors this into a seperate small, self contained function
(no behavior changes) for improved readability and to make future
changes to this logic easier to do and understand.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-12 20:07:22 +02:00
Artem Bityutskiy
bd4468295e intel_idle: mark few variables as __read_mostly
The intention is to clean up the code and make it look a bit more
consistent.

Mark all unitialized module parameter variables as __read_mostly,
not just one of them. The other parameters are read-mostly too.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-04-27 19:37:36 +02:00
Artem Bityutskiy
4152379a70 intel_idle: do not sprinkle module parameter definitions around
This is a cleanup which improves code consistency. Move the force_irq_on
module parameter variable and definition to the same place where we have
variables and definitions for other module parameters.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-04-27 19:37:36 +02:00
Artem Bityutskiy
db1ae0c999 intel_idle: fix confusing message
By default, all non-POLL C-states are entered with interrupts disabled.

There are 2 ways to make 'intel_idle' enter C-states with interrupts
enabled:
  1. Mark the C-state with the CPUIDLE_FLAG_IRQ_ENABLE flag.
  2. Use the force_irq_on module parameter.

The former is the "proper" way of doing it, it is per-C-state and
per-platform. The latter is for debugging purposes only.

The problem is that intel_idle prints the "forced intel_idle_irq"
message in both cases, even though the former case does not needed
this message, because nothing is forced there. This patch addresses the
problem.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-04-27 19:37:36 +02:00
Artem Bityutskiy
00433eae17 intel_idle: improve C-state flags handling robustness
The following C-state flags are currently mutually-exclusive and should not
be combined:
  * IRQ_ENABLE
  * IBRS
  * XSTATE

There is a warning for the situation when the IRQ_ENABLE flag
is combined with the IBRS flag, but no warnings for other combinations.
This is inconsistent and prone to errors.

Improve the situation by adding warnings for all the unexpected
combinations. Add a couple of helpful commentaries too.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-04-27 19:37:36 +02:00
Artem Bityutskiy
1abffbd827 intel_idle: further intel_idle_init_cstates_icpu() cleanup
Introduce a temporary 'state' variable for referencing the currently
processed C-state in the intel_idle_init_cstates_icpu() function.

This makes code lines shorter and easier to read.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-04-27 19:37:36 +02:00
Artem Bityutskiy
a78032e94b intel_idle: clean up intel_idle_init_cstates_icpu()
The intel_idle_init_cstates_icpu() function includes a loop that iterates
over every C-state. Inside the loop, the same C-state data is referenced 2
ways:
 1. as cpuidle_state_table[cstate]
 2. as drv->states[drv->state_count] (but it is a copy of #1, not the same
    object).

Make the code be more consistent and easier to read by using only the 2nd
way. So the code structure would be as follows:

 1. Use cpuidle_state_table[cstate]
 2. Copy cpuidle_state_table[cstate] to drv->states[drv->state_count]
 3. Use only drv->states[drv->state_count] from this point.

Note, this change introduces a checkpatch.pl warning (too long line), but it
will be addressed in the next patch.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-04-27 19:37:36 +02:00
Artem Bityutskiy
91048ce422 intel_idle: use pr_info() instead of printk()
Substitute 'printk()' with 'pr_info()', because 'intel_idle' already uses
'pr_debug()', so using 'pr_info()' will be more consistent.

In addition to this, this patch addresses  the following checkpatch.pl
warning:
  WARNING: printk() should include KERN_<LEVEL> facility level

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-04-27 19:37:36 +02:00
Linus Torvalds
2504ba8b01 Merge tag 'pm-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
 "These add EPP support to the AMD P-state cpufreq driver, add support
  for new platforms to the Intel RAPL power capping driver, intel_idle
  and the Qualcomm cpufreq driver, enable thermal cooling for Tegra194,
  drop the custom cpufreq driver for loongson1 that is not necessary any
  more (and the corresponding cpufreq platform device), fix assorted
  issues and clean up code.

  Specifics:

   - Add EPP support to the AMD P-state cpufreq driver (Perry Yuan, Wyes
     Karny, Arnd Bergmann, Bagas Sanjaya)

   - Drop the custom cpufreq driver for loongson1 that is not necessary
     any more and the corresponding cpufreq platform device (Keguang
     Zhang)

   - Remove "select SRCU" from system sleep, cpufreq and OPP Kconfig
     entries (Paul E. McKenney)

   - Enable thermal cooling for Tegra194 (Yi-Wei Wang)

   - Register module device table and add missing compatibles for
     cpufreq-qcom-hw (Nícolas F. R. A. Prado, Abel Vesa and Luca Weiss)

   - Various dt binding updates for qcom-cpufreq-nvmem and
     opp-v2-kryo-cpu (Christian Marangi)

   - Make kobj_type structure in the cpufreq core constant (Thomas
     Weißschuh)

   - Make cpufreq_unregister_driver() return void (Uwe Kleine-König)

   - Make the TEO cpuidle governor check CPU utilization in order to
     refine idle state selection (Kajetan Puchalski)

   - Make Kconfig select the haltpoll cpuidle governor when the haltpoll
     cpuidle driver is selected and replace a default_idle() call in
     that driver with arch_cpu_idle() to allow MWAIT to be used (Li
     RongQing)

   - Add Emerald Rapids Xeon support to the intel_idle driver (Artem
     Bityutskiy)

   - Add ARCH_SUSPEND_POSSIBLE dependencies for ARMv4 cpuidle drivers to
     avoid randconfig build failures (Arnd Bergmann)

   - Make kobj_type structures used in the cpuidle sysfs interface
     constant (Thomas Weißschuh)

   - Make the cpuidle driver registration code update microsecond values
     of idle state parameters in accordance with their nanosecond values
     if they are provided (Rafael Wysocki)

   - Make the PSCI cpuidle driver prevent topology CPUs from being
     suspended on PREEMPT_RT (Krzysztof Kozlowski)

   - Document that pm_runtime_force_suspend() cannot be used with
     DPM_FLAG_SMART_SUSPEND (Richard Fitzgerald)

   - Add EXPORT macros for exporting PM functions from drivers (Richard
     Fitzgerald)

   - Remove /** from non-kernel-doc comments in hibernation code (Randy
     Dunlap)

   - Fix possible name leak in powercap_register_zone() (Yang Yingliang)

   - Add Meteor Lake and Emerald Rapids support to the intel_rapl power
     capping driver (Zhang Rui)

   - Modify the idle_inject power capping facility to support 100% idle
     injection (Srinivas Pandruvada)

   - Fix large time windows handling in the intel_rapl power capping
     driver (Zhang Rui)

   - Fix memory leaks with using debugfs_lookup() in the generic PM
     domains and Energy Model code (Greg Kroah-Hartman)

   - Add missing 'cache-unified' property in the example for kryo OPP
     bindings (Rob Herring)

   - Fix error checking in opp_migrate_dentry() (Qi Zheng)

   - Let qcom,opp-fuse-level be a 2-long array for qcom SoCs (Konrad
     Dybcio)

   - Modify some power management utilities to use the canonical ftrace
     path (Ross Zwisler)

   - Correct spelling problems for Documentation/power/ as reported by
     codespell (Randy Dunlap)"

* tag 'pm-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (53 commits)
  Documentation: amd-pstate: disambiguate user space sections
  cpufreq: amd-pstate: Fix invalid write to MSR_AMD_CPPC_REQ
  dt-bindings: opp: opp-v2-kryo-cpu: enlarge opp-supported-hw maximum
  dt-bindings: cpufreq: qcom-cpufreq-nvmem: make cpr bindings optional
  dt-bindings: cpufreq: qcom-cpufreq-nvmem: specify supported opp tables
  PM: Add EXPORT macros for exporting PM functions
  cpuidle: psci: Do not suspend topology CPUs on PREEMPT_RT
  MIPS: loongson32: Drop obsolete cpufreq platform device
  powercap: intel_rapl: Fix handling for large time window
  cpuidle: driver: Update microsecond values of state parameters as needed
  cpuidle: sysfs: make kobj_type structures constant
  cpuidle: add ARCH_SUSPEND_POSSIBLE dependencies
  PM: EM: fix memory leak with using debugfs_lookup()
  PM: domains: fix memory leak with using debugfs_lookup()
  cpufreq: Make kobj_type structure constant
  cpufreq: davinci: Fix clk use after free
  cpufreq: amd-pstate: avoid uninitialized variable use
  cpufreq: Make cpufreq_unregister_driver() return void
  OPP: fix error checking in opp_migrate_dentry()
  dt-bindings: cpufreq: cpufreq-qcom-hw: Add SM8550 compatible
  ...
2023-02-21 12:13:58 -08:00
Artem Bityutskiy
74528edfbc intel_idle: add Emerald Rapids Xeon support
Emerald Rapids (EMR) is the next Intel Xeon processor after Sapphire
Rapids (SPR).

EMR C-states are the same as SPR C-states, and we expect that EMR
C-state characteristics (latency and target residency) will be the
same as in SPR. Therefore, add EMR support by using SPR C-states table.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-20 18:39:42 +01:00
Peter Zijlstra
365bd03ff6 intel_idle: Add force_irq_on module param
For testing purposes.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195541.967699392@infradead.org
2023-01-13 11:48:17 +01:00
Peter Zijlstra
9b461a6faa cpuidle, intel_idle: Fix CPUIDLE_FLAG_IBRS
objtool to the rescue:

  vmlinux.o: warning: objtool: intel_idle_ibrs+0x17: call to spec_ctrl_current() leaves .noinstr.text section
  vmlinux.o: warning: objtool: intel_idle_ibrs+0x27: call to wrmsrl.constprop.0() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.556912863@infradead.org
2023-01-13 11:48:15 +01:00
Peter Zijlstra
6d9c7f51b1 cpuidle, intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE *again*
So objtool found this bug:

  vmlinux.o: warning: objtool: intel_idle_irq+0x10c: call to trace_hardirqs_off() leaves .noinstr.text section

As per commit 32d4fd5751 ("cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE"):

  "must not have tracing in idle functions"

Clearly people can't read and tinker along until splat dissapears.
This straight up reverts commit d295ad34f2 ("intel_idle: Fix false
positive RCU splats due to incorrect hardirqs state").

It doesn't re-introduce the problem because preceding patches fixed it
properly.

Fixes: d295ad34f2 ("intel_idle: Fix false positive RCU splats due to incorrect hardirqs state")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.434302128@infradead.org
2023-01-13 11:48:15 +01:00
Zhang Rui
65c0c2367e intel_idle: Add AlderLake-N support
Similar to the other other AlderLake platforms, the C1 and C1E states on
ADL-N are mutually exclusive. Only one of them can be enabled at a time.

C1E is preferred on ADL-N for better energy efficiency.

C6S is also supported on this platform. Its latency is far bigger than
C6, but really close to C8 (PC8), thus it is not exposed as a separate
state.

Suggested-by: Baieswara Reddy Sagili <baieswara.reddy.sagili@intel.com>
Suggested-by: Vinay Kumar <vinay.kumar@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-09-21 20:33:49 +02:00