Mirror of https://github.com/Dasharo/linux.git (synced 2026-03-06 15:25:10 -08:00)
docs: power: convert docs to ReST and rename to *.rst
Convert the PM documents to ReST, in order to allow them to
build with Sphinx.

The conversion is actually:
  - add blank lines and indentation in order to identify paragraphs;
  - fix tables markups;
  - add some lists markups;
  - mark literal blocks;
  - adjust title markups.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Committed by: Bjorn Helgaas
Parent: 9595aee2a3
Commit: 151f4e2bdc
@@ -5,7 +5,7 @@ Contact: linux-pm@vger.kernel.org
 Description:
 The powercap/ class sub directory belongs to the power cap
 subsystem. Refer to
-Documentation/power/powercap/powercap.txt for details.
+Documentation/power/powercap/powercap.rst for details.
 
 What: /sys/class/powercap/<control type>
 Date: September 2013
@@ -13,7 +13,7 @@
 For ARM64, ONLY "acpi=off", "acpi=on" or "acpi=force"
 are available
 
-See also Documentation/power/runtime_pm.txt, pci=noacpi
+See also Documentation/power/runtime_pm.rst, pci=noacpi
 
 acpi_apic_instance= [ACPI, IOAPIC]
 Format: <int>
@@ -223,7 +223,7 @@
 acpi_sleep= [HW,ACPI] Sleep options
 Format: { s3_bios, s3_mode, s3_beep, s4_nohwsig,
 old_ordering, nonvs, sci_force_enable, nobl }
-See Documentation/power/video.txt for information on
+See Documentation/power/video.rst for information on
 s3_bios and s3_mode.
 s3_beep is for debugging; it makes the PC's speaker beep
 as soon as the kernel's real-mode entry point is called.
@@ -4108,7 +4108,7 @@
 Specify the offset from the beginning of the partition
 given by "resume=" at which the swap header is located,
 in <PAGE_SIZE> units (needed only for swap files).
-See Documentation/power/swsusp-and-swap-files.txt
+See Documentation/power/swsusp-and-swap-files.rst
 
 resumedelay= [HIBERNATION] Delay (in seconds) to pause before attempting to
 read the resume files
@@ -95,7 +95,7 @@ flags - flags of the cpufreq driver
 
 3. CPUFreq Table Generation with Operating Performance Point (OPP)
 ==================================================================
-For details about OPP, see Documentation/power/opp.txt
+For details about OPP, see Documentation/power/opp.rst
 
 dev_pm_opp_init_cpufreq_table -
 This function provides a ready to use conversion routine to translate
@@ -225,7 +225,7 @@ system-wide transition to a sleep state even though its :c:member:`runtime_auto`
 flag is clear.
 
 For more information about the runtime power management framework, refer to
-:file:`Documentation/power/runtime_pm.txt`.
+:file:`Documentation/power/runtime_pm.rst`.
 
 
 Calling Drivers to Enter and Leave System Sleep States
@@ -728,7 +728,7 @@ it into account in any way.
 
 Devices may be defined as IRQ-safe which indicates to the PM core that their
 runtime PM callbacks may be invoked with disabled interrupts (see
-:file:`Documentation/power/runtime_pm.txt` for more information). If an
+:file:`Documentation/power/runtime_pm.rst` for more information). If an
 IRQ-safe device belongs to a PM domain, the runtime PM of the domain will be
 disallowed, unless the domain itself is defined as IRQ-safe. However, it
 makes sense to define a PM domain as IRQ-safe only if all the devices in it
@@ -795,7 +795,7 @@ so on) and the final state of the device must reflect the "active" runtime PM
 status in that case.
 
 During system-wide resume from a sleep state it's easiest to put devices into
-the full-power state, as explained in :file:`Documentation/power/runtime_pm.txt`.
+the full-power state, as explained in :file:`Documentation/power/runtime_pm.rst`.
 [Refer to that document for more information regarding this particular issue as
 well as for information on the device runtime power management framework in
 general.]
@@ -46,7 +46,7 @@ device is turned off while the system as a whole remains running, we
 call it a "dynamic suspend" (also known as a "runtime suspend" or
 "selective suspend"). This document concentrates mostly on how
 dynamic PM is implemented in the USB subsystem, although system PM is
-covered to some extent (see ``Documentation/power/*.txt`` for more
+covered to some extent (see ``Documentation/power/*.rst`` for more
 information about system PM).
 
 System PM support is present only if the kernel was built with
@@ -1,5 +1,7 @@
+============
 APM or ACPI?
-------------
+============
+
 If you have a relatively recent x86 mobile, desktop, or server system,
 odds are it supports either Advanced Power Management (APM) or
 Advanced Configuration and Power Interface (ACPI). ACPI is the newer
@@ -28,5 +30,7 @@ and be sure that they are started sometime in the system boot process.
 Go ahead and start both. If ACPI or APM is not available on your
 system the associated daemon will exit gracefully.
 
-apmd: http://ftp.debian.org/pool/main/a/apmd/
-acpid: http://acpid.sf.net/
+===== =======================================
+apmd http://ftp.debian.org/pool/main/a/apmd/
+acpid http://acpid.sf.net/
+===== =======================================
@@ -1,12 +1,16 @@
+=================================
 Debugging hibernation and suspend
+=================================
+
 (C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL
 
 1. Testing hibernation (aka suspend to disk or STD)
+===================================================
 
-To check if hibernation works, you can try to hibernate in the "reboot" mode:
+To check if hibernation works, you can try to hibernate in the "reboot" mode::
 
-# echo reboot > /sys/power/disk
-# echo disk > /sys/power/state
+# echo reboot > /sys/power/disk
+# echo disk > /sys/power/state
 
 and the system should create a hibernation image, reboot, resume and get back to
 the command prompt where you have started the transition. If that happens,
@@ -15,20 +19,21 @@ test at least a couple of times in a row for confidence. [This is necessary,
 because some problems only show up on a second attempt at suspending and
 resuming the system.] Moreover, hibernating in the "reboot" and "shutdown"
 modes causes the PM core to skip some platform-related callbacks which on ACPI
-systems might be necessary to make hibernation work. Thus, if your machine fails
-to hibernate or resume in the "reboot" mode, you should try the "platform" mode:
+systems might be necessary to make hibernation work. Thus, if your machine
+fails to hibernate or resume in the "reboot" mode, you should try the
+"platform" mode::
 
-# echo platform > /sys/power/disk
-# echo disk > /sys/power/state
+# echo platform > /sys/power/disk
+# echo disk > /sys/power/state
 
 which is the default and recommended mode of hibernation.
 
 Unfortunately, the "platform" mode of hibernation does not work on some systems
 with broken BIOSes. In such cases the "shutdown" mode of hibernation might
-work:
+work::
 
-# echo shutdown > /sys/power/disk
-# echo disk > /sys/power/state
+# echo shutdown > /sys/power/disk
+# echo disk > /sys/power/state
 
 (it is similar to the "reboot" mode, but it requires you to press the power
 button to make the system resume).
@@ -37,6 +42,7 @@ If neither "platform" nor "shutdown" hibernation mode works, you will need to
 identify what goes wrong.
 
 a) Test modes of hibernation
+----------------------------
 
 To find out why hibernation fails on your system, you can use a special testing
 facility available if the kernel is compiled with CONFIG_PM_DEBUG set. Then,
@@ -44,36 +50,38 @@ there is the file /sys/power/pm_test that can be used to make the hibernation
 core run in a test mode. There are 5 test modes available:
 
 freezer
-- test the freezing of processes
+- test the freezing of processes
 
 devices
-- test the freezing of processes and suspending of devices
+- test the freezing of processes and suspending of devices
 
 platform
-- test the freezing of processes, suspending of devices and platform
-global control methods(*)
+- test the freezing of processes, suspending of devices and platform
+global control methods [1]_
 
 processors
-- test the freezing of processes, suspending of devices, platform
-global control methods(*) and the disabling of nonboot CPUs
+- test the freezing of processes, suspending of devices, platform
+global control methods [1]_ and the disabling of nonboot CPUs
 
 core
-- test the freezing of processes, suspending of devices, platform global
-control methods(*), the disabling of nonboot CPUs and suspending of
-platform/system devices
+- test the freezing of processes, suspending of devices, platform global
+control methods\ [1]_, the disabling of nonboot CPUs and suspending
+of platform/system devices
 
-(*) the platform global control methods are only available on ACPI systems
+.. [1]
+
+the platform global control methods are only available on ACPI systems
 and are only tested if the hibernation mode is set to "platform"
 
 To use one of them it is necessary to write the corresponding string to
 /sys/power/pm_test (eg. "devices" to test the freezing of processes and
 suspending devices) and issue the standard hibernation commands. For example,
 to use the "devices" test mode along with the "platform" mode of hibernation,
-you should do the following:
+you should do the following::
 
-# echo devices > /sys/power/pm_test
-# echo platform > /sys/power/disk
-# echo disk > /sys/power/state
+# echo devices > /sys/power/pm_test
+# echo platform > /sys/power/disk
+# echo disk > /sys/power/state
 
 Then, the kernel will try to freeze processes, suspend devices, wait a few
 seconds (5 by default, but configurable by the suspend.pm_test_delay module
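Note on the hunk above: the five test modes are cumulative, each one covering the steps of the previous mode plus one more. A minimal userspace sketch of that relationship follows; the `pm_step` flags and the `pm_test_steps()` helper are hypothetical names chosen for illustration and are not kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Steps exercised by the hibernation test modes, as bit flags (illustrative). */
enum pm_step {
	STEP_FREEZE   = 1 << 0,	/* freezing of processes */
	STEP_DEVICES  = 1 << 1,	/* suspending of devices */
	STEP_PLATFORM = 1 << 2,	/* platform global control methods (ACPI only) */
	STEP_CPUS     = 1 << 3,	/* disabling of nonboot CPUs */
	STEP_SYSDEV   = 1 << 4,	/* suspending of platform/system devices */
};

/* Mode names in the order the document lists them. */
static const char *const pm_test_modes[] = {
	"freezer", "devices", "platform", "processors", "core",
};

/*
 * Return the set of steps covered by a test mode, or 0 if the name is not
 * one of the five documented modes.
 */
unsigned int pm_test_steps(const char *mode)
{
	size_t i;

	assert(mode != NULL);
	for (i = 0; i < sizeof(pm_test_modes) / sizeof(pm_test_modes[0]); i++) {
		if (strcmp(mode, pm_test_modes[i]) == 0)
			return (1u << (i + 1)) - 1;	/* bits 0..i set */
	}
	return 0;
}
```

For example, `pm_test_steps("devices")` yields `STEP_FREEZE | STEP_DEVICES`, which matches the description of the "devices" mode as testing the freezing of processes and the suspending of devices.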
@@ -108,11 +116,12 @@ If the "devices" test fails, most likely there is a driver that cannot suspend
 or resume its device (in the latter case the system may hang or become unstable
 after the test, so please take that into consideration). To find this driver,
 you can carry out a binary search according to the rules:
+
 - if the test fails, unload a half of the drivers currently loaded and repeat
-(that would probably involve rebooting the system, so always note what drivers
-have been loaded before the test),
+(that would probably involve rebooting the system, so always note what drivers
+have been loaded before the test),
 - if the test succeeds, load a half of the drivers you have unloaded most
-recently and repeat.
+recently and repeat.
 
 Once you have found the failing driver (there can be more than just one of
 them), you have to unload it every time before hibernation. In that case please
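Note on the hunk above: the two rules amount to a bisection over the set of loaded drivers. The sketch below simulates it in userspace under two simplifying assumptions that the document does not make: exactly one driver is faulty, and drivers are loaded in a fixed order so that "the first n drivers" is a meaningful set. `pm_test_fn`, `find_failing_driver()` and `simulated_test()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical test hook: returns true if the "devices" test passes with
 * drivers [0, nloaded) loaded.
 */
typedef bool (*pm_test_fn)(size_t nloaded, void *ctx);

/*
 * Find the index of the driver whose presence makes the test fail.
 * Invariant: the test passes with `lo` drivers loaded and fails with `hi`.
 */
size_t find_failing_driver(size_t ndrivers, pm_test_fn test, void *ctx)
{
	size_t lo = 0, hi = ndrivers;

	assert(test != NULL && ndrivers > 0);
	while (hi - lo > 1) {
		size_t mid = lo + (hi - lo) / 2;

		if (test(mid, ctx))
			lo = mid;	/* passed: load half of what was unloaded */
		else
			hi = mid;	/* failed: unload half of what is loaded */
	}
	return hi - 1;
}

/* Simulation hook: ctx points at the index of the faulty driver. */
bool simulated_test(size_t nloaded, void *ctx)
{
	size_t bad = *(size_t *)ctx;

	return nloaded <= bad;	/* passes only while the faulty driver is unloaded */
}
```

On a real system each call to the hook corresponds to one test cycle, possibly including a reboot, so the logarithmic number of steps is what makes the procedure practical. With more than one faulty driver the search has to be repeated after excluding each driver found.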
@@ -146,6 +155,7 @@ indicates a serious problem that very well may be related to the hardware, but
 please report it anyway.
 
 b) Testing minimal configuration
+--------------------------------
 
 If all of the hibernation test modes work, you can boot the system with the
 "init=/bin/bash" command line parameter and attempt to hibernate in the
@@ -165,14 +175,15 @@ Again, if you find the offending module(s), it(they) must be unloaded every time
 before hibernation, and please report the problem with it(them).
 
 c) Using the "test_resume" hibernation option
+---------------------------------------------
 
 /sys/power/disk generally tells the kernel what to do after creating a
 hibernation image. One of the available options is "test_resume" which
 causes the just created image to be used for immediate restoration. Namely,
-after doing:
+after doing::
 
-# echo test_resume > /sys/power/disk
-# echo disk > /sys/power/state
+# echo test_resume > /sys/power/disk
+# echo disk > /sys/power/state
 
 a hibernation image will be created and a resume from it will be triggered
 immediately without involving the platform firmware in any way.
@@ -190,6 +201,7 @@ to resume may be related to the differences between the restore and image
 kernels.
 
 d) Advanced debugging
+---------------------
 
 In case that hibernation does not work on your system even in the minimal
 configuration and compiling more drivers as modules is not practical or some
@@ -200,9 +212,10 @@ kernel messages using the serial console. This may provide you with some
 information about the reasons of the suspend (resume) failure. Alternatively,
 it may be possible to use a FireWire port for debugging with firescope
 (http://v3.sk/~lkundrak/firescope/). On x86 it is also possible to
-use the PM_TRACE mechanism documented in Documentation/power/s2ram.txt .
+use the PM_TRACE mechanism documented in Documentation/power/s2ram.rst .
 
 2. Testing suspend to RAM (STR)
+===============================
 
 To verify that the STR works, it is generally more convenient to use the s2ram
 tool available from http://suspend.sf.net and documented at
@@ -230,7 +243,8 @@ you will have to unload them every time before an STR transition (ie. before
 you run s2ram), and please report the problems with them.
 
 There is a debugfs entry which shows the suspend to RAM statistics. Here is an
-example of its output.
+example of its output::
+
 # mount -t debugfs none /sys/kernel/debug
 # cat /sys/kernel/debug/suspend_stats
 success: 20
@@ -248,6 +262,7 @@ example of its output.
 -16
 last_failed_step: suspend
 suspend
+
 Field success means the success number of suspend to RAM, and field fail means
 the failure number. Others are the failure number of different steps of suspend
 to RAM. suspend_stats just lists the last 2 failed devices, error number and
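Note on the hunk above: the counters in suspend_stats are plain `name: value` lines, so they are easy to pick out programmatically. The sketch below is a hedged userspace example; `suspend_stat_get()` is a hypothetical helper, and only the `success: 20` line is taken from the excerpt. Any other sample values used with it are illustrative.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look for a line starting with "key:" in `text` and store the number that
 * follows in *out. Returns 0 on success, -1 if the key is absent or its
 * value is not numeric (for example "last_failed_step: suspend").
 */
int suspend_stat_get(const char *text, const char *key, long *out)
{
	size_t klen;
	const char *line;

	assert(text != NULL && key != NULL && out != NULL);
	klen = strlen(key);
	for (line = text; line != NULL && *line != '\0'; ) {
		/* The ':' check keeps "fail" from matching "failed_freeze". */
		if (strncmp(line, key, klen) == 0 && line[klen] == ':')
			return sscanf(line + klen + 1, "%ld", out) == 1 ? 0 : -1;
		line = strchr(line, '\n');
		if (line != NULL)
			line++;	/* step past the newline */
	}
	return -1;
}
```

A caller would read /sys/kernel/debug/suspend_stats into a buffer first (debugfs must be mounted, as the excerpt shows) and then query the `success` and `fail` fields described in the text.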
@@ -1,4 +1,7 @@
+===============
 Charger Manager
+===============
+
 (C) 2011 MyungJoo Ham <myungjoo.ham@samsung.com>, GPL
 
 Charger Manager provides in-kernel battery charger management that
@@ -55,41 +58,39 @@ Charger Manager supports the following:
 notification to users with UEVENT.
 
 2. Global Charger-Manager Data related with suspend_again
-========================================================
+=========================================================
 In order to set up Charger Manager with suspend-again feature
 (in-suspend monitoring), the user should provide charger_global_desc
-with setup_charger_manager(struct charger_global_desc *).
+with setup_charger_manager(`struct charger_global_desc *`).
 This charger_global_desc data for in-suspend monitoring is global
 as the name suggests. Thus, the user needs to provide it only once even
 if there are multiple batteries. If there are multiple batteries, the
 multiple instances of Charger Manager share the same charger_global_desc
 and it will manage in-suspend monitoring for all instances of Charger Manager.
 
-The user needs to provide all the three entries properly in order to activate
-in-suspend monitoring:
+The user needs to provide all the three entries to `struct charger_global_desc`
+properly in order to activate in-suspend monitoring:
 
-struct charger_global_desc {
-
-char *rtc_name;
-: The name of rtc (e.g., "rtc0") used to wakeup the system from
+`char *rtc_name;`
+The name of rtc (e.g., "rtc0") used to wakeup the system from
 suspend for Charger Manager. The alarm interrupt (AIE) of the rtc
 should be able to wake up the system from suspend. Charger Manager
 saves and restores the alarm value and uses the previously-defined
 alarm if it is going to go off earlier than Charger Manager so that
 Charger Manager does not interfere with previously-defined alarms.
 
-bool (*rtc_only_wakeup)(void);
-: This callback should let CM know whether
+`bool (*rtc_only_wakeup)(void);`
+This callback should let CM know whether
 the wakeup-from-suspend is caused only by the alarm of "rtc" in the
 same struct. If any other wakeup source triggered the
 wakeup, it should return false. If the "rtc" is the only wakeup
 reason, it should return true.
 
-bool assume_timer_stops_in_suspend;
-: if true, Charger Manager assumes that
+`bool assume_timer_stops_in_suspend;`
+if true, Charger Manager assumes that
 the timer (CM uses jiffies as timer) stops during suspend. Then, CM
 assumes that the suspend-duration is the same as the alarm length.
-};
+
 
 3. How to setup suspend_again
 =============================
@@ -109,26 +110,28 @@ if the system was woken up by Charger Manager and the polling
 =============================================
 For each battery charged independently from other batteries (if a series of
 batteries are charged by a single charger, they are counted as one independent
-battery), an instance of Charger Manager is attached to it.
+battery), an instance of Charger Manager is attached to it. The following
 
-struct charger_desc {
+struct charger_desc elements:
 
-char *psy_name;
-: The power-supply-class name of the battery. Default is
+`char *psy_name;`
+The power-supply-class name of the battery. Default is
 "battery" if psy_name is NULL. Users can access the psy entries
 at "/sys/class/power_supply/[psy_name]/".
 
-enum polling_modes polling_mode;
-: CM_POLL_DISABLE: do not poll this battery.
-CM_POLL_ALWAYS: always poll this battery.
-CM_POLL_EXTERNAL_POWER_ONLY: poll this battery if and only if
-an external power source is attached.
-CM_POLL_CHARGING_ONLY: poll this battery if and only if the
-battery is being charged.
+`enum polling_modes polling_mode;`
+CM_POLL_DISABLE:
+do not poll this battery.
+CM_POLL_ALWAYS:
+always poll this battery.
+CM_POLL_EXTERNAL_POWER_ONLY:
+poll this battery if and only if an external power
+source is attached.
+CM_POLL_CHARGING_ONLY:
+poll this battery if and only if the battery is being charged.
 
-unsigned int fullbatt_vchkdrop_ms;
-unsigned int fullbatt_vchkdrop_uV;
-: If both have non-zero values, Charger Manager will check the
+`unsigned int fullbatt_vchkdrop_ms; / unsigned int fullbatt_vchkdrop_uV;`
+If both have non-zero values, Charger Manager will check the
 battery voltage drop fullbatt_vchkdrop_ms after the battery is fully
 charged. If the voltage drop is over fullbatt_vchkdrop_uV, Charger
 Manager will try to recharge the battery by disabling and enabling
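Note on the hunk above: the four `polling_mode` values reduce to a small decision function. The sketch below is a userspace mirror for illustration only. The enumerator names come from the text, but their numeric order and the `cm_should_poll()` helper are assumptions, not Charger Manager's actual implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace mirror of the polling modes named in the text (order assumed). */
enum polling_modes {
	CM_POLL_DISABLE,
	CM_POLL_ALWAYS,
	CM_POLL_EXTERNAL_POWER_ONLY,
	CM_POLL_CHARGING_ONLY,
};

/*
 * Decide whether a battery should be polled, given whether an external
 * power source is attached and whether the battery is being charged.
 */
bool cm_should_poll(enum polling_modes mode, bool ext_power, bool charging)
{
	switch (mode) {
	case CM_POLL_DISABLE:
		return false;
	case CM_POLL_ALWAYS:
		return true;
	case CM_POLL_EXTERNAL_POWER_ONLY:
		return ext_power;
	case CM_POLL_CHARGING_ONLY:
		return charging;
	}
	assert(!"unknown polling mode");
	return false;
}
```

The "if and only if" wording in the text is why the last two cases return the input flag unchanged rather than combining it with anything else.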
@@ -136,50 +139,52 @@ unsigned int fullbatt_vchkdrop_uV;
 condition) is needed to be implemented with hardware interrupts from
 fuel gauges or charger devices/chips.
 
-unsigned int fullbatt_uV;
-: If specified with a non-zero value, Charger Manager assumes
+`unsigned int fullbatt_uV;`
+If specified with a non-zero value, Charger Manager assumes
 that the battery is full (capacity = 100) if the battery is not being
 charged and the battery voltage is equal to or greater than
 fullbatt_uV.
 
-unsigned int polling_interval_ms;
-: Required polling interval in ms. Charger Manager will poll
+`unsigned int polling_interval_ms;`
+Required polling interval in ms. Charger Manager will poll
 this battery every polling_interval_ms or more frequently.
 
-enum data_source battery_present;
-: CM_BATTERY_PRESENT: assume that the battery exists.
-CM_NO_BATTERY: assume that the battery does not exist.
-CM_FUEL_GAUGE: get battery presence information from fuel gauge.
-CM_CHARGER_STAT: get battery presence from chargers.
+`enum data_source battery_present;`
+CM_BATTERY_PRESENT:
+assume that the battery exists.
+CM_NO_BATTERY:
+assume that the battery does not exist.
+CM_FUEL_GAUGE:
+get battery presence information from fuel gauge.
+CM_CHARGER_STAT:
+get battery presence from chargers.
 
-char **psy_charger_stat;
-: An array ending with NULL that has power-supply-class names of
+`char **psy_charger_stat;`
+An array ending with NULL that has power-supply-class names of
 chargers. Each power-supply-class should provide "PRESENT" (if
 battery_present is "CM_CHARGER_STAT"), "ONLINE" (shows whether an
 external power source is attached or not), and "STATUS" (shows whether
 the battery is {"FULL" or not FULL} or {"FULL", "Charging",
 "Discharging", "NotCharging"}).
 
-int num_charger_regulators;
-struct regulator_bulk_data *charger_regulators;
-: Regulators representing the chargers in the form for
+`int num_charger_regulators; / struct regulator_bulk_data *charger_regulators;`
+Regulators representing the chargers in the form for
 regulator framework's bulk functions.
 
-char *psy_fuel_gauge;
-: Power-supply-class name of the fuel gauge.
+`char *psy_fuel_gauge;`
+Power-supply-class name of the fuel gauge.
 
-int (*temperature_out_of_range)(int *mC);
-bool measure_battery_temp;
-: This callback returns 0 if the temperature is safe for charging,
+`int (*temperature_out_of_range)(int *mC); / bool measure_battery_temp;`
+This callback returns 0 if the temperature is safe for charging,
 a positive number if it is too hot to charge, and a negative number
 if it is too cold to charge. With the variable mC, the callback returns
 the temperature in 1/1000 of centigrade.
 The source of temperature can be battery or ambient one according to
 the value of measure_battery_temp.
-};
+
 
 5. Notify Charger-Manager of charger events: cm_notify_event()
-=========================================================
+==============================================================
 If a charger event needs to be reported to
 Charger Manager, a charger device driver that triggers the event can call
 cm_notify_event(psy, type, msg) to notify the corresponding Charger Manager.
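Note on the hunk above: the `temperature_out_of_range` contract (0 when safe, positive when too hot, negative when too cold, temperature returned through `mC`) can be sketched in userspace as follows. The limits, the `fake_sensor_mC` variable and the function name `example_temperature_out_of_range()` are hypothetical; a real driver would read a sensor and use limits appropriate to its battery.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical charging limits, in 1/1000 degree Celsius. */
#define CHARGE_TEMP_MIN_MC	0
#define CHARGE_TEMP_MAX_MC	45000

/* Stand-in for a real sensor reading, in 1/1000 degree Celsius. */
static int fake_sensor_mC = 25000;

/*
 * Same shape as the temperature_out_of_range callback: report the
 * temperature through *mC and return 0 (safe to charge), a positive
 * number (too hot) or a negative number (too cold).
 */
int example_temperature_out_of_range(int *mC)
{
	assert(mC != NULL);
	*mC = fake_sensor_mC;
	if (*mC > CHARGE_TEMP_MAX_MC)
		return 1;
	if (*mC < CHARGE_TEMP_MIN_MC)
		return -1;
	return 0;
}
```

Writing the temperature to `*mC` before the range checks means the caller gets a reading in all three cases, which matches the text's statement that the callback returns the temperature through that variable.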
@@ -1,7 +1,11 @@
+====================================================
 Testing suspend and resume support in device drivers
+====================================================
+
 (C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL
 
 1. Preparing the test system
+============================
 
 Unfortunately, to effectively test the support for the system-wide suspend and
 resume transitions in a driver, it is necessary to suspend and resume a fully
@@ -14,19 +18,20 @@ the machine's BIOS.
 Of course, for this purpose the test system has to be known to suspend and
 resume without the driver being tested. Thus, if possible, you should first
 resolve all suspend/resume-related problems in the test system before you start
-testing the new driver. Please see Documentation/power/basic-pm-debugging.txt
+testing the new driver. Please see Documentation/power/basic-pm-debugging.rst
 for more information about the debugging of suspend/resume functionality.
 
 2. Testing the driver
+=====================
 
 Once you have resolved the suspend/resume-related problems with your test system
 without the new driver, you are ready to test it:
 
 a) Build the driver as a module, load it and try the test modes of hibernation
-(see: Documentation/power/basic-pm-debugging.txt, 1).
+(see: Documentation/power/basic-pm-debugging.rst, 1).
 
 b) Load the driver and attempt to hibernate in the "reboot", "shutdown" and
-"platform" modes (see: Documentation/power/basic-pm-debugging.txt, 1).
+"platform" modes (see: Documentation/power/basic-pm-debugging.rst, 1).
 
 c) Compile the driver directly into the kernel and try the test modes of
 hibernation.
@@ -34,12 +39,12 @@ c) Compile the driver directly into the kernel and try the test modes of
 d) Attempt to hibernate with the driver compiled directly into the kernel
 in the "reboot", "shutdown" and "platform" modes.
 
-e) Try the test modes of suspend (see: Documentation/power/basic-pm-debugging.txt,
+e) Try the test modes of suspend (see: Documentation/power/basic-pm-debugging.rst,
 2). [As far as the STR tests are concerned, it should not matter whether or
 not the driver is built as a module.]
 
 f) Attempt to suspend to RAM using the s2ram tool with the driver loaded
-(see: Documentation/power/basic-pm-debugging.txt, 2).
+(see: Documentation/power/basic-pm-debugging.rst, 2).
 
 Each of the above tests should be repeated several times and the STD tests
 should be mixed with the STR tests. If any of them fails, the driver cannot be
@@ -1,6 +1,6 @@
-====================
-Energy Model of CPUs
-====================
+====================
+Energy Model of CPUs
+====================
 
 1. Overview
 -----------
@@ -20,7 +20,7 @@ kernel, hence enabling to avoid redundant work.
 
 The figure below depicts an example of drivers (Arm-specific here, but the
 approach is applicable to any architecture) providing power costs to the EM
-framework, and interested clients reading the data from it.
+framework, and interested clients reading the data from it::
 
 +---------------+ +-----------------+ +---------------+
 | Thermal (IPA) | | Scheduler (EAS) | | Other |
@@ -58,15 +58,17 @@ micro-architectures.
 2. Core APIs
 ------------
 
-2.1 Config options
+2.1 Config options
+^^^^^^^^^^^^^^^^^^
 
 CONFIG_ENERGY_MODEL must be enabled to use the EM framework.
 
 
-2.2 Registration of performance domains
+2.2 Registration of performance domains
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Drivers are expected to register performance domains into the EM framework by
-calling the following API:
+calling the following API::
 
 int em_register_perf_domain(cpumask_t *span, unsigned int nr_states,
 struct em_data_callback *cb);
@@ -80,7 +82,8 @@ callback, and kernel/power/energy_model.c for further documentation on this
 API.
 
 
-2.3 Accessing performance domains
+2.3 Accessing performance domains
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Subsystems interested in the energy model of a CPU can retrieve it using the
 em_cpu_get() API. The energy model tables are allocated once upon creation of
@@ -99,46 +102,46 @@ More details about the above APIs can be found in include/linux/energy_model.h.
 This section provides a simple example of a CPUFreq driver registering a
 performance domain in the Energy Model framework using the (fake) 'foo'
 protocol. The driver implements an est_power() function to be provided to the
-EM framework.
+EM framework::
 
--> drivers/cpufreq/foo_cpufreq.c
+-> drivers/cpufreq/foo_cpufreq.c
 
-01 static int est_power(unsigned long *mW, unsigned long *KHz, int cpu)
-02 {
-03     long freq, power;
-04
-05     /* Use the 'foo' protocol to ceil the frequency */
-06     freq = foo_get_freq_ceil(cpu, *KHz);
-07     if (freq < 0)
-08         return freq;
-09
-10     /* Estimate the power cost for the CPU at the relevant freq. */
-11     power = foo_estimate_power(cpu, freq);
-12     if (power < 0)
-13         return power;
-14
-15     /* Return the values to the EM framework */
-16     *mW = power;
-17     *KHz = freq;
-18
-19     return 0;
-20 }
-21
-22 static int foo_cpufreq_init(struct cpufreq_policy *policy)
-23 {
-24     struct em_data_callback em_cb = EM_DATA_CB(est_power);
-25     int nr_opp, ret;
-26
-27     /* Do the actual CPUFreq init work ... */
-28     ret = do_foo_cpufreq_init(policy);
-29     if (ret)
-30         return ret;
-31
-32     /* Find the number of OPPs for this policy */
-33     nr_opp = foo_get_nr_opp(policy);
-34
-35     /* And register the new performance domain */
-36     em_register_perf_domain(policy->cpus, nr_opp, &em_cb);
-37
-38     return 0;
-39 }
+01 static int est_power(unsigned long *mW, unsigned long *KHz, int cpu)
+02 {
+03     long freq, power;
+04
+05     /* Use the 'foo' protocol to ceil the frequency */
+06     freq = foo_get_freq_ceil(cpu, *KHz);
+07     if (freq < 0)
+08         return freq;
+09
+10     /* Estimate the power cost for the CPU at the relevant freq. */
+11     power = foo_estimate_power(cpu, freq);
+12     if (power < 0)
+13         return power;
+14
+15     /* Return the values to the EM framework */
+16     *mW = power;
+17     *KHz = freq;
+18
+19     return 0;
+20 }
+21
+22 static int foo_cpufreq_init(struct cpufreq_policy *policy)
+23 {
+24     struct em_data_callback em_cb = EM_DATA_CB(est_power);
+25     int nr_opp, ret;
+26
+27     /* Do the actual CPUFreq init work ... */
+28     ret = do_foo_cpufreq_init(policy);
+29     if (ret)
+30         return ret;
+31
+32     /* Find the number of OPPs for this policy */
+33     nr_opp = foo_get_nr_opp(policy);
+34
+35     /* And register the new performance domain */
+36     em_register_perf_domain(policy->cpus, nr_opp, &em_cb);
+37
+38     return 0;
+39 }
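Note on the hunk above: the callback's contract (round the requested frequency up, estimate the power at that frequency, propagate a negative value as an error, and write results only on success) can be exercised outside the kernel with stubbed helpers. In the sketch below the bodies of `foo_get_freq_ceil()` and `foo_estimate_power()` are invented stand-ins with made-up numbers; the 'foo' protocol is fictional in the documentation itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in: round the frequency up to a multiple of 100 MHz. */
static long foo_get_freq_ceil(int cpu, unsigned long khz)
{
	if (cpu < 0)
		return -1;	/* simulated protocol error */
	return (long)((khz + 99999) / 100000 * 100000);
}

/* Hypothetical stand-in: made-up linear power cost, in mW. */
static long foo_estimate_power(int cpu, long freq)
{
	(void)cpu;
	return freq / 10000;
}

int est_power(unsigned long *mW, unsigned long *KHz, int cpu)
{
	long freq, power;

	assert(mW != NULL && KHz != NULL);

	/* Use the 'foo' protocol to ceil the frequency */
	freq = foo_get_freq_ceil(cpu, *KHz);
	if (freq < 0)
		return freq;	/* return early only when the lookup failed */

	/* Estimate the power cost for the CPU at the relevant freq. */
	power = foo_estimate_power(cpu, freq);
	if (power < 0)
		return power;

	/* Outputs are written only on the success path */
	*mW = power;
	*KHz = freq;
	return 0;
}
```

Leaving `*mW` and `*KHz` untouched on error lets a caller distinguish a failed estimate from a valid one by the return value alone.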
@@ -1,13 +1,18 @@
+=================
 Freezing of tasks
-(C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL
+=================
+
+(C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL
 
 I. What is the freezing of tasks?
+=================================
 
 The freezing of tasks is a mechanism by which user space processes and some
 kernel threads are controlled during hibernation or system-wide suspend (on some
 architectures).
 
 II. How does it work?
+=====================
 
 There are three per-task flags used for that, PF_NOFREEZE, PF_FROZEN
 and PF_FREEZER_SKIP (the last one is auxiliary). The tasks that have
@@ -41,7 +46,7 @@ explicitly in suitable places or use the wait_event_freezable() or
|
||||
wait_event_freezable_timeout() macros (defined in include/linux/freezer.h)
|
||||
that combine interruptible sleep with checking if the task is to be frozen and
|
||||
calling try_to_freeze(). The main loop of a freezable kernel thread may look
|
||||
like the following one:
|
||||
like the following one::
|
||||
|
||||
set_freezable();
|
||||
do {
|
||||
@@ -65,7 +70,7 @@ order to clear the PF_FROZEN flag for each frozen task. Then, the tasks that
|
||||
have been frozen leave __refrigerator() and continue running.
|
||||
|
||||
|
||||
Rationale behind the functions dealing with freezing and thawing of tasks:
|
||||
Rationale behind the functions dealing with freezing and thawing of tasks
|
||||
-------------------------------------------------------------------------
|
||||
|
||||
freeze_processes():
|
||||
@@ -86,6 +91,7 @@ thaw_processes():
|
||||
|
||||
|
||||
III. Which kernel threads are freezable?
|
||||
========================================
|
||||
|
||||
Kernel threads are not freezable by default. However, a kernel thread may clear
|
||||
PF_NOFREEZE for itself by calling set_freezable() (the resetting of PF_NOFREEZE
|
||||
@@ -93,37 +99,39 @@ directly is not allowed). From this point it is regarded as freezable
|
||||
and must call try_to_freeze() in a suitable place.
|
||||
|
||||
IV. Why do we do that?
|
||||
======================
|
||||
|
||||
Generally speaking, there are a couple of reasons to use the freezing of tasks:
|
||||
|
||||
1. The principal reason is to prevent filesystems from being damaged after
|
||||
hibernation. At the moment we have no simple means of checkpointing
|
||||
filesystems, so if there are any modifications made to filesystem data and/or
|
||||
metadata on disks, we cannot bring them back to the state from before the
|
||||
modifications. At the same time each hibernation image contains some
|
||||
filesystem-related information that must be consistent with the state of the
|
||||
on-disk data and metadata after the system memory state has been restored from
|
||||
the image (otherwise the filesystems will be damaged in a nasty way, usually
|
||||
making them almost impossible to repair). We therefore freeze tasks that might
|
||||
cause the on-disk filesystems' data and metadata to be modified after the
|
||||
hibernation image has been created and before the system is finally powered off.
|
||||
The majority of these are user space processes, but if any of the kernel threads
|
||||
may cause something like this to happen, they have to be freezable.
|
||||
hibernation. At the moment we have no simple means of checkpointing
|
||||
filesystems, so if there are any modifications made to filesystem data and/or
|
||||
metadata on disks, we cannot bring them back to the state from before the
|
||||
modifications. At the same time each hibernation image contains some
|
||||
filesystem-related information that must be consistent with the state of the
|
||||
on-disk data and metadata after the system memory state has been restored
|
||||
from the image (otherwise the filesystems will be damaged in a nasty way,
|
||||
usually making them almost impossible to repair). We therefore freeze
|
||||
tasks that might cause the on-disk filesystems' data and metadata to be
|
||||
modified after the hibernation image has been created and before the
|
||||
system is finally powered off. The majority of these are user space
|
||||
processes, but if any of the kernel threads may cause something like this
|
||||
to happen, they have to be freezable.
|
||||
|
||||
2. Next, to create the hibernation image we need to free a sufficient amount of
|
||||
memory (approximately 50% of available RAM) and we need to do that before
|
||||
devices are deactivated, because we generally need them for swapping out. Then,
|
||||
after the memory for the image has been freed, we don't want tasks to allocate
|
||||
additional memory and we prevent them from doing that by freezing them earlier.
|
||||
[Of course, this also means that device drivers should not allocate substantial
|
||||
amounts of memory from their .suspend() callbacks before hibernation, but this
|
||||
is a separate issue.]
|
||||
memory (approximately 50% of available RAM) and we need to do that before
|
||||
devices are deactivated, because we generally need them for swapping out.
|
||||
Then, after the memory for the image has been freed, we don't want tasks
|
||||
to allocate additional memory and we prevent them from doing that by
|
||||
freezing them earlier. [Of course, this also means that device drivers
|
||||
should not allocate substantial amounts of memory from their .suspend()
|
||||
callbacks before hibernation, but this is a separate issue.]
|
||||
|
||||
3. The third reason is to prevent user space processes and some kernel threads
|
||||
from interfering with the suspending and resuming of devices. A user space
|
||||
process running on a second CPU while we are suspending devices may, for
|
||||
example, be troublesome and without the freezing of tasks we would need some
|
||||
safeguards against race conditions that might occur in such a case.
|
||||
from interfering with the suspending and resuming of devices. A user space
|
||||
process running on a second CPU while we are suspending devices may, for
|
||||
example, be troublesome and without the freezing of tasks we would need some
|
||||
safeguards against race conditions that might occur in such a case.
|
||||
|
||||
Although Linus Torvalds doesn't like the freezing of tasks, he said this in one
|
||||
of the discussions on LKML (http://lkml.org/lkml/2007/4/27/608):
|
||||
@@ -132,7 +140,7 @@ of the discussions on LKML (http://lkml.org/lkml/2007/4/27/608):
|
||||
|
||||
Linus: In many ways, 'at all'.
|
||||
|
||||
I _do_ realize the IO request queue issues, and that we cannot actually do
|
||||
I **do** realize the IO request queue issues, and that we cannot actually do
|
||||
s2ram with some devices in the middle of a DMA. So we want to be able to
|
||||
avoid *that*, there's no question about that. And I suspect that stopping
|
||||
user threads and then waiting for a sync is practically one of the easier
|
||||
@@ -150,17 +158,18 @@ thawed after the driver's .resume() callback has run, so it won't be accessing
|
||||
the device while it's suspended.
|
||||
|
||||
4. Another reason for freezing tasks is to prevent user space processes from
|
||||
realizing that a hibernation (or suspend) operation takes place. Ideally, user
|
||||
space processes should not notice that such a system-wide operation has occurred
|
||||
and should continue running without any problems after the restore (or resume
|
||||
from suspend). Unfortunately, in the most general case this is quite difficult
|
||||
to achieve without the freezing of tasks. Consider, for example, a process
|
||||
that depends on all CPUs being online while it's running. Since we need to
|
||||
disable nonboot CPUs during the hibernation, if this process is not frozen, it
|
||||
may notice that the number of CPUs has changed and may start to work incorrectly
|
||||
because of that.
|
||||
realizing that a hibernation (or suspend) operation takes place. Ideally, user
|
||||
space processes should not notice that such a system-wide operation has
|
||||
occurred and should continue running without any problems after the restore
|
||||
(or resume from suspend). Unfortunately, in the most general case this
|
||||
is quite difficult to achieve without the freezing of tasks. Consider,
|
||||
for example, a process that depends on all CPUs being online while it's
|
||||
running. Since we need to disable nonboot CPUs during the hibernation,
|
||||
if this process is not frozen, it may notice that the number of CPUs has
|
||||
changed and may start to work incorrectly because of that.
|
||||
|
||||
V. Are there any problems related to the freezing of tasks?
|
||||
===========================================================
|
||||
|
||||
Yes, there are.
|
||||
|
||||
@@ -172,11 +181,12 @@ may be undesirable. That's why kernel threads are not freezable by default.
|
||||
|
||||
Second, there are the following two problems related to the freezing of user
|
||||
space processes:
|
||||
|
||||
1. Putting processes into an uninterruptible sleep distorts the load average.
|
||||
2. Now that we have FUSE, plus the framework for doing device drivers in
|
||||
userspace, it gets even more complicated because some userspace processes are
|
||||
now doing the sorts of things that kernel threads do
|
||||
(https://lists.linux-foundation.org/pipermail/linux-pm/2007-May/012309.html).
|
||||
userspace, it gets even more complicated because some userspace processes are
|
||||
now doing the sorts of things that kernel threads do
|
||||
(https://lists.linux-foundation.org/pipermail/linux-pm/2007-May/012309.html).
|
||||
|
||||
Problem 1 seems to be fixable, although it hasn't been fixed so far. The
|
||||
other one is more serious, but it seems that we can work around it by using
|
||||
@@ -201,6 +211,7 @@ requested early enough using the suspend notifier API described in
|
||||
Documentation/driver-api/pm/notifiers.rst.
|
||||
|
||||
VI. Are there any precautions to be taken to prevent freezing failures?
|
||||
=======================================================================
|
||||
|
||||
Yes, there are.
|
||||
|
||||
@@ -226,6 +237,8 @@ So, to summarize, use [un]lock_system_sleep() instead of directly using
|
||||
mutex_[un]lock(&system_transition_mutex). That would prevent freezing failures.
|
||||
|
||||
VII. Miscellaneous
|
||||
==================
|
||||
|
||||
/sys/power/pm_freeze_timeout controls the maximum time allowed for freezing
|
||||
all user space processes or all freezable kernel threads, in milliseconds.
|
||||
The default value is 20000; any unsigned integer value is accepted.
|
||||
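As a rough user-space illustration of the timeout value described above, the following sketch validates a string read from /sys/power/pm_freeze_timeout as an unsigned integer number of milliseconds. The helper name parse_freeze_timeout_ms() is hypothetical and is not part of any kernel or libc interface; reading the sysfs file itself is left out.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse the text read from
 * /sys/power/pm_freeze_timeout into a millisecond count.
 * Returns 0 on success, -1 if the text is not an unsigned integer
 * that fits in 'unsigned int'.
 */
static int parse_freeze_timeout_ms(const char *text, unsigned int *ms)
{
	char *end;
	unsigned long val;

	/* strtoul() silently accepts a minus sign, so reject it up front. */
	if (strchr(text, '-'))
		return -1;

	errno = 0;
	val = strtoul(text, &end, 10);
	if (end == text || errno == ERANGE || val > UINT_MAX)
		return -1;

	/* Allow only a trailing newline after the digits. */
	if (*end != '\0' && *end != '\n')
		return -1;

	*ms = (unsigned int)val;
	return 0;
}
```

With the documented default, the string "20000\n" would parse to 20000 ms, i.e. 20 seconds.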
46
Documentation/power/index.rst
Normal file
@@ -0,0 +1,46 @@
|
||||
:orphan:
|
||||
|
||||
================
|
||||
Power Management
|
||||
================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
apm-acpi
|
||||
basic-pm-debugging
|
||||
charger-manager
|
||||
drivers-testing
|
||||
energy-model
|
||||
freezing-of-tasks
|
||||
interface
|
||||
opp
|
||||
pci
|
||||
pm_qos_interface
|
||||
power_supply_class
|
||||
runtime_pm
|
||||
s2ram
|
||||
suspend-and-cpuhotplug
|
||||
suspend-and-interrupts
|
||||
swsusp-and-swap-files
|
||||
swsusp-dmcrypt
|
||||
swsusp
|
||||
video
|
||||
tricks
|
||||
|
||||
userland-swsusp
|
||||
|
||||
powercap/powercap
|
||||
|
||||
regulator/consumer
|
||||
regulator/design
|
||||
regulator/machine
|
||||
regulator/overview
|
||||
regulator/regulator
|
||||
|
||||
.. only:: subproject and html
|
||||
|
||||
Indices
|
||||
=======
|
||||
|
||||
* :ref:`genindex`
|
||||
@@ -1,4 +1,6 @@
|
||||
===========================================
|
||||
Power Management Interface for System Sleep
|
||||
===========================================
|
||||
|
||||
Copyright (c) 2016 Intel Corp., Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
||||
|
||||
@@ -11,10 +13,10 @@ mounted at /sys).
|
||||
|
||||
Reading from it returns a list of supported sleep states, encoded as:
|
||||
|
||||
'freeze' (Suspend-to-Idle)
|
||||
'standby' (Power-On Suspend)
|
||||
'mem' (Suspend-to-RAM)
|
||||
'disk' (Suspend-to-Disk)
|
||||
- 'freeze' (Suspend-to-Idle)
|
||||
- 'standby' (Power-On Suspend)
|
||||
- 'mem' (Suspend-to-RAM)
|
||||
- 'disk' (Suspend-to-Disk)
|
||||
|
||||
Suspend-to-Idle is always supported. Suspend-to-Disk is always supported
|
||||
too as long as the kernel has been configured to support hibernation at all
|
||||
@@ -32,18 +34,18 @@ Specifically, it tells the kernel what to do after creating a hibernation image.
|
||||
|
||||
Reading from it returns a list of supported options encoded as:
|
||||
|
||||
'platform' (put the system into sleep using a platform-provided method)
|
||||
'shutdown' (shut the system down)
|
||||
'reboot' (reboot the system)
|
||||
'suspend' (trigger a Suspend-to-RAM transition)
|
||||
'test_resume' (resume-after-hibernation test mode)
|
||||
- 'platform' (put the system into sleep using a platform-provided method)
|
||||
- 'shutdown' (shut the system down)
|
||||
- 'reboot' (reboot the system)
|
||||
- 'suspend' (trigger a Suspend-to-RAM transition)
|
||||
- 'test_resume' (resume-after-hibernation test mode)
|
||||
|
||||
The currently selected option is printed in square brackets.
|
||||
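Since the selected option is marked with square brackets, a user-space tool can recover it with plain string handling. The sketch below is only an illustration under that assumption: pm_selected_option() is a hypothetical helper name, and the sample strings in the usage note are made up to match the format described above.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the bracketed entry of a /sys/power/disk
 * style list (e.g. "platform [shutdown] reboot") into 'out'.
 * Returns 0 on success, -1 if there is no well-formed "[...]" entry
 * or if 'out' is too small to hold it.
 */
static int pm_selected_option(const char *list, char *out, size_t out_len)
{
	const char *start = strchr(list, '[');
	const char *end;
	size_t len;

	if (!start)
		return -1;

	end = strchr(start + 1, ']');
	if (!end)
		return -1;

	len = (size_t)(end - start - 1);
	if (len == 0 || len >= out_len)
		return -1;

	memcpy(out, start + 1, len);
	out[len] = '\0';
	return 0;
}
```

For an input such as "platform [shutdown] reboot suspend test_resume" the helper stores "shutdown" in the output buffer.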
|
||||
The 'platform' option is only available if the platform provides a special
|
||||
mechanism to put the system to sleep after creating a hibernation image (ACPI
|
||||
does that, for example). The 'suspend' option is available if Suspend-to-RAM
|
||||
is supported. Refer to Documentation/power/basic-pm-debugging.txt for the
|
||||
is supported. Refer to Documentation/power/basic-pm-debugging.rst for the
|
||||
description of the 'test_resume' option.
|
||||
|
||||
To select an option, write the string representing it to /sys/power/disk.
|
||||
@@ -71,7 +73,7 @@ If /sys/power/pm_trace contains '1', the fingerprint of each suspend/resume
|
||||
event point in turn will be stored in the RTC memory (overwriting the actual
|
||||
RTC information), so it will survive a system crash if one occurs right after
|
||||
storing it and it can be used later to identify the driver that caused the crash
|
||||
to happen (see Documentation/power/s2ram.txt for more information).
|
||||
to happen (see Documentation/power/s2ram.rst for more information).
|
||||
|
||||
Initially it contains '0' which may be changed to '1' by writing a string
|
||||
representing a nonzero integer into it.
|
||||
@@ -1,20 +1,23 @@
|
||||
==========================================
|
||||
Operating Performance Points (OPP) Library
|
||||
==========================================
|
||||
|
||||
(C) 2009-2010 Nishanth Menon <nm@ti.com>, Texas Instruments Incorporated
|
||||
|
||||
Contents
|
||||
--------
|
||||
1. Introduction
|
||||
2. Initial OPP List Registration
|
||||
3. OPP Search Functions
|
||||
4. OPP Availability Control Functions
|
||||
5. OPP Data Retrieval Functions
|
||||
6. Data Structures
|
||||
.. Contents
|
||||
|
||||
1. Introduction
|
||||
2. Initial OPP List Registration
|
||||
3. OPP Search Functions
|
||||
4. OPP Availability Control Functions
|
||||
5. OPP Data Retrieval Functions
|
||||
6. Data Structures
|
||||
|
||||
1. Introduction
|
||||
===============
|
||||
|
||||
1.1 What is an Operating Performance Point (OPP)?
|
||||
-------------------------------------------------
|
||||
|
||||
Complex SoCs of today consist of multiple sub-modules working in conjunction.
|
||||
In an operational system executing varied use cases, not all modules in the SoC
|
||||
@@ -28,16 +31,19 @@ the device will support per domain are called Operating Performance Points or
|
||||
OPPs.
|
||||
|
||||
As an example:
|
||||
|
||||
Let us consider an MPU device which supports the following:
|
||||
{300MHz at minimum voltage of 1V}, {800MHz at minimum voltage of 1.2V},
|
||||
{1GHz at minimum voltage of 1.3V}
|
||||
|
||||
We can represent these as three OPPs as the following {Hz, uV} tuples:
|
||||
{300000000, 1000000}
|
||||
{800000000, 1200000}
|
||||
{1000000000, 1300000}
|
||||
|
||||
- {300000000, 1000000}
|
||||
- {800000000, 1200000}
|
||||
- {1000000000, 1300000}
|
||||
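The {Hz, uV} tuples above map naturally onto a small constant table. The following is a stand-alone sketch for illustration only: struct example_opp, mpu_opps and example_opp_voltage() are hypothetical names and are not the OPP library's internal representation.

```c
#include <stddef.h>

/* Hypothetical illustration type; not the kernel's struct dev_pm_opp. */
struct example_opp {
	unsigned long freq_hz;	/* frequency in Hz */
	unsigned long volt_uv;	/* minimum voltage in uV */
};

/* The three OPPs of the example MPU device, in increasing frequency. */
static const struct example_opp mpu_opps[] = {
	{  300000000UL, 1000000UL },	/* 300MHz at 1V */
	{  800000000UL, 1200000UL },	/* 800MHz at 1.2V */
	{ 1000000000UL, 1300000UL },	/* 1GHz at 1.3V */
};

#define MPU_NR_OPPS (sizeof(mpu_opps) / sizeof(mpu_opps[0]))

/* Return the minimum voltage in uV for an exact frequency, or 0 if none. */
static unsigned long example_opp_voltage(unsigned long freq_hz)
{
	size_t i;

	for (i = 0; i < MPU_NR_OPPS; i++)
		if (mpu_opps[i].freq_hz == freq_hz)
			return mpu_opps[i].volt_uv;

	return 0;
}
```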
|
||||
1.2 Operating Performance Points Library
|
||||
----------------------------------------
|
||||
|
||||
OPP library provides a set of helper functions to organize and query the OPP
|
||||
information. The library is located in drivers/base/power/opp.c and the header
|
||||
@@ -46,9 +52,10 @@ CONFIG_PM_OPP from power management menuconfig menu. OPP library depends on
|
||||
CONFIG_PM as certain SoC frameworks, such as Texas Instruments' OMAP, allow the system to
|
||||
optionally boot at a certain OPP without needing cpufreq.
|
||||
|
||||
Typical usage of the OPP library is as follows:
|
||||
(users) -> registers a set of default OPPs -> (library)
|
||||
SoC framework -> modifies on required cases certain OPPs -> OPP layer
|
||||
Typical usage of the OPP library is as follows::
|
||||
|
||||
(users) -> registers a set of default OPPs -> (library)
|
||||
SoC framework -> modifies on required cases certain OPPs -> OPP layer
|
||||
-> queries to search/retrieve information ->
|
||||
|
||||
OPP layer expects each domain to be represented by a unique device pointer. SoC
|
||||
@@ -57,8 +64,9 @@ list is expected to be an optimally small number typically around 5 per device.
|
||||
This initial list contains a set of OPPs that the framework expects to be safely
|
||||
enabled by default in the system.
|
||||
|
||||
Note on OPP Availability:
|
||||
------------------------
|
||||
Note on OPP Availability
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
As the system proceeds to operate, SoC framework may choose to make certain
|
||||
OPPs available or not available on each device based on various external
|
||||
factors. Example usage: Thermal management or other exceptional situations where
|
||||
@@ -88,7 +96,8 @@ registering the OPPs is maintained by OPP library throughout the device
|
||||
operation. The SoC framework can subsequently control the availability of the
|
||||
OPPs dynamically using the dev_pm_opp_enable / disable functions.
|
||||
|
||||
dev_pm_opp_add - Add a new OPP for a specific domain represented by the device pointer.
|
||||
dev_pm_opp_add
|
||||
Add a new OPP for a specific domain represented by the device pointer.
|
||||
The OPP is defined using the frequency and voltage. Once added, the OPP
|
||||
is assumed to be available and control of its availability can be done
|
||||
with the dev_pm_opp_enable/disable functions. OPP library internally stores
|
||||
@@ -96,9 +105,11 @@ dev_pm_opp_add - Add a new OPP for a specific domain represented by the device p
|
||||
used by SoC framework to define an optimal list as per the demands of
|
||||
SoC usage environment.
|
||||
|
||||
WARNING: Do not use this function in interrupt context.
|
||||
WARNING:
|
||||
Do not use this function in interrupt context.
|
||||
|
||||
Example::
|
||||
|
||||
Example:
|
||||
soc_pm_init()
|
||||
{
|
||||
/* Do things */
|
||||
@@ -125,12 +136,15 @@ Callers of these functions shall call dev_pm_opp_put() after they have used the
|
||||
OPP. Otherwise the memory for the OPP will never get freed and result in
|
||||
memleak.
|
||||
|
||||
dev_pm_opp_find_freq_exact - Search for an OPP based on an *exact* frequency and
|
||||
dev_pm_opp_find_freq_exact
|
||||
Search for an OPP based on an *exact* frequency and
|
||||
availability. This function is especially useful to enable an OPP which
|
||||
is not available by default.
|
||||
Example: In a case when SoC framework detects a situation where a
|
||||
higher frequency could be made available, it can use this function to
|
||||
find the OPP prior to calling dev_pm_opp_enable to actually make it available.
|
||||
find the OPP prior to calling dev_pm_opp_enable to actually make
|
||||
it available::
|
||||
|
||||
opp = dev_pm_opp_find_freq_exact(dev, 1000000000, false);
|
||||
dev_pm_opp_put(opp);
|
||||
/* don't operate on the pointer, just do a sanity check */
|
||||
@@ -141,27 +155,34 @@ dev_pm_opp_find_freq_exact - Search for an OPP based on an *exact* frequency and
|
||||
dev_pm_opp_enable(dev, 1000000000);
|
||||
}
|
||||
|
||||
NOTE: This is the only search function that operates on OPPs which are
|
||||
not available.
|
||||
NOTE:
|
||||
This is the only search function that operates on OPPs which are
|
||||
not available.
|
||||
|
||||
dev_pm_opp_find_freq_floor - Search for an available OPP which is *at most* the
|
||||
dev_pm_opp_find_freq_floor
|
||||
Search for an available OPP which is *at most* the
|
||||
provided frequency. This function is useful while searching for a lesser
|
||||
match OR operating on OPP information in the order of decreasing
|
||||
frequency.
|
||||
Example: To find the highest opp for a device:
|
||||
Example: To find the highest opp for a device::
|
||||
|
||||
freq = ULONG_MAX;
|
||||
opp = dev_pm_opp_find_freq_floor(dev, &freq);
|
||||
dev_pm_opp_put(opp);
|
||||
|
||||
dev_pm_opp_find_freq_ceil - Search for an available OPP which is *at least* the
|
||||
dev_pm_opp_find_freq_ceil
|
||||
Search for an available OPP which is *at least* the
|
||||
provided frequency. This function is useful while searching for a
|
||||
higher match OR operating on OPP information in the order of increasing
|
||||
frequency.
|
||||
Example 1: To find the lowest opp for a device:
|
||||
Example 1: To find the lowest opp for a device::
|
||||
|
||||
freq = 0;
|
||||
opp = dev_pm_opp_find_freq_ceil(dev, &freq);
|
||||
dev_pm_opp_put(opp);
|
||||
Example 2: A simplified implementation of a SoC cpufreq_driver->target:
|
||||
|
||||
Example 2: A simplified implementation of a SoC cpufreq_driver->target::
|
||||
|
||||
soc_cpufreq_target(..)
|
||||
{
|
||||
/* Do stuff like policy checks etc. */
|
||||
@@ -184,12 +205,15 @@ fine grained dynamic control of which sets of OPPs are operationally available.
|
||||
These functions are intended to *temporarily* remove an OPP in conditions such
|
||||
as thermal considerations (e.g. don't use OPPx until the temperature drops).
|
||||
|
||||
WARNING: Do not use these functions in interrupt context.
|
||||
WARNING:
|
||||
Do not use these functions in interrupt context.
|
||||
|
||||
dev_pm_opp_enable - Make an OPP available for operation.
|
||||
dev_pm_opp_enable
|
||||
Make an OPP available for operation.
|
||||
Example: Let's say that the 1GHz OPP is to be made available only if the
|
||||
SoC temperature is lower than a certain threshold. The SoC framework
|
||||
implementation might choose to do something as follows:
|
||||
implementation might choose to do something as follows::
|
||||
|
||||
if (cur_temp < temp_low_thresh) {
|
||||
/* Enable 1GHz if it was disabled */
|
||||
opp = dev_pm_opp_find_freq_exact(dev, 1000000000, false);
|
||||
@@ -201,10 +225,12 @@ dev_pm_opp_enable - Make a OPP available for operation.
|
||||
goto try_something_else;
|
||||
}
|
||||
|
||||
dev_pm_opp_disable - Make an OPP unavailable for operation.
|
||||
dev_pm_opp_disable
|
||||
Make an OPP unavailable for operation.
|
||||
Example: Let's say that the 1GHz OPP is to be disabled if the temperature
|
||||
exceeds a threshold value. The SoC framework implementation might
|
||||
choose to do something as follows:
|
||||
choose to do something as follows::
|
||||
|
||||
if (cur_temp > temp_high_thresh) {
|
||||
/* Disable 1GHz if it was enabled */
|
||||
opp = dev_pm_opp_find_freq_exact(dev, 1000000000, true);
|
||||
@@ -223,11 +249,13 @@ information from the OPP structure is necessary. Once an OPP pointer is
|
||||
retrieved using the search functions, the following functions can be used by SoC
|
||||
framework to retrieve the information represented inside the OPP layer.
|
||||
|
||||
dev_pm_opp_get_voltage - Retrieve the voltage represented by the opp pointer.
|
||||
dev_pm_opp_get_voltage
|
||||
Retrieve the voltage represented by the opp pointer.
|
||||
Example: At a cpufreq transition to a different frequency, SoC
|
||||
framework requires to set the voltage represented by the OPP using
|
||||
the regulator framework to the Power Management chip providing the
|
||||
voltage.
|
||||
voltage::
|
||||
|
||||
soc_switch_to_freq_voltage(freq)
|
||||
{
|
||||
/* do things */
|
||||
@@ -239,10 +267,12 @@ dev_pm_opp_get_voltage - Retrieve the voltage represented by the opp pointer.
|
||||
/* do other things */
|
||||
}
|
||||
|
||||
dev_pm_opp_get_freq - Retrieve the freq represented by the opp pointer.
|
||||
dev_pm_opp_get_freq
|
||||
Retrieve the freq represented by the opp pointer.
|
||||
Example: Let's say the SoC framework uses a couple of helper functions;
|
||||
we could pass opp pointers to them instead of adding extra parameters to
|
||||
handle quite a bit of data.
|
||||
handle quite a bit of data::
|
||||
|
||||
soc_cpufreq_target(..)
|
||||
{
|
||||
/* do things.. */
|
||||
@@ -264,9 +294,11 @@ dev_pm_opp_get_freq - Retrieve the freq represented by the opp pointer.
|
||||
/* do things.. */
|
||||
}
|
||||
|
||||
dev_pm_opp_get_opp_count - Retrieve the number of available opps for a device
|
||||
dev_pm_opp_get_opp_count
|
||||
Retrieve the number of available opps for a device
|
||||
Example: Let's say a co-processor in the SoC needs to know the available
|
||||
frequencies in a table; the main processor can notify it as follows:
|
||||
frequencies in a table; the main processor can notify it as follows::
|
||||
|
||||
soc_notify_coproc_available_frequencies()
|
||||
{
|
||||
/* Do things */
|
||||
@@ -289,54 +321,59 @@ dev_pm_opp_get_opp_count - Retrieve the number of available opps for a device
|
||||
==================
|
||||
Typically an SoC contains multiple voltage domains which are variable. Each
|
||||
domain is represented by a device pointer. The relationship to OPP can be
|
||||
represented as follows:
|
||||
SoC
|
||||
|- device 1
|
||||
| |- opp 1 (availability, freq, voltage)
|
||||
| |- opp 2 ..
|
||||
... ...
|
||||
| `- opp n ..
|
||||
|- device 2
|
||||
...
|
||||
`- device m
|
||||
represented as follows::
|
||||
|
||||
SoC
|
||||
|- device 1
|
||||
| |- opp 1 (availability, freq, voltage)
|
||||
| |- opp 2 ..
|
||||
... ...
|
||||
| `- opp n ..
|
||||
|- device 2
|
||||
...
|
||||
`- device m
|
||||
|
||||
The OPP library maintains an internal list that the SoC framework populates and
|
||||
that is accessed by various functions as described above. However, the structures
|
||||
representing the actual OPPs and domains are internal to the OPP library itself
|
||||
to allow for suitable abstraction reusable across systems.
|
||||
|
||||
struct dev_pm_opp - The internal data structure of OPP library which is used to
|
||||
struct dev_pm_opp
|
||||
The internal data structure of OPP library which is used to
|
||||
represent an OPP. In addition to the freq, voltage, availability
|
||||
information, it also contains internal book keeping information required
|
||||
for the OPP library to operate on. Pointer to this structure is
|
||||
provided back to the users such as SoC framework to be used as an
|
||||
identifier for OPP in the interactions with OPP layer.
|
||||
|
||||
WARNING: The struct dev_pm_opp pointer should not be parsed or modified by the
|
||||
users. The defaults for an instance are populated by dev_pm_opp_add, but the
|
||||
availability of the OPP can be modified by dev_pm_opp_enable/disable functions.
|
||||
WARNING:
|
||||
The struct dev_pm_opp pointer should not be parsed or modified by the
|
||||
users. The defaults for an instance are populated by
|
||||
dev_pm_opp_add, but the availability of the OPP can be modified
|
||||
by dev_pm_opp_enable/disable functions.
|
||||
|
||||
struct device - This is used to identify a domain to the OPP layer. The
|
||||
struct device
|
||||
This is used to identify a domain to the OPP layer. The
|
||||
nature of the device and its implementation are left to the user of
|
||||
OPP library such as the SoC framework.
|
||||
|
||||
Overall, in a simplistic view, the data structure operations are represented as
|
||||
follows:
|
||||
follows::
|
||||
|
||||
Initialization / modification:
|
||||
+-----+ /- dev_pm_opp_enable
|
||||
dev_pm_opp_add --> | opp | <-------
|
||||
| +-----+ \- dev_pm_opp_disable
|
||||
\-------> domain_info(device)
|
||||
Initialization / modification:
|
||||
+-----+ /- dev_pm_opp_enable
|
||||
dev_pm_opp_add --> | opp | <-------
|
||||
| +-----+ \- dev_pm_opp_disable
|
||||
\-------> domain_info(device)
|
||||
|
||||
Search functions:
|
||||
/-- dev_pm_opp_find_freq_ceil ---\ +-----+
|
||||
domain_info<---- dev_pm_opp_find_freq_exact -----> | opp |
|
||||
\-- dev_pm_opp_find_freq_floor ---/ +-----+
|
||||
Search functions:
|
||||
/-- dev_pm_opp_find_freq_ceil ---\ +-----+
|
||||
domain_info<---- dev_pm_opp_find_freq_exact -----> | opp |
|
||||
\-- dev_pm_opp_find_freq_floor ---/ +-----+
|
||||
|
||||
Retrieval functions:
|
||||
+-----+ /- dev_pm_opp_get_voltage
|
||||
| opp | <---
|
||||
+-----+ \- dev_pm_opp_get_freq
|
||||
Retrieval functions:
|
||||
+-----+ /- dev_pm_opp_get_voltage
|
||||
| opp | <---
|
||||
+-----+ \- dev_pm_opp_get_freq
|
||||
|
||||
domain_info <- dev_pm_opp_get_opp_count
|
||||
domain_info <- dev_pm_opp_get_opp_count
|
||||
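The search and availability semantics summarized in the diagrams above can be modeled in a few lines of stand-alone C. This is a simplified sketch and not the OPP library's implementation: every model_* name is hypothetical, the table is a fixed array rather than a per-device list, and lookups return NULL on failure, whereas the real dev_pm_opp_find_freq_* functions take a pointer to the frequency and have their own error conventions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an OPP: frequency, voltage, availability. */
struct model_opp {
	unsigned long freq_hz;
	unsigned long volt_uv;
	bool available;
};

/* Sorted by increasing frequency; the 1GHz entry starts out disabled. */
static struct model_opp model_table[] = {
	{  300000000UL, 1000000UL, true },
	{  800000000UL, 1200000UL, true },
	{ 1000000000UL, 1300000UL, false },
};

#define MODEL_NR (sizeof(model_table) / sizeof(model_table[0]))

/* Highest available OPP whose frequency is at most freq_hz. */
static struct model_opp *model_find_floor(unsigned long freq_hz)
{
	size_t i = MODEL_NR;

	while (i-- > 0)
		if (model_table[i].available &&
		    model_table[i].freq_hz <= freq_hz)
			return &model_table[i];
	return NULL;
}

/* Lowest available OPP whose frequency is at least freq_hz. */
static struct model_opp *model_find_ceil(unsigned long freq_hz)
{
	size_t i;

	for (i = 0; i < MODEL_NR; i++)
		if (model_table[i].available &&
		    model_table[i].freq_hz >= freq_hz)
			return &model_table[i];
	return NULL;
}

/* Enable or disable the OPP with exactly this frequency; -1 if absent. */
static int model_set_available(unsigned long freq_hz, bool available)
{
	size_t i;

	for (i = 0; i < MODEL_NR; i++) {
		if (model_table[i].freq_hz == freq_hz) {
			model_table[i].available = available;
			return 0;
		}
	}
	return -1;
}

/* Number of currently available OPPs. */
static size_t model_count(void)
{
	size_t i, n = 0;

	for (i = 0; i < MODEL_NR; i++)
		if (model_table[i].available)
			n++;
	return n;
}
```

In this model, a floor search with the largest possible frequency yields the highest available OPP and a ceil search with 0 yields the lowest, mirroring the "highest opp" and "lowest opp" examples earlier in the document. Disabled entries are skipped by both searches until they are enabled again.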
@@ -1,4 +1,6 @@
|
||||
====================
|
||||
PCI Power Management
|
||||
====================
|
||||
|
||||
Copyright (c) 2010 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
|
||||
|
||||
@@ -9,14 +11,14 @@ management. Based on previous work by Patrick Mochel <mochel@transmeta.com>
|
||||
This document only covers the aspects of power management specific to PCI
|
||||
devices. For general description of the kernel's interfaces related to device
|
||||
power management refer to Documentation/driver-api/pm/devices.rst and
|
||||
Documentation/power/runtime_pm.txt.
|
||||
Documentation/power/runtime_pm.rst.
|
||||
|
||||
---------------------------------------------------------------------------
|
||||
.. contents:
|
||||
|
||||
1. Hardware and Platform Support for PCI Power Management
|
||||
2. PCI Subsystem and Device Power Management
|
||||
3. PCI Device Drivers and Power Management
|
||||
4. Resources
|
||||
1. Hardware and Platform Support for PCI Power Management
|
||||
2. PCI Subsystem and Device Power Management
|
||||
3. PCI Device Drivers and Power Management
|
||||
4. Resources
|
||||
|
||||
|
||||
1. Hardware and Platform Support for PCI Power Management
|
||||
@@ -24,6 +26,7 @@ Documentation/power/runtime_pm.txt.
|
||||
|
||||
1.1. Native and Platform-Based Power Management
|
||||
-----------------------------------------------
|
||||
|
||||
In general, power management is a feature allowing one to save energy by putting
|
||||
devices into states in which they draw less power (low-power states) at the
|
||||
price of reduced functionality or performance.
|
||||
@@ -67,6 +70,7 @@ mechanisms have to be used simultaneously to obtain the desired result.
|
||||
|
||||
1.2. Native PCI Power Management
|
||||
--------------------------------
|
||||
|
||||
The PCI Bus Power Management Interface Specification (PCI PM Spec) was
|
||||
introduced between the PCI 2.1 and PCI 2.2 Specifications. It defined a
|
||||
standard interface for performing various operations related to power
|
||||
@@ -134,6 +138,7 @@ sufficiently active to generate a wakeup signal.
|
||||
|
||||
1.3. ACPI Device Power Management
|
||||
---------------------------------
|
||||
|
||||
The platform firmware support for the power management of PCI devices is
|
||||
system-specific. However, if the system in question is compliant with the
|
||||
Advanced Configuration and Power Interface (ACPI) Specification, like the
|
||||
@@ -194,6 +199,7 @@ enabled for the device to be able to generate wakeup signals.
|
||||
|
||||
1.4. Wakeup Signaling
|
||||
---------------------
|
||||
|
||||
Wakeup signals generated by PCI devices, either as native PCI PMEs, or as
|
||||
a result of the execution of the _DSW (or _PSW) ACPI control method before
|
||||
putting the device into a low-power state, have to be caught and handled as
|
||||
@@ -265,14 +271,15 @@ the native PCI Express PME signaling cannot be used by the kernel in that case.
|
||||
|
||||
2.1. Device Power Management Callbacks
|
||||
--------------------------------------
|
||||
|
||||
The PCI Subsystem participates in the power management of PCI devices in a
|
||||
number of ways. First of all, it provides an intermediate code layer between
|
||||
the device power management core (PM core) and PCI device drivers.
|
||||
Specifically, the pm field of the PCI subsystem's struct bus_type object,
|
||||
pci_bus_type, points to a struct dev_pm_ops object, pci_dev_pm_ops, containing
|
||||
-pointers to several device power management callbacks:
+pointers to several device power management callbacks::
|
||||
|
||||
-const struct dev_pm_ops pci_dev_pm_ops = {
+  const struct dev_pm_ops pci_dev_pm_ops = {
|
||||
.prepare = pci_pm_prepare,
|
||||
.complete = pci_pm_complete,
|
||||
.suspend = pci_pm_suspend,
|
||||
@@ -290,7 +297,7 @@ const struct dev_pm_ops pci_dev_pm_ops = {
|
||||
.runtime_suspend = pci_pm_runtime_suspend,
|
||||
.runtime_resume = pci_pm_runtime_resume,
|
||||
.runtime_idle = pci_pm_runtime_idle,
|
||||
-};
+  };
|
||||
|
||||
These callbacks are executed by the PM core in various situations related to
|
||||
device power management and they, in turn, execute power management callbacks
|
||||
@@ -299,9 +306,9 @@ involving some standard configuration registers of PCI devices that device
|
||||
drivers need not know or care about.
|
||||
|
||||
The structure representing a PCI device, struct pci_dev, contains several fields
|
||||
-that these callbacks operate on:
+that these callbacks operate on::
|
||||
|
||||
-struct pci_dev {
+  struct pci_dev {
|
||||
...
|
||||
pci_power_t current_state; /* Current operating state. */
|
||||
int pm_cap; /* PM capability offset in the
|
||||
@@ -315,13 +322,14 @@ struct pci_dev {
|
||||
unsigned int wakeup_prepared:1; /* Device prepared for wake up */
|
||||
unsigned int d3_delay; /* D3->D0 transition time in ms */
|
||||
...
|
||||
-};
+  };
|
||||
|
||||
They also indirectly use some fields of the struct device that is embedded in
|
||||
struct pci_dev.
|
||||
|
||||
2.2. Device Initialization
|
||||
--------------------------
|
||||
|
||||
The PCI subsystem's first task related to device power management is to
|
||||
prepare the device for power management and initialize the fields of struct
|
||||
pci_dev used for this purpose. This happens in two functions defined in
|
||||
@@ -348,10 +356,11 @@ during system-wide transitions to a sleep state and back to the working state.
|
||||
|
||||
2.3. Runtime Device Power Management
|
||||
------------------------------------
|
||||
|
||||
The PCI subsystem plays a vital role in the runtime power management of PCI
|
||||
devices. For this purpose it uses the general runtime power management
|
||||
-(runtime PM) framework described in Documentation/power/runtime_pm.txt.
-Namely, it provides subsystem-level callbacks:
+(runtime PM) framework described in Documentation/power/runtime_pm.rst.
+Namely, it provides subsystem-level callbacks::
|
||||
|
||||
pci_pm_runtime_suspend()
|
||||
pci_pm_runtime_resume()
|
||||
@@ -425,13 +434,14 @@ to the given subsystem before the next phase begins. These phases always run
|
||||
after tasks have been frozen.
|
||||
|
||||
2.4.1. System Suspend
|
||||
^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
When the system is going into a sleep state in which the contents of memory will
|
||||
be preserved, such as one of the ACPI sleep states S1-S3, the phases are:
|
||||
|
||||
prepare, suspend, suspend_noirq.
|
||||
|
||||
-The following PCI bus type's callbacks, respectively, are used in these phases:
+The following PCI bus type's callbacks, respectively, are used in these phases::
|
||||
|
||||
pci_pm_prepare()
|
||||
pci_pm_suspend()
|
||||
@@ -492,6 +502,7 @@ this purpose). PCI device drivers are not encouraged to do that, but in some
|
||||
rare cases doing that in the driver may be the optimum approach.
|
||||
|
||||
2.4.2. System Resume
|
||||
^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
When the system is undergoing a transition from a sleep state in which the
|
||||
contents of memory have been preserved, such as one of the ACPI sleep states
|
||||
@@ -500,7 +511,7 @@ S1-S3, into the working state (ACPI S0), the phases are:
|
||||
resume_noirq, resume, complete.
|
||||
|
||||
The following PCI bus type's callbacks, respectively, are executed in these
|
||||
-phases:
+phases::
|
||||
|
||||
pci_pm_resume_noirq()
|
||||
pci_pm_resume()
|
||||
@@ -539,6 +550,7 @@ The pci_pm_complete() routine only executes the device driver's pm->complete()
|
||||
callback, if defined.
|
||||
|
||||
2.4.3. System Hibernation
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
System hibernation is more complicated than system suspend, because it requires
|
||||
a system image to be created and written into a persistent storage medium. The
|
||||
@@ -551,7 +563,7 @@ to be free) in the following three phases:
|
||||
|
||||
prepare, freeze, freeze_noirq
|
||||
|
||||
-that correspond to the PCI bus type's callbacks:
+that correspond to the PCI bus type's callbacks::
|
||||
|
||||
pci_pm_prepare()
|
||||
pci_pm_freeze()
|
||||
@@ -580,7 +592,7 @@ back to the fully functional state and this is done in the following phases:
|
||||
|
||||
thaw_noirq, thaw, complete
|
||||
|
||||
-using the following PCI bus type's callbacks:
+using the following PCI bus type's callbacks::
|
||||
|
||||
pci_pm_thaw_noirq()
|
||||
pci_pm_thaw()
|
||||
@@ -608,7 +620,7 @@ three phases:
|
||||
|
||||
where the prepare phase is exactly the same as for system suspend. The other
|
||||
two phases are analogous to the suspend and suspend_noirq phases, respectively.
|
||||
-The PCI subsystem-level callbacks they correspond to
+The PCI subsystem-level callbacks they correspond to::
|
||||
|
||||
pci_pm_poweroff()
|
||||
pci_pm_poweroff_noirq()
|
||||
@@ -618,6 +630,7 @@ although they don't attempt to save the device's standard configuration
|
||||
registers.
|
||||
|
||||
2.4.4. System Restore
|
||||
^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
System restore requires a hibernation image to be loaded into memory and the
|
||||
pre-hibernation memory contents to be restored before the pre-hibernation system
|
||||
@@ -653,7 +666,7 @@ phases:
|
||||
|
||||
The first two of these are analogous to the resume_noirq and resume phases
|
||||
described above, respectively, and correspond to the following PCI subsystem
|
||||
-callbacks:
+callbacks::
|
||||
|
||||
pci_pm_restore_noirq()
|
||||
pci_pm_restore()
|
||||
@@ -671,6 +684,7 @@ resume.
|
||||
|
||||
3.1. Power Management Callbacks
|
||||
-------------------------------
|
||||
|
||||
PCI device drivers participate in power management by providing callbacks to be
|
||||
executed by the PCI subsystem's power management routines described above and by
|
||||
controlling the runtime power management of their devices.
|
||||
@@ -698,6 +712,7 @@ defined, though, they are expected to behave as described in the following
|
||||
subsections.
|
||||
|
||||
3.1.1. prepare()
|
||||
^^^^^^^^^^^^^^^^
|
||||
|
||||
The prepare() callback is executed during system suspend, during hibernation
|
||||
(when a hibernation image is about to be created), during power-off after
|
||||
@@ -716,6 +731,7 @@ preallocated earlier, for example in a suspend/hibernate notifier as described
|
||||
in Documentation/driver-api/pm/notifiers.rst).
|
||||
|
||||
3.1.2. suspend()
|
||||
^^^^^^^^^^^^^^^^
|
||||
|
||||
The suspend() callback is only executed during system suspend, after prepare()
|
||||
callbacks have been executed for all devices in the system.
|
||||
@@ -742,6 +758,7 @@ operations relying on the driver's ability to handle interrupts should be
|
||||
carried out in this callback.
|
||||
|
||||
3.1.3. suspend_noirq()
|
||||
^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The suspend_noirq() callback is only executed during system suspend, after
|
||||
suspend() callbacks have been executed for all devices in the system and
|
||||
@@ -753,6 +770,7 @@ suspend_noirq() can carry out operations that would cause race conditions to
|
||||
arise if they were performed in suspend().
|
||||
|
||||
3.1.4. freeze()
|
||||
^^^^^^^^^^^^^^^
|
||||
|
||||
The freeze() callback is hibernation-specific and is executed in two situations,
|
||||
during hibernation, after prepare() callbacks have been executed for all devices
|
||||
@@ -770,6 +788,7 @@ or put it into a low-power state. Still, either it or freeze_noirq() should
|
||||
save the device's standard configuration registers using pci_save_state().
|
||||
|
||||
3.1.5. freeze_noirq()
|
||||
^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The freeze_noirq() callback is hibernation-specific. It is executed during
|
||||
hibernation, after prepare() and freeze() callbacks have been executed for all
|
||||
@@ -786,6 +805,7 @@ The difference between freeze_noirq() and freeze() is analogous to the
|
||||
difference between suspend_noirq() and suspend().
|
||||
|
||||
3.1.6. poweroff()
|
||||
^^^^^^^^^^^^^^^^^
|
||||
|
||||
The poweroff() callback is hibernation-specific. It is executed when the system
|
||||
is about to be powered off after saving a hibernation image to a persistent
|
||||
@@ -802,6 +822,7 @@ into a low-power state, respectively, but it need not save the device's standard
|
||||
configuration registers.
|
||||
|
||||
3.1.7. poweroff_noirq()
|
||||
^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The poweroff_noirq() callback is hibernation-specific. It is executed after
|
||||
poweroff() callbacks have been executed for all devices in the system.
|
||||
@@ -814,6 +835,7 @@ The difference between poweroff_noirq() and poweroff() is analogous to the
|
||||
difference between suspend_noirq() and suspend().
|
||||
|
||||
3.1.8. resume_noirq()
|
||||
^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The resume_noirq() callback is only executed during system resume, after the
|
||||
PM core has enabled the non-boot CPUs. The driver's interrupt handler will not
|
||||
@@ -827,6 +849,7 @@ it should only be used for performing operations that would lead to race
|
||||
conditions if carried out by resume().
|
||||
|
||||
3.1.9. resume()
|
||||
^^^^^^^^^^^^^^^
|
||||
|
||||
The resume() callback is only executed during system resume, after
|
||||
resume_noirq() callbacks have been executed for all devices in the system and
|
||||
@@ -837,6 +860,7 @@ device and bringing it back to the fully functional state. The device should be
|
||||
able to process I/O in a usual way after resume() has returned.
|
||||
|
||||
3.1.10. thaw_noirq()
|
||||
^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The thaw_noirq() callback is hibernation-specific. It is executed after a
|
||||
system image has been created and the non-boot CPUs have been enabled by the PM
|
||||
@@ -851,6 +875,7 @@ freeze() and freeze_noirq(), so in general it does not need to modify the
|
||||
contents of the device's registers.
|
||||
|
||||
3.1.11. thaw()
|
||||
^^^^^^^^^^^^^^
|
||||
|
||||
The thaw() callback is hibernation-specific. It is executed after thaw_noirq()
|
||||
callbacks have been executed for all devices in the system and after device
|
||||
@@ -860,6 +885,7 @@ This callback is responsible for restoring the pre-freeze configuration of
|
||||
the device, so that it will work in a usual way after thaw() has returned.
|
||||
|
||||
3.1.12. restore_noirq()
|
||||
^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The restore_noirq() callback is hibernation-specific. It is executed in the
|
||||
restore_noirq phase of hibernation, when the boot kernel has passed control to
|
||||
@@ -875,6 +901,7 @@ For the vast majority of PCI device drivers there is no difference between
|
||||
resume_noirq() and restore_noirq().
|
||||
|
||||
3.1.13. restore()
|
||||
^^^^^^^^^^^^^^^^^
|
||||
|
||||
The restore() callback is hibernation-specific. It is executed after
|
||||
restore_noirq() callbacks have been executed for all devices in the system and
|
||||
@@ -888,14 +915,17 @@ For the vast majority of PCI device drivers there is no difference between
|
||||
resume() and restore().
|
||||
|
||||
3.1.14. complete()
|
||||
^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The complete() callback is executed in the following situations:
|
||||
|
||||
- during system resume, after resume() callbacks have been executed for all
|
||||
devices,
|
||||
- during hibernation, before saving the system image, after thaw() callbacks
|
||||
have been executed for all devices,
|
||||
- during system restore, when the system is going back to its pre-hibernation
|
||||
state, after restore() callbacks have been executed for all devices.
|
||||
|
||||
It also may be executed if the loading of a hibernation image into memory fails
|
||||
(in that case it is run after thaw() callbacks have been executed for all
|
||||
devices that have drivers in the boot kernel).
|
||||
@@ -904,6 +934,7 @@ This callback is entirely optional, although it may be necessary if the
|
||||
prepare() callback performs operations that need to be reversed.
|
||||
|
||||
3.1.15. runtime_suspend()
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The runtime_suspend() callback is specific to device runtime power management
|
||||
(runtime PM). It is executed by the PM core's runtime PM framework when the
|
||||
@@ -915,6 +946,7 @@ put into a low-power state, but it must allow the PCI subsystem to perform all
|
||||
of the PCI-specific actions necessary for suspending the device.
|
||||
|
||||
3.1.16. runtime_resume()
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The runtime_resume() callback is specific to device runtime PM. It is executed
|
||||
by the PM core's runtime PM framework when the device is about to be resumed
|
||||
@@ -927,6 +959,7 @@ The device is expected to be able to process I/O in the usual way after
|
||||
runtime_resume() has returned.
|
||||
|
||||
3.1.17. runtime_idle()
|
||||
^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The runtime_idle() callback is specific to device runtime PM. It is executed
|
||||
by the PM core's runtime PM framework whenever it may be desirable to suspend
|
||||
@@ -939,6 +972,7 @@ PCI subsystem will call pm_runtime_suspend() for the device, which in turn will
|
||||
cause the driver's runtime_suspend() callback to be executed.
|
||||
|
||||
3.1.18. Pointing Multiple Callback Pointers to One Routine
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Although in principle each of the callbacks described in the previous
|
||||
subsections can be defined as a separate function, it often is convenient to
|
||||
@@ -962,6 +996,7 @@ dev_pm_ops to indicate that one suspend routine is to be pointed to by the
|
||||
be pointed to by the .resume(), .thaw(), and .restore() members.
|
||||
|
||||
3.1.19. Driver Flags for Power Management
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The PM core allows device drivers to set flags that influence the handling of
|
||||
power management for the devices by the core itself and by middle layer code
|
||||
@@ -1007,6 +1042,7 @@ it.
|
||||
|
||||
3.2. Device Runtime Power Management
|
||||
------------------------------------
|
||||
|
||||
In addition to providing device power management callbacks PCI device drivers
|
||||
are responsible for controlling the runtime power management (runtime PM) of
|
||||
their devices.
|
||||
@@ -1073,22 +1109,27 @@ device the PM core automatically queues a request to check if the device is
|
||||
idle), device drivers are generally responsible for queuing power management
|
||||
requests for their devices. For this purpose they should use the runtime PM
|
||||
helper functions provided by the PM core, discussed in
|
||||
-Documentation/power/runtime_pm.txt.
+Documentation/power/runtime_pm.rst.
|
||||
|
||||
Devices can also be suspended and resumed synchronously, without placing a
|
||||
request into pm_wq. In the majority of cases this also is done by their
|
||||
drivers that use helper functions provided by the PM core for this purpose.
|
||||
|
||||
For more information on the runtime PM of devices refer to
|
||||
-Documentation/power/runtime_pm.txt.
+Documentation/power/runtime_pm.rst.
|
||||
|
||||
|
||||
4. Resources
|
||||
============
|
||||
|
||||
PCI Local Bus Specification, Rev. 3.0
|
||||
|
||||
PCI Bus Power Management Interface Specification, Rev. 1.2
|
||||
|
||||
Advanced Configuration and Power Interface (ACPI) Specification, Rev. 3.0b
|
||||
|
||||
PCI Express Base Specification, Rev. 2.0
|
||||
|
||||
Documentation/driver-api/pm/devices.rst
|
||||
-Documentation/power/runtime_pm.txt
+
+Documentation/power/runtime_pm.rst
|
||||
@@ -1,4 +1,6 @@
|
||||
-PM Quality Of Service Interface.
+===============================
+PM Quality Of Service Interface
+===============================
|
||||
|
||||
This interface provides a kernel and user mode interface for registering
|
||||
performance expectations by drivers, subsystems and user space applications on
|
||||
@@ -11,6 +13,7 @@ memory_bandwidth.
|
||||
constraints and PM QoS flags.
|
||||
|
||||
Each parameters have defined units:
|
||||
|
||||
* latency: usec
|
||||
* timeout: usec
|
||||
* throughput: kbs (kilo bit / sec)
|
||||
@@ -18,6 +21,7 @@ Each parameters have defined units:
|
||||
|
||||
|
||||
1. PM QoS framework
|
||||
===================
|
||||
|
||||
The infrastructure exposes multiple misc device nodes one per implemented
|
||||
parameter. The set of parameters implement is defined by pm_qos_power_init()
|
||||
@@ -37,38 +41,39 @@ reading the aggregated value does not require any locking mechanism.
|
||||
From kernel mode the use of this interface is simple:
|
||||
|
||||
void pm_qos_add_request(handle, param_class, target_value):
|
||||
-Will insert an element into the list for that identified PM QoS class with the
-target value. Upon change to this list the new target is recomputed and any
-registered notifiers are called only if the target value is now different.
-Clients of pm_qos need to save the returned handle for future use in other
-pm_qos API functions.
+  Will insert an element into the list for that identified PM QoS class with the
+  target value. Upon change to this list the new target is recomputed and any
+  registered notifiers are called only if the target value is now different.
+  Clients of pm_qos need to save the returned handle for future use in other
+  pm_qos API functions.
|
||||
|
||||
void pm_qos_update_request(handle, new_target_value):
|
||||
-Will update the list element pointed to by the handle with the new target value
-and recompute the new aggregated target, calling the notification tree if the
-target is changed.
+  Will update the list element pointed to by the handle with the new target value
+  and recompute the new aggregated target, calling the notification tree if the
+  target is changed.
|
||||
|
||||
void pm_qos_remove_request(handle):
|
||||
-Will remove the element. After removal it will update the aggregate target and
-call the notification tree if the target was changed as a result of removing
-the request.
+  Will remove the element. After removal it will update the aggregate target and
+  call the notification tree if the target was changed as a result of removing
+  the request.
|
||||
|
||||
int pm_qos_request(param_class):
|
||||
-Returns the aggregated value for a given PM QoS class.
+  Returns the aggregated value for a given PM QoS class.
|
||||
|
||||
int pm_qos_request_active(handle):
|
||||
-Returns if the request is still active, i.e. it has not been removed from a
-PM QoS class constraints list.
+  Returns if the request is still active, i.e. it has not been removed from a
+  PM QoS class constraints list.
|
||||
|
||||
int pm_qos_add_notifier(param_class, notifier):
|
||||
-Adds a notification callback function to the PM QoS class. The callback is
-called when the aggregated value for the PM QoS class is changed.
+  Adds a notification callback function to the PM QoS class. The callback is
+  called when the aggregated value for the PM QoS class is changed.
|
||||
|
||||
int pm_qos_remove_notifier(int param_class, notifier):
|
||||
-Removes the notification callback function for the PM QoS class.
+  Removes the notification callback function for the PM QoS class.
|
||||
|
||||
|
||||
From user mode:
|
||||
|
||||
Only processes can register a pm_qos request. To provide for automatic
|
||||
cleanup of a process, the interface requires the process to register its
|
||||
parameter requests in the following way:
|
||||
@@ -89,6 +94,7 @@ node.
|
||||
|
||||
|
||||
2. PM QoS per-device latency and flags framework
|
||||
================================================
|
||||
|
||||
For each device, there are three lists of PM QoS requests. Two of them are
|
||||
maintained along with the aggregated targets of resume latency and active
|
||||
@@ -107,73 +113,80 @@ the aggregated value does not require any locking mechanism.
|
||||
From kernel mode the use of this interface is the following:
|
||||
|
||||
int dev_pm_qos_add_request(device, handle, type, value):
|
||||
-Will insert an element into the list for that identified device with the
-target value. Upon change to this list the new target is recomputed and any
-registered notifiers are called only if the target value is now different.
-Clients of dev_pm_qos need to save the handle for future use in other
-dev_pm_qos API functions.
+  Will insert an element into the list for that identified device with the
+  target value. Upon change to this list the new target is recomputed and any
+  registered notifiers are called only if the target value is now different.
+  Clients of dev_pm_qos need to save the handle for future use in other
+  dev_pm_qos API functions.
|
||||
|
||||
int dev_pm_qos_update_request(handle, new_value):
|
||||
-Will update the list element pointed to by the handle with the new target value
-and recompute the new aggregated target, calling the notification trees if the
-target is changed.
+  Will update the list element pointed to by the handle with the new target
+  value and recompute the new aggregated target, calling the notification
+  trees if the target is changed.
|
||||
|
||||
int dev_pm_qos_remove_request(handle):
|
||||
-Will remove the element. After removal it will update the aggregate target and
-call the notification trees if the target was changed as a result of removing
-the request.
+  Will remove the element. After removal it will update the aggregate target
+  and call the notification trees if the target was changed as a result of
+  removing the request.
|
||||
|
||||
s32 dev_pm_qos_read_value(device):
|
||||
-Returns the aggregated value for a given device's constraints list.
+  Returns the aggregated value for a given device's constraints list.
|
||||
|
||||
enum pm_qos_flags_status dev_pm_qos_flags(device, mask)
|
||||
-Check PM QoS flags of the given device against the given mask of flags.
-The meaning of the return values is as follows:
-	PM_QOS_FLAGS_ALL: All flags from the mask are set
-	PM_QOS_FLAGS_SOME: Some flags from the mask are set
-	PM_QOS_FLAGS_NONE: No flags from the mask are set
-	PM_QOS_FLAGS_UNDEFINED: The device's PM QoS structure has not been
-			initialized or the list of requests is empty.
+  Check PM QoS flags of the given device against the given mask of flags.
+  The meaning of the return values is as follows:
+
+	PM_QOS_FLAGS_ALL:
+		All flags from the mask are set
+	PM_QOS_FLAGS_SOME:
+		Some flags from the mask are set
+	PM_QOS_FLAGS_NONE:
+		No flags from the mask are set
+	PM_QOS_FLAGS_UNDEFINED:
+		The device's PM QoS structure has not been initialized
+		or the list of requests is empty.
|
||||
|
||||
int dev_pm_qos_add_ancestor_request(dev, handle, type, value)
|
||||
-Add a PM QoS request for the first direct ancestor of the given device whose
-power.ignore_children flag is unset (for DEV_PM_QOS_RESUME_LATENCY requests)
-or whose power.set_latency_tolerance callback pointer is not NULL (for
-DEV_PM_QOS_LATENCY_TOLERANCE requests).
+  Add a PM QoS request for the first direct ancestor of the given device whose
+  power.ignore_children flag is unset (for DEV_PM_QOS_RESUME_LATENCY requests)
+  or whose power.set_latency_tolerance callback pointer is not NULL (for
+  DEV_PM_QOS_LATENCY_TOLERANCE requests).
|
||||
|
||||
int dev_pm_qos_expose_latency_limit(device, value)
|
||||
-Add a request to the device's PM QoS list of resume latency constraints and
-create a sysfs attribute pm_qos_resume_latency_us under the device's power
-directory allowing user space to manipulate that request.
+  Add a request to the device's PM QoS list of resume latency constraints and
+  create a sysfs attribute pm_qos_resume_latency_us under the device's power
+  directory allowing user space to manipulate that request.
|
||||
|
||||
void dev_pm_qos_hide_latency_limit(device)
|
||||
-Drop the request added by dev_pm_qos_expose_latency_limit() from the device's
-PM QoS list of resume latency constraints and remove sysfs attribute
-pm_qos_resume_latency_us from the device's power directory.
+  Drop the request added by dev_pm_qos_expose_latency_limit() from the device's
+  PM QoS list of resume latency constraints and remove sysfs attribute
+  pm_qos_resume_latency_us from the device's power directory.
|
||||
|
||||
int dev_pm_qos_expose_flags(device, value)
|
||||
-Add a request to the device's PM QoS list of flags and create sysfs attribute
-pm_qos_no_power_off under the device's power directory allowing user space to
-change the value of the PM_QOS_FLAG_NO_POWER_OFF flag.
+  Add a request to the device's PM QoS list of flags and create sysfs attribute
+  pm_qos_no_power_off under the device's power directory allowing user space to
+  change the value of the PM_QOS_FLAG_NO_POWER_OFF flag.
|
||||
|
||||
void dev_pm_qos_hide_flags(device)
|
||||
-Drop the request added by dev_pm_qos_expose_flags() from the device's PM QoS list
-of flags and remove sysfs attribute pm_qos_no_power_off from the device's power
-directory.
+  Drop the request added by dev_pm_qos_expose_flags() from the device's PM QoS list
+  of flags and remove sysfs attribute pm_qos_no_power_off from the device's power
+  directory.
|
||||
|
||||
Notification mechanisms:
|
||||
|
||||
The per-device PM QoS framework has a per-device notification tree.
|
||||
|
||||
int dev_pm_qos_add_notifier(device, notifier):
|
||||
-Adds a notification callback function for the device.
-The callback is called when the aggregated value of the device constraints list
-is changed (for resume latency device PM QoS only).
+  Adds a notification callback function for the device.
+  The callback is called when the aggregated value of the device constraints list
+  is changed (for resume latency device PM QoS only).
|
||||
|
||||
int dev_pm_qos_remove_notifier(device, notifier):
|
||||
-Removes the notification callback function for the device.
+  Removes the notification callback function for the device.
|
||||
|
||||
|
||||
Active state latency tolerance
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
This device PM QoS type is used to support systems in which hardware may switch
|
||||
to energy-saving operation modes on the fly. In those systems, if the operation
|
||||
282
Documentation/power/power_supply_class.rst
Normal file
@@ -0,0 +1,282 @@
|
||||
========================
|
||||
Linux power supply class
|
||||
========================
|
||||
|
||||
Synopsis
|
||||
~~~~~~~~
|
||||
Power supply class used to represent battery, UPS, AC or DC power supply
|
||||
properties to user-space.
|
||||
|
||||
It defines core set of attributes, which should be applicable to (almost)
|
||||
every power supply out there. Attributes are available via sysfs and uevent
|
||||
interfaces.
|
||||
|
||||
Each attribute has well defined meaning, up to unit of measure used. While
|
||||
the attributes provided are believed to be universally applicable to any
|
||||
power supply, specific monitoring hardware may not be able to provide them
|
||||
all, so any of them may be skipped.
|
||||
|
||||
Power supply class is extensible, and allows to define drivers own attributes.
|
||||
The core attribute set is subject to the standard Linux evolution (i.e.
|
||||
if it will be found that some attribute is applicable to many power supply
|
||||
types or their drivers, it can be added to the core set).
|
||||
|
||||
It also integrates with LED framework, for the purpose of providing
|
||||
typically expected feedback of battery charging/fully charged status and
|
||||
AC/USB power supply online status. (Note that specific details of the
|
||||
indication (including whether to use it at all) are fully controllable by
|
||||
user and/or specific machine defaults, per design principles of LED
|
||||
framework).
|
||||
|
||||
|
||||
Attributes/properties
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
Power supply class has predefined set of attributes, this eliminates code
|
||||
duplication across drivers. Power supply class insist on reusing its
|
||||
predefined attributes *and* their units.
|
||||
|
||||
So, userspace gets predictable set of attributes and their units for any
|
||||
kind of power supply, and can process/present them to a user in consistent
|
||||
manner. Results for different power supplies and machines are also directly
|
||||
comparable.
|
||||
|
||||
See drivers/power/supply/ds2760_battery.c and drivers/power/supply/pda_power.c
|
||||
for the example how to declare and handle attributes.
|
||||
|
||||
|
||||
Units
|
||||
~~~~~
|
||||
Quoting include/linux/power_supply.h:
|
||||
|
||||
All voltages, currents, charges, energies, time and temperatures in µV,
|
||||
µA, µAh, µWh, seconds and tenths of degree Celsius unless otherwise
|
||||
stated. It's driver's job to convert its raw values to units in which
|
||||
this class operates.
|
||||
|
||||
|
||||
Attributes/properties detailed
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
+--------------------------------------------------------------------------+
| **Charge/Energy/Capacity - how to not confuse**                          |
+--------------------------------------------------------------------------+
| **Because both "charge" (µAh) and "energy" (µWh) represents "capacity"   |
| of battery, this class distinguish these terms. Don't mix them!**        |
|                                                                          |
| - `CHARGE_*`                                                             |
|   attributes represents capacity in µAh only.                            |
| - `ENERGY_*`                                                             |
|   attributes represents capacity in µWh only.                            |
| - `CAPACITY`                                                             |
|   attribute represents capacity in *percents*, from 0 to 100.            |
+--------------------------------------------------------------------------+
|
||||
|
||||
Postfixes:
|
||||
|
||||
_AVG
|
||||
*hardware* averaged value, use it if your hardware is really able to
|
||||
report averaged values.
|
||||
_NOW
|
||||
momentary/instantaneous values.
|
||||
|
||||
STATUS
|
||||
this attribute represents operating status (charging, full,
|
||||
discharging (i.e. powering a load), etc.). This corresponds to
|
||||
`BATTERY_STATUS_*` values, as defined in battery.h.
|
||||
|
||||
CHARGE_TYPE
|
||||
batteries can typically charge at different rates.
|
||||
This defines trickle and fast charges. For batteries that
|
||||
are already charged or discharging, 'n/a' can be displayed (or
|
||||
'unknown', if the status is not known).
|
||||
|
||||
AUTHENTIC
|
||||
indicates the power supply (battery or charger) connected
|
||||
to the platform is authentic(1) or non authentic(0).
|
||||
|
||||
HEALTH
|
||||
represents health of the battery, values corresponds to
|
||||
POWER_SUPPLY_HEALTH_*, defined in battery.h.
|
||||
|
||||
VOLTAGE_OCV
|
||||
open circuit voltage of the battery.
|
||||
|
||||
VOLTAGE_MAX_DESIGN, VOLTAGE_MIN_DESIGN
|
||||
design values for maximal and minimal power supply voltages.
Maximal/minimal means the voltage at which the battery is considered
"full"/"empty" under normal conditions. There is no direct relation
between voltage and battery capacity, but some dumb
batteries use voltage for a very approximate calculation of capacity.
A battery driver can also use this attribute just to inform userspace
about the maximal and minimal voltage thresholds of a given battery.
|
||||
|
||||
VOLTAGE_MAX, VOLTAGE_MIN
|
||||
same as the _DESIGN voltage values, except that these should be used
if the hardware can only guess (measure and retain) the thresholds of a
given power supply.
|
||||
|
||||
VOLTAGE_BOOT
|
||||
Reports the voltage measured during boot
|
||||
|
||||
CURRENT_BOOT
|
||||
Reports the current measured during boot
|
||||
|
||||
CHARGE_FULL_DESIGN, CHARGE_EMPTY_DESIGN
|
||||
design charge values, when the battery is considered full/empty.
|
||||
|
||||
ENERGY_FULL_DESIGN, ENERGY_EMPTY_DESIGN
|
||||
same as above but for energy.
|
||||
|
||||
CHARGE_FULL, CHARGE_EMPTY
|
||||
These attributes mean "last remembered value of charge when the battery
became full/empty". They could also mean "value of charge when the battery
is considered full/empty at given conditions (temperature, age)".
I.e. these attributes represent real thresholds, not design values.
|
||||
|
||||
ENERGY_FULL, ENERGY_EMPTY
|
||||
same as above but for energy.
|
||||
|
||||
CHARGE_COUNTER
|
||||
the current charge counter (in µAh). This could easily
|
||||
be negative; there is no empty or full value. It is only useful for
|
||||
relative, time-based measurements.
|
||||
|
||||
PRECHARGE_CURRENT
|
||||
the maximum charge current during the precharge phase of the charge cycle
|
||||
(typically 20% of battery capacity).
|
||||
|
||||
CHARGE_TERM_CURRENT
|
||||
Charge termination current. The charge cycle terminates when the battery
voltage is above the recharge threshold, and the charge current is below
this setting (typically 10% of battery capacity).
|
||||
|
||||
CONSTANT_CHARGE_CURRENT
|
||||
constant charge current programmed by charger.
|
||||
|
||||
|
||||
CONSTANT_CHARGE_CURRENT_MAX
|
||||
maximum charge current supported by the power supply object.
|
||||
|
||||
CONSTANT_CHARGE_VOLTAGE
|
||||
constant charge voltage programmed by charger.
|
||||
CONSTANT_CHARGE_VOLTAGE_MAX
|
||||
maximum charge voltage supported by the power supply object.
|
||||
|
||||
INPUT_CURRENT_LIMIT
|
||||
input current limit programmed by charger. Indicates
|
||||
the current drawn from a charging source.
|
||||
|
||||
CHARGE_CONTROL_LIMIT
|
||||
current charge control limit setting
|
||||
CHARGE_CONTROL_LIMIT_MAX
|
||||
maximum charge control limit setting
|
||||
|
||||
CALIBRATE
|
||||
battery or coulomb counter calibration status
|
||||
|
||||
CAPACITY
|
||||
capacity in percent.
CAPACITY_ALERT_MIN
minimum capacity alert value in percent.
CAPACITY_ALERT_MAX
maximum capacity alert value in percent.
|
||||
CAPACITY_LEVEL
|
||||
capacity level. This corresponds to POWER_SUPPLY_CAPACITY_LEVEL_*.
|
||||
|
||||
TEMP
|
||||
temperature of the power supply.
|
||||
TEMP_ALERT_MIN
|
||||
minimum battery temperature alert.
|
||||
TEMP_ALERT_MAX
|
||||
maximum battery temperature alert.
|
||||
TEMP_AMBIENT
|
||||
ambient temperature.
|
||||
TEMP_AMBIENT_ALERT_MIN
|
||||
minimum ambient temperature alert.
|
||||
TEMP_AMBIENT_ALERT_MAX
|
||||
maximum ambient temperature alert.
|
||||
TEMP_MIN
|
||||
minimum operable temperature.
TEMP_MAX
maximum operable temperature.
|
||||
|
||||
TIME_TO_EMPTY
|
||||
seconds left for battery to be considered empty
|
||||
(i.e. while battery powers a load)
|
||||
TIME_TO_FULL
|
||||
seconds left for battery to be considered full
|
||||
(i.e. while battery is charging)
|
||||
|
||||
|
||||
Battery <-> external power supply interaction
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Power supplies often act as both supplies and supplicants at the same
time. Batteries are a good example: they usually care whether they are
externally powered or not.

For that case, the power supply class implements a notification mechanism
for batteries.

An external power supply (AC) lists the names of its supplicants (batteries)
in the "supplied_to" struct member, and each power_supply_changed() call
issued by the external power supply notifies the supplicants via the
external_power_changed callback.
|
||||
|
||||
|
||||
Devicetree battery characteristics
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Drivers should call power_supply_get_battery_info() to obtain battery
characteristics from a devicetree battery node, defined in
Documentation/devicetree/bindings/power/supply/battery.txt. For an example
of a driver doing this, see drivers/power/supply/bq27xxx_battery.c.
|
||||
|
||||
Properties in struct power_supply_battery_info and their counterparts in the
|
||||
battery node have names corresponding to elements in enum power_supply_property,
|
||||
for naming consistency between sysfs attributes and battery node properties.
|
||||
|
||||
|
||||
QA
|
||||
~~
|
||||
|
||||
Q:
|
||||
Where is POWER_SUPPLY_PROP_XYZ attribute?
|
||||
A:
|
||||
If you cannot find an attribute suitable for your driver's needs, feel free
to add it and send a patch along with your driver.
|
||||
|
||||
The attributes available currently are the ones currently provided by the
|
||||
drivers written.
|
||||
|
||||
Good candidates to add in future: model/part#, cycle_time, manufacturer,
|
||||
etc.
|
||||
|
||||
|
||||
Q:
|
||||
I have some very specific attribute (e.g. battery color), should I add
|
||||
this attribute to standard ones?
|
||||
A:
|
||||
Most likely, no. Such an attribute can be placed in the driver itself, if
it is useful. Of course, if the attribute in question is applicable to
a large set of batteries, provided by many drivers, and/or comes from
some general battery specification/standard, it may be a candidate to
be added to the core attribute set.
|
||||
|
||||
|
||||
Q:
|
||||
Suppose my battery monitoring chip/firmware does not provide capacity
in percent, but provides charge_{now,full,empty}. Should I calculate
percentage capacity manually, inside the driver, and register the CAPACITY
attribute? The same question applies to time_to_empty/time_to_full.
|
||||
A:
|
||||
Most likely, no. This class is designed to export properties which are
|
||||
directly measurable by the specific hardware available.
|
||||
|
||||
Inferring unavailable properties using heuristics or a mathematical
model is not the job of a battery driver. Such functionality
should be factored out; in fact, apm_power, the driver that serves the
legacy APM API on top of the power supply class, uses a simple heuristic to
approximate the remaining battery capacity based on its charge, current,
voltage and so on. A full-fledged battery model, however, likely does not
belong in the kernel at all, as it would require floating point calculation
to deal with things like differential equations and Kalman filters. This is
better handled by batteryd/libbattery, yet to be written.
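
To make the answer above concrete, here is a minimal userspace sketch of the simplest derived value mentioned in the question: a percentage computed from charge_{now,full,empty}. The function name `capacity_percent` is hypothetical, the linear model is only an assumption (real batteries are not linear), and nothing here is an existing kernel or library interface.

```python
def capacity_percent(charge_now, charge_full, charge_empty=0):
    """Estimate capacity in percent (0 to 100) from CHARGE_* values in µAh.

    This is a naive linear interpolation between the "empty" and "full"
    thresholds, which is exactly the kind of heuristic that belongs in
    userspace rather than in a battery driver.
    """
    span = charge_full - charge_empty
    if span <= 0:
        raise ValueError("charge_full must be greater than charge_empty")
    percent = 100 * (charge_now - charge_empty) / span
    # Clamp, because a stale charge_full can put charge_now slightly above it.
    return max(0, min(100, round(percent)))


if __name__ == "__main__":
    print(capacity_percent(1_500_000, 3_000_000))
```

Keeping this in userspace means the estimate can later be replaced by a better model without touching any driver.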
|
||||
@@ -1,231 +0,0 @@
|
||||
Linux power supply class
|
||||
========================
|
||||
|
||||
Synopsis
|
||||
~~~~~~~~
|
||||
Power supply class used to represent battery, UPS, AC or DC power supply
|
||||
properties to user-space.
|
||||
|
||||
It defines core set of attributes, which should be applicable to (almost)
|
||||
every power supply out there. Attributes are available via sysfs and uevent
|
||||
interfaces.
|
||||
|
||||
Each attribute has well defined meaning, up to unit of measure used. While
|
||||
the attributes provided are believed to be universally applicable to any
|
||||
power supply, specific monitoring hardware may not be able to provide them
|
||||
all, so any of them may be skipped.
|
||||
|
||||
Power supply class is extensible, and allows to define drivers own attributes.
|
||||
The core attribute set is subject to the standard Linux evolution (i.e.
|
||||
if it will be found that some attribute is applicable to many power supply
|
||||
types or their drivers, it can be added to the core set).
|
||||
|
||||
It also integrates with LED framework, for the purpose of providing
|
||||
typically expected feedback of battery charging/fully charged status and
|
||||
AC/USB power supply online status. (Note that specific details of the
|
||||
indication (including whether to use it at all) are fully controllable by
|
||||
user and/or specific machine defaults, per design principles of LED
|
||||
framework).
|
||||
|
||||
|
||||
Attributes/properties
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
Power supply class has predefined set of attributes, this eliminates code
|
||||
duplication across drivers. Power supply class insist on reusing its
|
||||
predefined attributes *and* their units.
|
||||
|
||||
So, userspace gets predictable set of attributes and their units for any
|
||||
kind of power supply, and can process/present them to a user in consistent
|
||||
manner. Results for different power supplies and machines are also directly
|
||||
comparable.
|
||||
|
||||
See drivers/power/supply/ds2760_battery.c and drivers/power/supply/pda_power.c
|
||||
for the example how to declare and handle attributes.
|
||||
|
||||
|
||||
Units
|
||||
~~~~~
|
||||
Quoting include/linux/power_supply.h:
|
||||
|
||||
All voltages, currents, charges, energies, time and temperatures in µV,
|
||||
µA, µAh, µWh, seconds and tenths of degree Celsius unless otherwise
|
||||
stated. It's driver's job to convert its raw values to units in which
|
||||
this class operates.
|
||||
|
||||
|
||||
Attributes/properties detailed
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
~ ~ ~ ~ ~ ~ ~ Charge/Energy/Capacity - how to not confuse ~ ~ ~ ~ ~ ~ ~
|
||||
~ ~
|
||||
~ Because both "charge" (µAh) and "energy" (µWh) represents "capacity" ~
|
||||
~ of battery, this class distinguish these terms. Don't mix them! ~
|
||||
~ ~
|
||||
~ CHARGE_* attributes represents capacity in µAh only. ~
|
||||
~ ENERGY_* attributes represents capacity in µWh only. ~
|
||||
~ CAPACITY attribute represents capacity in *percents*, from 0 to 100. ~
|
||||
~ ~
|
||||
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
|
||||
|
||||
Postfixes:
|
||||
_AVG - *hardware* averaged value, use it if your hardware is really able to
|
||||
report averaged values.
|
||||
_NOW - momentary/instantaneous values.
|
||||
|
||||
STATUS - this attribute represents operating status (charging, full,
|
||||
discharging (i.e. powering a load), etc.). This corresponds to
|
||||
BATTERY_STATUS_* values, as defined in battery.h.
|
||||
|
||||
CHARGE_TYPE - batteries can typically charge at different rates.
|
||||
This defines trickle and fast charges. For batteries that
|
||||
are already charged or discharging, 'n/a' can be displayed (or
|
||||
'unknown', if the status is not known).
|
||||
|
||||
AUTHENTIC - indicates the power supply (battery or charger) connected
|
||||
to the platform is authentic(1) or non authentic(0).
|
||||
|
||||
HEALTH - represents health of the battery, values corresponds to
|
||||
POWER_SUPPLY_HEALTH_*, defined in battery.h.
|
||||
|
||||
VOLTAGE_OCV - open circuit voltage of the battery.
|
||||
|
||||
VOLTAGE_MAX_DESIGN, VOLTAGE_MIN_DESIGN - design values for maximal and
|
||||
minimal power supply voltages. Maximal/minimal means values of voltages
|
||||
when battery considered "full"/"empty" at normal conditions. Yes, there is
|
||||
no direct relation between voltage and battery capacity, but some dumb
|
||||
batteries use voltage for very approximated calculation of capacity.
|
||||
Battery driver also can use this attribute just to inform userspace
|
||||
about maximal and minimal voltage thresholds of a given battery.
|
||||
|
||||
VOLTAGE_MAX, VOLTAGE_MIN - same as _DESIGN voltage values except that
|
||||
these ones should be used if hardware could only guess (measure and
|
||||
retain) the thresholds of a given power supply.
|
||||
|
||||
VOLTAGE_BOOT - Reports the voltage measured during boot
|
||||
|
||||
CURRENT_BOOT - Reports the current measured during boot
|
||||
|
||||
CHARGE_FULL_DESIGN, CHARGE_EMPTY_DESIGN - design charge values, when
|
||||
battery considered full/empty.
|
||||
|
||||
ENERGY_FULL_DESIGN, ENERGY_EMPTY_DESIGN - same as above but for energy.
|
||||
|
||||
CHARGE_FULL, CHARGE_EMPTY - These attributes means "last remembered value
|
||||
of charge when battery became full/empty". It also could mean "value of
|
||||
charge when battery considered full/empty at given conditions (temperature,
|
||||
age)". I.e. these attributes represents real thresholds, not design values.
|
||||
|
||||
ENERGY_FULL, ENERGY_EMPTY - same as above but for energy.
|
||||
|
||||
CHARGE_COUNTER - the current charge counter (in µAh). This could easily
|
||||
be negative; there is no empty or full value. It is only useful for
|
||||
relative, time-based measurements.
|
||||
|
||||
PRECHARGE_CURRENT - the maximum charge current during precharge phase
|
||||
of charge cycle (typically 20% of battery capacity).
|
||||
CHARGE_TERM_CURRENT - Charge termination current. The charge cycle
|
||||
terminates when battery voltage is above recharge threshold, and charge
|
||||
current is below this setting (typically 10% of battery capacity).
|
||||
|
||||
CONSTANT_CHARGE_CURRENT - constant charge current programmed by charger.
|
||||
CONSTANT_CHARGE_CURRENT_MAX - maximum charge current supported by the
|
||||
power supply object.
|
||||
|
||||
CONSTANT_CHARGE_VOLTAGE - constant charge voltage programmed by charger.
|
||||
CONSTANT_CHARGE_VOLTAGE_MAX - maximum charge voltage supported by the
|
||||
power supply object.
|
||||
|
||||
INPUT_CURRENT_LIMIT - input current limit programmed by charger. Indicates
|
||||
the current drawn from a charging source.
|
||||
|
||||
CHARGE_CONTROL_LIMIT - current charge control limit setting
|
||||
CHARGE_CONTROL_LIMIT_MAX - maximum charge control limit setting
|
||||
|
||||
CALIBRATE - battery or coulomb counter calibration status
|
||||
|
||||
CAPACITY - capacity in percents.
|
||||
CAPACITY_ALERT_MIN - minimum capacity alert value in percents.
|
||||
CAPACITY_ALERT_MAX - maximum capacity alert value in percents.
|
||||
CAPACITY_LEVEL - capacity level. This corresponds to
|
||||
POWER_SUPPLY_CAPACITY_LEVEL_*.
|
||||
|
||||
TEMP - temperature of the power supply.
|
||||
TEMP_ALERT_MIN - minimum battery temperature alert.
|
||||
TEMP_ALERT_MAX - maximum battery temperature alert.
|
||||
TEMP_AMBIENT - ambient temperature.
|
||||
TEMP_AMBIENT_ALERT_MIN - minimum ambient temperature alert.
|
||||
TEMP_AMBIENT_ALERT_MAX - maximum ambient temperature alert.
|
||||
TEMP_MIN - minimum operatable temperature
|
||||
TEMP_MAX - maximum operatable temperature
|
||||
|
||||
TIME_TO_EMPTY - seconds left for battery to be considered empty (i.e.
|
||||
while battery powers a load)
|
||||
TIME_TO_FULL - seconds left for battery to be considered full (i.e.
|
||||
while battery is charging)
|
||||
|
||||
|
||||
Battery <-> external power supply interaction
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Often power supplies are acting as supplies and supplicants at the same
|
||||
time. Batteries are good example. So, batteries usually care if they're
|
||||
externally powered or not.
|
||||
|
||||
For that case, power supply class implements notification mechanism for
|
||||
batteries.
|
||||
|
||||
External power supply (AC) lists supplicants (batteries) names in
|
||||
"supplied_to" struct member, and each power_supply_changed() call
|
||||
issued by external power supply will notify supplicants via
|
||||
external_power_changed callback.
|
||||
|
||||
|
||||
Devicetree battery characteristics
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Drivers should call power_supply_get_battery_info() to obtain battery
|
||||
characteristics from a devicetree battery node, defined in
|
||||
Documentation/devicetree/bindings/power/supply/battery.txt. This is
|
||||
implemented in drivers/power/supply/bq27xxx_battery.c.
|
||||
|
||||
Properties in struct power_supply_battery_info and their counterparts in the
|
||||
battery node have names corresponding to elements in enum power_supply_property,
|
||||
for naming consistency between sysfs attributes and battery node properties.
|
||||
|
||||
|
||||
QA
|
||||
~~
|
||||
Q: Where is POWER_SUPPLY_PROP_XYZ attribute?
|
||||
A: If you cannot find attribute suitable for your driver needs, feel free
|
||||
to add it and send patch along with your driver.
|
||||
|
||||
The attributes available currently are the ones currently provided by the
|
||||
drivers written.
|
||||
|
||||
Good candidates to add in future: model/part#, cycle_time, manufacturer,
|
||||
etc.
|
||||
|
||||
|
||||
Q: I have some very specific attribute (e.g. battery color), should I add
|
||||
this attribute to standard ones?
|
||||
A: Most likely, no. Such attribute can be placed in the driver itself, if
|
||||
it is useful. Of course, if the attribute in question applicable to
|
||||
large set of batteries, provided by many drivers, and/or comes from
|
||||
some general battery specification/standard, it may be a candidate to
|
||||
be added to the core attribute set.
|
||||
|
||||
|
||||
Q: Suppose, my battery monitoring chip/firmware does not provides capacity
|
||||
in percents, but provides charge_{now,full,empty}. Should I calculate
|
||||
percentage capacity manually, inside the driver, and register CAPACITY
|
||||
attribute? The same question about time_to_empty/time_to_full.
|
||||
A: Most likely, no. This class is designed to export properties which are
|
||||
directly measurable by the specific hardware available.
|
||||
|
||||
Inferring not available properties using some heuristics or mathematical
|
||||
model is not subject of work for a battery driver. Such functionality
|
||||
should be factored out, and in fact, apm_power, the driver to serve
|
||||
legacy APM API on top of power supply class, uses a simple heuristic of
|
||||
approximating remaining battery capacity based on its charge, current,
|
||||
voltage and so on. But full-fledged battery model is likely not subject
|
||||
for kernel at all, as it would require floating point calculation to deal
|
||||
with things like differential equations and Kalman filters. This is
|
||||
better be handled by batteryd/libbattery, yet to be written.
|
||||
257
Documentation/power/powercap/powercap.rst
Normal file
@@ -0,0 +1,257 @@
|
||||
=======================
|
||||
Power Capping Framework
|
||||
=======================
|
||||
|
||||
The power capping framework provides a consistent interface between the kernel
|
||||
and the user space that allows power capping drivers to expose the settings to
|
||||
user space in a uniform way.
|
||||
|
||||
Terminology
|
||||
===========
|
||||
|
||||
The framework exposes power capping devices to user space via sysfs in the
|
||||
form of a tree of objects. The objects at the root level of the tree represent
|
||||
'control types', which correspond to different methods of power capping. For
|
||||
example, the intel-rapl control type represents the Intel "Running Average
|
||||
Power Limit" (RAPL) technology, whereas the 'idle-injection' control type
|
||||
corresponds to the use of idle injection for controlling power.
|
||||
|
||||
Power zones represent different parts of the system, which can be controlled and
|
||||
monitored using the power capping method determined by the control type the
|
||||
given zone belongs to. They each contain attributes for monitoring power, as
|
||||
well as controls represented in the form of power constraints. If the parts of
|
||||
the system represented by different power zones are hierarchical (that is, one
|
||||
bigger part consists of multiple smaller parts that each have their own power
|
||||
controls), those power zones may also be organized in a hierarchy with one
|
||||
parent power zone containing multiple subzones and so on to reflect the power
|
||||
control topology of the system. In that case, it is possible to apply power
|
||||
capping to a set of devices together using the parent power zone and if more
|
||||
fine grained control is required, it can be applied through the subzones.
|
||||
|
||||
|
||||
Example sysfs interface tree::
|
||||
|
||||
/sys/devices/virtual/powercap
|
||||
└──intel-rapl
|
||||
├──intel-rapl:0
|
||||
│ ├──constraint_0_name
|
||||
│ ├──constraint_0_power_limit_uw
|
||||
│ ├──constraint_0_time_window_us
|
||||
│ ├──constraint_1_name
|
||||
│ ├──constraint_1_power_limit_uw
|
||||
│ ├──constraint_1_time_window_us
|
||||
│ ├──device -> ../../intel-rapl
|
||||
│ ├──energy_uj
|
||||
│ ├──intel-rapl:0:0
|
||||
│ │ ├──constraint_0_name
|
||||
│ │ ├──constraint_0_power_limit_uw
|
||||
│ │ ├──constraint_0_time_window_us
|
||||
│ │ ├──constraint_1_name
|
||||
│ │ ├──constraint_1_power_limit_uw
|
||||
│ │ ├──constraint_1_time_window_us
|
||||
│ │ ├──device -> ../../intel-rapl:0
|
||||
│ │ ├──energy_uj
|
||||
│ │ ├──max_energy_range_uj
|
||||
│ │ ├──name
|
||||
│ │ ├──enabled
|
||||
│ │ ├──power
|
||||
│ │ │ ├──async
|
||||
│ │ │ []
|
||||
│ │ ├──subsystem -> ../../../../../../class/power_cap
|
||||
│ │ └──uevent
|
||||
│ ├──intel-rapl:0:1
|
||||
│ │ ├──constraint_0_name
|
||||
│ │ ├──constraint_0_power_limit_uw
|
||||
│ │ ├──constraint_0_time_window_us
|
||||
│ │ ├──constraint_1_name
|
||||
│ │ ├──constraint_1_power_limit_uw
|
||||
│ │ ├──constraint_1_time_window_us
|
||||
│ │ ├──device -> ../../intel-rapl:0
|
||||
│ │ ├──energy_uj
|
||||
│ │ ├──max_energy_range_uj
|
||||
│ │ ├──name
|
||||
│ │ ├──enabled
|
||||
│ │ ├──power
|
||||
│ │ │ ├──async
|
||||
│ │ │ []
|
||||
│ │ ├──subsystem -> ../../../../../../class/power_cap
|
||||
│ │ └──uevent
|
||||
│ ├──max_energy_range_uj
|
||||
│ ├──max_power_range_uw
|
||||
│ ├──name
|
||||
│ ├──enabled
|
||||
│ ├──power
|
||||
│ │ ├──async
|
||||
│ │ []
|
||||
│ ├──subsystem -> ../../../../../class/power_cap
|
||||
│ ├──enabled
|
||||
│ ├──uevent
|
||||
├──intel-rapl:1
|
||||
│ ├──constraint_0_name
|
||||
│ ├──constraint_0_power_limit_uw
|
||||
│ ├──constraint_0_time_window_us
|
||||
│ ├──constraint_1_name
|
||||
│ ├──constraint_1_power_limit_uw
|
||||
│ ├──constraint_1_time_window_us
|
||||
│ ├──device -> ../../intel-rapl
|
||||
│ ├──energy_uj
|
||||
│ ├──intel-rapl:1:0
|
||||
│ │ ├──constraint_0_name
|
||||
│ │ ├──constraint_0_power_limit_uw
|
||||
│ │ ├──constraint_0_time_window_us
|
||||
│ │ ├──constraint_1_name
|
||||
│ │ ├──constraint_1_power_limit_uw
|
||||
│ │ ├──constraint_1_time_window_us
|
||||
│ │ ├──device -> ../../intel-rapl:1
|
||||
│ │ ├──energy_uj
|
||||
│ │ ├──max_energy_range_uj
|
||||
│ │ ├──name
|
||||
│ │ ├──enabled
|
||||
│ │ ├──power
|
||||
│ │ │ ├──async
|
||||
│ │ │ []
|
||||
│ │ ├──subsystem -> ../../../../../../class/power_cap
|
||||
│ │ └──uevent
|
||||
│ ├──intel-rapl:1:1
|
||||
│ │ ├──constraint_0_name
|
||||
│ │ ├──constraint_0_power_limit_uw
|
||||
│ │ ├──constraint_0_time_window_us
|
||||
│ │ ├──constraint_1_name
|
||||
│ │ ├──constraint_1_power_limit_uw
|
||||
│ │ ├──constraint_1_time_window_us
|
||||
│ │ ├──device -> ../../intel-rapl:1
|
||||
│ │ ├──energy_uj
|
||||
│ │ ├──max_energy_range_uj
|
||||
│ │ ├──name
|
||||
│ │ ├──enabled
|
||||
│ │ ├──power
|
||||
│ │ │ ├──async
|
||||
│ │ │ []
|
||||
│ │ ├──subsystem -> ../../../../../../class/power_cap
|
||||
│ │ └──uevent
|
||||
│ ├──max_energy_range_uj
|
||||
│ ├──max_power_range_uw
|
||||
│ ├──name
|
||||
│ ├──enabled
|
||||
│ ├──power
|
||||
│ │ ├──async
|
||||
│ │ []
|
||||
│ ├──subsystem -> ../../../../../class/power_cap
|
||||
│ ├──uevent
|
||||
├──power
|
||||
│ ├──async
|
||||
│ []
|
||||
├──subsystem -> ../../../../class/power_cap
|
||||
├──enabled
|
||||
└──uevent
|
||||
|
||||
The above example illustrates a case in which the Intel RAPL technology,
|
||||
available in Intel® 64 and IA-32 Processor Architectures, is used. There is one
|
||||
control type called intel-rapl which contains two power zones, intel-rapl:0 and
|
||||
intel-rapl:1, representing CPU packages. Each of these power zones contains
|
||||
two subzones, intel-rapl:j:0 and intel-rapl:j:1 (j = 0, 1), representing the
|
||||
"core" and the "uncore" parts of the given CPU package, respectively. All of
|
||||
the zones and subzones contain energy monitoring attributes (energy_uj,
|
||||
max_energy_range_uj) and constraint attributes (constraint_*) allowing controls
|
||||
to be applied (the constraints in the 'package' power zones apply to the whole
|
||||
CPU packages and the subzone constraints only apply to the respective parts of
|
||||
the given package individually). Since Intel RAPL does not provide an instantaneous
power value, there is no power_uw attribute.
|
||||
|
||||
In addition to that, each power zone contains a name attribute, allowing the
|
||||
part of the system represented by that zone to be identified.
|
||||
For example::
|
||||
|
||||
cat /sys/class/power_cap/intel-rapl/intel-rapl:0/name
|
||||
|
||||
package-0
|
||||
---------
|
||||
|
||||
The Intel RAPL technology allows two constraints, short term and long term,
|
||||
with two different time windows to be applied to each power zone. Thus for
|
||||
each zone there are 2 attributes representing the constraint names, 2 power
|
||||
limits and 2 attributes representing the sizes of the time windows. The
constraint_j_* attributes correspond to the jth constraint (j = 0, 1).
|
||||
|
||||
For example::
|
||||
|
||||
constraint_0_name
|
||||
constraint_0_power_limit_uw
|
||||
constraint_0_time_window_us
|
||||
constraint_1_name
|
||||
constraint_1_power_limit_uw
|
||||
constraint_1_time_window_us
|
||||
|
||||
Power Zone Attributes
|
||||
=====================
|
||||
|
||||
Monitoring attributes
|
||||
---------------------
|
||||
|
||||
energy_uj (rw)
|
||||
Current energy counter in microjoules. Write "0" to reset.
If the counter cannot be reset, then this attribute is read-only.
|
||||
|
||||
max_energy_range_uj (ro)
|
||||
Range of the above energy counter in microjoules.
|
||||
|
||||
power_uw (ro)
|
||||
Current power in microwatts.
|
||||
|
||||
max_power_range_uw (ro)
|
||||
Range of the above power value in microwatts.
|
||||
|
||||
name (ro)
|
||||
Name of this power zone.
|
||||
|
||||
It is possible that some domains have both power ranges and energy counter ranges;
|
||||
however, only one is mandatory.
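
Because a zone may expose only an energy counter and no power_uw value, userspace typically derives average power from two energy_uj readings. The sketch below shows one way to do that; the function name `average_power_watts` is hypothetical, and the assumption that the counter wraps to zero exactly once after reaching max_energy_range_uj is ours, not something this document states.

```python
def average_power_watts(e1_uj, e2_uj, max_range_uj, dt_seconds):
    """Average power between two energy_uj samples taken dt_seconds apart.

    e1_uj, e2_uj  -- earlier and later energy_uj readings (microjoules)
    max_range_uj  -- value read from max_energy_range_uj
    """
    if dt_seconds <= 0:
        raise ValueError("dt_seconds must be positive")
    delta_uj = e2_uj - e1_uj
    if delta_uj < 0:
        # Assumption: the counter wrapped around once between the samples.
        delta_uj += max_range_uj
    # microjoules -> joules, then joules per second = watts
    return delta_uj / 1_000_000 / dt_seconds


if __name__ == "__main__":
    print(average_power_watts(1_000_000, 6_000_000, 10_000_000, 1.0))
```

The sampling interval should be short enough that the counter cannot wrap more than once, otherwise the result silently underestimates the power drawn.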
|
||||
|
||||
Constraints
|
||||
-----------
|
||||
|
||||
constraint_X_power_limit_uw (rw)
Power limit in microwatts, which should be applicable for the
time window specified by "constraint_X_time_window_us".

constraint_X_time_window_us (rw)
Time window in microseconds.

constraint_X_name (ro)
An optional name of the constraint.
|
||||
|
||||
constraint_X_max_power_uw (ro)
Maximum allowed power in microwatts.

constraint_X_min_power_uw (ro)
Minimum allowed power in microwatts.

constraint_X_max_time_window_us (ro)
Maximum allowed time window in microseconds.

constraint_X_min_time_window_us (ro)
Minimum allowed time window in microseconds.
|
||||
|
||||
All fields other than power_limit_uw and time_window_us are optional.
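
The constraint_X_* naming scheme lends itself to simple parsing in userspace. The following sketch groups attribute file names by constraint index and reports which of the two mandatory fields are missing. The helper names `group_constraints` and `missing_required` are hypothetical and exist only for this illustration.

```python
import re

# Matches names such as "constraint_0_power_limit_uw".
_CONSTRAINT_RE = re.compile(r"^constraint_(\d+)_(\w+)$")

# The two fields described above as non-optional.
REQUIRED_FIELDS = {"power_limit_uw", "time_window_us"}


def group_constraints(attribute_names):
    """Map each constraint index to the set of field names found for it."""
    constraints = {}
    for name in attribute_names:
        match = _CONSTRAINT_RE.match(name)
        if match:
            index, field = int(match.group(1)), match.group(2)
            constraints.setdefault(index, set()).add(field)
    return constraints


def missing_required(constraints):
    """Return {index: missing mandatory fields} for incomplete constraints."""
    result = {}
    for index, fields in constraints.items():
        missing = REQUIRED_FIELDS - fields
        if missing:
            result[index] = missing
    return result


if __name__ == "__main__":
    names = ["constraint_0_name", "constraint_0_power_limit_uw",
             "constraint_0_time_window_us", "energy_uj"]
    print(group_constraints(names))
```

In practice the list of names would come from listing a power zone directory; attributes that are not constraints, such as energy_uj, are simply ignored.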
|
||||
|
||||
Common zone and control type attributes
|
||||
---------------------------------------
|
||||
|
||||
enabled (rw): Enable/Disable controls at zone level or for all zones using
|
||||
a control type.
|
||||
|
||||
Power Cap Client Driver Interface
|
||||
=================================
|
||||
|
||||
The API summary:
|
||||
|
||||
Call powercap_register_control_type() to register a control type object.
|
||||
Call powercap_register_zone() to register a power zone (under a given
|
||||
control type), either as a top-level power zone or as a subzone of another
|
||||
power zone registered earlier.
|
||||
The number of constraints in a power zone and the corresponding callbacks have
|
||||
to be defined prior to calling powercap_register_zone() to register that zone.
|
||||
|
||||
To free a power zone, call powercap_unregister_zone().
To free a control type object, call powercap_unregister_control_type().
|
||||
Detailed API can be generated using kernel-doc on include/linux/powercap.h.
|
||||
@@ -1,236 +0,0 @@
|
||||
Power Capping Framework
|
||||
==================================
|
||||
|
||||
The power capping framework provides a consistent interface between the kernel
|
||||
and the user space that allows power capping drivers to expose the settings to
|
||||
user space in a uniform way.
|
||||
|
||||
Terminology
|
||||
=========================
|
||||
The framework exposes power capping devices to user space via sysfs in the
|
||||
form of a tree of objects. The objects at the root level of the tree represent
|
||||
'control types', which correspond to different methods of power capping. For
|
||||
example, the intel-rapl control type represents the Intel "Running Average
|
||||
Power Limit" (RAPL) technology, whereas the 'idle-injection' control type
|
||||
corresponds to the use of idle injection for controlling power.
|
||||
|
||||
Power zones represent different parts of the system, which can be controlled and
|
||||
monitored using the power capping method determined by the control type the
|
||||
given zone belongs to. They each contain attributes for monitoring power, as
|
||||
well as controls represented in the form of power constraints. If the parts of
|
||||
the system represented by different power zones are hierarchical (that is, one
|
||||
bigger part consists of multiple smaller parts that each have their own power
|
||||
controls), those power zones may also be organized in a hierarchy with one
|
||||
parent power zone containing multiple subzones and so on to reflect the power
|
||||
control topology of the system. In that case, it is possible to apply power
|
||||
capping to a set of devices together using the parent power zone and if more
|
||||
fine grained control is required, it can be applied through the subzones.
|
||||
|
||||
|
||||
Example sysfs interface tree:

/sys/devices/virtual/powercap
└── intel-rapl
    ├── intel-rapl:0
    │   ├── constraint_0_name
    │   ├── constraint_0_power_limit_uw
    │   ├── constraint_0_time_window_us
    │   ├── constraint_1_name
    │   ├── constraint_1_power_limit_uw
    │   ├── constraint_1_time_window_us
    │   ├── device -> ../../intel-rapl
    │   ├── energy_uj
    │   ├── intel-rapl:0:0
    │   │   ├── constraint_0_name
    │   │   ├── constraint_0_power_limit_uw
    │   │   ├── constraint_0_time_window_us
    │   │   ├── constraint_1_name
    │   │   ├── constraint_1_power_limit_uw
    │   │   ├── constraint_1_time_window_us
    │   │   ├── device -> ../../intel-rapl:0
    │   │   ├── energy_uj
    │   │   ├── max_energy_range_uj
    │   │   ├── name
    │   │   ├── enabled
    │   │   ├── power
    │   │   │   └── async
    │   │   │   []
    │   │   ├── subsystem -> ../../../../../../class/power_cap
    │   │   └── uevent
    │   ├── intel-rapl:0:1
    │   │   ├── constraint_0_name
    │   │   ├── constraint_0_power_limit_uw
    │   │   ├── constraint_0_time_window_us
    │   │   ├── constraint_1_name
    │   │   ├── constraint_1_power_limit_uw
    │   │   ├── constraint_1_time_window_us
    │   │   ├── device -> ../../intel-rapl:0
    │   │   ├── energy_uj
    │   │   ├── max_energy_range_uj
    │   │   ├── name
    │   │   ├── enabled
    │   │   ├── power
    │   │   │   └── async
    │   │   │   []
    │   │   ├── subsystem -> ../../../../../../class/power_cap
    │   │   └── uevent
    │   ├── max_energy_range_uj
    │   ├── max_power_range_uw
    │   ├── name
    │   ├── enabled
    │   ├── power
    │   │   └── async
    │   │   []
    │   ├── subsystem -> ../../../../../class/power_cap
    │   └── uevent
    ├── intel-rapl:1
    │   ├── constraint_0_name
    │   ├── constraint_0_power_limit_uw
    │   ├── constraint_0_time_window_us
    │   ├── constraint_1_name
    │   ├── constraint_1_power_limit_uw
    │   ├── constraint_1_time_window_us
    │   ├── device -> ../../intel-rapl
    │   ├── energy_uj
    │   ├── intel-rapl:1:0
    │   │   ├── constraint_0_name
    │   │   ├── constraint_0_power_limit_uw
    │   │   ├── constraint_0_time_window_us
    │   │   ├── constraint_1_name
    │   │   ├── constraint_1_power_limit_uw
    │   │   ├── constraint_1_time_window_us
    │   │   ├── device -> ../../intel-rapl:1
    │   │   ├── energy_uj
    │   │   ├── max_energy_range_uj
    │   │   ├── name
    │   │   ├── enabled
    │   │   ├── power
    │   │   │   └── async
    │   │   │   []
    │   │   ├── subsystem -> ../../../../../../class/power_cap
    │   │   └── uevent
    │   ├── intel-rapl:1:1
    │   │   ├── constraint_0_name
    │   │   ├── constraint_0_power_limit_uw
    │   │   ├── constraint_0_time_window_us
    │   │   ├── constraint_1_name
    │   │   ├── constraint_1_power_limit_uw
    │   │   ├── constraint_1_time_window_us
    │   │   ├── device -> ../../intel-rapl:1
    │   │   ├── energy_uj
    │   │   ├── max_energy_range_uj
    │   │   ├── name
    │   │   ├── enabled
    │   │   ├── power
    │   │   │   └── async
    │   │   │   []
    │   │   ├── subsystem -> ../../../../../../class/power_cap
    │   │   └── uevent
    │   ├── max_energy_range_uj
    │   ├── max_power_range_uw
    │   ├── name
    │   ├── enabled
    │   ├── power
    │   │   └── async
    │   │   []
    │   ├── subsystem -> ../../../../../class/power_cap
    │   └── uevent
    ├── power
    │   └── async
    │   []
    ├── subsystem -> ../../../../class/power_cap
    ├── enabled
    └── uevent

The above example illustrates a case in which the Intel RAPL technology,
available in Intel® 64 and IA-32 Processor Architectures, is used. There is one
control type called intel-rapl which contains two power zones, intel-rapl:0 and
intel-rapl:1, representing CPU packages. Each of these power zones contains
two subzones, intel-rapl:j:0 and intel-rapl:j:1 (j = 0, 1), representing the
"core" and the "uncore" parts of the given CPU package, respectively. All of
the zones and subzones contain energy monitoring attributes (energy_uj,
max_energy_range_uj) and constraint attributes (constraint_*) allowing controls
to be applied (the constraints in the 'package' power zones apply to the whole
CPU packages and the subzone constraints only apply to the respective parts of
the given package individually). Since Intel RAPL does not provide an
instantaneous power value, there is no power_uw attribute.

In addition to that, each power zone contains a name attribute, allowing the
part of the system represented by that zone to be identified.
For example:

cat /sys/class/powercap/intel-rapl/intel-rapl:0/name
package-0

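The zone hierarchy and the name attributes described above can be explored
from user space. The following is only a minimal Python sketch, not part of
the powercap interface itself: the function name list_power_zones is
hypothetical, and it assumes that every power zone directory contains a name
attribute and that subzones are plain subdirectories of their parent zone.

```python
import os


def list_power_zones(control_type_dir):
    """Return (relative_path, name) pairs for each power zone below a control type.

    Assumption: a directory is a power zone if it holds a "name" attribute.
    Symbolic links such as "device" and "subsystem" are not followed.
    """
    zones = []
    for root, dirs, files in os.walk(control_type_dir):
        dirs.sort()  # visit intel-rapl:0 before intel-rapl:1, and so on
        if root != control_type_dir and "name" in files:
            with open(os.path.join(root, "name")) as f:
                rel = os.path.relpath(root, control_type_dir)
                zones.append((rel, f.read().strip()))
    return zones
```

Called on the intel-rapl control type directory, the function lists parent
zones before their subzones, which mirrors the tree shown earlier.
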
The Intel RAPL technology allows two constraints, short term and long term,
with two different time windows to be applied to each power zone. Thus for
each zone there are 2 attributes representing the constraint names, 2 power
limits and 2 attributes representing the sizes of the time windows. The
constraint_j_* attributes correspond to the jth constraint (j = 0, 1).

For example:
constraint_0_name
constraint_0_power_limit_uw
constraint_0_time_window_us
constraint_1_name
constraint_1_power_limit_uw
constraint_1_time_window_us

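The constraint_j_* naming scheme makes it straightforward to group the
attributes of a zone by constraint index. The Python sketch below is
illustrative only: read_constraints is a hypothetical helper, and it assumes
that the two numeric attributes hold plain decimal integers.

```python
import os
import re

# Matches the three per-constraint attributes listed above.
_CONSTRAINT_RE = re.compile(
    r"^constraint_(\d+)_(name|power_limit_uw|time_window_us)$")


def read_constraints(zone_dir):
    """Group a zone's constraint_j_* attributes by the constraint index j."""
    constraints = {}
    for entry in sorted(os.listdir(zone_dir)):
        match = _CONSTRAINT_RE.match(entry)
        if not match:
            continue  # not a constraint attribute (energy_uj, name, ...)
        index, field = int(match.group(1)), match.group(2)
        with open(os.path.join(zone_dir, entry)) as f:
            value = f.read().strip()
        if field != "name":
            value = int(value)  # assumed to be a decimal integer
        constraints.setdefault(index, {})[field] = value
    return constraints
```

The result is a dictionary keyed by j, so the limit of the first constraint
is available as read_constraints(zone_dir)[0]["power_limit_uw"].
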
Power Zone Attributes
=====================
Monitoring attributes
---------------------

energy_uj (rw): Current energy counter in microjoules. Write "0" to reset.
If the counter cannot be reset, then this attribute is read only.

max_energy_range_uj (ro): Range of the above energy counter in microjoules.

power_uw (ro): Current power in microwatts.

max_power_range_uw (ro): Range of the above power value in microwatts.

name (ro): Name of this power zone.

It is possible that some domains have both power ranges and energy counter ranges;
however, only one is mandatory.

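When a zone exposes energy_uj but no power_uw, an average power figure can
be derived from two energy readings. The Python sketch below shows the
arithmetic only. The function names are hypothetical, and it assumes that the
counter wraps back to zero after reaching max_energy_range_uj and that it
wraps at most once between the two readings.

```python
def energy_delta_uj(prev_uj, cur_uj, max_range_uj):
    """Energy consumed between two energy_uj readings, in microjoules.

    Assumption: the counter wraps at max_range_uj, at most once per interval.
    """
    if cur_uj >= prev_uj:
        return cur_uj - prev_uj
    return cur_uj + max_range_uj - prev_uj


def average_power_uw(prev_uj, cur_uj, max_range_uj, interval_s):
    """Average power over the interval, in microwatts (microjoules per second)."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return energy_delta_uj(prev_uj, cur_uj, max_range_uj) / interval_s
```

The sampling interval should be short enough that the single-wrap assumption
holds for the zone being monitored.
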
Constraints
-----------
constraint_X_power_limit_uw (rw): Power limit in microwatts, which should be
applicable for the time window specified by "constraint_X_time_window_us".

constraint_X_time_window_us (rw): Time window in microseconds.

constraint_X_name (ro): An optional name of the constraint.

constraint_X_max_power_uw (ro): Maximum allowed power in microwatts.

constraint_X_min_power_uw (ro): Minimum allowed power in microwatts.

constraint_X_max_time_window_us (ro): Maximum allowed time window in microseconds.

constraint_X_min_time_window_us (ro): Minimum allowed time window in microseconds.

All fields other than power_limit_uw and time_window_us are optional.

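Because the min/max attributes are optional, a user-space tool that writes a
power limit has to cope with their absence. The Python sketch below is one
possible approach, not a prescribed one: set_power_limit_uw and its helper are
hypothetical names, the range check is a convenience only (the driver has the
final say on any value written), and writing to the real sysfs files normally
requires sufficient privileges.

```python
import os


def _read_optional_int(zone_dir, attr):
    """Return the integer value of an optional attribute, or None if unavailable."""
    try:
        with open(os.path.join(zone_dir, attr)) as f:
            return int(f.read().strip())
    except OSError:
        return None  # attribute missing or not readable


def set_power_limit_uw(zone_dir, index, limit_uw):
    """Write constraint_<index>_power_limit_uw after checking optional bounds."""
    low = _read_optional_int(zone_dir, f"constraint_{index}_min_power_uw")
    high = _read_optional_int(zone_dir, f"constraint_{index}_max_power_uw")
    if low is not None and limit_uw < low:
        raise ValueError(f"{limit_uw} uW is below the minimum of {low} uW")
    if high is not None and limit_uw > high:
        raise ValueError(f"{limit_uw} uW is above the maximum of {high} uW")
    path = os.path.join(zone_dir, f"constraint_{index}_power_limit_uw")
    with open(path, "w") as f:
        f.write(str(int(limit_uw)))
```

The same pattern could be applied to constraint_X_time_window_us together
with the optional min/max time window attributes.
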
Common zone and control type attributes
---------------------------------------
enabled (rw): Enable or disable controls at the zone level or for all zones
using a control type.

Power Cap Client Driver Interface
=================================
The API summary:

Call powercap_register_control_type() to register a control type object.
Call powercap_register_zone() to register a power zone (under a given
control type), either as a top-level power zone or as a subzone of another
power zone registered earlier.
The number of constraints in a power zone and the corresponding callbacks have
to be defined prior to calling powercap_register_zone() to register that zone.

To free a power zone, call powercap_unregister_zone().
To free a control type object, call powercap_unregister_control_type().
Detailed API documentation can be generated using kernel-doc on
include/linux/powercap.h.