Merge branch 'for-linus' into for-3.1/core

Conflicts:
	block/blk-throttle.c
	block/cfq-iosched.c

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
This commit is contained in:
Jens Axboe
2011-07-01 16:17:13 +02:00
992 changed files with 12563 additions and 6681 deletions
+8
View File
@@ -518,6 +518,14 @@ N: Zach Brown
E: zab@zabbo.net
D: maestro pci sound
M: David Brownell
D: Kernel engineer, mentor, and friend. Maintained USB EHCI and
D: gadget layers, SPI subsystem, GPIO subsystem, and more than a few
D: device drivers. His encouragement also helped many engineers get
D: started working on the Linux kernel. David passed away in early
D: 2011, and will be greatly missed.
W: https://lkml.org/lkml/2011/4/5/36
N: Gary Brubaker
E: xavyer@ix.netcom.com
D: USB Serial Empeg Empeg-car Mark I/II Driver
@@ -0,0 +1,56 @@
What: /sys/class/backlight/<backlight>/<ambient light zone>_max
What: /sys/class/backlight/<backlight>/l1_daylight_max
What: /sys/class/backlight/<backlight>/l2_bright_max
What: /sys/class/backlight/<backlight>/l3_office_max
What: /sys/class/backlight/<backlight>/l4_indoor_max
What: /sys/class/backlight/<backlight>/l5_dark_max
Date: Mai 2011
KernelVersion: 2.6.40
Contact: device-drivers-devel@blackfin.uclinux.org
Description:
Control the maximum brightness for <ambient light zone>
on this <backlight>. Values are between 0 and 127. This file
will also show the brightness level stored for this
<ambient light zone>.
What: /sys/class/backlight/<backlight>/<ambient light zone>_dim
What: /sys/class/backlight/<backlight>/l2_bright_dim
What: /sys/class/backlight/<backlight>/l3_office_dim
What: /sys/class/backlight/<backlight>/l4_indoor_dim
What: /sys/class/backlight/<backlight>/l5_dark_dim
Date: Mai 2011
KernelVersion: 2.6.40
Contact: device-drivers-devel@blackfin.uclinux.org
Description:
Control the dim brightness for <ambient light zone>
on this <backlight>. Values are between 0 and 127, typically
set to 0. Full off when the backlight is disabled.
This file will also show the dim brightness level stored for
this <ambient light zone>.
What: /sys/class/backlight/<backlight>/ambient_light_level
Date: Mai 2011
KernelVersion: 2.6.40
Contact: device-drivers-devel@blackfin.uclinux.org
Description:
Get conversion value of the light sensor.
This value is updated every 80 ms (when the light sensor
is enabled). Returns integer between 0 (dark) and
8000 (max ambient brightness)
What: /sys/class/backlight/<backlight>/ambient_light_zone
Date: Mai 2011
KernelVersion: 2.6.40
Contact: device-drivers-devel@blackfin.uclinux.org
Description:
Get/Set current ambient light zone. Reading returns
integer between 1..5 (1 = daylight, 2 = bright, ..., 5 = dark).
Writing a value between 1..5 forces the backlight controller
to enter the corresponding ambient light zone.
Writing 0 returns to normal/automatic ambient light level
operation. The ambient light sensing feature on these devices
is an extension to the API documented in
Documentation/ABI/stable/sysfs-class-backlight.
It can be enabled by writing the value stored in
/sys/class/backlight/<backlight>/max_brightness to
/sys/class/backlight/<backlight>/brightness.
+2 -2
View File
@@ -21,7 +21,7 @@ information will not be available.
To extract cgroup statistics a utility very similar to getdelays.c
has been developed, the sample output of the utility is shown below
~/balbir/cgroupstats # ./getdelays -C "/cgroup/a"
~/balbir/cgroupstats # ./getdelays -C "/sys/fs/cgroup/a"
sleeping 1, blocked 0, running 1, stopped 0, uninterruptible 0
~/balbir/cgroupstats # ./getdelays -C "/cgroup"
~/balbir/cgroupstats # ./getdelays -C "/sys/fs/cgroup"
sleeping 155, blocked 0, running 1, stopped 0, uninterruptible 2
+17 -14
View File
@@ -28,16 +28,19 @@ cgroups. Here is what you can do.
- Enable group scheduling in CFQ
CONFIG_CFQ_GROUP_IOSCHED=y
- Compile and boot into kernel and mount IO controller (blkio).
- Compile and boot into kernel and mount IO controller (blkio); see
cgroups.txt, Why are cgroups needed?.
mount -t cgroup -o blkio none /cgroup
mount -t tmpfs cgroup_root /sys/fs/cgroup
mkdir /sys/fs/cgroup/blkio
mount -t cgroup -o blkio none /sys/fs/cgroup/blkio
- Create two cgroups
mkdir -p /cgroup/test1/ /cgroup/test2
mkdir -p /sys/fs/cgroup/blkio/test1/ /sys/fs/cgroup/blkio/test2
- Set weights of group test1 and test2
echo 1000 > /cgroup/test1/blkio.weight
echo 500 > /cgroup/test2/blkio.weight
echo 1000 > /sys/fs/cgroup/blkio/test1/blkio.weight
echo 500 > /sys/fs/cgroup/blkio/test2/blkio.weight
- Create two same size files (say 512MB each) on same disk (file1, file2) and
launch two dd threads in different cgroup to read those files.
@@ -46,12 +49,12 @@ cgroups. Here is what you can do.
echo 3 > /proc/sys/vm/drop_caches
dd if=/mnt/sdb/zerofile1 of=/dev/null &
echo $! > /cgroup/test1/tasks
cat /cgroup/test1/tasks
echo $! > /sys/fs/cgroup/blkio/test1/tasks
cat /sys/fs/cgroup/blkio/test1/tasks
dd if=/mnt/sdb/zerofile2 of=/dev/null &
echo $! > /cgroup/test2/tasks
cat /cgroup/test2/tasks
echo $! > /sys/fs/cgroup/blkio/test2/tasks
cat /sys/fs/cgroup/blkio/test2/tasks
- At macro level, first dd should finish first. To get more precise data, keep
on looking at (with the help of script), at blkio.disk_time and
@@ -68,13 +71,13 @@ Throttling/Upper Limit policy
- Enable throttling in block layer
CONFIG_BLK_DEV_THROTTLING=y
- Mount blkio controller
mount -t cgroup -o blkio none /cgroup/blkio
- Mount blkio controller (see cgroups.txt, Why are cgroups needed?)
mount -t cgroup -o blkio none /sys/fs/cgroup/blkio
- Specify a bandwidth rate on particular device for root group. The format
for policy is "<major>:<minor> <byes_per_second>".
echo "8:16 1048576" > /cgroup/blkio/blkio.read_bps_device
echo "8:16 1048576" > /sys/fs/cgroup/blkio/blkio.read_bps_device
Above will put a limit of 1MB/second on reads happening for root group
on device having major/minor number 8:16.
@@ -108,7 +111,7 @@ Hierarchical Cgroups
CFQ and throttling will practically treat all groups at same level.
pivot
/ | \ \
/ / \ \
root test1 test2 test3
Down the line we can implement hierarchical accounting/control support
@@ -149,7 +152,7 @@ Proportional weight policy files
Following is the format.
#echo dev_maj:dev_minor weight > /path/to/cgroup/blkio.weight_device
# echo dev_maj:dev_minor weight > blkio.weight_device
Configure weight=300 on /dev/sdb (8:16) in this cgroup
# echo 8:16 300 > blkio.weight_device
# cat blkio.weight_device
+36 -24
View File
@@ -138,11 +138,11 @@ With the ability to classify tasks differently for different resources
the admin can easily set up a script which receives exec notifications
and depending on who is launching the browser he can
# echo browser_pid > /mnt/<restype>/<userclass>/tasks
# echo browser_pid > /sys/fs/cgroup/<restype>/<userclass>/tasks
With only a single hierarchy, he now would potentially have to create
a separate cgroup for every browser launched and associate it with
approp network and other resource class. This may lead to
appropriate network and other resource class. This may lead to
proliferation of such cgroups.
Also lets say that the administrator would like to give enhanced network
@@ -153,9 +153,9 @@ apps enhanced CPU power,
With ability to write pids directly to resource classes, it's just a
matter of :
# echo pid > /mnt/network/<new_class>/tasks
# echo pid > /sys/fs/cgroup/network/<new_class>/tasks
(after some time)
# echo pid > /mnt/network/<orig_class>/tasks
# echo pid > /sys/fs/cgroup/network/<orig_class>/tasks
Without this ability, he would have to split the cgroup into
multiple separate ones and then associate the new cgroups with the
@@ -310,21 +310,24 @@ subsystem, this is the case for the cpuset.
To start a new job that is to be contained within a cgroup, using
the "cpuset" cgroup subsystem, the steps are something like:
1) mkdir /dev/cgroup
2) mount -t cgroup -ocpuset cpuset /dev/cgroup
3) Create the new cgroup by doing mkdir's and write's (or echo's) in
the /dev/cgroup virtual file system.
4) Start a task that will be the "founding father" of the new job.
5) Attach that task to the new cgroup by writing its pid to the
/dev/cgroup tasks file for that cgroup.
6) fork, exec or clone the job tasks from this founding father task.
1) mount -t tmpfs cgroup_root /sys/fs/cgroup
2) mkdir /sys/fs/cgroup/cpuset
3) mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset
4) Create the new cgroup by doing mkdir's and write's (or echo's) in
the /sys/fs/cgroup virtual file system.
5) Start a task that will be the "founding father" of the new job.
6) Attach that task to the new cgroup by writing its pid to the
/sys/fs/cgroup/cpuset/tasks file for that cgroup.
7) fork, exec or clone the job tasks from this founding father task.
For example, the following sequence of commands will setup a cgroup
named "Charlie", containing just CPUs 2 and 3, and Memory Node 1,
and then start a subshell 'sh' in that cgroup:
mount -t cgroup cpuset -ocpuset /dev/cgroup
cd /dev/cgroup
mount -t tmpfs cgroup_root /sys/fs/cgroup
mkdir /sys/fs/cgroup/cpuset
mount -t cgroup cpuset -ocpuset /sys/fs/cgroup/cpuset
cd /sys/fs/cgroup/cpuset
mkdir Charlie
cd Charlie
/bin/echo 2-3 > cpuset.cpus
@@ -345,7 +348,7 @@ Creating, modifying, using the cgroups can be done through the cgroup
virtual filesystem.
To mount a cgroup hierarchy with all available subsystems, type:
# mount -t cgroup xxx /dev/cgroup
# mount -t cgroup xxx /sys/fs/cgroup
The "xxx" is not interpreted by the cgroup code, but will appear in
/proc/mounts so may be any useful identifying string that you like.
@@ -354,23 +357,32 @@ Note: Some subsystems do not work without some user input first. For instance,
if cpusets are enabled the user will have to populate the cpus and mems files
for each new cgroup created before that group can be used.
As explained in section `1.2 Why are cgroups needed?' you should create
different hierarchies of cgroups for each single resource or group of
resources you want to control. Therefore, you should mount a tmpfs on
/sys/fs/cgroup and create directories for each cgroup resource or resource
group.
# mount -t tmpfs cgroup_root /sys/fs/cgroup
# mkdir /sys/fs/cgroup/rg1
To mount a cgroup hierarchy with just the cpuset and memory
subsystems, type:
# mount -t cgroup -o cpuset,memory hier1 /dev/cgroup
# mount -t cgroup -o cpuset,memory hier1 /sys/fs/cgroup/rg1
To change the set of subsystems bound to a mounted hierarchy, just
remount with different options:
# mount -o remount,cpuset,blkio hier1 /dev/cgroup
# mount -o remount,cpuset,blkio hier1 /sys/fs/cgroup/rg1
Now memory is removed from the hierarchy and blkio is added.
Note this will add blkio to the hierarchy but won't remove memory or
cpuset, because the new options are appended to the old ones:
# mount -o remount,blkio /dev/cgroup
# mount -o remount,blkio /sys/fs/cgroup/rg1
To Specify a hierarchy's release_agent:
# mount -t cgroup -o cpuset,release_agent="/sbin/cpuset_release_agent" \
xxx /dev/cgroup
xxx /sys/fs/cgroup/rg1
Note that specifying 'release_agent' more than once will return failure.
@@ -379,17 +391,17 @@ when the hierarchy consists of a single (root) cgroup. Supporting
the ability to arbitrarily bind/unbind subsystems from an existing
cgroup hierarchy is intended to be implemented in the future.
Then under /dev/cgroup you can find a tree that corresponds to the
tree of the cgroups in the system. For instance, /dev/cgroup
Then under /sys/fs/cgroup/rg1 you can find a tree that corresponds to the
tree of the cgroups in the system. For instance, /sys/fs/cgroup/rg1
is the cgroup that holds the whole system.
If you want to change the value of release_agent:
# echo "/sbin/new_release_agent" > /dev/cgroup/release_agent
# echo "/sbin/new_release_agent" > /sys/fs/cgroup/rg1/release_agent
It can also be changed via remount.
If you want to create a new cgroup under /dev/cgroup:
# cd /dev/cgroup
If you want to create a new cgroup under /sys/fs/cgroup/rg1:
# cd /sys/fs/cgroup/rg1
# mkdir my_cgroup
Now you want to do something with this cgroup.
+9 -10
View File
@@ -10,26 +10,25 @@ directly present in its group.
Accounting groups can be created by first mounting the cgroup filesystem.
# mkdir /cgroups
# mount -t cgroup -ocpuacct none /cgroups
# mount -t cgroup -ocpuacct none /sys/fs/cgroup
With the above step, the initial or the parent accounting group
becomes visible at /cgroups. At bootup, this group includes all the
tasks in the system. /cgroups/tasks lists the tasks in this cgroup.
/cgroups/cpuacct.usage gives the CPU time (in nanoseconds) obtained by
this group which is essentially the CPU time obtained by all the tasks
With the above step, the initial or the parent accounting group becomes
visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in
the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup.
/sys/fs/cgroup/cpuacct.usage gives the CPU time (in nanoseconds) obtained
by this group which is essentially the CPU time obtained by all the tasks
in the system.
New accounting groups can be created under the parent group /cgroups.
New accounting groups can be created under the parent group /sys/fs/cgroup.
# cd /cgroups
# cd /sys/fs/cgroup
# mkdir g1
# echo $$ > g1
The above steps create a new group g1 and move the current shell
process (bash) into it. CPU time consumed by this bash and its children
can be obtained from g1/cpuacct.usage and the same is accumulated in
/cgroups/cpuacct.usage also.
/sys/fs/cgroup/cpuacct.usage also.
cpuacct.stat file lists a few statistics which further divide the
CPU time obtained by the cgroup into user and system times. Currently
+14 -14
View File
@@ -661,21 +661,21 @@ than stress the kernel.
To start a new job that is to be contained within a cpuset, the steps are:
1) mkdir /dev/cpuset
2) mount -t cgroup -ocpuset cpuset /dev/cpuset
1) mkdir /sys/fs/cgroup/cpuset
2) mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset
3) Create the new cpuset by doing mkdir's and write's (or echo's) in
the /dev/cpuset virtual file system.
the /sys/fs/cgroup/cpuset virtual file system.
4) Start a task that will be the "founding father" of the new job.
5) Attach that task to the new cpuset by writing its pid to the
/dev/cpuset tasks file for that cpuset.
/sys/fs/cgroup/cpuset tasks file for that cpuset.
6) fork, exec or clone the job tasks from this founding father task.
For example, the following sequence of commands will setup a cpuset
named "Charlie", containing just CPUs 2 and 3, and Memory Node 1,
and then start a subshell 'sh' in that cpuset:
mount -t cgroup -ocpuset cpuset /dev/cpuset
cd /dev/cpuset
mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset
cd /sys/fs/cgroup/cpuset
mkdir Charlie
cd Charlie
/bin/echo 2-3 > cpuset.cpus
@@ -710,14 +710,14 @@ Creating, modifying, using the cpusets can be done through the cpuset
virtual filesystem.
To mount it, type:
# mount -t cgroup -o cpuset cpuset /dev/cpuset
# mount -t cgroup -o cpuset cpuset /sys/fs/cgroup/cpuset
Then under /dev/cpuset you can find a tree that corresponds to the
tree of the cpusets in the system. For instance, /dev/cpuset
Then under /sys/fs/cgroup/cpuset you can find a tree that corresponds to the
tree of the cpusets in the system. For instance, /sys/fs/cgroup/cpuset
is the cpuset that holds the whole system.
If you want to create a new cpuset under /dev/cpuset:
# cd /dev/cpuset
If you want to create a new cpuset under /sys/fs/cgroup/cpuset:
# cd /sys/fs/cgroup/cpuset
# mkdir my_cpuset
Now you want to do something with this cpuset.
@@ -765,12 +765,12 @@ wrapper around the cgroup filesystem.
The command
mount -t cpuset X /dev/cpuset
mount -t cpuset X /sys/fs/cgroup/cpuset
is equivalent to
mount -t cgroup -ocpuset,noprefix X /dev/cpuset
echo "/sbin/cpuset_release_agent" > /dev/cpuset/release_agent
mount -t cgroup -ocpuset,noprefix X /sys/fs/cgroup/cpuset
echo "/sbin/cpuset_release_agent" > /sys/fs/cgroup/cpuset/release_agent
2.2 Adding/removing cpus
------------------------
+3 -3
View File
@@ -22,16 +22,16 @@ removed from the child(ren).
An entry is added using devices.allow, and removed using
devices.deny. For instance
echo 'c 1:3 mr' > /cgroups/1/devices.allow
echo 'c 1:3 mr' > /sys/fs/cgroup/1/devices.allow
allows cgroup 1 to read and mknod the device usually known as
/dev/null. Doing
echo a > /cgroups/1/devices.deny
echo a > /sys/fs/cgroup/1/devices.deny
will remove the default 'a *:* rwm' entry. Doing
echo a > /cgroups/1/devices.allow
echo a > /sys/fs/cgroup/1/devices.allow
will add the 'a *:* rwm' entry to the whitelist.
+10 -10
View File
@@ -59,28 +59,28 @@ is non-freezable.
* Examples of usage :
# mkdir /containers
# mount -t cgroup -ofreezer freezer /containers
# mkdir /containers/0
# echo $some_pid > /containers/0/tasks
# mkdir /sys/fs/cgroup/freezer
# mount -t cgroup -ofreezer freezer /sys/fs/cgroup/freezer
# mkdir /sys/fs/cgroup/freezer/0
# echo $some_pid > /sys/fs/cgroup/freezer/0/tasks
to get status of the freezer subsystem :
# cat /containers/0/freezer.state
# cat /sys/fs/cgroup/freezer/0/freezer.state
THAWED
to freeze all tasks in the container :
# echo FROZEN > /containers/0/freezer.state
# cat /containers/0/freezer.state
# echo FROZEN > /sys/fs/cgroup/freezer/0/freezer.state
# cat /sys/fs/cgroup/freezer/0/freezer.state
FREEZING
# cat /containers/0/freezer.state
# cat /sys/fs/cgroup/freezer/0/freezer.state
FROZEN
to unfreeze all tasks in the container :
# echo THAWED > /containers/0/freezer.state
# cat /containers/0/freezer.state
# echo THAWED > /sys/fs/cgroup/freezer/0/freezer.state
# cat /sys/fs/cgroup/freezer/0/freezer.state
THAWED
This is the basic mechanism which should do the right thing for user space task
+39 -19
View File
@@ -1,8 +1,8 @@
Memory Resource Controller
NOTE: The Memory Resource Controller has been generically been referred
to as the memory controller in this document. Do not confuse memory
controller used here with the memory controller that is used in hardware.
NOTE: The Memory Resource Controller has generically been referred to as the
memory controller in this document. Do not confuse memory controller
used here with the memory controller that is used in hardware.
(For editors)
In this document:
@@ -70,6 +70,7 @@ Brief summary of control files.
(See sysctl's vm.swappiness)
memory.move_charge_at_immigrate # set/show controls of moving charges
memory.oom_control # set/show oom controls.
memory.numa_stat # show the number of memory usage per numa node
1. History
@@ -181,7 +182,7 @@ behind this approach is that a cgroup that aggressively uses a shared
page will eventually get charged for it (once it is uncharged from
the cgroup that brought it in -- this will happen on memory pressure).
Exception: If CONFIG_CGROUP_CGROUP_MEM_RES_CTLR_SWAP is not used..
Exception: If CONFIG_CGROUP_CGROUP_MEM_RES_CTLR_SWAP is not used.
When you do swapoff and make swapped-out pages of shmem(tmpfs) to
be backed into memory in force, charges for pages are accounted against the
caller of swapoff rather than the users of shmem.
@@ -213,7 +214,7 @@ affecting global LRU, memory+swap limit is better than just limiting swap from
OS point of view.
* What happens when a cgroup hits memory.memsw.limit_in_bytes
When a cgroup his memory.memsw.limit_in_bytes, it's useless to do swap-out
When a cgroup hits memory.memsw.limit_in_bytes, it's useless to do swap-out
in this cgroup. Then, swap-out will not be done by cgroup routine and file
caches are dropped. But as mentioned above, global LRU can do swapout memory
from it for sanity of the system's memory management state. You can't forbid
@@ -263,16 +264,17 @@ b. Enable CONFIG_RESOURCE_COUNTERS
c. Enable CONFIG_CGROUP_MEM_RES_CTLR
d. Enable CONFIG_CGROUP_MEM_RES_CTLR_SWAP (to use swap extension)
1. Prepare the cgroups
# mkdir -p /cgroups
# mount -t cgroup none /cgroups -o memory
1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
# mount -t tmpfs none /sys/fs/cgroup
# mkdir /sys/fs/cgroup/memory
# mount -t cgroup none /sys/fs/cgroup/memory -o memory
2. Make the new group and move bash into it
# mkdir /cgroups/0
# echo $$ > /cgroups/0/tasks
# mkdir /sys/fs/cgroup/memory/0
# echo $$ > /sys/fs/cgroup/memory/0/tasks
Since now we're in the 0 cgroup, we can alter the memory limit:
# echo 4M > /cgroups/0/memory.limit_in_bytes
# echo 4M > /sys/fs/cgroup/memory/0/memory.limit_in_bytes
NOTE: We can use a suffix (k, K, m, M, g or G) to indicate values in kilo,
mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes, Gibibytes.)
@@ -280,11 +282,11 @@ mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes, Gibibytes.)
NOTE: We can write "-1" to reset the *.limit_in_bytes(unlimited).
NOTE: We cannot set limits on the root cgroup any more.
# cat /cgroups/0/memory.limit_in_bytes
# cat /sys/fs/cgroup/memory/0/memory.limit_in_bytes
4194304
We can check the usage:
# cat /cgroups/0/memory.usage_in_bytes
# cat /sys/fs/cgroup/memory/0/memory.usage_in_bytes
1216512
A successful write to this file does not guarantee a successful set of
@@ -464,6 +466,24 @@ value for efficient access. (Of course, when necessary, it's synchronized.)
If you want to know more exact memory usage, you should use RSS+CACHE(+SWAP)
value in memory.stat(see 5.2).
5.6 numa_stat
This is similar to numa_maps but operates on a per-memcg basis. This is
useful for providing visibility into the numa locality information within
an memcg since the pages are allowed to be allocated from any physical
node. One of the usecases is evaluating application performance by
combining this information with the application's cpu allocation.
We export "total", "file", "anon" and "unevictable" pages per-node for
each memcg. The ouput format of memory.numa_stat is:
total=<total pages> N0=<node 0 pages> N1=<node 1 pages> ...
file=<total file pages> N0=<node 0 pages> N1=<node 1 pages> ...
anon=<total anon pages> N0=<node 0 pages> N1=<node 1 pages> ...
unevictable=<total anon pages> N0=<node 0 pages> N1=<node 1 pages> ...
And we have total = file + anon + unevictable.
6. Hierarchy support
The memory controller supports a deep hierarchy and hierarchical accounting.
@@ -471,13 +491,13 @@ The hierarchy is created by creating the appropriate cgroups in the
cgroup filesystem. Consider for example, the following cgroup filesystem
hierarchy
root
root
/ | \
/ | \
a b c
| \
| \
d e
/ | \
a b c
| \
| \
d e
In the diagram above, with hierarchical accounting enabled, all memory
usage of e, is accounted to its ancestors up until the root (i.e, c and root),
@@ -481,23 +481,6 @@ Who: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
----------------------------
What: namespace cgroup (ns_cgroup)
When: 2.6.38
Why: The ns_cgroup leads to some problems:
* cgroup creation is out-of-control
* cgroup name can conflict when pids are looping
* it is not possible to have a single process handling
a lot of namespaces without falling in a exponential creation time
* we may want to create a namespace without creating a cgroup
The ns_cgroup is replaced by a compatibility flag 'clone_children',
where a newly created cgroup will copy the parent cgroup values.
The userspace has to manually create a cgroup and add a task to
the 'tasks' file.
Who: Daniel Lezcano <daniel.lezcano@free.fr>
----------------------------
What: iwlwifi disable_hw_scan module parameters
When: 2.6.40
Why: Hareware scan is the prefer method for iwlwifi devices for
+1
View File
@@ -843,6 +843,7 @@ Provides counts of softirq handlers serviced since boot time, for each cpu.
TASKLET: 0 0 0 290
SCHED: 27035 26983 26971 26746
HRTIMER: 0 0 0 0
RCU: 1678 1769 2178 2250
1.3 IDE devices in /proc/ide
+2
View File
@@ -2598,6 +2598,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
unlock ejectable media);
m = MAX_SECTORS_64 (don't transfer more
than 64 sectors = 32 KB at a time);
n = INITIAL_READ10 (force a retry of the
initial READ(10) command);
o = CAPACITY_OK (accept the capacity
reported by the device);
r = IGNORE_RESIDUE (the device reports
+3 -1
View File
@@ -11,7 +11,9 @@ with the difference that the orphan objects are not freed but only
reported via /sys/kernel/debug/kmemleak. A similar method is used by the
Valgrind tool (memcheck --leak-check) to detect the memory leaks in
user-space applications.
Kmemleak is supported on x86, arm, powerpc, sparc, sh, microblaze and tile.
Please check DEBUG_KMEMLEAK dependencies in lib/Kconfig.debug for supported
architectures.
Usage
-----
+1 -1
View File
@@ -555,7 +555,7 @@ also have
sync_min
sync_max
The two values, given as numbers of sectors, indicate a range
withing the array where 'check'/'repair' will operate. Must be
within the array where 'check'/'repair' will operate. Must be
a multiple of chunk_size. When it reaches "sync_max" it will
pause, rather than complete.
You can use 'select' or 'poll' on "sync_completed" to wait for
+13 -52
View File
@@ -520,59 +520,20 @@ Support for power domains is provided through the pwr_domain field of struct
device. This field is a pointer to an object of type struct dev_power_domain,
defined in include/linux/pm.h, providing a set of power management callbacks
analogous to the subsystem-level and device driver callbacks that are executed
for the given device during all power transitions, in addition to the respective
subsystem-level callbacks. Specifically, the power domain "suspend" callbacks
(i.e. ->runtime_suspend(), ->suspend(), ->freeze(), ->poweroff(), etc.) are
executed after the analogous subsystem-level callbacks, while the power domain
"resume" callbacks (i.e. ->runtime_resume(), ->resume(), ->thaw(), ->restore,
etc.) are executed before the analogous subsystem-level callbacks. Error codes
returned by the "suspend" and "resume" power domain callbacks are ignored.
for the given device during all power transitions, instead of the respective
subsystem-level callbacks. Specifically, if a device's pm_domain pointer is
not NULL, the ->suspend() callback from the object pointed to by it will be
executed instead of its subsystem's (e.g. bus type's) ->suspend() callback and
anlogously for all of the remaining callbacks. In other words, power management
domain callbacks, if defined for the given device, always take precedence over
the callbacks provided by the device's subsystem (e.g. bus type).
Power domain ->runtime_idle() callback is executed before the subsystem-level
->runtime_idle() callback and the result returned by it is not ignored. Namely,
if it returns error code, the subsystem-level ->runtime_idle() callback will not
be called and the helper function rpm_idle() executing it will return error
code. This mechanism is intended to help platforms where saving device state
is a time consuming operation and should only be carried out if all devices
in the power domain are idle, before turning off the shared power resource(s).
Namely, the power domain ->runtime_idle() callback may return error code until
the pm_runtime_idle() helper (or its asychronous version) has been called for
all devices in the power domain (it is recommended that the returned error code
be -EBUSY in those cases), preventing the subsystem-level ->runtime_idle()
callback from being run prematurely.
The support for device power domains is only relevant to platforms needing to
use the same subsystem-level (e.g. platform bus type) and device driver power
management callbacks in many different power domain configurations and wanting
to avoid incorporating the support for power domains into the subsystem-level
callbacks. The other platforms need not implement it or take it into account
in any way.
System Devices
--------------
System devices (sysdevs) follow a slightly different API, which can be found in
include/linux/sysdev.h
drivers/base/sys.c
System devices will be suspended with interrupts disabled, and after all other
devices have been suspended. On resume, they will be resumed before any other
devices, and also with interrupts disabled. These things occur in special
"sysdev_driver" phases, which affect only system devices.
Thus, after the suspend_noirq (or freeze_noirq or poweroff_noirq) phase, when
the non-boot CPUs are all offline and IRQs are disabled on the remaining online
CPU, then a sysdev_driver.suspend phase is carried out, and the system enters a
sleep state (or a system image is created). During resume (or after the image
has been created or loaded) a sysdev_driver.resume phase is carried out, IRQs
are enabled on the only online CPU, the non-boot CPUs are enabled, and the
resume_noirq (or thaw_noirq or restore_noirq) phase begins.
Code to actually enter and exit the system-wide low power state sometimes
involves hardware details that are only known to the boot firmware, and
may leave a CPU running software (from SRAM or flash memory) that monitors
the system and manages its wakeup sequence.
The support for device power management domains is only relevant to platforms
needing to use the same device driver power management callbacks in many
different power domain configurations and wanting to avoid incorporating the
support for power domains into subsystem-level callbacks, for example by
modifying the platform bus type. Other platforms need not implement it or take
it into account in any way.
Device Low Power (suspend) States
-5
View File
@@ -566,11 +566,6 @@ to do this is:
pm_runtime_set_active(dev);
pm_runtime_enable(dev);
The PM core always increments the run-time usage counter before calling the
->prepare() callback and decrements it after calling the ->complete() callback.
Hence disabling run-time PM temporarily like this will not cause any run-time
suspend callbacks to be lost.
7. Generic subsystem callbacks
Subsystems may wish to conserve code space by using the set of generic power
+117 -2
View File
@@ -9,7 +9,121 @@ If variable is of Type, use printk format specifier:
size_t %zu or %zx
ssize_t %zd or %zx
Raw pointer value SHOULD be printed with %p.
Raw pointer value SHOULD be printed with %p. The kernel supports
the following extended format specifiers for pointer types:
Symbols/Function Pointers:
%pF versatile_init+0x0/0x110
%pf versatile_init
%pS versatile_init+0x0/0x110
%ps versatile_init
%pB prev_fn_of_versatile_init+0x88/0x88
For printing symbols and function pointers. The 'S' and 's' specifiers
result in the symbol name with ('S') or without ('s') offsets. Where
this is used on a kernel without KALLSYMS - the symbol address is
printed instead.
The 'B' specifier results in the symbol name with offsets and should be
used when printing stack backtraces. The specifier takes into
consideration the effect of compiler optimisations which may occur
when tail-call's are used and marked with the noreturn GCC attribute.
On ia64, ppc64 and parisc64 architectures function pointers are
actually function descriptors which must first be resolved. The 'F' and
'f' specifiers perform this resolution and then provide the same
functionality as the 'S' and 's' specifiers.
Kernel Pointers:
%pK 0x01234567 or 0x0123456789abcdef
For printing kernel pointers which should be hidden from unprivileged
users. The behaviour of %pK depends on the kptr_restrict sysctl - see
Documentation/sysctl/kernel.txt for more details.
Struct Resources:
%pr [mem 0x60000000-0x6fffffff flags 0x2200] or
[mem 0x0000000060000000-0x000000006fffffff flags 0x2200]
%pR [mem 0x60000000-0x6fffffff pref] or
[mem 0x0000000060000000-0x000000006fffffff pref]
For printing struct resources. The 'R' and 'r' specifiers result in a
printed resource with ('R') or without ('r') a decoded flags member.
MAC/FDDI addresses:
%pM 00:01:02:03:04:05
%pMF 00-01-02-03-04-05
%pm 000102030405
For printing 6-byte MAC/FDDI addresses in hex notation. The 'M' and 'm'
specifiers result in a printed address with ('M') or without ('m') byte
separators. The default byte separator is the colon (':').
Where FDDI addresses are concerned the 'F' specifier can be used after
the 'M' specifier to use dash ('-') separators instead of the default
separator.
IPv4 addresses:
%pI4 1.2.3.4
%pi4 001.002.003.004
%p[Ii][hnbl]
For printing IPv4 dot-separated decimal addresses. The 'I4' and 'i4'
specifiers result in a printed address with ('i4') or without ('I4')
leading zeros.
The additional 'h', 'n', 'b', and 'l' specifiers are used to specify
host, network, big or little endian order addresses respectively. Where
no specifier is provided the default network/big endian order is used.
IPv6 addresses:
%pI6 0001:0002:0003:0004:0005:0006:0007:0008
%pi6 00010002000300040005000600070008
%pI6c 1:2:3:4:5:6:7:8
For printing IPv6 network-order 16-bit hex addresses. The 'I6' and 'i6'
specifiers result in a printed address with ('I6') or without ('i6')
colon-separators. Leading zeros are always used.
The additional 'c' specifier can be used with the 'I' specifier to
print a compressed IPv6 address as described by
http://tools.ietf.org/html/rfc5952
UUID/GUID addresses:
%pUb 00010203-0405-0607-0809-0a0b0c0d0e0f
%pUB 00010203-0405-0607-0809-0A0B0C0D0E0F
%pUl 03020100-0504-0706-0809-0a0b0c0e0e0f
%pUL 03020100-0504-0706-0809-0A0B0C0E0E0F
For printing 16-byte UUID/GUIDs addresses. The additional 'l', 'L',
'b' and 'B' specifiers are used to specify a little endian order in
lower ('l') or upper case ('L') hex characters - and big endian order
in lower ('b') or upper case ('B') hex characters.
Where no additional specifiers are used the default little endian
order with lower case hex characters will be printed.
struct va_format:
%pV
For printing struct va_format structures. These contain a format string
and va_list as follows:
struct va_format {
const char *fmt;
va_list *va;
};
Do not use this feature without some mechanism to verify the
correctness of the format string and va_list arguments.
u64 SHOULD be printed with %llu/%llx, (unsigned long long):
@@ -32,4 +146,5 @@ Reminder: sizeof() result is of type size_t.
Thank you for your cooperation and attention.
By Randy Dunlap <rdunlap@xenotime.net>
By Randy Dunlap <rdunlap@xenotime.net> and
Andrew Murray <amurray@mpc-data.co.uk>
+4 -3
View File
@@ -223,9 +223,10 @@ When CONFIG_FAIR_GROUP_SCHED is defined, a "cpu.shares" file is created for each
group created using the pseudo filesystem. See example steps below to create
task groups and modify their CPU share using the "cgroups" pseudo filesystem.
# mkdir /dev/cpuctl
# mount -t cgroup -ocpu none /dev/cpuctl
# cd /dev/cpuctl
# mount -t tmpfs cgroup_root /sys/fs/cgroup
# mkdir /sys/fs/cgroup/cpu
# mount -t cgroup -ocpu none /sys/fs/cgroup/cpu
# cd /sys/fs/cgroup/cpu
# mkdir multimedia # create "multimedia" group of tasks
# mkdir browser # create "browser" group of tasks
+3 -4
View File
@@ -129,9 +129,8 @@ priority!
Enabling CONFIG_RT_GROUP_SCHED lets you explicitly allocate real
CPU bandwidth to task groups.
This uses the /cgroup virtual file system and
"/cgroup/<cgroup>/cpu.rt_runtime_us" to control the CPU time reserved for each
control group.
This uses the cgroup virtual file system and "<cgroup>/cpu.rt_runtime_us"
to control the CPU time reserved for each control group.
For more information on working with control groups, you should read
Documentation/cgroups/cgroups.txt as well.
@@ -150,7 +149,7 @@ For now, this can be simplified to just the following (but see Future plans):
===============
There is work in progress to make the scheduling period for each group
("/cgroup/<cgroup>/cpu.rt_period_us") configurable as well.
("<cgroup>/cpu.rt_period_us") configurable as well.
The constraint on the period is that a subgroup must have a smaller or
equal period to its parent. But realistically its not very useful _yet_

Some files were not shown because too many files have changed in this diff Show More