Merge commit 'v2.6.34-rc1' into perf/urgent

Conflicts:
	tools/perf/util/probe-event.c

Merge reason: Pick up -rc1 and resolve the conflict as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

@@ -36,6 +36,7 @@ modules.builtin
#
tags
TAGS
linux
vmlinux
vmlinuz
System.map
@@ -0,0 +1,7 @@
What: /sys/devices/system/node/nodeX
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
When CONFIG_NUMA is enabled, this is a directory containing
information on node X such as what CPUs are local to the
node.
@@ -128,3 +128,17 @@ Description:
preferred request size for workloads where sustained
throughput is desired. If no optimal I/O size is
reported this file contains 0.

What: /sys/block/<disk>/queue/nomerges
Date: January 2010
Contact:
Description:
Standard I/O elevator operations include attempts to
merge contiguous I/Os. For known random I/O loads these
attempts will always fail and result in extra cycles
being spent in the kernel. This allows one to turn off
this behavior in one of two ways: When set to 1, complex
merge checks are disabled, but the simple one-shot merges
with the previous I/O request are enabled. When set to 2,
all merge tries are disabled. The default value is 0 -
which enables all types of merge tries.
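The three nomerges settings described in the hunk above can be sketched as a small helper. This is an illustration only, not part of the commit: the function names and the NOMERGES_MODES table are hypothetical, and the directory passed in stands for a /sys/block/<disk>/queue directory.

```python
from pathlib import Path

# Meanings copied from the nomerges description; the table itself is
# a hypothetical convenience, not a kernel interface.
NOMERGES_MODES = {
    0: "all merge attempts enabled (default)",
    1: "complex merge checks disabled, simple one-shot merges kept",
    2: "all merge attempts disabled",
}


def set_nomerges(queue_dir, mode):
    """Write a nomerges mode (0, 1 or 2) into <queue_dir>/nomerges."""
    if mode not in NOMERGES_MODES:
        raise ValueError(f"nomerges must be 0, 1 or 2, got {mode!r}")
    Path(queue_dir, "nomerges").write_text(f"{mode}\n")


def get_nomerges(queue_dir):
    """Return (mode, description) read back from <queue_dir>/nomerges."""
    mode = int(Path(queue_dir, "nomerges").read_text().strip())
    return mode, NOMERGES_MODES[mode]
```

On a real system the call would look like set_nomerges("/sys/block/sda/queue", 2), where sda is a placeholder disk name and root privileges are normally required.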
@@ -159,3 +159,14 @@ Description:
device. This is useful to ensure auto probing won't
match the driver to the device. For example:
# echo "046d c315" > /sys/bus/usb/drivers/foo/remove_id

What: /sys/bus/usb/device/.../avoid_reset
Date: December 2009
Contact: Oliver Neukum <oliver@neukum.org>
Description:
Writing 1 to this file tells the kernel that this
device will morph into another mode when it is reset.
Drivers will not use reset for error handling for
such devices.
Users:
usb_modeswitch
@@ -0,0 +1,79 @@
What: /sys/devices/.../power/
Date: January 2009
Contact: Rafael J. Wysocki <rjw@sisk.pl>
Description:
The /sys/devices/.../power directory contains attributes
allowing the user space to check and modify some power
management related properties of given device.

What: /sys/devices/.../power/wakeup
Date: January 2009
Contact: Rafael J. Wysocki <rjw@sisk.pl>
Description:
The /sys/devices/.../power/wakeup attribute allows the user
space to check if the device is enabled to wake up the system
from sleep states, such as the memory sleep state (suspend to
RAM) and hibernation (suspend to disk), and to enable or disable
it to do that as desired.

Some devices support "wakeup" events, which are hardware signals
used to activate the system from a sleep state. Such devices
have one of the following two values for the sysfs power/wakeup
file:

+ "enabled\n" to issue the events;
+ "disabled\n" not to do so;

In that case the user space can change the setting represented
by the contents of this file by writing either "enabled", or
"disabled" to it.

For the devices that are not capable of generating system wakeup
events this file contains "\n". In that case the user space
cannot modify the contents of this file and the device cannot be
enabled to wake up the system.

What: /sys/devices/.../power/control
Date: January 2009
Contact: Rafael J. Wysocki <rjw@sisk.pl>
Description:
The /sys/devices/.../power/control attribute allows the user
space to control the run-time power management of the device.

All devices have one of the following two values for the
power/control file:

+ "auto\n" to allow the device to be power managed at run time;
+ "on\n" to prevent the device from being power managed;

The default for all devices is "auto", which means that they may
be subject to automatic power management, depending on their
drivers. Changing this attribute to "on" prevents the driver
from power managing the device at run time. Doing that while
the device is suspended causes it to be woken up.

What: /sys/devices/.../power/async
Date: January 2009
Contact: Rafael J. Wysocki <rjw@sisk.pl>
Description:
The /sys/devices/.../power/async attribute allows the user space to
enable or disable the device's suspend and resume callbacks to
be executed asynchronously (i.e. in separate threads, in parallel
with the main suspend/resume thread) during system-wide power
transitions (e.g. suspend to RAM, hibernation).

All devices have one of the following two values for the
power/async file:

+ "enabled\n" to permit the asynchronous suspend/resume;
+ "disabled\n" to forbid it;

The value of this attribute may be changed by writing either
"enabled", or "disabled" to it.

It generally is unsafe to permit the asynchronous suspend/resume
of a device unless it is certain that all of the PM dependencies
of the device are known to the PM core. However, for some
devices this attribute is set to "enabled" by bus type code or
device drivers and in that case it should be safe to leave the
default value.
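The value rules for the power/wakeup, power/control and power/async attributes in the hunk above could be checked from user space roughly as follows. This is a minimal sketch under stated assumptions: the helper names and the ALLOWED table are hypothetical, and power_dir stands for some /sys/devices/.../power directory that the source leaves elided.

```python
from pathlib import Path

# Accepted values per attribute, as listed in the descriptions above.
ALLOWED = {
    "wakeup": ("enabled", "disabled"),
    "control": ("auto", "on"),
    "async": ("enabled", "disabled"),
}


def read_power_attr(power_dir, name):
    """Return the attribute value without its trailing newline.

    An empty string read from "wakeup" means the device cannot
    generate system wakeup events.
    """
    return Path(power_dir, name).read_text().rstrip("\n")


def write_power_attr(power_dir, name, value):
    """Validate a value against ALLOWED, then write it to the attribute."""
    if value not in ALLOWED[name]:
        raise ValueError(f"{name} accepts {ALLOWED[name]}, got {value!r}")
    if name == "wakeup" and read_power_attr(power_dir, name) == "":
        # The description says such a file cannot be modified.
        raise PermissionError("device cannot generate wakeup events")
    Path(power_dir, name).write_text(value)
```

Checking the wakeup file before writing mirrors the documented behaviour: a file containing only a newline marks a device that cannot be enabled to wake up the system.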
@@ -1,4 +1,4 @@
What: /sys/devices/platform/asus-laptop/display
What: /sys/devices/platform/asus_laptop/display
Date: January 2007
KernelVersion: 2.6.20
Contact: "Corentin Chary" <corentincj@iksaif.net>
@@ -13,7 +13,7 @@ Description:
Ex: - 0 (0000b) means no display
- 3 (0011b) CRT+LCD.

What: /sys/devices/platform/asus-laptop/gps
What: /sys/devices/platform/asus_laptop/gps
Date: January 2007
KernelVersion: 2.6.20
Contact: "Corentin Chary" <corentincj@iksaif.net>
@@ -21,7 +21,7 @@ Description:
Control the gps device. 1 means on, 0 means off.
Users: Lapsus

What: /sys/devices/platform/asus-laptop/ledd
What: /sys/devices/platform/asus_laptop/ledd
Date: January 2007
KernelVersion: 2.6.20
Contact: "Corentin Chary" <corentincj@iksaif.net>
@@ -29,11 +29,11 @@ Description:
Some models like the W1N have a LED display that can be
used to display several informations.
To control the LED display, use the following :
echo 0x0T000DDD > /sys/devices/platform/asus-laptop/
echo 0x0T000DDD > /sys/devices/platform/asus_laptop/
where T control the 3 letters display, and DDD the 3 digits display.
The DDD table can be found in Documentation/laptops/asus-laptop.txt

What: /sys/devices/platform/asus-laptop/bluetooth
What: /sys/devices/platform/asus_laptop/bluetooth
Date: January 2007
KernelVersion: 2.6.20
Contact: "Corentin Chary" <corentincj@iksaif.net>
@@ -42,7 +42,7 @@ Description:
This may control the led, the device or both.
Users: Lapsus

What: /sys/devices/platform/asus-laptop/wlan
What: /sys/devices/platform/asus_laptop/wlan
Date: January 2007
KernelVersion: 2.6.20
Contact: "Corentin Chary" <corentincj@iksaif.net>
@@ -1,4 +1,4 @@
What: /sys/devices/platform/eeepc-laptop/disp
What: /sys/devices/platform/eeepc/disp
Date: May 2008
KernelVersion: 2.6.26
Contact: "Corentin Chary" <corentincj@iksaif.net>
@@ -9,21 +9,21 @@ Description:
- 3 = LCD+CRT
If you run X11, you should use xrandr instead.

What: /sys/devices/platform/eeepc-laptop/camera
What: /sys/devices/platform/eeepc/camera
Date: May 2008
KernelVersion: 2.6.26
Contact: "Corentin Chary" <corentincj@iksaif.net>
Description:
Control the camera. 1 means on, 0 means off.

What: /sys/devices/platform/eeepc-laptop/cardr
What: /sys/devices/platform/eeepc/cardr
Date: May 2008
KernelVersion: 2.6.26
Contact: "Corentin Chary" <corentincj@iksaif.net>
Description:
Control the card reader. 1 means on, 0 means off.

What: /sys/devices/platform/eeepc-laptop/cpufv
What: /sys/devices/platform/eeepc/cpufv
Date: Jun 2009
KernelVersion: 2.6.31
Contact: "Corentin Chary" <corentincj@iksaif.net>
@@ -42,7 +42,7 @@ Description:
`------------ Availables modes
For example, 0x301 means: mode 1 selected, 3 available modes.

What: /sys/devices/platform/eeepc-laptop/available_cpufv
What: /sys/devices/platform/eeepc/available_cpufv
Date: Jun 2009
KernelVersion: 2.6.31
Contact: "Corentin Chary" <corentincj@iksaif.net>
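The 0x301 example in the cpufv hunk above can be decoded with a couple of bit operations. The sketch below assumes the low byte holds the current mode and the next byte holds the number of available modes. That layout is an assumption here, since only a fragment of the field diagram is visible in this hunk, but it does reproduce the documented example. The function name is hypothetical.

```python
def decode_cpufv(raw):
    """Split a raw cpufv value into (available_modes, current_mode).

    Accepts an int or the hexadecimal string read from the sysfs file.
    Assumed layout: bits 0-7 current mode, bits 8-15 available modes.
    """
    value = int(raw, 16) if isinstance(raw, str) else raw
    current = value & 0xFF
    available = (value >> 8) & 0xFF
    return available, current
```

With this layout, decode_cpufv("0x301") yields 3 available modes with mode 1 selected, matching the sentence in the description.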
@@ -101,3 +101,16 @@ Description:

CAUTION: Using it will cause your machine's real-time (CMOS)
clock to be set to a random invalid time after a resume.

What: /sys/power/pm_async
Date: January 2009
Contact: Rafael J. Wysocki <rjw@sisk.pl>
Description:
The /sys/power/pm_async file controls the switch allowing the
user space to enable or disable asynchronous suspend and resume
of devices. If enabled, this feature will cause some device
drivers' suspend and resume callbacks to be executed in parallel
with each other and with the main suspend thread. It is enabled
if this file contains "1", which is the default. It may be
disabled by writing "0" to this file, in which case all devices
will be suspended and resumed synchronously.
@@ -45,7 +45,7 @@
</sect1>

<sect1><title>Atomic and pointer manipulation</title>
!Iarch/x86/include/asm/atomic_32.h
!Iarch/x86/include/asm/atomic.h
!Iarch/x86/include/asm/unaligned.h
</sect1>
@@ -316,7 +316,7 @@ CPU B: spin_unlock_irqrestore(&dev_lock, flags)

<chapter id="pubfunctions">
<title>Public Functions Provided</title>
!Iarch/x86/include/asm/io_32.h
!Iarch/x86/include/asm/io.h
!Elib/iomap.c
</chapter>
@@ -144,7 +144,7 @@ usage should require reading the full document.
this though and the recommendation to allow only a single
interface in STA mode at first!
</para>
!Finclude/net/mac80211.h ieee80211_if_init_conf
!Finclude/net/mac80211.h ieee80211_vif
</chapter>

<chapter id="rx-tx">
@@ -234,7 +234,6 @@ usage should require reading the full document.
<title>Multiple queues and QoS support</title>
<para>TBD</para>
!Finclude/net/mac80211.h ieee80211_tx_queue_params
!Finclude/net/mac80211.h ieee80211_tx_queue_stats
</chapter>

<chapter id="AP">
@@ -589,7 +589,8 @@ number of a video input as in &v4l2-input; field
<entry></entry>
<entry>A place holder for future extensions and custom
(driver defined) buffer types
<constant>V4L2_BUF_TYPE_PRIVATE</constant> and higher.</entry>
<constant>V4L2_BUF_TYPE_PRIVATE</constant> and higher. Applications
should set this to 0.</entry>
</row>
</tbody>
</tgroup>
@@ -54,12 +54,10 @@ to enqueue an empty (capturing) or filled (output) buffer in the
driver's incoming queue. The semantics depend on the selected I/O
method.</para>

<para>To enqueue a <link linkend="mmap">memory mapped</link>
buffer applications set the <structfield>type</structfield> field of a
&v4l2-buffer; to the same buffer type as previously &v4l2-format;
<structfield>type</structfield> and &v4l2-requestbuffers;
<structfield>type</structfield>, the <structfield>memory</structfield>
field to <constant>V4L2_MEMORY_MMAP</constant> and the
<para>To enqueue a buffer applications set the <structfield>type</structfield>
field of a &v4l2-buffer; to the same buffer type as was previously used
with &v4l2-format; <structfield>type</structfield> and &v4l2-requestbuffers;
<structfield>type</structfield>. Applications must also set the
<structfield>index</structfield> field. Valid index numbers range from
zero to the number of buffers allocated with &VIDIOC-REQBUFS;
(&v4l2-requestbuffers; <structfield>count</structfield>) minus one. The
@@ -70,8 +68,19 @@ intended for output (<structfield>type</structfield> is
<constant>V4L2_BUF_TYPE_VBI_OUTPUT</constant>) applications must also
initialize the <structfield>bytesused</structfield>,
<structfield>field</structfield> and
<structfield>timestamp</structfield> fields. See <xref
linkend="buffer" /> for details. When
<structfield>timestamp</structfield> fields, see <xref
linkend="buffer" /> for details.
Applications must also set <structfield>flags</structfield> to 0. If a driver
supports capturing from specific video inputs and you want to specify a video
input, then <structfield>flags</structfield> should be set to
<constant>V4L2_BUF_FLAG_INPUT</constant> and the field
<structfield>input</structfield> must be initialized to the desired input.
The <structfield>reserved</structfield> field must be set to 0.
</para>

<para>To enqueue a <link linkend="mmap">memory mapped</link>
buffer applications set the <structfield>memory</structfield>
field to <constant>V4L2_MEMORY_MMAP</constant>. When
<constant>VIDIOC_QBUF</constant> is called with a pointer to this
structure the driver sets the
<constant>V4L2_BUF_FLAG_MAPPED</constant> and
@@ -81,14 +90,10 @@ structure the driver sets the
&EINVAL;.</para>

<para>To enqueue a <link linkend="userp">user pointer</link>
buffer applications set the <structfield>type</structfield> field of a
&v4l2-buffer; to the same buffer type as previously &v4l2-format;
<structfield>type</structfield> and &v4l2-requestbuffers;
<structfield>type</structfield>, the <structfield>memory</structfield>
field to <constant>V4L2_MEMORY_USERPTR</constant> and the
buffer applications set the <structfield>memory</structfield>
field to <constant>V4L2_MEMORY_USERPTR</constant>, the
<structfield>m.userptr</structfield> field to the address of the
buffer and <structfield>length</structfield> to its size. When the
buffer is intended for output additional fields must be set as above.
buffer and <structfield>length</structfield> to its size.
When <constant>VIDIOC_QBUF</constant> is called with a pointer to this
structure the driver sets the <constant>V4L2_BUF_FLAG_QUEUED</constant>
flag and clears the <constant>V4L2_BUF_FLAG_MAPPED</constant> and
@@ -96,13 +101,14 @@ flag and clears the <constant>V4L2_BUF_FLAG_MAPPED</constant> and
<structfield>flags</structfield> field, or it returns an error code.
This ioctl locks the memory pages of the buffer in physical memory,
they cannot be swapped out to disk. Buffers remain locked until
dequeued, until the &VIDIOC-STREAMOFF; or &VIDIOC-REQBUFS; ioctl are
dequeued, until the &VIDIOC-STREAMOFF; or &VIDIOC-REQBUFS; ioctl is
called, or until the device is closed.</para>

<para>Applications call the <constant>VIDIOC_DQBUF</constant>
ioctl to dequeue a filled (capturing) or displayed (output) buffer
from the driver's outgoing queue. They just set the
<structfield>type</structfield> and <structfield>memory</structfield>
<structfield>type</structfield>, <structfield>memory</structfield>
and <structfield>reserved</structfield>
fields of a &v4l2-buffer; as above, when <constant>VIDIOC_DQBUF</constant>
is called with a pointer to this structure the driver fills the
remaining fields or returns an error code.</para>
@@ -54,12 +54,13 @@ buffer at any time after buffers have been allocated with the
&VIDIOC-REQBUFS; ioctl.</para>

<para>Applications set the <structfield>type</structfield> field
of a &v4l2-buffer; to the same buffer type as previously
of a &v4l2-buffer; to the same buffer type as was previously used with
&v4l2-format; <structfield>type</structfield> and &v4l2-requestbuffers;
<structfield>type</structfield>, and the <structfield>index</structfield>
field. Valid index numbers range from zero
to the number of buffers allocated with &VIDIOC-REQBUFS;
(&v4l2-requestbuffers; <structfield>count</structfield>) minus one.
The <structfield>reserved</structfield> field should be set to 0.
After calling <constant>VIDIOC_QUERYBUF</constant> with a pointer to
this structure drivers return an error code or fill the rest of
the structure.</para>
@@ -68,8 +69,8 @@ the structure.</para>
<constant>V4L2_BUF_FLAG_MAPPED</constant>,
<constant>V4L2_BUF_FLAG_QUEUED</constant> and
<constant>V4L2_BUF_FLAG_DONE</constant> flags will be valid. The
<structfield>memory</structfield> field will be set to
<constant>V4L2_MEMORY_MMAP</constant>, the <structfield>m.offset</structfield>
<structfield>memory</structfield> field will be set to the current
I/O method, the <structfield>m.offset</structfield>
contains the offset of the buffer from the start of the device memory,
the <structfield>length</structfield> field its size. The driver may
or may not set the remaining fields and flags, they are meaningless in
@@ -54,23 +54,23 @@ I/O. Memory mapped buffers are located in device memory and must be
allocated with this ioctl before they can be mapped into the
application's address space. User buffers are allocated by
applications themselves, and this ioctl is merely used to switch the
driver into user pointer I/O mode.</para>
driver into user pointer I/O mode and to set up some internal structures.</para>

<para>To allocate device buffers applications initialize three
fields of a <structname>v4l2_requestbuffers</structname> structure.
<para>To allocate device buffers applications initialize all
fields of the <structname>v4l2_requestbuffers</structname> structure.
They set the <structfield>type</structfield> field to the respective
stream or buffer type, the <structfield>count</structfield> field to
the desired number of buffers, and <structfield>memory</structfield>
must be set to <constant>V4L2_MEMORY_MMAP</constant>. When the ioctl
is called with a pointer to this structure the driver attempts to
allocate the requested number of buffers and stores the actual number
the desired number of buffers, <structfield>memory</structfield>
must be set to the requested I/O method and the reserved array
must be zeroed. When the ioctl
is called with a pointer to this structure the driver will attempt to allocate
the requested number of buffers and it stores the actual number
allocated in the <structfield>count</structfield> field. It can be
smaller than the number requested, even zero, when the driver runs out
of free memory. A larger number is possible when the driver requires
more buffers to function correctly.<footnote>
<para>For example video output requires at least two buffers,
of free memory. A larger number is also possible when the driver requires
more buffers to function correctly. For example video output requires at least two buffers,
one displayed and one filled by the application.</para>
</footnote> When memory mapping I/O is not supported the ioctl
<para>When the I/O method is not supported the ioctl
returns an &EINVAL;.</para>

<para>Applications can call <constant>VIDIOC_REQBUFS</constant>
@@ -81,14 +81,6 @@ in progress, an implicit &VIDIOC-STREAMOFF;. <!-- mhs: I see no
reason why munmap()ping one or even all buffers must imply
streamoff.--></para>

<para>To negotiate user pointer I/O, applications initialize only
the <structfield>type</structfield> field and set
<structfield>memory</structfield> to
<constant>V4L2_MEMORY_USERPTR</constant>. When the ioctl is called
with a pointer to this structure the driver prepares for user pointer
I/O, when this I/O method is not supported the ioctl returns an
&EINVAL;.</para>

<table pgwide="1" frame="none" id="v4l2-requestbuffers">
<title>struct <structname>v4l2_requestbuffers</structname></title>
<tgroup cols="3">
@@ -97,9 +89,7 @@ I/O, when this I/O method is not supported the ioctl returns an
<row>
<entry>__u32</entry>
<entry><structfield>count</structfield></entry>
<entry>The number of buffers requested or granted. This
field is only used when <structfield>memory</structfield> is set to
<constant>V4L2_MEMORY_MMAP</constant>.</entry>
<entry>The number of buffers requested or granted.</entry>
</row>
<row>
<entry>&v4l2-buf-type;</entry>
@@ -120,7 +110,7 @@ as the &v4l2-format; <structfield>type</structfield> field. See <xref
<entry><structfield>reserved</structfield>[2]</entry>
<entry>A place holder for future extensions and custom
(driver defined) buffer types <constant>V4L2_BUF_TYPE_PRIVATE</constant> and
higher.</entry>
higher. This array should be zeroed by applications.</entry>
</row>
</tbody>
</tgroup>
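The revised VIDIOC_REQBUFS text in the hunks above asks applications to initialize every field of struct v4l2_requestbuffers, including a zeroed reserved array, for whichever I/O method is requested. A minimal sketch of building such a request is shown below. It assumes the structure is five consecutive __u32 fields (count, type, memory, reserved[2]) and the ioctl number encoding used on x86 and ARM; both should be checked against the linux/videodev2.h of the target system. The helper names are hypothetical.

```python
import struct

# Constant values as defined in <linux/videodev2.h>.
V4L2_BUF_TYPE_VIDEO_CAPTURE = 1
V4L2_MEMORY_MMAP = 1
V4L2_MEMORY_USERPTR = 2

# Assumed layout: count, type, memory, reserved[2], all __u32.
REQBUFS_FORMAT = "=5I"

# _IOWR('V', 8, struct v4l2_requestbuffers), assuming the common
# direction/size/type/number bit layout (not valid on every architecture).
VIDIOC_REQBUFS = (
    (3 << 30) | (struct.calcsize(REQBUFS_FORMAT) << 16) | (ord("V") << 8) | 8
)


def build_reqbufs(count, memory, buf_type=V4L2_BUF_TYPE_VIDEO_CAPTURE):
    """Pack a request with all fields set and the reserved array zeroed."""
    return struct.pack(REQBUFS_FORMAT, count, buf_type, memory, 0, 0)


def granted_count(reply):
    """Extract the buffer count the driver stored in the count field."""
    return struct.unpack(REQBUFS_FORMAT, reply)[0]
```

With an open device file descriptor fd, the request could be issued as reply = fcntl.ioctl(fd, VIDIOC_REQBUFS, build_reqbufs(4, V4L2_MEMORY_MMAP)), after which granted_count(reply) may be smaller or larger than the 4 buffers requested, as the text explains.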
@@ -221,8 +221,8 @@ branches. These different branches are:
- main 2.6.x kernel tree
- 2.6.x.y -stable kernel tree
- 2.6.x -git kernel patches
- 2.6.x -mm kernel patches
- subsystem specific kernel trees and patches
- the 2.6.x -next kernel tree for integration tests

2.6.x kernel tree
-----------------
@@ -232,7 +232,7 @@ process is as follows:
- As soon as a new kernel is released a two weeks window is open,
during this period of time maintainers can submit big diffs to
Linus, usually the patches that have already been included in the
-mm kernel for a few weeks. The preferred way to submit big changes
-next kernel for a few weeks. The preferred way to submit big changes
is using git (the kernel's source management tool, more information
can be found at http://git.or.cz/) but plain patches are also just
fine.
@@ -293,84 +293,43 @@ daily and represent the current state of Linus' tree. They are more
experimental than -rc kernels since they are generated automatically
without even a cursory glance to see if they are sane.

2.6.x -mm kernel patches
------------------------
These are experimental kernel patches released by Andrew Morton. Andrew
takes all of the different subsystem kernel trees and patches and mushes
them together, along with a lot of patches that have been plucked from
the linux-kernel mailing list. This tree serves as a proving ground for
new features and patches. Once a patch has proved its worth in -mm for
a while Andrew or the subsystem maintainer pushes it on to Linus for
inclusion in mainline.

It is heavily encouraged that all new patches get tested in the -mm tree
before they are sent to Linus for inclusion in the main kernel tree. Code
which does not make an appearance in -mm before the opening of the merge
window will prove hard to merge into the mainline.

These kernels are not appropriate for use on systems that are supposed
to be stable and they are more risky to run than any of the other
branches.

If you wish to help out with the kernel development process, please test
and use these kernel releases and provide feedback to the linux-kernel
mailing list if you have any problems, and if everything works properly.

In addition to all the other experimental patches, these kernels usually
also contain any changes in the mainline -git kernels available at the
time of release.

The -mm kernels are not released on a fixed schedule, but usually a few
-mm kernels are released in between each -rc kernel (1 to 3 is common).

Subsystem Specific kernel trees and patches
-------------------------------------------
A number of the different kernel subsystem developers expose their
development trees so that others can see what is happening in the
different areas of the kernel. These trees are pulled into the -mm
kernel releases as described above.
The maintainers of the various kernel subsystems --- and also many
kernel subsystem developers --- expose their current state of
development in source repositories. That way, others can see what is
happening in the different areas of the kernel. In areas where
development is rapid, a developer may be asked to base his submissions
onto such a subsystem kernel tree so that conflicts between the
submission and other already ongoing work are avoided.

Here is a list of some of the different kernel trees available:
git trees:
- Kbuild development tree, Sam Ravnborg <sam@ravnborg.org>
git.kernel.org:/pub/scm/linux/kernel/git/sam/kbuild.git
Most of these repositories are git trees, but there are also other SCMs
in use, or patch queues being published as quilt series. Addresses of
these subsystem repositories are listed in the MAINTAINERS file. Many
of them can be browsed at http://git.kernel.org/.

- ACPI development tree, Len Brown <len.brown@intel.com>
git.kernel.org:/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6.git
Before a proposed patch is committed to such a subsystem tree, it is
subject to review which primarily happens on mailing lists (see the
respective section below). For several kernel subsystems, this review
process is tracked with the tool patchwork. Patchwork offers a web
interface which shows patch postings, any comments on a patch or
revisions to it, and maintainers can mark patches as under review,
accepted, or rejected. Most of these patchwork sites are listed at
http://patchwork.kernel.org/ or http://patchwork.ozlabs.org/.

- Block development tree, Jens Axboe <jens.axboe@oracle.com>
git.kernel.org:/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git
2.6.x -next kernel tree for integration tests
---------------------------------------------
Before updates from subsystem trees are merged into the mainline 2.6.x
tree, they need to be integration-tested. For this purpose, a special
testing repository exists into which virtually all subsystem trees are
pulled on an almost daily basis:
http://git.kernel.org/?p=linux/kernel/git/sfr/linux-next.git
http://linux.f-seidel.de/linux-next/pmwiki/

- DRM development tree, Dave Airlie <airlied@linux.ie>
git.kernel.org:/pub/scm/linux/kernel/git/airlied/drm-2.6.git
This way, the -next kernel gives a summary outlook onto what will be
expected to go into the mainline kernel at the next merge period.
Adventurous testers are very welcome to runtime-test the -next kernel.

- ia64 development tree, Tony Luck <tony.luck@intel.com>
git.kernel.org:/pub/scm/linux/kernel/git/aegl/linux-2.6.git

- infiniband, Roland Dreier <rolandd@cisco.com>
git.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git

- libata, Jeff Garzik <jgarzik@pobox.com>
git.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev.git

- network drivers, Jeff Garzik <jgarzik@pobox.com>
git.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git

- pcmcia, Dominik Brodowski <linux@dominikbrodowski.net>
git.kernel.org:/pub/scm/linux/kernel/git/brodo/pcmcia-2.6.git

- SCSI, James Bottomley <James.Bottomley@hansenpartnership.com>
git.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git

- x86, Ingo Molnar <mingo@elte.hu>
git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git

quilt trees:
- USB, Driver Core, and I2C, Greg Kroah-Hartman <gregkh@suse.de>
kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/

Other kernel trees can be found listed at http://git.kernel.org/ and in
the MAINTAINERS file.

Bug Reporting
-------------
@@ -6,16 +6,22 @@ checklist.txt
- Review Checklist for RCU Patches
listRCU.txt
- Using RCU to Protect Read-Mostly Linked Lists
lockdep.txt
- RCU and lockdep checking
NMI-RCU.txt
- Using RCU to Protect Dynamic NMI Handlers
rcubarrier.txt
- RCU and Unloadable Modules
rculist_nulls.txt
- RCU list primitives for use with SLAB_DESTROY_BY_RCU
rcuref.txt
- Reference-count design for elements of lists/arrays protected by RCU
rcu.txt
- RCU Concepts
rcubarrier.txt
- Unloading modules that use RCU callbacks
RTFP.txt
- List of RCU papers (bibliography) going back to 1980.
stallwarn.txt
- RCU CPU stall warnings (CONFIG_RCU_CPU_STALL_DETECTOR)
torture.txt
- RCU Torture Test Operation (CONFIG_RCU_TORTURE_TEST)
trace.txt
@@ -25,10 +25,10 @@ to be referencing the data structure. However, this mechanism was not
optimized for modern computer systems, which is not surprising given
that these overheads were not so expensive in the mid-80s. Nonetheless,
passive serialization appears to be the first deferred-destruction
mechanism to be used in production. Furthermore, the relevant patent has
lapsed, so this approach may be used in non-GPL software, if desired.
(In contrast, use of RCU is permitted only in software licensed under
GPL. Sorry!!!)
mechanism to be used in production. Furthermore, the relevant patent
has lapsed, so this approach may be used in non-GPL software, if desired.
(In contrast, implementation of RCU is permitted only in software licensed
under either GPL or LGPL. Sorry!!!)

In 1990, Pugh [Pugh90] noted that explicitly tracking which threads
were reading a given data structure permitted deferred free to operate
@@ -150,6 +150,18 @@ preemptible RCU [PaulEMcKenney2007PreemptibleRCU], and the three-part
LWN "What is RCU?" series [PaulEMcKenney2007WhatIsRCUFundamentally,
PaulEMcKenney2008WhatIsRCUUsage, and PaulEMcKenney2008WhatIsRCUAPI].

2008 saw a journal paper on real-time RCU [DinakarGuniguntala2008IBMSysJ],
a history of how Linux changed RCU more than RCU changed Linux
[PaulEMcKenney2008RCUOSR], and a design overview of hierarchical RCU
[PaulEMcKenney2008HierarchicalRCU].

2009 introduced user-level RCU algorithms [PaulEMcKenney2009MaliciousURCU],
which Mathieu Desnoyers is now maintaining [MathieuDesnoyers2009URCU]
[MathieuDesnoyersPhD]. TINY_RCU [PaulEMcKenney2009BloatWatchRCU] made
its appearance, as did expedited RCU [PaulEMcKenney2009expeditedRCU].
The problem of resizeable RCU-protected hash tables may now be on a path
to a solution [JoshTriplett2009RPHash].

Bibtex Entries

@article{Kung80
@@ -730,6 +742,11 @@ Revised:
"
}

#
# "What is RCU?" LWN series.
#
########################################################################

@article{DinakarGuniguntala2008IBMSysJ
,author="D. Guniguntala and P. E. McKenney and J. Triplett and J. Walpole"
,title="The read-copy-update mechanism for supporting real-time applications on shared-memory multiprocessor systems with {Linux}"
@@ -820,3 +837,39 @@ Revised:
Uniprocessor assumptions allow simplified RCU implementation.
"
}

@unpublished{PaulEMcKenney2009expeditedRCU
,Author="Paul E. McKenney"
,Title="[{PATCH} -tip 0/3] expedited 'big hammer' {RCU} grace periods"
,month="June"
,day="25"
,year="2009"
,note="Available:
\url{http://lkml.org/lkml/2009/6/25/306}
[Viewed August 16, 2009]"
,annotation="
First posting of expedited RCU to be accepted into -tip.
"
}

@unpublished{JoshTriplett2009RPHash
,Author="Josh Triplett"
,Title="Scalable concurrent hash tables via relativistic programming"
,month="September"
,year="2009"
,note="Linux Plumbers Conference presentation"
,annotation="
RP fun with hash tables.
"
}

@phdthesis{MathieuDesnoyersPhD
, title = "Low-Impact Operating System Tracing"
, author = "Mathieu Desnoyers"
, school = "Ecole Polytechnique de Montr\'{e}al"
, month = "December"
, year = 2009
,note="Available:
\url{http://www.lttng.org/pub/thesis/desnoyers-dissertation-2009-12.pdf}
[Viewed December 9, 2009]"
}
+124
-80
@@ -8,13 +8,12 @@ would cause. This list is based on experiences reviewing such patches
over a rather long period of time, but improvements are always welcome!

0. Is RCU being applied to a read-mostly situation? If the data
structure is updated more than about 10% of the time, then
you should strongly consider some other approach, unless
detailed performance measurements show that RCU is nonetheless
the right tool for the job. Yes, you might think of RCU
as simply cutting overhead off of the readers and imposing it
on the writers. That is exactly why normal uses of RCU will
do much more reading than updating.
structure is updated more than about 10% of the time, then you
should strongly consider some other approach, unless detailed
performance measurements show that RCU is nonetheless the right
tool for the job. Yes, RCU does reduce read-side overhead by
increasing write-side overhead, which is exactly why normal uses
of RCU will do much more reading than updating.

Another exception is where performance is not an issue, and RCU
provides a simpler implementation. An example of this situation
@@ -35,13 +34,13 @@ over a rather long period of time, but improvements are always welcome!

If you choose #b, be prepared to describe how you have handled
memory barriers on weakly ordered machines (pretty much all of
them -- even x86 allows reads to be reordered), and be prepared
to explain why this added complexity is worthwhile. If you
choose #c, be prepared to explain how this single task does not
become a major bottleneck on big multiprocessor machines (for
example, if the task is updating information relating to itself
that other tasks can read, there by definition can be no
bottleneck).
them -- even x86 allows later loads to be reordered to precede
earlier stores), and be prepared to explain why this added
complexity is worthwhile. If you choose #c, be prepared to
explain how this single task does not become a major bottleneck on
big multiprocessor machines (for example, if the task is updating
information relating to itself that other tasks can read, there
by definition can be no bottleneck).

2. Do the RCU read-side critical sections make proper use of
rcu_read_lock() and friends? These primitives are needed
@@ -51,8 +50,10 @@ over a rather long period of time, but improvements are always welcome!
actuarial risk of your kernel.

As a rough rule of thumb, any dereference of an RCU-protected
pointer must be covered by rcu_read_lock() or rcu_read_lock_bh()
or by the appropriate update-side lock.
pointer must be covered by rcu_read_lock(), rcu_read_lock_bh(),
rcu_read_lock_sched(), or by the appropriate update-side lock.
Disabling of preemption can serve as rcu_read_lock_sched(), but
is less readable.

3. Does the update code tolerate concurrent accesses?
@@ -62,25 +63,27 @@ over a rather long period of time, but improvements are always welcome!
of ways to handle this concurrency, depending on the situation:

a. Use the RCU variants of the list and hlist update
primitives to add, remove, and replace elements on an
RCU-protected list. Alternatively, use the RCU-protected
trees that have been added to the Linux kernel.
primitives to add, remove, and replace elements on
an RCU-protected list. Alternatively, use the other
RCU-protected data structures that have been added to
the Linux kernel.

This is almost always the best approach.

b. Proceed as in (a) above, but also maintain per-element
locks (that are acquired by both readers and writers)
that guard per-element state. Of course, fields that
the readers refrain from accessing can be guarded by the
update-side lock.
the readers refrain from accessing can be guarded by
some other lock acquired only by updaters, if desired.

This works quite well, also.

c. Make updates appear atomic to readers. For example,
pointer updates to properly aligned fields will appear
atomic, as will individual atomic primitives. Operations
performed under a lock and sequences of multiple atomic
primitives will -not- appear to be atomic.
pointer updates to properly aligned fields will
appear atomic, as will individual atomic primitives.
Sequences of operations performed under a lock will -not-
appear to be atomic to RCU readers, nor will sequences
of multiple atomic primitives.

This can work, but is starting to get a bit tricky.
@@ -98,9 +101,9 @@ over a rather long period of time, but improvements are always welcome!
a new structure containing updated values.

4. Weakly ordered CPUs pose special challenges. Almost all CPUs
are weakly ordered -- even i386 CPUs allow reads to be reordered.
RCU code must take all of the following measures to prevent
memory-corruption problems:
are weakly ordered -- even x86 CPUs allow later loads to be
reordered to precede earlier stores. RCU code must take all of
the following measures to prevent memory-corruption problems:

a. Readers must maintain proper ordering of their memory
accesses. The rcu_dereference() primitive ensures that
@@ -113,14 +116,25 @@ over a rather long period of time, but improvements are always welcome!
The rcu_dereference() primitive is also an excellent
documentation aid, letting the person reading the code
know exactly which pointers are protected by RCU.
Please note that compilers can also reorder code, and
they are becoming increasingly aggressive about doing
just that. The rcu_dereference() primitive therefore
also prevents destructive compiler optimizations.

The rcu_dereference() primitive is used by the various
"_rcu()" list-traversal primitives, such as the
list_for_each_entry_rcu(). Note that it is perfectly
legal (if redundant) for update-side code to use
rcu_dereference() and the "_rcu()" list-traversal
primitives. This is particularly useful in code
that is common to readers and updaters.
The rcu_dereference() primitive is used by the
various "_rcu()" list-traversal primitives, such
as the list_for_each_entry_rcu(). Note that it is
perfectly legal (if redundant) for update-side code to
use rcu_dereference() and the "_rcu()" list-traversal
primitives. This is particularly useful in code that
is common to readers and updaters. However, lockdep
will complain if you access rcu_dereference() outside
of an RCU read-side critical section. See lockdep.txt
to learn what to do about this.

Of course, neither rcu_dereference() nor the "_rcu()"
list-traversal primitives can substitute for a good
concurrency design coordinating among multiple updaters.

b. If the list macros are being used, the list_add_tail_rcu()
and list_add_rcu() primitives must be used in order
@@ -135,11 +149,14 @@ over a rather long period of time, but improvements are always welcome!
readers. Similarly, if the hlist macros are being used,
the hlist_del_rcu() primitive is required.

The list_replace_rcu() primitive may be used to
replace an old structure with a new one in an
RCU-protected list.
The list_replace_rcu() and hlist_replace_rcu() primitives
may be used to replace an old structure with a new one
in their respective types of RCU-protected lists.

d. Updates must ensure that initialization of a given
d. Rules similar to (4b) and (4c) apply to the "hlist_nulls"
type of RCU-protected linked lists.

e. Updates must ensure that initialization of a given
structure happens before pointers to that structure are
publicized. Use the rcu_assign_pointer() primitive
when publicizing a pointer to a structure that can
@@ -151,16 +168,31 @@ over a rather long period of time, but improvements are always welcome!
it cannot block.

6. Since synchronize_rcu() can block, it cannot be called from
any sort of irq context. Ditto for synchronize_sched() and
synchronize_srcu().
any sort of irq context. The same rule applies for
synchronize_rcu_bh(), synchronize_sched(), synchronize_srcu(),
synchronize_rcu_expedited(), synchronize_rcu_bh_expedited(),
synchronize_sched_expedited(), and synchronize_srcu_expedited().

7. If the updater uses call_rcu(), then the corresponding readers
must use rcu_read_lock() and rcu_read_unlock(). If the updater
uses call_rcu_bh(), then the corresponding readers must use
rcu_read_lock_bh() and rcu_read_unlock_bh(). If the updater
uses call_rcu_sched(), then the corresponding readers must
disable preemption. Mixing things up will result in confusion
and broken kernels.
The expedited forms of these primitives have the same semantics
as the non-expedited forms, but expediting is both expensive
and unfriendly to real-time workloads. Use of the expedited
primitives should be restricted to rare configuration-change
operations that would not normally be undertaken while a real-time
workload is running.

7. If the updater uses call_rcu() or synchronize_rcu(), then the
corresponding readers must use rcu_read_lock() and
rcu_read_unlock(). If the updater uses call_rcu_bh() or
synchronize_rcu_bh(), then the corresponding readers must
use rcu_read_lock_bh() and rcu_read_unlock_bh(). If the
updater uses call_rcu_sched() or synchronize_sched(), then
the corresponding readers must disable preemption, possibly
by calling rcu_read_lock_sched() and rcu_read_unlock_sched().
If the updater uses synchronize_srcu(), then the corresponding
readers must use srcu_read_lock() and srcu_read_unlock(),
and with the same srcu_struct. The rules for the expedited
primitives are the same as for their non-expedited counterparts.
Mixing things up will result in confusion and broken kernels.

One exception to this rule: rcu_read_lock() and rcu_read_unlock()
may be substituted for rcu_read_lock_bh() and rcu_read_unlock_bh()
@@ -212,6 +244,8 @@ over a rather long period of time, but improvements are always welcome!
e. Periodically invoke synchronize_rcu(), permitting a limited
number of updates per grace period.

The same cautions apply to call_rcu_bh() and call_rcu_sched().

9. All RCU list-traversal primitives, which include
rcu_dereference(), list_for_each_entry_rcu(),
list_for_each_continue_rcu(), and list_for_each_safe_rcu(),
@@ -219,7 +253,9 @@ over a rather long period of time, but improvements are always welcome!
must be protected by appropriate update-side locks. RCU
read-side critical sections are delimited by rcu_read_lock()
and rcu_read_unlock(), or by similar primitives such as
rcu_read_lock_bh() and rcu_read_unlock_bh().
rcu_read_lock_bh() and rcu_read_unlock_bh(), in which case
the matching rcu_dereference() primitive must be used in order
to keep lockdep happy, in this case, rcu_dereference_bh().

The reason that it is permissible to use RCU list-traversal
primitives when the update-side lock is held is that doing so
@@ -229,7 +265,8 @@ over a rather long period of time, but improvements are always welcome!
10. Conversely, if you are in an RCU read-side critical section,
and you don't hold the appropriate update-side lock, you -must-
use the "_rcu()" variants of the list macros. Failing to do so
will break Alpha and confuse people reading your code.
will break Alpha, cause aggressive compilers to generate bad code,
and confuse people trying to read your code.

11. Note that synchronize_rcu() -only- guarantees to wait until
all currently executing rcu_read_lock()-protected RCU read-side
@@ -239,15 +276,21 @@ over a rather long period of time, but improvements are always welcome!
rcu_read_lock()-protected read-side critical sections, do -not-
use synchronize_rcu().

If you want to wait for some of these other things, you might
instead need to use synchronize_irq() or synchronize_sched().
Similarly, disabling preemption is not an acceptable substitute
for rcu_read_lock(). Code that attempts to use preemption
disabling where it should be using rcu_read_lock() will break
in real-time kernel builds.

If you want to wait for interrupt handlers, NMI handlers, and
code under the influence of preempt_disable(), you instead
need to use synchronize_irq() or synchronize_sched().

12. Any lock acquired by an RCU callback must be acquired elsewhere
with softirq disabled, e.g., via spin_lock_irqsave(),
spin_lock_bh(), etc. Failing to disable irq on a given
acquisition of that lock will result in deadlock as soon as the
RCU callback happens to interrupt that acquisition's critical
section.
acquisition of that lock will result in deadlock as soon as
the RCU softirq handler happens to run your RCU callback while
interrupting that acquisition's critical section.

13. RCU callbacks can be and are executed in parallel. In many cases,
the callback code is simply a wrapper around kfree(), so that this
@@ -265,29 +308,30 @@ over a rather long period of time, but improvements are always welcome!
not the case, a self-spawning RCU callback would prevent the
victim CPU from ever going offline.)

14. SRCU (srcu_read_lock(), srcu_read_unlock(), and synchronize_srcu())
may only be invoked from process context. Unlike other forms of
RCU, it -is- permissible to block in an SRCU read-side critical
section (demarked by srcu_read_lock() and srcu_read_unlock()),
hence the "SRCU": "sleepable RCU". Please note that if you
don't need to sleep in read-side critical sections, you should
be using RCU rather than SRCU, because RCU is almost always
faster and easier to use than is SRCU.
14. SRCU (srcu_read_lock(), srcu_read_unlock(), srcu_dereference(),
synchronize_srcu(), and synchronize_srcu_expedited()) may only
be invoked from process context. Unlike other forms of RCU, it
-is- permissible to block in an SRCU read-side critical section
(demarked by srcu_read_lock() and srcu_read_unlock()), hence the
"SRCU": "sleepable RCU". Please note that if you don't need
to sleep in read-side critical sections, you should be using
RCU rather than SRCU, because RCU is almost always faster and
easier to use than is SRCU.

Also unlike other forms of RCU, explicit initialization
and cleanup is required via init_srcu_struct() and
cleanup_srcu_struct(). These are passed a "struct srcu_struct"
that defines the scope of a given SRCU domain. Once initialized,
the srcu_struct is passed to srcu_read_lock(), srcu_read_unlock()
and synchronize_srcu(). A given synchronize_srcu() waits only
for SRCU read-side critical sections governed by srcu_read_lock()
and srcu_read_unlock() calls that have been passd the same
srcu_struct. This property is what makes sleeping read-side
critical sections tolerable -- a given subsystem delays only
its own updates, not those of other subsystems using SRCU.
Therefore, SRCU is less prone to OOM the system than RCU would
be if RCU's read-side critical sections were permitted to
sleep.
synchronize_srcu(), and synchronize_srcu_expedited(). A given
synchronize_srcu() waits only for SRCU read-side critical
sections governed by srcu_read_lock() and srcu_read_unlock()
calls that have been passed the same srcu_struct. This property
is what makes sleeping read-side critical sections tolerable --
a given subsystem delays only its own updates, not those of other
subsystems using SRCU. Therefore, SRCU is less prone to OOM the
system than RCU would be if RCU's read-side critical sections
were permitted to sleep.

The ability to sleep in read-side critical sections does not
come for free. First, corresponding srcu_read_lock() and
@@ -311,12 +355,12 @@ over a rather long period of time, but improvements are always welcome!
destructive operation, and -only- -then- invoke call_rcu(),
synchronize_rcu(), or friends.

Because these primitives only wait for pre-existing readers,
it is the caller's responsibility to guarantee safety to
any subsequent readers.
Because these primitives only wait for pre-existing readers, it
is the caller's responsibility to guarantee that any subsequent
readers will execute safely.

16. The various RCU read-side primitives do -not- contain memory
barriers. The CPU (and in some cases, the compiler) is free
to reorder code into and out of RCU read-side critical sections.
It is the responsibility of the RCU update-side primitives to
deal with this.
16. The various RCU read-side primitives do -not- necessarily contain
memory barriers. You should therefore plan for the CPU
and the compiler to freely reorder code into and out of RCU
read-side critical sections. It is the responsibility of the
RCU update-side primitives to deal with this.
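
The ordering rule in checklist item 4 (initialize a structure fully, and only then publicize a pointer to it) can be illustrated outside the kernel. The following is a minimal userspace sketch, assuming C11 atomics as stand-ins: publish_config() plays the role of rcu_assign_pointer() and subscribe_config() the role of rcu_dereference(). The struct config type and every function name here are hypothetical, and the sketch deliberately omits grace periods, so the caller must not free the old structure while readers might still hold it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical payload type, for illustration only. */
struct config {
	int timeout;
	int retries;
};

/* Stands in for an RCU-protected global pointer; starts out NULL. */
static _Atomic(struct config *) global_cfg;

/* Stand-in for rcu_assign_pointer(): a release store orders the
 * initialization of *newc before the pointer becomes visible. */
static void publish_config(struct config *newc)
{
	atomic_store_explicit(&global_cfg, newc, memory_order_release);
}

/* Stand-in for rcu_dereference(): an acquire load, which is stronger
 * than the dependency ordering the kernel primitive relies on. */
static struct config *subscribe_config(void)
{
	return atomic_load_explicit(&global_cfg, memory_order_acquire);
}

/* Updater: initialize first, publish second.  On success returns 0 and
 * stores the previous pointer in *oldp; this sketch assumes a single
 * updater, so no update-side lock is shown. */
static int update_config(int timeout, int retries, struct config **oldp)
{
	struct config *newc = malloc(sizeof(*newc));

	if (!newc)
		return -1;
	newc->timeout = timeout;
	newc->retries = retries;
	*oldp = subscribe_config();
	publish_config(newc);
	return 0;
}

/* Reader: fetch the pointer once, then use only that snapshot. */
static int read_timeout(void)
{
	struct config *c = subscribe_config();

	return c ? c->timeout : -1;
}
```

In kernel code the old structure would be handed to call_rcu() or freed after synchronize_rcu(); the sketch leaves that step to the caller because it has no notion of a grace period.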
@@ -0,0 +1,67 @@
RCU and lockdep checking

All flavors of RCU have lockdep checking available, so that lockdep is
aware of when each task enters and leaves any flavor of RCU read-side
critical section. Each flavor of RCU is tracked separately (but note
that this is not the case in 2.6.32 and earlier). This allows lockdep's
tracking to include RCU state, which can sometimes help when debugging
deadlocks and the like.

In addition, RCU provides the following primitives that check lockdep's
state:

rcu_read_lock_held() for normal RCU.
rcu_read_lock_bh_held() for RCU-bh.
rcu_read_lock_sched_held() for RCU-sched.
srcu_read_lock_held() for SRCU.

These functions are conservative, and will therefore return 1 if they
aren't certain (for example, if CONFIG_DEBUG_LOCK_ALLOC is not set).
This prevents things like WARN_ON(!rcu_read_lock_held()) from giving false
positives when lockdep is disabled.
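
The "conservative" behavior described above can be sketched in userspace C. This is only an illustration under stated assumptions: tracking_enabled is a hypothetical runtime flag standing in for CONFIG_DEBUG_LOCK_ALLOC, the my_read_*() names are made up, and a single global nesting counter replaces lockdep's per-task state, so the sketch is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for CONFIG_DEBUG_LOCK_ALLOC. */
static bool tracking_enabled;

/* Read-side nesting depth; only maintained while tracking is enabled. */
static int read_nesting;

static void my_read_lock(void)
{
	if (tracking_enabled)
		read_nesting++;
}

static void my_read_unlock(void)
{
	if (tracking_enabled)
		read_nesting--;
}

/* Conservative query: when no state is being tracked, the answer is
 * unknown, so report "held" rather than risk a false warning. */
static int my_read_lock_held(void)
{
	if (!tracking_enabled)
		return 1;
	return read_nesting > 0;
}
```

With tracking off, a check of the form "warn if !my_read_lock_held()" can never fire, which mirrors why WARN_ON(!rcu_read_lock_held()) stays quiet when lockdep is disabled.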
In addition, a separate kernel config parameter CONFIG_PROVE_RCU enables
checking of rcu_dereference() primitives:

rcu_dereference(p):
Check for RCU read-side critical section.
rcu_dereference_bh(p):
Check for RCU-bh read-side critical section.
rcu_dereference_sched(p):
Check for RCU-sched read-side critical section.
srcu_dereference(p, sp):
Check for SRCU read-side critical section.
rcu_dereference_check(p, c):
Use explicit check expression "c".
rcu_dereference_raw(p):
Don't check. (Use sparingly, if at all.)

The rcu_dereference_check() check expression can be any boolean
expression, but would normally include one of the rcu_read_lock_held()
family of functions and a lockdep expression. However, any boolean
expression can be used. For a moderately ornate example, consider
the following:

	file = rcu_dereference_check(fdt->fd[fd],
				     rcu_read_lock_held() ||
				     lockdep_is_held(&files->file_lock) ||
				     atomic_read(&files->count) == 1);

This expression picks up the pointer "fdt->fd[fd]" in an RCU-safe manner,
and, if CONFIG_PROVE_RCU is configured, verifies that this expression
is used in:

1. An RCU read-side critical section, or
2. with files->file_lock held, or
3. on an unshared files_struct.

In case (1), the pointer is picked up in an RCU-safe manner for vanilla
RCU read-side critical sections, in case (2) the ->file_lock prevents
any change from taking place, and finally, in case (3) the current task
is the only task accessing the files_struct, again preventing any change
from taking place.
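
The shape of such a check can be mimicked in userspace. The following is a rough sketch, not the kernel implementation: deref_check() is a hypothetical macro that records a warning when its condition is false and then yields the pointer, and struct files_model is an invented model whose three fields stand in for rcu_read_lock_held(), lockdep_is_held(&files->file_lock), and atomic_read(&files->count). The memory-ordering half of rcu_dereference_check() is intentionally left out.

```c
#include <assert.h>
#include <stdio.h>

/* Number of dereferences whose check expression evaluated to false. */
static int check_failures;

static void note_check(int ok, const char *expr)
{
	if (!ok) {
		check_failures++;
		fprintf(stderr, "suspicious dereference: !(%s)\n", expr);
	}
}

/* Hypothetical analog of rcu_dereference_check(p, c): evaluate the
 * condition once, complain if it is false, then yield the pointer. */
#define deref_check(p, c) (note_check((c), #c), (p))

/* Invented model of the state the three-way condition inspects. */
struct files_model {
	int reader_nesting;	/* > 0 models a read-side critical section */
	int lock_held;		/* models files->file_lock being held */
	int count;		/* models the files_struct reference count */
	int *fd_slot;		/* models the protected pointer fdt->fd[fd] */
};

static int *lookup_fd(struct files_model *f)
{
	return deref_check(f->fd_slot,
			   f->reader_nesting > 0 ||
			   f->lock_held ||
			   f->count == 1);
}
```

Each of the three disjuncts is sufficient on its own, so the sketch warns only when the caller is outside a read-side critical section, does not hold the lock, and shares the structure with others.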
There are currently only "universal" versions of the rcu_assign_pointer()
and RCU list-/tree-traversal primitives, which do not (yet) check for
being in an RCU read-side critical section. In the future, separate
versions of these primitives might be created.