You've already forked linux-apfs
mirror of
https://github.com/linux-apfs/linux-apfs.git
synced 2026-05-01 15:00:59 -07:00
merge linus into release branch
Conflicts: drivers/acpi/acpi_memhotplug.c
This commit is contained in:
@@ -3401,10 +3401,10 @@ S: Czech Republic
|
||||
|
||||
N: Thibaut Varene
|
||||
E: T-Bone@parisc-linux.org
|
||||
W: http://www.parisc-linux.org/
|
||||
W: http://www.parisc-linux.org/~varenet/
|
||||
P: 1024D/B7D2F063 E67C 0D43 A75E 12A5 BB1C FA2F 1E32 C3DA B7D2 F063
|
||||
D: PA-RISC port minion, PDC and GSCPS2 drivers, debuglocks and other bits
|
||||
D: Some bits in an ARM port, S1D13XXX FB driver, random patches here and there
|
||||
D: Some ARM at91rm9200 bits, S1D13XXX FB driver, random patches here and there
|
||||
D: AD1889 sound driver
|
||||
S: Paris, France
|
||||
|
||||
|
||||
@@ -181,8 +181,8 @@ Intel IA32 microcode
|
||||
--------------------
|
||||
|
||||
A driver has been added to allow updating of Intel IA32 microcode,
|
||||
accessible as both a devfs regular file and as a normal (misc)
|
||||
character device. If you are not using devfs you may need to:
|
||||
accessible as a normal (misc) character device. If you are not using
|
||||
udev you may need to:
|
||||
|
||||
mkdir /dev/cpu
|
||||
mknod /dev/cpu/microcode c 10 184
|
||||
@@ -201,7 +201,9 @@ with programs using shared memory.
|
||||
udev
|
||||
----
|
||||
udev is a userspace application for populating /dev dynamically with
|
||||
only entries for devices actually present. udev replaces devfs.
|
||||
only entries for devices actually present. udev replaces the basic
|
||||
functionality of devfs, while allowing persistant device naming for
|
||||
devices.
|
||||
|
||||
FUSE
|
||||
----
|
||||
@@ -231,18 +233,13 @@ The PPP driver has been restructured to support multilink and to
|
||||
enable it to operate over diverse media layers. If you use PPP,
|
||||
upgrade pppd to at least 2.4.0.
|
||||
|
||||
If you are not using devfs, you must have the device file /dev/ppp
|
||||
If you are not using udev, you must have the device file /dev/ppp
|
||||
which can be made by:
|
||||
|
||||
mknod /dev/ppp c 108 0
|
||||
|
||||
as root.
|
||||
|
||||
If you use devfsd and build ppp support as modules, you will need
|
||||
the following in your /etc/devfsd.conf file:
|
||||
|
||||
LOOKUP PPP MODLOAD
|
||||
|
||||
Isdn4k-utils
|
||||
------------
|
||||
|
||||
|
||||
@@ -10,7 +10,8 @@ DOCBOOKS := wanbook.xml z8530book.xml mcabook.xml videobook.xml \
|
||||
kernel-hacking.xml kernel-locking.xml deviceiobook.xml \
|
||||
procfs-guide.xml writing_usb_driver.xml \
|
||||
kernel-api.xml journal-api.xml lsm.xml usb.xml \
|
||||
gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml
|
||||
gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml \
|
||||
genericirq.xml
|
||||
|
||||
###
|
||||
# The build process is as follows (targets):
|
||||
|
||||
@@ -0,0 +1,474 @@
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
|
||||
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
|
||||
|
||||
<book id="Generic-IRQ-Guide">
|
||||
<bookinfo>
|
||||
<title>Linux generic IRQ handling</title>
|
||||
|
||||
<authorgroup>
|
||||
<author>
|
||||
<firstname>Thomas</firstname>
|
||||
<surname>Gleixner</surname>
|
||||
<affiliation>
|
||||
<address>
|
||||
<email>tglx@linutronix.de</email>
|
||||
</address>
|
||||
</affiliation>
|
||||
</author>
|
||||
<author>
|
||||
<firstname>Ingo</firstname>
|
||||
<surname>Molnar</surname>
|
||||
<affiliation>
|
||||
<address>
|
||||
<email>mingo@elte.hu</email>
|
||||
</address>
|
||||
</affiliation>
|
||||
</author>
|
||||
</authorgroup>
|
||||
|
||||
<copyright>
|
||||
<year>2005-2006</year>
|
||||
<holder>Thomas Gleixner</holder>
|
||||
</copyright>
|
||||
<copyright>
|
||||
<year>2005-2006</year>
|
||||
<holder>Ingo Molnar</holder>
|
||||
</copyright>
|
||||
|
||||
<legalnotice>
|
||||
<para>
|
||||
This documentation is free software; you can redistribute
|
||||
it and/or modify it under the terms of the GNU General Public
|
||||
License version 2 as published by the Free Software Foundation.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
This program is distributed in the hope that it will be
|
||||
useful, but WITHOUT ANY WARRANTY; without even the implied
|
||||
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
|
||||
See the GNU General Public License for more details.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
You should have received a copy of the GNU General Public
|
||||
License along with this program; if not, write to the Free
|
||||
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
|
||||
MA 02111-1307 USA
|
||||
</para>
|
||||
|
||||
<para>
|
||||
For more details see the file COPYING in the source
|
||||
distribution of Linux.
|
||||
</para>
|
||||
</legalnotice>
|
||||
</bookinfo>
|
||||
|
||||
<toc></toc>
|
||||
|
||||
<chapter id="intro">
|
||||
<title>Introduction</title>
|
||||
<para>
|
||||
The generic interrupt handling layer is designed to provide a
|
||||
complete abstraction of interrupt handling for device drivers.
|
||||
It is able to handle all the different types of interrupt controller
|
||||
hardware. Device drivers use generic API functions to request, enable,
|
||||
disable and free interrupts. The drivers do not have to know anything
|
||||
about interrupt hardware details, so they can be used on different
|
||||
platforms without code changes.
|
||||
</para>
|
||||
<para>
|
||||
This documentation is provided to developers who want to implement
|
||||
an interrupt subsystem based for their architecture, with the help
|
||||
of the generic IRQ handling layer.
|
||||
</para>
|
||||
</chapter>
|
||||
|
||||
<chapter id="rationale">
|
||||
<title>Rationale</title>
|
||||
<para>
|
||||
The original implementation of interrupt handling in Linux is using
|
||||
the __do_IRQ() super-handler, which is able to deal with every
|
||||
type of interrupt logic.
|
||||
</para>
|
||||
<para>
|
||||
Originally, Russell King identified different types of handlers to
|
||||
build a quite universal set for the ARM interrupt handler
|
||||
implementation in Linux 2.5/2.6. He distinguished between:
|
||||
<itemizedlist>
|
||||
<listitem><para>Level type</para></listitem>
|
||||
<listitem><para>Edge type</para></listitem>
|
||||
<listitem><para>Simple type</para></listitem>
|
||||
</itemizedlist>
|
||||
In the SMP world of the __do_IRQ() super-handler another type
|
||||
was identified:
|
||||
<itemizedlist>
|
||||
<listitem><para>Per CPU type</para></listitem>
|
||||
</itemizedlist>
|
||||
</para>
|
||||
<para>
|
||||
This split implementation of highlevel IRQ handlers allows us to
|
||||
optimize the flow of the interrupt handling for each specific
|
||||
interrupt type. This reduces complexity in that particular codepath
|
||||
and allows the optimized handling of a given type.
|
||||
</para>
|
||||
<para>
|
||||
The original general IRQ implementation used hw_interrupt_type
|
||||
structures and their ->ack(), ->end() [etc.] callbacks to
|
||||
differentiate the flow control in the super-handler. This leads to
|
||||
a mix of flow logic and lowlevel hardware logic, and it also leads
|
||||
to unnecessary code duplication: for example in i386, there is a
|
||||
ioapic_level_irq and a ioapic_edge_irq irq-type which share many
|
||||
of the lowlevel details but have different flow handling.
|
||||
</para>
|
||||
<para>
|
||||
A more natural abstraction is the clean separation of the
|
||||
'irq flow' and the 'chip details'.
|
||||
</para>
|
||||
<para>
|
||||
Analysing a couple of architecture's IRQ subsystem implementations
|
||||
reveals that most of them can use a generic set of 'irq flow'
|
||||
methods and only need to add the chip level specific code.
|
||||
The separation is also valuable for (sub)architectures
|
||||
which need specific quirks in the irq flow itself but not in the
|
||||
chip-details - and thus provides a more transparent IRQ subsystem
|
||||
design.
|
||||
</para>
|
||||
<para>
|
||||
Each interrupt descriptor is assigned its own highlevel flow
|
||||
handler, which is normally one of the generic
|
||||
implementations. (This highlevel flow handler implementation also
|
||||
makes it simple to provide demultiplexing handlers which can be
|
||||
found in embedded platforms on various architectures.)
|
||||
</para>
|
||||
<para>
|
||||
The separation makes the generic interrupt handling layer more
|
||||
flexible and extensible. For example, an (sub)architecture can
|
||||
use a generic irq-flow implementation for 'level type' interrupts
|
||||
and add a (sub)architecture specific 'edge type' implementation.
|
||||
</para>
|
||||
<para>
|
||||
To make the transition to the new model easier and prevent the
|
||||
breakage of existing implementations, the __do_IRQ() super-handler
|
||||
is still available. This leads to a kind of duality for the time
|
||||
being. Over time the new model should be used in more and more
|
||||
architectures, as it enables smaller and cleaner IRQ subsystems.
|
||||
</para>
|
||||
</chapter>
|
||||
<chapter id="bugs">
|
||||
<title>Known Bugs And Assumptions</title>
|
||||
<para>
|
||||
None (knock on wood).
|
||||
</para>
|
||||
</chapter>
|
||||
|
||||
<chapter id="Abstraction">
|
||||
<title>Abstraction layers</title>
|
||||
<para>
|
||||
There are three main levels of abstraction in the interrupt code:
|
||||
<orderedlist>
|
||||
<listitem><para>Highlevel driver API</para></listitem>
|
||||
<listitem><para>Highlevel IRQ flow handlers</para></listitem>
|
||||
<listitem><para>Chiplevel hardware encapsulation</para></listitem>
|
||||
</orderedlist>
|
||||
</para>
|
||||
<sect1>
|
||||
<title>Interrupt control flow</title>
|
||||
<para>
|
||||
Each interrupt is described by an interrupt descriptor structure
|
||||
irq_desc. The interrupt is referenced by an 'unsigned int' numeric
|
||||
value which selects the corresponding interrupt decription structure
|
||||
in the descriptor structures array.
|
||||
The descriptor structure contains status information and pointers
|
||||
to the interrupt flow method and the interrupt chip structure
|
||||
which are assigned to this interrupt.
|
||||
</para>
|
||||
<para>
|
||||
Whenever an interrupt triggers, the lowlevel arch code calls into
|
||||
the generic interrupt code by calling desc->handle_irq().
|
||||
This highlevel IRQ handling function only uses desc->chip primitives
|
||||
referenced by the assigned chip descriptor structure.
|
||||
</para>
|
||||
</sect1>
|
||||
<sect1>
|
||||
<title>Highlevel Driver API</title>
|
||||
<para>
|
||||
The highlevel Driver API consists of following functions:
|
||||
<itemizedlist>
|
||||
<listitem><para>request_irq()</para></listitem>
|
||||
<listitem><para>free_irq()</para></listitem>
|
||||
<listitem><para>disable_irq()</para></listitem>
|
||||
<listitem><para>enable_irq()</para></listitem>
|
||||
<listitem><para>disable_irq_nosync() (SMP only)</para></listitem>
|
||||
<listitem><para>synchronize_irq() (SMP only)</para></listitem>
|
||||
<listitem><para>set_irq_type()</para></listitem>
|
||||
<listitem><para>set_irq_wake()</para></listitem>
|
||||
<listitem><para>set_irq_data()</para></listitem>
|
||||
<listitem><para>set_irq_chip()</para></listitem>
|
||||
<listitem><para>set_irq_chip_data()</para></listitem>
|
||||
</itemizedlist>
|
||||
See the autogenerated function documentation for details.
|
||||
</para>
|
||||
</sect1>
|
||||
<sect1>
|
||||
<title>Highlevel IRQ flow handlers</title>
|
||||
<para>
|
||||
The generic layer provides a set of pre-defined irq-flow methods:
|
||||
<itemizedlist>
|
||||
<listitem><para>handle_level_irq</para></listitem>
|
||||
<listitem><para>handle_edge_irq</para></listitem>
|
||||
<listitem><para>handle_simple_irq</para></listitem>
|
||||
<listitem><para>handle_percpu_irq</para></listitem>
|
||||
</itemizedlist>
|
||||
The interrupt flow handlers (either predefined or architecture
|
||||
specific) are assigned to specific interrupts by the architecture
|
||||
either during bootup or during device initialization.
|
||||
</para>
|
||||
<sect2>
|
||||
<title>Default flow implementations</title>
|
||||
<sect3>
|
||||
<title>Helper functions</title>
|
||||
<para>
|
||||
The helper functions call the chip primitives and
|
||||
are used by the default flow implementations.
|
||||
The following helper functions are implemented (simplified excerpt):
|
||||
<programlisting>
|
||||
default_enable(irq)
|
||||
{
|
||||
desc->chip->unmask(irq);
|
||||
}
|
||||
|
||||
default_disable(irq)
|
||||
{
|
||||
if (!delay_disable(irq))
|
||||
desc->chip->mask(irq);
|
||||
}
|
||||
|
||||
default_ack(irq)
|
||||
{
|
||||
chip->ack(irq);
|
||||
}
|
||||
|
||||
default_mask_ack(irq)
|
||||
{
|
||||
if (chip->mask_ack) {
|
||||
chip->mask_ack(irq);
|
||||
} else {
|
||||
chip->mask(irq);
|
||||
chip->ack(irq);
|
||||
}
|
||||
}
|
||||
|
||||
noop(irq)
|
||||
{
|
||||
}
|
||||
|
||||
</programlisting>
|
||||
</para>
|
||||
</sect3>
|
||||
</sect2>
|
||||
<sect2>
|
||||
<title>Default flow handler implementations</title>
|
||||
<sect3>
|
||||
<title>Default Level IRQ flow handler</title>
|
||||
<para>
|
||||
handle_level_irq provides a generic implementation
|
||||
for level-triggered interrupts.
|
||||
</para>
|
||||
<para>
|
||||
The following control flow is implemented (simplified excerpt):
|
||||
<programlisting>
|
||||
desc->chip->start();
|
||||
handle_IRQ_event(desc->action);
|
||||
desc->chip->end();
|
||||
</programlisting>
|
||||
</para>
|
||||
</sect3>
|
||||
<sect3>
|
||||
<title>Default Edge IRQ flow handler</title>
|
||||
<para>
|
||||
handle_edge_irq provides a generic implementation
|
||||
for edge-triggered interrupts.
|
||||
</para>
|
||||
<para>
|
||||
The following control flow is implemented (simplified excerpt):
|
||||
<programlisting>
|
||||
if (desc->status & running) {
|
||||
desc->chip->hold();
|
||||
desc->status |= pending | masked;
|
||||
return;
|
||||
}
|
||||
desc->chip->start();
|
||||
desc->status |= running;
|
||||
do {
|
||||
if (desc->status & masked)
|
||||
desc->chip->enable();
|
||||
desc-status &= ~pending;
|
||||
handle_IRQ_event(desc->action);
|
||||
} while (status & pending);
|
||||
desc-status &= ~running;
|
||||
desc->chip->end();
|
||||
</programlisting>
|
||||
</para>
|
||||
</sect3>
|
||||
<sect3>
|
||||
<title>Default simple IRQ flow handler</title>
|
||||
<para>
|
||||
handle_simple_irq provides a generic implementation
|
||||
for simple interrupts.
|
||||
</para>
|
||||
<para>
|
||||
Note: The simple flow handler does not call any
|
||||
handler/chip primitives.
|
||||
</para>
|
||||
<para>
|
||||
The following control flow is implemented (simplified excerpt):
|
||||
<programlisting>
|
||||
handle_IRQ_event(desc->action);
|
||||
</programlisting>
|
||||
</para>
|
||||
</sect3>
|
||||
<sect3>
|
||||
<title>Default per CPU flow handler</title>
|
||||
<para>
|
||||
handle_percpu_irq provides a generic implementation
|
||||
for per CPU interrupts.
|
||||
</para>
|
||||
<para>
|
||||
Per CPU interrupts are only available on SMP and
|
||||
the handler provides a simplified version without
|
||||
locking.
|
||||
</para>
|
||||
<para>
|
||||
The following control flow is implemented (simplified excerpt):
|
||||
<programlisting>
|
||||
desc->chip->start();
|
||||
handle_IRQ_event(desc->action);
|
||||
desc->chip->end();
|
||||
</programlisting>
|
||||
</para>
|
||||
</sect3>
|
||||
</sect2>
|
||||
<sect2>
|
||||
<title>Quirks and optimizations</title>
|
||||
<para>
|
||||
The generic functions are intended for 'clean' architectures and chips,
|
||||
which have no platform-specific IRQ handling quirks. If an architecture
|
||||
needs to implement quirks on the 'flow' level then it can do so by
|
||||
overriding the highlevel irq-flow handler.
|
||||
</para>
|
||||
</sect2>
|
||||
<sect2>
|
||||
<title>Delayed interrupt disable</title>
|
||||
<para>
|
||||
This per interrupt selectable feature, which was introduced by Russell
|
||||
King in the ARM interrupt implementation, does not mask an interrupt
|
||||
at the hardware level when disable_irq() is called. The interrupt is
|
||||
kept enabled and is masked in the flow handler when an interrupt event
|
||||
happens. This prevents losing edge interrupts on hardware which does
|
||||
not store an edge interrupt event while the interrupt is disabled at
|
||||
the hardware level. When an interrupt arrives while the IRQ_DISABLED
|
||||
flag is set, then the interrupt is masked at the hardware level and
|
||||
the IRQ_PENDING bit is set. When the interrupt is re-enabled by
|
||||
enable_irq() the pending bit is checked and if it is set, the
|
||||
interrupt is resent either via hardware or by a software resend
|
||||
mechanism. (It's necessary to enable CONFIG_HARDIRQS_SW_RESEND when
|
||||
you want to use the delayed interrupt disable feature and your
|
||||
hardware is not capable of retriggering an interrupt.)
|
||||
The delayed interrupt disable can be runtime enabled, per interrupt,
|
||||
by setting the IRQ_DELAYED_DISABLE flag in the irq_desc status field.
|
||||
</para>
|
||||
</sect2>
|
||||
</sect1>
|
||||
<sect1>
|
||||
<title>Chiplevel hardware encapsulation</title>
|
||||
<para>
|
||||
The chip level hardware descriptor structure irq_chip
|
||||
contains all the direct chip relevant functions, which
|
||||
can be utilized by the irq flow implementations.
|
||||
<itemizedlist>
|
||||
<listitem><para>ack()</para></listitem>
|
||||
<listitem><para>mask_ack() - Optional, recommended for performance</para></listitem>
|
||||
<listitem><para>mask()</para></listitem>
|
||||
<listitem><para>unmask()</para></listitem>
|
||||
<listitem><para>retrigger() - Optional</para></listitem>
|
||||
<listitem><para>set_type() - Optional</para></listitem>
|
||||
<listitem><para>set_wake() - Optional</para></listitem>
|
||||
</itemizedlist>
|
||||
These primitives are strictly intended to mean what they say: ack means
|
||||
ACK, masking means masking of an IRQ line, etc. It is up to the flow
|
||||
handler(s) to use these basic units of lowlevel functionality.
|
||||
</para>
|
||||
</sect1>
|
||||
</chapter>
|
||||
|
||||
<chapter id="doirq">
|
||||
<title>__do_IRQ entry point</title>
|
||||
<para>
|
||||
The original implementation __do_IRQ() is an alternative entry
|
||||
point for all types of interrupts.
|
||||
</para>
|
||||
<para>
|
||||
This handler turned out to be not suitable for all
|
||||
interrupt hardware and was therefore reimplemented with split
|
||||
functionality for egde/level/simple/percpu interrupts. This is not
|
||||
only a functional optimization. It also shortens code paths for
|
||||
interrupts.
|
||||
</para>
|
||||
<para>
|
||||
To make use of the split implementation, replace the call to
|
||||
__do_IRQ by a call to desc->chip->handle_irq() and associate
|
||||
the appropriate handler function to desc->chip->handle_irq().
|
||||
In most cases the generic handler implementations should
|
||||
be sufficient.
|
||||
</para>
|
||||
</chapter>
|
||||
|
||||
<chapter id="locking">
|
||||
<title>Locking on SMP</title>
|
||||
<para>
|
||||
The locking of chip registers is up to the architecture that
|
||||
defines the chip primitives. There is a chip->lock field that can be used
|
||||
for serialization, but the generic layer does not touch it. The per-irq
|
||||
structure is protected via desc->lock, by the generic layer.
|
||||
</para>
|
||||
</chapter>
|
||||
<chapter id="structs">
|
||||
<title>Structures</title>
|
||||
<para>
|
||||
This chapter contains the autogenerated documentation of the structures which are
|
||||
used in the generic IRQ layer.
|
||||
</para>
|
||||
!Iinclude/linux/irq.h
|
||||
</chapter>
|
||||
|
||||
<chapter id="pubfunctions">
|
||||
<title>Public Functions Provided</title>
|
||||
<para>
|
||||
This chapter contains the autogenerated documentation of the kernel API functions
|
||||
which are exported.
|
||||
</para>
|
||||
!Ekernel/irq/manage.c
|
||||
!Ekernel/irq/chip.c
|
||||
</chapter>
|
||||
|
||||
<chapter id="intfunctions">
|
||||
<title>Internal Functions Provided</title>
|
||||
<para>
|
||||
This chapter contains the autogenerated documentation of the internal functions.
|
||||
</para>
|
||||
!Ikernel/irq/handle.c
|
||||
!Ikernel/irq/chip.c
|
||||
</chapter>
|
||||
|
||||
<chapter id="credits">
|
||||
<title>Credits</title>
|
||||
<para>
|
||||
The following people have contributed to this document:
|
||||
<orderedlist>
|
||||
<listitem><para>Thomas Gleixner<email>tglx@linutronix.de</email></para></listitem>
|
||||
<listitem><para>Ingo Molnar<email>mingo@elte.hu</email></para></listitem>
|
||||
</orderedlist>
|
||||
</para>
|
||||
</chapter>
|
||||
</book>
|
||||
@@ -348,11 +348,6 @@ X!Earch/i386/kernel/mca.c
|
||||
</sect1>
|
||||
</chapter>
|
||||
|
||||
<chapter id="devfs">
|
||||
<title>The Device File System</title>
|
||||
!Efs/devfs/base.c
|
||||
</chapter>
|
||||
|
||||
<chapter id="sysfs">
|
||||
<title>The Filesystem for Exporting Kernel Objects</title>
|
||||
!Efs/sysfs/file.c
|
||||
|
||||
@@ -0,0 +1,22 @@
|
||||
What is an IRQ?
|
||||
|
||||
An IRQ is an interrupt request from a device.
|
||||
Currently they can come in over a pin, or over a packet.
|
||||
Several devices may be connected to the same pin thus
|
||||
sharing an IRQ.
|
||||
|
||||
An IRQ number is a kernel identifier used to talk about a hardware
|
||||
interrupt source. Typically this is an index into the global irq_desc
|
||||
array, but except for what linux/interrupt.h implements the details
|
||||
are architecture specific.
|
||||
|
||||
An IRQ number is an enumeration of the possible interrupt sources on a
|
||||
machine. Typically what is enumerated is the number of input pins on
|
||||
all of the interrupt controller in the system. In the case of ISA
|
||||
what is enumerated are the 16 input pins on the two i8259 interrupt
|
||||
controllers.
|
||||
|
||||
Architectures can assign additional meaning to the IRQ numbers, and
|
||||
are encouraged to in the case where there is any manual configuration
|
||||
of the hardware involved. The ISA IRQs are a classic example of
|
||||
assigning this kind of additional meaning.
|
||||
@@ -7,7 +7,7 @@ The CONFIG_RCU_TORTURE_TEST config option is available for all RCU
|
||||
implementations. It creates an rcutorture kernel module that can
|
||||
be loaded to run a torture test. The test periodically outputs
|
||||
status messages via printk(), which can be examined via the dmesg
|
||||
command (perhaps grepping for "rcutorture"). The test is started
|
||||
command (perhaps grepping for "torture"). The test is started
|
||||
when the module is loaded, and stops when the module is unloaded.
|
||||
|
||||
However, actually setting this config option to "y" results in the system
|
||||
@@ -35,6 +35,19 @@ stat_interval The number of seconds between output of torture
|
||||
be printed -only- when the module is unloaded, and this
|
||||
is the default.
|
||||
|
||||
shuffle_interval
|
||||
The number of seconds to keep the test threads affinitied
|
||||
to a particular subset of the CPUs. Used in conjunction
|
||||
with test_no_idle_hz.
|
||||
|
||||
test_no_idle_hz Whether or not to test the ability of RCU to operate in
|
||||
a kernel that disables the scheduling-clock interrupt to
|
||||
idle CPUs. Boolean parameter, "1" to test, "0" otherwise.
|
||||
|
||||
torture_type The type of RCU to test: "rcu" for the rcu_read_lock()
|
||||
API, "rcu_bh" for the rcu_read_lock_bh() API, and "srcu"
|
||||
for the "srcu_read_lock()" API.
|
||||
|
||||
verbose Enable debug printk()s. Default is disabled.
|
||||
|
||||
|
||||
@@ -42,14 +55,14 @@ OUTPUT
|
||||
|
||||
The statistics output is as follows:
|
||||
|
||||
rcutorture: --- Start of test: nreaders=16 stat_interval=0 verbose=0
|
||||
rcutorture: rtc: 0000000000000000 ver: 1916 tfle: 0 rta: 1916 rtaf: 0 rtf: 1915
|
||||
rcutorture: Reader Pipe: 1466408 9747 0 0 0 0 0 0 0 0 0
|
||||
rcutorture: Reader Batch: 1464477 11678 0 0 0 0 0 0 0 0
|
||||
rcutorture: Free-Block Circulation: 1915 1915 1915 1915 1915 1915 1915 1915 1915 1915 0
|
||||
rcutorture: --- End of test
|
||||
rcu-torture: --- Start of test: nreaders=16 stat_interval=0 verbose=0
|
||||
rcu-torture: rtc: 0000000000000000 ver: 1916 tfle: 0 rta: 1916 rtaf: 0 rtf: 1915
|
||||
rcu-torture: Reader Pipe: 1466408 9747 0 0 0 0 0 0 0 0 0
|
||||
rcu-torture: Reader Batch: 1464477 11678 0 0 0 0 0 0 0 0
|
||||
rcu-torture: Free-Block Circulation: 1915 1915 1915 1915 1915 1915 1915 1915 1915 1915 0
|
||||
rcu-torture: --- End of test
|
||||
|
||||
The command "dmesg | grep rcutorture:" will extract this information on
|
||||
The command "dmesg | grep torture:" will extract this information on
|
||||
most systems. On more esoteric configurations, it may be necessary to
|
||||
use other commands to access the output of the printk()s used by
|
||||
the RCU torture test. The printk()s use KERN_ALERT, so they should
|
||||
@@ -115,8 +128,9 @@ The following script may be used to torture RCU:
|
||||
modprobe rcutorture
|
||||
sleep 100
|
||||
rmmod rcutorture
|
||||
dmesg | grep rcutorture:
|
||||
dmesg | grep torture:
|
||||
|
||||
The output can be manually inspected for the error flag of "!!!".
|
||||
One could of course create a more elaborate script that automatically
|
||||
checked for such errors.
|
||||
checked for such errors. The "rmmod" command forces a "SUCCESS" or
|
||||
"FAILURE" indication to be printk()ed.
|
||||
|
||||
@@ -78,9 +78,9 @@ also known as "System Drives", and Drive Groups are also called "Packs". Both
|
||||
terms are in use in the Mylex documentation; I have chosen to standardize on
|
||||
the more generic "Logical Drive" and "Drive Group".
|
||||
|
||||
DAC960 RAID disk devices are named in the style of the Device File System
|
||||
(DEVFS). The device corresponding to Logical Drive D on Controller C is
|
||||
referred to as /dev/rd/cCdD, and the partitions are called /dev/rd/cCdDp1
|
||||
DAC960 RAID disk devices are named in the style of the obsolete Device File
|
||||
System (DEVFS). The device corresponding to Logical Drive D on Controller C
|
||||
is referred to as /dev/rd/cCdD, and the partitions are called /dev/rd/cCdDp1
|
||||
through /dev/rd/cCdDp7. For example, partition 3 of Logical Drive 5 on
|
||||
Controller 2 is referred to as /dev/rd/c2d5p3. Note that unlike with SCSI
|
||||
disks the device names will not change in the event of a disk drive failure.
|
||||
|
||||
@@ -6,17 +6,6 @@ be removed from this file.
|
||||
|
||||
---------------------------
|
||||
|
||||
What: devfs
|
||||
When: July 2005
|
||||
Files: fs/devfs/*, include/linux/devfs_fs*.h and assorted devfs
|
||||
function calls throughout the kernel tree
|
||||
Why: It has been unmaintained for a number of years, has unfixable
|
||||
races, contains a naming policy within the kernel that is
|
||||
against the LSB, and can be replaced by using udev.
|
||||
Who: Greg Kroah-Hartman <greg@kroah.com>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: RAW driver (CONFIG_RAW_DRIVER)
|
||||
When: December 2005
|
||||
Why: declared obsolete since kernel 2.6.3
|
||||
@@ -132,16 +121,6 @@ Who: NeilBrown <neilb@suse.de>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: au1x00_uart driver
|
||||
When: January 2006
|
||||
Why: The 8250 serial driver now has the ability to deal with the differences
|
||||
between the standard 8250 family of UARTs and their slightly strange
|
||||
brother on Alchemy SOCs. The loss of features is not considered an
|
||||
issue.
|
||||
Who: Ralf Baechle <ralf@linux-mips.org>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: eepro100 network driver
|
||||
When: January 2007
|
||||
Why: replaced by the e100 driver
|
||||
@@ -177,6 +156,16 @@ Who: Jean Delvare <khali@linux-fr.org>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: Unused EXPORT_SYMBOL/EXPORT_SYMBOL_GPL exports
|
||||
(temporary transition config option provided until then)
|
||||
The transition config option will also be removed at the same time.
|
||||
When: before 2.6.19
|
||||
Why: Unused symbols are both increasing the size of the kernel binary
|
||||
and are often a sign of "wrong API"
|
||||
Who: Arjan van de Ven <arjan@linux.intel.com>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: remove EXPORT_SYMBOL(tasklist_lock)
|
||||
When: August 2006
|
||||
Files: kernel/fork.c
|
||||
@@ -224,3 +213,47 @@ Why: The interface no longer has any callers left in the kernel. It
|
||||
Who: Nick Piggin <npiggin@suse.de>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: Support for the MIPS EV96100 evaluation board
|
||||
When: September 2006
|
||||
Why: Does no longer build since at least November 15, 2003, apparently
|
||||
no userbase left.
|
||||
Who: Ralf Baechle <ralf@linux-mips.org>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: Support for the Momentum / PMC-Sierra Jaguar ATX evaluation board
|
||||
When: September 2006
|
||||
Why: Does no longer build since quite some time, and was never popular,
|
||||
due to the platform being replaced by successor models. Apparently
|
||||
no user base left. It also is one of the last users of
|
||||
WANT_PAGE_VIRTUAL.
|
||||
Who: Ralf Baechle <ralf@linux-mips.org>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: Support for the Momentum Ocelot, Ocelot 3, Ocelot C and Ocelot G
|
||||
When: September 2006
|
||||
Why: Some do no longer build and apparently there is no user base left
|
||||
for these platforms.
|
||||
Who: Ralf Baechle <ralf@linux-mips.org>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: Support for MIPS Technologies' Altas and SEAD evaluation board
|
||||
When: September 2006
|
||||
Why: Some do no longer build and apparently there is no user base left
|
||||
for these platforms. Hardware out of production since several years.
|
||||
Who: Ralf Baechle <ralf@linux-mips.org>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: Support for the IT8172-based platforms, ITE 8172G and Globespan IVR
|
||||
When: September 2006
|
||||
Why: Code does no longer build since at least 2.6.0, apparently there is
|
||||
no user base left for these platforms. Hardware out of production
|
||||
since several years and hardly a trace of the manufacturer left on
|
||||
the net.
|
||||
Who: Ralf Baechle <ralf@linux-mips.org>
|
||||
|
||||
---------------------------
|
||||
|
||||
File diff suppressed because it is too large
Load Diff
File diff suppressed because it is too large
Load Diff
@@ -1,40 +0,0 @@
|
||||
Device File System (devfs) ToDo List
|
||||
|
||||
Richard Gooch <rgooch@atnf.csiro.au>
|
||||
|
||||
3-JUL-2000
|
||||
|
||||
This is a list of things to be done for better devfs support in the
|
||||
Linux kernel. If you'd like to contribute to the devfs, please have a
|
||||
look at this list for anything that is unallocated. Also, if there are
|
||||
items missing (surely), please contact me so I can add them to the
|
||||
list (preferably with your name attached to them:-).
|
||||
|
||||
|
||||
- >256 ptys
|
||||
Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>
|
||||
|
||||
- Amiga floppy driver (drivers/block/amiflop.c)
|
||||
|
||||
- Atari floppy driver (drivers/block/ataflop.c)
|
||||
|
||||
- SWIM3 (Super Woz Integrated Machine 3) floppy driver (drivers/block/swim3.c)
|
||||
|
||||
- Amiga ZorroII ramdisc driver (drivers/block/z2ram.c)
|
||||
|
||||
- Parallel port ATAPI CD-ROM (drivers/block/paride/pcd.c)
|
||||
|
||||
- Parallel port ATAPI floppy (drivers/block/paride/pf.c)
|
||||
|
||||
- AP1000 block driver (drivers/ap1000/ap.c, drivers/ap1000/ddv.c)
|
||||
|
||||
- Archimedes floppy (drivers/acorn/block/fd1772.c)
|
||||
|
||||
- MFM hard drive (drivers/acorn/block/mfmhd.c)
|
||||
|
||||
- I2O block device (drivers/message/i2o/i2o_block.c)
|
||||
|
||||
- ST-RAM device (arch/m68k/atari/stram.c)
|
||||
|
||||
- Raw devices
|
||||
|
||||
@@ -1,65 +0,0 @@
|
||||
/* -*- auto-fill -*- */
|
||||
|
||||
Device File System (devfs) Boot Options
|
||||
|
||||
Richard Gooch <rgooch@atnf.csiro.au>
|
||||
|
||||
18-AUG-2001
|
||||
|
||||
|
||||
When CONFIG_DEVFS_DEBUG is enabled, you can pass several boot options
|
||||
to the kernel to debug devfs. The boot options are prefixed by
|
||||
"devfs=", and are separated by commas. Spaces are not allowed. The
|
||||
syntax looks like this:
|
||||
|
||||
devfs=<option1>,<option2>,<option3>
|
||||
|
||||
and so on. For example, if you wanted to turn on debugging for module
|
||||
load requests and device registration, you would do:
|
||||
|
||||
devfs=dmod,dreg
|
||||
|
||||
You may prefix "no" to any option. This will invert the option.
|
||||
|
||||
|
||||
Debugging Options
|
||||
=================
|
||||
|
||||
These requires CONFIG_DEVFS_DEBUG to be enabled.
|
||||
Note that all debugging options have 'd' as the first character. By
|
||||
default all options are off. All debugging output is sent to the
|
||||
kernel logs. The debugging options do not take effect until the devfs
|
||||
version message appears (just prior to the root filesystem being
|
||||
mounted).
|
||||
|
||||
These are the options:
|
||||
|
||||
dmod print module load requests to <request_module>
|
||||
|
||||
dreg print device register requests to <devfs_register>
|
||||
|
||||
dunreg print device unregister requests to <devfs_unregister>
|
||||
|
||||
dchange print device change requests to <devfs_set_flags>
|
||||
|
||||
dilookup print inode lookup requests
|
||||
|
||||
diget print VFS inode allocations
|
||||
|
||||
diunlink print inode unlinks
|
||||
|
||||
dichange print inode changes
|
||||
|
||||
dimknod print calls to mknod(2)
|
||||
|
||||
dall some debugging turned on
|
||||
|
||||
|
||||
Other Options
|
||||
=============
|
||||
|
||||
These control the default behaviour of devfs. The options are:
|
||||
|
||||
mount mount devfs onto /dev at boot time
|
||||
|
||||
only disable non-devfs device nodes for devfs-capable drivers
|
||||
@@ -67,8 +67,7 @@ initrd adds the following new options:
|
||||
as the last process has closed it, all data is freed and /dev/initrd
|
||||
can't be opened anymore.
|
||||
|
||||
root=/dev/ram0 (without devfs)
|
||||
root=/dev/rd/0 (with devfs)
|
||||
root=/dev/ram0
|
||||
|
||||
initrd is mounted as root, and the normal boot procedure is followed,
|
||||
with the RAM disk still mounted as root.
|
||||
@@ -90,8 +89,7 @@ you're building an install floppy), the root file system creation
|
||||
procedure should create the /initrd directory.
|
||||
|
||||
If initrd will not be mounted in some cases, its content is still
|
||||
accessible if the following device has been created (note that this
|
||||
does not work if using devfs):
|
||||
accessible if the following device has been created:
|
||||
|
||||
# mknod /dev/initrd b 1 250
|
||||
# chmod 400 /dev/initrd
|
||||
@@ -119,8 +117,7 @@ We'll describe the loopback device method:
|
||||
(if space is critical, you may want to use the Minix FS instead of Ext2)
|
||||
3) mount the file system, e.g.
|
||||
# mount -t ext2 -o loop initrd /mnt
|
||||
4) create the console device (not necessary if using devfs, but it can't
|
||||
hurt to do it anyway):
|
||||
4) create the console device:
|
||||
# mkdir /mnt/dev
|
||||
# mknod /mnt/dev/console c 5 1
|
||||
5) copy all the files that are needed to properly use the initrd
|
||||
@@ -152,12 +149,7 @@ have to be given:
|
||||
|
||||
root=/dev/ram0 init=/linuxrc rw
|
||||
|
||||
if not using devfs, or
|
||||
|
||||
root=/dev/rd/0 init=/linuxrc rw
|
||||
|
||||
if using devfs. (rw is only necessary if writing to the initrd file
|
||||
system.)
|
||||
(rw is only necessary if writing to the initrd file system.)
|
||||
|
||||
With LOADLIN, you simply execute
|
||||
|
||||
@@ -217,9 +209,9 @@ following command:
|
||||
# exec chroot . what-follows <dev/console >dev/console 2>&1
|
||||
|
||||
Where what-follows is a program under the new root, e.g. /sbin/init
|
||||
If the new root file system will be used with devfs and has no valid
|
||||
/dev directory, devfs must be mounted before invoking chroot in order to
|
||||
provide /dev/console.
|
||||
If the new root file system will be used with udev and has no valid
|
||||
/dev directory, udev must be initialized before invoking chroot in order
|
||||
to provide /dev/console.
|
||||
|
||||
Note: implementation details of pivot_root may change with time. In order
|
||||
to ensure compatibility, the following points should be observed:
|
||||
@@ -236,7 +228,7 @@ Now, the initrd can be unmounted and the memory allocated by the RAM
|
||||
disk can be freed:
|
||||
|
||||
# umount /initrd
|
||||
# blockdev --flushbufs /dev/ram0 # /dev/rd/0 if using devfs
|
||||
# blockdev --flushbufs /dev/ram0
|
||||
|
||||
It is also possible to use initrd with an NFS-mounted root, see the
|
||||
pivot_root(8) man page for details.
|
||||
|
||||
@@ -119,7 +119,6 @@ Code Seq# Include File Comments
|
||||
'c' 00-7F linux/comstats.h conflict!
|
||||
'c' 00-7F linux/coda.h conflict!
|
||||
'd' 00-FF linux/char/drm/drm/h conflict!
|
||||
'd' 00-1F linux/devfs_fs.h conflict!
|
||||
'd' 00-DF linux/video_decoder.h conflict!
|
||||
'd' F0-FF linux/digi1.h
|
||||
'e' all linux/digi1.h conflict!
|
||||
|
||||
@@ -35,7 +35,6 @@ parameter is applicable:
|
||||
APM Advanced Power Management support is enabled.
|
||||
AX25 Appropriate AX.25 support is enabled.
|
||||
CD Appropriate CD support is enabled.
|
||||
DEVFS devfs support is enabled.
|
||||
DRM Direct Rendering Management support is enabled.
|
||||
EDD BIOS Enhanced Disk Drive Services (EDD) is enabled
|
||||
EFI EFI Partitioning (GPT) is enabled
|
||||
@@ -440,9 +439,6 @@ running once the system is up.
|
||||
Format: <area>[,<node>]
|
||||
See also Documentation/networking/decnet.txt.
|
||||
|
||||
devfs= [DEVFS]
|
||||
See Documentation/filesystems/devfs/boot-options.
|
||||
|
||||
dhash_entries= [KNL]
|
||||
Set number of hash buckets for dentry cache.
|
||||
|
||||
@@ -1669,6 +1665,10 @@ running once the system is up.
|
||||
usbhid.mousepoll=
|
||||
[USBHID] The interval which mice are to be polled at.
|
||||
|
||||
vdso= [IA-32]
|
||||
vdso=1: enable VDSO (default)
|
||||
vdso=0: disable VDSO mapping
|
||||
|
||||
video= [FB] Frame buffer configuration
|
||||
See Documentation/fb/modedb.txt.
|
||||
|
||||
@@ -1685,9 +1685,14 @@ running once the system is up.
|
||||
decrease the size and leave more room for directly
|
||||
mapped kernel RAM.
|
||||
|
||||
vmhalt= [KNL,S390]
|
||||
vmhalt= [KNL,S390] Perform z/VM CP command after system halt.
|
||||
Format: <command>
|
||||
|
||||
vmpoff= [KNL,S390]
|
||||
vmpanic= [KNL,S390] Perform z/VM CP command after kernel panic.
|
||||
Format: <command>
|
||||
|
||||
vmpoff= [KNL,S390] Perform z/VM CP command after power off.
|
||||
Format: <command>
|
||||
|
||||
waveartist= [HW,OSS]
|
||||
Format: <io>,<irq>,<dma>,<dma2>
|
||||
|
||||
@@ -3,16 +3,23 @@
|
||||
===================
|
||||
|
||||
The key request service is part of the key retention service (refer to
|
||||
Documentation/keys.txt). This document explains more fully how that the
|
||||
requesting algorithm works.
|
||||
Documentation/keys.txt). This document explains more fully how the requesting
|
||||
algorithm works.
|
||||
|
||||
The process starts by either the kernel requesting a service by calling
|
||||
request_key():
|
||||
request_key*():
|
||||
|
||||
struct key *request_key(const struct key_type *type,
|
||||
const char *description,
|
||||
const char *callout_string);
|
||||
|
||||
or:
|
||||
|
||||
struct key *request_key_with_auxdata(const struct key_type *type,
|
||||
const char *description,
|
||||
const char *callout_string,
|
||||
void *aux);
|
||||
|
||||
Or by userspace invoking the request_key system call:
|
||||
|
||||
key_serial_t request_key(const char *type,
|
||||
@@ -20,16 +27,26 @@ Or by userspace invoking the request_key system call:
|
||||
const char *callout_info,
|
||||
key_serial_t dest_keyring);
|
||||
|
||||
The main difference between the two access points is that the in-kernel
|
||||
interface does not need to link the key to a keyring to prevent it from being
|
||||
immediately destroyed. The kernel interface returns a pointer directly to the
|
||||
key, and it's up to the caller to destroy the key.
|
||||
The main difference between the access points is that the in-kernel interface
|
||||
does not need to link the key to a keyring to prevent it from being immediately
|
||||
destroyed. The kernel interface returns a pointer directly to the key, and
|
||||
it's up to the caller to destroy the key.
|
||||
|
||||
The request_key_with_auxdata() call is like the in-kernel request_key() call,
|
||||
except that it permits auxiliary data to be passed to the upcaller (the default
|
||||
is NULL). This is only useful for those key types that define their own upcall
|
||||
mechanism rather than using /sbin/request-key.
|
||||
|
||||
The userspace interface links the key to a keyring associated with the process
|
||||
to prevent the key from going away, and returns the serial number of the key to
|
||||
the caller.
|
||||
|
||||
|
||||
The following example assumes that the key types involved don't define their
|
||||
own upcall mechanisms. If they do, then those should be substituted for the
|
||||
forking and execution of /sbin/request-key.
|
||||
|
||||
|
||||
===========
|
||||
THE PROCESS
|
||||
===========
|
||||
@@ -40,8 +57,8 @@ A request proceeds in the following manner:
|
||||
interface].
|
||||
|
||||
(2) request_key() searches the process's subscribed keyrings to see if there's
|
||||
a suitable key there. If there is, it returns the key. If there isn't, and
|
||||
callout_info is not set, an error is returned. Otherwise the process
|
||||
a suitable key there. If there is, it returns the key. If there isn't,
|
||||
and callout_info is not set, an error is returned. Otherwise the process
|
||||
proceeds to the next step.
|
||||
|
||||
(3) request_key() sees that A doesn't have the desired key yet, so it creates
|
||||
@@ -79,10 +96,11 @@ A request proceeds in the following manner:
|
||||
(10) The program then exits 0 and request_key() deletes key V and returns key
|
||||
U to the caller.
|
||||
|
||||
This also extends further. If key W (step 7 above) didn't exist, key W would be
|
||||
created uninstantiated, another auth key (X) would be created (as per step 3)
|
||||
and another copy of /sbin/request-key spawned (as per step 4); but the context
|
||||
specified by auth key X will still be process A, as it was in auth key V.
|
||||
This also extends further. If key W (step 7 above) didn't exist, key W would
|
||||
be created uninstantiated, another auth key (X) would be created (as per step
|
||||
3) and another copy of /sbin/request-key spawned (as per step 4); but the
|
||||
context specified by auth key X will still be process A, as it was in auth key
|
||||
V.
|
||||
|
||||
This is because process A's keyrings can't simply be attached to
|
||||
/sbin/request-key at the appropriate places because (a) execve will discard two
|
||||
|
||||
@@ -780,6 +780,17 @@ payload contents" for more information.
|
||||
See also Documentation/keys-request-key.txt.
|
||||
|
||||
|
||||
(*) To search for a key, passing auxiliary data to the upcaller, call:
|
||||
|
||||
struct key *request_key_with_auxdata(const struct key_type *type,
|
||||
const char *description,
|
||||
const char *callout_string,
|
||||
void *aux);
|
||||
|
||||
This is identical to request_key(), except that the auxiliary data is
|
||||
passed to the key_type->request_key() op if it exists.
|
||||
|
||||
|
||||
(*) When it is no longer required, the key should be released using:
|
||||
|
||||
void key_put(struct key *key);
|
||||
@@ -1031,6 +1042,24 @@ The structure has a number of fields, some of which are mandatory:
|
||||
as might happen when the userspace buffer is accessed.
|
||||
|
||||
|
||||
(*) int (*request_key)(struct key *key, struct key *authkey, const char *op,
|
||||
void *aux);
|
||||
|
||||
This method is optional. If provided, request_key() and
|
||||
request_key_with_auxdata() will invoke this function rather than
|
||||
upcalling to /sbin/request-key to operate upon a key of this type.
|
||||
|
||||
The aux parameter is as passed to request_key_with_auxdata() or is NULL
|
||||
otherwise. Also passed are the key to be operated upon, the
|
||||
authorisation key for this operation and the operation type (currently
|
||||
only "create").
|
||||
|
||||
This function should return only when the upcall is complete. Upon return
|
||||
the authorisation key will be revoked, and the target key will be
|
||||
negatively instantiated if it is still uninstantiated. The error will be
|
||||
returned to the caller of request_key*().
|
||||
|
||||
|
||||
============================
|
||||
REQUEST-KEY CALLBACK SERVICE
|
||||
============================
|
||||
|
||||
@@ -0,0 +1,121 @@
|
||||
Lightweight PI-futexes
|
||||
----------------------
|
||||
|
||||
We are calling them lightweight for 3 reasons:
|
||||
|
||||
- in the user-space fastpath a PI-enabled futex involves no kernel work
|
||||
(or any other PI complexity) at all. No registration, no extra kernel
|
||||
calls - just pure fast atomic ops in userspace.
|
||||
|
||||
- even in the slowpath, the system call and scheduling pattern is very
|
||||
similar to normal futexes.
|
||||
|
||||
- the in-kernel PI implementation is streamlined around the mutex
|
||||
abstraction, with strict rules that keep the implementation
|
||||
relatively simple: only a single owner may own a lock (i.e. no
|
||||
read-write lock support), only the owner may unlock a lock, no
|
||||
recursive locking, etc.
|
||||
|
||||
Priority Inheritance - why?
|
||||
---------------------------
|
||||
|
||||
The short reply: user-space PI helps achieving/improving determinism for
|
||||
user-space applications. In the best-case, it can help achieve
|
||||
determinism and well-bound latencies. Even in the worst-case, PI will
|
||||
improve the statistical distribution of locking related application
|
||||
delays.
|
||||
|
||||
The longer reply:
|
||||
-----------------
|
||||
|
||||
Firstly, sharing locks between multiple tasks is a common programming
|
||||
technique that often cannot be replaced with lockless algorithms. As we
|
||||
can see it in the kernel [which is a quite complex program in itself],
|
||||
lockless structures are rather the exception than the norm - the current
|
||||
ratio of lockless vs. locky code for shared data structures is somewhere
|
||||
between 1:10 and 1:100. Lockless is hard, and the complexity of lockless
|
||||
algorithms often endangers to ability to do robust reviews of said code.
|
||||
I.e. critical RT apps often choose lock structures to protect critical
|
||||
data structures, instead of lockless algorithms. Furthermore, there are
|
||||
cases (like shared hardware, or other resource limits) where lockless
|
||||
access is mathematically impossible.
|
||||
|
||||
Media players (such as Jack) are an example of reasonable application
|
||||
design with multiple tasks (with multiple priority levels) sharing
|
||||
short-held locks: for example, a highprio audio playback thread is
|
||||
combined with medium-prio construct-audio-data threads and low-prio
|
||||
display-colory-stuff threads. Add video and decoding to the mix and
|
||||
we've got even more priority levels.
|
||||
|
||||
So once we accept that synchronization objects (locks) are an
|
||||
unavoidable fact of life, and once we accept that multi-task userspace
|
||||
apps have a very fair expectation of being able to use locks, we've got
|
||||
to think about how to offer the option of a deterministic locking
|
||||
implementation to user-space.
|
||||
|
||||
Most of the technical counter-arguments against doing priority
|
||||
inheritance only apply to kernel-space locks. But user-space locks are
|
||||
different, there we cannot disable interrupts or make the task
|
||||
non-preemptible in a critical section, so the 'use spinlocks' argument
|
||||
does not apply (user-space spinlocks have the same priority inversion
|
||||
problems as other user-space locking constructs). Fact is, pretty much
|
||||
the only technique that currently enables good determinism for userspace
|
||||
locks (such as futex-based pthread mutexes) is priority inheritance:
|
||||
|
||||
Currently (without PI), if a high-prio and a low-prio task shares a lock
|
||||
[this is a quite common scenario for most non-trivial RT applications],
|
||||
even if all critical sections are coded carefully to be deterministic
|
||||
(i.e. all critical sections are short in duration and only execute a
|
||||
limited number of instructions), the kernel cannot guarantee any
|
||||
deterministic execution of the high-prio task: any medium-priority task
|
||||
could preempt the low-prio task while it holds the shared lock and
|
||||
executes the critical section, and could delay it indefinitely.
|
||||
|
||||
Implementation:
|
||||
---------------
|
||||
|
||||
As mentioned before, the userspace fastpath of PI-enabled pthread
|
||||
mutexes involves no kernel work at all - they behave quite similarly to
|
||||
normal futex-based locks: a 0 value means unlocked, and a value==TID
|
||||
means locked. (This is the same method as used by list-based robust
|
||||
futexes.) Userspace uses atomic ops to lock/unlock these mutexes without
|
||||
entering the kernel.
|
||||
|
||||
To handle the slowpath, we have added two new futex ops:
|
||||
|
||||
FUTEX_LOCK_PI
|
||||
FUTEX_UNLOCK_PI
|
||||
|
||||
If the lock-acquire fastpath fails, [i.e. an atomic transition from 0 to
|
||||
TID fails], then FUTEX_LOCK_PI is called. The kernel does all the
|
||||
remaining work: if there is no futex-queue attached to the futex address
|
||||
yet then the code looks up the task that owns the futex [it has put its
|
||||
own TID into the futex value], and attaches a 'PI state' structure to
|
||||
the futex-queue. The pi_state includes an rt-mutex, which is a PI-aware,
|
||||
kernel-based synchronization object. The 'other' task is made the owner
|
||||
of the rt-mutex, and the FUTEX_WAITERS bit is atomically set in the
|
||||
futex value. Then this task tries to lock the rt-mutex, on which it
|
||||
blocks. Once it returns, it has the mutex acquired, and it sets the
|
||||
futex value to its own TID and returns. Userspace has no other work to
|
||||
perform - it now owns the lock, and futex value contains
|
||||
FUTEX_WAITERS|TID.
|
||||
|
||||
If the unlock side fastpath succeeds, [i.e. userspace manages to do a
|
||||
TID -> 0 atomic transition of the futex value], then no kernel work is
|
||||
triggered.
|
||||
|
||||
If the unlock fastpath fails (because the FUTEX_WAITERS bit is set),
|
||||
then FUTEX_UNLOCK_PI is called, and the kernel unlocks the futex on the
|
||||
behalf of userspace - and it also unlocks the attached
|
||||
pi_state->rt_mutex and thus wakes up any potential waiters.
|
||||
|
||||
Note that under this approach, contrary to previous PI-futex approaches,
|
||||
there is no prior 'registration' of a PI-futex. [which is not quite
|
||||
possible anyway, due to existing ABI properties of pthread mutexes.]
|
||||
|
||||
Also, under this scheme, 'robustness' and 'PI' are two orthogonal
|
||||
properties of futexes, and all four combinations are possible: futex,
|
||||
robust-futex, PI-futex, robust+PI-futex.
|
||||
|
||||
More details about priority inheritance can be found in
|
||||
Documentation/rtmutex.txt.
|
||||
@@ -95,7 +95,7 @@ comparison. If the thread has registered a list, then normally the list
|
||||
is empty. If the thread/process crashed or terminated in some incorrect
|
||||
way then the list might be non-empty: in this case the kernel carefully
|
||||
walks the list [not trusting it], and marks all locks that are owned by
|
||||
this thread with the FUTEX_OWNER_DEAD bit, and wakes up one waiter (if
|
||||
this thread with the FUTEX_OWNER_DIED bit, and wakes up one waiter (if
|
||||
any).
|
||||
|
||||
The list is guaranteed to be private and per-thread at do_exit() time,
|
||||
|
||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user