You've already forked linux-apfs
mirror of
https://github.com/linux-apfs/linux-apfs.git
synced 2026-05-01 15:00:59 -07:00
Merge /spare/repo/linux-2.6/
This commit is contained in:
@@ -18,7 +18,7 @@
|
||||
Version 2, June 1991
|
||||
|
||||
Copyright (C) 1989, 1991 Free Software Foundation, Inc.
|
||||
59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
|
||||
51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
|
||||
Everyone is permitted to copy and distribute verbatim copies
|
||||
of this license document, but changing it is not allowed.
|
||||
|
||||
@@ -321,7 +321,7 @@ the "copyright" line and a pointer to where the full notice is found.
|
||||
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with this program; if not, write to the Free Software
|
||||
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
|
||||
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
|
||||
|
||||
|
||||
Also add information on how to contact you by electronic and paper mail.
|
||||
|
||||
@@ -2423,8 +2423,7 @@ S: Toronto, Ontario
|
||||
S: Canada
|
||||
|
||||
N: Zwane Mwaikambo
|
||||
E: zwane@linuxpower.ca
|
||||
W: http://function.linuxpower.ca
|
||||
E: zwane@arm.linux.org.uk
|
||||
D: Various driver hacking
|
||||
D: Lowlevel x86 kernel hacking
|
||||
D: General debugging
|
||||
|
||||
@@ -46,6 +46,8 @@ SubmittingPatches
|
||||
- procedure to get a source patch included into the kernel tree.
|
||||
VGA-softcursor.txt
|
||||
- how to change your VGA cursor from a blinking underscore.
|
||||
applying-patches.txt
|
||||
- description of various trees and how to apply their patches.
|
||||
arm/
|
||||
- directory with info about Linux on the ARM architecture.
|
||||
basic_profiling.txt
|
||||
@@ -275,7 +277,7 @@ tty.txt
|
||||
unicode.txt
|
||||
- info on the Unicode character/font mapping used in Linux.
|
||||
uml/
|
||||
- directory with infomation about User Mode Linux.
|
||||
- directory with information about User Mode Linux.
|
||||
usb/
|
||||
- directory with info regarding the Universal Serial Bus.
|
||||
video4linux/
|
||||
|
||||
@@ -236,6 +236,9 @@ ugly), but try to avoid excess. Instead, put the comments at the head
|
||||
of the function, telling people what it does, and possibly WHY it does
|
||||
it.
|
||||
|
||||
When commenting the kernel API functions, please use the kerneldoc format.
|
||||
See the files Documentation/kernel-doc-nano-HOWTO.txt and scripts/kernel-doc
|
||||
for details.
|
||||
|
||||
Chapter 8: You've made a mess of it
|
||||
|
||||
|
||||
@@ -121,7 +121,7 @@ pool's device.
|
||||
dma_addr_t addr);
|
||||
|
||||
This puts memory back into the pool. The pool is what was passed to
|
||||
the the pool allocation routine; the cpu and dma addresses are what
|
||||
the pool allocation routine; the cpu and dma addresses are what
|
||||
were returned when that routine allocated the memory being freed.
|
||||
|
||||
|
||||
|
||||
@@ -0,0 +1,151 @@
|
||||
DMA with ISA and LPC devices
|
||||
============================
|
||||
|
||||
Pierre Ossman <drzeus@drzeus.cx>
|
||||
|
||||
This document describes how to do DMA transfers using the old ISA DMA
|
||||
controller. Even though ISA is more or less dead today the LPC bus
|
||||
uses the same DMA system so it will be around for quite some time.
|
||||
|
||||
Part I - Headers and dependencies
|
||||
---------------------------------
|
||||
|
||||
To do ISA style DMA you need to include two headers:
|
||||
|
||||
#include <linux/dma-mapping.h>
|
||||
#include <asm/dma.h>
|
||||
|
||||
The first is the generic DMA API used to convert virtual addresses to
|
||||
physical addresses (see Documentation/DMA-API.txt for details).
|
||||
|
||||
The second contains the routines specific to ISA DMA transfers. Since
|
||||
this is not present on all platforms make sure you construct your
|
||||
Kconfig to be dependent on ISA_DMA_API (not ISA) so that nobody tries
|
||||
to build your driver on unsupported platforms.
|
||||
|
||||
Part II - Buffer allocation
|
||||
---------------------------
|
||||
|
||||
The ISA DMA controller has some very strict requirements on which
|
||||
memory it can access so extra care must be taken when allocating
|
||||
buffers.
|
||||
|
||||
(You usually need a special buffer for DMA transfers instead of
|
||||
transferring directly to and from your normal data structures.)
|
||||
|
||||
The DMA-able address space is the lowest 16 MB of _physical_ memory.
|
||||
Also the transfer block may not cross page boundaries (which are 64
|
||||
or 128 KiB depending on which channel you use).
|
||||
|
||||
In order to allocate a piece of memory that satisfies all these
|
||||
requirements you pass the flag GFP_DMA to kmalloc.
|
||||
|
||||
Unfortunately the memory available for ISA DMA is scarce so unless you
|
||||
allocate the memory during boot-up it's a good idea to also pass
|
||||
__GFP_REPEAT and __GFP_NOWARN to make the allocater try a bit harder.
|
||||
|
||||
(This scarcity also means that you should allocate the buffer as
|
||||
early as possible and not release it until the driver is unloaded.)
|
||||
|
||||
Part III - Address translation
|
||||
------------------------------
|
||||
|
||||
To translate the virtual address to a physical use the normal DMA
|
||||
API. Do _not_ use isa_virt_to_phys() even though it does the same
|
||||
thing. The reason for this is that the function isa_virt_to_phys()
|
||||
will require a Kconfig dependency to ISA, not just ISA_DMA_API which
|
||||
is really all you need. Remember that even though the DMA controller
|
||||
has its origins in ISA it is used elsewhere.
|
||||
|
||||
Note: x86_64 had a broken DMA API when it came to ISA but has since
|
||||
been fixed. If your arch has problems then fix the DMA API instead of
|
||||
reverting to the ISA functions.
|
||||
|
||||
Part IV - Channels
|
||||
------------------
|
||||
|
||||
A normal ISA DMA controller has 8 channels. The lower four are for
|
||||
8-bit transfers and the upper four are for 16-bit transfers.
|
||||
|
||||
(Actually the DMA controller is really two separate controllers where
|
||||
channel 4 is used to give DMA access for the second controller (0-3).
|
||||
This means that of the four 16-bits channels only three are usable.)
|
||||
|
||||
You allocate these in a similar fashion as all basic resources:
|
||||
|
||||
extern int request_dma(unsigned int dmanr, const char * device_id);
|
||||
extern void free_dma(unsigned int dmanr);
|
||||
|
||||
The ability to use 16-bit or 8-bit transfers is _not_ up to you as a
|
||||
driver author but depends on what the hardware supports. Check your
|
||||
specs or test different channels.
|
||||
|
||||
Part V - Transfer data
|
||||
----------------------
|
||||
|
||||
Now for the good stuff, the actual DMA transfer. :)
|
||||
|
||||
Before you use any ISA DMA routines you need to claim the DMA lock
|
||||
using claim_dma_lock(). The reason is that some DMA operations are
|
||||
not atomic so only one driver may fiddle with the registers at a
|
||||
time.
|
||||
|
||||
The first time you use the DMA controller you should call
|
||||
clear_dma_ff(). This clears an internal register in the DMA
|
||||
controller that is used for the non-atomic operations. As long as you
|
||||
(and everyone else) uses the locking functions then you only need to
|
||||
reset this once.
|
||||
|
||||
Next, you tell the controller in which direction you intend to do the
|
||||
transfer using set_dma_mode(). Currently you have the options
|
||||
DMA_MODE_READ and DMA_MODE_WRITE.
|
||||
|
||||
Set the address from where the transfer should start (this needs to
|
||||
be 16-bit aligned for 16-bit transfers) and how many bytes to
|
||||
transfer. Note that it's _bytes_. The DMA routines will do all the
|
||||
required translation to values that the DMA controller understands.
|
||||
|
||||
The final step is enabling the DMA channel and releasing the DMA
|
||||
lock.
|
||||
|
||||
Once the DMA transfer is finished (or timed out) you should disable
|
||||
the channel again. You should also check get_dma_residue() to make
|
||||
sure that all data has been transfered.
|
||||
|
||||
Example:
|
||||
|
||||
int flags, residue;
|
||||
|
||||
flags = claim_dma_lock();
|
||||
|
||||
clear_dma_ff();
|
||||
|
||||
set_dma_mode(channel, DMA_MODE_WRITE);
|
||||
set_dma_addr(channel, phys_addr);
|
||||
set_dma_count(channel, num_bytes);
|
||||
|
||||
dma_enable(channel);
|
||||
|
||||
release_dma_lock(flags);
|
||||
|
||||
while (!device_done());
|
||||
|
||||
flags = claim_dma_lock();
|
||||
|
||||
dma_disable(channel);
|
||||
|
||||
residue = dma_get_residue(channel);
|
||||
if (residue != 0)
|
||||
printk(KERN_ERR "driver: Incomplete DMA transfer!"
|
||||
" %d bytes left!\n", residue);
|
||||
|
||||
release_dma_lock(flags);
|
||||
|
||||
Part VI - Suspend/resume
|
||||
------------------------
|
||||
|
||||
It is the driver's responsibility to make sure that the machine isn't
|
||||
suspended while a DMA transfer is in progress. Also, all DMA settings
|
||||
are lost when the system suspends so if your driver relies on the DMA
|
||||
controller being in a certain state then you have to restore these
|
||||
registers upon resume.
|
||||
@@ -116,7 +116,7 @@ filesystem. Almost.
|
||||
|
||||
You still need to actually journal your filesystem changes, this
|
||||
is done by wrapping them into transactions. Additionally you
|
||||
also need to wrap the modification of each of the the buffers
|
||||
also need to wrap the modification of each of the buffers
|
||||
with calls to the journal layer, so it knows what the modifications
|
||||
you are actually making are. To do this use journal_start() which
|
||||
returns a transaction handle.
|
||||
@@ -128,7 +128,7 @@ and its counterpart journal_stop(), which indicates the end of a transaction
|
||||
are nestable calls, so you can reenter a transaction if necessary,
|
||||
but remember you must call journal_stop() the same number of times as
|
||||
journal_start() before the transaction is completed (or more accurately
|
||||
leaves the the update phase). Ext3/VFS makes use of this feature to simplify
|
||||
leaves the update phase). Ext3/VFS makes use of this feature to simplify
|
||||
quota support.
|
||||
</para>
|
||||
|
||||
|
||||
File diff suppressed because it is too large
Load Diff
@@ -96,7 +96,7 @@
|
||||
|
||||
<chapter id="pubfunctions">
|
||||
<title>Public Functions Provided</title>
|
||||
!Earch/i386/kernel/mca.c
|
||||
!Edrivers/mca/mca-legacy.c
|
||||
</chapter>
|
||||
|
||||
<chapter id="dmafunctions">
|
||||
|
||||
@@ -841,7 +841,7 @@ usbdev_ioctl (int fd, int ifno, unsigned request, void *param)
|
||||
File modification time is not updated by this request.
|
||||
</para><para>
|
||||
Those struct members are from some interface descriptor
|
||||
applying to the the current configuration.
|
||||
applying to the current configuration.
|
||||
The interface number is the bInterfaceNumber value, and
|
||||
the altsetting number is the bAlternateSetting value.
|
||||
(This resets each endpoint in the interface.)
|
||||
|
||||
@@ -605,12 +605,13 @@ is in the ipmi_poweroff module. When the system requests a powerdown,
|
||||
it will send the proper IPMI commands to do this. This is supported on
|
||||
several platforms.
|
||||
|
||||
There is a module parameter named "poweroff_control" that may either be zero
|
||||
(do a power down) or 2 (do a power cycle, power the system off, then power
|
||||
it on in a few seconds). Setting ipmi_poweroff.poweroff_control=x will do
|
||||
the same thing on the kernel command line. The parameter is also available
|
||||
via the proc filesystem in /proc/ipmi/poweroff_control. Note that if the
|
||||
system does not support power cycling, it will always to the power off.
|
||||
There is a module parameter named "poweroff_powercycle" that may
|
||||
either be zero (do a power down) or non-zero (do a power cycle, power
|
||||
the system off, then power it on in a few seconds). Setting
|
||||
ipmi_poweroff.poweroff_control=x will do the same thing on the kernel
|
||||
command line. The parameter is also available via the proc filesystem
|
||||
in /proc/sys/dev/ipmi/poweroff_powercycle. Note that if the system
|
||||
does not support power cycling, it will always do the power off.
|
||||
|
||||
Note that if you have ACPI enabled, the system will prefer using ACPI to
|
||||
power off.
|
||||
|
||||
@@ -430,7 +430,7 @@ which may result in system hang. The software driver of specific
|
||||
MSI-capable hardware is responsible for whether calling
|
||||
pci_enable_msi or not. A return of zero indicates the kernel
|
||||
successfully initializes the MSI/MSI-X capability structure of the
|
||||
device funtion. The device function is now running on MSI/MSI-X mode.
|
||||
device function. The device function is now running on MSI/MSI-X mode.
|
||||
|
||||
5.6 How to tell whether MSI/MSI-X is enabled on device function
|
||||
|
||||
|
||||
@@ -0,0 +1,112 @@
|
||||
Using RCU to Protect Dynamic NMI Handlers
|
||||
|
||||
|
||||
Although RCU is usually used to protect read-mostly data structures,
|
||||
it is possible to use RCU to provide dynamic non-maskable interrupt
|
||||
handlers, as well as dynamic irq handlers. This document describes
|
||||
how to do this, drawing loosely from Zwane Mwaikambo's NMI-timer
|
||||
work in "arch/i386/oprofile/nmi_timer_int.c" and in
|
||||
"arch/i386/kernel/traps.c".
|
||||
|
||||
The relevant pieces of code are listed below, each followed by a
|
||||
brief explanation.
|
||||
|
||||
static int dummy_nmi_callback(struct pt_regs *regs, int cpu)
|
||||
{
|
||||
return 0;
|
||||
}
|
||||
|
||||
The dummy_nmi_callback() function is a "dummy" NMI handler that does
|
||||
nothing, but returns zero, thus saying that it did nothing, allowing
|
||||
the NMI handler to take the default machine-specific action.
|
||||
|
||||
static nmi_callback_t nmi_callback = dummy_nmi_callback;
|
||||
|
||||
This nmi_callback variable is a global function pointer to the current
|
||||
NMI handler.
|
||||
|
||||
fastcall void do_nmi(struct pt_regs * regs, long error_code)
|
||||
{
|
||||
int cpu;
|
||||
|
||||
nmi_enter();
|
||||
|
||||
cpu = smp_processor_id();
|
||||
++nmi_count(cpu);
|
||||
|
||||
if (!rcu_dereference(nmi_callback)(regs, cpu))
|
||||
default_do_nmi(regs);
|
||||
|
||||
nmi_exit();
|
||||
}
|
||||
|
||||
The do_nmi() function processes each NMI. It first disables preemption
|
||||
in the same way that a hardware irq would, then increments the per-CPU
|
||||
count of NMIs. It then invokes the NMI handler stored in the nmi_callback
|
||||
function pointer. If this handler returns zero, do_nmi() invokes the
|
||||
default_do_nmi() function to handle a machine-specific NMI. Finally,
|
||||
preemption is restored.
|
||||
|
||||
Strictly speaking, rcu_dereference() is not needed, since this code runs
|
||||
only on i386, which does not need rcu_dereference() anyway. However,
|
||||
it is a good documentation aid, particularly for anyone attempting to
|
||||
do something similar on Alpha.
|
||||
|
||||
Quick Quiz: Why might the rcu_dereference() be necessary on Alpha,
|
||||
given that the code referenced by the pointer is read-only?
|
||||
|
||||
|
||||
Back to the discussion of NMI and RCU...
|
||||
|
||||
void set_nmi_callback(nmi_callback_t callback)
|
||||
{
|
||||
rcu_assign_pointer(nmi_callback, callback);
|
||||
}
|
||||
|
||||
The set_nmi_callback() function registers an NMI handler. Note that any
|
||||
data that is to be used by the callback must be initialized up -before-
|
||||
the call to set_nmi_callback(). On architectures that do not order
|
||||
writes, the rcu_assign_pointer() ensures that the NMI handler sees the
|
||||
initialized values.
|
||||
|
||||
void unset_nmi_callback(void)
|
||||
{
|
||||
rcu_assign_pointer(nmi_callback, dummy_nmi_callback);
|
||||
}
|
||||
|
||||
This function unregisters an NMI handler, restoring the original
|
||||
dummy_nmi_handler(). However, there may well be an NMI handler
|
||||
currently executing on some other CPU. We therefore cannot free
|
||||
up any data structures used by the old NMI handler until execution
|
||||
of it completes on all other CPUs.
|
||||
|
||||
One way to accomplish this is via synchronize_sched(), perhaps as
|
||||
follows:
|
||||
|
||||
unset_nmi_callback();
|
||||
synchronize_sched();
|
||||
kfree(my_nmi_data);
|
||||
|
||||
This works because synchronize_sched() blocks until all CPUs complete
|
||||
any preemption-disabled segments of code that they were executing.
|
||||
Since NMI handlers disable preemption, synchronize_sched() is guaranteed
|
||||
not to return until all ongoing NMI handlers exit. It is therefore safe
|
||||
to free up the handler's data as soon as synchronize_sched() returns.
|
||||
|
||||
|
||||
Answer to Quick Quiz
|
||||
|
||||
Why might the rcu_dereference() be necessary on Alpha, given
|
||||
that the code referenced by the pointer is read-only?
|
||||
|
||||
Answer: The caller to set_nmi_callback() might well have
|
||||
initialized some data that is to be used by the
|
||||
new NMI handler. In this case, the rcu_dereference()
|
||||
would be needed, because otherwise a CPU that received
|
||||
an NMI just after the new handler was set might see
|
||||
the pointer to the new NMI handler, but the old
|
||||
pre-initialized version of the handler's data.
|
||||
|
||||
More important, the rcu_dereference() makes it clear
|
||||
to someone reading the code that the pointer is being
|
||||
protected by RCU.
|
||||
@@ -2,7 +2,8 @@ Read the F-ing Papers!
|
||||
|
||||
|
||||
This document describes RCU-related publications, and is followed by
|
||||
the corresponding bibtex entries.
|
||||
the corresponding bibtex entries. A number of the publications may
|
||||
be found at http://www.rdrop.com/users/paulmck/RCU/.
|
||||
|
||||
The first thing resembling RCU was published in 1980, when Kung and Lehman
|
||||
[Kung80] recommended use of a garbage collector to defer destruction
|
||||
@@ -113,6 +114,10 @@ describing how to make RCU safe for soft-realtime applications [Sarma04c],
|
||||
and a paper describing SELinux performance with RCU [JamesMorris04b].
|
||||
|
||||
|
||||
2005 has seen further adaptation of RCU to realtime use, permitting
|
||||
preemption of RCU realtime critical sections [PaulMcKenney05a,
|
||||
PaulMcKenney05b].
|
||||
|
||||
Bibtex Entries
|
||||
|
||||
@article{Kung80
|
||||
@@ -410,3 +415,32 @@ Oregon Health and Sciences University"
|
||||
\url{http://www.livejournal.com/users/james_morris/2153.html}
|
||||
[Viewed December 10, 2004]"
|
||||
}
|
||||
|
||||
@unpublished{PaulMcKenney05a
|
||||
,Author="Paul E. McKenney"
|
||||
,Title="{[RFC]} {RCU} and {CONFIG\_PREEMPT\_RT} progress"
|
||||
,month="May"
|
||||
,year="2005"
|
||||
,note="Available:
|
||||
\url{http://lkml.org/lkml/2005/5/9/185}
|
||||
[Viewed May 13, 2005]"
|
||||
,annotation="
|
||||
First publication of working lock-based deferred free patches
|
||||
for the CONFIG_PREEMPT_RT environment.
|
||||
"
|
||||
}
|
||||
|
||||
@conference{PaulMcKenney05b
|
||||
,Author="Paul E. McKenney and Dipankar Sarma"
|
||||
,Title="Towards Hard Realtime Response from the Linux Kernel on SMP Hardware"
|
||||
,Booktitle="linux.conf.au 2005"
|
||||
,month="April"
|
||||
,year="2005"
|
||||
,address="Canberra, Australia"
|
||||
,note="Available:
|
||||
\url{http://www.rdrop.com/users/paulmck/RCU/realtimeRCU.2005.04.23a.pdf}
|
||||
[Viewed May 13, 2005]"
|
||||
,annotation="
|
||||
Realtime turns into making RCU yet more realtime friendly.
|
||||
"
|
||||
}
|
||||
|
||||
@@ -8,7 +8,7 @@ is that since there is only one CPU, it should not be necessary to
|
||||
wait for anything else to get done, since there are no other CPUs for
|
||||
anything else to be happening on. Although this approach will -sort- -of-
|
||||
work a surprising amount of the time, it is a very bad idea in general.
|
||||
This document presents two examples that demonstrate exactly how bad an
|
||||
This document presents three examples that demonstrate exactly how bad an
|
||||
idea this is.
|
||||
|
||||
|
||||
@@ -26,6 +26,9 @@ from softirq, the list scan would find itself referencing a newly freed
|
||||
element B. This situation can greatly decrease the life expectancy of
|
||||
your kernel.
|
||||
|
||||
This same problem can occur if call_rcu() is invoked from a hardware
|
||||
interrupt handler.
|
||||
|
||||
|
||||
Example 2: Function-Call Fatality
|
||||
|
||||
@@ -44,8 +47,37 @@ its arguments would cause it to fail to make the fundamental guarantee
|
||||
underlying RCU, namely that call_rcu() defers invoking its arguments until
|
||||
all RCU read-side critical sections currently executing have completed.
|
||||
|
||||
Quick Quiz: why is it -not- legal to invoke synchronize_rcu() in
|
||||
this case?
|
||||
Quick Quiz #1: why is it -not- legal to invoke synchronize_rcu() in
|
||||
this case?
|
||||
|
||||
|
||||
Example 3: Death by Deadlock
|
||||
|
||||
Suppose that call_rcu() is invoked while holding a lock, and that the
|
||||
callback function must acquire this same lock. In this case, if
|
||||
call_rcu() were to directly invoke the callback, the result would
|
||||
be self-deadlock.
|
||||
|
||||
In some cases, it would possible to restructure to code so that
|
||||
the call_rcu() is delayed until after the lock is released. However,
|
||||
there are cases where this can be quite ugly:
|
||||
|
||||
1. If a number of items need to be passed to call_rcu() within
|
||||
the same critical section, then the code would need to create
|
||||
a list of them, then traverse the list once the lock was
|
||||
released.
|
||||
|
||||
2. In some cases, the lock will be held across some kernel API,
|
||||
so that delaying the call_rcu() until the lock is released
|
||||
requires that the data item be passed up via a common API.
|
||||
It is far better to guarantee that callbacks are invoked
|
||||
with no locks held than to have to modify such APIs to allow
|
||||
arbitrary data items to be passed back up through them.
|
||||
|
||||
If call_rcu() directly invokes the callback, painful locking restrictions
|
||||
or API changes would be required.
|
||||
|
||||
Quick Quiz #2: What locking restriction must RCU callbacks respect?
|
||||
|
||||
|
||||
Summary
|
||||
@@ -53,12 +85,35 @@ Summary
|
||||
Permitting call_rcu() to immediately invoke its arguments or permitting
|
||||
synchronize_rcu() to immediately return breaks RCU, even on a UP system.
|
||||
So do not do it! Even on a UP system, the RCU infrastructure -must-
|
||||
respect grace periods.
|
||||
respect grace periods, and -must- invoke callbacks from a known environment
|
||||
in which no locks are held.
|
||||
|
||||
|
||||
Answer to Quick Quiz
|
||||
Answer to Quick Quiz #1:
|
||||
Why is it -not- legal to invoke synchronize_rcu() in this case?
|
||||
|
||||
The calling function is scanning an RCU-protected linked list, and
|
||||
is therefore within an RCU read-side critical section. Therefore,
|
||||
the called function has been invoked within an RCU read-side critical
|
||||
section, and is not permitted to block.
|
||||
Because the calling function is scanning an RCU-protected linked
|
||||
list, and is therefore within an RCU read-side critical section.
|
||||
Therefore, the called function has been invoked within an RCU
|
||||
read-side critical section, and is not permitted to block.
|
||||
|
||||
Answer to Quick Quiz #2:
|
||||
What locking restriction must RCU callbacks respect?
|
||||
|
||||
Any lock that is acquired within an RCU callback must be
|
||||
acquired elsewhere using an _irq variant of the spinlock
|
||||
primitive. For example, if "mylock" is acquired by an
|
||||
RCU callback, then a process-context acquisition of this
|
||||
lock must use something like spin_lock_irqsave() to
|
||||
acquire the lock.
|
||||
|
||||
If the process-context code were to simply use spin_lock(),
|
||||
then, since RCU callbacks can be invoked from softirq context,
|
||||
the callback might be called from a softirq that interrupted
|
||||
the process-context critical section. This would result in
|
||||
self-deadlock.
|
||||
|
||||
This restriction might seem gratuitous, since very few RCU
|
||||
callbacks acquire locks directly. However, a great many RCU
|
||||
callbacks do acquire locks -indirectly-, for example, via
|
||||
the kfree() primitive.
|
||||
|
||||
@@ -43,6 +43,10 @@ over a rather long period of time, but improvements are always welcome!
|
||||
rcu_read_lock_bh()) in the read-side critical sections,
|
||||
and are also an excellent aid to readability.
|
||||
|
||||
As a rough rule of thumb, any dereference of an RCU-protected
|
||||
pointer must be covered by rcu_read_lock() or rcu_read_lock_bh()
|
||||
or by the appropriate update-side lock.
|
||||
|
||||
3. Does the update code tolerate concurrent accesses?
|
||||
|
||||
The whole point of RCU is to permit readers to run without
|
||||
@@ -90,7 +94,11 @@ over a rather long period of time, but improvements are always welcome!
|
||||
|
||||
The rcu_dereference() primitive is used by the various
|
||||
"_rcu()" list-traversal primitives, such as the
|
||||
list_for_each_entry_rcu().
|
||||
list_for_each_entry_rcu(). Note that it is perfectly
|
||||
legal (if redundant) for update-side code to use
|
||||
rcu_dereference() and the "_rcu()" list-traversal
|
||||
primitives. This is particularly useful in code
|
||||
that is common to readers and updaters.
|
||||
|
||||
b. If the list macros are being used, the list_add_tail_rcu()
|
||||
and list_add_rcu() primitives must be used in order
|
||||
@@ -150,16 +158,9 @@ over a rather long period of time, but improvements are always welcome!
|
||||
|
||||
Use of the _rcu() list-traversal primitives outside of an
|
||||
RCU read-side critical section causes no harm other than
|
||||
a slight performance degradation on Alpha CPUs and some
|
||||
confusion on the part of people trying to read the code.
|
||||
|
||||
Another way of thinking of this is "If you are holding the
|
||||
lock that prevents the data structure from changing, why do
|
||||
you also need RCU-based protection?" That said, there may
|
||||
well be situations where use of the _rcu() list-traversal
|
||||
primitives while the update-side lock is held results in
|
||||
simpler and more maintainable code. The jury is still out
|
||||
on this question.
|
||||
a slight performance degradation on Alpha CPUs. It can
|
||||
also be quite helpful in reducing code bloat when common
|
||||
code is shared between readers and updaters.
|
||||
|
||||
10. Conversely, if you are in an RCU read-side critical section,
|
||||
you -must- use the "_rcu()" variants of the list macros.
|
||||
|
||||
@@ -64,6 +64,54 @@ o I hear that RCU is patented? What is with that?
|
||||
Of these, one was allowed to lapse by the assignee, and the
|
||||
others have been contributed to the Linux kernel under GPL.
|
||||
|
||||
o I hear that RCU needs work in order to support realtime kernels?
|
||||
|
||||
Yes, work in progress.
|
||||
|
||||
o Where can I find more information on RCU?
|
||||
|
||||
See the RTFP.txt file in this directory.
|
||||
Or point your browser at http://www.rdrop.com/users/paulmck/RCU/.
|
||||
|
||||
o What are all these files in this directory?
|
||||
|
||||
|
||||
NMI-RCU.txt
|
||||
|
||||
Describes how to use RCU to implement dynamic
|
||||
NMI handlers, which can be revectored on the fly,
|
||||
without rebooting.
|
||||
|
||||
RTFP.txt
|
||||
|
||||
List of RCU-related publications and web sites.
|
||||
|
||||
UP.txt
|
||||
|
||||
Discussion of RCU usage in UP kernels.
|
||||
|
||||
arrayRCU.txt
|
||||
|
||||
Describes how to use RCU to protect arrays, with
|
||||
resizeable arrays whose elements reference other
|
||||
data structures being of the most interest.
|
||||
|
||||
checklist.txt
|
||||
|
||||
Lists things to check for when inspecting code that
|
||||
uses RCU.
|
||||
|
||||
listRCU.txt
|
||||
|
||||
Describes how to use RCU to protect linked lists.
|
||||
This is the simplest and most common use of RCU
|
||||
in the Linux kernel.
|
||||
|
||||
rcu.txt
|
||||
|
||||
You are reading it!
|
||||
|
||||
whatisRCU.txt
|
||||
|
||||
Overview of how the RCU implementation works. Along
|
||||
the way, presents a conceptual view of RCU.
|
||||
|
||||
@@ -0,0 +1,74 @@
|
||||
Refcounter framework for elements of lists/arrays protected by
|
||||
RCU.
|
||||
|
||||
Refcounting on elements of lists which are protected by traditional
|
||||
reader/writer spinlocks or semaphores are straight forward as in:
|
||||
|
||||
1. 2.
|
||||
add() search_and_reference()
|
||||
{ {
|
||||
alloc_object read_lock(&list_lock);
|
||||
... search_for_element
|
||||
atomic_set(&el->rc, 1); atomic_inc(&el->rc);
|
||||
write_lock(&list_lock); ...
|
||||
add_element read_unlock(&list_lock);
|
||||
... ...
|
||||
write_unlock(&list_lock); }
|
||||
}
|
||||
|
||||
3. 4.
|
||||
release_referenced() delete()
|
||||
{ {
|
||||
... write_lock(&list_lock);
|
||||
atomic_dec(&el->rc, relfunc) ...
|
||||
... delete_element
|
||||
} write_unlock(&list_lock);
|
||||
...
|
||||
if (atomic_dec_and_test(&el->rc))
|
||||
kfree(el);
|
||||
...
|
||||
}
|
||||
|
||||
If this list/array is made lock free using rcu as in changing the
|
||||
write_lock in add() and delete() to spin_lock and changing read_lock
|
||||
in search_and_reference to rcu_read_lock(), the rcuref_get in
|
||||
search_and_reference could potentially hold reference to an element which
|
||||
has already been deleted from the list/array. rcuref_lf_get_rcu takes
|
||||
care of this scenario. search_and_reference should look as;
|
||||
|
||||
1. 2.
|
||||
add() search_and_reference()
|
||||
{ {
|
||||
alloc_object rcu_read_lock();
|
||||
... search_for_element
|
||||
atomic_set(&el->rc, 1); if (rcuref_inc_lf(&el->rc)) {
|
||||
write_lock(&list_lock); rcu_read_unlock();
|
||||
return FAIL;
|
||||
add_element }
|
||||
... ...
|
||||
write_unlock(&list_lock); rcu_read_unlock();
|
||||
} }
|
||||
3. 4.
|
||||
release_referenced() delete()
|
||||
{ {
|
||||
... write_lock(&list_lock);
|
||||
rcuref_dec(&el->rc, relfunc) ...
|
||||
... delete_element
|
||||
} write_unlock(&list_lock);
|
||||
...
|
||||
if (rcuref_dec_and_test(&el->rc))
|
||||
call_rcu(&el->head, el_free);
|
||||
...
|
||||
}
|
||||
|
||||
Sometimes, reference to the element need to be obtained in the
|
||||
update (write) stream. In such cases, rcuref_inc_lf might be an overkill
|
||||
since the spinlock serialising list updates are held. rcuref_inc
|
||||
is to be used in such cases.
|
||||
For arches which do not have cmpxchg rcuref_inc_lf
|
||||
api uses a hashed spinlock implementation and the same hashed spinlock
|
||||
is acquired in all rcuref_xxx primitives to preserve atomicity.
|
||||
Note: Use rcuref_inc api only if you need to use rcuref_inc_lf on the
|
||||
refcounter atleast at one place. Mixing rcuref_inc and atomic_xxx api
|
||||
might lead to races. rcuref_inc_lf() must be used in lockfree
|
||||
RCU critical sections only.
|
||||
File diff suppressed because it is too large
Load Diff
@@ -33,3 +33,6 @@ The result of the execution of this aml method is
|
||||
attached to /proc/acpi/hotkey/poll_method, which is dnyamically
|
||||
created. Please use command "cat /proc/acpi/hotkey/polling_method"
|
||||
to retrieve it.
|
||||
|
||||
Note: Use cmdline "acpi_generic_hotkey" to over-ride
|
||||
platform-specific with generic driver.
|
||||
|
||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user