[SCSI] Merge branch 'linus'

Conflicts:
	drivers/message/fusion/mptsas.c

fixed up conflict between req->data_len accessors and mptsas driver updates.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
James Bottomley
2009-06-12 10:02:03 -05:00
1794 changed files with 103325 additions and 35037 deletions
+59
@@ -60,3 +60,62 @@ Description:
Indicates whether the block layer should automatically
generate checksums for write requests bound for
devices that support receiving integrity metadata.
What: /sys/block/<disk>/alignment_offset
Date: April 2009
Contact: Martin K. Petersen <martin.petersen@oracle.com>
Description:
Storage devices may report a physical block size that is
bigger than the logical block size (for instance a drive
with 4KB physical sectors exposing 512-byte logical
blocks to the operating system). This parameter
indicates how many bytes the beginning of the device is
offset from the disk's natural alignment.
What: /sys/block/<disk>/<partition>/alignment_offset
Date: April 2009
Contact: Martin K. Petersen <martin.petersen@oracle.com>
Description:
Storage devices may report a physical block size that is
bigger than the logical block size (for instance a drive
with 4KB physical sectors exposing 512-byte logical
blocks to the operating system). This parameter
indicates how many bytes the beginning of the partition
is offset from the disk's natural alignment.
What: /sys/block/<disk>/queue/logical_block_size
Date: May 2009
Contact: Martin K. Petersen <martin.petersen@oracle.com>
Description:
This is the smallest unit the storage device can
address. It is typically 512 bytes.
What: /sys/block/<disk>/queue/physical_block_size
Date: May 2009
Contact: Martin K. Petersen <martin.petersen@oracle.com>
Description:
This is the smallest unit the storage device can write
without resorting to read-modify-write operation. It is
usually the same as the logical block size but may be
bigger. One example is SATA drives with 4KB sectors
that expose a 512-byte logical block size to the
operating system.
What: /sys/block/<disk>/queue/minimum_io_size
Date: April 2009
Contact: Martin K. Petersen <martin.petersen@oracle.com>
Description:
Storage devices may report a preferred minimum I/O size,
which is the smallest request the device can perform
without incurring a read-modify-write penalty. For disk
drives this is often the physical block size. For RAID
arrays it is often the stripe chunk size.
What: /sys/block/<disk>/queue/optimal_io_size
Date: April 2009
Contact: Martin K. Petersen <martin.petersen@oracle.com>
Description:
Storage devices may report an optimal I/O size, which is
the device's preferred unit of receiving I/O. This is
rarely reported for disk drives. For RAID devices it is
usually the stripe width or the internal block size.
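The attributes described above are plain text files holding a single integer, so they are easy to read from user space. The following is a minimal sketch, not part of the original documentation: the function names, the `sysfs_root` parameter and the `TOPOLOGY_ATTRS` table are illustrative assumptions, and the read-modify-write check assumes the request starts on a physical block boundary.

```python
from pathlib import Path

# Attribute paths relative to /sys/block/<disk>, as listed in the ABI text above.
TOPOLOGY_ATTRS = {
    "alignment_offset": "alignment_offset",
    "logical_block_size": "queue/logical_block_size",
    "physical_block_size": "queue/physical_block_size",
    "minimum_io_size": "queue/minimum_io_size",
    "optimal_io_size": "queue/optimal_io_size",
}


def read_block_topology(disk, sysfs_root="/sys"):
    """Return the topology attributes of <disk> as integers (bytes)."""
    base = Path(sysfs_root) / "block" / disk
    return {
        name: int((base / rel).read_text().strip())
        for name, rel in TOPOLOGY_ATTRS.items()
    }


def needs_read_modify_write(topology, io_bytes):
    """Hypothetical helper: True if a write of io_bytes is not a whole
    number of physical blocks (assumes the write starts aligned)."""
    return io_bytes % topology["physical_block_size"] != 0
```

On a drive with 4KB physical sectors and 512-byte logical blocks, a 512-byte write would be flagged by `needs_read_modify_write()`, while an 8KB write would not.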
@@ -0,0 +1,33 @@
Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/model
Date: March 2009
Kernel Version: 2.6.30
Contact: iss_storagedev@hp.com
Description: Displays the SCSI INQUIRY page 0 model for logical drive
Y of controller X.
Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/rev
Date: March 2009
Kernel Version: 2.6.30
Contact: iss_storagedev@hp.com
Description: Displays the SCSI INQUIRY page 0 revision for logical
drive Y of controller X.
Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/unique_id
Date: March 2009
Kernel Version: 2.6.30
Contact: iss_storagedev@hp.com
Description: Displays the SCSI INQUIRY page 83 serial number for logical
drive Y of controller X.
Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/vendor
Date: March 2009
Kernel Version: 2.6.30
Contact: iss_storagedev@hp.com
Description: Displays the SCSI INQUIRY page 0 vendor for logical drive
Y of controller X.
Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/block:cciss!cXdY
Date: March 2009
Kernel Version: 2.6.30
Contact: iss_storagedev@hp.com
Description: A symbolic link to /sys/block/cciss!cXdY
@@ -0,0 +1,18 @@
What: /sys/devices/system/cpu/cpu*/cache/index*/cache_disable_X
Date: August 2008
KernelVersion: 2.6.27
Contact: mark.langsdorf@amd.com
Description: These files exist in every cpu's cache index directories.
There are currently 2 cache_disable_# files in each
directory. Reading from these files on a supported
processor will return that cache disable index value
for that processor and node. Writing to one of these
files will cause the specified cache index to be disabled.
Currently, only AMD Family 10h Processors support cache index
disable, and only for their L3 caches. See the BIOS and
Kernel Developer's Guide at
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/31116-Public-GH-BKDG_3.20_2-4-09.pdf
for formatting information and other details on the
cache index disable.
Users: joachim.deguara@amd.com
+12
@@ -704,12 +704,24 @@ this directory the following files can currently be found:
The current number of free dma_debug_entries
in the allocator.
dma-api/driver-filter
You can write a name of a driver into this file
to limit the debug output to requests from that
particular driver. Write an empty string to
that file to disable the filter and see
all errors again.
If you have this code compiled into your kernel it will be enabled by default.
If you want to boot without the bookkeeping anyway you can provide
'dma_debug=off' as a boot parameter. This will disable DMA-API debugging.
Notice that you cannot enable it again at runtime. You have to reboot to do
so.
If you want to see debug messages only for a specific device driver you can
specify the dma_debug_driver=<drivername> parameter. This will enable the
driver filter at boot time. The debug code will only print errors for that
driver afterwards. This filter can be disabled or changed later using debugfs.
When the code disables itself at runtime this is most likely because it ran
out of dma_debug_entries. These entries are preallocated at boot. The number
of preallocated entries is defined per architecture. If it is too low for you
+2 -1
@@ -13,7 +13,8 @@ DOCBOOKS := z8530book.xml mcabook.xml device-drivers.xml \
gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml \
genericirq.xml s390-drivers.xml uio-howto.xml scsi.xml \
mac80211.xml debugobjects.xml sh.xml regulator.xml \
alsa-driver-api.xml writing-an-alsa-driver.xml
alsa-driver-api.xml writing-an-alsa-driver.xml \
tracepoint.xml
###
# The build process is as follows (targets):
+89
@@ -0,0 +1,89 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<book id="Tracepoints">
<bookinfo>
<title>The Linux Kernel Tracepoint API</title>
<authorgroup>
<author>
<firstname>Jason</firstname>
<surname>Baron</surname>
<affiliation>
<address>
<email>jbaron@redhat.com</email>
</address>
</affiliation>
</author>
</authorgroup>
<legalnotice>
<para>
This documentation is free software; you can redistribute
it and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later
version.
</para>
<para>
This program is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
</para>
<para>
You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA
</para>
<para>
For more details see the file COPYING in the source
distribution of Linux.
</para>
</legalnotice>
</bookinfo>
<toc></toc>
<chapter id="intro">
<title>Introduction</title>
<para>
Tracepoints are static probe points that are located in strategic points
throughout the kernel. 'Probes' register/unregister with tracepoints
via a callback mechanism. The 'probes' are strictly typed functions that
are passed a unique set of parameters defined by each tracepoint.
</para>
<para>
From this simple callback mechanism, 'probes' can be used to profile, debug,
and understand kernel behavior. There are a number of tools that provide a
framework for using 'probes'. These tools include Systemtap, ftrace, and
LTTng.
</para>
<para>
Tracepoints are defined in a number of header files via various macros. Thus,
the purpose of this document is to provide a clear accounting of the available
tracepoints. The intention is to understand not only what tracepoints are
available but also to understand where future tracepoints might be added.
</para>
<para>
The API presented has functions of the form:
<function>trace_tracepointname(function parameters)</function>. These are the
tracepoint callbacks that are found throughout the code. Registering and
unregistering probes with these callback sites is covered in the
<filename>Documentation/trace/*</filename> directory.
</para>
</chapter>
<chapter id="irq">
<title>IRQ</title>
!Iinclude/trace/events/irq.h
</chapter>
</book>
+80 -22
@@ -192,23 +192,24 @@ rcu/rcuhier (which displays the struct rcu_node hierarchy).
The output of "cat rcu/rcudata" looks as follows:
rcu:
0 c=4011 g=4012 pq=1 pqc=4011 qp=0 rpfq=1 rp=3c2a dt=23301/73 dn=2 df=1882 of=0 ri=2126 ql=2 b=10
1 c=4011 g=4012 pq=1 pqc=4011 qp=0 rpfq=3 rp=39a6 dt=78073/1 dn=2 df=1402 of=0 ri=1875 ql=46 b=10
2 c=4010 g=4010 pq=1 pqc=4010 qp=0 rpfq=-5 rp=1d12 dt=16646/0 dn=2 df=3140 of=0 ri=2080 ql=0 b=10
3 c=4012 g=4013 pq=1 pqc=4012 qp=1 rpfq=3 rp=2b50 dt=21159/1 dn=2 df=2230 of=0 ri=1923 ql=72 b=10
4 c=4012 g=4013 pq=1 pqc=4012 qp=1 rpfq=3 rp=1644 dt=5783/1 dn=2 df=3348 of=0 ri=2805 ql=7 b=10
5 c=4012 g=4013 pq=0 pqc=4011 qp=1 rpfq=3 rp=1aac dt=5879/1 dn=2 df=3140 of=0 ri=2066 ql=10 b=10
6 c=4012 g=4013 pq=1 pqc=4012 qp=1 rpfq=3 rp=ed8 dt=5847/1 dn=2 df=3797 of=0 ri=1266 ql=10 b=10
7 c=4012 g=4013 pq=1 pqc=4012 qp=1 rpfq=3 rp=1fa2 dt=6199/1 dn=2 df=2795 of=0 ri=2162 ql=28 b=10
rcu:
0 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=10951/1 dn=0 df=1101 of=0 ri=36 ql=0 b=10
1 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=16117/1 dn=0 df=1015 of=0 ri=0 ql=0 b=10
2 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=1445/1 dn=0 df=1839 of=0 ri=0 ql=0 b=10
3 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=6681/1 dn=0 df=1545 of=0 ri=0 ql=0 b=10
4 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=1003/1 dn=0 df=1992 of=0 ri=0 ql=0 b=10
5 c=17829 g=17830 pq=1 pqc=17829 qp=1 dt=3887/1 dn=0 df=3331 of=0 ri=4 ql=2 b=10
6 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=859/1 dn=0 df=3224 of=0 ri=0 ql=0 b=10
7 c=17829 g=17830 pq=0 pqc=17829 qp=1 dt=3761/1 dn=0 df=1818 of=0 ri=0 ql=2 b=10
rcu_bh:
0 c=-268 g=-268 pq=1 pqc=-268 qp=0 rpfq=-145 rp=21d6 dt=23301/73 dn=2 df=0 of=0 ri=0 ql=0 b=10
1 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-170 rp=20ce dt=78073/1 dn=2 df=26 of=0 ri=5 ql=0 b=10
2 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-83 rp=fbd dt=16646/0 dn=2 df=28 of=0 ri=4 ql=0 b=10
3 c=-268 g=-268 pq=1 pqc=-268 qp=0 rpfq=-105 rp=178c dt=21159/1 dn=2 df=28 of=0 ri=2 ql=0 b=10
4 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-30 rp=b54 dt=5783/1 dn=2 df=32 of=0 ri=0 ql=0 b=10
5 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-29 rp=df5 dt=5879/1 dn=2 df=30 of=0 ri=3 ql=0 b=10
6 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-28 rp=788 dt=5847/1 dn=2 df=32 of=0 ri=0 ql=0 b=10
7 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-53 rp=1098 dt=6199/1 dn=2 df=30 of=0 ri=3 ql=0 b=10
0 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=10951/1 dn=0 df=0 of=0 ri=0 ql=0 b=10
1 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=16117/1 dn=0 df=13 of=0 ri=0 ql=0 b=10
2 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=1445/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
3 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=6681/1 dn=0 df=9 of=0 ri=0 ql=0 b=10
4 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=1003/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
5 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=3887/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
6 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=859/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
7 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=3761/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
The first section lists the rcu_data structures for rcu, the second for
rcu_bh. Each section has one line per CPU, or eight for this 8-CPU system.
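Because each line is a CPU number followed by "name=value" pairs, the output is straightforward to post-process. The sketch below is an illustration only: `parse_rcudata()` is a hypothetical helper, it assumes exactly the layout shown above (a "flavor:" header followed by one line per CPU), and it keeps values as strings because fields such as "dt" are not plain integers.

```python
def parse_rcudata(text):
    """Parse 'cat rcu/rcudata' output into {flavor: {cpu: {field: value}}}."""
    result = {}
    flavor = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            # Section header such as "rcu:" or "rcu_bh:".
            flavor = line[:-1]
            result[flavor] = {}
            continue
        if flavor is None:
            # Ignore anything that appears before the first header.
            continue
        cpu, *pairs = line.split()
        # Values stay strings: "dt" looks like "3887/1", for example.
        result[flavor][int(cpu)] = dict(p.split("=", 1) for p in pairs)
    return result
```

With the parsed result, a CPU from which RCU still expects a quiescent state can be found by looking for entries whose "qp" field is "1".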
@@ -253,12 +254,6 @@ o "pqc" indicates which grace period the last-observed quiescent
o "qp" indicates that RCU still expects a quiescent state from
this CPU.
o "rpfq" is the number of rcu_pending() calls on this CPU required
to induce this CPU to invoke force_quiescent_state().
o "rp" is low-order four hex digits of the count of how many times
rcu_pending() has been invoked on this CPU.
o "dt" is the current value of the dyntick counter that is incremented
when entering or leaving dynticks idle state, either by the
scheduler or by irq. The number after the "/" is the interrupt
@@ -305,6 +300,9 @@ o "b" is the batch limit for this CPU. If more than this number
of RCU callbacks is ready to invoke, then the remainder will
be deferred.
There is also an rcu/rcudata.csv file with the same information in
comma-separated-variable spreadsheet format.
The output of "cat rcu/rcugp" looks as follows:
@@ -411,3 +409,63 @@ o Each element of the form "1/1 0:127 ^0" represents one struct
For example, the first entry at the lowest level shows
"^0", indicating that it corresponds to bit zero in
the first entry at the middle level.
The output of "cat rcu/rcu_pending" looks as follows:
rcu:
0 np=255892 qsp=53936 cbr=0 cng=14417 gpc=10033 gps=24320 nf=6445 nn=146741
1 np=261224 qsp=54638 cbr=0 cng=25723 gpc=16310 gps=2849 nf=5912 nn=155792
2 np=237496 qsp=49664 cbr=0 cng=2762 gpc=45478 gps=1762 nf=1201 nn=136629
3 np=236249 qsp=48766 cbr=0 cng=286 gpc=48049 gps=1218 nf=207 nn=137723
4 np=221310 qsp=46850 cbr=0 cng=26 gpc=43161 gps=4634 nf=3529 nn=123110
5 np=237332 qsp=48449 cbr=0 cng=54 gpc=47920 gps=3252 nf=201 nn=137456
6 np=219995 qsp=46718 cbr=0 cng=50 gpc=42098 gps=6093 nf=4202 nn=120834
7 np=249893 qsp=49390 cbr=0 cng=72 gpc=38400 gps=17102 nf=41 nn=144888
rcu_bh:
0 np=146741 qsp=1419 cbr=0 cng=6 gpc=0 gps=0 nf=2 nn=145314
1 np=155792 qsp=12597 cbr=0 cng=0 gpc=4 gps=8 nf=3 nn=143180
2 np=136629 qsp=18680 cbr=0 cng=0 gpc=7 gps=6 nf=0 nn=117936
3 np=137723 qsp=2843 cbr=0 cng=0 gpc=10 gps=7 nf=0 nn=134863
4 np=123110 qsp=12433 cbr=0 cng=0 gpc=4 gps=2 nf=0 nn=110671
5 np=137456 qsp=4210 cbr=0 cng=0 gpc=6 gps=5 nf=0 nn=133235
6 np=120834 qsp=9902 cbr=0 cng=0 gpc=6 gps=3 nf=2 nn=110921
7 np=144888 qsp=26336 cbr=0 cng=0 gpc=8 gps=2 nf=0 nn=118542
As before, this is split into "rcu" and "rcu_bh" portions.
The fields are as follows:
o "np" is the number of times that __rcu_pending() has been invoked
for the corresponding flavor of RCU.
o "qsp" is the number of times that the RCU was waiting for a
quiescent state from this CPU.
o "cbr" is the number of times that this CPU had RCU callbacks
that had passed through a grace period, and were thus ready
to be invoked.
o "cng" is the number of times that this CPU needed another
grace period while RCU was idle.
o "gpc" is the number of times that an old grace period had
completed, but this CPU was not yet aware of it.
o "gps" is the number of times that a new grace period had started,
but this CPU was not yet aware of it.
o "nf" is the number of times that this CPU suspected that the
current grace period had run for too long, and thus needed to
be forced.
Please note that "forcing" consists of sending resched IPIs
to holdout CPUs. If that CPU really still is in an old RCU
read-side critical section, then we really do have to wait for it.
The assumption behind "forcing" is that the CPU is not still in
an old RCU read-side critical section, but has not yet responded
for some other reason.
o "nn" is the number of times that this CPU needed nothing. Alert
readers will note that the rcu "nn" number for a given CPU very
closely matches the rcu_bh "np" number for that same CPU. This
is due to short-circuit evaluation in rcu_pending().
+18 -2
@@ -184,8 +184,9 @@ length. Single character labels using special characters, that being anything
other than a letter or digit, are reserved for use by the Smack development
team. Smack labels are unstructured, case sensitive, and the only operation
ever performed on them is comparison for equality. Smack labels cannot
contain unprintable characters or the "/" (slash) character. Smack labels
cannot begin with a '-', which is reserved for special options.
contain unprintable characters, the "/" (slash), the "\" (backslash), the "'"
(quote) and '"' (double-quote) characters.
Smack labels cannot begin with a '-', which is reserved for special options.
There are some predefined labels:
@@ -523,3 +524,18 @@ Smack supports some mount options:
These mount options apply to all file system types.
Smack auditing
If you want Smack auditing of security events, you need to set CONFIG_AUDIT
in your kernel configuration.
By default, all denied events will be audited. You can change this behavior by
writing a single character to the /smack/logging file:
0 : no logging
1 : log denied (default)
2 : log accepted
3 : log denied & accepted
Events are logged as 'key=value' pairs. For each event you will get at least
the subject, the object, the rights requested, the action, the kernel function
that triggered the event, plus other pairs depending on the type of event
audited.
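The four logging values behave like a two-bit mask (denied = 1, accepted = 2). The following sketch shows one way to compute and write the value. Both helper names and the `path` parameter are hypothetical, and writing to /smack/logging is assumed to require a mounted smackfs and sufficient privilege.

```python
def smack_logging_value(log_denied, log_accepted):
    """Map the two choices onto the single character the logging file accepts."""
    return str((1 if log_denied else 0) + (2 if log_accepted else 0))


def set_smack_logging(log_denied, log_accepted, path="/smack/logging"):
    """Write the selected logging mode; 'path' is a parameter so it can be tested."""
    with open(path, "w") as f:
        f.write(smack_logging_value(log_denied, log_accepted))
```

For example, `set_smack_logging(True, True)` writes "3", which logs both denied and accepted events.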
+1 -1
@@ -186,7 +186,7 @@ a virtual address mapping (unlike the earlier scheme of virtual address
do not have a corresponding kernel virtual address space mapping) and
low-memory pages.
Note: Please refer to Documentation/PCI/PCI-DMA-mapping.txt for a discussion
Note: Please refer to Documentation/DMA-mapping.txt for a discussion
on PCI high mem DMA aspects and mapping of scatter gather lists, and support
for 64 bit PCI.
+1 -1
@@ -60,7 +60,7 @@ go_lock | Called for the first local holder of a lock
go_unlock | Called on the final local unlock of a lock
go_dump | Called to print content of object for debugfs file, or on
| error to dump glock to the log.
go_type; | The type of the glock, LM_TYPE_.....
go_type | The type of the glock, LM_TYPE_.....
go_min_hold_time | The minimum hold time
The minimum hold time for each lock is the time after a remote lock
+11 -8
@@ -11,18 +11,15 @@ their I/O so file system consistency is maintained. One of the nifty
features of GFS is perfect consistency -- changes made to the file system
on one machine show up immediately on all other machines in the cluster.
GFS uses interchangeable inter-node locking mechanisms. Different lock
modules can plug into GFS and each file system selects the appropriate
lock module at mount time. Lock modules include:
GFS uses interchangeable inter-node locking mechanisms. The currently
supported mechanisms are:
lock_nolock -- allows gfs to be used as a local file system
lock_dlm -- uses a distributed lock manager (dlm) for inter-node locking
The dlm is found at linux/fs/dlm/
In addition to interfacing with an external locking manager, a gfs lock
module is responsible for interacting with external cluster management
systems. Lock_dlm depends on user space cluster management systems found
Lock_dlm depends on user space cluster management systems found
at the URL above.
To use gfs as a local file system, no external clustering systems are
@@ -31,13 +28,19 @@ needed, simply:
$ mkfs -t gfs2 -p lock_nolock -j 1 /dev/block_device
$ mount -t gfs2 /dev/block_device /dir
GFS2 is not on-disk compatible with previous versions of GFS.
If you are using Fedora, you need to install the gfs2-utils package
and, for lock_dlm, you will also need to install the cman package
and write a cluster.conf as per the documentation.
GFS2 is not on-disk compatible with previous versions of GFS, but it
is pretty close.
The following man pages can be found at the URL above:
gfs2_fsck to repair a filesystem
fsck.gfs2 to repair a filesystem
gfs2_grow to expand a filesystem online
gfs2_jadd to add journals to a filesystem online
gfs2_tool to manipulate, examine and tune a filesystem
gfs2_quota to examine and change quota values in a filesystem
gfs2_convert to convert a gfs filesystem to gfs2 in-place
mount.gfs2 to help mount(8) mount a filesystem
mkfs.gfs2 to make a filesystem
+1 -1
@@ -133,4 +133,4 @@ RAM/SWAP in 10240 inodes and it is only accessible by root.
Author:
Christoph Rohland <cr@sap.com>, 1.12.01
Updated:
Hugh Dickins <hugh@veritas.com>, 4 June 2007
Hugh Dickins, 4 June 2007
+131
@@ -0,0 +1,131 @@
Futex Requeue PI
----------------
Requeueing of tasks from a non-PI futex to a PI futex requires
special handling in order to ensure the underlying rt_mutex is never
left without an owner if it has waiters; doing so would break the PI
boosting logic [see rt-mutex-design.txt]. For the purposes of
brevity, this action will be referred to as "requeue_pi" throughout
this document. Priority inheritance is abbreviated throughout as
"PI".
Motivation
----------
Without requeue_pi, the glibc implementation of
pthread_cond_broadcast() must resort to waking all the tasks waiting
on a pthread_condvar and letting them try to sort out which task
gets to run first in classic thundering-herd formation. An ideal
implementation would wake the highest-priority waiter, and leave the
rest to the natural wakeup inherent in unlocking the mutex
associated with the condvar.
Consider the simplified glibc calls:
/* caller must lock mutex */
pthread_cond_wait(cond, mutex)
{
lock(cond->__data.__lock);
unlock(mutex);
do {
unlock(cond->__data.__lock);
futex_wait(cond->__data.__futex);
lock(cond->__data.__lock);
} while(...)
unlock(cond->__data.__lock);
lock(mutex);
}
pthread_cond_broadcast(cond)
{
lock(cond->__data.__lock);
unlock(cond->__data.__lock);
futex_requeue(cond->__data.__futex, cond->mutex);
}
Once pthread_cond_broadcast() requeues the tasks, the cond->mutex
has waiters. Note that pthread_cond_wait() attempts to lock the
mutex only after it has returned to user space. This will leave the
underlying rt_mutex with waiters, and no owner, breaking the
previously mentioned PI-boosting algorithms.
In order to support PI-aware pthread_condvar's, the kernel needs to
be able to requeue tasks to PI futexes. This support implies that
upon a successful futex_wait system call, the caller would return to
user space already holding the PI futex. The glibc implementation
would be modified as follows:
/* caller must lock mutex */
pthread_cond_wait_pi(cond, mutex)
{
lock(cond->__data.__lock);
unlock(mutex);
do {
unlock(cond->__data.__lock);
futex_wait_requeue_pi(cond->__data.__futex);
lock(cond->__data.__lock);
} while(...)
unlock(cond->__data.__lock);
/* the kernel acquired the mutex for us */
}
pthread_cond_broadcast_pi(cond)
{
lock(cond->__data.__lock);
unlock(cond->__data.__lock);
futex_requeue_pi(cond->__data.__futex, cond->mutex);
}
The actual glibc implementation will likely test for PI and make the
necessary changes inside the existing calls rather than creating new
calls for the PI cases. Similar changes are needed for
pthread_cond_timedwait() and pthread_cond_signal().
Implementation
--------------
In order to ensure the rt_mutex has an owner if it has waiters, it
is necessary for both the requeue code, as well as the waiting code,
to be able to acquire the rt_mutex before returning to user space.
The requeue code cannot simply wake the waiter and leave it to
acquire the rt_mutex as it would open a race window between the
requeue call returning to user space and the waiter waking and
starting to run. This is especially true in the uncontended case.
The solution involves two new rt_mutex helper routines,
rt_mutex_start_proxy_lock() and rt_mutex_finish_proxy_lock(), which
allow the requeue code to acquire an uncontended rt_mutex on behalf
of the waiter and to enqueue the waiter on a contended rt_mutex.
Two new futex operations provide the kernel<->user interface to
requeue_pi: FUTEX_WAIT_REQUEUE_PI and FUTEX_CMP_REQUEUE_PI.
FUTEX_WAIT_REQUEUE_PI is called by the waiter (pthread_cond_wait()
and pthread_cond_timedwait()) to block on the initial futex and wait
to be requeued to a PI-aware futex. The implementation is the
result of a high-speed collision between futex_wait() and
futex_lock_pi(), with some extra logic to check for the additional
wake-up scenarios.
FUTEX_CMP_REQUEUE_PI is called by the waker
(pthread_cond_broadcast() and pthread_cond_signal()) to requeue and
possibly wake the waiting tasks. Internally, this operation is
still handled by futex_requeue() (by passing requeue_pi=1). Before
requeueing, futex_requeue() attempts to acquire the requeue target
PI futex on behalf of the top waiter. If it can, this waiter is
woken. futex_requeue() then proceeds to requeue the remaining
nr_wake+nr_requeue tasks to the PI futex, calling
rt_mutex_start_proxy_lock() prior to each requeue to prepare the
task as a waiter on the underlying rt_mutex. It is possible that
the lock can be acquired at this stage as well; if so, the next
waiter is woken to finish the acquisition of the lock.
FUTEX_CMP_REQUEUE_PI accepts nr_wake and nr_requeue as arguments, but
their sum is all that really matters. futex_requeue() will wake or
requeue up to nr_wake + nr_requeue tasks. It will wake only as many
tasks as it can acquire the lock for, which in the majority of cases
should be 0 as good programming practice dictates that the caller of
either pthread_cond_broadcast() or pthread_cond_signal() acquire the
mutex prior to making the call. FUTEX_CMP_REQUEUE_PI requires that
nr_wake=1. nr_requeue should be INT_MAX for broadcast and 0 for
signal.
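The argument rule stated above (nr_wake is always 1, nr_requeue depends on whether the waker is a broadcast or a signal) can be summarised in a few lines. This is an illustrative sketch only: `requeue_pi_counts()` is a hypothetical helper, not glibc or kernel code, and INT_MAX is assumed to be that of a 32-bit C int.

```python
INT_MAX = 2**31 - 1  # assumption: a 32-bit C int


def requeue_pi_counts(broadcast):
    """Return (nr_wake, nr_requeue) for the requeue_pi waker.

    nr_wake must be 1; nr_requeue is INT_MAX for a broadcast and
    0 for a signal, as described in the text above.
    """
    nr_wake = 1
    nr_requeue = INT_MAX if broadcast else 0
    return nr_wake, nr_requeue
```

A broadcast therefore asks the kernel to wake or requeue every waiter, while a signal touches at most one.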
+6
@@ -150,6 +150,11 @@ fan[1-*]_min Fan minimum value
Unit: revolution/min (RPM)
RW
fan[1-*]_max Fan maximum value
Unit: revolution/min (RPM)
Only rarely supported by the hardware.
RW
fan[1-*]_input Fan input value.
Unit: revolution/min (RPM)
RO
@@ -390,6 +395,7 @@ OR
in[0-*]_min_alarm
in[0-*]_max_alarm
fan[1-*]_min_alarm
fan[1-*]_max_alarm
temp[1-*]_min_alarm
temp[1-*]_max_alarm
temp[1-*]_crit_alarm
+79 -24
@@ -18,8 +18,12 @@ Usage
Anonymous finger details are sent sequentially as separate packets of ABS
events. Only the ABS_MT events are recognized as part of a finger
packet. The end of a packet is marked by calling the input_mt_sync()
function, which generates a SYN_MT_REPORT event. The end of multi-touch
transfer is marked by calling the usual input_sync() function.
function, which generates a SYN_MT_REPORT event. This instructs the
receiver to accept the data for the current finger and prepare to receive
another. The end of a multi-touch transfer is marked by calling the usual
input_sync() function. This instructs the receiver to act upon events
accumulated since last EV_SYN/SYN_REPORT and prepare to receive a new
set of events/packets.
A set of ABS_MT events with the desired properties is defined. The events
are divided into categories, to allow for partial implementation. The
@@ -27,11 +31,26 @@ minimum set consists of ABS_MT_TOUCH_MAJOR, ABS_MT_POSITION_X and
ABS_MT_POSITION_Y, which allows for multiple fingers to be tracked. If the
device supports it, the ABS_MT_WIDTH_MAJOR may be used to provide the size
of the approaching finger. Anisotropy and direction may be specified with
ABS_MT_TOUCH_MINOR, ABS_MT_WIDTH_MINOR and ABS_MT_ORIENTATION. Devices with
more granular information may specify general shapes as blobs, i.e., as a
sequence of rectangular shapes grouped together by an
ABS_MT_BLOB_ID. Finally, the ABS_MT_TOOL_TYPE may be used to specify
whether the touching tool is a finger or a pen or something else.
ABS_MT_TOUCH_MINOR, ABS_MT_WIDTH_MINOR and ABS_MT_ORIENTATION. The
ABS_MT_TOOL_TYPE may be used to specify whether the touching tool is a
finger or a pen or something else. Devices with more granular information
may specify general shapes as blobs, i.e., as a sequence of rectangular
shapes grouped together by an ABS_MT_BLOB_ID. Finally, for the few devices
that currently support it, the ABS_MT_TRACKING_ID event may be used to
report finger tracking from hardware [5].
Here is what a minimal event sequence for a two-finger touch would look
like:
ABS_MT_TOUCH_MAJOR
ABS_MT_POSITION_X
ABS_MT_POSITION_Y
SYN_MT_REPORT
ABS_MT_TOUCH_MAJOR
ABS_MT_POSITION_X
ABS_MT_POSITION_Y
SYN_MT_REPORT
SYN_REPORT
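On the receiving side, the two synchronization events drive a simple accumulator: SYN_MT_REPORT closes the current finger packet and SYN_REPORT closes the whole set of packets. The sketch below models that logic on (name, value) tuples rather than real input events. `collect_frames()` and the tuple representation are assumptions made for illustration, not an existing API.

```python
def collect_frames(events):
    """Group a stream of (event_name, value) tuples into frames of contacts.

    Returns a list of frames; each frame is a list of dicts, one per finger.
    """
    frames = []
    contacts = []
    current = {}
    for name, value in events:
        if name.startswith("ABS_MT_"):
            # Only ABS_MT events are part of a finger packet.
            current[name] = value
        elif name == "SYN_MT_REPORT":
            # Accept the data for the current finger, prepare for another.
            contacts.append(current)
            current = {}
        elif name == "SYN_REPORT":
            # Act upon everything accumulated since the last SYN_REPORT.
            frames.append(contacts)
            contacts = []
            current = {}
    return frames
```

Feeding it the two-finger sequence shown above yields one frame containing two contacts.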
Event Semantics
@@ -44,24 +63,24 @@ ABS_MT_TOUCH_MAJOR
The length of the major axis of the contact. The length should be given in
surface units. If the surface has an X times Y resolution, the largest
possible value of ABS_MT_TOUCH_MAJOR is sqrt(X^2 + Y^2), the diagonal.
possible value of ABS_MT_TOUCH_MAJOR is sqrt(X^2 + Y^2), the diagonal [4].
ABS_MT_TOUCH_MINOR
The length, in surface units, of the minor axis of the contact. If the
contact is circular, this event can be omitted.
contact is circular, this event can be omitted [4].
ABS_MT_WIDTH_MAJOR
The length, in surface units, of the major axis of the approaching
tool. This should be understood as the size of the tool itself. The
orientation of the contact and the approaching tool are assumed to be the
same.
same [4].
ABS_MT_WIDTH_MINOR
The length, in surface units, of the minor axis of the approaching
tool. Omit if circular.
tool. Omit if circular [4].
The above four values can be used to derive additional information about
the contact. The ratio ABS_MT_TOUCH_MAJOR / ABS_MT_WIDTH_MAJOR approximates
@@ -70,14 +89,17 @@ different characteristic widths [1].
ABS_MT_ORIENTATION
The orientation of the ellipse. The value should describe half a revolution
clockwise around the touch center. The scale of the value is arbitrary, but
zero should be returned for an ellipse aligned along the Y axis of the
surface. As an example, an index finger placed straight onto the axis could
return zero orientation, something negative when twisted to the left, and
something positive when twisted to the right. This value can be omitted if
the touching object is circular, or if the information is not available in
the kernel driver.
The orientation of the ellipse. The value should describe a signed quarter
of a revolution clockwise around the touch center. The signed value range
is arbitrary, but zero should be returned for a finger aligned along the Y
axis of the surface, a negative value when the finger is turned to the left, and
a positive value when the finger is turned to the right. When completely aligned with
the X axis, the range max should be returned. Orientation can be omitted
if the touching object is circular, or if the information is not available
in the kernel driver. Partial orientation support is possible if the device
can distinguish between the two axes, but not (uniquely) any values in
between. In such cases, the range of ABS_MT_ORIENTATION should be [0, 1]
[4].
ABS_MT_POSITION_X
@@ -98,8 +120,35 @@ ABS_MT_BLOB_ID
The BLOB_ID groups several packets together into one arbitrarily shaped
contact. This is a low-level anonymous grouping, and should not be confused
with the high-level contactID, explained below. Most kernel drivers will
not have this capability, and can safely omit the event.
with the high-level trackingID [5]. Most kernel drivers will not have blob
capability, and can safely omit the event.
ABS_MT_TRACKING_ID
The TRACKING_ID identifies an initiated contact throughout its life cycle
[5]. There are currently only a few devices that support it, so this event
should normally be omitted.
Event Computation
-----------------
The flora of different hardware unavoidably leads to some devices fitting
better to the MT protocol than others. To simplify and unify the mapping,
this section gives recipes for how to compute certain events.
For devices reporting contacts as rectangular shapes, signed orientation
cannot be obtained. Assuming X and Y are the lengths of the sides of the
touching rectangle, here is a simple formula that retains the most
information possible:
ABS_MT_TOUCH_MAJOR := max(X, Y)
ABS_MT_TOUCH_MINOR := min(X, Y)
ABS_MT_ORIENTATION := bool(X > Y)
The range of ABS_MT_ORIENTATION should be set to [0, 1], to indicate that
the device can distinguish between a finger along the Y axis (0) and a
finger along the X axis (1).
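The recipe above translates directly into code. A minimal sketch follows, with the function name `rect_to_mt()` and the dictionary return value chosen only for illustration:

```python
def rect_to_mt(x_len, y_len):
    """Map a touching rectangle with side lengths x_len and y_len (in
    surface units) to the three ABS_MT values given by the formula above."""
    return {
        "ABS_MT_TOUCH_MAJOR": max(x_len, y_len),
        "ABS_MT_TOUCH_MINOR": min(x_len, y_len),
        # 0: finger along the Y axis, 1: finger along the X axis.
        "ABS_MT_ORIENTATION": int(x_len > y_len),
    }
```

A rectangle that is wider than it is tall reports orientation 1, and a square reports 0 because X > Y does not hold.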
Finger Tracking
@@ -109,14 +158,18 @@ The kernel driver should generate an arbitrary enumeration of the set of
anonymous contacts currently on the surface. The order in which the packets
appear in the event stream is not important.
The process of finger tracking, i.e., to assign a unique contactID to each
The process of finger tracking, i.e., to assign a unique trackingID to each
initiated contact on the surface, is left to user space; preferably the
multi-touch X driver [3]. In that driver, the contactID stays the same and
multi-touch X driver [3]. In that driver, the trackingID stays the same and
unique until the contact vanishes (when the finger leaves the surface). The
problem of assigning a set of anonymous fingers to a set of identified
fingers is a Euclidean bipartite matching problem at each event update, and
relies on a sufficiently rapid update rate.
There are a few devices that support trackingID in hardware. User space can
make use of these native identifiers to reduce bandwidth and CPU usage.
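To make the matching problem concrete, here is a minimal user-space sketch in C. It is not the algorithm used by the multi-touch X driver; it simply tries every assignment for a handful of contacts and keeps the cheapest one. All names (mt_point, mt_match(), MT_MAX_CONTACTS) are hypothetical, the number of old and new contacts is assumed to be equal, and squared distances are summed instead of true Euclidean distances to stay in integer arithmetic.

```c
#include <stddef.h>

#define MT_MAX_CONTACTS 5	/* hypothetical cap; keeps n! small */

struct mt_point { int x, y; };

/* Squared distance between two points (assumption: avoids sqrt). */
static long mt_dist2(struct mt_point a, struct mt_point b)
{
	long dx = (long)a.x - b.x;
	long dy = (long)a.y - b.y;

	return dx * dx + dy * dy;
}

struct mt_search {
	const struct mt_point *old, *cur;
	size_t n;
	int perm[MT_MAX_CONTACTS];	/* assignment being built */
	int used[MT_MAX_CONTACTS];	/* which new packets are taken */
	int best_perm[MT_MAX_CONTACTS];
	long best;			/* -1 until one full assignment exists */
};

/* Assign a new packet to tracked contact k, then recurse on k + 1. */
static void mt_try(struct mt_search *s, size_t k, long cost)
{
	if (s->best >= 0 && cost >= s->best)
		return;			/* cannot beat the best so far */
	if (k == s->n) {
		s->best = cost;
		for (size_t i = 0; i < s->n; i++)
			s->best_perm[i] = s->perm[i];
		return;
	}
	for (size_t j = 0; j < s->n; j++) {
		if (s->used[j])
			continue;
		s->used[j] = 1;
		s->perm[k] = (int)j;
		mt_try(s, k + 1, cost + mt_dist2(s->old[k], s->cur[j]));
		s->used[j] = 0;
	}
}

/*
 * On return, assign[i] is the index of the new anonymous packet matched
 * to previously tracked contact i. Returns the total squared distance,
 * or -1 if n exceeds MT_MAX_CONTACTS.
 */
long mt_match(const struct mt_point *old, const struct mt_point *cur,
	      size_t n, int *assign)
{
	struct mt_search s = { .old = old, .cur = cur, .n = n, .best = -1 };

	if (n > MT_MAX_CONTACTS)
		return -1;
	mt_try(&s, 0, 0);
	for (size_t i = 0; i < n; i++)
		assign[i] = s.best_perm[i];
	return s.best;
}
```

The exhaustive search is only reasonable for a few fingers. It also shows why a rapid update rate matters: the closer successive positions are, the more clearly the cheapest assignment stands out from the alternatives.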
Notes
-----
@@ -136,5 +189,7 @@ could be used to derive tilt.
time of writing (April 2009), the MT protocol is not yet merged, and the
prototype implements finger matching, basic mouse support and two-finger
scrolling. The project aims at improving the quality of current multi-touch
functionality available in the synaptics X driver, and in addition
functionality available in the Synaptics X driver, and in addition
implement more advanced gestures.
[4] See the section on event computation.
[5] See the section on finger tracking.
+47 -17
@@ -56,7 +56,6 @@ parameter is applicable:
ISAPNP ISA PnP code is enabled.
ISDN Appropriate ISDN support is enabled.
JOY Appropriate joystick support is enabled.
KMEMTRACE kmemtrace is enabled.
LIBATA Libata driver is enabled
LP Printer support is enabled.
LOOP Loopback device support is enabled.
@@ -329,11 +328,6 @@ and is between 256 and 4096 characters. It is defined in the file
flushed before they will be reused, which
is a lot faster
amd_iommu_size= [HW,X86-64]
Define the size of the aperture for the AMD IOMMU
driver. Possible values are:
'32M', '64M' (default), '128M', '256M', '512M', '1G'
amijoy.map= [HW,JOY] Amiga joystick support
Map of devices attached to JOY0DAT and JOY1DAT
Format: <a>,<b>
@@ -646,6 +640,13 @@ and is between 256 and 4096 characters. It is defined in the file
DMA-API debugging code disables itself because the
architectural default is too low.
dma_debug_driver=<driver_name>
With this option the DMA-API debugging driver
filter feature can be enabled at boot time. Just
pass the driver to filter for as the parameter.
The filter can be disabled or changed to another
driver later using sysfs.
dscc4.setup= [NET]
dtc3181e= [HW,SCSI]
@@ -752,12 +753,25 @@ and is between 256 and 4096 characters. It is defined in the file
ia64_pal_cache_flush instead of SAL_CACHE_FLUSH.
ftrace=[tracer]
[ftrace] will set and start the specified tracer
[FTRACE] will set and start the specified tracer
as early as possible in order to facilitate early
boot debugging.
ftrace_dump_on_oops
[ftrace] will dump the trace buffers on oops.
[FTRACE] will dump the trace buffers on oops.
ftrace_filter=[function-list]
[FTRACE] Limit the functions traced by the function
tracer at boot up. function-list is a comma separated
list of functions. This list can be changed at run
time by the set_ftrace_filter file in the debugfs
tracing directory.
ftrace_notrace=[function-list]
[FTRACE] Do not trace the functions specified in
function-list. This list can be changed at run time
by the set_ftrace_notrace file in the debugfs
tracing directory.
gamecon.map[2|3]=
[HW,JOY] Multisystem joystick and NES/SNES/PSX pad
@@ -914,6 +928,12 @@ and is between 256 and 4096 characters. It is defined in the file
Format: { "sha1" | "md5" }
default: "sha1"
ima_tcb [IMA]
Load a policy which meets the needs of the Trusted
Computing Base. This means IMA will measure all
programs exec'd, files mmap'd for exec, and all files
opened for read by uid=0.
in2000= [HW,SCSI]
See header of drivers/scsi/in2000.c.
@@ -1054,15 +1074,6 @@ and is between 256 and 4096 characters. It is defined in the file
use the HighMem zone if it exists, and the Normal
zone if it does not.
kmemtrace.enable= [KNL,KMEMTRACE] Format: { yes | no }
Controls whether kmemtrace is enabled
at boot-time.
kmemtrace.subbufs=n [KNL,KMEMTRACE] Overrides the number of
subbufs kmemtrace's relay channel has. Set this
higher than default (KMEMTRACE_N_SUBBUFS in code) if
you experience buffer overruns.
kgdboc= [HW] kgdb over consoles.
Requires a tty driver that supports console polling.
(only serial supported for now)
@@ -1072,6 +1083,10 @@ and is between 256 and 4096 characters. It is defined in the file
Configure the RouterBoard 532 series on-chip
Ethernet adapter MAC address.
kmemleak= [KNL] Boot-time kmemleak enable/disable
Valid arguments: on, off
Default: on
kstack=N [X86] Print N words from the kernel stack
in oops dumps.
@@ -1535,6 +1550,10 @@ and is between 256 and 4096 characters. It is defined in the file
register save and restore. The kernel will only save
legacy floating-point registers on task switch.
noxsave [BUGS=X86] Disables x86 extended register state save
and restore using xsave. The kernel will fall back to
enabling legacy floating-point and SSE state.
nohlt [BUGS=ARM,SH] Tells the kernel that the sleep(SH) or
wfi(ARM) instruction doesn't work correctly and not to
use it. This is also useful when using a JTAG debugger.
@@ -1571,6 +1590,9 @@ and is between 256 and 4096 characters. It is defined in the file
noinitrd [RAM] Tells the kernel not to load any configured
initial RAM disk.
nointremap [X86-64, Intel-IOMMU] Do not enable interrupt
remapping.
nointroute [IA-64]
nojitter [IA64] Disables jitter checking for ITC timers.
@@ -1656,6 +1678,14 @@ and is between 256 and 4096 characters. It is defined in the file
oprofile.timer= [HW]
Use timer interrupt instead of performance counters
oprofile.cpu_type= Force an oprofile cpu type
This might be useful if you have an older oprofile
userland or if you want common events.
Format: { archperfmon }
archperfmon: [X86] Force use of architectural
perfmon on Intel CPUs instead of the
CPU specific event set.
osst= [HW,SCSI] SCSI Tape Driver
Format: <buffer_size>,<write_threshold>
See also Documentation/scsi/st.txt.
+142
@@ -0,0 +1,142 @@
Kernel Memory Leak Detector
===========================
Introduction
------------
Kmemleak provides a way of detecting possible kernel memory leaks in a
way similar to a tracing garbage collector
(http://en.wikipedia.org/wiki/Garbage_collection_%28computer_science%29#Tracing_garbage_collectors),
with the difference that the orphan objects are not freed but only
reported via /sys/kernel/debug/kmemleak. A similar method is used by the
Valgrind tool (memcheck --leak-check) to detect the memory leaks in
user-space applications.
Usage
-----
CONFIG_DEBUG_KMEMLEAK in "Kernel hacking" has to be enabled. A kernel
thread scans the memory every 10 minutes (by default) and prints any new
unreferenced objects found. To trigger an intermediate scan and display
all the possible memory leaks:
# mount -t debugfs nodev /sys/kernel/debug/
# cat /sys/kernel/debug/kmemleak
Note that the orphan objects are listed in the order they were allocated
and one object at the beginning of the list may cause other subsequent
objects to be reported as orphan.
Memory scanning parameters can be modified at run-time by writing to the
/sys/kernel/debug/kmemleak file. The following parameters are supported:
off - disable kmemleak (irreversible)
stack=on - enable the task stacks scanning
stack=off - disable the task stacks scanning
scan=on - start the automatic memory scanning thread
scan=off - stop the automatic memory scanning thread
scan=<secs> - set the automatic memory scanning period in seconds (0
to disable it)
Kmemleak can also be disabled at boot-time by passing "kmemleak=off" on
the kernel command line.
Basic Algorithm
---------------
The memory allocations via kmalloc, vmalloc, kmem_cache_alloc and
friends are traced and the pointers, together with additional
information like size and stack trace, are stored in a prio search tree.
The corresponding freeing function calls are tracked and the pointers
removed from the kmemleak data structures.
An allocated block of memory is considered orphan if no pointer to its
start address or to any location inside the block can be found by
scanning the memory (including saved registers). This means that there
might be no way for the kernel to pass the address of the allocated
block to a freeing function and therefore the block is considered a
memory leak.
The scanning algorithm steps:
1. mark all objects as white (remaining white objects will later be
considered orphan)
2. scan the memory starting with the data section and stacks, checking
the values against the addresses stored in the prio search tree. If
a pointer to a white object is found, the object is added to the
gray list
3. scan the gray objects for matching addresses (some white objects
can become gray and be added at the end of the gray list) until the
gray set is finished
4. the remaining white objects are considered orphan and reported via
/sys/kernel/debug/kmemleak
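The four steps above can be illustrated with a toy model in C. This is a user-space sketch, not kmemleak code: objects are described by made-up integer addresses, a linear search stands in for the prio search tree, and every name (toy_object, toy_scan(), TOY_MAX_OBJECTS) is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_OBJECTS 16	/* hypothetical cap on tracked objects */

enum toy_color { TOY_WHITE, TOY_GRAY };

struct toy_object {
	uintptr_t start;	/* first address of the block */
	size_t size;		/* block length in bytes */
	const uintptr_t *words;	/* block contents to scan (may be NULL) */
	size_t nwords;
	enum toy_color color;
};

/* Linear stand-in for the prio search tree lookup. */
struct toy_object *toy_lookup(struct toy_object *objs, size_t n,
			      uintptr_t value)
{
	for (size_t i = 0; i < n; i++)
		if (value >= objs[i].start &&
		    value - objs[i].start < objs[i].size)
			return &objs[i];
	return NULL;
}

/* Scan a memory range; every white object it references turns gray. */
void toy_scan_words(struct toy_object *objs, size_t n,
		    const uintptr_t *words, size_t nwords,
		    struct toy_object **gray, size_t *ngray)
{
	for (size_t i = 0; i < nwords; i++) {
		struct toy_object *obj = toy_lookup(objs, n, words[i]);

		if (obj && obj->color == TOY_WHITE) {
			obj->color = TOY_GRAY;
			gray[(*ngray)++] = obj;	/* append to gray list */
		}
	}
}

/* Returns the number of orphan (still white) objects. */
size_t toy_scan(struct toy_object *objs, size_t n,
		const uintptr_t *roots, size_t nroots)
{
	struct toy_object *gray[TOY_MAX_OBJECTS];
	size_t ngray = 0, orphans = 0;

	if (n > TOY_MAX_OBJECTS)
		return (size_t)-1;
	for (size_t i = 0; i < n; i++)		/* step 1 */
		objs[i].color = TOY_WHITE;
	toy_scan_words(objs, n, roots, nroots, gray, &ngray);	/* step 2 */
	for (size_t i = 0; i < ngray; i++)	/* step 3; list may grow */
		toy_scan_words(objs, n, gray[i]->words, gray[i]->nwords,
			       gray, &ngray);
	for (size_t i = 0; i < n; i++)		/* step 4 */
		if (objs[i].color == TOY_WHITE)
			orphans++;
	return orphans;
}
```

Because each object is appended to the gray list at most once, the list never holds more than n entries. A value pointing anywhere inside a block counts as a reference, matching the description of orphan detection above.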
Some allocated memory blocks have pointers stored in the kernel's
internal data structures and they cannot be detected as orphans. To
avoid this, kmemleak can also store the number of values pointing to an
address inside the block address range that need to be found so that the
block is not considered a leak. One example is __vmalloc().
Kmemleak API
------------
See the include/linux/kmemleak.h header for the function prototypes.
kmemleak_init - initialize kmemleak
kmemleak_alloc - notify of a memory block allocation
kmemleak_free - notify of a memory block freeing
kmemleak_not_leak - mark an object as not a leak
kmemleak_ignore - do not scan or report an object as leak
kmemleak_scan_area - add scan areas inside a memory block
kmemleak_no_scan - do not scan a memory block
kmemleak_erase - erase an old value in a pointer variable
kmemleak_alloc_recursive - as kmemleak_alloc but checks the recursiveness
kmemleak_free_recursive - as kmemleak_free but checks the recursiveness
Dealing with false positives/negatives
--------------------------------------
The false negatives are real memory leaks (orphan objects) but not
reported by kmemleak because values found during the memory scanning
point to such objects. To reduce the number of false negatives, kmemleak
provides the kmemleak_ignore, kmemleak_scan_area, kmemleak_no_scan and
kmemleak_erase functions (see above). The task stacks also increase the
amount of false negatives and their scanning is not enabled by default.
The false positives are objects wrongly reported as being memory leaks
(orphan). For objects known not to be leaks, kmemleak provides the
kmemleak_not_leak function. The kmemleak_ignore could also be used if
the memory block is known not to contain other pointers and it will no
longer be scanned.
Some of the reported leaks are only transient, especially on SMP
systems, because of pointers temporarily stored in CPU registers or
stacks. Kmemleak defines MSECS_MIN_AGE (defaulting to 1000) representing
the minimum age of an object to be reported as a memory leak.
Limitations and Drawbacks
-------------------------
The main drawback is the reduced performance of memory allocation and
freeing. To avoid other penalties, the memory scanning is only performed
when the /sys/kernel/debug/kmemleak file is read. Anyway, this tool is
intended for debugging purposes where the performance might not be the
most important requirement.
To keep the algorithm simple, kmemleak scans for values pointing to any
address inside a block's address range. This may lead to an increased
number of false negatives. However, it is likely that a real memory leak
will eventually become visible.
Another source of false negatives is the data stored in non-pointer
values. In a future version, kmemleak could scan only the pointer
members in the allocated structures. This feature would solve many of
the false negative cases described above.
The tool can report false positives. These are cases where an allocated
block doesn't need to be freed (some cases in the init_call functions),
the pointer is calculated by other methods than the usual container_of
macro or the pointer is stored in a location not scanned by kmemleak.
Page allocations and ioremap are not tracked. Only the ARM and x86
architectures are currently supported.
+128 -1
@@ -31,6 +31,7 @@ Contents:
- Locking functions.
- Interrupt disabling functions.
- Sleep and wake-up functions.
- Miscellaneous functions.
(*) Inter-CPU locking barrier effects.
@@ -1217,6 +1218,132 @@ barriers are required in such a situation, they must be provided from some
other means.
SLEEP AND WAKE-UP FUNCTIONS
---------------------------
Sleeping and waking on an event flagged in global data can be viewed as an
interaction between two pieces of data: the task state of the task waiting for
the event and the global data used to indicate the event. To make sure that
these appear to happen in the right order, the primitives to begin the process
of going to sleep, and the primitives to initiate a wake up imply certain
barriers.
Firstly, the sleeper normally follows something like this sequence of events:
for (;;) {
set_current_state(TASK_UNINTERRUPTIBLE);
if (event_indicated)
break;
schedule();
}
A general memory barrier is interpolated automatically by set_current_state()
after it has altered the task state:
CPU 1
===============================
set_current_state();
set_mb();
STORE current->state
<general barrier>
LOAD event_indicated
set_current_state() may be wrapped by:
prepare_to_wait();
prepare_to_wait_exclusive();
which therefore also imply a general memory barrier after setting the state.
The whole sequence above is available in various canned forms, all of which
interpolate the memory barrier in the right place:
wait_event();
wait_event_interruptible();
wait_event_interruptible_exclusive();
wait_event_interruptible_timeout();
wait_event_killable();
wait_event_timeout();
wait_on_bit();
wait_on_bit_lock();
Secondly, code that performs a wake up normally follows something like this:
event_indicated = 1;
wake_up(&event_wait_queue);
or:
event_indicated = 1;
wake_up_process(event_daemon);
A write memory barrier is implied by wake_up() and co. if and only if they wake
something up. The barrier occurs before the task state is cleared, and so sits
between the STORE to indicate the event and the STORE to set TASK_RUNNING:
CPU 1 CPU 2
=============================== ===============================
set_current_state(); STORE event_indicated
set_mb(); wake_up();
STORE current->state <write barrier>
<general barrier> STORE current->state
LOAD event_indicated
The available waker functions include:
complete();
wake_up();
wake_up_all();
wake_up_bit();
wake_up_interruptible();
wake_up_interruptible_all();
wake_up_interruptible_nr();
wake_up_interruptible_poll();
wake_up_interruptible_sync();
wake_up_interruptible_sync_poll();
wake_up_locked();
wake_up_locked_poll();
wake_up_nr();
wake_up_poll();
wake_up_process();
[!] Note that the memory barriers implied by the sleeper and the waker do _not_
order multiple stores before the wake-up with respect to loads of those stored
values after the sleeper has called set_current_state(). For instance, if the
sleeper does:
set_current_state(TASK_INTERRUPTIBLE);
if (event_indicated)
break;
__set_current_state(TASK_RUNNING);
do_something(my_data);
and the waker does:
my_data = value;
event_indicated = 1;
wake_up(&event_wait_queue);
there's no guarantee that the change to event_indicated will be perceived by
the sleeper as coming after the change to my_data. In such a circumstance, the
code on both sides must interpolate its own memory barriers between the
separate data accesses. Thus the above sleeper ought to do:
set_current_state(TASK_INTERRUPTIBLE);
if (event_indicated) {
smp_rmb();
do_something(my_data);
}
and the waker should do:
my_data = value;
smp_wmb();
event_indicated = 1;
wake_up(&event_wait_queue);
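For readers who want to experiment outside the kernel, the same pairing can be approximated in user space with C11 atomics. This is an analogy under stated assumptions, not kernel code: atomic_thread_fence() with release and acquire ordering stands in for smp_wmb() and smp_rmb() (the C11 fences are somewhat stronger), the actual sleeping and waking is left out, and waker_publish() and sleeper_try_consume() are hypothetical helper names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state, named after the variables in the text. */
static int my_data;			/* payload written before the flag */
static atomic_int event_indicated;	/* flag checked by the sleeper */

/* Waker side: store the payload, then the flag, in that order. */
void waker_publish(int value)
{
	my_data = value;
	/* Stands in for smp_wmb(): payload is ordered before the flag. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&event_indicated, 1, memory_order_relaxed);
	/* A kernel waker would call wake_up() at this point. */
}

/* Sleeper side: returns true and fills *out once the event is seen. */
bool sleeper_try_consume(int *out)
{
	if (!atomic_load_explicit(&event_indicated, memory_order_relaxed))
		return false;	/* a kernel sleeper would schedule() here */
	/* Stands in for smp_rmb(): flag is read before the payload. */
	atomic_thread_fence(memory_order_acquire);
	*out = my_data;
	return true;
}
```

The point of the sketch is the placement of the two fences: one between the two stores on the waker side and one between the two loads on the sleeper side, mirroring the smp_wmb() and smp_rmb() calls shown above.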
MISCELLANEOUS FUNCTIONS
-----------------------
@@ -1366,7 +1493,7 @@ WHERE ARE MEMORY BARRIERS NEEDED?
Under normal operation, memory operation reordering is generally not going to
be a problem as a single-threaded linear piece of code will still appear to
work correctly, even if it's in an SMP kernel. There are, however, three
work correctly, even if it's in an SMP kernel. There are, however, four
circumstances in which reordering definitely _could_ be a problem:
(*) Interprocessor interaction.
+12 -3
@@ -1266,13 +1266,22 @@ sctp_rmem - vector of 3 INTEGERs: min, default, max
sctp_wmem - vector of 3 INTEGERs: min, default, max
See tcp_wmem for a description.
UNDOCUMENTED:
/proc/sys/net/core/*
dev_weight FIXME
dev_weight - INTEGER
The maximum number of packets that the kernel can handle on a NAPI
interrupt. It is a per-CPU variable.
Default: 64
/proc/sys/net/unix/*
max_dgram_qlen FIXME
max_dgram_qlen - INTEGER
The maximum length of a datagram socket receive queue.
Default: 10
UNDOCUMENTED:
/proc/sys/net/irda/*
fast_poll_increase FIXME
+19 -1
@@ -4,6 +4,7 @@
CONTENTS
========
0. WARNING
1. Overview
1.1 The problem
1.2 The solution
@@ -14,6 +15,23 @@ CONTENTS
3. Future plans
0. WARNING
==========
Fiddling with these settings can result in an unstable system. The knobs are
root only and assume root knows what he is doing.
Most notable:
* very small values in sched_rt_period_us can result in an unstable
system when the period is smaller than either the available hrtimer
resolution, or the time it takes to handle the budget refresh itself.
* very small values in sched_rt_runtime_us can result in an unstable
system when the runtime is so small the system has difficulty making
forward progress (NOTE: the migration thread and kstopmachine both
are real-time processes).
1. Overview
===========
@@ -169,7 +187,7 @@ get their allocated time.
Implementing SCHED_EDF might take a while to complete. Priority Inheritance is
the biggest challenge as the current linux PI infrastructure is geared towards
the limited static priority levels 0-139. With deadline scheduling you need to
the limited static priority levels 0-99. With deadline scheduling you need to
do deadline inheritance (since priority is inversely proportional to the
deadline delta (deadline - now)).

Some files were not shown because too many files have changed in this diff.