Mirror of https://github.com/linux-apfs/linux-apfs.git (synced 2026-05-01 15:00:59 -07:00)
[SCSI] Merge branch 'linus'
Conflicts:
    drivers/message/fusion/mptsas.c

Fixed up conflict between req->data_len accessors and mptsas driver updates.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
@@ -60,3 +60,62 @@ Description:
 Indicates whether the block layer should automatically
 generate checksums for write requests bound for
 devices that support receiving integrity metadata.
+
+What: /sys/block/<disk>/alignment_offset
+Date: April 2009
+Contact: Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+Storage devices may report a physical block size that is
+bigger than the logical block size (for instance a drive
+with 4KB physical sectors exposing 512-byte logical
+blocks to the operating system). This parameter
+indicates how many bytes the beginning of the device is
+offset from the disk's natural alignment.
+
+What: /sys/block/<disk>/<partition>/alignment_offset
+Date: April 2009
+Contact: Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+Storage devices may report a physical block size that is
+bigger than the logical block size (for instance a drive
+with 4KB physical sectors exposing 512-byte logical
+blocks to the operating system). This parameter
+indicates how many bytes the beginning of the partition
+is offset from the disk's natural alignment.
+
+What: /sys/block/<disk>/queue/logical_block_size
+Date: May 2009
+Contact: Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+This is the smallest unit the storage device can
+address. It is typically 512 bytes.
+
+What: /sys/block/<disk>/queue/physical_block_size
+Date: May 2009
+Contact: Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+This is the smallest unit the storage device can write
+without resorting to a read-modify-write operation. It is
+usually the same as the logical block size but may be
+bigger. One example is SATA drives with 4KB sectors
+that expose a 512-byte logical block size to the
+operating system.
+
+What: /sys/block/<disk>/queue/minimum_io_size
+Date: April 2009
+Contact: Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+Storage devices may report a preferred minimum I/O size,
+which is the smallest request the device can perform
+without incurring a read-modify-write penalty. For disk
+drives this is often the physical block size. For RAID
+arrays it is often the stripe chunk size.
+
+What: /sys/block/<disk>/queue/optimal_io_size
+Date: April 2009
+Contact: Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+Storage devices may report an optimal I/O size, which is
+the device's preferred unit for receiving I/O. This is
+rarely reported for disk drives. For RAID devices it is
+usually the stripe width or the internal block size.
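Partitioning and mkfs-style tools are the intended consumers of the alignment attributes above. The following is a minimal C sketch of that use.

It rests on one assumption: alignment_offset is taken to be the byte offset of the first naturally aligned logical block. Under that reading, a byte offset X starts a physical block exactly when X and alignment_offset leave the same remainder modulo physical_block_size.

`offset_is_aligned()` and `next_aligned_offset()` are hypothetical helpers. They are not part of any kernel or util-linux API.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helpers, not a kernel or util-linux API.
 * Assumption: alignment_offset is the byte offset of the first naturally
 * aligned logical block, so byte offset X starts a physical block exactly
 * when X and alignment_offset are congruent modulo physical_block_size.
 */
static bool offset_is_aligned(uint64_t offset, uint64_t physical_block_size,
                              uint64_t alignment_offset)
{
    if (physical_block_size == 0)
        return false; /* attribute missing or unreadable */
    return offset % physical_block_size ==
           alignment_offset % physical_block_size;
}

/* Round offset up to the next physically aligned byte offset. */
static uint64_t next_aligned_offset(uint64_t offset,
                                    uint64_t physical_block_size,
                                    uint64_t alignment_offset)
{
    uint64_t phase, cur;

    if (physical_block_size == 0)
        return offset;
    phase = alignment_offset % physical_block_size;
    cur = offset % physical_block_size;
    /* phase and cur are both < physical_block_size, so no underflow here. */
    return offset + (phase + physical_block_size - cur) % physical_block_size;
}
```

Take physical_block_size = 4096 and alignment_offset = 3584. A partition starting at the legacy LBA 63 (byte 32256) is physically aligned, while one starting at LBA 2048 (byte 1048576) is not. On a drive reporting alignment_offset = 0 the reverse holds.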
@@ -0,0 +1,33 @@
+Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/model
+Date: March 2009
+Kernel Version: 2.6.30
+Contact: iss_storagedev@hp.com
+Description: Displays the SCSI INQUIRY page 0 model for logical drive
+Y of controller X.
+
+Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/rev
+Date: March 2009
+Kernel Version: 2.6.30
+Contact: iss_storagedev@hp.com
+Description: Displays the SCSI INQUIRY page 0 revision for logical
+drive Y of controller X.
+
+Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/unique_id
+Date: March 2009
+Kernel Version: 2.6.30
+Contact: iss_storagedev@hp.com
+Description: Displays the SCSI INQUIRY page 83 serial number for logical
+drive Y of controller X.
+
+Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/vendor
+Date: March 2009
+Kernel Version: 2.6.30
+Contact: iss_storagedev@hp.com
+Description: Displays the SCSI INQUIRY page 0 vendor for logical drive
+Y of controller X.
+
+Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/block:cciss!cXdY
+Date: March 2009
+Kernel Version: 2.6.30
+Contact: iss_storagedev@hp.com
+Description: A symbolic link to /sys/block/cciss!cXdY
@@ -0,0 +1,18 @@
+What: /sys/devices/system/cpu/cpu*/cache/index*/cache_disable_X
+Date: August 2008
+KernelVersion: 2.6.27
+Contact: mark.langsdorf@amd.com
+Description: These files exist in every CPU's cache index directories.
+There are currently 2 cache_disable_# files in each
+directory. Reading from these files on a supported
+processor will return that cache disable index value
+for that processor and node. Writing to one of these
+files will cause the specified cache index to be disabled.
+
+Currently, only AMD Family 10h Processors support cache index
+disable, and only for their L3 caches. See the BIOS and
+Kernel Developer's Guide at
+http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/31116-Public-GH-BKDG_3.20_2-4-09.pdf
+for formatting information and other details on the
+cache index disable.
+Users: joachim.deguara@amd.com
@@ -704,12 +704,24 @@ this directory the following files can currently be found:
     The current number of free dma_debug_entries
     in the allocator.
 
+dma-api/driver-filter
+    You can write the name of a driver into this file
+    to limit the debug output to requests from that
+    particular driver. Write an empty string to
+    that file to disable the filter and see
+    all errors again.
+
 If you have this code compiled into your kernel it will be enabled by default.
 If you want to boot without the bookkeeping anyway you can provide
 'dma_debug=off' as a boot parameter. This will disable DMA-API debugging.
 Notice that you cannot enable it again at runtime. You have to reboot to do
 so.
 
+If you want to see debug messages only for a specific device driver you can
+specify the dma_debug_driver=<drivername> parameter. This will enable the
+driver filter at boot time. The debug code will only print errors for that
+driver afterwards. This filter can be disabled or changed later using debugfs.
+
 When the code disables itself at runtime this is most likely because it ran
 out of dma_debug_entries. These entries are preallocated at boot. The number
 of preallocated entries is defined per architecture. If it is too low for you
@@ -13,7 +13,8 @@ DOCBOOKS := z8530book.xml mcabook.xml device-drivers.xml \
     gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml \
     genericirq.xml s390-drivers.xml uio-howto.xml scsi.xml \
     mac80211.xml debugobjects.xml sh.xml regulator.xml \
-    alsa-driver-api.xml writing-an-alsa-driver.xml
+    alsa-driver-api.xml writing-an-alsa-driver.xml \
+    tracepoint.xml
 
 ###
 # The build process is as follows (targets):
@@ -0,0 +1,89 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
+"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
+
+<book id="Tracepoints">
+<bookinfo>
+<title>The Linux Kernel Tracepoint API</title>
+
+<authorgroup>
+<author>
+<firstname>Jason</firstname>
+<surname>Baron</surname>
+<affiliation>
+<address>
+<email>jbaron@redhat.com</email>
+</address>
+</affiliation>
+</author>
+</authorgroup>
+
+<legalnotice>
+<para>
+This documentation is free software; you can redistribute
+it and/or modify it under the terms of the GNU General Public
+License as published by the Free Software Foundation; either
+version 2 of the License, or (at your option) any later
+version.
+</para>
+
+<para>
+This program is distributed in the hope that it will be
+useful, but WITHOUT ANY WARRANTY; without even the implied
+warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+See the GNU General Public License for more details.
+</para>
+
+<para>
+You should have received a copy of the GNU General Public
+License along with this program; if not, write to the Free
+Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
+MA 02111-1307 USA
+</para>
+
+<para>
+For more details see the file COPYING in the source
+distribution of Linux.
+</para>
+</legalnotice>
+</bookinfo>
+
+<toc></toc>
+<chapter id="intro">
+<title>Introduction</title>
+<para>
+Tracepoints are static probe points that are located at strategic points
+throughout the kernel. 'Probes' register/unregister with tracepoints
+via a callback mechanism. The 'probes' are strictly typed functions that
+are passed a unique set of parameters defined by each tracepoint.
+</para>
+
+<para>
+From this simple callback mechanism, 'probes' can be used to profile, debug,
+and understand kernel behavior. There are a number of tools that provide a
+framework for using 'probes'. These tools include Systemtap, ftrace, and
+LTTng.
+</para>
+
+<para>
+Tracepoints are defined in a number of header files via various macros. Thus,
+the purpose of this document is to provide a clear accounting of the available
+tracepoints. The intention is to understand not only what tracepoints are
+available but also to understand where future tracepoints might be added.
+</para>
+
+<para>
+The API presented has functions of the form:
+<function>trace_tracepointname(function parameters)</function>. These are the
+tracepoint callbacks that are found throughout the code. Registering and
+unregistering probes with these callback sites is covered in the
+<filename>Documentation/trace/*</filename> directory.
+</para>
+</chapter>
+
+<chapter id="irq">
+<title>IRQ</title>
+!Iinclude/trace/events/irq.h
+</chapter>
+
+</book>
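The callback mechanism described in the Introduction can be illustrated with a toy userspace model. It has three parts:

- a call site named in the trace_tracepointname() style;
- a fixed-size table of probes that all share one strictly typed signature;
- register and unregister helpers.

Everything in the model is hypothetical: `trace_irq_entry()`, `irq_entry_probe_fn`, `struct irq_stats` and the helpers. The kernel's own tracepoint implementation differs. As the text says, registration in the real kernel is documented under Documentation/trace/.

```c
#include <stddef.h>

/*
 * Toy userspace model of a tracepoint, for illustration only.
 * All names are hypothetical; this is not the kernel's implementation.
 */
#define MAX_PROBES 4

/* Every probe attached to this tracepoint must have exactly this type. */
typedef void (*irq_entry_probe_fn)(void *data, int irq);

static struct {
    irq_entry_probe_fn fn[MAX_PROBES];
    void *data[MAX_PROBES];
    int nr;
} irq_entry_tp;

static int register_irq_entry_probe(irq_entry_probe_fn fn, void *data)
{
    if (irq_entry_tp.nr == MAX_PROBES)
        return -1; /* table full */
    irq_entry_tp.fn[irq_entry_tp.nr] = fn;
    irq_entry_tp.data[irq_entry_tp.nr] = data;
    irq_entry_tp.nr++;
    return 0;
}

static int unregister_irq_entry_probe(irq_entry_probe_fn fn, void *data)
{
    for (int i = 0; i < irq_entry_tp.nr; i++) {
        if (irq_entry_tp.fn[i] == fn && irq_entry_tp.data[i] == data) {
            /* Shift the remaining probes down so the table stays dense. */
            for (int j = i + 1; j < irq_entry_tp.nr; j++) {
                irq_entry_tp.fn[j - 1] = irq_entry_tp.fn[j];
                irq_entry_tp.data[j - 1] = irq_entry_tp.data[j];
            }
            irq_entry_tp.nr--;
            return 0;
        }
    }
    return -1; /* not registered */
}

/* The instrumented call site: a no-op loop when no probe is attached. */
static void trace_irq_entry(int irq)
{
    for (int i = 0; i < irq_entry_tp.nr; i++)
        irq_entry_tp.fn[i](irq_entry_tp.data[i], irq);
}

/* Example probe: counts events and remembers the last IRQ number seen. */
struct irq_stats {
    unsigned long hits;
    int last_irq;
};

static void irq_stats_probe(void *data, int irq)
{
    struct irq_stats *s = data;

    s->hits++;
    s->last_irq = irq;
}
```

Register `irq_stats_probe()` with a `struct irq_stats`, then call `trace_irq_entry(7)`: this leaves `hits == 1` and `last_irq == 7`. After the probe is unregistered, further calls leave the counters untouched.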
@@ -192,23 +192,24 @@ rcu/rcuhier (which displays the struct rcu_node hierarchy).
 The output of "cat rcu/rcudata" looks as follows:
 
 rcu:
-0 c=4011 g=4012 pq=1 pqc=4011 qp=0 rpfq=1 rp=3c2a dt=23301/73 dn=2 df=1882 of=0 ri=2126 ql=2 b=10
-1 c=4011 g=4012 pq=1 pqc=4011 qp=0 rpfq=3 rp=39a6 dt=78073/1 dn=2 df=1402 of=0 ri=1875 ql=46 b=10
-2 c=4010 g=4010 pq=1 pqc=4010 qp=0 rpfq=-5 rp=1d12 dt=16646/0 dn=2 df=3140 of=0 ri=2080 ql=0 b=10
-3 c=4012 g=4013 pq=1 pqc=4012 qp=1 rpfq=3 rp=2b50 dt=21159/1 dn=2 df=2230 of=0 ri=1923 ql=72 b=10
-4 c=4012 g=4013 pq=1 pqc=4012 qp=1 rpfq=3 rp=1644 dt=5783/1 dn=2 df=3348 of=0 ri=2805 ql=7 b=10
-5 c=4012 g=4013 pq=0 pqc=4011 qp=1 rpfq=3 rp=1aac dt=5879/1 dn=2 df=3140 of=0 ri=2066 ql=10 b=10
-6 c=4012 g=4013 pq=1 pqc=4012 qp=1 rpfq=3 rp=ed8 dt=5847/1 dn=2 df=3797 of=0 ri=1266 ql=10 b=10
-7 c=4012 g=4013 pq=1 pqc=4012 qp=1 rpfq=3 rp=1fa2 dt=6199/1 dn=2 df=2795 of=0 ri=2162 ql=28 b=10
+rcu:
+0 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=10951/1 dn=0 df=1101 of=0 ri=36 ql=0 b=10
+1 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=16117/1 dn=0 df=1015 of=0 ri=0 ql=0 b=10
+2 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=1445/1 dn=0 df=1839 of=0 ri=0 ql=0 b=10
+3 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=6681/1 dn=0 df=1545 of=0 ri=0 ql=0 b=10
+4 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=1003/1 dn=0 df=1992 of=0 ri=0 ql=0 b=10
+5 c=17829 g=17830 pq=1 pqc=17829 qp=1 dt=3887/1 dn=0 df=3331 of=0 ri=4 ql=2 b=10
+6 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=859/1 dn=0 df=3224 of=0 ri=0 ql=0 b=10
+7 c=17829 g=17830 pq=0 pqc=17829 qp=1 dt=3761/1 dn=0 df=1818 of=0 ri=0 ql=2 b=10
 rcu_bh:
-0 c=-268 g=-268 pq=1 pqc=-268 qp=0 rpfq=-145 rp=21d6 dt=23301/73 dn=2 df=0 of=0 ri=0 ql=0 b=10
-1 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-170 rp=20ce dt=78073/1 dn=2 df=26 of=0 ri=5 ql=0 b=10
-2 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-83 rp=fbd dt=16646/0 dn=2 df=28 of=0 ri=4 ql=0 b=10
-3 c=-268 g=-268 pq=1 pqc=-268 qp=0 rpfq=-105 rp=178c dt=21159/1 dn=2 df=28 of=0 ri=2 ql=0 b=10
-4 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-30 rp=b54 dt=5783/1 dn=2 df=32 of=0 ri=0 ql=0 b=10
-5 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-29 rp=df5 dt=5879/1 dn=2 df=30 of=0 ri=3 ql=0 b=10
-6 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-28 rp=788 dt=5847/1 dn=2 df=32 of=0 ri=0 ql=0 b=10
-7 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-53 rp=1098 dt=6199/1 dn=2 df=30 of=0 ri=3 ql=0 b=10
+0 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=10951/1 dn=0 df=0 of=0 ri=0 ql=0 b=10
+1 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=16117/1 dn=0 df=13 of=0 ri=0 ql=0 b=10
+2 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=1445/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
+3 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=6681/1 dn=0 df=9 of=0 ri=0 ql=0 b=10
+4 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=1003/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
+5 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=3887/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
+6 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=859/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
+7 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=3761/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
 
 The first section lists the rcu_data structures for rcu, the second for
 rcu_bh. Each section has one line per CPU, or eight for this 8-CPU system.
@@ -253,12 +254,6 @@ o "pqc" indicates which grace period the last-observed quiescent
 o "qp" indicates that RCU still expects a quiescent state from
     this CPU.
 
-o "rpfq" is the number of rcu_pending() calls on this CPU required
-    to induce this CPU to invoke force_quiescent_state().
-
-o "rp" is low-order four hex digits of the count of how many times
-    rcu_pending() has been invoked on this CPU.
-
 o "dt" is the current value of the dyntick counter that is incremented
     when entering or leaving dynticks idle state, either by the
     scheduler or by irq. The number after the "/" is the interrupt
@@ -305,6 +300,9 @@ o "b" is the batch limit for this CPU. If more than this number
     of RCU callbacks is ready to invoke, then the remainder will
     be deferred.
 
+There is also an rcu/rcudata.csv file with the same information in
+comma-separated-values spreadsheet format.
+
 
 The output of "cat rcu/rcugp" looks as follows:
 
@@ -411,3 +409,63 @@ o Each element of the form "1/1 0:127 ^0" represents one struct
     For example, the first entry at the lowest level shows
     "^0", indicating that it corresponds to bit zero in
     the first entry at the middle level.
+
+
+The output of "cat rcu/rcu_pending" looks as follows:
+
+rcu:
+0 np=255892 qsp=53936 cbr=0 cng=14417 gpc=10033 gps=24320 nf=6445 nn=146741
+1 np=261224 qsp=54638 cbr=0 cng=25723 gpc=16310 gps=2849 nf=5912 nn=155792
+2 np=237496 qsp=49664 cbr=0 cng=2762 gpc=45478 gps=1762 nf=1201 nn=136629
+3 np=236249 qsp=48766 cbr=0 cng=286 gpc=48049 gps=1218 nf=207 nn=137723
+4 np=221310 qsp=46850 cbr=0 cng=26 gpc=43161 gps=4634 nf=3529 nn=123110
+5 np=237332 qsp=48449 cbr=0 cng=54 gpc=47920 gps=3252 nf=201 nn=137456
+6 np=219995 qsp=46718 cbr=0 cng=50 gpc=42098 gps=6093 nf=4202 nn=120834
+7 np=249893 qsp=49390 cbr=0 cng=72 gpc=38400 gps=17102 nf=41 nn=144888
+rcu_bh:
+0 np=146741 qsp=1419 cbr=0 cng=6 gpc=0 gps=0 nf=2 nn=145314
+1 np=155792 qsp=12597 cbr=0 cng=0 gpc=4 gps=8 nf=3 nn=143180
+2 np=136629 qsp=18680 cbr=0 cng=0 gpc=7 gps=6 nf=0 nn=117936
+3 np=137723 qsp=2843 cbr=0 cng=0 gpc=10 gps=7 nf=0 nn=134863
+4 np=123110 qsp=12433 cbr=0 cng=0 gpc=4 gps=2 nf=0 nn=110671
+5 np=137456 qsp=4210 cbr=0 cng=0 gpc=6 gps=5 nf=0 nn=133235
+6 np=120834 qsp=9902 cbr=0 cng=0 gpc=6 gps=3 nf=2 nn=110921
+7 np=144888 qsp=26336 cbr=0 cng=0 gpc=8 gps=2 nf=0 nn=118542
+
+As always, this is once again split into "rcu" and "rcu_bh" portions.
+The fields are as follows:
+
+o "np" is the number of times that __rcu_pending() has been invoked
+    for the corresponding flavor of RCU.
+
+o "qsp" is the number of times that RCU was waiting for a
+    quiescent state from this CPU.
+
+o "cbr" is the number of times that this CPU had RCU callbacks
+    that had passed through a grace period, and were thus ready
+    to be invoked.
+
+o "cng" is the number of times that this CPU needed another
+    grace period while RCU was idle.
+
+o "gpc" is the number of times that an old grace period had
+    completed, but this CPU was not yet aware of it.
+
+o "gps" is the number of times that a new grace period had started,
+    but this CPU was not yet aware of it.
+
+o "nf" is the number of times that this CPU suspected that the
+    current grace period had run for too long, and thus needed to
+    be forced.
+
+    Please note that "forcing" consists of sending resched IPIs
+    to holdout CPUs. If that CPU really still is in an old RCU
+    read-side critical section, then we really do have to wait for it.
+    The assumption behind "forcing" is that the CPU is not still in
+    an old RCU read-side critical section, but has not yet responded
+    for some other reason.
+
+o "nn" is the number of times that this CPU needed nothing. Alert
+    readers will note that the rcu "nn" number for a given CPU very
+    closely matches the rcu_bh "np" number for that same CPU. This
+    is due to short-circuit evaluation in rcu_pending().
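In rcu/rcu_pending, every field after the CPU number is a key=value pair, so the output is easy to post-process. The following is a minimal C sketch that assumes the line layout shown in the sample output above. `struct rcu_pending_stats` and `parse_rcu_pending_line()` are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one per-CPU line of rcu/rcu_pending. */
struct rcu_pending_stats {
    int cpu;
    unsigned long np, qsp, cbr, cng, gpc, gps, nf, nn;
};

/*
 * Parse a line such as
 *   "0 np=255892 qsp=53936 cbr=0 cng=14417 gpc=10033 gps=24320 nf=6445 nn=146741"
 * Returns 0 on success, or -1 if the line does not start with a CPU number
 * (for example the "rcu:" and "rcu_bh:" section headers).
 * Unknown keys are ignored; parsing stops at the first token that is not
 * of the form key=<unsigned decimal>.
 */
static int parse_rcu_pending_line(const char *line, struct rcu_pending_stats *st)
{
    const char *p = line;
    char key[16];
    unsigned long val;
    int n = 0;

    memset(st, 0, sizeof(*st));
    if (sscanf(p, "%d%n", &st->cpu, &n) != 1)
        return -1;
    p += n;

    while (sscanf(p, " %15[^= ]=%lu%n", key, &val, &n) == 2) {
        p += n;
        if (strcmp(key, "np") == 0)
            st->np = val;
        else if (strcmp(key, "qsp") == 0)
            st->qsp = val;
        else if (strcmp(key, "cbr") == 0)
            st->cbr = val;
        else if (strcmp(key, "cng") == 0)
            st->cng = val;
        else if (strcmp(key, "gpc") == 0)
            st->gpc = val;
        else if (strcmp(key, "gps") == 0)
            st->gps = val;
        else if (strcmp(key, "nf") == 0)
            st->nf = val;
        else if (strcmp(key, "nn") == 0)
            st->nn = val;
    }
    return 0;
}
```

In the sample output, the rcu line for CPU 0 has nn=146741 and the rcu_bh line for CPU 0 has np=146741. This is the short-circuit effect described under "nn". Section headers such as "rcu_bh:" make the parser return -1, so a caller can use that return value to switch between flavors.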
@@ -184,8 +184,9 @@ length. Single character labels using special characters, that being anything
 other than a letter or digit, are reserved for use by the Smack development
 team. Smack labels are unstructured, case sensitive, and the only operation
 ever performed on them is comparison for equality. Smack labels cannot
-contain unprintable characters or the "/" (slash) character. Smack labels
-cannot begin with a '-', which is reserved for special options.
+contain unprintable characters, the "/" (slash), the "\" (backslash), the "'"
+(quote) and '"' (double-quote) characters.
+Smack labels cannot begin with a '-', which is reserved for special options.
 
 There are some predefined labels:
 
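The label rules in the new text translate directly into a validity check. Below is a sketch in C. It makes two assumptions: "unprintable" means anything `isprint()` rejects in the C locale, and an empty label is invalid.

`smack_label_is_valid()` is a hypothetical helper. The maximum length is passed in as a parameter because the limit is given in the preceding part of the Smack documentation, not in this hunk.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical checker for the Smack label rules quoted above:
 *  - no unprintable characters (assumed to mean !isprint() in the C locale)
 *  - no '/', '\\', '\'' or '"' characters
 *  - must not begin with '-' (reserved for special options)
 * Treating an empty label as invalid is an additional assumption.
 */
static bool smack_label_is_valid(const char *label, size_t max_len)
{
    size_t len = strlen(label);

    if (len == 0 || len > max_len)
        return false;
    if (label[0] == '-')
        return false;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)label[i];

        if (!isprint(c))
            return false;
        if (c == '/' || c == '\\' || c == '\'' || c == '"')
            return false;
    }
    return true;
}
```

The check only enforces the syntactic rules. It does not try to reserve single special-character labels such as the predefined ones, because those are valid labels that happen to have fixed meanings.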
@@ -523,3 +524,18 @@ Smack supports some mount options:
 
 These mount options apply to all file system types.
 
+Smack auditing
+
+If you want Smack auditing of security events, you need to set CONFIG_AUDIT
+in your kernel configuration.
+By default, all denied events will be audited. You can change this behavior by
+writing a single character to the /smack/logging file:
+0 : no logging
+1 : log denied (default)
+2 : log accepted
+3 : log denied & accepted
+
+Events are logged as 'key=value' pairs; for each event you will get at least
+the subject, the object, the rights requested, the action, the kernel function
+that triggered the event, plus other pairs depending on the type of event
+audited.
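The four /smack/logging values behave like two independent bits: one for denied events and one for accepted events. Below is a small C sketch built on that reading of the table.

`smack_logging_char()` and `smack_set_logging()` are hypothetical helpers. The path is a parameter rather than hard-coded.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Map the two audit choices onto the character written to /smack/logging,
 * reading the table as a bitmask: bit 0 = log denied, bit 1 = log accepted
 * ('0' none, '1' denied, '2' accepted, '3' both).
 */
static char smack_logging_char(bool log_denied, bool log_accepted)
{
    return (char)('0' + (log_denied ? 1 : 0) + (log_accepted ? 2 : 0));
}

/*
 * Write the setting to path (normally /smack/logging).
 * Returns 0 on success, -1 on any error.
 */
static int smack_set_logging(const char *path, bool log_denied, bool log_accepted)
{
    FILE *f = fopen(path, "w");

    if (!f)
        return -1;
    if (fputc(smack_logging_char(log_denied, log_accepted), f) == EOF) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

For example, `smack_set_logging("/smack/logging", true, true)` would write "3", requesting both denied and accepted events. As noted above, auditing itself still requires CONFIG_AUDIT.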
@@ -186,7 +186,7 @@ a virtual address mapping (unlike the earlier scheme of virtual address
 do not have a corresponding kernel virtual address space mapping) and
 low-memory pages.
 
-Note: Please refer to Documentation/PCI/PCI-DMA-mapping.txt for a discussion
+Note: Please refer to Documentation/DMA-mapping.txt for a discussion
 on PCI high mem DMA aspects and mapping of scatter gather lists, and support
 for 64 bit PCI.
 
@@ -60,7 +60,7 @@ go_lock | Called for the first local holder of a lock
 go_unlock | Called on the final local unlock of a lock
 go_dump | Called to print content of object for debugfs file, or on
  | error to dump glock to the log.
-go_type; | The type of the glock, LM_TYPE_.....
+go_type | The type of the glock, LM_TYPE_.....
 go_min_hold_time | The minimum hold time
 
 The minimum hold time for each lock is the time after a remote lock
@@ -11,18 +11,15 @@ their I/O so file system consistency is maintained. One of the nifty
 features of GFS is perfect consistency -- changes made to the file system
 on one machine show up immediately on all other machines in the cluster.
 
-GFS uses interchangeable inter-node locking mechanisms. Different lock
-modules can plug into GFS and each file system selects the appropriate
-lock module at mount time. Lock modules include:
+GFS uses interchangeable inter-node locking mechanisms; the currently
+supported mechanisms are:
 
   lock_nolock -- allows gfs to be used as a local file system
 
   lock_dlm -- uses a distributed lock manager (dlm) for inter-node locking
   The dlm is found at linux/fs/dlm/
 
-In addition to interfacing with an external locking manager, a gfs lock
-module is responsible for interacting with external cluster management
-systems. Lock_dlm depends on user space cluster management systems found
+Lock_dlm depends on user space cluster management systems found
 at the URL above.
 
 To use gfs as a local file system, no external clustering systems are
@@ -31,13 +28,19 @@ needed, simply:
   $ mkfs -t gfs2 -p lock_nolock -j 1 /dev/block_device
   $ mount -t gfs2 /dev/block_device /dir
 
-GFS2 is not on-disk compatible with previous versions of GFS.
+If you are using Fedora, you need to install the gfs2-utils package
+and, for lock_dlm, you will also need to install the cman package
+and write a cluster.conf as per the documentation.
+
+GFS2 is not on-disk compatible with previous versions of GFS, but it
+is pretty close.
 
 The following man pages can be found at the URL above:
-  gfs2_fsck to repair a filesystem
+  fsck.gfs2 to repair a filesystem
   gfs2_grow to expand a filesystem online
   gfs2_jadd to add journals to a filesystem online
   gfs2_tool to manipulate, examine and tune a filesystem
   gfs2_quota to examine and change quota values in a filesystem
+  gfs2_convert to convert a gfs filesystem to gfs2 in-place
   mount.gfs2 to help mount(8) mount a filesystem
   mkfs.gfs2 to make a filesystem
@@ -133,4 +133,4 @@ RAM/SWAP in 10240 inodes and it is only accessible by root.
 Author:
    Christoph Rohland <cr@sap.com>, 1.12.01
 Updated:
-   Hugh Dickins <hugh@veritas.com>, 4 June 2007
+   Hugh Dickins, 4 June 2007
@@ -0,0 +1,131 @@
|
|||||||
|
Futex Requeue PI
|
||||||
|
----------------
|
||||||
|
|
||||||
|
Requeueing of tasks from a non-PI futex to a PI futex requires
|
||||||
|
special handling in order to ensure the underlying rt_mutex is never
|
||||||
|
left without an owner if it has waiters; doing so would break the PI
|
||||||
|
boosting logic [see rt-mutex-desgin.txt] For the purposes of
|
||||||
|
brevity, this action will be referred to as "requeue_pi" throughout
|
||||||
|
this document. Priority inheritance is abbreviated throughout as
|
||||||
|
"PI".
|
||||||
|
|
||||||
|
Motivation
|
||||||
|
----------
|
||||||
|
|
||||||
|
Without requeue_pi, the glibc implementation of
|
||||||
|
pthread_cond_broadcast() must resort to waking all the tasks waiting
|
||||||
|
on a pthread_condvar and letting them try to sort out which task
|
||||||
|
gets to run first in classic thundering-herd formation. An ideal
|
||||||
|
implementation would wake the highest-priority waiter, and leave the
|
||||||
|
rest to the natural wakeup inherent in unlocking the mutex
|
||||||
|
associated with the condvar.
|
||||||
|
|
||||||
|
Consider the simplified glibc calls:
|
||||||
|
|
||||||
|
/* caller must lock mutex */
|
||||||
|
pthread_cond_wait(cond, mutex)
|
||||||
|
{
|
||||||
|
lock(cond->__data.__lock);
|
||||||
|
unlock(mutex);
|
||||||
|
do {
|
||||||
|
unlock(cond->__data.__lock);
|
||||||
|
futex_wait(cond->__data.__futex);
|
||||||
|
lock(cond->__data.__lock);
|
||||||
|
} while(...)
|
||||||
|
unlock(cond->__data.__lock);
|
||||||
|
lock(mutex);
|
||||||
|
}
|
||||||
|
|
||||||
|
pthread_cond_broadcast(cond)
|
||||||
|
{
|
||||||
|
lock(cond->__data.__lock);
|
||||||
|
unlock(cond->__data.__lock);
|
||||||
|
futex_requeue(cond->data.__futex, cond->mutex);
|
||||||
|
}
|
||||||
|
|
||||||
|
Once pthread_cond_broadcast() requeues the tasks, the cond->mutex
|
||||||
|
has waiters. Note that pthread_cond_wait() attempts to lock the
|
||||||
|
mutex only after it has returned to user space. This will leave the
|
||||||
|
underlying rt_mutex with waiters, and no owner, breaking the
|
||||||
|
previously mentioned PI-boosting algorithms.
|
||||||
|
|
||||||
|
In order to support PI-aware pthread_condvar's, the kernel needs to
|
||||||
|
be able to requeue tasks to PI futexes. This support implies that
|
||||||
|
upon a successful futex_wait system call, the caller would return to
|
||||||
|
user space already holding the PI futex. The glibc implementation
|
||||||
|
would be modified as follows:
|
||||||
|
|
||||||
|
|
||||||
|
/* caller must lock mutex */
|
||||||
|
pthread_cond_wait_pi(cond, mutex)
|
||||||
|
{
|
||||||
|
lock(cond->__data.__lock);
|
||||||
|
unlock(mutex);
|
||||||
|
do {
|
||||||
|
unlock(cond->__data.__lock);
|
||||||
|
futex_wait_requeue_pi(cond->__data.__futex);
|
||||||
|
lock(cond->__data.__lock);
|
||||||
|
} while(...)
|
||||||
|
unlock(cond->__data.__lock);
|
||||||
|
/* the kernel acquired the the mutex for us */
|
||||||
|
}
|
||||||
|
|
||||||
|
pthread_cond_broadcast_pi(cond)
|
||||||
|
{
|
||||||
|
lock(cond->__data.__lock);
|
||||||
|
unlock(cond->__data.__lock);
|
||||||
|
futex_requeue_pi(cond->data.__futex, cond->mutex);
|
||||||
|
}
|
||||||
|
|
||||||
+The actual glibc implementation will likely test for PI and make the
+necessary changes inside the existing calls rather than creating new
+calls for the PI cases. Similar changes are needed for
+pthread_cond_timedwait() and pthread_cond_signal().
+
+Implementation
+--------------
+
+In order to ensure the rt_mutex has an owner if it has waiters, it
+is necessary for both the requeue code and the waiting code to be
+able to acquire the rt_mutex before returning to user space. The
+requeue code cannot simply wake the waiter and leave it to acquire
+the rt_mutex, as that would open a race window between the requeue
+call returning to user space and the waiter waking and starting to
+run. This is especially true in the uncontended case.
+
+The solution involves two new rt_mutex helper routines,
+rt_mutex_start_proxy_lock() and rt_mutex_finish_proxy_lock(), which
+allow the requeue code to acquire an uncontended rt_mutex on behalf
+of the waiter and to enqueue the waiter on a contended rt_mutex.
+Two new futex operations provide the kernel<->user interface to
+requeue_pi: FUTEX_WAIT_REQUEUE_PI and FUTEX_CMP_REQUEUE_PI.
+
+FUTEX_WAIT_REQUEUE_PI is called by the waiter (pthread_cond_wait()
+and pthread_cond_timedwait()) to block on the initial futex and wait
+to be requeued to a PI-aware futex. The implementation is the
+result of a high-speed collision between futex_wait() and
+futex_lock_pi(), with some extra logic to check for the additional
+wake-up scenarios.
+
+FUTEX_CMP_REQUEUE_PI is called by the waker
+(pthread_cond_broadcast() and pthread_cond_signal()) to requeue and
+possibly wake the waiting tasks. Internally, this operation is
+still handled by futex_requeue() (by passing requeue_pi=1). Before
+requeueing, futex_requeue() attempts to acquire the requeue target
+PI futex on behalf of the top waiter. If it can, this waiter is
+woken. futex_requeue() then proceeds to requeue the remaining
+nr_wake+nr_requeue tasks to the PI futex, calling
+rt_mutex_start_proxy_lock() prior to each requeue to prepare the
+task as a waiter on the underlying rt_mutex. It is possible that
+the lock can be acquired at this stage as well; if so, the next
+waiter is woken to finish the acquisition of the lock.
+
+FUTEX_CMP_REQUEUE_PI accepts nr_wake and nr_requeue as arguments, but
+their sum is all that really matters. futex_requeue() will wake or
+requeue up to nr_wake + nr_requeue tasks. It will wake only as many
+tasks as it can acquire the lock for, which in the majority of cases
+should be 0, as good programming practice dictates that the caller of
+either pthread_cond_broadcast() or pthread_cond_signal() acquire the
+mutex prior to making the call. FUTEX_CMP_REQUEUE_PI requires that
+nr_wake=1. nr_requeue should be INT_MAX for broadcast and 0 for
+signal.
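For context on the user-space side of the futex text above, here is a minimal sketch that pairs a condition variable with a priority-inheritance mutex using only standard POSIX calls, and follows the advice above to hold the mutex while broadcasting. It is illustrative only: it assumes a threads implementation that supports PTHREAD_PRIO_INHERIT, and whether the C library issues FUTEX_WAIT_REQUEUE_PI or FUTEX_CMP_REQUEUE_PI underneath is up to that library and is not visible here. The helper run_pi_condvar_demo() is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_WAITERS 8

static pthread_mutex_t lock;            /* PI mutex, initialised at run time */
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool ready;                      /* the condition, guarded by lock */
static int woken;                       /* waiters that got past the wait */

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);          /* caller must lock mutex */
    while (!ready)                      /* re-check: wake-ups may be spurious */
        pthread_cond_wait(&cond, &lock);
    woken++;                            /* mutex is held again on return */
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Hypothetical helper: start n waiters, broadcast once, report how many woke. */
int run_pi_condvar_demo(int n)
{
    pthread_mutexattr_t attr;
    pthread_t tids[MAX_WAITERS];

    assert(n > 0 && n <= MAX_WAITERS);
    ready = false;
    woken = 0;

    pthread_mutexattr_init(&attr);
    if (pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT) != 0 ||
        pthread_mutex_init(&lock, &attr) != 0) {
        pthread_mutexattr_destroy(&attr);
        return -1;                      /* PI mutexes not available here */
    }
    pthread_mutexattr_destroy(&attr);

    for (int i = 0; i < n; i++) {
        int rc = pthread_create(&tids[i], NULL, waiter, NULL);
        assert(rc == 0);
        (void)rc;
    }

    pthread_mutex_lock(&lock);          /* broadcast while holding the mutex */
    ready = true;
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);

    for (int i = 0; i < n; i++)
        pthread_join(tids[i], NULL);
    pthread_mutex_destroy(&lock);
    return woken;
}
```

Calling run_pi_condvar_demo(4) should return 4 once every waiter has re-acquired the mutex, or -1 if PI mutexes are unavailable on the system.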
@@ -150,6 +150,11 @@ fan[1-*]_min Fan minimum value
                 Unit: revolution/min (RPM)
                 RW
 
+fan[1-*]_max    Fan maximum value
+                Unit: revolution/min (RPM)
+                Only rarely supported by the hardware.
+                RW
+
 fan[1-*]_input  Fan input value.
                 Unit: revolution/min (RPM)
                 RO
@@ -390,6 +395,7 @@ OR
         in[0-*]_min_alarm
         in[0-*]_max_alarm
         fan[1-*]_min_alarm
+        fan[1-*]_max_alarm
         temp[1-*]_min_alarm
         temp[1-*]_max_alarm
         temp[1-*]_crit_alarm
@@ -18,8 +18,12 @@ Usage
 Anonymous finger details are sent sequentially as separate packets of ABS
 events. Only the ABS_MT events are recognized as part of a finger
 packet. The end of a packet is marked by calling the input_mt_sync()
-function, which generates a SYN_MT_REPORT event. The end of multi-touch
-transfer is marked by calling the usual input_sync() function.
+function, which generates a SYN_MT_REPORT event. This instructs the
+receiver to accept the data for the current finger and prepare to receive
+another. The end of a multi-touch transfer is marked by calling the usual
+input_sync() function. This instructs the receiver to act upon events
+accumulated since the last EV_SYN/SYN_REPORT and prepare to receive a new
+set of events/packets.
 
 A set of ABS_MT events with the desired properties is defined. The events
 are divided into categories, to allow for partial implementation. The
@@ -27,11 +31,26 @@ minimum set consists of ABS_MT_TOUCH_MAJOR, ABS_MT_POSITION_X and
 ABS_MT_POSITION_Y, which allows for multiple fingers to be tracked. If the
 device supports it, the ABS_MT_WIDTH_MAJOR may be used to provide the size
 of the approaching finger. Anisotropy and direction may be specified with
-ABS_MT_TOUCH_MINOR, ABS_MT_WIDTH_MINOR and ABS_MT_ORIENTATION. Devices with
-more granular information may specify general shapes as blobs, i.e., as a
-sequence of rectangular shapes grouped together by an
-ABS_MT_BLOB_ID. Finally, the ABS_MT_TOOL_TYPE may be used to specify
-whether the touching tool is a finger or a pen or something else.
+ABS_MT_TOUCH_MINOR, ABS_MT_WIDTH_MINOR and ABS_MT_ORIENTATION. The
+ABS_MT_TOOL_TYPE may be used to specify whether the touching tool is a
+finger or a pen or something else. Devices with more granular information
+may specify general shapes as blobs, i.e., as a sequence of rectangular
+shapes grouped together by an ABS_MT_BLOB_ID. Finally, for the few devices
+that currently support it, the ABS_MT_TRACKING_ID event may be used to
+report finger tracking from hardware [5].
+
+Here is what a minimal event sequence for a two-finger touch would look
+like:
+
+   ABS_MT_TOUCH_MAJOR
+   ABS_MT_POSITION_X
+   ABS_MT_POSITION_Y
+   SYN_MT_REPORT
+   ABS_MT_TOUCH_MAJOR
+   ABS_MT_POSITION_X
+   ABS_MT_POSITION_Y
+   SYN_MT_REPORT
+   SYN_REPORT
+
 
 
 Event Semantics
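The two-finger sequence listed in the hunk above can be sketched from the consumer side as an array of struct input_event records, using the event codes from <linux/input.h>. This is a user-space illustration that assumes the installed kernel headers define the ABS_MT_* codes and SYN_MT_REPORT; the coordinates are made up, and struct finger and build_frame() are hypothetical names. Inside a driver the same stream would come from input_report_abs(), input_mt_sync() and input_sync().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <linux/input.h>

/* Hypothetical per-contact data in surface units. */
struct finger {
    int major;   /* ABS_MT_TOUCH_MAJOR */
    int x, y;    /* ABS_MT_POSITION_X / ABS_MT_POSITION_Y */
};

/* Append one event to the buffer and return the new event count. */
static size_t push(struct input_event *ev, size_t n, unsigned short type,
                   unsigned short code, int value)
{
    memset(&ev[n], 0, sizeof(ev[n]));   /* timestamp left as zero */
    ev[n].type = type;
    ev[n].code = code;
    ev[n].value = value;
    return n + 1;
}

/* Hypothetical helper: encode one frame of anonymous contacts. */
size_t build_frame(struct input_event *ev, size_t cap,
                   const struct finger *f, size_t nf)
{
    size_t n = 0;

    assert(cap >= 4 * nf + 1);          /* 4 events per finger + SYN_REPORT */
    for (size_t i = 0; i < nf; i++) {
        n = push(ev, n, EV_ABS, ABS_MT_TOUCH_MAJOR, f[i].major);
        n = push(ev, n, EV_ABS, ABS_MT_POSITION_X, f[i].x);
        n = push(ev, n, EV_ABS, ABS_MT_POSITION_Y, f[i].y);
        n = push(ev, n, EV_SYN, SYN_MT_REPORT, 0);  /* end of finger packet */
    }
    return push(ev, n, EV_SYN, SYN_REPORT, 0);      /* end of MT transfer */
}
```

For two fingers this yields nine events: four per finger and one closing SYN_REPORT, matching the listing above.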
@@ -44,24 +63,24 @@ ABS_MT_TOUCH_MAJOR
 
 The length of the major axis of the contact. The length should be given in
 surface units. If the surface has an X times Y resolution, the largest
-possible value of ABS_MT_TOUCH_MAJOR is sqrt(X^2 + Y^2), the diagonal.
+possible value of ABS_MT_TOUCH_MAJOR is sqrt(X^2 + Y^2), the diagonal [4].
 
 ABS_MT_TOUCH_MINOR
 
 The length, in surface units, of the minor axis of the contact. If the
-contact is circular, this event can be omitted.
+contact is circular, this event can be omitted [4].
 
 ABS_MT_WIDTH_MAJOR
 
 The length, in surface units, of the major axis of the approaching
 tool. This should be understood as the size of the tool itself. The
 orientation of the contact and the approaching tool are assumed to be the
-same.
+same [4].
 
 ABS_MT_WIDTH_MINOR
 
 The length, in surface units, of the minor axis of the approaching
-tool. Omit if circular.
+tool. Omit if circular [4].
 
 The above four values can be used to derive additional information about
 the contact. The ratio ABS_MT_TOUCH_MAJOR / ABS_MT_WIDTH_MAJOR approximates
@@ -70,14 +89,17 @@ different characteristic widths [1].
 
 ABS_MT_ORIENTATION
 
-The orientation of the ellipse. The value should describe half a revolution
-clockwise around the touch center. The scale of the value is arbitrary, but
-zero should be returned for an ellipse aligned along the Y axis of the
-surface. As an example, an index finger placed straight onto the axis could
-return zero orientation, something negative when twisted to the left, and
-something positive when twisted to the right. This value can be omitted if
-the touching object is circular, or if the information is not available in
-the kernel driver.
+The orientation of the ellipse. The value should describe a signed quarter
+of a revolution clockwise around the touch center. The signed value range
+is arbitrary, but zero should be returned for a finger aligned along the Y
+axis of the surface, a negative value when the finger is turned to the left,
+and a positive value when the finger is turned to the right. When completely
+aligned with the X axis, the range max should be returned. Orientation can
+be omitted if the touching object is circular, or if the information is not
+available in the kernel driver. Partial orientation support is possible if
+the device can distinguish between the two axes, but not (uniquely) any
+values in between. In such cases, the range of ABS_MT_ORIENTATION should be
+[0, 1] [4].
 
 ABS_MT_POSITION_X
 
@@ -98,8 +120,35 @@ ABS_MT_BLOB_ID
 
 The BLOB_ID groups several packets together into one arbitrarily shaped
 contact. This is a low-level anonymous grouping, and should not be confused
-with the high-level contactID, explained below. Most kernel drivers will
-not have this capability, and can safely omit the event.
+with the high-level trackingID [5]. Most kernel drivers will not have blob
+capability, and can safely omit the event.
 
+ABS_MT_TRACKING_ID
+
+The TRACKING_ID identifies an initiated contact throughout its life cycle
+[5]. There are currently only a few devices that support it, so this event
+should normally be omitted.
+
+
+Event Computation
+-----------------
+
+The flora of different hardware unavoidably leads to some devices fitting
+the MT protocol better than others. To simplify and unify the mapping,
+this section gives recipes for how to compute certain events.
+
+For devices reporting contacts as rectangular shapes, signed orientation
+cannot be obtained. Assuming X and Y are the lengths of the sides of the
+touching rectangle, here is a simple formula that retains the most
+information possible:
+
+   ABS_MT_TOUCH_MAJOR := max(X, Y)
+   ABS_MT_TOUCH_MINOR := min(X, Y)
+   ABS_MT_ORIENTATION := bool(X > Y)
+
+The range of ABS_MT_ORIENTATION should be set to [0, 1], to indicate that
+the device can distinguish between a finger along the Y axis (0) and a
+finger along the X axis (1).
+
 
 Finger Tracking
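As a minimal sketch of the rectangle recipe in the Event Computation section above, assuming the driver already has the side lengths X and Y in surface units, the mapping reduces to a few comparisons. The struct and function names are hypothetical; a real driver would hand the three values to input_report_abs() rather than return them.

```c
#include <assert.h>

/* Hypothetical container for the three derived ABS_MT values. */
struct mt_shape {
    int touch_major;   /* ABS_MT_TOUCH_MAJOR := max(X, Y) */
    int touch_minor;   /* ABS_MT_TOUCH_MINOR := min(X, Y) */
    int orientation;   /* ABS_MT_ORIENTATION := bool(X > Y), range [0, 1] */
};

struct mt_shape shape_from_rect(int x, int y)
{
    struct mt_shape s;

    assert(x >= 0 && y >= 0);          /* side lengths in surface units */
    s.touch_major = x > y ? x : y;
    s.touch_minor = x > y ? y : x;
    s.orientation = x > y;             /* 1: along the X axis, 0: along Y */
    return s;
}
```

A square contact maps to orientation 0, which is consistent with the recipe: only X > Y reports the finger as lying along the X axis.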
@@ -109,14 +158,18 @@ The kernel driver should generate an arbitrary enumeration of the set of
 anonymous contacts currently on the surface. The order in which the packets
 appear in the event stream is not important.
 
-The process of finger tracking, i.e., to assign a unique contactID to each
+The process of finger tracking, i.e., to assign a unique trackingID to each
 initiated contact on the surface, is left to user space; preferably the
-multi-touch X driver [3]. In that driver, the contactID stays the same and
+multi-touch X driver [3]. In that driver, the trackingID stays the same and
 unique until the contact vanishes (when the finger leaves the surface). The
 problem of assigning a set of anonymous fingers to a set of identified
 fingers is a Euclidean bipartite matching problem at each event update, and
 relies on a sufficiently rapid update rate.
 
+There are a few devices that support trackingID in hardware. User space can
+make use of these native identifiers to reduce bandwidth and CPU usage.
+
+
 Notes
 -----
 
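The matching step described in the Finger Tracking text above can be sketched as a brute-force search over all assignments, which is only practical for the handful of contacts a touch surface reports. Two simplifications are assumptions of this sketch, not part of the protocol: both frames contain the same number of contacts, and the cost minimized is the sum of squared distances (a common variant swapped in for the plain Euclidean sum so that no libm is needed). All names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

#define MAX_CONTACTS 5

struct point { int x, y; };

static long dist2(struct point a, struct point b)
{
    long dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

/* Try every assignment of new contact i.. to unused tracked contacts. */
static void search(const struct point *prev, const struct point *cur, int n,
                   int i, int *perm, int *used, long cost,
                   long *best, int *best_perm)
{
    if (cost >= *best)
        return;                         /* prune: already no better */
    if (i == n) {                       /* complete assignment, new best */
        *best = cost;
        for (int k = 0; k < n; k++)
            best_perm[k] = perm[k];
        return;
    }
    for (int j = 0; j < n; j++) {
        if (used[j])
            continue;
        used[j] = 1;
        perm[i] = j;
        search(prev, cur, n, i + 1, perm, used,
               cost + dist2(cur[i], prev[j]), best, best_perm);
        used[j] = 0;
    }
}

/* match[i] receives the index of the tracked contact new contact i continues. */
long match_contacts(const struct point *prev, const struct point *cur,
                    int n, int *match)
{
    int perm[MAX_CONTACTS], used[MAX_CONTACTS] = {0};
    long best = LONG_MAX;

    assert(n >= 0 && n <= MAX_CONTACTS);
    search(prev, cur, n, 0, perm, used, 0, &best, match);
    return best;                        /* total cost of the best assignment */
}
```

The search is factorial in the number of contacts; a production tracker would use a polynomial assignment algorithm instead, but the result for small inputs is the same optimal matching.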
@@ -136,5 +189,7 @@ could be used to derive tilt.
 time of writing (April 2009), the MT protocol is not yet merged, and the
 prototype implements finger matching, basic mouse support and two-finger
 scrolling. The project aims at improving the quality of current multi-touch
-functionality available in the synaptics X driver, and in addition
+functionality available in the Synaptics X driver, and in addition
 implement more advanced gestures.
+[4] See the section on event computation.
+[5] See the section on finger tracking.
@@ -56,7 +56,6 @@ parameter is applicable:
     ISAPNP  ISA PnP code is enabled.
     ISDN    Appropriate ISDN support is enabled.
     JOY     Appropriate joystick support is enabled.
-    KMEMTRACE kmemtrace is enabled.
     LIBATA  Libata driver is enabled
     LP      Printer support is enabled.
     LOOP    Loopback device support is enabled.
@@ -329,11 +328,6 @@ and is between 256 and 4096 characters. It is defined in the file
                             flushed before they will be reused, which
                             is a lot faster
 
-    amd_iommu_size= [HW,X86-64]
-                    Define the size of the aperture for the AMD IOMMU
-                    driver. Possible values are:
-                    '32M', '64M' (default), '128M', '256M', '512M', '1G'
-
     amijoy.map=     [HW,JOY] Amiga joystick support
                     Map of devices attached to JOY0DAT and JOY1DAT
                     Format: <a>,<b>
@@ -646,6 +640,13 @@ and is between 256 and 4096 characters. It is defined in the file
                     DMA-API debugging code disables itself because the
                     architectural default is too low.
 
+    dma_debug_driver=<driver_name>
+                    With this option the DMA-API debugging driver
+                    filter feature can be enabled at boot time. Pass
+                    the name of the driver to filter for as the
+                    parameter. The filter can be disabled or changed
+                    to another driver later using sysfs.
+
     dscc4.setup=    [NET]
 
     dtc3181e=       [HW,SCSI]
@@ -752,12 +753,25 @@ and is between 256 and 4096 characters. It is defined in the file
                     ia64_pal_cache_flush instead of SAL_CACHE_FLUSH.
 
     ftrace=[tracer]
-                    [ftrace] will set and start the specified tracer
+                    [FTRACE] will set and start the specified tracer
                     as early as possible in order to facilitate early
                     boot debugging.
 
     ftrace_dump_on_oops
-                    [ftrace] will dump the trace buffers on oops.
+                    [FTRACE] will dump the trace buffers on oops.
 
+    ftrace_filter=[function-list]
+                    [FTRACE] Limit the functions traced by the function
+                    tracer at boot up. function-list is a comma-separated
+                    list of functions. This list can be changed at run
+                    time by the set_ftrace_filter file in the debugfs
+                    tracing directory.
+
+    ftrace_notrace=[function-list]
+                    [FTRACE] Do not trace the functions specified in
+                    function-list. This list can be changed at run time
+                    by the set_ftrace_notrace file in the debugfs
+                    tracing directory.
+
     gamecon.map[2|3]=
                     [HW,JOY] Multisystem joystick and NES/SNES/PSX pad
@@ -914,6 +928,12 @@ and is between 256 and 4096 characters. It is defined in the file
                     Format: { "sha1" | "md5" }
                     default: "sha1"
 
+    ima_tcb         [IMA]
+                    Load a policy which meets the needs of the Trusted
+                    Computing Base. This means IMA will measure all
+                    programs exec'd, files mmap'd for exec, and all files
+                    opened for read by uid=0.
+
     in2000=         [HW,SCSI]
                     See header of drivers/scsi/in2000.c.
 
@@ -1054,15 +1074,6 @@ and is between 256 and 4096 characters. It is defined in the file
                     use the HighMem zone if it exists, and the Normal
                     zone if it does not.
 
-    kmemtrace.enable= [KNL,KMEMTRACE] Format: { yes | no }
-                    Controls whether kmemtrace is enabled
-                    at boot-time.
-
-    kmemtrace.subbufs=n [KNL,KMEMTRACE] Overrides the number of
-                    subbufs kmemtrace's relay channel has. Set this
-                    higher than default (KMEMTRACE_N_SUBBUFS in code) if
-                    you experience buffer overruns.
-
     kgdboc=         [HW] kgdb over consoles.
                     Requires a tty driver that supports console polling.
                     (only serial supported for now)
@@ -1072,6 +1083,10 @@ and is between 256 and 4096 characters. It is defined in the file
                     Configure the RouterBoard 532 series on-chip
                     Ethernet adapter MAC address.
 
+    kmemleak=       [KNL] Boot-time kmemleak enable/disable
+                    Valid arguments: on, off
+                    Default: on
+
     kstack=N        [X86] Print N words from the kernel stack
                     in oops dumps.
 
@@ -1535,6 +1550,10 @@ and is between 256 and 4096 characters. It is defined in the file
                     register save and restore. The kernel will only save
                     legacy floating-point registers on task switch.
 
+    noxsave         [BUGS=X86] Disables x86 extended register state save
+                    and restore using xsave. The kernel will fall back to
+                    enabling legacy floating-point and SSE state.
+
     nohlt           [BUGS=ARM,SH] Tells the kernel that the sleep(SH) or
                     wfi(ARM) instruction doesn't work correctly and not to
                     use it. This is also useful when using JTAG debugger.
@@ -1571,6 +1590,9 @@ and is between 256 and 4096 characters. It is defined in the file
     noinitrd        [RAM] Tells the kernel not to load any configured
                     initial RAM disk.
 
+    nointremap      [X86-64, Intel-IOMMU] Do not enable interrupt
+                    remapping.
+
     nointroute      [IA-64]
 
     nojitter        [IA64] Disables jitter checking for ITC timers.
@@ -1656,6 +1678,14 @@ and is between 256 and 4096 characters. It is defined in the file
     oprofile.timer= [HW]
                     Use timer interrupt instead of performance counters
 
+    oprofile.cpu_type= Force an oprofile CPU type
+                    This might be useful if you have an older oprofile
+                    userland or if you want common events.
+                    Format: { archperfmon }
+                    archperfmon: [X86] Force use of architectural
+                            perfmon on Intel CPUs instead of the
+                            CPU-specific event set.
+
     osst=           [HW,SCSI] SCSI Tape Driver
                     Format: <buffer_size>,<write_threshold>
                     See also Documentation/scsi/st.txt.
@@ -0,0 +1,142 @@
+Kernel Memory Leak Detector
+===========================
+
+Introduction
+------------
+
+Kmemleak provides a way of detecting possible kernel memory leaks in a
+way similar to a tracing garbage collector
+(http://en.wikipedia.org/wiki/Garbage_collection_%28computer_science%29#Tracing_garbage_collectors),
+with the difference that the orphan objects are not freed but only
+reported via /sys/kernel/debug/kmemleak. A similar method is used by the
+Valgrind tool (memcheck --leak-check) to detect memory leaks in
+user-space applications.
+
+Usage
+-----
+
+CONFIG_DEBUG_KMEMLEAK in "Kernel hacking" has to be enabled. A kernel
+thread scans the memory every 10 minutes (by default) and prints any new
+unreferenced objects found. To trigger an intermediate scan and display
+all the possible memory leaks:
+
+  # mount -t debugfs nodev /sys/kernel/debug/
+  # cat /sys/kernel/debug/kmemleak
+
+Note that the orphan objects are listed in the order they were allocated,
+and one object at the beginning of the list may cause other subsequent
+objects to be reported as orphans.
+
+Memory scanning parameters can be modified at run-time by writing to the
+/sys/kernel/debug/kmemleak file. The following parameters are supported:
+
+  off           - disable kmemleak (irreversible)
+  stack=on      - enable task stack scanning
+  stack=off     - disable task stack scanning
+  scan=on       - start the automatic memory scanning thread
+  scan=off      - stop the automatic memory scanning thread
+  scan=<secs>   - set the automatic memory scanning period in seconds (0
+                  to disable it)
+
+Kmemleak can also be disabled at boot-time by passing "kmemleak=off" on
+the kernel command line.
+
+Basic Algorithm
+---------------
+
+The memory allocations via kmalloc, vmalloc, kmem_cache_alloc and
+friends are traced and the pointers, together with additional
+information like size and stack trace, are stored in a prio search tree.
+The corresponding freeing function calls are tracked and the pointers
+removed from the kmemleak data structures.
+
+An allocated block of memory is considered orphan if no pointer to its
+start address or to any location inside the block can be found by
+scanning the memory (including saved registers). This means that there
+might be no way for the kernel to pass the address of the allocated
+block to a freeing function and therefore the block is considered a
+memory leak.
+
+The scanning algorithm steps:
+
+  1. mark all objects as white (remaining white objects will later be
+     considered orphan)
+  2. scan the memory starting with the data section and stacks, checking
+     the values against the addresses stored in the prio search tree. If
+     a pointer to a white object is found, the object is added to the
+     gray list
+  3. scan the gray objects for matching addresses (some white objects
+     can become gray and be added at the end of the gray list) until the
+     gray set is finished
+  4. the remaining white objects are considered orphan and reported via
+     /sys/kernel/debug/kmemleak
+
+Some allocated memory blocks always have a pointer stored in the kernel's
+internal data structures, so they would never be detected as orphans even
+when leaked. To avoid this, kmemleak can also store the number of values
+pointing to an address inside the block's address range that need to be
+found for the block not to be considered a leak. One example is
+__vmalloc().
+
+Kmemleak API
+------------
+
+See the include/linux/kmemleak.h header for the function prototypes.
+
+kmemleak_init            - initialize kmemleak
+kmemleak_alloc           - notify of a memory block allocation
+kmemleak_free            - notify of a memory block freeing
+kmemleak_not_leak        - mark an object as not a leak
+kmemleak_ignore          - do not scan or report an object as leak
+kmemleak_scan_area       - add scan areas inside a memory block
+kmemleak_no_scan         - do not scan a memory block
+kmemleak_erase           - erase an old value in a pointer variable
+kmemleak_alloc_recursive - as kmemleak_alloc but checks the recursiveness
+kmemleak_free_recursive  - as kmemleak_free but checks the recursiveness
+
+Dealing with false positives/negatives
+--------------------------------------
+
+False negatives are real memory leaks (orphan objects) that are not
+reported by kmemleak because values found during the memory scanning
+point to such objects. To reduce the number of false negatives, kmemleak
+provides the kmemleak_ignore, kmemleak_scan_area, kmemleak_no_scan and
+kmemleak_erase functions (see above). The task stacks also increase the
+number of false negatives, and their scanning is not enabled by default.
+
+False positives are objects wrongly reported as memory leaks (orphans).
+For objects known not to be leaks, kmemleak provides the
+kmemleak_not_leak function. kmemleak_ignore could also be used if the
+memory block is known not to contain other pointers; the block will then
+no longer be scanned.
+
+Some of the reported leaks are only transient, especially on SMP
+systems, because of pointers temporarily stored in CPU registers or
+stacks. Kmemleak defines MSECS_MIN_AGE (defaulting to 1000) representing
+the minimum age of an object to be reported as a memory leak.
+
+Limitations and Drawbacks
+-------------------------
+
+The main drawback is the reduced performance of memory allocation and
+freeing. To avoid other penalties, the memory scanning is only performed
+when the /sys/kernel/debug/kmemleak file is read. In any case, this tool
+is intended for debugging purposes where the performance might not be
+the most important requirement.
+
+To keep the algorithm simple, kmemleak scans for values pointing to any
+address inside a block's address range. This may lead to an increased
+number of false negatives. However, it is likely that a real memory leak
+will eventually become visible.
+
+Another source of false negatives is the data stored in non-pointer
+values. In a future version, kmemleak could scan only the pointer
+members in the allocated structures. This feature would solve many of
+the false negative cases described above.
+
+The tool can report false positives. These are cases where an allocated
+block doesn't need to be freed (some cases in the init_call functions),
+the pointer is calculated by methods other than the usual container_of
+macro, or the pointer is stored in a location not scanned by kmemleak.
+
+Page allocations and ioremap are not tracked. Only the ARM and x86
+architectures are currently supported.
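The four scanning steps listed in kmemleak.txt above can be modelled in a few dozen lines of user-space C. This is a simplified sketch, not kmemleak's implementation: a linear search stands in for the prio search tree, memory is represented as small arrays of word values, and the per-block minimum reference count mentioned for __vmalloc() is not modelled. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OBJS  8
#define MAX_WORDS 4

enum color { WHITE, GRAY };     /* gray objects are scanned exactly once */

struct object {
    uintptr_t start;
    size_t size;                /* bytes covered by the block */
    uintptr_t words[MAX_WORDS]; /* values stored inside the block */
    size_t nwords;
    enum color color;
};

/* Linear lookup standing in for kmemleak's prio search tree. */
static struct object *lookup(struct object *objs, size_t n, uintptr_t v)
{
    for (size_t i = 0; i < n; i++)
        if (v >= objs[i].start && v < objs[i].start + objs[i].size)
            return &objs[i];    /* v points to the start or inside */
    return NULL;
}

static void scan_words(struct object *objs, size_t n, const uintptr_t *w,
                       size_t nw, struct object **gray, size_t *ngray)
{
    for (size_t i = 0; i < nw; i++) {
        struct object *o = lookup(objs, n, w[i]);
        if (o && o->color == WHITE) {
            o->color = GRAY;            /* white -> gray, queue for scanning */
            gray[(*ngray)++] = o;
        }
    }
}

/* Returns the number of orphans; their indices are written to orphans[]. */
size_t find_orphans(struct object *objs, size_t n, const uintptr_t *roots,
                    size_t nroots, size_t *orphans)
{
    struct object *gray[MAX_OBJS];
    size_t ngray = 0, norphans = 0;

    assert(n <= MAX_OBJS);
    for (size_t i = 0; i < n; i++)      /* step 1: everything starts white */
        objs[i].color = WHITE;
    scan_words(objs, n, roots, nroots, gray, &ngray);   /* step 2: roots */
    for (size_t g = 0; g < ngray; g++)  /* step 3: list grows while scanned */
        scan_words(objs, n, gray[g]->words, gray[g]->nwords, gray, &ngray);
    for (size_t i = 0; i < n; i++)      /* step 4: leftover white = orphan */
        if (objs[i].color == WHITE)
            orphans[norphans++] = i;
    return norphans;
}
```

Because an object only moves from white to gray once, the gray list never holds more than n entries, which is why a fixed-size array suffices here.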
@@ -31,6 +31,7 @@ Contents:
 
      - Locking functions.
      - Interrupt disabling functions.
+     - Sleep and wake-up functions.
      - Miscellaneous functions.
 
  (*) Inter-CPU locking barrier effects.
@@ -1217,6 +1218,132 @@ barriers are required in such a situation, they must be provided from some
 other means.
 
+
+SLEEP AND WAKE-UP FUNCTIONS
+---------------------------
+
+Sleeping and waking on an event flagged in global data can be viewed as an
+interaction between two pieces of data: the task state of the task waiting for
+the event and the global data used to indicate the event. To make sure that
+these appear to happen in the right order, the primitives to begin the process
+of going to sleep and the primitives to initiate a wake up imply certain
+barriers.
+
+Firstly, the sleeper normally follows something like this sequence of events:
+
+        for (;;) {
+                set_current_state(TASK_UNINTERRUPTIBLE);
+                if (event_indicated)
+                        break;
+                schedule();
+        }
+
+A general memory barrier is interpolated automatically by set_current_state()
+after it has altered the task state:
+
+        CPU 1
+        ===============================
+        set_current_state();
+          set_mb();
+            STORE current->state
+            <general barrier>
+        LOAD event_indicated
+
+set_current_state() may be wrapped by:
+
+        prepare_to_wait();
+        prepare_to_wait_exclusive();
+
+which therefore also imply a general memory barrier after setting the state.
+The whole sequence above is available in various canned forms, all of which
+interpolate the memory barrier in the right place:
+
+        wait_event();
+        wait_event_interruptible();
+        wait_event_interruptible_exclusive();
+        wait_event_interruptible_timeout();
+        wait_event_killable();
+        wait_event_timeout();
+        wait_on_bit();
+        wait_on_bit_lock();
+
+
|
Secondly, code that performs a wake up normally follows something like this:
|
||||||
|
|
||||||
|
event_indicated = 1;
|
||||||
|
wake_up(&event_wait_queue);
|
||||||
|
|
||||||
|
or:
|
||||||
|
|
||||||
|
event_indicated = 1;
|
||||||
|
wake_up_process(event_daemon);
|
||||||
|
|
||||||
|
A write memory barrier is implied by wake_up() and co. if and only if they wake
|
||||||
|
something up. The barrier occurs before the task state is cleared, and so sits
|
||||||
|
between the STORE to indicate the event and the STORE to set TASK_RUNNING:
|
||||||
|
|
||||||
|
CPU 1 CPU 2
|
||||||
|
=============================== ===============================
|
||||||
|
set_current_state(); STORE event_indicated
|
||||||
|
set_mb(); wake_up();
|
||||||
|
STORE current->state <write barrier>
|
||||||
|
<general barrier> STORE current->state
|
||||||
|
LOAD event_indicated
|
||||||
|
|
||||||
|
The available waker functions include:
|
||||||
|
|
||||||
|
complete();
|
||||||
|
wake_up();
|
||||||
|
wake_up_all();
|
||||||
|
wake_up_bit();
|
||||||
|
wake_up_interruptible();
|
||||||
|
wake_up_interruptible_all();
|
||||||
|
wake_up_interruptible_nr();
|
||||||
|
wake_up_interruptible_poll();
|
||||||
|
wake_up_interruptible_sync();
|
||||||
|
wake_up_interruptible_sync_poll();
|
||||||
|
wake_up_locked();
|
||||||
|
wake_up_locked_poll();
|
||||||
|
wake_up_nr();
|
||||||
|
wake_up_poll();
|
||||||
|
wake_up_process();
|
||||||
|
|
||||||
|
|
||||||
|
[!] Note that the memory barriers implied by the sleeper and the waker do _not_
|
||||||
|
order multiple stores before the wake-up with respect to loads of those stored
|
||||||
|
values after the sleeper has called set_current_state(). For instance, if the
|
||||||
|
sleeper does:
|
||||||
|
|
||||||
|
set_current_state(TASK_INTERRUPTIBLE);
|
||||||
|
if (event_indicated)
|
||||||
|
break;
|
||||||
|
__set_current_state(TASK_RUNNING);
|
||||||
|
do_something(my_data);
|
||||||
|
|
||||||
|
and the waker does:
|
||||||
|
|
||||||
|
my_data = value;
|
||||||
|
event_indicated = 1;
|
||||||
|
wake_up(&event_wait_queue);
|
||||||
|
|
||||||
|
there's no guarantee that the change to event_indicated will be perceived by
|
||||||
|
the sleeper as coming after the change to my_data. In such a circumstance, the
|
||||||
|
code on both sides must interpolate its own memory barriers between the
|
||||||
|
separate data accesses. Thus the above sleeper ought to do:
|
||||||
|
|
||||||
|
set_current_state(TASK_INTERRUPTIBLE);
|
||||||
|
if (event_indicated) {
|
||||||
|
smp_rmb();
|
||||||
|
do_something(my_data);
|
||||||
|
}
|
||||||
|
|
||||||
|
and the waker should do:
|
||||||
|
|
||||||
|
my_data = value;
|
||||||
|
smp_wmb();
|
||||||
|
event_indicated = 1;
|
||||||
|
wake_up(&event_wait_queue);
|
||||||
|
|
||||||
|
|
||||||
MISCELLANEOUS FUNCTIONS
|
MISCELLANEOUS FUNCTIONS
|
||||||
-----------------------
|
-----------------------
|
||||||
|
|
||||||
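The barrier pairing in the corrected sleeper/waker example can be tried out in user space with C11 atomics and POSIX threads. This is a sketch under stated assumptions, not kernel code: `run_message_passing()`, `sleeper()` and `waker()` are hypothetical names, a spin loop with sched_yield() stands in for set_current_state()/schedule()/wake_up(), and C11 release/acquire fences stand in for smp_wmb()/smp_rmb() (in this pattern each C11 fence is at least as strong as the kernel barrier it replaces).

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

static int my_data;			/* plain data published by the waker */
static atomic_int event_indicated;	/* flag the sleeper waits on */

/* Waker: store the data, then a release fence (~ smp_wmb()), then the flag. */
static void *waker(void *arg)
{
	my_data = *(const int *)arg;
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&event_indicated, 1, memory_order_relaxed);
	return NULL;
}

/* Sleeper: wait for the flag, then an acquire fence (~ smp_rmb()), then read. */
static void *sleeper(void *out)
{
	while (!atomic_load_explicit(&event_indicated, memory_order_relaxed))
		sched_yield();		/* stands in for schedule() */
	atomic_thread_fence(memory_order_acquire);
	*(int *)out = my_data;		/* ordered after the flag load */
	return NULL;
}

/* Run one waker/sleeper round; returns the value seen, or -1 on error. */
int run_message_passing(int value)
{
	pthread_t w, s;
	int seen = -1;

	my_data = 0;
	atomic_store(&event_indicated, 0);
	if (pthread_create(&w, NULL, waker, &value) != 0)
		return -1;
	if (pthread_create(&s, NULL, sleeper, &seen) != 0) {
		pthread_join(w, NULL);
		return -1;
	}
	pthread_join(w, NULL);
	pthread_join(s, NULL);
	assert(seen == value);	/* the fence pair guarantees this */
	return seen;
}
```

Deleting either fence turns the read of my_data into a data race in C11 terms, which mirrors the text's point that the wake-up barriers alone do not order the separate data accesses.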
@@ -1366,7 +1493,7 @@ WHERE ARE MEMORY BARRIERS NEEDED?
 
 Under normal operation, memory operation reordering is generally not going to
 be a problem as a single-threaded linear piece of code will still appear to
-work correctly, even if it's in an SMP kernel. There are, however, three
+work correctly, even if it's in an SMP kernel. There are, however, four
 circumstances in which reordering definitely _could_ be a problem:
 
  (*) Interprocessor interaction.
@@ -1266,13 +1266,22 @@ sctp_rmem - vector of 3 INTEGERs: min, default, max
 sctp_wmem - vector of 3 INTEGERs: min, default, max
         See tcp_wmem for a description.
 
-UNDOCUMENTED:
 
 /proc/sys/net/core/*
-        dev_weight FIXME
+dev_weight - INTEGER
+        The maximum number of packets that kernel can handle on a NAPI
+        interrupt, it's a Per-CPU variable.
+
+        Default: 64
 
 /proc/sys/net/unix/*
-        max_dgram_qlen FIXME
+max_dgram_qlen - INTEGER
+        The maximum length of dgram socket receive queue
+
+        Default: 10
+
+
+UNDOCUMENTED:
 
 /proc/sys/net/irda/*
         fast_poll_increase FIXME
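The two newly documented knobs are plain integers exposed through procfs, so a program can read them like any text file. A minimal sketch; `read_sysctl_int()` is a hypothetical helper, and on a given system the files may be absent or hold values other than the defaults listed above.

```c
#include <stdio.h>

/*
 * Read one integer from a procfs sysctl file such as
 * /proc/sys/net/core/dev_weight or /proc/sys/net/unix/max_dgram_qlen.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
int read_sysctl_int(const char *path, long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%ld", out) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

For example, `read_sysctl_int("/proc/sys/net/core/dev_weight", &w)` fills `w` with the current per-CPU NAPI budget when the file exists.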
@@ -4,6 +4,7 @@
 CONTENTS
 ========
 
+0. WARNING
 1. Overview
   1.1 The problem
   1.2 The solution
@@ -14,6 +15,23 @@ CONTENTS
 3. Future plans
 
 
+0. WARNING
+==========
+
+ Fiddling with these settings can result in an unstable system, the knobs are
+ root only and assumes root knows what he is doing.
+
+Most notable:
+
+ * very small values in sched_rt_period_us can result in an unstable
+   system when the period is smaller than either the available hrtimer
+   resolution, or the time it takes to handle the budget refresh itself.
+
+ * very small values in sched_rt_runtime_us can result in an unstable
+   system when the runtime is so small the system has difficulty making
+   forward progress (NOTE: the migration thread and kstopmachine both
+   are real-time processes).
+
 1. Overview
 ===========
 
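The WARNING gives no numeric limits, so any check a tool performs before writing these knobs has to take its floors from the caller. A minimal sketch under that assumption: `rt_knobs_look_sane()` is a hypothetical helper, `min_period_us` and `min_runtime_us` are caller-supplied placeholders rather than kernel constants, and -1 is treated as "runtime not limited", following the sched_rt_runtime_us convention.

```c
#include <stdbool.h>

/*
 * Sanity-check a proposed (sched_rt_period_us, sched_rt_runtime_us) pair.
 * The floors come from the caller (for example derived from the timer
 * resolution it can measure); the WARNING above names no fixed values.
 */
bool rt_knobs_look_sane(long period_us, long runtime_us,
			long min_period_us, long min_runtime_us)
{
	if (period_us <= 0 || period_us < min_period_us)
		return false;	/* period too short to enforce reliably */
	if (runtime_us == -1)
		return true;	/* unlimited runtime */
	if (runtime_us < 0 || runtime_us < min_runtime_us)
		return false;	/* too little budget to make forward progress */
	if (runtime_us > period_us)
		return false;	/* budget cannot exceed the period it refills in */
	return true;
}
```

With both floors set to 1000 us, a period of 1000000 and runtime of 950000 passes, while a 10 us period or a 10 us runtime is rejected.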
@@ -169,7 +187,7 @@ get their allocated time.
 
 Implementing SCHED_EDF might take a while to complete. Priority Inheritance is
 the biggest challenge as the current linux PI infrastructure is geared towards
-the limited static priority levels 0-139. With deadline scheduling you need to
+the limited static priority levels 0-99. With deadline scheduling you need to
 do deadline inheritance (since priority is inversely proportional to the
 deadline delta (deadline - now).
 
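The deadline inheritance mentioned above can be pictured the same way priority inheritance works: a lock owner runs with the most urgent (earliest) deadline among itself and the tasks blocked on it. This is a conceptual sketch only, not a kernel design; `struct dl_task` and `effective_deadline()` are hypothetical, and deadlines are absolute values on an arbitrary common time base, so comparing them is equivalent to comparing (deadline - now) at a shared "now".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical task with an absolute deadline; smaller means more urgent. */
struct dl_task {
	uint64_t deadline;
};

/*
 * Deadline inheritance: the owner of a lock is boosted to the earliest
 * deadline of any task blocked on that lock, the EDF counterpart of
 * boosting to the highest waiter priority.
 */
uint64_t effective_deadline(const struct dl_task *owner,
			    const struct dl_task *const *waiters, size_t n)
{
	uint64_t dl = owner->deadline;

	for (size_t i = 0; i < n; i++)
		if (waiters[i]->deadline < dl)
			dl = waiters[i]->deadline;
	return dl;
}
```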
Some files were not shown because too many files have changed in this diff.