You've already forked linux-apfs
mirror of
https://github.com/linux-apfs/linux-apfs.git
synced 2026-05-01 15:00:59 -07:00
[SCSI] Merge branch 'linus'
Conflicts: drivers/message/fusion/mptsas.c fixed up conflict between req->data_len accessors and mptsas driver updates. Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
This commit is contained in:
@@ -60,3 +60,62 @@ Description:
|
||||
Indicates whether the block layer should automatically
|
||||
generate checksums for write requests bound for
|
||||
devices that support receiving integrity metadata.
|
||||
|
||||
What: /sys/block/<disk>/alignment_offset
|
||||
Date: April 2009
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Storage devices may report a physical block size that is
|
||||
bigger than the logical block size (for instance a drive
|
||||
with 4KB physical sectors exposing 512-byte logical
|
||||
blocks to the operating system). This parameter
|
||||
indicates how many bytes the beginning of the device is
|
||||
offset from the disk's natural alignment.
|
||||
|
||||
What: /sys/block/<disk>/<partition>/alignment_offset
|
||||
Date: April 2009
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Storage devices may report a physical block size that is
|
||||
bigger than the logical block size (for instance a drive
|
||||
with 4KB physical sectors exposing 512-byte logical
|
||||
blocks to the operating system). This parameter
|
||||
indicates how many bytes the beginning of the partition
|
||||
is offset from the disk's natural alignment.
|
||||
|
||||
What: /sys/block/<disk>/queue/logical_block_size
|
||||
Date: May 2009
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
This is the smallest unit the storage device can
|
||||
address. It is typically 512 bytes.
|
||||
|
||||
What: /sys/block/<disk>/queue/physical_block_size
|
||||
Date: May 2009
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
This is the smallest unit the storage device can write
|
||||
without resorting to read-modify-write operation. It is
|
||||
usually the same as the logical block size but may be
|
||||
bigger. One example is SATA drives with 4KB sectors
|
||||
that expose a 512-byte logical block size to the
|
||||
operating system.
|
||||
|
||||
What: /sys/block/<disk>/queue/minimum_io_size
|
||||
Date: April 2009
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Storage devices may report a preferred minimum I/O size,
|
||||
which is the smallest request the device can perform
|
||||
without incurring a read-modify-write penalty. For disk
|
||||
drives this is often the physical block size. For RAID
|
||||
arrays it is often the stripe chunk size.
|
||||
|
||||
What: /sys/block/<disk>/queue/optimal_io_size
|
||||
Date: April 2009
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Storage devices may report an optimal I/O size, which is
|
||||
the device's preferred unit of receiving I/O. This is
|
||||
rarely reported for disk drives. For RAID devices it is
|
||||
usually the stripe width or the internal block size.
|
||||
|
||||
@@ -0,0 +1,33 @@
|
||||
Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/model
|
||||
Date: March 2009
|
||||
Kernel Version: 2.6.30
|
||||
Contact: iss_storagedev@hp.com
|
||||
Description: Displays the SCSI INQUIRY page 0 model for logical drive
|
||||
Y of controller X.
|
||||
|
||||
Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/rev
|
||||
Date: March 2009
|
||||
Kernel Version: 2.6.30
|
||||
Contact: iss_storagedev@hp.com
|
||||
Description: Displays the SCSI INQUIRY page 0 revision for logical
|
||||
drive Y of controller X.
|
||||
|
||||
Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/unique_id
|
||||
Date: March 2009
|
||||
Kernel Version: 2.6.30
|
||||
Contact: iss_storagedev@hp.com
|
||||
Description: Displays the SCSI INQUIRY page 83 serial number for logical
|
||||
drive Y of controller X.
|
||||
|
||||
Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/vendor
|
||||
Date: March 2009
|
||||
Kernel Version: 2.6.30
|
||||
Contact: iss_storagedev@hp.com
|
||||
Description: Displays the SCSI INQUIRY page 0 vendor for logical drive
|
||||
Y of controller X.
|
||||
|
||||
Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/block:cciss!cXdY
|
||||
Date: March 2009
|
||||
Kernel Version: 2.6.30
|
||||
Contact: iss_storagedev@hp.com
|
||||
Description: A symbolic link to /sys/block/cciss!cXdY
|
||||
@@ -0,0 +1,18 @@
|
||||
What: /sys/devices/system/cpu/cpu*/cache/index*/cache_disable_X
|
||||
Date: August 2008
|
||||
KernelVersion: 2.6.27
|
||||
Contact: mark.langsdorf@amd.com
|
||||
Description: These files exist in every cpu's cache index directories.
|
||||
There are currently 2 cache_disable_# files in each
|
||||
directory. Reading from these files on a supported
|
||||
processor will return that cache disable index value
|
||||
for that processor and node. Writing to one of these
|
||||
files will cause the specificed cache index to be disabled.
|
||||
|
||||
Currently, only AMD Family 10h Processors support cache index
|
||||
disable, and only for their L3 caches. See the BIOS and
|
||||
Kernel Developer's Guide at
|
||||
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/31116-Public-GH-BKDG_3.20_2-4-09.pdf
|
||||
for formatting information and other details on the
|
||||
cache index disable.
|
||||
Users: joachim.deguara@amd.com
|
||||
@@ -704,12 +704,24 @@ this directory the following files can currently be found:
|
||||
The current number of free dma_debug_entries
|
||||
in the allocator.
|
||||
|
||||
dma-api/driver-filter
|
||||
You can write a name of a driver into this file
|
||||
to limit the debug output to requests from that
|
||||
particular driver. Write an empty string to
|
||||
that file to disable the filter and see
|
||||
all errors again.
|
||||
|
||||
If you have this code compiled into your kernel it will be enabled by default.
|
||||
If you want to boot without the bookkeeping anyway you can provide
|
||||
'dma_debug=off' as a boot parameter. This will disable DMA-API debugging.
|
||||
Notice that you can not enable it again at runtime. You have to reboot to do
|
||||
so.
|
||||
|
||||
If you want to see debug messages only for a special device driver you can
|
||||
specify the dma_debug_driver=<drivername> parameter. This will enable the
|
||||
driver filter at boot time. The debug code will only print errors for that
|
||||
driver afterwards. This filter can be disabled or changed later using debugfs.
|
||||
|
||||
When the code disables itself at runtime this is most likely because it ran
|
||||
out of dma_debug_entries. These entries are preallocated at boot. The number
|
||||
of preallocated entries is defined per architecture. If it is too low for you
|
||||
|
||||
@@ -13,7 +13,8 @@ DOCBOOKS := z8530book.xml mcabook.xml device-drivers.xml \
|
||||
gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml \
|
||||
genericirq.xml s390-drivers.xml uio-howto.xml scsi.xml \
|
||||
mac80211.xml debugobjects.xml sh.xml regulator.xml \
|
||||
alsa-driver-api.xml writing-an-alsa-driver.xml
|
||||
alsa-driver-api.xml writing-an-alsa-driver.xml \
|
||||
tracepoint.xml
|
||||
|
||||
###
|
||||
# The build process is as follows (targets):
|
||||
|
||||
@@ -0,0 +1,89 @@
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
|
||||
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
|
||||
|
||||
<book id="Tracepoints">
|
||||
<bookinfo>
|
||||
<title>The Linux Kernel Tracepoint API</title>
|
||||
|
||||
<authorgroup>
|
||||
<author>
|
||||
<firstname>Jason</firstname>
|
||||
<surname>Baron</surname>
|
||||
<affiliation>
|
||||
<address>
|
||||
<email>jbaron@redhat.com</email>
|
||||
</address>
|
||||
</affiliation>
|
||||
</author>
|
||||
</authorgroup>
|
||||
|
||||
<legalnotice>
|
||||
<para>
|
||||
This documentation is free software; you can redistribute
|
||||
it and/or modify it under the terms of the GNU General Public
|
||||
License as published by the Free Software Foundation; either
|
||||
version 2 of the License, or (at your option) any later
|
||||
version.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
This program is distributed in the hope that it will be
|
||||
useful, but WITHOUT ANY WARRANTY; without even the implied
|
||||
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
|
||||
See the GNU General Public License for more details.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
You should have received a copy of the GNU General Public
|
||||
License along with this program; if not, write to the Free
|
||||
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
|
||||
MA 02111-1307 USA
|
||||
</para>
|
||||
|
||||
<para>
|
||||
For more details see the file COPYING in the source
|
||||
distribution of Linux.
|
||||
</para>
|
||||
</legalnotice>
|
||||
</bookinfo>
|
||||
|
||||
<toc></toc>
|
||||
<chapter id="intro">
|
||||
<title>Introduction</title>
|
||||
<para>
|
||||
Tracepoints are static probe points that are located in strategic points
|
||||
throughout the kernel. 'Probes' register/unregister with tracepoints
|
||||
via a callback mechanism. The 'probes' are strictly typed functions that
|
||||
are passed a unique set of parameters defined by each tracepoint.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
From this simple callback mechanism, 'probes' can be used to profile, debug,
|
||||
and understand kernel behavior. There are a number of tools that provide a
|
||||
framework for using 'probes'. These tools include Systemtap, ftrace, and
|
||||
LTTng.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Tracepoints are defined in a number of header files via various macros. Thus,
|
||||
the purpose of this document is to provide a clear accounting of the available
|
||||
tracepoints. The intention is to understand not only what tracepoints are
|
||||
available but also to understand where future tracepoints might be added.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The API presented has functions of the form:
|
||||
<function>trace_tracepointname(function parameters)</function>. These are the
|
||||
tracepoints callbacks that are found throughout the code. Registering and
|
||||
unregistering probes with these callback sites is covered in the
|
||||
<filename>Documentation/trace/*</filename> directory.
|
||||
</para>
|
||||
</chapter>
|
||||
|
||||
<chapter id="irq">
|
||||
<title>IRQ</title>
|
||||
!Iinclude/trace/events/irq.h
|
||||
</chapter>
|
||||
|
||||
</book>
|
||||
+80
-22
@@ -192,23 +192,24 @@ rcu/rcuhier (which displays the struct rcu_node hierarchy).
|
||||
The output of "cat rcu/rcudata" looks as follows:
|
||||
|
||||
rcu:
|
||||
0 c=4011 g=4012 pq=1 pqc=4011 qp=0 rpfq=1 rp=3c2a dt=23301/73 dn=2 df=1882 of=0 ri=2126 ql=2 b=10
|
||||
1 c=4011 g=4012 pq=1 pqc=4011 qp=0 rpfq=3 rp=39a6 dt=78073/1 dn=2 df=1402 of=0 ri=1875 ql=46 b=10
|
||||
2 c=4010 g=4010 pq=1 pqc=4010 qp=0 rpfq=-5 rp=1d12 dt=16646/0 dn=2 df=3140 of=0 ri=2080 ql=0 b=10
|
||||
3 c=4012 g=4013 pq=1 pqc=4012 qp=1 rpfq=3 rp=2b50 dt=21159/1 dn=2 df=2230 of=0 ri=1923 ql=72 b=10
|
||||
4 c=4012 g=4013 pq=1 pqc=4012 qp=1 rpfq=3 rp=1644 dt=5783/1 dn=2 df=3348 of=0 ri=2805 ql=7 b=10
|
||||
5 c=4012 g=4013 pq=0 pqc=4011 qp=1 rpfq=3 rp=1aac dt=5879/1 dn=2 df=3140 of=0 ri=2066 ql=10 b=10
|
||||
6 c=4012 g=4013 pq=1 pqc=4012 qp=1 rpfq=3 rp=ed8 dt=5847/1 dn=2 df=3797 of=0 ri=1266 ql=10 b=10
|
||||
7 c=4012 g=4013 pq=1 pqc=4012 qp=1 rpfq=3 rp=1fa2 dt=6199/1 dn=2 df=2795 of=0 ri=2162 ql=28 b=10
|
||||
rcu:
|
||||
0 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=10951/1 dn=0 df=1101 of=0 ri=36 ql=0 b=10
|
||||
1 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=16117/1 dn=0 df=1015 of=0 ri=0 ql=0 b=10
|
||||
2 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=1445/1 dn=0 df=1839 of=0 ri=0 ql=0 b=10
|
||||
3 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=6681/1 dn=0 df=1545 of=0 ri=0 ql=0 b=10
|
||||
4 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=1003/1 dn=0 df=1992 of=0 ri=0 ql=0 b=10
|
||||
5 c=17829 g=17830 pq=1 pqc=17829 qp=1 dt=3887/1 dn=0 df=3331 of=0 ri=4 ql=2 b=10
|
||||
6 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=859/1 dn=0 df=3224 of=0 ri=0 ql=0 b=10
|
||||
7 c=17829 g=17830 pq=0 pqc=17829 qp=1 dt=3761/1 dn=0 df=1818 of=0 ri=0 ql=2 b=10
|
||||
rcu_bh:
|
||||
0 c=-268 g=-268 pq=1 pqc=-268 qp=0 rpfq=-145 rp=21d6 dt=23301/73 dn=2 df=0 of=0 ri=0 ql=0 b=10
|
||||
1 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-170 rp=20ce dt=78073/1 dn=2 df=26 of=0 ri=5 ql=0 b=10
|
||||
2 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-83 rp=fbd dt=16646/0 dn=2 df=28 of=0 ri=4 ql=0 b=10
|
||||
3 c=-268 g=-268 pq=1 pqc=-268 qp=0 rpfq=-105 rp=178c dt=21159/1 dn=2 df=28 of=0 ri=2 ql=0 b=10
|
||||
4 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-30 rp=b54 dt=5783/1 dn=2 df=32 of=0 ri=0 ql=0 b=10
|
||||
5 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-29 rp=df5 dt=5879/1 dn=2 df=30 of=0 ri=3 ql=0 b=10
|
||||
6 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-28 rp=788 dt=5847/1 dn=2 df=32 of=0 ri=0 ql=0 b=10
|
||||
7 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-53 rp=1098 dt=6199/1 dn=2 df=30 of=0 ri=3 ql=0 b=10
|
||||
0 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=10951/1 dn=0 df=0 of=0 ri=0 ql=0 b=10
|
||||
1 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=16117/1 dn=0 df=13 of=0 ri=0 ql=0 b=10
|
||||
2 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=1445/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
|
||||
3 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=6681/1 dn=0 df=9 of=0 ri=0 ql=0 b=10
|
||||
4 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=1003/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
|
||||
5 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=3887/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
|
||||
6 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=859/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
|
||||
7 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=3761/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
|
||||
|
||||
The first section lists the rcu_data structures for rcu, the second for
|
||||
rcu_bh. Each section has one line per CPU, or eight for this 8-CPU system.
|
||||
@@ -253,12 +254,6 @@ o "pqc" indicates which grace period the last-observed quiescent
|
||||
o "qp" indicates that RCU still expects a quiescent state from
|
||||
this CPU.
|
||||
|
||||
o "rpfq" is the number of rcu_pending() calls on this CPU required
|
||||
to induce this CPU to invoke force_quiescent_state().
|
||||
|
||||
o "rp" is low-order four hex digits of the count of how many times
|
||||
rcu_pending() has been invoked on this CPU.
|
||||
|
||||
o "dt" is the current value of the dyntick counter that is incremented
|
||||
when entering or leaving dynticks idle state, either by the
|
||||
scheduler or by irq. The number after the "/" is the interrupt
|
||||
@@ -305,6 +300,9 @@ o "b" is the batch limit for this CPU. If more than this number
|
||||
of RCU callbacks is ready to invoke, then the remainder will
|
||||
be deferred.
|
||||
|
||||
There is also an rcu/rcudata.csv file with the same information in
|
||||
comma-separated-variable spreadsheet format.
|
||||
|
||||
|
||||
The output of "cat rcu/rcugp" looks as follows:
|
||||
|
||||
@@ -411,3 +409,63 @@ o Each element of the form "1/1 0:127 ^0" represents one struct
|
||||
For example, the first entry at the lowest level shows
|
||||
"^0", indicating that it corresponds to bit zero in
|
||||
the first entry at the middle level.
|
||||
|
||||
|
||||
The output of "cat rcu/rcu_pending" looks as follows:
|
||||
|
||||
rcu:
|
||||
0 np=255892 qsp=53936 cbr=0 cng=14417 gpc=10033 gps=24320 nf=6445 nn=146741
|
||||
1 np=261224 qsp=54638 cbr=0 cng=25723 gpc=16310 gps=2849 nf=5912 nn=155792
|
||||
2 np=237496 qsp=49664 cbr=0 cng=2762 gpc=45478 gps=1762 nf=1201 nn=136629
|
||||
3 np=236249 qsp=48766 cbr=0 cng=286 gpc=48049 gps=1218 nf=207 nn=137723
|
||||
4 np=221310 qsp=46850 cbr=0 cng=26 gpc=43161 gps=4634 nf=3529 nn=123110
|
||||
5 np=237332 qsp=48449 cbr=0 cng=54 gpc=47920 gps=3252 nf=201 nn=137456
|
||||
6 np=219995 qsp=46718 cbr=0 cng=50 gpc=42098 gps=6093 nf=4202 nn=120834
|
||||
7 np=249893 qsp=49390 cbr=0 cng=72 gpc=38400 gps=17102 nf=41 nn=144888
|
||||
rcu_bh:
|
||||
0 np=146741 qsp=1419 cbr=0 cng=6 gpc=0 gps=0 nf=2 nn=145314
|
||||
1 np=155792 qsp=12597 cbr=0 cng=0 gpc=4 gps=8 nf=3 nn=143180
|
||||
2 np=136629 qsp=18680 cbr=0 cng=0 gpc=7 gps=6 nf=0 nn=117936
|
||||
3 np=137723 qsp=2843 cbr=0 cng=0 gpc=10 gps=7 nf=0 nn=134863
|
||||
4 np=123110 qsp=12433 cbr=0 cng=0 gpc=4 gps=2 nf=0 nn=110671
|
||||
5 np=137456 qsp=4210 cbr=0 cng=0 gpc=6 gps=5 nf=0 nn=133235
|
||||
6 np=120834 qsp=9902 cbr=0 cng=0 gpc=6 gps=3 nf=2 nn=110921
|
||||
7 np=144888 qsp=26336 cbr=0 cng=0 gpc=8 gps=2 nf=0 nn=118542
|
||||
|
||||
As always, this is once again split into "rcu" and "rcu_bh" portions.
|
||||
The fields are as follows:
|
||||
|
||||
o "np" is the number of times that __rcu_pending() has been invoked
|
||||
for the corresponding flavor of RCU.
|
||||
|
||||
o "qsp" is the number of times that the RCU was waiting for a
|
||||
quiescent state from this CPU.
|
||||
|
||||
o "cbr" is the number of times that this CPU had RCU callbacks
|
||||
that had passed through a grace period, and were thus ready
|
||||
to be invoked.
|
||||
|
||||
o "cng" is the number of times that this CPU needed another
|
||||
grace period while RCU was idle.
|
||||
|
||||
o "gpc" is the number of times that an old grace period had
|
||||
completed, but this CPU was not yet aware of it.
|
||||
|
||||
o "gps" is the number of times that a new grace period had started,
|
||||
but this CPU was not yet aware of it.
|
||||
|
||||
o "nf" is the number of times that this CPU suspected that the
|
||||
current grace period had run for too long, and thus needed to
|
||||
be forced.
|
||||
|
||||
Please note that "forcing" consists of sending resched IPIs
|
||||
to holdout CPUs. If that CPU really still is in an old RCU
|
||||
read-side critical section, then we really do have to wait for it.
|
||||
The assumption behing "forcing" is that the CPU is not still in
|
||||
an old RCU read-side critical section, but has not yet responded
|
||||
for some other reason.
|
||||
|
||||
o "nn" is the number of times that this CPU needed nothing. Alert
|
||||
readers will note that the rcu "nn" number for a given CPU very
|
||||
closely matches the rcu_bh "np" number for that same CPU. This
|
||||
is due to short-circuit evaluation in rcu_pending().
|
||||
|
||||
+18
-2
@@ -184,8 +184,9 @@ length. Single character labels using special characters, that being anything
|
||||
other than a letter or digit, are reserved for use by the Smack development
|
||||
team. Smack labels are unstructured, case sensitive, and the only operation
|
||||
ever performed on them is comparison for equality. Smack labels cannot
|
||||
contain unprintable characters or the "/" (slash) character. Smack labels
|
||||
cannot begin with a '-', which is reserved for special options.
|
||||
contain unprintable characters, the "/" (slash), the "\" (backslash), the "'"
|
||||
(quote) and '"' (double-quote) characters.
|
||||
Smack labels cannot begin with a '-', which is reserved for special options.
|
||||
|
||||
There are some predefined labels:
|
||||
|
||||
@@ -523,3 +524,18 @@ Smack supports some mount options:
|
||||
|
||||
These mount options apply to all file system types.
|
||||
|
||||
Smack auditing
|
||||
|
||||
If you want Smack auditing of security events, you need to set CONFIG_AUDIT
|
||||
in your kernel configuration.
|
||||
By default, all denied events will be audited. You can change this behavior by
|
||||
writing a single character to the /smack/logging file :
|
||||
0 : no logging
|
||||
1 : log denied (default)
|
||||
2 : log accepted
|
||||
3 : log denied & accepted
|
||||
|
||||
Events are logged as 'key=value' pairs, for each event you at least will get
|
||||
the subjet, the object, the rights requested, the action, the kernel function
|
||||
that triggered the event, plus other pairs depending on the type of event
|
||||
audited.
|
||||
|
||||
@@ -186,7 +186,7 @@ a virtual address mapping (unlike the earlier scheme of virtual address
|
||||
do not have a corresponding kernel virtual address space mapping) and
|
||||
low-memory pages.
|
||||
|
||||
Note: Please refer to Documentation/PCI/PCI-DMA-mapping.txt for a discussion
|
||||
Note: Please refer to Documentation/DMA-mapping.txt for a discussion
|
||||
on PCI high mem DMA aspects and mapping of scatter gather lists, and support
|
||||
for 64 bit PCI.
|
||||
|
||||
|
||||
@@ -60,7 +60,7 @@ go_lock | Called for the first local holder of a lock
|
||||
go_unlock | Called on the final local unlock of a lock
|
||||
go_dump | Called to print content of object for debugfs file, or on
|
||||
| error to dump glock to the log.
|
||||
go_type; | The type of the glock, LM_TYPE_.....
|
||||
go_type | The type of the glock, LM_TYPE_.....
|
||||
go_min_hold_time | The minimum hold time
|
||||
|
||||
The minimum hold time for each lock is the time after a remote lock
|
||||
|
||||
@@ -11,18 +11,15 @@ their I/O so file system consistency is maintained. One of the nifty
|
||||
features of GFS is perfect consistency -- changes made to the file system
|
||||
on one machine show up immediately on all other machines in the cluster.
|
||||
|
||||
GFS uses interchangable inter-node locking mechanisms. Different lock
|
||||
modules can plug into GFS and each file system selects the appropriate
|
||||
lock module at mount time. Lock modules include:
|
||||
GFS uses interchangable inter-node locking mechanisms, the currently
|
||||
supported mechanisms are:
|
||||
|
||||
lock_nolock -- allows gfs to be used as a local file system
|
||||
|
||||
lock_dlm -- uses a distributed lock manager (dlm) for inter-node locking
|
||||
The dlm is found at linux/fs/dlm/
|
||||
|
||||
In addition to interfacing with an external locking manager, a gfs lock
|
||||
module is responsible for interacting with external cluster management
|
||||
systems. Lock_dlm depends on user space cluster management systems found
|
||||
Lock_dlm depends on user space cluster management systems found
|
||||
at the URL above.
|
||||
|
||||
To use gfs as a local file system, no external clustering systems are
|
||||
@@ -31,13 +28,19 @@ needed, simply:
|
||||
$ mkfs -t gfs2 -p lock_nolock -j 1 /dev/block_device
|
||||
$ mount -t gfs2 /dev/block_device /dir
|
||||
|
||||
GFS2 is not on-disk compatible with previous versions of GFS.
|
||||
If you are using Fedora, you need to install the gfs2-utils package
|
||||
and, for lock_dlm, you will also need to install the cman package
|
||||
and write a cluster.conf as per the documentation.
|
||||
|
||||
GFS2 is not on-disk compatible with previous versions of GFS, but it
|
||||
is pretty close.
|
||||
|
||||
The following man pages can be found at the URL above:
|
||||
gfs2_fsck to repair a filesystem
|
||||
fsck.gfs2 to repair a filesystem
|
||||
gfs2_grow to expand a filesystem online
|
||||
gfs2_jadd to add journals to a filesystem online
|
||||
gfs2_tool to manipulate, examine and tune a filesystem
|
||||
gfs2_quota to examine and change quota values in a filesystem
|
||||
gfs2_convert to convert a gfs filesystem to gfs2 in-place
|
||||
mount.gfs2 to help mount(8) mount a filesystem
|
||||
mkfs.gfs2 to make a filesystem
|
||||
|
||||
@@ -133,4 +133,4 @@ RAM/SWAP in 10240 inodes and it is only accessible by root.
|
||||
Author:
|
||||
Christoph Rohland <cr@sap.com>, 1.12.01
|
||||
Updated:
|
||||
Hugh Dickins <hugh@veritas.com>, 4 June 2007
|
||||
Hugh Dickins, 4 June 2007
|
||||
|
||||
@@ -0,0 +1,131 @@
|
||||
Futex Requeue PI
|
||||
----------------
|
||||
|
||||
Requeueing of tasks from a non-PI futex to a PI futex requires
|
||||
special handling in order to ensure the underlying rt_mutex is never
|
||||
left without an owner if it has waiters; doing so would break the PI
|
||||
boosting logic [see rt-mutex-desgin.txt] For the purposes of
|
||||
brevity, this action will be referred to as "requeue_pi" throughout
|
||||
this document. Priority inheritance is abbreviated throughout as
|
||||
"PI".
|
||||
|
||||
Motivation
|
||||
----------
|
||||
|
||||
Without requeue_pi, the glibc implementation of
|
||||
pthread_cond_broadcast() must resort to waking all the tasks waiting
|
||||
on a pthread_condvar and letting them try to sort out which task
|
||||
gets to run first in classic thundering-herd formation. An ideal
|
||||
implementation would wake the highest-priority waiter, and leave the
|
||||
rest to the natural wakeup inherent in unlocking the mutex
|
||||
associated with the condvar.
|
||||
|
||||
Consider the simplified glibc calls:
|
||||
|
||||
/* caller must lock mutex */
|
||||
pthread_cond_wait(cond, mutex)
|
||||
{
|
||||
lock(cond->__data.__lock);
|
||||
unlock(mutex);
|
||||
do {
|
||||
unlock(cond->__data.__lock);
|
||||
futex_wait(cond->__data.__futex);
|
||||
lock(cond->__data.__lock);
|
||||
} while(...)
|
||||
unlock(cond->__data.__lock);
|
||||
lock(mutex);
|
||||
}
|
||||
|
||||
pthread_cond_broadcast(cond)
|
||||
{
|
||||
lock(cond->__data.__lock);
|
||||
unlock(cond->__data.__lock);
|
||||
futex_requeue(cond->data.__futex, cond->mutex);
|
||||
}
|
||||
|
||||
Once pthread_cond_broadcast() requeues the tasks, the cond->mutex
|
||||
has waiters. Note that pthread_cond_wait() attempts to lock the
|
||||
mutex only after it has returned to user space. This will leave the
|
||||
underlying rt_mutex with waiters, and no owner, breaking the
|
||||
previously mentioned PI-boosting algorithms.
|
||||
|
||||
In order to support PI-aware pthread_condvar's, the kernel needs to
|
||||
be able to requeue tasks to PI futexes. This support implies that
|
||||
upon a successful futex_wait system call, the caller would return to
|
||||
user space already holding the PI futex. The glibc implementation
|
||||
would be modified as follows:
|
||||
|
||||
|
||||
/* caller must lock mutex */
|
||||
pthread_cond_wait_pi(cond, mutex)
|
||||
{
|
||||
lock(cond->__data.__lock);
|
||||
unlock(mutex);
|
||||
do {
|
||||
unlock(cond->__data.__lock);
|
||||
futex_wait_requeue_pi(cond->__data.__futex);
|
||||
lock(cond->__data.__lock);
|
||||
} while(...)
|
||||
unlock(cond->__data.__lock);
|
||||
/* the kernel acquired the the mutex for us */
|
||||
}
|
||||
|
||||
pthread_cond_broadcast_pi(cond)
|
||||
{
|
||||
lock(cond->__data.__lock);
|
||||
unlock(cond->__data.__lock);
|
||||
futex_requeue_pi(cond->data.__futex, cond->mutex);
|
||||
}
|
||||
|
||||
The actual glibc implementation will likely test for PI and make the
|
||||
necessary changes inside the existing calls rather than creating new
|
||||
calls for the PI cases. Similar changes are needed for
|
||||
pthread_cond_timedwait() and pthread_cond_signal().
|
||||
|
||||
Implementation
|
||||
--------------
|
||||
|
||||
In order to ensure the rt_mutex has an owner if it has waiters, it
|
||||
is necessary for both the requeue code, as well as the waiting code,
|
||||
to be able to acquire the rt_mutex before returning to user space.
|
||||
The requeue code cannot simply wake the waiter and leave it to
|
||||
acquire the rt_mutex as it would open a race window between the
|
||||
requeue call returning to user space and the waiter waking and
|
||||
starting to run. This is especially true in the uncontended case.
|
||||
|
||||
The solution involves two new rt_mutex helper routines,
|
||||
rt_mutex_start_proxy_lock() and rt_mutex_finish_proxy_lock(), which
|
||||
allow the requeue code to acquire an uncontended rt_mutex on behalf
|
||||
of the waiter and to enqueue the waiter on a contended rt_mutex.
|
||||
Two new system calls provide the kernel<->user interface to
|
||||
requeue_pi: FUTEX_WAIT_REQUEUE_PI and FUTEX_REQUEUE_CMP_PI.
|
||||
|
||||
FUTEX_WAIT_REQUEUE_PI is called by the waiter (pthread_cond_wait()
|
||||
and pthread_cond_timedwait()) to block on the initial futex and wait
|
||||
to be requeued to a PI-aware futex. The implementation is the
|
||||
result of a high-speed collision between futex_wait() and
|
||||
futex_lock_pi(), with some extra logic to check for the additional
|
||||
wake-up scenarios.
|
||||
|
||||
FUTEX_REQUEUE_CMP_PI is called by the waker
|
||||
(pthread_cond_broadcast() and pthread_cond_signal()) to requeue and
|
||||
possibly wake the waiting tasks. Internally, this system call is
|
||||
still handled by futex_requeue (by passing requeue_pi=1). Before
|
||||
requeueing, futex_requeue() attempts to acquire the requeue target
|
||||
PI futex on behalf of the top waiter. If it can, this waiter is
|
||||
woken. futex_requeue() then proceeds to requeue the remaining
|
||||
nr_wake+nr_requeue tasks to the PI futex, calling
|
||||
rt_mutex_start_proxy_lock() prior to each requeue to prepare the
|
||||
task as a waiter on the underlying rt_mutex. It is possible that
|
||||
the lock can be acquired at this stage as well, if so, the next
|
||||
waiter is woken to finish the acquisition of the lock.
|
||||
|
||||
FUTEX_REQUEUE_PI accepts nr_wake and nr_requeue as arguments, but
|
||||
their sum is all that really matters. futex_requeue() will wake or
|
||||
requeue up to nr_wake + nr_requeue tasks. It will wake only as many
|
||||
tasks as it can acquire the lock for, which in the majority of cases
|
||||
should be 0 as good programming practice dictates that the caller of
|
||||
either pthread_cond_broadcast() or pthread_cond_signal() acquire the
|
||||
mutex prior to making the call. FUTEX_REQUEUE_PI requires that
|
||||
nr_wake=1. nr_requeue should be INT_MAX for broadcast and 0 for
|
||||
signal.
|
||||
@@ -150,6 +150,11 @@ fan[1-*]_min Fan minimum value
|
||||
Unit: revolution/min (RPM)
|
||||
RW
|
||||
|
||||
fan[1-*]_max Fan maximum value
|
||||
Unit: revolution/min (RPM)
|
||||
Only rarely supported by the hardware.
|
||||
RW
|
||||
|
||||
fan[1-*]_input Fan input value.
|
||||
Unit: revolution/min (RPM)
|
||||
RO
|
||||
@@ -390,6 +395,7 @@ OR
|
||||
in[0-*]_min_alarm
|
||||
in[0-*]_max_alarm
|
||||
fan[1-*]_min_alarm
|
||||
fan[1-*]_max_alarm
|
||||
temp[1-*]_min_alarm
|
||||
temp[1-*]_max_alarm
|
||||
temp[1-*]_crit_alarm
|
||||
|
||||
@@ -18,8 +18,12 @@ Usage
|
||||
Anonymous finger details are sent sequentially as separate packets of ABS
|
||||
events. Only the ABS_MT events are recognized as part of a finger
|
||||
packet. The end of a packet is marked by calling the input_mt_sync()
|
||||
function, which generates a SYN_MT_REPORT event. The end of multi-touch
|
||||
transfer is marked by calling the usual input_sync() function.
|
||||
function, which generates a SYN_MT_REPORT event. This instructs the
|
||||
receiver to accept the data for the current finger and prepare to receive
|
||||
another. The end of a multi-touch transfer is marked by calling the usual
|
||||
input_sync() function. This instructs the receiver to act upon events
|
||||
accumulated since last EV_SYN/SYN_REPORT and prepare to receive a new
|
||||
set of events/packets.
|
||||
|
||||
A set of ABS_MT events with the desired properties is defined. The events
|
||||
are divided into categories, to allow for partial implementation. The
|
||||
@@ -27,11 +31,26 @@ minimum set consists of ABS_MT_TOUCH_MAJOR, ABS_MT_POSITION_X and
|
||||
ABS_MT_POSITION_Y, which allows for multiple fingers to be tracked. If the
|
||||
device supports it, the ABS_MT_WIDTH_MAJOR may be used to provide the size
|
||||
of the approaching finger. Anisotropy and direction may be specified with
|
||||
ABS_MT_TOUCH_MINOR, ABS_MT_WIDTH_MINOR and ABS_MT_ORIENTATION. Devices with
|
||||
more granular information may specify general shapes as blobs, i.e., as a
|
||||
sequence of rectangular shapes grouped together by an
|
||||
ABS_MT_BLOB_ID. Finally, the ABS_MT_TOOL_TYPE may be used to specify
|
||||
whether the touching tool is a finger or a pen or something else.
|
||||
ABS_MT_TOUCH_MINOR, ABS_MT_WIDTH_MINOR and ABS_MT_ORIENTATION. The
|
||||
ABS_MT_TOOL_TYPE may be used to specify whether the touching tool is a
|
||||
finger or a pen or something else. Devices with more granular information
|
||||
may specify general shapes as blobs, i.e., as a sequence of rectangular
|
||||
shapes grouped together by an ABS_MT_BLOB_ID. Finally, for the few devices
|
||||
that currently support it, the ABS_MT_TRACKING_ID event may be used to
|
||||
report finger tracking from hardware [5].
|
||||
|
||||
Here is what a minimal event sequence for a two-finger touch would look
|
||||
like:
|
||||
|
||||
ABS_MT_TOUCH_MAJOR
|
||||
ABS_MT_POSITION_X
|
||||
ABS_MT_POSITION_Y
|
||||
SYN_MT_REPORT
|
||||
ABS_MT_TOUCH_MAJOR
|
||||
ABS_MT_POSITION_X
|
||||
ABS_MT_POSITION_Y
|
||||
SYN_MT_REPORT
|
||||
SYN_REPORT
|
||||
|
||||
|
||||
Event Semantics
|
||||
@@ -44,24 +63,24 @@ ABS_MT_TOUCH_MAJOR
|
||||
|
||||
The length of the major axis of the contact. The length should be given in
|
||||
surface units. If the surface has an X times Y resolution, the largest
|
||||
possible value of ABS_MT_TOUCH_MAJOR is sqrt(X^2 + Y^2), the diagonal.
|
||||
possible value of ABS_MT_TOUCH_MAJOR is sqrt(X^2 + Y^2), the diagonal [4].
|
||||
|
||||
ABS_MT_TOUCH_MINOR
|
||||
|
||||
The length, in surface units, of the minor axis of the contact. If the
|
||||
contact is circular, this event can be omitted.
|
||||
contact is circular, this event can be omitted [4].
|
||||
|
||||
ABS_MT_WIDTH_MAJOR
|
||||
|
||||
The length, in surface units, of the major axis of the approaching
|
||||
tool. This should be understood as the size of the tool itself. The
|
||||
orientation of the contact and the approaching tool are assumed to be the
|
||||
same.
|
||||
same [4].
|
||||
|
||||
ABS_MT_WIDTH_MINOR
|
||||
|
||||
The length, in surface units, of the minor axis of the approaching
|
||||
tool. Omit if circular.
|
||||
tool. Omit if circular [4].
|
||||
|
||||
The above four values can be used to derive additional information about
|
||||
the contact. The ratio ABS_MT_TOUCH_MAJOR / ABS_MT_WIDTH_MAJOR approximates
|
||||
@@ -70,14 +89,17 @@ different characteristic widths [1].
|
||||
|
||||
ABS_MT_ORIENTATION
|
||||
|
||||
The orientation of the ellipse. The value should describe half a revolution
|
||||
clockwise around the touch center. The scale of the value is arbitrary, but
|
||||
zero should be returned for an ellipse aligned along the Y axis of the
|
||||
surface. As an example, an index finger placed straight onto the axis could
|
||||
return zero orientation, something negative when twisted to the left, and
|
||||
something positive when twisted to the right. This value can be omitted if
|
||||
the touching object is circular, or if the information is not available in
|
||||
the kernel driver.
|
||||
The orientation of the ellipse. The value should describe a signed quarter
|
||||
of a revolution clockwise around the touch center. The signed value range
|
||||
is arbitrary, but zero should be returned for a finger aligned along the Y
|
||||
axis of the surface, a negative value when finger is turned to the left, and
|
||||
a positive value when finger turned to the right. When completely aligned with
|
||||
the X axis, the range max should be returned. Orientation can be omitted
|
||||
if the touching object is circular, or if the information is not available
|
||||
in the kernel driver. Partial orientation support is possible if the device
|
||||
can distinguish between the two axis, but not (uniquely) any values in
|
||||
between. In such cases, the range of ABS_MT_ORIENTATION should be [0, 1]
|
||||
[4].
|
||||
|
||||
ABS_MT_POSITION_X
|
||||
|
||||
@@ -98,8 +120,35 @@ ABS_MT_BLOB_ID
|
||||
|
||||
The BLOB_ID groups several packets together into one arbitrarily shaped
|
||||
contact. This is a low-level anonymous grouping, and should not be confused
|
||||
with the high-level contactID, explained below. Most kernel drivers will
|
||||
not have this capability, and can safely omit the event.
|
||||
with the high-level trackingID [5]. Most kernel drivers will not have blob
|
||||
capability, and can safely omit the event.
|
||||
|
||||
ABS_MT_TRACKING_ID
|
||||
|
||||
The TRACKING_ID identifies an initiated contact throughout its life cycle
|
||||
[5]. There are currently only a few devices that support it, so this event
|
||||
should normally be omitted.
|
||||
|
||||
|
||||
Event Computation
|
||||
-----------------
|
||||
|
||||
The flora of different hardware unavoidably leads to some devices fitting
|
||||
better to the MT protocol than others. To simplify and unify the mapping,
|
||||
this section gives recipes for how to compute certain events.
|
||||
|
||||
For devices reporting contacts as rectangular shapes, signed orientation
|
||||
cannot be obtained. Assuming X and Y are the lengths of the sides of the
|
||||
touching rectangle, here is a simple formula that retains the most
|
||||
information possible:
|
||||
|
||||
ABS_MT_TOUCH_MAJOR := max(X, Y)
|
||||
ABS_MT_TOUCH_MINOR := min(X, Y)
|
||||
ABS_MT_ORIENTATION := bool(X > Y)
|
||||
|
||||
The range of ABS_MT_ORIENTATION should be set to [0, 1], to indicate that
|
||||
the device can distinguish between a finger along the Y axis (0) and a
|
||||
finger along the X axis (1).
|
||||
|
||||
|
||||
Finger Tracking
|
||||
@@ -109,14 +158,18 @@ The kernel driver should generate an arbitrary enumeration of the set of
|
||||
anonymous contacts currently on the surface. The order in which the packets
|
||||
appear in the event stream is not important.
|
||||
|
||||
The process of finger tracking, i.e., to assign a unique contactID to each
|
||||
The process of finger tracking, i.e., to assign a unique trackingID to each
|
||||
initiated contact on the surface, is left to user space; preferably the
|
||||
multi-touch X driver [3]. In that driver, the contactID stays the same and
|
||||
multi-touch X driver [3]. In that driver, the trackingID stays the same and
|
||||
unique until the contact vanishes (when the finger leaves the surface). The
|
||||
problem of assigning a set of anonymous fingers to a set of identified
|
||||
fingers is a euclidian bipartite matching problem at each event update, and
|
||||
relies on a sufficiently rapid update rate.
|
||||
|
||||
There are a few devices that support trackingID in hardware. User space can
|
||||
make use of these native identifiers to reduce bandwidth and cpu usage.
|
||||
|
||||
|
||||
Notes
|
||||
-----
|
||||
|
||||
@@ -136,5 +189,7 @@ could be used to derive tilt.
|
||||
time of writing (April 2009), the MT protocol is not yet merged, and the
|
||||
prototype implements finger matching, basic mouse support and two-finger
|
||||
scrolling. The project aims at improving the quality of current multi-touch
|
||||
functionality available in the synaptics X driver, and in addition
|
||||
functionality available in the Synaptics X driver, and in addition
|
||||
implement more advanced gestures.
|
||||
[4] See the section on event computation.
|
||||
[5] See the section on finger tracking.
|
||||
|
||||
@@ -56,7 +56,6 @@ parameter is applicable:
|
||||
ISAPNP ISA PnP code is enabled.
|
||||
ISDN Appropriate ISDN support is enabled.
|
||||
JOY Appropriate joystick support is enabled.
|
||||
KMEMTRACE kmemtrace is enabled.
|
||||
LIBATA Libata driver is enabled
|
||||
LP Printer support is enabled.
|
||||
LOOP Loopback device support is enabled.
|
||||
@@ -329,11 +328,6 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
flushed before they will be reused, which
|
||||
is a lot of faster
|
||||
|
||||
amd_iommu_size= [HW,X86-64]
|
||||
Define the size of the aperture for the AMD IOMMU
|
||||
driver. Possible values are:
|
||||
'32M', '64M' (default), '128M', '256M', '512M', '1G'
|
||||
|
||||
amijoy.map= [HW,JOY] Amiga joystick support
|
||||
Map of devices attached to JOY0DAT and JOY1DAT
|
||||
Format: <a>,<b>
|
||||
@@ -646,6 +640,13 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
DMA-API debugging code disables itself because the
|
||||
architectural default is too low.
|
||||
|
||||
dma_debug_driver=<driver_name>
|
||||
With this option the DMA-API debugging driver
|
||||
filter feature can be enabled at boot time. Just
|
||||
pass the driver to filter for as the parameter.
|
||||
The filter can be disabled or changed to another
|
||||
driver later using sysfs.
|
||||
|
||||
dscc4.setup= [NET]
|
||||
|
||||
dtc3181e= [HW,SCSI]
|
||||
@@ -752,12 +753,25 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
ia64_pal_cache_flush instead of SAL_CACHE_FLUSH.
|
||||
|
||||
ftrace=[tracer]
|
||||
[ftrace] will set and start the specified tracer
|
||||
[FTRACE] will set and start the specified tracer
|
||||
as early as possible in order to facilitate early
|
||||
boot debugging.
|
||||
|
||||
ftrace_dump_on_oops
|
||||
[ftrace] will dump the trace buffers on oops.
|
||||
[FTRACE] will dump the trace buffers on oops.
|
||||
|
||||
ftrace_filter=[function-list]
|
||||
[FTRACE] Limit the functions traced by the function
|
||||
tracer at boot up. function-list is a comma separated
|
||||
list of functions. This list can be changed at run
|
||||
time by the set_ftrace_filter file in the debugfs
|
||||
tracing directory.
|
||||
|
||||
ftrace_notrace=[function-list]
|
||||
[FTRACE] Do not trace the functions specified in
|
||||
function-list. This list can be changed at run time
|
||||
by the set_ftrace_notrace file in the debugfs
|
||||
tracing directory.
|
||||
|
||||
gamecon.map[2|3]=
|
||||
[HW,JOY] Multisystem joystick and NES/SNES/PSX pad
|
||||
@@ -914,6 +928,12 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
Formt: { "sha1" | "md5" }
|
||||
default: "sha1"
|
||||
|
||||
ima_tcb [IMA]
|
||||
Load a policy which meets the needs of the Trusted
|
||||
Computing Base. This means IMA will measure all
|
||||
programs exec'd, files mmap'd for exec, and all files
|
||||
opened for read by uid=0.
|
||||
|
||||
in2000= [HW,SCSI]
|
||||
See header of drivers/scsi/in2000.c.
|
||||
|
||||
@@ -1054,15 +1074,6 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
use the HighMem zone if it exists, and the Normal
|
||||
zone if it does not.
|
||||
|
||||
kmemtrace.enable= [KNL,KMEMTRACE] Format: { yes | no }
|
||||
Controls whether kmemtrace is enabled
|
||||
at boot-time.
|
||||
|
||||
kmemtrace.subbufs=n [KNL,KMEMTRACE] Overrides the number of
|
||||
subbufs kmemtrace's relay channel has. Set this
|
||||
higher than default (KMEMTRACE_N_SUBBUFS in code) if
|
||||
you experience buffer overruns.
|
||||
|
||||
kgdboc= [HW] kgdb over consoles.
|
||||
Requires a tty driver that supports console polling.
|
||||
(only serial suported for now)
|
||||
@@ -1072,6 +1083,10 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
Configure the RouterBoard 532 series on-chip
|
||||
Ethernet adapter MAC address.
|
||||
|
||||
kmemleak= [KNL] Boot-time kmemleak enable/disable
|
||||
Valid arguments: on, off
|
||||
Default: on
|
||||
|
||||
kstack=N [X86] Print N words from the kernel stack
|
||||
in oops dumps.
|
||||
|
||||
@@ -1535,6 +1550,10 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
register save and restore. The kernel will only save
|
||||
legacy floating-point registers on task switch.
|
||||
|
||||
noxsave [BUGS=X86] Disables x86 extended register state save
|
||||
and restore using xsave. The kernel will fallback to
|
||||
enabling legacy floating-point and sse state.
|
||||
|
||||
nohlt [BUGS=ARM,SH] Tells the kernel that the sleep(SH) or
|
||||
wfi(ARM) instruction doesn't work correctly and not to
|
||||
use it. This is also useful when using JTAG debugger.
|
||||
@@ -1571,6 +1590,9 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
noinitrd [RAM] Tells the kernel not to load any configured
|
||||
initial RAM disk.
|
||||
|
||||
nointremap [X86-64, Intel-IOMMU] Do not enable interrupt
|
||||
remapping.
|
||||
|
||||
nointroute [IA-64]
|
||||
|
||||
nojitter [IA64] Disables jitter checking for ITC timers.
|
||||
@@ -1656,6 +1678,14 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
oprofile.timer= [HW]
|
||||
Use timer interrupt instead of performance counters
|
||||
|
||||
oprofile.cpu_type= Force an oprofile cpu type
|
||||
This might be useful if you have an older oprofile
|
||||
userland or if you want common events.
|
||||
Format: { archperfmon }
|
||||
archperfmon: [X86] Force use of architectural
|
||||
perfmon on Intel CPUs instead of the
|
||||
CPU specific event set.
|
||||
|
||||
osst= [HW,SCSI] SCSI Tape Driver
|
||||
Format: <buffer_size>,<write_threshold>
|
||||
See also Documentation/scsi/st.txt.
|
||||
|
||||
@@ -0,0 +1,142 @@
|
||||
Kernel Memory Leak Detector
|
||||
===========================
|
||||
|
||||
Introduction
|
||||
------------
|
||||
|
||||
Kmemleak provides a way of detecting possible kernel memory leaks in a
|
||||
way similar to a tracing garbage collector
|
||||
(http://en.wikipedia.org/wiki/Garbage_collection_%28computer_science%29#Tracing_garbage_collectors),
|
||||
with the difference that the orphan objects are not freed but only
|
||||
reported via /sys/kernel/debug/kmemleak. A similar method is used by the
|
||||
Valgrind tool (memcheck --leak-check) to detect the memory leaks in
|
||||
user-space applications.
|
||||
|
||||
Usage
|
||||
-----
|
||||
|
||||
CONFIG_DEBUG_KMEMLEAK in "Kernel hacking" has to be enabled. A kernel
|
||||
thread scans the memory every 10 minutes (by default) and prints any new
|
||||
unreferenced objects found. To trigger an intermediate scan and display
|
||||
all the possible memory leaks:
|
||||
|
||||
# mount -t debugfs nodev /sys/kernel/debug/
|
||||
# cat /sys/kernel/debug/kmemleak
|
||||
|
||||
Note that the orphan objects are listed in the order they were allocated
|
||||
and one object at the beginning of the list may cause other subsequent
|
||||
objects to be reported as orphan.
|
||||
|
||||
Memory scanning parameters can be modified at run-time by writing to the
|
||||
/sys/kernel/debug/kmemleak file. The following parameters are supported:
|
||||
|
||||
off - disable kmemleak (irreversible)
|
||||
stack=on - enable the task stacks scanning
|
||||
stack=off - disable the tasks stacks scanning
|
||||
scan=on - start the automatic memory scanning thread
|
||||
scan=off - stop the automatic memory scanning thread
|
||||
scan=<secs> - set the automatic memory scanning period in seconds (0
|
||||
to disable it)
|
||||
|
||||
Kmemleak can also be disabled at boot-time by passing "kmemleak=off" on
|
||||
the kernel command line.
|
||||
|
||||
Basic Algorithm
|
||||
---------------
|
||||
|
||||
The memory allocations via kmalloc, vmalloc, kmem_cache_alloc and
|
||||
friends are traced and the pointers, together with additional
|
||||
information like size and stack trace, are stored in a prio search tree.
|
||||
The corresponding freeing function calls are tracked and the pointers
|
||||
removed from the kmemleak data structures.
|
||||
|
||||
An allocated block of memory is considered orphan if no pointer to its
|
||||
start address or to any location inside the block can be found by
|
||||
scanning the memory (including saved registers). This means that there
|
||||
might be no way for the kernel to pass the address of the allocated
|
||||
block to a freeing function and therefore the block is considered a
|
||||
memory leak.
|
||||
|
||||
The scanning algorithm steps:
|
||||
|
||||
1. mark all objects as white (remaining white objects will later be
|
||||
considered orphan)
|
||||
2. scan the memory starting with the data section and stacks, checking
|
||||
the values against the addresses stored in the prio search tree. If
|
||||
a pointer to a white object is found, the object is added to the
|
||||
gray list
|
||||
3. scan the gray objects for matching addresses (some white objects
|
||||
can become gray and added at the end of the gray list) until the
|
||||
gray set is finished
|
||||
4. the remaining white objects are considered orphan and reported via
|
||||
/sys/kernel/debug/kmemleak
|
||||
|
||||
Some allocated memory blocks have pointers stored in the kernel's
|
||||
internal data structures and they cannot be detected as orphans. To
|
||||
avoid this, kmemleak can also store the number of values pointing to an
|
||||
address inside the block address range that need to be found so that the
|
||||
block is not considered a leak. One example is __vmalloc().
|
||||
|
||||
Kmemleak API
|
||||
------------
|
||||
|
||||
See the include/linux/kmemleak.h header for the functions prototype.
|
||||
|
||||
kmemleak_init - initialize kmemleak
|
||||
kmemleak_alloc - notify of a memory block allocation
|
||||
kmemleak_free - notify of a memory block freeing
|
||||
kmemleak_not_leak - mark an object as not a leak
|
||||
kmemleak_ignore - do not scan or report an object as leak
|
||||
kmemleak_scan_area - add scan areas inside a memory block
|
||||
kmemleak_no_scan - do not scan a memory block
|
||||
kmemleak_erase - erase an old value in a pointer variable
|
||||
kmemleak_alloc_recursive - as kmemleak_alloc but checks the recursiveness
|
||||
kmemleak_free_recursive - as kmemleak_free but checks the recursiveness
|
||||
|
||||
Dealing with false positives/negatives
|
||||
--------------------------------------
|
||||
|
||||
The false negatives are real memory leaks (orphan objects) but not
|
||||
reported by kmemleak because values found during the memory scanning
|
||||
point to such objects. To reduce the number of false negatives, kmemleak
|
||||
provides the kmemleak_ignore, kmemleak_scan_area, kmemleak_no_scan and
|
||||
kmemleak_erase functions (see above). The task stacks also increase the
|
||||
amount of false negatives and their scanning is not enabled by default.
|
||||
|
||||
The false positives are objects wrongly reported as being memory leaks
|
||||
(orphan). For objects known not to be leaks, kmemleak provides the
|
||||
kmemleak_not_leak function. The kmemleak_ignore could also be used if
|
||||
the memory block is known not to contain other pointers and it will no
|
||||
longer be scanned.
|
||||
|
||||
Some of the reported leaks are only transient, especially on SMP
|
||||
systems, because of pointers temporarily stored in CPU registers or
|
||||
stacks. Kmemleak defines MSECS_MIN_AGE (defaulting to 1000) representing
|
||||
the minimum age of an object to be reported as a memory leak.
|
||||
|
||||
Limitations and Drawbacks
|
||||
-------------------------
|
||||
|
||||
The main drawback is the reduced performance of memory allocation and
|
||||
freeing. To avoid other penalties, the memory scanning is only performed
|
||||
when the /sys/kernel/debug/kmemleak file is read. Anyway, this tool is
|
||||
intended for debugging purposes where the performance might not be the
|
||||
most important requirement.
|
||||
|
||||
To keep the algorithm simple, kmemleak scans for values pointing to any
|
||||
address inside a block's address range. This may lead to an increased
|
||||
number of false negatives. However, it is likely that a real memory leak
|
||||
will eventually become visible.
|
||||
|
||||
Another source of false negatives is the data stored in non-pointer
|
||||
values. In a future version, kmemleak could only scan the pointer
|
||||
members in the allocated structures. This feature would solve many of
|
||||
the false negative cases described above.
|
||||
|
||||
The tool can report false positives. These are cases where an allocated
|
||||
block doesn't need to be freed (some cases in the init_call functions),
|
||||
the pointer is calculated by other methods than the usual container_of
|
||||
macro or the pointer is stored in a location not scanned by kmemleak.
|
||||
|
||||
Page allocations and ioremap are not tracked. Only the ARM and x86
|
||||
architectures are currently supported.
|
||||
@@ -31,6 +31,7 @@ Contents:
|
||||
|
||||
- Locking functions.
|
||||
- Interrupt disabling functions.
|
||||
- Sleep and wake-up functions.
|
||||
- Miscellaneous functions.
|
||||
|
||||
(*) Inter-CPU locking barrier effects.
|
||||
@@ -1217,6 +1218,132 @@ barriers are required in such a situation, they must be provided from some
|
||||
other means.
|
||||
|
||||
|
||||
SLEEP AND WAKE-UP FUNCTIONS
|
||||
---------------------------
|
||||
|
||||
Sleeping and waking on an event flagged in global data can be viewed as an
|
||||
interaction between two pieces of data: the task state of the task waiting for
|
||||
the event and the global data used to indicate the event. To make sure that
|
||||
these appear to happen in the right order, the primitives to begin the process
|
||||
of going to sleep, and the primitives to initiate a wake up imply certain
|
||||
barriers.
|
||||
|
||||
Firstly, the sleeper normally follows something like this sequence of events:
|
||||
|
||||
for (;;) {
|
||||
set_current_state(TASK_UNINTERRUPTIBLE);
|
||||
if (event_indicated)
|
||||
break;
|
||||
schedule();
|
||||
}
|
||||
|
||||
A general memory barrier is interpolated automatically by set_current_state()
|
||||
after it has altered the task state:
|
||||
|
||||
CPU 1
|
||||
===============================
|
||||
set_current_state();
|
||||
set_mb();
|
||||
STORE current->state
|
||||
<general barrier>
|
||||
LOAD event_indicated
|
||||
|
||||
set_current_state() may be wrapped by:
|
||||
|
||||
prepare_to_wait();
|
||||
prepare_to_wait_exclusive();
|
||||
|
||||
which therefore also imply a general memory barrier after setting the state.
|
||||
The whole sequence above is available in various canned forms, all of which
|
||||
interpolate the memory barrier in the right place:
|
||||
|
||||
wait_event();
|
||||
wait_event_interruptible();
|
||||
wait_event_interruptible_exclusive();
|
||||
wait_event_interruptible_timeout();
|
||||
wait_event_killable();
|
||||
wait_event_timeout();
|
||||
wait_on_bit();
|
||||
wait_on_bit_lock();
|
||||
|
||||
|
||||
Secondly, code that performs a wake up normally follows something like this:
|
||||
|
||||
event_indicated = 1;
|
||||
wake_up(&event_wait_queue);
|
||||
|
||||
or:
|
||||
|
||||
event_indicated = 1;
|
||||
wake_up_process(event_daemon);
|
||||
|
||||
A write memory barrier is implied by wake_up() and co. if and only if they wake
|
||||
something up. The barrier occurs before the task state is cleared, and so sits
|
||||
between the STORE to indicate the event and the STORE to set TASK_RUNNING:
|
||||
|
||||
CPU 1 CPU 2
|
||||
=============================== ===============================
|
||||
set_current_state(); STORE event_indicated
|
||||
set_mb(); wake_up();
|
||||
STORE current->state <write barrier>
|
||||
<general barrier> STORE current->state
|
||||
LOAD event_indicated
|
||||
|
||||
The available waker functions include:
|
||||
|
||||
complete();
|
||||
wake_up();
|
||||
wake_up_all();
|
||||
wake_up_bit();
|
||||
wake_up_interruptible();
|
||||
wake_up_interruptible_all();
|
||||
wake_up_interruptible_nr();
|
||||
wake_up_interruptible_poll();
|
||||
wake_up_interruptible_sync();
|
||||
wake_up_interruptible_sync_poll();
|
||||
wake_up_locked();
|
||||
wake_up_locked_poll();
|
||||
wake_up_nr();
|
||||
wake_up_poll();
|
||||
wake_up_process();
|
||||
|
||||
|
||||
[!] Note that the memory barriers implied by the sleeper and the waker do _not_
|
||||
order multiple stores before the wake-up with respect to loads of those stored
|
||||
values after the sleeper has called set_current_state(). For instance, if the
|
||||
sleeper does:
|
||||
|
||||
set_current_state(TASK_INTERRUPTIBLE);
|
||||
if (event_indicated)
|
||||
break;
|
||||
__set_current_state(TASK_RUNNING);
|
||||
do_something(my_data);
|
||||
|
||||
and the waker does:
|
||||
|
||||
my_data = value;
|
||||
event_indicated = 1;
|
||||
wake_up(&event_wait_queue);
|
||||
|
||||
there's no guarantee that the change to event_indicated will be perceived by
|
||||
the sleeper as coming after the change to my_data. In such a circumstance, the
|
||||
code on both sides must interpolate its own memory barriers between the
|
||||
separate data accesses. Thus the above sleeper ought to do:
|
||||
|
||||
set_current_state(TASK_INTERRUPTIBLE);
|
||||
if (event_indicated) {
|
||||
smp_rmb();
|
||||
do_something(my_data);
|
||||
}
|
||||
|
||||
and the waker should do:
|
||||
|
||||
my_data = value;
|
||||
smp_wmb();
|
||||
event_indicated = 1;
|
||||
wake_up(&event_wait_queue);
|
||||
|
||||
|
||||
MISCELLANEOUS FUNCTIONS
|
||||
-----------------------
|
||||
|
||||
@@ -1366,7 +1493,7 @@ WHERE ARE MEMORY BARRIERS NEEDED?
|
||||
|
||||
Under normal operation, memory operation reordering is generally not going to
|
||||
be a problem as a single-threaded linear piece of code will still appear to
|
||||
work correctly, even if it's in an SMP kernel. There are, however, three
|
||||
work correctly, even if it's in an SMP kernel. There are, however, four
|
||||
circumstances in which reordering definitely _could_ be a problem:
|
||||
|
||||
(*) Interprocessor interaction.
|
||||
|
||||
@@ -1266,13 +1266,22 @@ sctp_rmem - vector of 3 INTEGERs: min, default, max
|
||||
sctp_wmem - vector of 3 INTEGERs: min, default, max
|
||||
See tcp_wmem for a description.
|
||||
|
||||
UNDOCUMENTED:
|
||||
|
||||
/proc/sys/net/core/*
|
||||
dev_weight FIXME
|
||||
dev_weight - INTEGER
|
||||
The maximum number of packets that kernel can handle on a NAPI
|
||||
interrupt, it's a Per-CPU variable.
|
||||
|
||||
Default: 64
|
||||
|
||||
/proc/sys/net/unix/*
|
||||
max_dgram_qlen FIXME
|
||||
max_dgram_qlen - INTEGER
|
||||
The maximum length of dgram socket receive queue
|
||||
|
||||
Default: 10
|
||||
|
||||
|
||||
UNDOCUMENTED:
|
||||
|
||||
/proc/sys/net/irda/*
|
||||
fast_poll_increase FIXME
|
||||
|
||||
@@ -4,6 +4,7 @@
|
||||
CONTENTS
|
||||
========
|
||||
|
||||
0. WARNING
|
||||
1. Overview
|
||||
1.1 The problem
|
||||
1.2 The solution
|
||||
@@ -14,6 +15,23 @@ CONTENTS
|
||||
3. Future plans
|
||||
|
||||
|
||||
0. WARNING
|
||||
==========
|
||||
|
||||
Fiddling with these settings can result in an unstable system, the knobs are
|
||||
root only and assumes root knows what he is doing.
|
||||
|
||||
Most notable:
|
||||
|
||||
* very small values in sched_rt_period_us can result in an unstable
|
||||
system when the period is smaller than either the available hrtimer
|
||||
resolution, or the time it takes to handle the budget refresh itself.
|
||||
|
||||
* very small values in sched_rt_runtime_us can result in an unstable
|
||||
system when the runtime is so small the system has difficulty making
|
||||
forward progress (NOTE: the migration thread and kstopmachine both
|
||||
are real-time processes).
|
||||
|
||||
1. Overview
|
||||
===========
|
||||
|
||||
@@ -169,7 +187,7 @@ get their allocated time.
|
||||
|
||||
Implementing SCHED_EDF might take a while to complete. Priority Inheritance is
|
||||
the biggest challenge as the current linux PI infrastructure is geared towards
|
||||
the limited static priority levels 0-139. With deadline scheduling you need to
|
||||
the limited static priority levels 0-99. With deadline scheduling you need to
|
||||
do deadline inheritance (since priority is inversely proportional to the
|
||||
deadline delta (deadline - now).
|
||||
|
||||
|
||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user