Merge Linux 2.6.23
@@ -134,8 +134,6 @@ dvb/
|
||||
- info on Linux Digital Video Broadcast (DVB) subsystem.
|
||||
early-userspace/
|
||||
- info about initramfs, klibc, and userspace early during boot.
|
||||
ecryptfs.txt
|
||||
- docs on eCryptfs: stacked cryptographic filesystem for Linux.
|
||||
eisa.txt
|
||||
- info on EISA bus support.
|
||||
exception.txt
|
||||
|
||||
@@ -316,7 +316,8 @@ CPU B: spin_unlock_irqrestore(&dev_lock, flags)
|
||||
|
||||
<chapter id="pubfunctions">
|
||||
<title>Public Functions Provided</title>
|
||||
!Einclude/asm-i386/io.h
|
||||
!Iinclude/asm-i386/io.h
|
||||
!Elib/iomap.c
|
||||
</chapter>
|
||||
|
||||
</book>
|
||||
|
||||
@@ -208,7 +208,7 @@ tools. One such tool that is particularly recommended is the Linux
|
||||
Cross-Reference project, which is able to present source code in a
|
||||
self-referential, indexed webpage format. An excellent up-to-date
|
||||
repository of the kernel code may be found at:
|
||||
http://sosdg.org/~coywolf/lxr/
|
||||
http://users.sosdg.org/~qiyong/lxr/
|
||||
|
||||
|
||||
The development process
|
||||
@@ -384,7 +384,7 @@ One of the best ways to put into practice your hacking skills is by fixing
|
||||
bugs reported by other people. Not only will you help to make the kernel
|
||||
more stable, you'll learn to fix real world problems and you will improve
|
||||
your skills, and other developers will be aware of your presence. Fixing
|
||||
bugs is one of the best ways to earn merit amongst the developers, because
|
||||
bugs is one of the best ways to get merits among other developers, because
|
||||
not many people like wasting time fixing other people's bugs.
|
||||
|
||||
To work on already reported bugs, go to http://bugzilla.kernel.org.
|
||||
|
||||
@@ -166,7 +166,7 @@ To solve this problem, you really only have two options:
|
||||
The option of being unfailingly polite really doesn't exist. Nobody will
|
||||
trust somebody who is so clearly hiding his true character.
|
||||
|
||||
(*) Paul Simon sang "Fifty Ways to Lose Your Lover", because quite
|
||||
(*) Paul Simon sang "Fifty Ways to Leave Your Lover", because quite
|
||||
frankly, "A Million Ways to Tell a Developer He Is a D*ckhead" doesn't
|
||||
scan nearly as well. But I'm sure he thought about it.
|
||||
|
||||
|
||||
@@ -126,7 +126,7 @@ the reviewers time and will get your patch rejected, probably
|
||||
without even being read.
|
||||
|
||||
At a minimum you should check your patches with the patch style
|
||||
checker prior to submission (scripts/patchcheck.pl). You should
|
||||
checker prior to submission (scripts/checkpatch.pl). You should
|
||||
be able to justify all violations that remain in your patch.
|
||||
|
||||
|
||||
@@ -560,7 +560,7 @@ NO!!!! No more huge patch bombs to linux-kernel@vger.kernel.org people!
|
||||
<http://marc.theaimsgroup.com/?l=linux-kernel&m=112112749912944&w=2>
|
||||
|
||||
Kernel Documentation/CodingStyle:
|
||||
<http://sosdg.org/~coywolf/lxr/source/Documentation/CodingStyle>
|
||||
<http://users.sosdg.org/~qiyong/lxr/source/Documentation/CodingStyle>
|
||||
|
||||
Linus Torvalds's mail on the canonical patch format:
|
||||
<http://lkml.org/lkml/2005/4/7/183>
|
||||
|
||||
@@ -0,0 +1,219 @@
|
||||
		 Asynchronous Transfers/Transforms API

1 INTRODUCTION

2 GENEALOGY

3 USAGE
3.1 General format of the API
3.2 Supported operations
3.3 Descriptor management
3.4 When does the operation execute?
3.5 When does the operation complete?
3.6 Constraints
3.7 Example

4 DRIVER DEVELOPER NOTES
4.1 Conformance points
4.2 "My application needs finer control of hardware channels"

5 SOURCE
|
||||
|
||||
---
|
||||
|
||||
1 INTRODUCTION
|
||||
|
||||
The async_tx API provides methods for describing a chain of asynchronous
|
||||
bulk memory transfers/transforms with support for inter-transactional
|
||||
dependencies. It is implemented as a dmaengine client that smooths over
|
||||
the details of different hardware offload engine implementations. Code
|
||||
that is written to the API can optimize for asynchronous operation and
|
||||
the API will fit the chain of operations to the available offload
|
||||
resources.
|
||||
|
||||
2 GENEALOGY
|
||||
|
||||
The API was initially designed to offload the memory copy and
|
||||
xor-parity-calculations of the md-raid5 driver using the offload engines
|
||||
present in the Intel(R) Xscale series of I/O processors. It also built
|
||||
on the 'dmaengine' layer developed for offloading memory copies in the
|
||||
network stack using Intel(R) I/OAT engines. The following design
|
||||
features surfaced as a result:
|
||||
1/ implicit synchronous path: users of the API do not need to know if
|
||||
the platform they are running on has offload capabilities. The
|
||||
operation will be offloaded when an engine is available and carried out
|
||||
in software otherwise.
|
||||
2/ cross channel dependency chains: the API allows a chain of dependent
|
||||
operations to be submitted, like xor->copy->xor in the raid5 case. The
|
||||
API automatically handles cases where the transition from one operation
|
||||
to another implies a hardware channel switch.
|
||||
3/ dmaengine extensions to support multiple clients and operation types
|
||||
beyond 'memcpy'
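
The "implicit synchronous path" in point 1/ can be pictured with a small
userspace sketch. This is not kernel code: toy_copy_engine_fn, toy_async_copy
and toy_fake_engine are hypothetical names made up for illustration, and the
real async_<operation> routines take different parameters. The sketch only
shows the idea that the caller uses one entry point and the copy is done in
software when no engine is available.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical offload hook (not a kernel type): returns 0 on success. */
typedef int (*toy_copy_engine_fn)(void *dst, const void *src, size_t len);

/*
 * Copy len bytes, using the engine when one is supplied and it accepts
 * the request, and falling back to a plain memcpy() otherwise.
 * Returns 1 if the copy was offloaded, 0 if it was done in software.
 */
static int toy_async_copy(toy_copy_engine_fn engine,
			  void *dst, const void *src, size_t len)
{
	assert(dst && src);

	if (engine && engine(dst, src, len) == 0)
		return 1;

	memcpy(dst, src, len);	/* synchronous software path */
	return 0;
}

/* Stand-in for a hardware engine, for demonstration only. */
static int toy_fake_engine(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
	return 0;
}
```

A caller would invoke toy_async_copy(NULL, ...) on a platform without an
engine and toy_async_copy(toy_fake_engine, ...) on one with an engine; the
destination buffer ends up identical in both cases.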
|
||||
|
||||
3 USAGE
|
||||
|
||||
3.1 General format of the API:
|
||||
struct dma_async_tx_descriptor *
async_<operation>(<op specific parameters>,
		  enum async_tx_flags flags,
		  struct dma_async_tx_descriptor *dependency,
		  dma_async_tx_callback callback_routine,
		  void *callback_parameter);
|
||||
|
||||
3.2 Supported operations:
|
||||
memcpy       - memory copy between a source and a destination buffer
memset       - fill a destination buffer with a byte value
xor          - xor a series of source buffers and write the result to a
	       destination buffer
xor_zero_sum - xor a series of source buffers and set a flag if the
	       result is zero.  The implementation attempts to prevent
	       writes to memory
|
||||
|
||||
3.3 Descriptor management:
|
||||
The return value is non-NULL and points to a 'descriptor' when the operation
|
||||
has been queued to execute asynchronously. Descriptors are recycled
|
||||
resources, under control of the offload engine driver, to be reused as
|
||||
operations complete. When an application needs to submit a chain of
|
||||
operations it must guarantee that the descriptor is not automatically recycled
|
||||
before the dependency is submitted. This requires that all descriptors be
|
||||
acknowledged by the application before the offload engine driver is allowed to
|
||||
recycle (or free) the descriptor. A descriptor can be acked by one of the
|
||||
following methods:
|
||||
1/ setting the ASYNC_TX_ACK flag if no child operations are to be submitted
|
||||
2/ setting the ASYNC_TX_DEP_ACK flag to acknowledge the parent
|
||||
descriptor of a new operation.
|
||||
3/ calling async_tx_ack() on the descriptor.
|
||||
|
||||
3.4 When does the operation execute?
|
||||
Operations do not immediately issue after return from the
|
||||
async_<operation> call. Offload engine drivers batch operations to
|
||||
improve performance by reducing the number of mmio cycles needed to
|
||||
manage the channel. Once a driver-specific threshold is met the driver
|
||||
automatically issues pending operations. An application can force this
|
||||
event by calling async_tx_issue_pending_all(). This operates on all
|
||||
channels since the application has no knowledge of channel to operation
|
||||
mapping.
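
The batching behaviour described in 3.4 can be modelled in userspace as
follows. This is a toy sketch under stated assumptions, not the dmaengine
implementation: struct toy_chan, toy_submit(), toy_issue_pending() and
toy_issue_pending_all() are hypothetical names, and the threshold value is
whatever the (imaginary) driver chooses.

```c
#include <assert.h>

/* Hypothetical model of one offload channel. */
struct toy_chan {
	unsigned int pending;	/* queued but not yet issued */
	unsigned int issued;	/* handed over to the imaginary hardware */
	unsigned int threshold;	/* driver-specific batch size */
};

/* Push everything that is queued on one channel to the hardware. */
static void toy_issue_pending(struct toy_chan *chan)
{
	assert(chan);
	chan->issued += chan->pending;
	chan->pending = 0;
}

/* Queue one operation; issue automatically once the threshold is met. */
static void toy_submit(struct toy_chan *chan)
{
	assert(chan);
	chan->pending++;
	if (chan->pending >= chan->threshold)
		toy_issue_pending(chan);
}

/*
 * Force issue on every channel, since the caller does not know which
 * channel an operation landed on.
 */
static void toy_issue_pending_all(struct toy_chan *chans, unsigned int nr)
{
	unsigned int i;

	for (i = 0; i < nr; i++)
		toy_issue_pending(&chans[i]);
}
```

With a threshold of 4, two calls to toy_submit() leave both operations
pending until toy_issue_pending_all() is called; with a threshold of 2 the
second submission issues the batch by itself.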
|
||||
|
||||
3.5 When does the operation complete?
|
||||
There are two methods for an application to learn about the completion
|
||||
of an operation.
|
||||
1/ Call dma_wait_for_async_tx(). This call causes the CPU to spin while
|
||||
it polls for the completion of the operation. It handles dependency
|
||||
chains and issuing pending operations.
|
||||
2/ Specify a completion callback. The callback routine runs in tasklet
|
||||
context if the offload engine driver supports interrupts, or it is
|
||||
called in application context if the operation is carried out
|
||||
synchronously in software. The callback can be set in the call to
|
||||
async_<operation>, or when the application needs to submit a chain of
|
||||
unknown length it can use the async_trigger_callback() routine to set a
|
||||
completion interrupt/callback at the end of the chain.
|
||||
|
||||
3.6 Constraints:
|
||||
1/ Calls to async_<operation> are not permitted in IRQ context. Other
|
||||
contexts are permitted provided constraint #2 is not violated.
|
||||
2/ Completion callback routines cannot submit new operations. This
|
||||
results in recursion in the synchronous case and spin_locks being
|
||||
acquired twice in the asynchronous case.
|
||||
|
||||
3.7 Example:
|
||||
Perform a xor->copy->xor operation where each operation depends on the
|
||||
result from the previous operation:
|
||||
|
||||
void complete_xor_copy_xor(void *param)
{
	printk("complete\n");
}

int run_xor_copy_xor(struct page **xor_srcs,
		     int xor_src_cnt,
		     struct page *xor_dest,
		     size_t xor_len,
		     struct page *copy_src,
		     struct page *copy_dest,
		     size_t copy_len)
{
	struct dma_async_tx_descriptor *tx;

	/* first xor: no dependency, no completion callback */
	tx = async_xor(xor_dest, xor_srcs, 0, xor_src_cnt, xor_len,
		       ASYNC_TX_XOR_DROP_DST, NULL, NULL, NULL);
	/* copy depends on the xor; ack the parent descriptor */
	tx = async_memcpy(copy_dest, copy_src, 0, 0, copy_len,
			  ASYNC_TX_DEP_ACK, tx, NULL, NULL);
	/* last xor depends on the copy; no children follow, so ack it too */
	tx = async_xor(xor_dest, xor_srcs, 0, xor_src_cnt, xor_len,
		       ASYNC_TX_XOR_DROP_DST | ASYNC_TX_DEP_ACK | ASYNC_TX_ACK,
		       tx, complete_xor_copy_xor, NULL);

	async_tx_issue_pending_all();

	/* declared int, so return a value */
	return 0;
}
|
||||
|
||||
See include/linux/async_tx.h for more information on the flags. See the
|
||||
ops_run_* and ops_complete_* routines in drivers/md/raid5.c for more
|
||||
implementation examples.
|
||||
|
||||
4 DRIVER DEVELOPER NOTES
|
||||
4.1 Conformance points:
|
||||
There are a few conformance points required in dmaengine drivers to
|
||||
accommodate assumptions made by applications using the async_tx API:
|
||||
1/ Completion callbacks are expected to happen in tasklet context
|
||||
2/ dma_async_tx_descriptor fields are never manipulated in IRQ context
|
||||
3/ Use async_tx_run_dependencies() in the descriptor clean up path to
|
||||
handle submission of dependent operations
|
||||
|
||||
4.2 "My application needs finer control of hardware channels"
|
||||
This requirement seems to arise from cases where a DMA engine driver is
|
||||
trying to support device-to-memory DMA. The dmaengine and async_tx
|
||||
implementations were designed for offloading memory-to-memory
|
||||
operations; however, there are some capabilities of the dmaengine layer
|
||||
that can be used for platform-specific channel management.
|
||||
Platform-specific constraints can be handled by registering the
|
||||
application as a 'dma_client' and implementing a 'dma_event_callback' to
|
||||
apply a filter to the available channels in the system. Before showing
|
||||
how to implement a custom dma_event callback some background on
|
||||
dmaengine's client support is required.
|
||||
|
||||
The following routines in dmaengine support multiple clients requesting
|
||||
use of a channel:
|
||||
- dma_async_client_register(struct dma_client *client)
|
||||
- dma_async_client_chan_request(struct dma_client *client)
|
||||
|
||||
dma_async_client_register takes a pointer to an initialized dma_client
|
||||
structure. It expects that the 'event_callback' and 'cap_mask' fields
|
||||
are already initialized.
|
||||
|
||||
dma_async_client_chan_request triggers dmaengine to notify the client of
|
||||
all channels that satisfy the capability mask. It is up to the client's
|
||||
event_callback routine to track how many channels the client needs and
|
||||
how many it is currently using. The dma_event_callback routine returns a
|
||||
dma_state_client code to let dmaengine know the status of the
|
||||
allocation.
|
||||
|
||||
Below is an example of how to extend this functionality for
|
||||
platform-specific filtering of the available channels beyond the
|
||||
standard capability mask:
|
||||
|
||||
static enum dma_state_client
my_dma_client_callback(struct dma_client *client,
			struct dma_chan *chan, enum dma_state state)
{
	struct dma_device *dma_dev;
	struct my_platform_specific_dma *plat_dma_dev;

	dma_dev = chan->device;
	plat_dma_dev = container_of(dma_dev,
				    struct my_platform_specific_dma,
				    dma_dev);

	if (!plat_dma_dev->platform_specific_capability)
		return DMA_DUP;

	. . .
}
|
||||
|
||||
5 SOURCE
|
||||
include/linux/dmaengine.h: core header file for DMA drivers and clients
|
||||
drivers/dma/dmaengine.c: offload engine channel management routines
|
||||
drivers/dma/: location for offload engine drivers
|
||||
include/linux/async_tx.h: core header file for the async_tx api
|
||||
crypto/async_tx/async_tx.c: async_tx interface to dmaengine and common code
|
||||
crypto/async_tx/async_memcpy.c: copy offload
|
||||
crypto/async_tx/async_memset.c: memory fill offload
|
||||
crypto/async_tx/async_xor.c: xor and xor zero sum offload
|
||||
@@ -94,6 +94,8 @@ Your cooperation is appreciated.
|
||||
9 = /dev/urandom Faster, less secure random number gen.
|
||||
10 = /dev/aio Asynchronous I/O notification interface
|
||||
11 = /dev/kmsg Writes to this come out as printk's
|
||||
12 = /dev/oldmem Used by crashdump kernels to access
|
||||
the memory of the kernel that crashed.
|
||||
|
||||
1 block RAM disk
|
||||
0 = /dev/ram0 First RAM disk
|
||||
|
||||
@@ -197,6 +197,14 @@ Who: Len Brown <len.brown@intel.com>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: /proc/acpi/event
|
||||
When: February 2008
|
||||
Why: /proc/acpi/event has been replaced by events via the input layer
|
||||
and netlink since 2.6.23.
|
||||
Who: Len Brown <len.brown@intel.com>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: Compaq touchscreen device emulation
|
||||
When: Oct 2007
|
||||
Files: drivers/input/tsdev.c
|
||||
@@ -290,3 +298,11 @@ Why: All mthca hardware also supports MSI-X, which provides
|
||||
Who: Roland Dreier <rolandd@cisco.com>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: sk98lin network driver
|
||||
When: February 2008
Why: The in-kernel-tree version of the driver is unmaintained. The sk98lin
driver is replaced by the skge driver.
|
||||
Who: Stephen Hemminger <shemminger@linux-foundation.org>
|
||||
|
||||
---------------------------
|
||||
|
||||
@@ -32,6 +32,8 @@ directory-locking
|
||||
- info about the locking scheme used for directory operations.
|
||||
dlmfs.txt
|
||||
- info on the userspace interface to the OCFS2 DLM.
|
||||
ecryptfs.txt
|
||||
- docs on eCryptfs: stacked cryptographic filesystem for Linux.
|
||||
ext2.txt
|
||||
- info, mount options and specifications for the Ext2 filesystem.
|
||||
ext3.txt
|
||||
|
||||
@@ -6,12 +6,26 @@ ABOUT
|
||||
|
||||
v9fs is a Unix implementation of the Plan 9 9p remote filesystem protocol.
|
||||
|
||||
This software was originally developed by Ron Minnich <rminnich@lanl.gov>
|
||||
and Maya Gokhale <maya@lanl.gov>. Additional development by Greg Watson
|
||||
This software was originally developed by Ron Minnich <rminnich@sandia.gov>
|
||||
and Maya Gokhale. Additional development by Greg Watson
|
||||
<gwatson@lanl.gov> and most recently Eric Van Hensbergen
|
||||
<ericvh@gmail.com>, Latchesar Ionkov <lucho@ionkov.net> and Russ Cox
|
||||
<rsc@swtch.com>.
|
||||
|
||||
The best detailed explanation of the Linux implementation and applications of
|
||||
the 9p client is available in the form of a USENIX paper:
|
||||
http://www.usenix.org/events/usenix05/tech/freenix/hensbergen.html
|
||||
|
||||
Other applications are described in the following papers:
|
||||
* XCPU & Clustering
|
||||
http://www.xcpu.org/xcpu-talk.pdf
|
||||
* KVMFS: control file system for KVM
|
||||
http://www.xcpu.org/kvmfs.pdf
|
||||
* CellFS: A New Programming Model for the Cell BE
|
||||
http://www.xcpu.org/cellfs-talk.pdf
|
||||
* PROSE I/O: Using 9p to enable Application Partitions
|
||||
http://plan9.escet.urjc.es/iwp9/cready/PROSE_iwp9_2006.pdf
|
||||
|
||||
USAGE
|
||||
=====
|
||||
|
||||
@@ -90,9 +104,9 @@ subset of the namespace by extending the path: '#U*'/tmp would just export
|
||||
and export.
|
||||
|
||||
A Linux version of the 9p server is now maintained under the npfs project
|
||||
on sourceforge (http://sourceforge.net/projects/npfs). There is also a
|
||||
more stable single-threaded version of the server (named spfs) available from
|
||||
the same CVS repository.
|
||||
on sourceforge (http://sourceforge.net/projects/npfs). The currently
|
||||
maintained version is the single-threaded version of the server (named spfs)
|
||||
available from the same CVS repository.
|
||||
|
||||
There are user and developer mailing lists available through the v9fs project
|
||||
on sourceforge (http://sourceforge.net/projects/v9fs).
|
||||
|
||||
@@ -28,11 +28,7 @@ Manish Singh <manish.singh@oracle.com>
|
||||
Caveats
|
||||
=======
|
||||
Features which OCFS2 does not support yet:
|
||||
- sparse files
|
||||
- extended attributes
|
||||
- shared writable mmap
|
||||
- loopback is supported, but data written will not
|
||||
be cluster coherent.
|
||||
- quotas
|
||||
- cluster aware flock
|
||||
- cluster aware lockf
|
||||
@@ -57,3 +53,12 @@ nointr Do not allow signals to interrupt cluster
|
||||
atime_quantum=60(*)     OCFS2 will not update atime unless this number
                        of seconds has passed since the last update.
                        Set to zero to always update atime.
data=ordered    (*)     All data are forced directly out to the main file
                        system prior to its metadata being committed to the
                        journal.
data=writeback          Data ordering is not preserved, data may be written
                        into the main file system after its metadata has been
                        committed to the journal.
preferred_slot=0(*)     During mount, try to use this filesystem slot first. If
                        it is in use by another node, the first empty one found
                        will be chosen. Invalid values will be ignored.
|
||||
|
||||
@@ -6,7 +6,7 @@ Supported adapters:
|
||||
Datasheet: Publicly available at the Intel website
|
||||
* ServerWorks OSB4, CSB5, CSB6 and HT-1000 southbridges
|
||||
Datasheet: Only available via NDA from ServerWorks
|
||||
* ATI IXP200, IXP300, IXP400, SB600 and SB700 southbridges
|
||||
* ATI IXP200, IXP300, IXP400, SB600, SB700 and SB800 southbridges
|
||||
Datasheet: Not publicly available
|
||||
* Standard Microsystems (SMSC) SLC90E66 (Victory66) southbridge
|
||||
Datasheet: Publicly available at the SMSC website http://www.smsc.com
|
||||
|
||||
@@ -468,9 +468,6 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
Format:
|
||||
<first_slot>,<last_slot>,<port>,<enum_bit>[,<debug>]
|
||||
|
||||
cpia_pp= [HW,PPT]
|
||||
Format: { parport<nr> | auto | none }
|
||||
|
||||
crashkernel=nn[KMG]@ss[KMG]
|
||||
[KNL] Reserve a chunk of physical memory to
|
||||
hold a kernel to switch to with kexec on panic.
|
||||
@@ -952,14 +949,10 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
Format: <1-256>
|
||||
|
||||
maxcpus= [SMP] Maximum number of processors that an SMP kernel
|
||||
should make use of.
|
||||
Using "nosmp" or "maxcpus=0" will disable SMP
|
||||
entirely (the MPS table probe still happens, though).
|
||||
A command-line option of "maxcpus=<NUM>", where <NUM>
|
||||
is an integer greater than 0, limits the maximum number
|
||||
of CPUs activated in SMP mode to <NUM>.
|
||||
Using "maxcpus=1" on an SMP kernel is the trivial
|
||||
case of an SMP kernel with only one CPU.
|
||||
should make use of. maxcpus=n : n >= 0 limits the
|
||||
kernel to using 'n' processors. n=0 is a special case,
|
||||
it is equivalent to "nosmp", which also disables
|
||||
the IO APIC.
|
||||
|
||||
max_addr=[KMG] [KNL,BOOT,ia64] All physical memory greater than or
|
||||
equal to this physical address is ignored.
|
||||
@@ -1184,7 +1177,8 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
|
||||
nosep [BUGS=X86-32] Disables x86 SYSENTER/SYSEXIT support.
|
||||
|
||||
nosmp [SMP] Tells an SMP kernel to act as a UP kernel.
|
||||
nosmp [SMP] Tells an SMP kernel to act as a UP kernel,
|
||||
and disable the IO APIC. Legacy for "maxcpus=0".
|
||||
|
||||
nosoftlockup [KNL] Disable the soft-lockup detector.
|
||||
|
||||
@@ -1826,6 +1820,10 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
-1: disable all active trip points in all thermal zones
|
||||
<degrees C>: override all lowest active trip points
|
||||
|
||||
thermal.crt= [HW,ACPI]
|
||||
-1: disable all critical trip points in all thermal zones
|
||||
<degrees C>: lower all critical trip points
|
||||
|
||||
thermal.nocrt= [HW,ACPI]
|
||||
Set to disable actions on ACPI thermal zone
|
||||
critical and hot trip points.
|
||||
|
||||
@@ -882,7 +882,7 @@ static u32 handle_block_output(int fd, const struct iovec *iov,
|
||||
* of the block file (possibly extending it). */
|
||||
if (off + len > device_len) {
|
||||
/* Trim it back to the correct length */
|
||||
ftruncate(dev->fd, device_len);
|
||||
ftruncate64(dev->fd, device_len);
|
||||
/* Die, bad Guest, die. */
|
||||
errx(1, "Write past end %llu+%u", off, len);
|
||||
}
|
||||
|
||||
@@ -0,0 +1,120 @@
|
||||
|
||||
LOCK STATISTICS
|
||||
|
||||
- WHAT
|
||||
|
||||
As the name suggests, it provides statistics on locks.
|
||||
|
||||
- WHY
|
||||
|
||||
Because things like lock contention can severely impact performance.
|
||||
|
||||
- HOW
|
||||
|
||||
Lockdep already has hooks in the lock functions and maps lock instances to
|
||||
lock classes. We build on that. The graph below shows the relation between
|
||||
the lock functions and the various hooks therein.
|
||||
|
||||
        __acquire
            |
           lock _____
            |        \
            |    __contended
            |         |
            |       <wait>
            | _______/
            |/
            |
       __acquired
            |
            .
          <hold>
            .
            |
       __release
            |
         unlock
|
||||
|
||||
lock, unlock - the regular lock functions
|
||||
__* - the hooks
|
||||
<> - states
|
||||
|
||||
With these hooks we provide the following statistics:
|
||||
|
||||
 con-bounces       - number of lock contentions that involved x-cpu data
 contentions       - number of lock acquisitions that had to wait
 wait time min     - shortest (non-0) time we ever had to wait for a lock
           max     - longest time we ever had to wait for a lock
           total   - total time we spent waiting on this lock
 acq-bounces       - number of lock acquisitions that involved x-cpu data
 acquisitions      - number of times we took the lock
 hold time min     - shortest (non-0) time we ever held the lock
           max     - longest time we ever held the lock
           total   - total time this lock was held
|
||||
|
||||
From these numbers various other statistics can be derived, such as:
|
||||
|
||||
hold time average = hold time total / acquisitions
|
||||
|
||||
These numbers are gathered per lock class, per read/write state (when
|
||||
applicable).
|
||||
|
||||
It also tracks 4 contention points per class. A contention point is a call site
|
||||
that had to wait on lock acquisition.
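
The derived statistic given above (hold time average = hold time total /
acquisitions) can be written as a small helper. This is a userspace sketch,
not part of the lockstat code; the function name hold_time_average() and the
choice to return 0.0 for a lock that was never taken are assumptions made
for illustration.

```c
#include <assert.h>

/*
 * hold time average = hold time total / acquisitions
 *
 * Returns 0.0 when the lock was never acquired, to avoid dividing by
 * zero (an assumption of this sketch, not documented behaviour).
 */
static double hold_time_average(double holdtime_total,
				unsigned long acquisitions)
{
	assert(holdtime_total >= 0.0);

	if (acquisitions == 0)
		return 0.0;

	return holdtime_total / (double)acquisitions;
}
```

For example, a class with a holdtime-total of 77387.24 over 243371
acquisitions gives an average hold time of roughly 0.318 in the same time
unit as the input.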
|
||||
|
||||
- USAGE
|
||||
|
||||
Look at the current lock statistics:
|
||||
|
||||
( line numbers not part of actual output, done for clarity in the explanation
|
||||
below )
|
||||
|
||||
# less /proc/lock_stat
|
||||
|
||||
01 lock_stat version 0.2
02 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
03                   class name    con-bounces    contentions   waittime-min   waittime-max waittime-total    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total
04 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
05
06  &inode->i_data.tree_lock-W:             15          21657           0.18     1093295.30 11547131054.85             58          10415           0.16          87.51        6387.60
07  &inode->i_data.tree_lock-R:              0              0           0.00           0.00           0.00          23302         231198           0.25           8.45       98023.38
08  --------------------------
09    &inode->i_data.tree_lock               0   [<ffffffff8027c08f>] add_to_page_cache+0x5f/0x190
10
11 ..................................................................................................................................................................................
12
13                 dcache_lock:           1037           1161           0.38          45.32         774.51           6611         243371           0.15         306.48       77387.24
14                 -----------
15                 dcache_lock             180   [<ffffffff802c0d7e>] sys_getcwd+0x11e/0x230
16                 dcache_lock             165   [<ffffffff802c002a>] d_alloc+0x15a/0x210
17                 dcache_lock              33   [<ffffffff8035818d>] _atomic_dec_and_lock+0x4d/0x70
18                 dcache_lock               1   [<ffffffff802beef8>] shrink_dcache_parent+0x18/0x130
|
||||
|
||||
This excerpt shows the first two lock class statistics. Line 01 shows the
|
||||
output version - each time the format changes this will be updated. Lines 02-04
|
||||
show the header with column descriptions. Lines 05-10 and 13-18 show the actual
|
||||
statistics. These statistics come in two parts; the actual stats separated by a
|
||||
short separator (line 08, 14) from the contention points.
|
||||
|
||||
The first lock (05-10) is a read/write lock, and shows two lines above the
|
||||
short separator. The contention points don't match the column descriptors;
they have two columns of their own: contentions and [<IP>] symbol.
|
||||
|
||||
|
||||
View the top contending locks:
|
||||
|
||||
# grep : /proc/lock_stat | head
|
||||
 &inode->i_data.tree_lock-W:             15          21657           0.18     1093295.30 11547131054.85             58          10415           0.16          87.51        6387.60
 &inode->i_data.tree_lock-R:              0              0           0.00           0.00           0.00          23302         231198           0.25           8.45       98023.38
                dcache_lock:           1037           1161           0.38          45.32         774.51           6611         243371           0.15         306.48       77387.24
            &inode->i_mutex:            161            286 18446744073709       62882.54     1244614.55           3653          20598 18446744073709       62318.60     1693822.74
            &zone->lru_lock:             94             94           0.53           7.33          92.10           4366          32690           0.29          59.81       16350.06
 &inode->i_data.i_mmap_lock:             79             79           0.40           3.77          53.03          11779          87755           0.28         116.93       29898.44
           &q->__queue_lock:             48             50           0.52          31.62          86.31            774          13131           0.17         113.08       12277.52
           &rq->rq_lock_key:             43             47           0.74          68.50         170.63           3706          33929           0.22         107.99       17460.62
         &rq->rq_lock_key#2:             39             46           0.75           6.68          49.03           2979          32292           0.17         125.17       17137.63
            tasklist_lock-W:             15             15           1.45          10.87          32.70           1201           7390           0.58          62.55       13648.47
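
The class lines selected by the grep above have a fixed shape: a class name
ending in ':' followed by the ten numeric columns from the header. A
userspace tool could pick them apart roughly as sketched below. The struct
lock_stat_row type and the parse_lock_stat_row() function are hypothetical
names, the 64-byte name buffer is an arbitrary choice, and the sketch assumes
the version 0.2 column order shown in this document.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One class line of /proc/lock_stat (hypothetical userspace type). */
struct lock_stat_row {
	char name[64];
	unsigned long con_bounces, contentions;
	double wait_min, wait_max, wait_total;
	unsigned long acq_bounces, acquisitions;
	double hold_min, hold_max, hold_total;
};

/*
 * Parse "<class name>: <10 numbers>".  Returns 0 on success and -1 for
 * anything else, e.g. separators or contention point lines, which carry
 * no ':'.
 */
static int parse_lock_stat_row(const char *line, struct lock_stat_row *row)
{
	const char *colon;
	size_t name_len;
	int n;

	assert(line && row);

	while (*line == ' ' || *line == '\t')
		line++;			/* names are right-aligned */

	colon = strrchr(line, ':');
	if (!colon || colon == line)
		return -1;

	name_len = (size_t)(colon - line);
	if (name_len >= sizeof(row->name))
		return -1;
	memcpy(row->name, line, name_len);
	row->name[name_len] = '\0';

	n = sscanf(colon + 1, "%lu %lu %lf %lf %lf %lu %lu %lf %lf %lf",
		   &row->con_bounces, &row->contentions,
		   &row->wait_min, &row->wait_max, &row->wait_total,
		   &row->acq_bounces, &row->acquisitions,
		   &row->hold_min, &row->hold_max, &row->hold_total);

	return n == 10 ? 0 : -1;
}
```

Feeding it the dcache_lock line above yields contentions = 1161 and
acquisitions = 243371, while a contention point line such as the
sys_getcwd entry is rejected because it contains no ':'.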
|
||||
|
||||
Clear the statistics:
|
||||
|
||||
# echo 0 > /proc/lock_stat
|
||||
@@ -96,6 +96,9 @@ routing.txt
|
||||
- the new routing mechanism
|
||||
shaper.txt
|
||||
- info on the module that can shape/limit transmitted traffic.
|
||||
sk98lin.txt
|
||||
- Marvell Yukon Chipset / SysKonnect SK-98xx compliant Gigabit
|
||||
Ethernet Adapter family driver info
|
||||
skfp.txt
|
||||
- SysKonnect FDDI (SK-5xxx, Compaq Netelligent) driver info.
|
||||
smc9.txt
|
||||
|
||||
@@ -58,9 +58,13 @@ software, so it's a straight round-robin qdisc. It uses the same syntax and
|
||||
classification priomap that sch_prio uses, so it should be intuitive to
|
||||
configure for people who've used sch_prio.
|
||||
|
||||
The PRIO qdisc naturally plugs into a multiqueue device. If PRIO has been
|
||||
built with NET_SCH_PRIO_MQ, then upon load, it will make sure the number of
|
||||
bands requested is equal to the number of queues on the hardware. If they
|
||||
In order to utilize the multiqueue features of the qdiscs, the network
|
||||
device layer needs to enable multiple queue support. This can be done by
|
||||
selecting NETDEVICES_MULTIQUEUE under Drivers.
|
||||
|
||||
The PRIO qdisc naturally plugs into a multiqueue device. If
|
||||
NETDEVICES_MULTIQUEUE is selected, then on qdisc load, the number of
|
||||
bands requested is compared to the number of queues on the hardware. If they
|
||||
are equal, it sets a one-to-one mapping up between the queues and bands. If
|
||||
they're not equal, it will not load the qdisc. This is the same behavior
|
||||
for RR. Once the association is made, any skb that is classified will have
|
||||
|
||||