Merge commit '85082fd7cbe3173198aac0eb5e85ab1edcc6352c' into test-build
Manual fixup of: arch/powerpc/Kconfig
@@ -26,3 +26,37 @@ Description:
                I/O statistics of partition <part>. The format is the
                same as the above-written /sys/block/<disk>/stat
                format.


What:           /sys/block/<disk>/integrity/format
Date:           June 2008
Contact:        Martin K. Petersen <martin.petersen@oracle.com>
Description:
                Metadata format for integrity capable block device.
                E.g. T10-DIF-TYPE1-CRC.


What:           /sys/block/<disk>/integrity/read_verify
Date:           June 2008
Contact:        Martin K. Petersen <martin.petersen@oracle.com>
Description:
                Indicates whether the block layer should verify the
                integrity of read requests serviced by devices that
                support sending integrity metadata.


What:           /sys/block/<disk>/integrity/tag_size
Date:           June 2008
Contact:        Martin K. Petersen <martin.petersen@oracle.com>
Description:
                Number of bytes of integrity tag space available per
                512 bytes of data.


What:           /sys/block/<disk>/integrity/write_generate
Date:           June 2008
Contact:        Martin K. Petersen <martin.petersen@oracle.com>
Description:
                Indicates whether the block layer should automatically
                generate checksums for write requests bound for
                devices that support receiving integrity metadata.
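As a rough illustration of the four integrity attributes documented
above, the following shell sketch prints them for one disk. The function
name show_integrity and its optional second argument (an alternative
sysfs root, handy for trying the function against a scratch directory)
are assumptions made for this example; they are not part of the kernel
interface.

```shell
# Print the integrity attributes of one disk.
# $1: disk name (e.g. sda); $2: sysfs root, defaults to /sys (illustrative).
show_integrity() (
    disk=$1
    root=${2:-/sys}
    dir=$root/block/$disk/integrity

    # Disks without an integrity profile have no such directory.
    if [ ! -d "$dir" ]; then
        echo "$disk: no integrity directory" >&2
        exit 1
    fi

    for attr in format read_verify tag_size write_generate; do
        printf '%s: %s\n' "$attr" "$(cat "$dir/$attr")"
    done
)
```

Calling show_integrity sda would list one "name: value" line per
attribute, or return a non-zero status if the disk exposes no integrity
directory.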
@@ -0,0 +1,35 @@
What:           /sys/bus/css/devices/.../type
Date:           March 2008
Contact:        Cornelia Huck <cornelia.huck@de.ibm.com>
                linux-s390@vger.kernel.org
Description:    Contains the subchannel type, as reported by the hardware.
                This attribute is present for all subchannel types.

What:           /sys/bus/css/devices/.../modalias
Date:           March 2008
Contact:        Cornelia Huck <cornelia.huck@de.ibm.com>
                linux-s390@vger.kernel.org
Description:    Contains the module alias as reported with uevents.
                It is of the format css:t<type> and present for all
                subchannel types.

What:           /sys/bus/css/drivers/io_subchannel/.../chpids
Date:           December 2002
Contact:        Cornelia Huck <cornelia.huck@de.ibm.com>
                linux-s390@vger.kernel.org
Description:    Contains the ids of the channel paths used by this
                subchannel, as reported by the channel subsystem
                during subchannel recognition.
                Note: This is an I/O-subchannel specific attribute.
Users:          s390-tools, HAL

What:           /sys/bus/css/drivers/io_subchannel/.../pimpampom
Date:           December 2002
Contact:        Cornelia Huck <cornelia.huck@de.ibm.com>
                linux-s390@vger.kernel.org
Description:    Contains the PIM/PAM/POM values, as reported by the
                channel subsystem when last queried by the common I/O
                layer (this implies that this attribute is not necessarily
                in sync with the values current in the channel subsystem).
                Note: This is an I/O-subchannel specific attribute.
Users:          s390-tools, HAL
@@ -0,0 +1,71 @@
What:           /sys/firmware/memmap/
Date:           June 2008
Contact:        Bernhard Walle <bwalle@suse.de>
Description:
                On all platforms, the firmware provides a memory map which the
                kernel reads. The resources from that memory map are registered
                in the kernel resource tree and exposed to userspace via
                /proc/iomem (together with other resources).

                However, on most architectures that firmware-provided memory
                map is modified afterwards by the kernel itself, either because
                the kernel merges that memory map with other information or
                just because the user overwrites that memory map via command
                line.

                kexec needs the raw firmware-provided memory map to set up the
                parameter segment of the kernel that should be booted with
                kexec. Also, the raw memory map is useful for debugging. For
                that reason, /sys/firmware/memmap is an interface that provides
                the raw memory map to userspace.

                The structure is as follows: Under /sys/firmware/memmap there
                are subdirectories with the number of the entry as their name:

                        /sys/firmware/memmap/0
                        /sys/firmware/memmap/1
                        /sys/firmware/memmap/2
                        /sys/firmware/memmap/3
                        ...

                The maximum depends on the number of memory map entries provided
                by the firmware. The order is just the order that the firmware
                provides.

                Each directory contains three files:

                start   : The start address (as hexadecimal number with the
                          '0x' prefix).
                end     : The end address, inclusive (regardless of whether the
                          firmware provides inclusive or exclusive ranges).
                type    : Type of the entry as string. See below for a list of
                          valid types.

                So, for example:

                        /sys/firmware/memmap/0/start
                        /sys/firmware/memmap/0/end
                        /sys/firmware/memmap/0/type
                        /sys/firmware/memmap/1/start
                        ...

                Currently the following types exist:

                  - System RAM
                  - ACPI Tables
                  - ACPI Non-volatile Storage
                  - reserved

                The following shell snippet can be used to display that memory
                map in a human-readable format:

                -------------------- 8< ----------------------------------------
                #!/bin/bash
                cd /sys/firmware/memmap || exit 1
                for dir in * ; do
                    start=$(cat "$dir/start")
                    end=$(cat "$dir/end")
                    type=$(cat "$dir/type")
                    # 'end' is inclusive; print the range with an exclusive end
                    printf "%016x-%016x (%s)\n" "$start" "$(( end + 1 ))" "$type"
                done
                -------------------- >8 ----------------------------------------
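Because the end address is inclusive, the size of an entry is
end - start + 1. A small sketch along the same lines as the snippet
above, printing the size of each entry in bytes, might look as follows.
The function name memmap_sizes and its optional directory argument are
assumptions made for this example only.

```shell
# Print "<type>: <size> bytes" for every memory map entry.
# $1: memmap directory, defaults to /sys/firmware/memmap (illustrative).
memmap_sizes() (
    root=${1:-/sys/firmware/memmap}
    for dir in "$root"/*; do
        # Skip anything that is not an entry directory.
        [ -r "$dir/start" ] || continue
        start=$(cat "$dir/start")
        end=$(cat "$dir/end")
        type=$(cat "$dir/type")
        # Both addresses carry a 0x prefix, which shell arithmetic accepts.
        printf '%s: %d bytes\n' "$type" $(( $end - $start + 1 ))
    done
)
```

For an entry with start 0x0 and end 0x9ffff this reports 655360 bytes.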
@@ -0,0 +1,327 @@
----------------------------------------------------------------------
1. INTRODUCTION

Modern filesystems feature checksumming of data and metadata to
protect against data corruption.  However, the detection of the
corruption is done at read time which could potentially be months
after the data was written.  At that point the original data that the
application tried to write is most likely lost.

The solution is to ensure that the disk is actually storing what the
application meant it to.  Recent additions to both the SCSI family
protocols (SBC Data Integrity Field, SCC protection proposal) as well
as SATA/T13 (External Path Protection) try to remedy this by adding
support for appending integrity metadata to an I/O.  The integrity
metadata (or protection information in SCSI terminology) includes a
checksum for each sector as well as an incrementing counter that
ensures the individual sectors are written in the right order and,
for some protection schemes, that the I/O is written to the right
place on disk.

Current storage controllers and devices implement various protective
measures, for instance checksumming and scrubbing.  But these
technologies are working in their own isolated domains or at best
between adjacent nodes in the I/O path.  The interesting thing about
DIF and the other integrity extensions is that the protection format
is well defined and every node in the I/O path can verify the
integrity of the I/O and reject it if corruption is detected.  This
allows not only corruption prevention but also isolation of the point
of failure.

----------------------------------------------------------------------
2. THE DATA INTEGRITY EXTENSIONS

As written, the protocol extensions only protect the path between
controller and storage device.  However, many controllers actually
allow the operating system to interact with the integrity metadata
(IMD).  We have been working with several FC/SAS HBA vendors to enable
the protection information to be transferred to and from their
controllers.

The SCSI Data Integrity Field works by appending 8 bytes of protection
information to each sector.  The data + integrity metadata is stored
in 520 byte sectors on disk.  Data + IMD are interleaved when
transferred between the controller and target.  The T13 proposal is
similar.

Because it is highly inconvenient for operating systems to deal with
520 (and 4104) byte sectors, we approached several HBA vendors and
encouraged them to allow separation of the data and integrity metadata
scatter-gather lists.

The controller will interleave the buffers on write and split them on
read.  This means that Linux can DMA the data buffers to and from
host memory without changes to the page cache.

Also, the 16-bit CRC checksum mandated by both the SCSI and SATA specs
is somewhat heavy to compute in software.  Benchmarks found that
calculating this checksum had a significant impact on system
performance for a number of workloads.  Some controllers allow a
lighter-weight checksum to be used when interfacing with the operating
system.  Emulex, for instance, supports the TCP/IP checksum instead.
The IP checksum received from the OS is converted to the 16-bit CRC
when writing and vice versa.  This allows the integrity metadata to be
generated by Linux or the application at very low cost (comparable to
software RAID5).
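To give a feel for why the 16-bit CRC discussed above is comparatively
expensive in software, here is a deliberately naive, bit-at-a-time
sketch of it. The generator polynomial 0x8BB7 (initial value 0, no bit
reflection) comes from the T10 DIF definition rather than from this
document, and the function name crc_t10dif as well as the convention of
passing bytes as separate arguments are assumptions made for this
example.

```shell
# Bitwise CRC-16 with the T10 DIF polynomial 0x8bb7, initial value 0.
# Arguments: the data bytes, one per argument (e.g. 0x31 0x32 ...).
crc_t10dif() (
    crc=0
    for byte in "$@"; do
        # Feed the next byte into the top of the 16-bit register.
        crc=$(( crc ^ ($byte << 8) ))
        bit=0
        while [ "$bit" -lt 8 ]; do
            if [ $(( crc & 0x8000 )) -ne 0 ]; then
                crc=$(( ((crc << 1) ^ 0x8bb7) & 0xffff ))
            else
                crc=$(( (crc << 1) & 0xffff ))
            fi
            bit=$(( bit + 1 ))
        done
    done
    printf '%04x\n' "$crc"
)
```

Every data byte costs eight shift/test/xor rounds here. Table-driven
variants reduce that work, but the per-byte cost remains noticeably
higher than a plain ones' complement sum.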

The IP checksum is weaker than the CRC in terms of detecting bit
errors.  However, the strength is really in the separation of the data
buffers and the integrity metadata.  These two distinct buffers must
match up for an I/O to complete.
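For comparison, the IP checksum is just a 16-bit ones' complement sum
as described in RFC 1071. The sketch below is illustrative only: the
function name ip_checksum and the convention of passing the data as
16-bit words are assumptions made for this example.

```shell
# Internet checksum (RFC 1071) over a list of 16-bit words.
# Arguments: the data words, one per argument (e.g. 0x0001 0xf203 ...).
ip_checksum() (
    sum=0
    for word in "$@"; do
        sum=$(( sum + ($word & 0xffff) ))
    done
    # Fold the carries back into the low 16 bits (end-around carry).
    while [ $(( sum >> 16 )) -ne 0 ]; do
        sum=$(( (sum & 0xffff) + (sum >> 16) ))
    done
    # The checksum is the ones' complement of the folded sum.
    printf '%04x\n' $(( ~sum & 0xffff ))
)
```

With the sample words from RFC 1071, ip_checksum 0x0001 0xf203 0xf4f5
0xf6f7 prints 220d. Running the function again over the data plus that
checksum prints 0000, which is how a receiver can verify the buffer.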

The separation of the data and integrity metadata buffers as well as
the choice in checksums is referred to as the Data Integrity
Extensions.  As these extensions are outside the scope of the protocol
bodies (T10, T13), Oracle and its partners are trying to standardize
them within the Storage Networking Industry Association.

----------------------------------------------------------------------
3. KERNEL CHANGES

The data integrity framework in Linux enables protection information
to be pinned to I/Os and sent to/received from controllers that
support it.

The advantage to the integrity extensions in SCSI and SATA is that
they enable us to protect the entire path from application to storage
device.  However, at the same time this is also the biggest
disadvantage. It means that the protection information must be in a
format that can be understood by the disk.

Generally Linux/POSIX applications are agnostic to the intricacies of
the storage devices they are accessing.  The virtual filesystem switch
and the block layer make things like hardware sector size and
transport protocols completely transparent to the application.

However, this level of detail is required when preparing the
protection information to send to a disk.  Consequently, the very
concept of an end-to-end protection scheme is a layering violation.
It is completely unreasonable for an application to be aware whether
it is accessing a SCSI or SATA disk.

The data integrity support implemented in Linux attempts to hide this
from the application.  As far as the application (and to some extent
the kernel) is concerned, the integrity metadata is opaque information
that's attached to the I/O.

The current implementation allows the block layer to automatically
generate the protection information for any I/O.  Eventually the
intent is to move the integrity metadata calculation to userspace for
user data.  Metadata and other I/O that originates within the kernel
will still use the automatic generation interface.

Some storage devices allow each hardware sector to be tagged with a
16-bit value.  The owner of this tag space is the owner of the block
device, i.e. the filesystem in most cases.  The filesystem can use
this extra space to tag sectors as it sees fit.  Because the tag
space is limited, the block interface allows tagging bigger chunks by
way of interleaving.  This way, 8*16 bits of information can be
attached to a typical 4KB filesystem block.
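The 8*16 bits figure follows from simple arithmetic: a 4KB block spans
eight 512-byte sectors, and each sector contributes 2 bytes of tag
space. A minimal sketch of that calculation (the function name
tag_bytes_per_block is made up for this example):

```shell
# Tag space, in bytes, available for one filesystem block.
# $1: hardware sector size, $2: filesystem block size,
# $3: tag bytes per hardware sector.
tag_bytes_per_block() (
    sector_size=$1
    block_size=$2
    tag_size=$3
    echo $(( $block_size / $sector_size * $tag_size ))
)
```

tag_bytes_per_block 512 4096 2 prints 16, i.e. 16 bytes or 8*16 bits.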

This also means that applications such as fsck and mkfs will need
access to manipulate the tags from user space.  A passthrough
interface for this is being worked on.


----------------------------------------------------------------------
4. BLOCK LAYER IMPLEMENTATION DETAILS

4.1 BIO

The data integrity patches add a new field to struct bio when
CONFIG_BLK_DEV_INTEGRITY is enabled.  bio->bi_integrity is a pointer
to a struct bip which contains the bio integrity payload.  Essentially
a bip is a trimmed down struct bio which holds a bio_vec containing
the integrity metadata and the required housekeeping information (bvec
pool, vector count, etc.)

A kernel subsystem can enable data integrity protection on a bio by
calling bio_integrity_alloc(bio).  This will allocate and attach the
bip to the bio.

Individual pages containing integrity metadata can subsequently be
attached using bio_integrity_add_page().

bio_free() will automatically free the bip.


4.2 BLOCK DEVICE

Because the format of the protection data is tied to the physical
disk, each block device has been extended with a block integrity
profile (struct blk_integrity).  This optional profile is registered
with the block layer using blk_integrity_register().

The profile contains callback functions for generating and verifying
the protection data, as well as getting and setting application tags.
The profile also contains a few constants to aid in completing,
merging and splitting the integrity metadata.

Layered block devices will need to pick a profile that's appropriate
for all subdevices.  blk_integrity_compare() can help with that.  DM
and MD linear, RAID0 and RAID1 are currently supported.  RAID4/5/6
will require extra work due to the application tag.


----------------------------------------------------------------------
5.0 BLOCK LAYER INTEGRITY API

5.1 NORMAL FILESYSTEM

    The normal filesystem is unaware that the underlying block device
    is capable of sending/receiving integrity metadata.  The IMD will
    be automatically generated by the block layer at submit_bio() time
    in case of a WRITE.  A READ request will cause the I/O integrity
    to be verified upon completion.

    IMD generation and verification can be toggled using the

      /sys/block/<bdev>/integrity/write_generate

    and

      /sys/block/<bdev>/integrity/read_verify

    flags.
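A hedged sketch of toggling these two flags from a script follows. It
assumes that the flags accept the values 0 and 1, which the text above
does not spell out. The function name set_integrity_flag and its
optional fourth argument (an alternative sysfs root for dry runs) are
likewise assumptions made for this example.

```shell
# Set write_generate or read_verify for one block device.
# $1: device, $2: flag name, $3: 0 or 1, $4: sysfs root (default /sys).
set_integrity_flag() (
    disk=$1
    flag=$2
    value=$3
    root=${4:-/sys}

    # Only the two toggles described in section 5.1 are accepted.
    case $flag in
        write_generate|read_verify) ;;
        *) echo "unknown flag: $flag" >&2; exit 2 ;;
    esac
    case $value in
        0|1) ;;
        *) echo "value must be 0 or 1" >&2; exit 2 ;;
    esac

    file=$root/block/$disk/integrity/$flag
    if [ ! -w "$file" ]; then
        echo "cannot write $file" >&2
        exit 1
    fi
    printf '%s\n' "$value" > "$file"
)
```

For instance, set_integrity_flag sda read_verify 0 would ask the block
layer to stop verifying reads on sda; writing to the real sysfs files
normally requires root privileges.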


5.2 INTEGRITY-AWARE FILESYSTEM

    A filesystem that is integrity-aware can prepare I/Os with IMD
    attached.  It can also use the application tag space if this is
    supported by the block device.


    int bdev_integrity_enabled(block_device, int rw);

      bdev_integrity_enabled() will return 1 if the block device
      supports integrity metadata transfer for the data direction
      specified in 'rw'.

      bdev_integrity_enabled() honors the write_generate and
      read_verify flags in sysfs and will respond accordingly.


    int bio_integrity_prep(bio);

      To generate IMD for WRITE and to set up buffers for READ, the
      filesystem must call bio_integrity_prep(bio).

      Prior to calling this function, the bio data direction and start
      sector must be set, and the bio should have all data pages
      added.  It is up to the caller to ensure that the bio does not
      change while I/O is in progress.

      bio_integrity_prep() should only be called if
      bio_integrity_enabled() returned 1.


    int bio_integrity_tag_size(bio);

      If the filesystem wants to use the application tag space it will
      first have to find out how much storage space is available.
      Because tag space is generally limited (usually 2 bytes per
      sector regardless of sector size), the integrity framework
      supports interleaving the information between the sectors in an
      I/O.

      Filesystems can call bio_integrity_tag_size(bio) to find out how
      many bytes of storage are available for that particular bio.

      Another option is bdev_get_tag_size(block_device) which will
      return the number of available bytes per hardware sector.


    int bio_integrity_set_tag(bio, void *tag_buf, len);

      After a successful return from bio_integrity_prep(),
      bio_integrity_set_tag() can be used to attach an opaque tag
      buffer to a bio.  Obviously this only makes sense if the I/O is
      a WRITE.


    int bio_integrity_get_tag(bio, void *tag_buf, len);

      Similarly, at READ I/O completion time the filesystem can
      retrieve the tag buffer using bio_integrity_get_tag().


5.3 PASSING EXISTING INTEGRITY METADATA

    Filesystems that either generate their own integrity metadata or
    are capable of transferring IMD from user space can use the
    following calls:


    struct bip * bio_integrity_alloc(bio, gfp_mask, nr_pages);

      Allocates the bio integrity payload and hangs it off of the bio.
      nr_pages indicates how many pages of protection data need to be
      stored in the integrity bio_vec list (similar to bio_alloc()).

      The integrity payload will be freed at bio_free() time.


    int bio_integrity_add_page(bio, page, len, offset);

      Attaches a page containing integrity metadata to an existing
      bio.  The bio must have an existing bip,
      i.e. bio_integrity_alloc() must have been called.  For a WRITE,
      the integrity metadata in the pages must be in a format
      understood by the target device with the notable exception that
      the sector numbers will be remapped as the request traverses the
      I/O stack.  This implies that the pages added using this call
      will be modified during I/O!  The first reference tag in the
      integrity metadata must have a value of bip->bip_sector.

      Pages can be added using bio_integrity_add_page() as long as
      there is room in the bip bio_vec array (nr_pages).

      Upon completion of a READ operation, the attached pages will
      contain the integrity metadata received from the storage device.
      It is up to the receiver to process them and verify data
      integrity upon completion.


5.4 REGISTERING A BLOCK DEVICE AS CAPABLE OF EXCHANGING INTEGRITY
    METADATA

    To enable integrity exchange on a block device the gendisk must be
    registered as capable:

    int blk_integrity_register(gendisk, blk_integrity);

      The blk_integrity struct is a template and should contain the
      following:

        static struct blk_integrity my_profile = {
            .name                   = "STANDARDSBODY-TYPE-VARIANT-CSUM",
            .generate_fn            = my_generate_fn,
            .verify_fn              = my_verify_fn,
            .get_tag_fn             = my_get_tag_fn,
            .set_tag_fn             = my_set_tag_fn,
            .tuple_size             = sizeof(struct my_tuple_size),
            .tag_size               = <tag bytes per hw sector>,
        };

      'name' is a text string which will be visible in sysfs.  This is
      part of the userland API so choose it carefully and never change
      it.  The format is standards body-type-variant.
      E.g. T10-DIF-TYPE1-IP or T13-EPP-0-CRC.

      'generate_fn' generates appropriate integrity metadata (for WRITE).

      'verify_fn' verifies that the data buffer matches the integrity
      metadata.

      'tuple_size' must be set to match the size of the integrity
      metadata per sector.  I.e. 8 for DIF and EPP.

      'tag_size' must be set to identify how many bytes of tag space
      are available per hardware sector.  For DIF this is either 2 or
      0 depending on the value of the Control Mode Page ATO bit.

      See 5.2 for a description of get_tag_fn and set_tag_fn.

----------------------------------------------------------------------
2007-12-24 Martin K. Petersen <martin.petersen@oracle.com>
+71
-63
@@ -2,8 +2,11 @@
 ========================

 Copyright 2008 Red Hat Inc.
-Author: Steven Rostedt <srostedt@redhat.com>
+Author: Steven Rostedt <srostedt@redhat.com>
+License: The GNU Free Documentation License, Version 1.2
+Reviewers: Elias Oltmanns and Randy Dunlap
+
 Writen for: 2.6.26-rc8 linux-2.6-tip.git tip/tracing/ftrace branch

 Introduction
 ------------
@@ -46,7 +49,7 @@ of ftrace. Here is a list of some of the key files:
 that is configured.

 available_tracers : This holds the different types of tracers that
-has been compiled into the kernel. The tracers
+have been compiled into the kernel. The tracers
 listed here can be configured by echoing in their
 name into current_tracer.

@@ -90,11 +93,13 @@ of ftrace. Here is a list of some of the key files:
 trace_entries : This sets or displays the number of trace
 entries each CPU buffer can hold. The tracer buffers
 are the same size for each CPU, so care must be
-taken when modifying the trace_entries. The number
-of actually entries will be the number given
-times the number of possible CPUS. The buffers
-are saved as individual pages, and the actual entries
-will always be rounded up to entries per page.
+taken when modifying the trace_entries. The trace
+buffers are allocated in pages (blocks of memory that
+the kernel uses for allocation, usually 4 KB in size).
+Since each entry is smaller than a page, if the last
+allocated page has room for more entries than were
+requested, the rest of the page is used to allocate
+entries.

 This can only be updated when the current_tracer
 is set to "none".
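The page rounding described for trace_entries can be pictured with a
little arithmetic. The sketch below is only an approximation: the entry
size passed in is a made-up parameter (the text does not state the real
size), any per-page bookkeeping is ignored, and the function name
actual_entries is invented for this example.

```shell
# Number of entries that result once the last page is filled up.
# $1: requested entries, $2: assumed entry size in bytes,
# $3: page size in bytes (default 4096).
actual_entries() (
    requested=$1
    entry_size=$2
    page_size=${3:-4096}
    per_page=$(( $page_size / $entry_size ))
    # Round the page count up so that every requested entry fits.
    pages=$(( ($requested + $per_page - 1) / $per_page ))
    echo $(( $pages * $per_page ))
)
```

With a hypothetical 64-byte entry, a request for 100 entries needs two
4 KB pages and therefore yields 128 entries.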
@@ -114,13 +119,13 @@ of ftrace. Here is a list of some of the key files:
 in performance. This also has a side effect of
 enabling or disabling specific functions to be
 traced. Echoing in names of functions into this
-file will limit the trace to only those files.
+file will limit the trace to only these functions.

 set_ftrace_notrace: This has the opposite effect that
 set_ftrace_filter has. Any function that is added
 here will not be traced. If a function exists
-in both set_ftrace_filter and set_ftrace_notrace
-the function will _not_ bet traced.
+in both set_ftrace_filter and set_ftrace_notrace,
+the function will _not_ be traced.

 available_filter_functions : When a function is encountered the first
 time by the dynamic tracer, it is recorded and
@@ -138,7 +143,7 @@ Here are the list of current tracers that can be configured.

 ftrace - function tracer that uses mcount to trace all functions.
 It is possible to filter out which functions that are
-traced when dynamic ftrace is configured in.
+to be traced when dynamic ftrace is configured in.

 sched_switch - traces the context switches between tasks.

@@ -297,13 +302,13 @@ explains which is which.

 The above is mostly meaningful for kernel developers.

-time: This differs from the trace output where as the trace output
-contained a absolute timestamp. This timestamp is relative
-to the start of the first entry in the the trace.
+time: This differs from the trace file output. The trace file output
+included an absolute timestamp. The timestamp used by the
+latency_trace file is relative to the start of the trace.

 delay: This is just to help catch your eye a bit better. And
 needs to be fixed to be only relative to the same CPU.
-The marks is determined by the difference between this
+The marks are determined by the difference between this
 current trace and the next trace.
 '!' - greater than preempt_mark_thresh (default 100)
 '+' - greater than 1 microsecond
@@ -322,13 +327,13 @@ output. To see what is available, simply cat the file:
 print-parent nosym-offset nosym-addr noverbose noraw nohex nobin \
 noblock nostacktrace nosched-tree

-To disable one of the options, echo in the option appended with "no".
+To disable one of the options, echo in the option prepended with "no".

 echo noprint-parent > /debug/tracing/iter_ctrl

 To enable an option, leave off the "no".

-echo sym-offest > /debug/tracing/iter_ctrl
+echo sym-offset > /debug/tracing/iter_ctrl

 Here are the available options:

@@ -344,7 +349,7 @@ Here are the available options:

 sym-offset - Display not only the function name, but also the offset
 in the function. For example, instead of seeing just
-"ktime_get" you will see "ktime_get+0xb/0x20"
+"ktime_get", you will see "ktime_get+0xb/0x20".

 sym-offset:
 bash-4000 [01] 1477.606694: simple_strtoul+0x6/0xa0
@@ -364,7 +369,7 @@ Here are the available options:
 user applications that can translate the raw numbers better than
 having it done in the kernel.

-hex - similar to raw, but the numbers will be in a hexadecimal format.
+hex - Similar to raw, but the numbers will be in a hexadecimal format.

 bin - This will print out the formats in raw binary.

@@ -381,7 +386,7 @@ sched_switch
 ------------

 This tracer simply records schedule switches. Here's an example
-on how to implement it.
+of how to use it.

 # echo sched_switch > /debug/tracing/current_tracer
 # echo 1 > /debug/tracing/tracing_enabled
@@ -470,7 +475,7 @@ interrupt from triggering or the mouse interrupt from letting the
 kernel know of a new mouse event. The result is a latency with the
 reaction time.

-The irqsoff tracer tracks the time interrupts are disabled and when
+The irqsoff tracer tracks the time interrupts are disabled to the time
 they are re-enabled. When a new maximum latency is hit, it saves off
 the trace so that it may be retrieved at a later time. Every time a
 new maximum in reached, the old saved trace is discarded and the new
@@ -519,7 +524,7 @@ The difference between the 6 and the displayed timestamp 7us is
 because the clock must have incremented between the time of recording
 the max latency and recording the function that had that latency.

-Note the above had ftrace_enabled not set. If we set the ftrace_enabled
+Note the above had ftrace_enabled not set. If we set the ftrace_enabled,
 we get a much larger output:

 # tracer: irqsoff
@@ -570,21 +575,21 @@ vim:ft=help


 Here we traced a 50 microsecond latency. But we also see all the
-functions that were called during that time. Note that enabling
-function tracing we endure an added overhead. This overhead may
-extend the latency times. But never the less, this trace has provided
-some very helpful debugging.
+functions that were called during that time. Note that by enabling
+function tracing, we endure an added overhead. This overhead may
+extend the latency times. But nevertheless, this trace has provided
+some very helpful debugging information.


 preemptoff
 ----------

-When preemption is disabled we may be able to receive interrupts but
-the task can not be preempted and a higher priority task must wait
+When preemption is disabled, we may be able to receive interrupts but
+the task cannot be preempted and a higher priority task must wait
 for preemption to be enabled again before it can preempt a lower
 priority task.

-The preemptoff tracer traces the places that disables preemption.
+The preemptoff tracer traces the places that disable preemption.
 Like the irqsoff, it records the maximum latency that preemption
 was disabled. The control of preemptoff is much like the irqsoff.

@@ -696,7 +701,7 @@ Notice that the __do_softirq when called doesn't have a preempt_count.
 It may seem that we missed a preempt enabled. What really happened
 is that the preempt count is held on the threads stack and we
 switched to the softirq stack (4K stacks in effect). The code
-does not copy the preempt count, but because interrupts are disabled
+does not copy the preempt count, but because interrupts are disabled,
 we don't need to worry about it. Having a tracer like this is good
 to let people know what really happens inside the kernel.

@@ -732,7 +737,7 @@ To record this time, use the preemptirqsoff tracer.

 Again, using this trace is much like the irqsoff and preemptoff tracers.

-# echo preemptoff > /debug/tracing/current_tracer
+# echo preemptirqsoff > /debug/tracing/current_tracer
 # echo 0 > /debug/tracing/tracing_max_latency
 # echo 1 > /debug/tracing/tracing_enabled
 # ls -ltr
@@ -862,9 +867,9 @@ This is a very interesting trace. It started with the preemption of
 the ls task. We see that the task had the "need_resched" bit set
 with the 'N' in the trace. Interrupts are disabled in the spin_lock
 and the trace started. We see that a schedule took place to run
-sshd. When the interrupts were enabled we took an interrupt.
-On return of the interrupt the softirq ran. We took another interrupt
-while running the softirq as we see with the capital 'H'.
+sshd. When the interrupts were enabled, we took an interrupt.
+On return from the interrupt handler, the softirq ran. We took another
+interrupt while running the softirq as we see with the capital 'H'.


 wakeup
@@ -876,9 +881,9 @@ time it executes. This is also known as "schedule latency".
 I stress the point that this is about RT tasks. It is also important
 to know the scheduling latency of non-RT tasks, but the average
 schedule latency is better for non-RT tasks. Tools like
-LatencyTop is more appropriate for such measurements.
+LatencyTop are more appropriate for such measurements.

-Real-Time environments is interested in the worst case latency.
+Real-Time environments are interested in the worst case latency.
 That is the longest latency it takes for something to happen, and
 not the average. We can have a very fast scheduler that may only
 have a large latency once in a while, but that would not work well
@@ -889,8 +894,8 @@ tasks that are unpredictable will overwrite the worst case latency
 of RT tasks.

 Since this tracer only deals with RT tasks, we will run this slightly
-different than we did with the previous tracers. Instead of performing
-an 'ls' we will run 'sleep 1' under 'chrt' which changes the
+differently than we did with the previous tracers. Instead of performing
+an 'ls', we will run 'sleep 1' under 'chrt' which changes the
 priority of the task.

 # echo wakeup > /debug/tracing/current_tracer
@@ -924,9 +929,9 @@ wakeup latency trace v1.1.5 on 2.6.26-rc8
 vim:ft=help


-Running this on an idle system we see that it only took 4 microseconds
+Running this on an idle system, we see that it only took 4 microseconds
 to perform the task switch. Note, since the trace marker in the
-schedule is before the actual "switch" we stop the tracing when
+schedule is before the actual "switch", we stop the tracing when
 the recorded task is about to schedule in. This may change if
 we add a new marker at the end of the scheduler.

@@ -992,12 +997,15 @@ ksoftirq-7 1d..4 50us : schedule (__cond_resched)

 The interrupt went off while running ksoftirqd. This task runs at
 SCHED_OTHER. Why didn't we see the 'N' set early? This may be
-a harmless bug with x86_32 and 4K stacks. The need_reched() function
-that tests if we need to reschedule looks on the actual stack.
-Where as the setting of the NEED_RESCHED bit happens on the
-task's stack. But because we are in a hard interrupt, the test
-is with the interrupts stack which has that to be false. We don't
-see the 'N' until we switch back to the task's stack.
+a harmless bug with x86_32 and 4K stacks. On x86_32 with 4K stacks
+configured, the interrupt and softirq runs with their own stack.
|
||||
Some information is held on the top of the task's stack (need_resched
|
||||
and preempt_count are both stored there). The setting of the NEED_RESCHED
|
||||
bit is done directly to the task's stack, but the reading of the
|
||||
NEED_RESCHED is done by looking at the current stack, which in this case
|
||||
is the stack for the hard interrupt. This hides the fact that NEED_RESCHED
|
||||
has been set. We don't see the 'N' until we switch back to the task's
|
||||
assigned stack.
|
||||
|
||||
ftrace
|
||||
------
|
||||
@@ -1067,10 +1075,10 @@ this works is the mcount function call (placed at the start of
|
||||
every kernel function, produced by the -pg switch in gcc), starts
|
||||
of pointing to a simple return.
|
||||
|
||||
When dynamic ftrace is initialized, it calls kstop_machine to make it
|
||||
act like a uniprocessor so that it can freely modify code without
|
||||
worrying about other processors executing that same code. At
|
||||
initialization, the mcount calls are change to call a "record_ip"
|
||||
When dynamic ftrace is initialized, it calls kstop_machine to make
|
||||
the machine act like a uniprocessor so that it can freely modify code
|
||||
without worrying about other processors executing that same code. At
|
||||
initialization, the mcount calls are changed to call a "record_ip"
|
||||
function. After this, the first time a kernel function is called,
|
||||
it has the calling address saved in a hash table.
|
||||
|
||||
@@ -1085,8 +1093,8 @@ traced, is that we can now selectively choose which functions we
|
||||
want to trace and which ones we want the mcount calls to remain as
|
||||
nops.
|
||||
|
||||
Two files that contain to the enabling and disabling of recorded
|
||||
functions are:
|
||||
Two files are used, one for enabling and one for disabling the tracing
|
||||
of recorded functions. They are:
|
||||
|
||||
set_ftrace_filter
|
||||
|
||||
@@ -1094,7 +1102,7 @@ and
|
||||
|
||||
set_ftrace_notrace
|
||||
|
||||
A list of available functions that you can add to this files is listed
|
||||
A list of available functions that you can add to these files is listed
|
||||
in:
|
||||
|
||||
available_filter_functions
|
||||
@@ -1133,9 +1141,9 @@ sys_nanosleep
|
||||
|
||||
|
||||
Perhaps this isn't enough. The filters also allow simple wild cards.
|
||||
Only the following is currently available
|
||||
Only the following are currently available
|
||||
|
||||
<match>* - will match functions that begins with <match>
|
||||
<match>* - will match functions that begin with <match>
|
||||
*<match> - will match functions that end with <match>
|
||||
*<match>* - will match functions that have <match> in it
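The three wildcard forms above can be mimicked in a few lines. This is a minimal Python sketch for illustration only; filter_matches is a hypothetical helper, not the kernel's matcher, and the function names in the usage lines are merely plausible examples.

```python
def filter_matches(pattern: str, name: str) -> bool:
    """Return True if `name` matches `pattern` under the three
    wildcard forms listed above (hypothetical helper)."""
    leading = pattern.startswith("*")
    trailing = pattern.endswith("*")
    core = pattern.strip("*")
    if leading and trailing and len(pattern) > 1:
        return core in name            # *<match>*
    if trailing:
        return name.startswith(core)   # <match>*
    if leading:
        return name.endswith(core)     # *<match>
    return name == pattern             # no wildcard: exact name


# Example names are assumptions chosen for illustration.
names = ["hrtimer_start", "spin_lock", "sys_nanosleep"]
print([n for n in names if filter_matches("hrtimer_*", n)])
```

Writing such a pattern with '>' would replace the filter list, while '>>' would append to it, as described further down.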
|
||||
|
||||
@@ -1187,7 +1195,7 @@ This is because the '>' and '>>' act just like they do in bash.
|
||||
To rewrite the filters, use '>'
|
||||
To append to the filters, use '>>'
|
||||
|
||||
To clear out a filter so that all functions will be recorded again.
|
||||
To clear out a filter so that all functions will be recorded again:
|
||||
|
||||
# echo > /debug/tracing/set_ftrace_filter
|
||||
# cat /debug/tracing/set_ftrace_filter
|
||||
@@ -1246,8 +1254,8 @@ ftraced
|
||||
|
||||
As mentioned above, when dynamic ftrace is configured in, a kernel
|
||||
thread wakes up once a second and checks to see if there are mcount
|
||||
calls that need to be converted into nops. If there is not, then
|
||||
it simply goes back to sleep. But if there is, it will call
|
||||
calls that need to be converted into nops. If there are not any, then
|
||||
it simply goes back to sleep. But if there are some, it will call
|
||||
kstop_machine to convert the calls to nops.
|
||||
|
||||
There may be a case that you do not want this added latency.
|
||||
@@ -1262,8 +1270,8 @@ mcount calls to nops. Remember that there's a large overhead
|
||||
to calling mcount. Without this kernel thread, that overhead will
|
||||
exist.
|
||||
|
||||
Any write to the ftraced_enabled file will cause the kstop_machine
|
||||
to run if there are recorded calls to mcount. This means that a
|
||||
If there are recorded calls to mcount, any write to the ftraced_enabled
|
||||
file will cause the kstop_machine to run. This means that a
|
||||
user can manually perform the updates when they want to by simply
|
||||
echoing a '0' into the ftraced_enabled file.
|
||||
|
||||
@@ -1315,7 +1323,7 @@ trace entries
|
||||
|
||||
Having too much or not enough data can be troublesome in diagnosing
|
||||
some issue in the kernel. The file trace_entries is used to modify
|
||||
the size of the internal trace buffers. The numbers listed
|
||||
the size of the internal trace buffers. The number listed
|
||||
is the number of entries that can be recorded per CPU. To know
|
||||
the full size, multiply the number of possible CPUS with the
|
||||
number of entries.
|
||||
@@ -1323,7 +1331,7 @@ number of entries.
|
||||
# cat /debug/tracing/trace_entries
|
||||
65620
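Since the number shown is per CPU, the full size follows by multiplication. A minimal Python sketch of that arithmetic; the CPU count of 4 is an assumed example value and total_trace_entries is a hypothetical helper.

```python
def total_trace_entries(entries_per_cpu: int, possible_cpus: int) -> int:
    """Total entries across all per-CPU trace buffers."""
    return entries_per_cpu * possible_cpus


# 65620 is the value printed above; 4 possible CPUs is an assumption.
print(total_trace_entries(65620, 4))
```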
|
||||
|
||||
Note, to modify this you must have tracing fulling disabled. To do that,
|
||||
Note, to modify this, you must have tracing completely disabled. To do that,
|
||||
echo "none" into the current_tracer.
|
||||
|
||||
# echo none > /debug/tracing/current_tracer
|
||||
@@ -1344,7 +1352,7 @@ it will add them.
|
||||
This shows us that 85 entries can fit on a single page.
|
||||
|
||||
The number of pages that will be allocated is a percentage of available
|
||||
memory. Allocating too much will produces an error.
|
||||
memory. Allocating too much will produce an error.
|
||||
|
||||
# echo 1000000000000 > /debug/tracing/trace_entries
|
||||
-bash: echo: write error: Cannot allocate memory
|
||||
|
||||
@@ -117,6 +117,7 @@ Code Seq# Include File Comments
|
||||
<mailto:natalia@nikhefk.nikhef.nl>
|
||||
'c' 00-7F linux/comstats.h conflict!
|
||||
'c' 00-7F linux/coda.h conflict!
|
||||
'c' 80-9F asm-s390/chsc.h
|
||||
'd' 00-FF linux/char/drm/drm/h conflict!
|
||||
'd' 00-DF linux/video_decoder.h conflict!
|
||||
'd' F0-FF linux/digi1.h
|
||||
|
||||
@@ -109,7 +109,7 @@ There are two possible methods of using Kdump.
|
||||
2) Or use the system kernel binary itself as dump-capture kernel and there is
|
||||
no need to build a separate dump-capture kernel. This is possible
|
||||
only with the architecutres which support a relocatable kernel. As
|
||||
of today i386 and ia64 architectures support relocatable kernel.
|
||||
of today, i386, x86_64 and ia64 architectures support relocatable kernel.
|
||||
|
||||
Building a relocatable kernel is advantageous from the point of view that
|
||||
one does not have to build a second kernel for capturing the dump. But
|
||||
|
||||
@@ -271,6 +271,17 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
aic79xx= [HW,SCSI]
|
||||
See Documentation/scsi/aic79xx.txt.
|
||||
|
||||
amd_iommu= [HW,X86-84]
|
||||
Pass parameters to the AMD IOMMU driver in the system.
|
||||
Possible values are:
|
||||
isolate - enable device isolation (each device, as far
|
||||
as possible, will get its own protection
|
||||
domain)
|
||||
amd_iommu_size= [HW,X86-64]
|
||||
Define the size of the aperture for the AMD IOMMU
|
||||
driver. Possible values are:
|
||||
'32M', '64M' (default), '128M', '256M', '512M', '1G'
|
||||
|
||||
amijoy.map= [HW,JOY] Amiga joystick support
|
||||
Map of devices attached to JOY0DAT and JOY1DAT
|
||||
Format: <a>,<b>
|
||||
@@ -599,6 +610,29 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
See drivers/char/README.epca and
|
||||
Documentation/digiepca.txt.
|
||||
|
||||
disable_mtrr_cleanup [X86]
|
||||
enable_mtrr_cleanup [X86]
|
||||
The kernel tries to adjust MTRR layout from continuous
|
||||
to discrete, to make X server driver able to add WB
|
||||
entry later. This parameter enables/disables that.
|
||||
|
||||
mtrr_chunk_size=nn[KMG] [X86]
|
||||
used for mtrr cleanup. It is largest continous chunk
|
||||
that could hold holes aka. UC entries.
|
||||
|
||||
mtrr_gran_size=nn[KMG] [X86]
|
||||
Used for mtrr cleanup. It is granularity of mtrr block.
|
||||
Default is 1.
|
||||
Large value could prevent small alignment from
|
||||
using up MTRRs.
|
||||
|
||||
mtrr_spare_reg_nr=n [X86]
|
||||
Format: <integer>
|
||||
Range: 0,7 : spare reg number
|
||||
Default : 1
|
||||
Used for mtrr cleanup. It is spare mtrr entries number.
|
||||
Set to 2 or more if your graphical card needs more.
|
||||
|
||||
disable_mtrr_trim [X86, Intel and AMD only]
|
||||
By default the kernel will trim any uncacheable
|
||||
memory out of your available memory pool based on
|
||||
@@ -1208,6 +1242,11 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
mtdparts= [MTD]
|
||||
See drivers/mtd/cmdlinepart.c.
|
||||
|
||||
mtdset= [ARM]
|
||||
ARM/S3C2412 JIVE boot control
|
||||
|
||||
See arch/arm/mach-s3c2412/mach-jive.c
|
||||
|
||||
mtouchusb.raw_coordinates=
|
||||
[HW] Make the MicroTouch USB driver use raw coordinates
|
||||
('y', default) or cooked coordinates ('n')
|
||||
@@ -2116,6 +2155,9 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
usbhid.mousepoll=
|
||||
[USBHID] The interval which mice are to be polled at.
|
||||
|
||||
add_efi_memmap [EFI; x86-32,X86-64] Include EFI memory map in
|
||||
kernel's map of available physical RAM.
|
||||
|
||||
vdso= [X86-32,SH,x86-64]
|
||||
vdso=2: enable compat VDSO (default with COMPAT_VDSO)
|
||||
vdso=1: enable VDSO (default)
|
||||
|
||||
@@ -10,7 +10,7 @@ us to generate 'watchdog NMI interrupts'. (NMI: Non Maskable Interrupt
|
||||
which get executed even if the system is otherwise locked up hard).
|
||||
This can be used to debug hard kernel lockups. By executing periodic
|
||||
NMI interrupts, the kernel can monitor whether any CPU has locked up,
|
||||
and print out debugging messages if so.
|
||||
and print out debugging messages if so.
|
||||
|
||||
In order to use the NMI watchdog, you need to have APIC support in your
|
||||
kernel. For SMP kernels, APIC support gets compiled in automatically. For
|
||||
@@ -22,8 +22,7 @@ CONFIG_X86_UP_IOAPIC is for uniprocessor with an IO-APIC. [Note: certain
|
||||
kernel debugging options, such as Kernel Stack Meter or Kernel Tracer,
|
||||
may implicitly disable the NMI watchdog.]
|
||||
|
||||
For x86-64, the needed APIC is always compiled in, and the NMI watchdog is
|
||||
always enabled with I/O-APIC mode (nmi_watchdog=1).
|
||||
For x86-64, the needed APIC is always compiled in.
|
||||
|
||||
Using local APIC (nmi_watchdog=2) needs the first performance register, so
|
||||
you can't use it for other purposes (such as high precision performance
|
||||
@@ -63,16 +62,15 @@ when the system is idle), but if your system locks up on anything but the
|
||||
"hlt", then you are out of luck -- the event will not happen at all and the
|
||||
watchdog won't trigger. This is a shortcoming of the local APIC watchdog
|
||||
-- unfortunately there is no "clock ticks" event that would work all the
|
||||
time. The I/O APIC watchdog is driven externally and has no such shortcoming.
|
||||
time. The I/O APIC watchdog is driven externally and has no such shortcoming.
|
||||
But its NMI frequency is much higher, resulting in a more significant hit
|
||||
to the overall system performance.
|
||||
|
||||
NOTE: starting with 2.4.2-ac18 the NMI-oopser is disabled by default,
|
||||
you have to enable it with a boot time parameter. Prior to 2.4.2-ac18
|
||||
the NMI-oopser is enabled unconditionally on x86 SMP boxes.
|
||||
On x86 nmi_watchdog is disabled by default so you have to enable it with
|
||||
a boot time parameter.
|
||||
|
||||
On x86-64 the NMI oopser is on by default. On 64bit Intel CPUs
|
||||
it uses IO-APIC by default and on AMD it uses local APIC.
|
||||
NOTE: In kernels prior to 2.4.2-ac18 the NMI-oopser is enabled unconditionally
|
||||
on x86 SMP boxes.
|
||||
|
||||
[ feel free to send bug reports, suggestions and patches to
|
||||
Ingo Molnar <mingo@redhat.com> or the Linux SMP mailing
|
||||
|
||||
@@ -61,10 +61,7 @@ builder by #define'ing ARCH_HASH_SCHED_DOMAIN, and exporting your
|
||||
arch_init_sched_domains function. This function will attach domains to all
|
||||
CPUs using cpu_attach_domain.
|
||||
|
||||
Implementors should change the line
|
||||
#undef SCHED_DOMAIN_DEBUG
|
||||
to
|
||||
#define SCHED_DOMAIN_DEBUG
|
||||
in kernel/sched.c as this enables an error checking parse of the sched domains
|
||||
The sched-domains debugging infrastructure can be enabled by enabling
|
||||
CONFIG_SCHED_DEBUG. This enables an error checking parse of the sched domains
|
||||
which should catch most possible errors (described above). It also prints out
|
||||
the domain structure in a visual format.
|
||||
|
||||
@@ -51,9 +51,9 @@ needs only about 3% CPU time to do so, it can do with a 0.03 * 0.005s =
|
||||
0.00015s. So this group can be scheduled with a period of 0.005s and a run time
|
||||
of 0.00015s.
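The run time in this example is simply the period multiplied by the CPU share the group needs. A minimal Python sketch of that calculation, working in microseconds to avoid floating-point noise; rt_runtime_us is a hypothetical helper, not a kernel interface.

```python
def rt_runtime_us(period_us: int, cpu_fraction: float) -> int:
    """Run time (in microseconds) a group needs per period."""
    return round(period_us * cpu_fraction)


period_us = 5000                       # 0.005s, as in the example above
runtime_us = rt_runtime_us(period_us, 0.03)
print(period_us, runtime_us)           # 0.00015s is 150 microseconds
```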
|
||||
|
||||
The remaining CPU time will be used for user input and other tass. Because
|
||||
The remaining CPU time will be used for user input and other tasks. Because
|
||||
realtime tasks have explicitly allocated the CPU time they need to perform
|
||||
their tasks, buffer underruns in the graphocs or audio can be eliminated.
|
||||
their tasks, buffer underruns in the graphics or audio can be eliminated.
|
||||
|
||||
NOTE: the above example is not fully implemented as of yet (2.6.25). We still
|
||||
lack an EDF scheduler to make non-uniform periods usable.
|
||||
|
||||
@@ -753,8 +753,11 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
|
||||
|
||||
[Multiple options for each card instance]
|
||||
model - force the model name
|
||||
position_fix - Fix DMA pointer (0 = auto, 1 = none, 2 = POSBUF, 3 = FIFO size)
|
||||
position_fix - Fix DMA pointer (0 = auto, 1 = use LPIB, 2 = POSBUF)
|
||||
probe_mask - Bitmask to probe codecs (default = -1, meaning all slots)
|
||||
bdl_pos_adj - Specifies the DMA IRQ timing delay in samples.
|
||||
Passing -1 will make the driver to choose the appropriate
|
||||
value based on the controller chip.
|
||||
|
||||
[Single (global) options]
|
||||
single_cmd - Use single immediate commands to communicate with
|
||||
@@ -845,7 +848,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
|
||||
ALC269
|
||||
basic Basic preset
|
||||
|
||||
ALC662
|
||||
ALC662/663
|
||||
3stack-dig 3-stack (2-channel) with SPDIF
|
||||
3stack-6ch 3-stack (6-channel)
|
||||
3stack-6ch-dig 3-stack (6-channel) with SPDIF
|
||||
@@ -853,6 +856,10 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
|
||||
lenovo-101e Lenovo laptop
|
||||
eeepc-p701 ASUS Eeepc P701
|
||||
eeepc-ep20 ASUS Eeepc EP20
|
||||
m51va ASUS M51VA
|
||||
g71v ASUS G71V
|
||||
h13 ASUS H13
|
||||
g50v ASUS G50V
|
||||
auto auto-config reading BIOS (default)
|
||||
|
||||
ALC882/885
|
||||
@@ -1091,7 +1098,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
|
||||
This occurs when the access to non-existing or non-working codec slot
|
||||
(likely a modem one) causes a stall of the communication via HD-audio
|
||||
bus. You can see which codec slots are probed by enabling
|
||||
CONFIG_SND_DEBUG_DETECT, or simply from the file name of the codec
|
||||
CONFIG_SND_DEBUG_VERBOSE, or simply from the file name of the codec
|
||||
proc files. Then limit the slots to probe by probe_mask option.
|
||||
For example, probe_mask=1 means to probe only the first slot, and
|
||||
probe_mask=4 means only the third slot.
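The mask is read bit by bit, with bit N selecting slot N (counting from zero). A minimal Python sketch, assuming 8 candidate slots purely for illustration; probed_slots is a hypothetical helper and the real slot count depends on the controller.

```python
from typing import List


def probed_slots(probe_mask: int, num_slots: int = 8) -> List[int]:
    """Zero-based codec slots selected by a probe_mask value.

    num_slots=8 is an assumption made for this sketch.
    """
    return [slot for slot in range(num_slots) if (probe_mask >> slot) & 1]


print(probed_slots(1))    # first slot only
print(probed_slots(4))    # third slot only
print(probed_slots(-1))   # default: every slot
```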
|
||||
@@ -2267,6 +2274,10 @@ case above again, the first two slots are already reserved. If any
|
||||
other driver (e.g. snd-usb-audio) is loaded before snd-interwave or
|
||||
snd-ens1371, it will be assigned to the third or later slot.
|
||||
|
||||
When a module name is given with '!', the slot will be given for any
|
||||
modules but that name. For example, "slots=!snd-pcsp" will reserve
|
||||
the first slot for any modules but snd-pcsp.
|
||||
|
||||
|
||||
ALSA PCM devices to OSS devices mapping
|
||||
=======================================
|
||||
|
||||
@@ -6127,8 +6127,8 @@ struct _snd_pcm_runtime {
|
||||
|
||||
<para>
|
||||
<function>snd_printdd()</function> is compiled in only when
|
||||
<constant>CONFIG_SND_DEBUG_DETECT</constant> is set. Please note
|
||||
that <constant>DEBUG_DETECT</constant> is not set as default
|
||||
<constant>CONFIG_SND_DEBUG_VERBOSE</constant> is set. Please note
|
||||
that <constant>CONFIG_SND_DEBUG_VERBOSE</constant> is not set as default
|
||||
even if you configure the alsa-driver with
|
||||
<option>--with-debug=full</option> option. You need to give
|
||||
explicitly <option>--with-debug=detect</option> option instead.
|
||||
|
||||
@@ -0,0 +1,164 @@
|
||||
In-kernel memory-mapped I/O tracing
|
||||
|
||||
|
||||
Home page and links to optional user space tools:
|
||||
|
||||
http://nouveau.freedesktop.org/wiki/MmioTrace
|
||||
|
||||
MMIO tracing was originally developed by Intel around 2003 for their Fault
|
||||
Injection Test Harness. In Dec 2006 - Jan 2007, using the code from Intel,
|
||||
Jeff Muizelaar created a tool for tracing MMIO accesses with the Nouveau
|
||||
project in mind. Since then many people have contributed.
|
||||
|
||||
Mmiotrace was built for reverse engineering any memory-mapped IO device with
|
||||
the Nouveau project as the first real user. Only x86 and x86_64 architectures
|
||||
are supported.
|
||||
|
||||
Out-of-tree mmiotrace was originally modified for mainline inclusion and
|
||||
ftrace framework by Pekka Paalanen <pq@iki.fi>.
|
||||
|
||||
|
||||
Preparation
|
||||
-----------
|
||||
|
||||
Mmiotrace feature is compiled in by the CONFIG_MMIOTRACE option. Tracing is
|
||||
disabled by default, so it is safe to have this set to yes. SMP systems are
|
||||
supported, but tracing is unreliable and may miss events if more than one CPU
|
||||
is on-line, therefore mmiotrace takes all but one CPU off-line during run-time
|
||||
activation. You can re-enable CPUs by hand, but you have been warned, there
|
||||
is no way to automatically detect if you are losing events due to CPUs racing.
|
||||
|
||||
|
||||
Usage Quick Reference
|
||||
---------------------
|
||||
|
||||
$ mount -t debugfs debugfs /debug
$ echo mmiotrace > /debug/tracing/current_tracer
$ cat /debug/tracing/trace_pipe > mydump.txt &
Start X or whatever.
$ echo "X is up" > /debug/tracing/marker
$ echo none > /debug/tracing/current_tracer
Check for lost events.
|
||||
|
||||
|
||||
Usage
|
||||
-----
|
||||
|
||||
Make sure debugfs is mounted to /debug. If not, (requires root privileges)
|
||||
$ mount -t debugfs debugfs /debug
|
||||
|
||||
Check that the driver you are about to trace is not loaded.
|
||||
|
||||
Activate mmiotrace (requires root privileges):
|
||||
$ echo mmiotrace > /debug/tracing/current_tracer
|
||||
|
||||
Start storing the trace:
|
||||
$ cat /debug/tracing/trace_pipe > mydump.txt &
|
||||
The 'cat' process should stay running (sleeping) in the background.
|
||||
|
||||
Load the driver you want to trace and use it. Mmiotrace will only catch MMIO
|
||||
accesses to areas that are ioremapped while mmiotrace is active.
|
||||
|
||||
[Unimplemented feature:]
|
||||
During tracing you can place comments (markers) into the trace by
|
||||
$ echo "X is up" > /debug/tracing/marker
|
||||
This makes it easier to see which part of the (huge) trace corresponds to
|
||||
which action. It is recommended to place descriptive markers about what you
|
||||
do.
|
||||
|
||||
Shut down mmiotrace (requires root privileges):
|
||||
$ echo none > /debug/tracing/current_tracer
|
||||
The 'cat' process exits. If it does not, kill it by issuing 'fg' command and
|
||||
pressing ctrl+c.
|
||||
|
||||
Check that mmiotrace did not lose events due to a buffer filling up. Either
|
||||
$ grep -i lost mydump.txt
|
||||
which tells you exactly how many events were lost, or use
|
||||
$ dmesg
|
||||
to view your kernel log and look for "mmiotrace has lost events" warning. If
|
||||
events were lost, the trace is incomplete. You should enlarge the buffers and
|
||||
try again. Buffers are enlarged by first seeing how large the current buffers
|
||||
are:
|
||||
$ cat /debug/tracing/trace_entries
|
||||
gives you a number. Approximately double this number and write it back, for
|
||||
instance:
|
||||
$ echo 128000 > /debug/tracing/trace_entries
|
||||
Then start again from the top.
|
||||
|
||||
If you are doing a trace for a driver project, e.g. Nouveau, you should also
|
||||
do the following before sending your results:
|
||||
$ lspci -vvv > lspci.txt
|
||||
$ dmesg > dmesg.txt
|
||||
$ tar zcf pciid-nick-mmiotrace.tar.gz mydump.txt lspci.txt dmesg.txt
|
||||
and then send the .tar.gz file. The trace compresses considerably. Replace
|
||||
"pciid" and "nick" with the PCI ID or model name of your piece of hardware
|
||||
under investigation and your nick name.
|
||||
|
||||
|
||||
How Mmiotrace Works
|
||||
-------------------
|
||||
|
||||
Access to hardware IO-memory is gained by mapping addresses from PCI bus by
|
||||
calling one of the ioremap_*() functions. Mmiotrace is hooked into the
|
||||
__ioremap() function and gets called whenever a mapping is created. Mapping is
|
||||
an event that is recorded into the trace log. Note, that ISA range mappings
|
||||
are not caught, since the mapping always exists and is returned directly.
|
||||
|
||||
MMIO accesses are recorded via page faults. Just before __ioremap() returns,
|
||||
the mapped pages are marked as not present. Any access to the pages causes a
|
||||
fault. The page fault handler calls mmiotrace to handle the fault. Mmiotrace
|
||||
marks the page present, sets TF flag to achieve single stepping and exits the
|
||||
fault handler. The instruction that faulted is executed and debug trap is
|
||||
entered. Here mmiotrace again marks the page as not present. The instruction
|
||||
is decoded to get the type of operation (read/write), data width and the value
|
||||
read or written. These are stored to the trace log.
|
||||
|
||||
Setting the page present in the page fault handler has a race condition on SMP
|
||||
machines. During the single stepping other CPUs may run freely on that page
|
||||
and events can be missed without a notice. Re-enabling other CPUs during
|
||||
tracing is discouraged.
|
||||
|
||||
|
||||
Trace Log Format
|
||||
----------------
|
||||
|
||||
The raw log is text and easily filtered with e.g. grep and awk. One record is
|
||||
one line in the log. A record starts with a keyword, followed by keyword
|
||||
dependant arguments. Arguments are separated by a space, or continue until the
|
||||
end of line. The format for version 20070824 is as follows:
|
||||
|
||||
Explanation      Keyword  Space separated arguments
---------------------------------------------------------------------------

read event       R        width, timestamp, map id, physical, value, PC, PID
write event      W        width, timestamp, map id, physical, value, PC, PID
ioremap event    MAP      timestamp, map id, physical, virtual, length, PC, PID
iounmap event    UNMAP    timestamp, map id, PC, PID
marker           MARK     timestamp, text
version          VERSION  the string "20070824"
info for reader  LSPCI    one line from lspci -v
PCI address map  PCIDEV   space separated /proc/bus/pci/devices data
unk. opcode      UNKNOWN  timestamp, map id, physical, data, PC, PID
|
||||
|
||||
Timestamp is in seconds with decimals. Physical is a PCI bus address, virtual
|
||||
is a kernel virtual address. Width is the data width in bytes and value is the
|
||||
data value. Map id is an arbitrary id number identifying the mapping that was
|
||||
used in an operation. PC is the program counter and PID is process id. PC is
|
||||
zero if it is not recorded. PID is always zero as tracing MMIO accesses
|
||||
originating in user space memory is not yet supported.
|
||||
|
||||
For instance, the following awk filter will pass all 32-bit writes that target
|
||||
physical addresses in the range [0xfb73ce40, 0xfb800000[
|
||||
|
||||
$ awk '/W 4 / { adr=strtonum($5); if (adr >= 0xfb73ce40 &&
adr < 0xfb800000) print; }'
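The same filter can be sketched in Python for readers who prefer it over awk. This is a minimal sketch that follows the R/W record layout in the table above; it assumes that the physical, value and PC fields are hexadecimal and that the map id and PID are decimal. The sample lines are invented for illustration.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Access(NamedTuple):
    kind: str         # "R" or "W"
    width: int        # data width in bytes
    timestamp: float  # seconds with decimals
    map_id: int
    physical: int
    value: int
    pc: int
    pid: int


def parse_accesses(lines: Iterable[str]) -> Iterator[Access]:
    """Yield read/write records, skipping every other keyword."""
    for line in lines:
        fields = line.split()
        if len(fields) != 8 or fields[0] not in ("R", "W"):
            continue
        kind, width, ts, map_id, phys, value, pc, pid = fields
        # Assumption: physical, value and PC are hexadecimal strings.
        yield Access(kind, int(width), float(ts), int(map_id),
                     int(phys, 16), int(value, 16), int(pc, 16), int(pid))


def writes_in_range(lines: Iterable[str], lo: int, hi: int,
                    width: int = 4) -> List[Access]:
    """Writes of `width` bytes with lo <= physical < hi."""
    return [a for a in parse_accesses(lines)
            if a.kind == "W" and a.width == width and lo <= a.physical < hi]


# Invented sample records, not taken from a real trace.
sample = [
    "W 4 1.000001 1 0xfb73ce40 0x1 0x0 0",
    "W 4 1.000002 1 0xfb800000 0x2 0x0 0",
    "R 4 1.000003 1 0xfb73ce44 0x3 0x0 0",
    "MARK 1.000004 X is up",
]
for access in writes_in_range(sample, 0xfb73ce40, 0xfb800000):
    print(hex(access.physical), hex(access.value))
```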
|
||||
|
||||
|
||||
Tools for Developers
|
||||
--------------------
|
||||
|
||||
The user space tools include utilities for:
|
||||
- replacing numeric addresses and values with hardware register names
|
||||
- replaying MMIO logs, i.e., re-executing the recorded writes
|
||||
|
||||
|
||||
@@ -1,17 +1,14 @@
|
||||
THE LINUX/I386 BOOT PROTOCOL
|
||||
----------------------------
|
||||
THE LINUX/x86 BOOT PROTOCOL
|
||||
---------------------------
|
||||
|
||||
H. Peter Anvin <hpa@zytor.com>
|
||||
Last update 2007-05-23
|
||||
|
||||
On the i386 platform, the Linux kernel uses a rather complicated boot
|
||||
On the x86 platform, the Linux kernel uses a rather complicated boot
|
||||
convention. This has evolved partially due to historical aspects, as
|
||||
well as the desire in the early days to have the kernel itself be a
|
||||
bootable image, the complicated PC memory model and due to changed
|
||||
expectations in the PC industry caused by the effective demise of
|
||||
real-mode DOS as a mainstream operating system.
|
||||
|
||||
Currently, the following versions of the Linux/i386 boot protocol exist.
|
||||
Currently, the following versions of the Linux/x86 boot protocol exist.
|
||||
|
||||
Old kernels: zImage/Image support only. Some very early kernels
|
||||
may not even support a command line.
|
||||
@@ -372,10 +369,17 @@ Protocol: 2.00+
|
||||
- If 0, the protected-mode code is loaded at 0x10000.
|
||||
- If 1, the protected-mode code is loaded at 0x100000.
|
||||
|
||||
Bit 5 (write): QUIET_FLAG
|
||||
- If 0, print early messages.
|
||||
- If 1, suppress early messages.
|
||||
This requests to the kernel (decompressor and early
|
||||
kernel) to not write early messages that require
|
||||
accessing the display hardware directly.
|
||||
|
||||
Bit 6 (write): KEEP_SEGMENTS
|
||||
Protocol: 2.07+
|
||||
- if 0, reload the segment registers in the 32bit entry point.
|
||||
- if 1, do not reload the segment registers in the 32bit entry point.
|
||||
- If 0, reload the segment registers in the 32bit entry point.
|
||||
- If 1, do not reload the segment registers in the 32bit entry point.
|
||||
Assume that %cs %ds %ss %es are all set to flat segments with
|
||||
a base of 0 (or the equivalent for their environment).
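For a boot loader author, the two writable bits above amount to plain bit operations on the loadflags byte. A minimal Python sketch; the constant and helper names are chosen here for illustration, and loadflags is treated as a single byte.

```python
QUIET_FLAG = 1 << 5      # bit 5: suppress early messages
KEEP_SEGMENTS = 1 << 6   # bit 6: do not reload segment registers (2.07+)


def set_flag(loadflags: int, flag: int, enabled: bool) -> int:
    """Return loadflags with `flag` set or cleared, kept within one byte."""
    if enabled:
        return (loadflags | flag) & 0xFF
    return loadflags & ~flag & 0xFF


flags = 0x01                                  # assumed starting value
flags = set_flag(flags, QUIET_FLAG, True)
print(hex(flags), bool(flags & KEEP_SEGMENTS))
```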
|
||||
|
||||
@@ -504,7 +508,7 @@ Protocol: 2.06+
|
||||
maximum size was 255.
|
||||
|
||||
Field name: hardware_subarch
|
||||
Type: write
|
||||
Type: write (optional, defaults to x86/PC)
|
||||
Offset/size: 0x23c/4
|
||||
Protocol: 2.07+
|
||||
|
||||
@@ -520,11 +524,13 @@ Protocol: 2.07+
|
||||
0x00000002 Xen
|
||||
|
||||
Field name: hardware_subarch_data
|
||||
Type: write
|
||||
Type: write (subarch-dependent)
|
||||
Offset/size: 0x240/8
|
||||
Protocol: 2.07+
|
||||
|
||||
A pointer to data that is specific to hardware subarch
|
||||
This field is currently unused for the default x86/PC environment,
|
||||
do not modify.
|
||||
|
||||
Field name: payload_offset
|
||||
Type: read
|
||||
@@ -545,6 +551,34 @@ Protocol: 2.08+
|
||||
|
||||
The length of the payload.
|
||||
|
||||
Field name: setup_data
|
||||
Type: write (special)
|
||||
Offset/size: 0x250/8
|
||||
Protocol: 2.09+
|
||||
|
||||
The 64-bit physical pointer to NULL terminated single linked list of
|
||||
struct setup_data. This is used to define a more extensible boot
|
||||
parameters passing mechanism. The definition of struct setup_data is
|
||||
as follow:
|
||||
|
||||
struct setup_data {
	u64 next;
	u32 type;
	u32 len;
	u8 data[0];
};
|
||||
|
||||
Where, the next is a 64-bit physical pointer to the next node of
|
||||
linked list, the next field of the last node is 0; the type is used
|
||||
to identify the contents of data; the len is the length of data
|
||||
field; the data holds the real payload.
|
||||
|
||||
This list may be modified at a number of points during the bootup
|
||||
process. Therefore, when modifying this list one should always make
|
||||
sure to consider the case where the linked list already contains
|
||||
entries.
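The point about existing entries can be illustrated with a small simulation. This is a minimal Python sketch that models physical memory as a dictionary from address to bytes and assumes a little-endian header of u64 next, u32 type, u32 len; the addresses, type values and helper names are hypothetical.

```python
import struct

# Header layout: u64 next, u32 type, u32 len (little-endian assumed).
HEADER = struct.Struct("<QII")


def pack_node(next_addr: int, type_: int, payload: bytes) -> bytes:
    """Serialize one setup_data node."""
    return HEADER.pack(next_addr, type_, len(payload)) + payload


def walk(memory: dict, head: int):
    """Yield (address, type, payload) for each node until next == 0."""
    addr = head
    while addr != 0:
        next_addr, type_, length = HEADER.unpack_from(memory[addr])
        yield addr, type_, memory[addr][HEADER.size:HEADER.size + length]
        addr = next_addr


def append_node(memory: dict, head: int, new_addr: int,
                type_: int, payload: bytes) -> int:
    """Append a node without dropping existing ones; return the head."""
    memory[new_addr] = pack_node(0, type_, payload)
    if head == 0:
        return new_addr                 # list was empty
    tail = head
    for addr, _, _ in walk(memory, head):
        tail = addr                     # find the last existing node
    _, tail_type, tail_len = HEADER.unpack_from(memory[tail])
    memory[tail] = (HEADER.pack(new_addr, tail_type, tail_len)
                    + memory[tail][HEADER.size:])
    return head


memory = {}
head = 0
head = append_node(memory, head, 0x1000, 1, b"abc")
head = append_node(memory, head, 0x2000, 2, b"xy")
print([(hex(a), t, p) for a, t, p in walk(memory, head)])
```

Appending at the tail, rather than overwriting the head pointer, is what keeps entries added earlier in the boot process intact.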
|
||||
|
||||
|
||||
**** THE IMAGE CHECKSUM
|
||||
|
||||
From boot protocol version 2.08 onwards the CRC-32 is calculated over
|
||||
@@ -553,6 +587,7 @@ initial remainder of 0xffffffff. The checksum is appended to the
|
||||
file; therefore the CRC of the file up to the limit specified in the
|
||||
syssize field of the header is always 0.
|
||||
|
||||
|
||||
**** THE KERNEL COMMAND LINE
|
||||
|
||||
The kernel command line has become an important way for the boot
|
||||
@@ -584,28 +619,6 @@ command line is entered using the following protocol:
|
||||
covered by setup_move_size, so you may need to adjust this
|
||||
field.
|
||||
|
||||
Field name: setup_data
|
||||
Type: write (obligatory)
|
||||
Offset/size: 0x250/8
|
||||
Protocol: 2.09+
|
||||
|
||||
The 64-bit physical pointer to NULL terminated single linked list of
|
||||
struct setup_data. This is used to define a more extensible boot
|
||||
parameters passing mechanism. The definition of struct setup_data is
|
||||
as follow:
|
||||
|
||||
struct setup_data {
|
||||
u64 next;
|
||||
u32 type;
|
||||
u32 len;
|
||||
u8 data[0];
|
||||
};
|
||||
|
||||
Where, the next is a 64-bit physical pointer to the next node of
|
||||
linked list, the next field of the last node is 0; the type is used
|
||||
to identify the contents of data; the len is the length of data
|
||||
field; the data holds the real payload.
|
||||
|
||||
|
||||
**** MEMORY LAYOUT OF THE REAL-MODE CODE
|
||||
|
||||
Some files were not shown because too many files have changed in this diff.