Merge commit '85082fd7cbe3173198aac0eb5e85ab1edcc6352c' into test-build

Manual fixup of:

	arch/powerpc/Kconfig
This commit is contained in:
Benjamin Herrenschmidt
2008-07-15 15:44:51 +10:00
1806 changed files with 120835 additions and 38598 deletions
+34
View File
@@ -26,3 +26,37 @@ Description:
I/O statistics of partition <part>. The format is the
same as the above-written /sys/block/<disk>/stat
format.
What: /sys/block/<disk>/integrity/format
Date: June 2008
Contact: Martin K. Petersen <martin.petersen@oracle.com>
Description:
Metadata format for integrity capable block device.
E.g. T10-DIF-TYPE1-CRC.
What: /sys/block/<disk>/integrity/read_verify
Date: June 2008
Contact: Martin K. Petersen <martin.petersen@oracle.com>
Description:
Indicates whether the block layer should verify the
integrity of read requests serviced by devices that
support sending integrity metadata.
What: /sys/block/<disk>/integrity/tag_size
Date: June 2008
Contact: Martin K. Petersen <martin.petersen@oracle.com>
Description:
Number of bytes of integrity tag space available per
512 bytes of data.
What: /sys/block/<disk>/integrity/write_generate
Date: June 2008
Contact: Martin K. Petersen <martin.petersen@oracle.com>
Description:
Indicates whether the block layer should automatically
generate checksums for write requests bound for
devices that support receiving integrity metadata.
+35
View File
@@ -0,0 +1,35 @@
What: /sys/bus/css/devices/.../type
Date: March 2008
Contact: Cornelia Huck <cornelia.huck@de.ibm.com>
linux-s390@vger.kernel.org
Description: Contains the subchannel type, as reported by the hardware.
This attribute is present for all subchannel types.
What: /sys/bus/css/devices/.../modalias
Date: March 2008
Contact: Cornelia Huck <cornelia.huck@de.ibm.com>
linux-s390@vger.kernel.org
Description: Contains the module alias as reported with uevents.
It is of the format css:t<type> and present for all
subchannel types.
What: /sys/bus/css/drivers/io_subchannel/.../chpids
Date: December 2002
Contact: Cornelia Huck <cornelia.huck@de.ibm.com>
linux-s390@vger.kernel.org
Description: Contains the ids of the channel paths used by this
subchannel, as reported by the channel subsystem
during subchannel recognition.
Note: This is an I/O-subchannel specific attribute.
Users: s390-tools, HAL
What: /sys/bus/css/drivers/io_subchannel/.../pimpampom
Date: December 2002
Contact: Cornelia Huck <cornelia.huck@de.ibm.com>
linux-s390@vger.kernel.org
Description: Contains the PIM/PAM/POM values, as reported by the
channel subsystem when last queried by the common I/O
layer (this implies that this attribute is not neccessarily
in sync with the values current in the channel subsystem).
Note: This is an I/O-subchannel specific attribute.
Users: s390-tools, HAL
@@ -0,0 +1,71 @@
What: /sys/firmware/memmap/
Date: June 2008
Contact: Bernhard Walle <bwalle@suse.de>
Description:
On all platforms, the firmware provides a memory map which the
kernel reads. The resources from that memory map are registered
in the kernel resource tree and exposed to userspace via
/proc/iomem (together with other resources).
However, on most architectures that firmware-provided memory
map is modified afterwards by the kernel itself, either because
the kernel merges that memory map with other information or
just because the user overwrites that memory map via command
line.
kexec needs the raw firmware-provided memory map to setup the
parameter segment of the kernel that should be booted with
kexec. Also, the raw memory map is useful for debugging. For
that reason, /sys/firmware/memmap is an interface that provides
the raw memory map to userspace.
The structure is as follows: Under /sys/firmware/memmap there
are subdirectories with the number of the entry as their name:
/sys/firmware/memmap/0
/sys/firmware/memmap/1
/sys/firmware/memmap/2
/sys/firmware/memmap/3
...
The maximum depends on the number of memory map entries provided
by the firmware. The order is just the order that the firmware
provides.
Each directory contains three files:
start : The start address (as hexadecimal number with the
'0x' prefix).
end : The end address, inclusive (regardless whether the
firmware provides inclusive or exclusive ranges).
type : Type of the entry as string. See below for a list of
valid types.
So, for example:
/sys/firmware/memmap/0/start
/sys/firmware/memmap/0/end
/sys/firmware/memmap/0/type
/sys/firmware/memmap/1/start
...
Currently following types exist:
- System RAM
- ACPI Tables
- ACPI Non-volatile Storage
- reserved
Following shell snippet can be used to display that memory
map in a human-readable format:
-------------------- 8< ----------------------------------------
#!/bin/bash
cd /sys/firmware/memmap
for dir in * ; do
start=$(cat $dir/start)
end=$(cat $dir/end)
type=$(cat $dir/type)
printf "%016x-%016x (%s)\n" $start $[ $end +1] "$type"
done
-------------------- >8 ----------------------------------------
+327
View File
@@ -0,0 +1,327 @@
----------------------------------------------------------------------
1. INTRODUCTION
Modern filesystems feature checksumming of data and metadata to
protect against data corruption. However, the detection of the
corruption is done at read time which could potentially be months
after the data was written. At that point the original data that the
application tried to write is most likely lost.
The solution is to ensure that the disk is actually storing what the
application meant it to. Recent additions to both the SCSI family
protocols (SBC Data Integrity Field, SCC protection proposal) as well
as SATA/T13 (External Path Protection) try to remedy this by adding
support for appending integrity metadata to an I/O. The integrity
metadata (or protection information in SCSI terminology) includes a
checksum for each sector as well as an incrementing counter that
ensures the individual sectors are written in the right order. And
for some protection schemes also that the I/O is written to the right
place on disk.
Current storage controllers and devices implement various protective
measures, for instance checksumming and scrubbing. But these
technologies are working in their own isolated domains or at best
between adjacent nodes in the I/O path. The interesting thing about
DIF and the other integrity extensions is that the protection format
is well defined and every node in the I/O path can verify the
integrity of the I/O and reject it if corruption is detected. This
allows not only corruption prevention but also isolation of the point
of failure.
----------------------------------------------------------------------
2. THE DATA INTEGRITY EXTENSIONS
As written, the protocol extensions only protect the path between
controller and storage device. However, many controllers actually
allow the operating system to interact with the integrity metadata
(IMD). We have been working with several FC/SAS HBA vendors to enable
the protection information to be transferred to and from their
controllers.
The SCSI Data Integrity Field works by appending 8 bytes of protection
information to each sector. The data + integrity metadata is stored
in 520 byte sectors on disk. Data + IMD are interleaved when
transferred between the controller and target. The T13 proposal is
similar.
Because it is highly inconvenient for operating systems to deal with
520 (and 4104) byte sectors, we approached several HBA vendors and
encouraged them to allow separation of the data and integrity metadata
scatter-gather lists.
The controller will interleave the buffers on write and split them on
read. This means that the Linux can DMA the data buffers to and from
host memory without changes to the page cache.
Also, the 16-bit CRC checksum mandated by both the SCSI and SATA specs
is somewhat heavy to compute in software. Benchmarks found that
calculating this checksum had a significant impact on system
performance for a number of workloads. Some controllers allow a
lighter-weight checksum to be used when interfacing with the operating
system. Emulex, for instance, supports the TCP/IP checksum instead.
The IP checksum received from the OS is converted to the 16-bit CRC
when writing and vice versa. This allows the integrity metadata to be
generated by Linux or the application at very low cost (comparable to
software RAID5).
The IP checksum is weaker than the CRC in terms of detecting bit
errors. However, the strength is really in the separation of the data
buffers and the integrity metadata. These two distinct buffers much
match up for an I/O to complete.
The separation of the data and integrity metadata buffers as well as
the choice in checksums is referred to as the Data Integrity
Extensions. As these extensions are outside the scope of the protocol
bodies (T10, T13), Oracle and its partners are trying to standardize
them within the Storage Networking Industry Association.
----------------------------------------------------------------------
3. KERNEL CHANGES
The data integrity framework in Linux enables protection information
to be pinned to I/Os and sent to/received from controllers that
support it.
The advantage to the integrity extensions in SCSI and SATA is that
they enable us to protect the entire path from application to storage
device. However, at the same time this is also the biggest
disadvantage. It means that the protection information must be in a
format that can be understood by the disk.
Generally Linux/POSIX applications are agnostic to the intricacies of
the storage devices they are accessing. The virtual filesystem switch
and the block layer make things like hardware sector size and
transport protocols completely transparent to the application.
However, this level of detail is required when preparing the
protection information to send to a disk. Consequently, the very
concept of an end-to-end protection scheme is a layering violation.
It is completely unreasonable for an application to be aware whether
it is accessing a SCSI or SATA disk.
The data integrity support implemented in Linux attempts to hide this
from the application. As far as the application (and to some extent
the kernel) is concerned, the integrity metadata is opaque information
that's attached to the I/O.
The current implementation allows the block layer to automatically
generate the protection information for any I/O. Eventually the
intent is to move the integrity metadata calculation to userspace for
user data. Metadata and other I/O that originates within the kernel
will still use the automatic generation interface.
Some storage devices allow each hardware sector to be tagged with a
16-bit value. The owner of this tag space is the owner of the block
device. I.e. the filesystem in most cases. The filesystem can use
this extra space to tag sectors as they see fit. Because the tag
space is limited, the block interface allows tagging bigger chunks by
way of interleaving. This way, 8*16 bits of information can be
attached to a typical 4KB filesystem block.
This also means that applications such as fsck and mkfs will need
access to manipulate the tags from user space. A passthrough
interface for this is being worked on.
----------------------------------------------------------------------
4. BLOCK LAYER IMPLEMENTATION DETAILS
4.1 BIO
The data integrity patches add a new field to struct bio when
CONFIG_BLK_DEV_INTEGRITY is enabled. bio->bi_integrity is a pointer
to a struct bip which contains the bio integrity payload. Essentially
a bip is a trimmed down struct bio which holds a bio_vec containing
the integrity metadata and the required housekeeping information (bvec
pool, vector count, etc.)
A kernel subsystem can enable data integrity protection on a bio by
calling bio_integrity_alloc(bio). This will allocate and attach the
bip to the bio.
Individual pages containing integrity metadata can subsequently be
attached using bio_integrity_add_page().
bio_free() will automatically free the bip.
4.2 BLOCK DEVICE
Because the format of the protection data is tied to the physical
disk, each block device has been extended with a block integrity
profile (struct blk_integrity). This optional profile is registered
with the block layer using blk_integrity_register().
The profile contains callback functions for generating and verifying
the protection data, as well as getting and setting application tags.
The profile also contains a few constants to aid in completing,
merging and splitting the integrity metadata.
Layered block devices will need to pick a profile that's appropriate
for all subdevices. blk_integrity_compare() can help with that. DM
and MD linear, RAID0 and RAID1 are currently supported. RAID4/5/6
will require extra work due to the application tag.
----------------------------------------------------------------------
5.0 BLOCK LAYER INTEGRITY API
5.1 NORMAL FILESYSTEM
The normal filesystem is unaware that the underlying block device
is capable of sending/receiving integrity metadata. The IMD will
be automatically generated by the block layer at submit_bio() time
in case of a WRITE. A READ request will cause the I/O integrity
to be verified upon completion.
IMD generation and verification can be toggled using the
/sys/block/<bdev>/integrity/write_generate
and
/sys/block/<bdev>/integrity/read_verify
flags.
5.2 INTEGRITY-AWARE FILESYSTEM
A filesystem that is integrity-aware can prepare I/Os with IMD
attached. It can also use the application tag space if this is
supported by the block device.
int bdev_integrity_enabled(block_device, int rw);
bdev_integrity_enabled() will return 1 if the block device
supports integrity metadata transfer for the data direction
specified in 'rw'.
bdev_integrity_enabled() honors the write_generate and
read_verify flags in sysfs and will respond accordingly.
int bio_integrity_prep(bio);
To generate IMD for WRITE and to set up buffers for READ, the
filesystem must call bio_integrity_prep(bio).
Prior to calling this function, the bio data direction and start
sector must be set, and the bio should have all data pages
added. It is up to the caller to ensure that the bio does not
change while I/O is in progress.
bio_integrity_prep() should only be called if
bio_integrity_enabled() returned 1.
int bio_integrity_tag_size(bio);
If the filesystem wants to use the application tag space it will
first have to find out how much storage space is available.
Because tag space is generally limited (usually 2 bytes per
sector regardless of sector size), the integrity framework
supports interleaving the information between the sectors in an
I/O.
Filesystems can call bio_integrity_tag_size(bio) to find out how
many bytes of storage are available for that particular bio.
Another option is bdev_get_tag_size(block_device) which will
return the number of available bytes per hardware sector.
int bio_integrity_set_tag(bio, void *tag_buf, len);
After a successful return from bio_integrity_prep(),
bio_integrity_set_tag() can be used to attach an opaque tag
buffer to a bio. Obviously this only makes sense if the I/O is
a WRITE.
int bio_integrity_get_tag(bio, void *tag_buf, len);
Similarly, at READ I/O completion time the filesystem can
retrieve the tag buffer using bio_integrity_get_tag().
6.3 PASSING EXISTING INTEGRITY METADATA
Filesystems that either generate their own integrity metadata or
are capable of transferring IMD from user space can use the
following calls:
struct bip * bio_integrity_alloc(bio, gfp_mask, nr_pages);
Allocates the bio integrity payload and hangs it off of the bio.
nr_pages indicate how many pages of protection data need to be
stored in the integrity bio_vec list (similar to bio_alloc()).
The integrity payload will be freed at bio_free() time.
int bio_integrity_add_page(bio, page, len, offset);
Attaches a page containing integrity metadata to an existing
bio. The bio must have an existing bip,
i.e. bio_integrity_alloc() must have been called. For a WRITE,
the integrity metadata in the pages must be in a format
understood by the target device with the notable exception that
the sector numbers will be remapped as the request traverses the
I/O stack. This implies that the pages added using this call
will be modified during I/O! The first reference tag in the
integrity metadata must have a value of bip->bip_sector.
Pages can be added using bio_integrity_add_page() as long as
there is room in the bip bio_vec array (nr_pages).
Upon completion of a READ operation, the attached pages will
contain the integrity metadata received from the storage device.
It is up to the receiver to process them and verify data
integrity upon completion.
6.4 REGISTERING A BLOCK DEVICE AS CAPABLE OF EXCHANGING INTEGRITY
METADATA
To enable integrity exchange on a block device the gendisk must be
registered as capable:
int blk_integrity_register(gendisk, blk_integrity);
The blk_integrity struct is a template and should contain the
following:
static struct blk_integrity my_profile = {
.name = "STANDARDSBODY-TYPE-VARIANT-CSUM",
.generate_fn = my_generate_fn,
.verify_fn = my_verify_fn,
.get_tag_fn = my_get_tag_fn,
.set_tag_fn = my_set_tag_fn,
.tuple_size = sizeof(struct my_tuple_size),
.tag_size = <tag bytes per hw sector>,
};
'name' is a text string which will be visible in sysfs. This is
part of the userland API so chose it carefully and never change
it. The format is standards body-type-variant.
E.g. T10-DIF-TYPE1-IP or T13-EPP-0-CRC.
'generate_fn' generates appropriate integrity metadata (for WRITE).
'verify_fn' verifies that the data buffer matches the integrity
metadata.
'tuple_size' must be set to match the size of the integrity
metadata per sector. I.e. 8 for DIF and EPP.
'tag_size' must be set to identify how many bytes of tag space
are available per hardware sector. For DIF this is either 2 or
0 depending on the value of the Control Mode Page ATO bit.
See 6.2 for a description of get_tag_fn and set_tag_fn.
----------------------------------------------------------------------
2007-12-24 Martin K. Petersen <martin.petersen@oracle.com>
+71 -63
View File
@@ -2,8 +2,11 @@
========================
Copyright 2008 Red Hat Inc.
Author: Steven Rostedt <srostedt@redhat.com>
Author: Steven Rostedt <srostedt@redhat.com>
License: The GNU Free Documentation License, Version 1.2
Reviewers: Elias Oltmanns and Randy Dunlap
Writen for: 2.6.26-rc8 linux-2.6-tip.git tip/tracing/ftrace branch
Introduction
------------
@@ -46,7 +49,7 @@ of ftrace. Here is a list of some of the key files:
that is configured.
available_tracers : This holds the different types of tracers that
has been compiled into the kernel. The tracers
have been compiled into the kernel. The tracers
listed here can be configured by echoing in their
name into current_tracer.
@@ -90,11 +93,13 @@ of ftrace. Here is a list of some of the key files:
trace_entries : This sets or displays the number of trace
entries each CPU buffer can hold. The tracer buffers
are the same size for each CPU, so care must be
taken when modifying the trace_entries. The number
of actually entries will be the number given
times the number of possible CPUS. The buffers
are saved as individual pages, and the actual entries
will always be rounded up to entries per page.
taken when modifying the trace_entries. The trace
buffers are allocated in pages (blocks of memory that
the kernel uses for allocation, usually 4 KB in size).
Since each entry is smaller than a page, if the last
allocated page has room for more entries than were
requested, the rest of the page is used to allocate
entries.
This can only be updated when the current_tracer
is set to "none".
@@ -114,13 +119,13 @@ of ftrace. Here is a list of some of the key files:
in performance. This also has a side effect of
enabling or disabling specific functions to be
traced. Echoing in names of functions into this
file will limit the trace to only those files.
file will limit the trace to only these functions.
set_ftrace_notrace: This has the opposite effect that
set_ftrace_filter has. Any function that is added
here will not be traced. If a function exists
in both set_ftrace_filter and set_ftrace_notrace
the function will _not_ bet traced.
in both set_ftrace_filter and set_ftrace_notrace,
the function will _not_ be traced.
available_filter_functions : When a function is encountered the first
time by the dynamic tracer, it is recorded and
@@ -138,7 +143,7 @@ Here are the list of current tracers that can be configured.
ftrace - function tracer that uses mcount to trace all functions.
It is possible to filter out which functions that are
traced when dynamic ftrace is configured in.
to be traced when dynamic ftrace is configured in.
sched_switch - traces the context switches between tasks.
@@ -297,13 +302,13 @@ explains which is which.
The above is mostly meaningful for kernel developers.
time: This differs from the trace output where as the trace output
contained a absolute timestamp. This timestamp is relative
to the start of the first entry in the the trace.
time: This differs from the trace file output. The trace file output
included an absolute timestamp. The timestamp used by the
latency_trace file is relative to the start of the trace.
delay: This is just to help catch your eye a bit better. And
needs to be fixed to be only relative to the same CPU.
The marks is determined by the difference between this
The marks are determined by the difference between this
current trace and the next trace.
'!' - greater than preempt_mark_thresh (default 100)
'+' - greater than 1 microsecond
@@ -322,13 +327,13 @@ output. To see what is available, simply cat the file:
print-parent nosym-offset nosym-addr noverbose noraw nohex nobin \
noblock nostacktrace nosched-tree
To disable one of the options, echo in the option appended with "no".
To disable one of the options, echo in the option prepended with "no".
echo noprint-parent > /debug/tracing/iter_ctrl
To enable an option, leave off the "no".
echo sym-offest > /debug/tracing/iter_ctrl
echo sym-offset > /debug/tracing/iter_ctrl
Here are the available options:
@@ -344,7 +349,7 @@ Here are the available options:
sym-offset - Display not only the function name, but also the offset
in the function. For example, instead of seeing just
"ktime_get" you will see "ktime_get+0xb/0x20"
"ktime_get", you will see "ktime_get+0xb/0x20".
sym-offset:
bash-4000 [01] 1477.606694: simple_strtoul+0x6/0xa0
@@ -364,7 +369,7 @@ Here are the available options:
user applications that can translate the raw numbers better than
having it done in the kernel.
hex - similar to raw, but the numbers will be in a hexadecimal format.
hex - Similar to raw, but the numbers will be in a hexadecimal format.
bin - This will print out the formats in raw binary.
@@ -381,7 +386,7 @@ sched_switch
------------
This tracer simply records schedule switches. Here's an example
on how to implement it.
of how to use it.
# echo sched_switch > /debug/tracing/current_tracer
# echo 1 > /debug/tracing/tracing_enabled
@@ -470,7 +475,7 @@ interrupt from triggering or the mouse interrupt from letting the
kernel know of a new mouse event. The result is a latency with the
reaction time.
The irqsoff tracer tracks the time interrupts are disabled and when
The irqsoff tracer tracks the time interrupts are disabled to the time
they are re-enabled. When a new maximum latency is hit, it saves off
the trace so that it may be retrieved at a later time. Every time a
new maximum in reached, the old saved trace is discarded and the new
@@ -519,7 +524,7 @@ The difference between the 6 and the displayed timestamp 7us is
because the clock must have incremented between the time of recording
the max latency and recording the function that had that latency.
Note the above had ftrace_enabled not set. If we set the ftrace_enabled
Note the above had ftrace_enabled not set. If we set the ftrace_enabled,
we get a much larger output:
# tracer: irqsoff
@@ -570,21 +575,21 @@ vim:ft=help
Here we traced a 50 microsecond latency. But we also see all the
functions that were called during that time. Note that enabling
function tracing we endure an added overhead. This overhead may
extend the latency times. But never the less, this trace has provided
some very helpful debugging.
functions that were called during that time. Note that by enabling
function tracing, we endure an added overhead. This overhead may
extend the latency times. But nevertheless, this trace has provided
some very helpful debugging information.
preemptoff
----------
When preemption is disabled we may be able to receive interrupts but
the task can not be preempted and a higher priority task must wait
When preemption is disabled, we may be able to receive interrupts but
the task cannot be preempted and a higher priority task must wait
for preemption to be enabled again before it can preempt a lower
priority task.
The preemptoff tracer traces the places that disables preemption.
The preemptoff tracer traces the places that disable preemption.
Like the irqsoff, it records the maximum latency that preemption
was disabled. The control of preemptoff is much like the irqsoff.
@@ -696,7 +701,7 @@ Notice that the __do_softirq when called doesn't have a preempt_count.
It may seem that we missed a preempt enabled. What really happened
is that the preempt count is held on the threads stack and we
switched to the softirq stack (4K stacks in effect). The code
does not copy the preempt count, but because interrupts are disabled
does not copy the preempt count, but because interrupts are disabled,
we don't need to worry about it. Having a tracer like this is good
to let people know what really happens inside the kernel.
@@ -732,7 +737,7 @@ To record this time, use the preemptirqsoff tracer.
Again, using this trace is much like the irqsoff and preemptoff tracers.
# echo preemptoff > /debug/tracing/current_tracer
# echo preemptirqsoff > /debug/tracing/current_tracer
# echo 0 > /debug/tracing/tracing_max_latency
# echo 1 > /debug/tracing/tracing_enabled
# ls -ltr
@@ -862,9 +867,9 @@ This is a very interesting trace. It started with the preemption of
the ls task. We see that the task had the "need_resched" bit set
with the 'N' in the trace. Interrupts are disabled in the spin_lock
and the trace started. We see that a schedule took place to run
sshd. When the interrupts were enabled we took an interrupt.
On return of the interrupt the softirq ran. We took another interrupt
while running the softirq as we see with the capital 'H'.
sshd. When the interrupts were enabled, we took an interrupt.
On return from the interrupt handler, the softirq ran. We took another
interrupt while running the softirq as we see with the capital 'H'.
wakeup
@@ -876,9 +881,9 @@ time it executes. This is also known as "schedule latency".
I stress the point that this is about RT tasks. It is also important
to know the scheduling latency of non-RT tasks, but the average
schedule latency is better for non-RT tasks. Tools like
LatencyTop is more appropriate for such measurements.
LatencyTop are more appropriate for such measurements.
Real-Time environments is interested in the worst case latency.
Real-Time environments are interested in the worst case latency.
That is the longest latency it takes for something to happen, and
not the average. We can have a very fast scheduler that may only
have a large latency once in a while, but that would not work well
@@ -889,8 +894,8 @@ tasks that are unpredictable will overwrite the worst case latency
of RT tasks.
Since this tracer only deals with RT tasks, we will run this slightly
different than we did with the previous tracers. Instead of performing
an 'ls' we will run 'sleep 1' under 'chrt' which changes the
differently than we did with the previous tracers. Instead of performing
an 'ls', we will run 'sleep 1' under 'chrt' which changes the
priority of the task.
# echo wakeup > /debug/tracing/current_tracer
@@ -924,9 +929,9 @@ wakeup latency trace v1.1.5 on 2.6.26-rc8
vim:ft=help
Running this on an idle system we see that it only took 4 microseconds
Running this on an idle system, we see that it only took 4 microseconds
to perform the task switch. Note, since the trace marker in the
schedule is before the actual "switch" we stop the tracing when
schedule is before the actual "switch", we stop the tracing when
the recorded task is about to schedule in. This may change if
we add a new marker at the end of the scheduler.
@@ -992,12 +997,15 @@ ksoftirq-7 1d..4 50us : schedule (__cond_resched)
The interrupt went off while running ksoftirqd. This task runs at
SCHED_OTHER. Why didn't we see the 'N' set early? This may be
a harmless bug with x86_32 and 4K stacks. The need_reched() function
that tests if we need to reschedule looks on the actual stack.
Where as the setting of the NEED_RESCHED bit happens on the
task's stack. But because we are in a hard interrupt, the test
is with the interrupts stack which has that to be false. We don't
see the 'N' until we switch back to the task's stack.
a harmless bug with x86_32 and 4K stacks. On x86_32 with 4K stacks
configured, the interrupt and softirq runs with their own stack.
Some information is held on the top of the task's stack (need_resched
and preempt_count are both stored there). The setting of the NEED_RESCHED
bit is done directly to the task's stack, but the reading of the
NEED_RESCHED is done by looking at the current stack, which in this case
is the stack for the hard interrupt. This hides the fact that NEED_RESCHED
has been set. We don't see the 'N' until we switch back to the task's
assigned stack.
ftrace
------
@@ -1067,10 +1075,10 @@ this works is the mcount function call (placed at the start of
every kernel function, produced by the -pg switch in gcc), starts
of pointing to a simple return.
When dynamic ftrace is initialized, it calls kstop_machine to make it
act like a uniprocessor so that it can freely modify code without
worrying about other processors executing that same code. At
initialization, the mcount calls are change to call a "record_ip"
When dynamic ftrace is initialized, it calls kstop_machine to make
the machine act like a uniprocessor so that it can freely modify code
without worrying about other processors executing that same code. At
initialization, the mcount calls are changed to call a "record_ip"
function. After this, the first time a kernel function is called,
it has the calling address saved in a hash table.
@@ -1085,8 +1093,8 @@ traced, is that we can now selectively choose which functions we
want to trace and which ones we want the mcount calls to remain as
nops.
Two files that contain to the enabling and disabling of recorded
functions are:
Two files are used, one for enabling and one for disabling the tracing
of recorded functions. They are:
set_ftrace_filter
@@ -1094,7 +1102,7 @@ and
set_ftrace_notrace
A list of available functions that you can add to this files is listed
A list of available functions that you can add to these files is listed
in:
available_filter_functions
@@ -1133,9 +1141,9 @@ sys_nanosleep
Perhaps this isn't enough. The filters also allow simple wild cards.
Only the following is currently available
Only the following are currently available
<match>* - will match functions that begins with <match>
<match>* - will match functions that begin with <match>
*<match> - will match functions that end with <match>
*<match>* - will match functions that have <match> in it
@@ -1187,7 +1195,7 @@ This is because the '>' and '>>' act just like they do in bash.
To rewrite the filters, use '>'
To append to the filters, use '>>'
To clear out a filter so that all functions will be recorded again.
To clear out a filter so that all functions will be recorded again:
# echo > /debug/tracing/set_ftrace_filter
# cat /debug/tracing/set_ftrace_filter
@@ -1246,8 +1254,8 @@ ftraced
As mentioned above, when dynamic ftrace is configured in, a kernel
thread wakes up once a second and checks to see if there are mcount
calls that need to be converted into nops. If there is not, then
it simply goes back to sleep. But if there is, it will call
calls that need to be converted into nops. If there are not any, then
it simply goes back to sleep. But if there are some, it will call
kstop_machine to convert the calls to nops.
There may be a case that you do not want this added latency.
@@ -1262,8 +1270,8 @@ mcount calls to nops. Remember that there's a large overhead
to calling mcount. Without this kernel thread, that overhead will
exist.
Any write to the ftraced_enabled file will cause the kstop_machine
to run if there are recorded calls to mcount. This means that a
If there are recorded calls to mcount, any write to the ftraced_enabled
file will cause the kstop_machine to run. This means that a
user can manually perform the updates when they want to by simply
echoing a '0' into the ftraced_enabled file.
@@ -1315,7 +1323,7 @@ trace entries
Having too much or not enough data can be troublesome in diagnosing
some issue in the kernel. The file trace_entries is used to modify
the size of the internal trace buffers. The numbers listed
the size of the internal trace buffers. The number listed
is the number of entries that can be recorded per CPU. To know
the full size, multiply the number of possible CPUS with the
number of entries.
@@ -1323,7 +1331,7 @@ number of entries.
# cat /debug/tracing/trace_entries
65620
Note, to modify this you must have tracing fulling disabled. To do that,
Note, to modify this, you must have tracing completely disabled. To do that,
echo "none" into the current_tracer.
# echo none > /debug/tracing/current_tracer
@@ -1344,7 +1352,7 @@ it will add them.
This shows us that 85 entries can fit on a single page.
The number of pages that will be allocated is a percentage of available
memory. Allocating too much will produces an error.
memory. Allocating too much will produce an error.
# echo 1000000000000 > /debug/tracing/trace_entries
-bash: echo: write error: Cannot allocate memory
+1
View File
@@ -117,6 +117,7 @@ Code Seq# Include File Comments
<mailto:natalia@nikhefk.nikhef.nl>
'c' 00-7F linux/comstats.h conflict!
'c' 00-7F linux/coda.h conflict!
'c' 80-9F asm-s390/chsc.h
'd' 00-FF linux/char/drm/drm/h conflict!
'd' 00-DF linux/video_decoder.h conflict!
'd' F0-FF linux/digi1.h
+1 -1
View File
@@ -109,7 +109,7 @@ There are two possible methods of using Kdump.
2) Or use the system kernel binary itself as dump-capture kernel and there is
no need to build a separate dump-capture kernel. This is possible
only with the architecutres which support a relocatable kernel. As
of today i386 and ia64 architectures support relocatable kernel.
of today, i386, x86_64 and ia64 architectures support relocatable kernel.
Building a relocatable kernel is advantageous from the point of view that
one does not have to build a second kernel for capturing the dump. But
+42
View File
@@ -271,6 +271,17 @@ and is between 256 and 4096 characters. It is defined in the file
aic79xx= [HW,SCSI]
See Documentation/scsi/aic79xx.txt.
amd_iommu= [HW,X86-84]
Pass parameters to the AMD IOMMU driver in the system.
Possible values are:
isolate - enable device isolation (each device, as far
as possible, will get its own protection
domain)
amd_iommu_size= [HW,X86-64]
Define the size of the aperture for the AMD IOMMU
driver. Possible values are:
'32M', '64M' (default), '128M', '256M', '512M', '1G'
amijoy.map= [HW,JOY] Amiga joystick support
Map of devices attached to JOY0DAT and JOY1DAT
Format: <a>,<b>
@@ -599,6 +610,29 @@ and is between 256 and 4096 characters. It is defined in the file
See drivers/char/README.epca and
Documentation/digiepca.txt.
disable_mtrr_cleanup [X86]
enable_mtrr_cleanup [X86]
The kernel tries to adjust MTRR layout from continuous
to discrete, to make X server driver able to add WB
entry later. This parameter enables/disables that.
mtrr_chunk_size=nn[KMG] [X86]
used for mtrr cleanup. It is largest continous chunk
that could hold holes aka. UC entries.
mtrr_gran_size=nn[KMG] [X86]
Used for mtrr cleanup. It is granularity of mtrr block.
Default is 1.
Large value could prevent small alignment from
using up MTRRs.
mtrr_spare_reg_nr=n [X86]
Format: <integer>
Range: 0,7 : spare reg number
Default : 1
Used for mtrr cleanup. It is spare mtrr entries number.
Set to 2 or more if your graphical card needs more.
disable_mtrr_trim [X86, Intel and AMD only]
By default the kernel will trim any uncacheable
memory out of your available memory pool based on
@@ -1208,6 +1242,11 @@ and is between 256 and 4096 characters. It is defined in the file
mtdparts= [MTD]
See drivers/mtd/cmdlinepart.c.
mtdset= [ARM]
ARM/S3C2412 JIVE boot control
See arch/arm/mach-s3c2412/mach-jive.c
mtouchusb.raw_coordinates=
[HW] Make the MicroTouch USB driver use raw coordinates
('y', default) or cooked coordinates ('n')
@@ -2116,6 +2155,9 @@ and is between 256 and 4096 characters. It is defined in the file
usbhid.mousepoll=
[USBHID] The interval which mice are to be polled at.
add_efi_memmap [EFI; x86-32,X86-64] Include EFI memory map in
kernel's map of available physical RAM.
vdso= [X86-32,SH,x86-64]
vdso=2: enable compat VDSO (default with COMPAT_VDSO)
vdso=1: enable VDSO (default)
+7 -9
View File
@@ -10,7 +10,7 @@ us to generate 'watchdog NMI interrupts'. (NMI: Non Maskable Interrupt
which get executed even if the system is otherwise locked up hard).
This can be used to debug hard kernel lockups. By executing periodic
NMI interrupts, the kernel can monitor whether any CPU has locked up,
and print out debugging messages if so.
and print out debugging messages if so.
In order to use the NMI watchdog, you need to have APIC support in your
kernel. For SMP kernels, APIC support gets compiled in automatically. For
@@ -22,8 +22,7 @@ CONFIG_X86_UP_IOAPIC is for uniprocessor with an IO-APIC. [Note: certain
kernel debugging options, such as Kernel Stack Meter or Kernel Tracer,
may implicitly disable the NMI watchdog.]
For x86-64, the needed APIC is always compiled in, and the NMI watchdog is
always enabled with I/O-APIC mode (nmi_watchdog=1).
For x86-64, the needed APIC is always compiled in.
Using local APIC (nmi_watchdog=2) needs the first performance register, so
you can't use it for other purposes (such as high precision performance
@@ -63,16 +62,15 @@ when the system is idle), but if your system locks up on anything but the
"hlt", then you are out of luck -- the event will not happen at all and the
watchdog won't trigger. This is a shortcoming of the local APIC watchdog
-- unfortunately there is no "clock ticks" event that would work all the
time. The I/O APIC watchdog is driven externally and has no such shortcoming.
time. The I/O APIC watchdog is driven externally and has no such shortcoming.
But its NMI frequency is much higher, resulting in a more significant hit
to the overall system performance.
NOTE: starting with 2.4.2-ac18 the NMI-oopser is disabled by default,
you have to enable it with a boot time parameter. Prior to 2.4.2-ac18
the NMI-oopser is enabled unconditionally on x86 SMP boxes.
On x86 nmi_watchdog is disabled by default so you have to enable it with
a boot time parameter.
On x86-64 the NMI oopser is on by default. On 64bit Intel CPUs
it uses IO-APIC by default and on AMD it uses local APIC.
NOTE: In kernels prior to 2.4.2-ac18 the NMI-oopser is enabled unconditionally
on x86 SMP boxes.
[ feel free to send bug reports, suggestions and patches to
Ingo Molnar <mingo@redhat.com> or the Linux SMP mailing
+2 -5
View File
@@ -61,10 +61,7 @@ builder by #define'ing ARCH_HASH_SCHED_DOMAIN, and exporting your
arch_init_sched_domains function. This function will attach domains to all
CPUs using cpu_attach_domain.
Implementors should change the line
#undef SCHED_DOMAIN_DEBUG
to
#define SCHED_DOMAIN_DEBUG
in kernel/sched.c as this enables an error checking parse of the sched domains
The sched-domains debugging infrastructure can be enabled by enabling
CONFIG_SCHED_DEBUG. This enables an error checking parse of the sched domains
which should catch most possible errors (described above). It also prints out
the domain structure in a visual format.
+2 -2
View File
@@ -51,9 +51,9 @@ needs only about 3% CPU time to do so, it can do with a 0.03 * 0.005s =
0.00015s. So this group can be scheduled with a period of 0.005s and a run time
of 0.00015s.
The remaining CPU time will be used for user input and other tass. Because
The remaining CPU time will be used for user input and other tasks. Because
realtime tasks have explicitly allocated the CPU time they need to perform
their tasks, buffer underruns in the graphocs or audio can be eliminated.
their tasks, buffer underruns in the graphics or audio can be eliminated.
NOTE: the above example is not fully implemented as of yet (2.6.25). We still
lack an EDF scheduler to make non-uniform periods usable.
@@ -753,8 +753,11 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
[Multiple options for each card instance]
model - force the model name
position_fix - Fix DMA pointer (0 = auto, 1 = none, 2 = POSBUF, 3 = FIFO size)
position_fix - Fix DMA pointer (0 = auto, 1 = use LPIB, 2 = POSBUF)
probe_mask - Bitmask to probe codecs (default = -1, meaning all slots)
bdl_pos_adj - Specifies the DMA IRQ timing delay in samples.
Passing -1 will make the driver to choose the appropriate
value based on the controller chip.
[Single (global) options]
single_cmd - Use single immediate commands to communicate with
@@ -845,7 +848,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
ALC269
basic Basic preset
ALC662
ALC662/663
3stack-dig 3-stack (2-channel) with SPDIF
3stack-6ch 3-stack (6-channel)
3stack-6ch-dig 3-stack (6-channel) with SPDIF
@@ -853,6 +856,10 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
lenovo-101e Lenovo laptop
eeepc-p701 ASUS Eeepc P701
eeepc-ep20 ASUS Eeepc EP20
m51va ASUS M51VA
g71v ASUS G71V
h13 ASUS H13
g50v ASUS G50V
auto auto-config reading BIOS (default)
ALC882/885
@@ -1091,7 +1098,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
This occurs when the access to non-existing or non-working codec slot
(likely a modem one) causes a stall of the communication via HD-audio
bus. You can see which codec slots are probed by enabling
CONFIG_SND_DEBUG_DETECT, or simply from the file name of the codec
CONFIG_SND_DEBUG_VERBOSE, or simply from the file name of the codec
proc files. Then limit the slots to probe by probe_mask option.
For example, probe_mask=1 means to probe only the first slot, and
probe_mask=4 means only the third slot.
@@ -2267,6 +2274,10 @@ case above again, the first two slots are already reserved. If any
other driver (e.g. snd-usb-audio) is loaded before snd-interwave or
snd-ens1371, it will be assigned to the third or later slot.
When a module name is given with '!', the slot will be given for any
modules but that name. For example, "slots=!snd-pcsp" will reserve
the first slot for any modules but snd-pcsp.
ALSA PCM devices to OSS devices mapping
=======================================
@@ -6127,8 +6127,8 @@ struct _snd_pcm_runtime {
<para>
<function>snd_printdd()</function> is compiled in only when
<constant>CONFIG_SND_DEBUG_DETECT</constant> is set. Please note
that <constant>DEBUG_DETECT</constant> is not set as default
<constant>CONFIG_SND_DEBUG_VERBOSE</constant> is set. Please note
that <constant>CONFIG_SND_DEBUG_VERBOSE</constant> is not set as default
even if you configure the alsa-driver with
<option>--with-debug=full</option> option. You need to give
explicitly <option>--with-debug=detect</option> option instead.
+164
View File
@@ -0,0 +1,164 @@
In-kernel memory-mapped I/O tracing
Home page and links to optional user space tools:
http://nouveau.freedesktop.org/wiki/MmioTrace
MMIO tracing was originally developed by Intel around 2003 for their Fault
Injection Test Harness. In Dec 2006 - Jan 2007, using the code from Intel,
Jeff Muizelaar created a tool for tracing MMIO accesses with the Nouveau
project in mind. Since then many people have contributed.
Mmiotrace was built for reverse engineering any memory-mapped IO device with
the Nouveau project as the first real user. Only x86 and x86_64 architectures
are supported.
Out-of-tree mmiotrace was originally modified for mainline inclusion and
ftrace framework by Pekka Paalanen <pq@iki.fi>.
Preparation
-----------
Mmiotrace feature is compiled in by the CONFIG_MMIOTRACE option. Tracing is
disabled by default, so it is safe to have this set to yes. SMP systems are
supported, but tracing is unreliable and may miss events if more than one CPU
is on-line, therefore mmiotrace takes all but one CPU off-line during run-time
activation. You can re-enable CPUs by hand, but you have been warned, there
is no way to automatically detect if you are losing events due to CPUs racing.
Usage Quick Reference
---------------------
$ mount -t debugfs debugfs /debug
$ echo mmiotrace > /debug/tracing/current_tracer
$ cat /debug/tracing/trace_pipe > mydump.txt &
Start X or whatever.
$ echo "X is up" > /debug/tracing/marker
$ echo none > /debug/tracing/current_tracer
Check for lost events.
Usage
-----
Make sure debugfs is mounted to /debug. If not, (requires root privileges)
$ mount -t debugfs debugfs /debug
Check that the driver you are about to trace is not loaded.
Activate mmiotrace (requires root privileges):
$ echo mmiotrace > /debug/tracing/current_tracer
Start storing the trace:
$ cat /debug/tracing/trace_pipe > mydump.txt &
The 'cat' process should stay running (sleeping) in the background.
Load the driver you want to trace and use it. Mmiotrace will only catch MMIO
accesses to areas that are ioremapped while mmiotrace is active.
[Unimplemented feature:]
During tracing you can place comments (markers) into the trace by
$ echo "X is up" > /debug/tracing/marker
This makes it easier to see which part of the (huge) trace corresponds to
which action. It is recommended to place descriptive markers about what you
do.
Shut down mmiotrace (requires root privileges):
$ echo none > /debug/tracing/current_tracer
The 'cat' process exits. If it does not, kill it by issuing 'fg' command and
pressing ctrl+c.
Check that mmiotrace did not lose events due to a buffer filling up. Either
$ grep -i lost mydump.txt
which tells you exactly how many events were lost, or use
$ dmesg
to view your kernel log and look for "mmiotrace has lost events" warning. If
events were lost, the trace is incomplete. You should enlarge the buffers and
try again. Buffers are enlarged by first seeing how large the current buffers
are:
$ cat /debug/tracing/trace_entries
gives you a number. Approximately double this number and write it back, for
instance:
$ echo 128000 > /debug/tracing/trace_entries
Then start again from the top.
If you are doing a trace for a driver project, e.g. Nouveau, you should also
do the following before sending your results:
$ lspci -vvv > lspci.txt
$ dmesg > dmesg.txt
$ tar zcf pciid-nick-mmiotrace.tar.gz mydump.txt lspci.txt dmesg.txt
and then send the .tar.gz file. The trace compresses considerably. Replace
"pciid" and "nick" with the PCI ID or model name of your piece of hardware
under investigation and your nick name.
How Mmiotrace Works
-------------------
Access to hardware IO-memory is gained by mapping addresses from PCI bus by
calling one of the ioremap_*() functions. Mmiotrace is hooked into the
__ioremap() function and gets called whenever a mapping is created. Mapping is
an event that is recorded into the trace log. Note, that ISA range mappings
are not caught, since the mapping always exists and is returned directly.
MMIO accesses are recorded via page faults. Just before __ioremap() returns,
the mapped pages are marked as not present. Any access to the pages causes a
fault. The page fault handler calls mmiotrace to handle the fault. Mmiotrace
marks the page present, sets TF flag to achieve single stepping and exits the
fault handler. The instruction that faulted is executed and debug trap is
entered. Here mmiotrace again marks the page as not present. The instruction
is decoded to get the type of operation (read/write), data width and the value
read or written. These are stored to the trace log.
Setting the page present in the page fault handler has a race condition on SMP
machines. During the single stepping other CPUs may run freely on that page
and events can be missed without a notice. Re-enabling other CPUs during
tracing is discouraged.
Trace Log Format
----------------
The raw log is text and easily filtered with e.g. grep and awk. One record is
one line in the log. A record starts with a keyword, followed by keyword
dependant arguments. Arguments are separated by a space, or continue until the
end of line. The format for version 20070824 is as follows:
Explanation Keyword Space separated arguments
---------------------------------------------------------------------------
read event R width, timestamp, map id, physical, value, PC, PID
write event W width, timestamp, map id, physical, value, PC, PID
ioremap event MAP timestamp, map id, physical, virtual, length, PC, PID
iounmap event UNMAP timestamp, map id, PC, PID
marker MARK timestamp, text
version VERSION the string "20070824"
info for reader LSPCI one line from lspci -v
PCI address map PCIDEV space separated /proc/bus/pci/devices data
unk. opcode UNKNOWN timestamp, map id, physical, data, PC, PID
Timestamp is in seconds with decimals. Physical is a PCI bus address, virtual
is a kernel virtual address. Width is the data width in bytes and value is the
data value. Map id is an arbitrary id number identifying the mapping that was
used in an operation. PC is the program counter and PID is process id. PC is
zero if it is not recorded. PID is always zero as tracing MMIO accesses
originating in user space memory is not yet supported.
For instance, the following awk filter will pass all 32-bit writes that target
physical addresses in the range [0xfb73ce40, 0xfb800000[
$ awk '/W 4 / { adr=strtonum($5); if (adr >= 0xfb73ce40 &&
adr < 0xfb800000) print; }'
Tools for Developers
--------------------
The user space tools include utilities for:
- replacing numeric addresses and values with hardware register names
- replaying MMIO logs, i.e., re-executing the recorded writes
@@ -1,17 +1,14 @@
THE LINUX/I386 BOOT PROTOCOL
----------------------------
THE LINUX/x86 BOOT PROTOCOL
---------------------------
H. Peter Anvin <hpa@zytor.com>
Last update 2007-05-23
On the i386 platform, the Linux kernel uses a rather complicated boot
On the x86 platform, the Linux kernel uses a rather complicated boot
convention. This has evolved partially due to historical aspects, as
well as the desire in the early days to have the kernel itself be a
bootable image, the complicated PC memory model and due to changed
expectations in the PC industry caused by the effective demise of
real-mode DOS as a mainstream operating system.
Currently, the following versions of the Linux/i386 boot protocol exist.
Currently, the following versions of the Linux/x86 boot protocol exist.
Old kernels: zImage/Image support only. Some very early kernels
may not even support a command line.
@@ -372,10 +369,17 @@ Protocol: 2.00+
- If 0, the protected-mode code is loaded at 0x10000.
- If 1, the protected-mode code is loaded at 0x100000.
Bit 5 (write): QUIET_FLAG
- If 0, print early messages.
- If 1, suppress early messages.
This requests to the kernel (decompressor and early
kernel) to not write early messages that require
accessing the display hardware directly.
Bit 6 (write): KEEP_SEGMENTS
Protocol: 2.07+
- if 0, reload the segment registers in the 32bit entry point.
- if 1, do not reload the segment registers in the 32bit entry point.
- If 0, reload the segment registers in the 32bit entry point.
- If 1, do not reload the segment registers in the 32bit entry point.
Assume that %cs %ds %ss %es are all set to flat segments with
a base of 0 (or the equivalent for their environment).
@@ -504,7 +508,7 @@ Protocol: 2.06+
maximum size was 255.
Field name: hardware_subarch
Type: write
Type: write (optional, defaults to x86/PC)
Offset/size: 0x23c/4
Protocol: 2.07+
@@ -520,11 +524,13 @@ Protocol: 2.07+
0x00000002 Xen
Field name: hardware_subarch_data
Type: write
Type: write (subarch-dependent)
Offset/size: 0x240/8
Protocol: 2.07+
A pointer to data that is specific to hardware subarch
This field is currently unused for the default x86/PC environment,
do not modify.
Field name: payload_offset
Type: read
@@ -545,6 +551,34 @@ Protocol: 2.08+
The length of the payload.
Field name: setup_data
Type: write (special)
Offset/size: 0x250/8
Protocol: 2.09+
The 64-bit physical pointer to NULL terminated single linked list of
struct setup_data. This is used to define a more extensible boot
parameters passing mechanism. The definition of struct setup_data is
as follow:
struct setup_data {
u64 next;
u32 type;
u32 len;
u8 data[0];
};
Where, the next is a 64-bit physical pointer to the next node of
linked list, the next field of the last node is 0; the type is used
to identify the contents of data; the len is the length of data
field; the data holds the real payload.
This list may be modified at a number of points during the bootup
process. Therefore, when modifying this list one should always make
sure to consider the case where the linked list already contains
entries.
**** THE IMAGE CHECKSUM
From boot protocol version 2.08 onwards the CRC-32 is calculated over
@@ -553,6 +587,7 @@ initial remainder of 0xffffffff. The checksum is appended to the
file; therefore the CRC of the file up to the limit specified in the
syssize field of the header is always 0.
**** THE KERNEL COMMAND LINE
The kernel command line has become an important way for the boot
@@ -584,28 +619,6 @@ command line is entered using the following protocol:
covered by setup_move_size, so you may need to adjust this
field.
Field name: setup_data
Type: write (obligatory)
Offset/size: 0x250/8
Protocol: 2.09+
The 64-bit physical pointer to NULL terminated single linked list of
struct setup_data. This is used to define a more extensible boot
parameters passing mechanism. The definition of struct setup_data is
as follow:
struct setup_data {
u64 next;
u32 type;
u32 len;
u8 data[0];
};
Where, the next is a 64-bit physical pointer to the next node of
linked list, the next field of the last node is 0; the type is used
to identify the contents of data; the len is the length of data
field; the data holds the real payload.
**** MEMORY LAYOUT OF THE REAL-MODE CODE

Some files were not shown because too many files have changed in this diff Show More