You've already forked linux-apfs
mirror of
https://github.com/linux-apfs/linux-apfs.git
synced 2026-05-01 15:00:59 -07:00
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
This commit is contained in:
@@ -1573,12 +1573,8 @@ S: 160 00 Praha 6
|
||||
S: Czech Republic
|
||||
|
||||
N: Niels Kristian Bech Jensen
|
||||
E: nkbj@image.dk
|
||||
W: http://www.image.dk/~nkbj
|
||||
E: nkbj1970@hotmail.com
|
||||
D: Miscellaneous kernel updates and fixes.
|
||||
S: Dr. Holsts Vej 34, lejl. 164
|
||||
S: DK-8230 Åbyhøj
|
||||
S: Denmark
|
||||
|
||||
N: Michael K. Johnson
|
||||
E: johnsonm@redhat.com
|
||||
|
||||
@@ -0,0 +1,77 @@
|
||||
This directory attempts to document the ABI between the Linux kernel and
|
||||
userspace, and the relative stability of these interfaces. Due to the
|
||||
everchanging nature of Linux, and the differing maturity levels, these
|
||||
interfaces should be used by userspace programs in different ways.
|
||||
|
||||
We have four different levels of ABI stability, as shown by the four
|
||||
different subdirectories in this location. Interfaces may change levels
|
||||
of stability according to the rules described below.
|
||||
|
||||
The different levels of stability are:
|
||||
|
||||
stable/
|
||||
This directory documents the interfaces that the developer has
|
||||
defined to be stable. Userspace programs are free to use these
|
||||
interfaces with no restrictions, and backward compatibility for
|
||||
them will be guaranteed for at least 2 years. Most interfaces
|
||||
(like syscalls) are expected to never change and always be
|
||||
available.
|
||||
|
||||
testing/
|
||||
This directory documents interfaces that are felt to be stable,
|
||||
as the main development of this interface has been completed.
|
||||
The interface can be changed to add new features, but the
|
||||
current interface will not break by doing this, unless grave
|
||||
errors or security problems are found in them. Userspace
|
||||
programs can start to rely on these interfaces, but they must be
|
||||
aware of changes that can occur before these interfaces move to
|
||||
be marked stable. Programs that use these interfaces are
|
||||
strongly encouraged to add their name to the description of
|
||||
these interfaces, so that the kernel developers can easily
|
||||
notify them if any changes occur (see the description of the
|
||||
layout of the files below for details on how to do this.)
|
||||
|
||||
obsolete/
|
||||
This directory documents interfaces that are still remaining in
|
||||
the kernel, but are marked to be removed at some later point in
|
||||
time. The description of the interface will document the reason
|
||||
why it is obsolete and when it can be expected to be removed.
|
||||
The file Documentation/feature-removal-schedule.txt may describe
|
||||
some of these interfaces, giving a schedule for when they will
|
||||
be removed.
|
||||
|
||||
removed/
|
||||
This directory contains a list of the old interfaces that have
|
||||
been removed from the kernel.
|
||||
|
||||
Every file in these directories will contain the following information:
|
||||
|
||||
What: Short description of the interface
|
||||
Date: Date created
|
||||
KernelVersion: Kernel version this feature first showed up in.
|
||||
Contact: Primary contact for this interface (may be a mailing list)
|
||||
Description: Long description of the interface and how to use it.
|
||||
Users: All users of this interface who wish to be notified when
|
||||
it changes. This is very important for interfaces in
|
||||
the "testing" stage, so that kernel developers can work
|
||||
with userspace developers to ensure that things do not
|
||||
break in ways that are unacceptable. It is also
|
||||
important to get feedback for these interfaces to make
|
||||
sure they are working in a proper way and do not need to
|
||||
be changed further.
|
||||
|
||||
|
||||
How things move between levels:
|
||||
|
||||
Interfaces in stable may move to obsolete, as long as the proper
|
||||
notification is given.
|
||||
|
||||
Interfaces may be removed from obsolete and the kernel as long as the
|
||||
documented amount of time has gone by.
|
||||
|
||||
Interfaces in the testing state can move to the stable state when the
|
||||
developers feel they are finished. They cannot be removed from the
|
||||
kernel tree without going through the obsolete state first.
|
||||
|
||||
It's up to the developer to place their interfaces in the category they
|
||||
wish for it to start out in.
|
||||
@@ -0,0 +1,13 @@
|
||||
What: devfs
|
||||
Date: July 2005
|
||||
Contact: Greg Kroah-Hartman <gregkh@suse.de>
|
||||
Description:
|
||||
devfs has been unmaintained for a number of years, has unfixable
|
||||
races, contains a naming policy within the kernel that is
|
||||
against the LSB, and can be replaced by using udev.
|
||||
The files fs/devfs/*, include/linux/devfs_fs*.h will be removed,
|
||||
along with the the assorted devfs function calls throughout the
|
||||
kernel tree.
|
||||
|
||||
Users:
|
||||
|
||||
@@ -0,0 +1,10 @@
|
||||
What: The kernel syscall interface
|
||||
Description:
|
||||
This interface matches much of the POSIX interface and is based
|
||||
on it and other Unix based interfaces. It will only be added to
|
||||
over time, and not have things removed from it.
|
||||
|
||||
Note that this interface is different for every architecture
|
||||
that Linux supports. Please see the architecture-specific
|
||||
documentation for details on the syscall numbers that are to be
|
||||
mapped to each syscall.
|
||||
@@ -0,0 +1,30 @@
|
||||
What: /sys/module
|
||||
Description:
|
||||
The /sys/module tree consists of the following structure:
|
||||
|
||||
/sys/module/MODULENAME
|
||||
The name of the module that is in the kernel. This
|
||||
module name will show up either if the module is built
|
||||
directly into the kernel, or if it is loaded as a
|
||||
dyanmic module.
|
||||
|
||||
/sys/module/MODULENAME/parameters
|
||||
This directory contains individual files that are each
|
||||
individual parameters of the module that are able to be
|
||||
changed at runtime. See the individual module
|
||||
documentation as to the contents of these parameters and
|
||||
what they accomplish.
|
||||
|
||||
Note: The individual parameter names and values are not
|
||||
considered stable, only the fact that they will be
|
||||
placed in this location within sysfs. See the
|
||||
individual driver documentation for details as to the
|
||||
stability of the different parameters.
|
||||
|
||||
/sys/module/MODULENAME/refcnt
|
||||
If the module is able to be unloaded from the kernel, this file
|
||||
will contain the current reference count of the module.
|
||||
|
||||
Note: If the module is built into the kernel, or if the
|
||||
CONFIG_MODULE_UNLOAD kernel configuration value is not enabled,
|
||||
this file will not be present.
|
||||
@@ -0,0 +1,16 @@
|
||||
What: /sys/class/
|
||||
Date: Febuary 2006
|
||||
Contact: Greg Kroah-Hartman <gregkh@suse.de>
|
||||
Description:
|
||||
The /sys/class directory will consist of a group of
|
||||
subdirectories describing individual classes of devices
|
||||
in the kernel. The individual directories will consist
|
||||
of either subdirectories, or symlinks to other
|
||||
directories.
|
||||
|
||||
All programs that use this directory tree must be able
|
||||
to handle both subdirectories or symlinks in order to
|
||||
work properly.
|
||||
|
||||
Users:
|
||||
udev <linux-hotplug-devel@lists.sourceforge.net>
|
||||
@@ -0,0 +1,25 @@
|
||||
What: /sys/devices
|
||||
Date: February 2006
|
||||
Contact: Greg Kroah-Hartman <gregkh@suse.de>
|
||||
Description:
|
||||
The /sys/devices tree contains a snapshot of the
|
||||
internal state of the kernel device tree. Devices will
|
||||
be added and removed dynamically as the machine runs,
|
||||
and between different kernel versions, the layout of the
|
||||
devices within this tree will change.
|
||||
|
||||
Please do not rely on the format of this tree because of
|
||||
this. If a program wishes to find different things in
|
||||
the tree, please use the /sys/class structure and rely
|
||||
on the symlinks there to point to the proper location
|
||||
within the /sys/devices tree of the individual devices.
|
||||
Or rely on the uevent messages to notify programs of
|
||||
devices being added and removed from this tree to find
|
||||
the location of those devices.
|
||||
|
||||
Note that sometimes not all devices along the directory
|
||||
chain will have emitted uevent messages, so userspace
|
||||
programs must be able to handle such occurrences.
|
||||
|
||||
Users:
|
||||
udev <linux-hotplug-devel@lists.sourceforge.net>
|
||||
+88
-12
@@ -155,7 +155,83 @@ problem, which is called the function-growth-hormone-imbalance syndrome.
|
||||
See next chapter.
|
||||
|
||||
|
||||
Chapter 5: Functions
|
||||
Chapter 5: Typedefs
|
||||
|
||||
Please don't use things like "vps_t".
|
||||
|
||||
It's a _mistake_ to use typedef for structures and pointers. When you see a
|
||||
|
||||
vps_t a;
|
||||
|
||||
in the source, what does it mean?
|
||||
|
||||
In contrast, if it says
|
||||
|
||||
struct virtual_container *a;
|
||||
|
||||
you can actually tell what "a" is.
|
||||
|
||||
Lots of people think that typedefs "help readability". Not so. They are
|
||||
useful only for:
|
||||
|
||||
(a) totally opaque objects (where the typedef is actively used to _hide_
|
||||
what the object is).
|
||||
|
||||
Example: "pte_t" etc. opaque objects that you can only access using
|
||||
the proper accessor functions.
|
||||
|
||||
NOTE! Opaqueness and "accessor functions" are not good in themselves.
|
||||
The reason we have them for things like pte_t etc. is that there
|
||||
really is absolutely _zero_ portably accessible information there.
|
||||
|
||||
(b) Clear integer types, where the abstraction _helps_ avoid confusion
|
||||
whether it is "int" or "long".
|
||||
|
||||
u8/u16/u32 are perfectly fine typedefs, although they fit into
|
||||
category (d) better than here.
|
||||
|
||||
NOTE! Again - there needs to be a _reason_ for this. If something is
|
||||
"unsigned long", then there's no reason to do
|
||||
|
||||
typedef unsigned long myflags_t;
|
||||
|
||||
but if there is a clear reason for why it under certain circumstances
|
||||
might be an "unsigned int" and under other configurations might be
|
||||
"unsigned long", then by all means go ahead and use a typedef.
|
||||
|
||||
(c) when you use sparse to literally create a _new_ type for
|
||||
type-checking.
|
||||
|
||||
(d) New types which are identical to standard C99 types, in certain
|
||||
exceptional circumstances.
|
||||
|
||||
Although it would only take a short amount of time for the eyes and
|
||||
brain to become accustomed to the standard types like 'uint32_t',
|
||||
some people object to their use anyway.
|
||||
|
||||
Therefore, the Linux-specific 'u8/u16/u32/u64' types and their
|
||||
signed equivalents which are identical to standard types are
|
||||
permitted -- although they are not mandatory in new code of your
|
||||
own.
|
||||
|
||||
When editing existing code which already uses one or the other set
|
||||
of types, you should conform to the existing choices in that code.
|
||||
|
||||
(e) Types safe for use in userspace.
|
||||
|
||||
In certain structures which are visible to userspace, we cannot
|
||||
require C99 types and cannot use the 'u32' form above. Thus, we
|
||||
use __u32 and similar types in all structures which are shared
|
||||
with userspace.
|
||||
|
||||
Maybe there are other cases too, but the rule should basically be to NEVER
|
||||
EVER use a typedef unless you can clearly match one of those rules.
|
||||
|
||||
In general, a pointer, or a struct that has elements that can reasonably
|
||||
be directly accessed should _never_ be a typedef.
|
||||
|
||||
|
||||
Chapter 6: Functions
|
||||
|
||||
Functions should be short and sweet, and do just one thing. They should
|
||||
fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24,
|
||||
@@ -183,7 +259,7 @@ and it gets confused. You know you're brilliant, but maybe you'd like
|
||||
to understand what you did 2 weeks from now.
|
||||
|
||||
|
||||
Chapter 6: Centralized exiting of functions
|
||||
Chapter 7: Centralized exiting of functions
|
||||
|
||||
Albeit deprecated by some people, the equivalent of the goto statement is
|
||||
used frequently by compilers in form of the unconditional jump instruction.
|
||||
@@ -220,7 +296,7 @@ out:
|
||||
return result;
|
||||
}
|
||||
|
||||
Chapter 7: Commenting
|
||||
Chapter 8: Commenting
|
||||
|
||||
Comments are good, but there is also a danger of over-commenting. NEVER
|
||||
try to explain HOW your code works in a comment: it's much better to
|
||||
@@ -240,7 +316,7 @@ When commenting the kernel API functions, please use the kerneldoc format.
|
||||
See the files Documentation/kernel-doc-nano-HOWTO.txt and scripts/kernel-doc
|
||||
for details.
|
||||
|
||||
Chapter 8: You've made a mess of it
|
||||
Chapter 9: You've made a mess of it
|
||||
|
||||
That's OK, we all do. You've probably been told by your long-time Unix
|
||||
user helper that "GNU emacs" automatically formats the C sources for
|
||||
@@ -288,7 +364,7 @@ re-formatting you may want to take a look at the man page. But
|
||||
remember: "indent" is not a fix for bad programming.
|
||||
|
||||
|
||||
Chapter 9: Configuration-files
|
||||
Chapter 10: Configuration-files
|
||||
|
||||
For configuration options (arch/xxx/Kconfig, and all the Kconfig files),
|
||||
somewhat different indentation is used.
|
||||
@@ -313,7 +389,7 @@ support for file-systems, for instance) should be denoted (DANGEROUS), other
|
||||
experimental options should be denoted (EXPERIMENTAL).
|
||||
|
||||
|
||||
Chapter 10: Data structures
|
||||
Chapter 11: Data structures
|
||||
|
||||
Data structures that have visibility outside the single-threaded
|
||||
environment they are created and destroyed in should always have
|
||||
@@ -344,7 +420,7 @@ Remember: if another thread can find your data structure, and you don't
|
||||
have a reference count on it, you almost certainly have a bug.
|
||||
|
||||
|
||||
Chapter 11: Macros, Enums and RTL
|
||||
Chapter 12: Macros, Enums and RTL
|
||||
|
||||
Names of macros defining constants and labels in enums are capitalized.
|
||||
|
||||
@@ -399,7 +475,7 @@ The cpp manual deals with macros exhaustively. The gcc internals manual also
|
||||
covers RTL which is used frequently with assembly language in the kernel.
|
||||
|
||||
|
||||
Chapter 12: Printing kernel messages
|
||||
Chapter 13: Printing kernel messages
|
||||
|
||||
Kernel developers like to be seen as literate. Do mind the spelling
|
||||
of kernel messages to make a good impression. Do not use crippled
|
||||
@@ -410,7 +486,7 @@ Kernel messages do not have to be terminated with a period.
|
||||
Printing numbers in parentheses (%d) adds no value and should be avoided.
|
||||
|
||||
|
||||
Chapter 13: Allocating memory
|
||||
Chapter 14: Allocating memory
|
||||
|
||||
The kernel provides the following general purpose memory allocators:
|
||||
kmalloc(), kzalloc(), kcalloc(), and vmalloc(). Please refer to the API
|
||||
@@ -429,7 +505,7 @@ from void pointer to any other pointer type is guaranteed by the C programming
|
||||
language.
|
||||
|
||||
|
||||
Chapter 14: The inline disease
|
||||
Chapter 15: The inline disease
|
||||
|
||||
There appears to be a common misperception that gcc has a magic "make me
|
||||
faster" speedup option called "inline". While the use of inlines can be
|
||||
@@ -457,7 +533,7 @@ something it would have done anyway.
|
||||
|
||||
|
||||
|
||||
Chapter 15: References
|
||||
Appendix I: References
|
||||
|
||||
The C Programming Language, Second Edition
|
||||
by Brian W. Kernighan and Dennis M. Ritchie.
|
||||
@@ -481,4 +557,4 @@ Kernel CodingStyle, by greg@kroah.com at OLS 2002:
|
||||
http://www.kroah.com/linux/talks/ols_2002_kernel_codingstyle_talk/html/
|
||||
|
||||
--
|
||||
Last updated on 30 December 2005 by a community effort on LKML.
|
||||
Last updated on 30 April 2006.
|
||||
|
||||
@@ -62,6 +62,8 @@
|
||||
<sect1><title>Internal Functions</title>
|
||||
!Ikernel/exit.c
|
||||
!Ikernel/signal.c
|
||||
!Iinclude/linux/kthread.h
|
||||
!Ekernel/kthread.c
|
||||
</sect1>
|
||||
|
||||
<sect1><title>Kernel objects manipulation</title>
|
||||
@@ -114,9 +116,33 @@ X!Ilib/string.c
|
||||
</sect1>
|
||||
</chapter>
|
||||
|
||||
<chapter id="kernel-lib">
|
||||
<title>Basic Kernel Library Functions</title>
|
||||
|
||||
<para>
|
||||
The Linux kernel provides more basic utility functions.
|
||||
</para>
|
||||
|
||||
<sect1><title>Bitmap Operations</title>
|
||||
!Elib/bitmap.c
|
||||
!Ilib/bitmap.c
|
||||
</sect1>
|
||||
|
||||
<sect1><title>Command-line Parsing</title>
|
||||
!Elib/cmdline.c
|
||||
</sect1>
|
||||
|
||||
<sect1><title>CRC Functions</title>
|
||||
!Elib/crc16.c
|
||||
!Elib/crc32.c
|
||||
!Elib/crc-ccitt.c
|
||||
</sect1>
|
||||
</chapter>
|
||||
|
||||
<chapter id="mm">
|
||||
<title>Memory Management in Linux</title>
|
||||
<sect1><title>The Slab Cache</title>
|
||||
!Iinclude/linux/slab.h
|
||||
!Emm/slab.c
|
||||
</sect1>
|
||||
<sect1><title>User Space Memory Access</title>
|
||||
@@ -280,12 +306,13 @@ X!Ekernel/module.c
|
||||
<sect1><title>MTRR Handling</title>
|
||||
!Earch/i386/kernel/cpu/mtrr/main.c
|
||||
</sect1>
|
||||
|
||||
<sect1><title>PCI Support Library</title>
|
||||
!Edrivers/pci/pci.c
|
||||
!Edrivers/pci/pci-driver.c
|
||||
!Edrivers/pci/remove.c
|
||||
!Edrivers/pci/pci-acpi.c
|
||||
<!-- kerneldoc does not understand to __devinit
|
||||
<!-- kerneldoc does not understand __devinit
|
||||
X!Edrivers/pci/search.c
|
||||
-->
|
||||
!Edrivers/pci/msi.c
|
||||
@@ -314,6 +341,13 @@ X!Earch/i386/kernel/mca.c
|
||||
</sect1>
|
||||
</chapter>
|
||||
|
||||
<chapter id="firmware">
|
||||
<title>Firmware Interfaces</title>
|
||||
<sect1><title>DMI Interfaces</title>
|
||||
!Edrivers/firmware/dmi_scan.c
|
||||
</sect1>
|
||||
</chapter>
|
||||
|
||||
<chapter id="devfs">
|
||||
<title>The Device File System</title>
|
||||
!Efs/devfs/base.c
|
||||
@@ -331,6 +365,18 @@ X!Earch/i386/kernel/mca.c
|
||||
!Esecurity/security.c
|
||||
</chapter>
|
||||
|
||||
<chapter id="audit">
|
||||
<title>Audit Interfaces</title>
|
||||
!Ekernel/audit.c
|
||||
!Ikernel/auditsc.c
|
||||
!Ikernel/auditfilter.c
|
||||
</chapter>
|
||||
|
||||
<chapter id="accounting">
|
||||
<title>Accounting Framework</title>
|
||||
!Ikernel/acct.c
|
||||
</chapter>
|
||||
|
||||
<chapter id="pmfuncs">
|
||||
<title>Power Management</title>
|
||||
!Ekernel/power/pm.c
|
||||
@@ -390,7 +436,6 @@ X!Edrivers/pnp/system.c
|
||||
</sect1>
|
||||
</chapter>
|
||||
|
||||
|
||||
<chapter id="blkdev">
|
||||
<title>Block Devices</title>
|
||||
!Eblock/ll_rw_blk.c
|
||||
@@ -401,6 +446,14 @@ X!Edrivers/pnp/system.c
|
||||
!Edrivers/char/misc.c
|
||||
</chapter>
|
||||
|
||||
<chapter id="parportdev">
|
||||
<title>Parallel Port Devices</title>
|
||||
!Iinclude/linux/parport.h
|
||||
!Edrivers/parport/ieee1284.c
|
||||
!Edrivers/parport/share.c
|
||||
!Idrivers/parport/daisy.c
|
||||
</chapter>
|
||||
|
||||
<chapter id="viddev">
|
||||
<title>Video4Linux</title>
|
||||
!Edrivers/media/video/videodev.c
|
||||
|
||||
@@ -169,6 +169,22 @@ void (*tf_read) (struct ata_port *ap, struct ata_taskfile *tf);
|
||||
|
||||
</sect2>
|
||||
|
||||
<sect2><title>PIO data read/write</title>
|
||||
<programlisting>
|
||||
void (*data_xfer) (struct ata_device *, unsigned char *, unsigned int, int);
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
All bmdma-style drivers must implement this hook. This is the low-level
|
||||
operation that actually copies the data bytes during a PIO data
|
||||
transfer.
|
||||
Typically the driver
|
||||
will choose one of ata_pio_data_xfer_noirq(), ata_pio_data_xfer(), or
|
||||
ata_mmio_data_xfer().
|
||||
</para>
|
||||
|
||||
</sect2>
|
||||
|
||||
<sect2><title>ATA command execute</title>
|
||||
<programlisting>
|
||||
void (*exec_command)(struct ata_port *ap, struct ata_taskfile *tf);
|
||||
@@ -204,11 +220,10 @@ command.
|
||||
<programlisting>
|
||||
u8 (*check_status)(struct ata_port *ap);
|
||||
u8 (*check_altstatus)(struct ata_port *ap);
|
||||
u8 (*check_err)(struct ata_port *ap);
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
Reads the Status/AltStatus/Error ATA shadow register from
|
||||
Reads the Status/AltStatus ATA shadow register from
|
||||
hardware. On some hardware, reading the Status register has
|
||||
the side effect of clearing the interrupt condition.
|
||||
Most drivers for taskfile-based hardware use
|
||||
@@ -269,23 +284,6 @@ void (*set_mode) (struct ata_port *ap);
|
||||
|
||||
</sect2>
|
||||
|
||||
<sect2><title>Reset ATA bus</title>
|
||||
<programlisting>
|
||||
void (*phy_reset) (struct ata_port *ap);
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
The very first step in the probe phase. Actions vary depending
|
||||
on the bus type, typically. After waking up the device and probing
|
||||
for device presence (PATA and SATA), typically a soft reset
|
||||
(SRST) will be performed. Drivers typically use the helper
|
||||
functions ata_bus_reset() or sata_phy_reset() for this hook.
|
||||
Many SATA drivers use sata_phy_reset() or call it from within
|
||||
their own phy_reset() functions.
|
||||
</para>
|
||||
|
||||
</sect2>
|
||||
|
||||
<sect2><title>Control PCI IDE BMDMA engine</title>
|
||||
<programlisting>
|
||||
void (*bmdma_setup) (struct ata_queued_cmd *qc);
|
||||
@@ -354,16 +352,74 @@ int (*qc_issue) (struct ata_queued_cmd *qc);
|
||||
|
||||
</sect2>
|
||||
|
||||
<sect2><title>Timeout (error) handling</title>
|
||||
<sect2><title>Exception and probe handling (EH)</title>
|
||||
<programlisting>
|
||||
void (*eng_timeout) (struct ata_port *ap);
|
||||
void (*phy_reset) (struct ata_port *ap);
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
This is a high level error handling function, called from the
|
||||
error handling thread, when a command times out. Most newer
|
||||
hardware will implement its own error handling code here. IDE BMDMA
|
||||
drivers may use the helper function ata_eng_timeout().
|
||||
Deprecated. Use ->error_handler() instead.
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
void (*freeze) (struct ata_port *ap);
|
||||
void (*thaw) (struct ata_port *ap);
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
ata_port_freeze() is called when HSM violations or some other
|
||||
condition disrupts normal operation of the port. A frozen port
|
||||
is not allowed to perform any operation until the port is
|
||||
thawed, which usually follows a successful reset.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The optional ->freeze() callback can be used for freezing the port
|
||||
hardware-wise (e.g. mask interrupt and stop DMA engine). If a
|
||||
port cannot be frozen hardware-wise, the interrupt handler
|
||||
must ack and clear interrupts unconditionally while the port
|
||||
is frozen.
|
||||
</para>
|
||||
<para>
|
||||
The optional ->thaw() callback is called to perform the opposite of ->freeze():
|
||||
prepare the port for normal operation once again. Unmask interrupts,
|
||||
start DMA engine, etc.
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
void (*error_handler) (struct ata_port *ap);
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
->error_handler() is a driver's hook into probe, hotplug, and recovery
|
||||
and other exceptional conditions. The primary responsibility of an
|
||||
implementation is to call ata_do_eh() or ata_bmdma_drive_eh() with a set
|
||||
of EH hooks as arguments:
|
||||
</para>
|
||||
|
||||
<para>
|
||||
'prereset' hook (may be NULL) is called during an EH reset, before any other actions
|
||||
are taken.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
'postreset' hook (may be NULL) is called after the EH reset is performed. Based on
|
||||
existing conditions, severity of the problem, and hardware capabilities,
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Either 'softreset' (may be NULL) or 'hardreset' (may be NULL) will be
|
||||
called to perform the low-level EH reset.
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
void (*post_internal_cmd) (struct ata_queued_cmd *qc);
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
Perform any hardware-specific actions necessary to finish processing
|
||||
after executing a probe-time or EH-time command via ata_exec_internal().
|
||||
</para>
|
||||
|
||||
</sect2>
|
||||
|
||||
@@ -144,9 +144,47 @@ over a rather long period of time, but improvements are always welcome!
|
||||
whether the increased speed is worth it.
|
||||
|
||||
8. Although synchronize_rcu() is a bit slower than is call_rcu(),
|
||||
it usually results in simpler code. So, unless update performance
|
||||
is important or the updaters cannot block, synchronize_rcu()
|
||||
should be used in preference to call_rcu().
|
||||
it usually results in simpler code. So, unless update
|
||||
performance is critically important or the updaters cannot block,
|
||||
synchronize_rcu() should be used in preference to call_rcu().
|
||||
|
||||
An especially important property of the synchronize_rcu()
|
||||
primitive is that it automatically self-limits: if grace periods
|
||||
are delayed for whatever reason, then the synchronize_rcu()
|
||||
primitive will correspondingly delay updates. In contrast,
|
||||
code using call_rcu() should explicitly limit update rate in
|
||||
cases where grace periods are delayed, as failing to do so can
|
||||
result in excessive realtime latencies or even OOM conditions.
|
||||
|
||||
Ways of gaining this self-limiting property when using call_rcu()
|
||||
include:
|
||||
|
||||
a. Keeping a count of the number of data-structure elements
|
||||
used by the RCU-protected data structure, including those
|
||||
waiting for a grace period to elapse. Enforce a limit
|
||||
on this number, stalling updates as needed to allow
|
||||
previously deferred frees to complete.
|
||||
|
||||
Alternatively, limit only the number awaiting deferred
|
||||
free rather than the total number of elements.
|
||||
|
||||
b. Limiting update rate. For example, if updates occur only
|
||||
once per hour, then no explicit rate limiting is required,
|
||||
unless your system is already badly broken. The dcache
|
||||
subsystem takes this approach -- updates are guarded
|
||||
by a global lock, limiting their rate.
|
||||
|
||||
c. Trusted update -- if updates can only be done manually by
|
||||
superuser or some other trusted user, then it might not
|
||||
be necessary to automatically limit them. The theory
|
||||
here is that superuser already has lots of ways to crash
|
||||
the machine.
|
||||
|
||||
d. Use call_rcu_bh() rather than call_rcu(), in order to take
|
||||
advantage of call_rcu_bh()'s faster grace periods.
|
||||
|
||||
e. Periodically invoke synchronize_rcu(), permitting a limited
|
||||
number of updates per grace period.
|
||||
|
||||
9. All RCU list-traversal primitives, which include
|
||||
list_for_each_rcu(), list_for_each_entry_rcu(),
|
||||
|
||||
@@ -184,7 +184,17 @@ synchronize_rcu()
|
||||
blocking, it registers a function and argument which are invoked
|
||||
after all ongoing RCU read-side critical sections have completed.
|
||||
This callback variant is particularly useful in situations where
|
||||
it is illegal to block.
|
||||
it is illegal to block or where update-side performance is
|
||||
critically important.
|
||||
|
||||
However, the call_rcu() API should not be used lightly, as use
|
||||
of the synchronize_rcu() API generally results in simpler code.
|
||||
In addition, the synchronize_rcu() API has the nice property
|
||||
of automatically limiting update rate should grace periods
|
||||
be delayed. This property results in system resilience in face
|
||||
of denial-of-service attacks. Code using call_rcu() should limit
|
||||
update rate in order to gain this same sort of resilience. See
|
||||
checklist.txt for some approaches to limiting the update rate.
|
||||
|
||||
rcu_assign_pointer()
|
||||
|
||||
@@ -790,7 +800,6 @@ RCU pointer update:
|
||||
|
||||
RCU grace period:
|
||||
|
||||
synchronize_kernel (deprecated)
|
||||
synchronize_net
|
||||
synchronize_sched
|
||||
synchronize_rcu
|
||||
|
||||
@@ -0,0 +1,57 @@
|
||||
Linux Kernel patch sumbittal checklist
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Here are some basic things that developers should do if they
|
||||
want to see their kernel patch submittals accepted quicker.
|
||||
|
||||
These are all above and beyond the documentation that is provided
|
||||
in Documentation/SubmittingPatches and elsewhere about submitting
|
||||
Linux kernel patches.
|
||||
|
||||
|
||||
|
||||
- Builds cleanly with applicable or modified CONFIG options =y, =m, and =n.
|
||||
No gcc warnings/errors, no linker warnings/errors.
|
||||
|
||||
- Passes allnoconfig, allmodconfig
|
||||
|
||||
- Builds on multiple CPU arch-es by using local cross-compile tools
|
||||
or something like PLM at OSDL.
|
||||
|
||||
- ppc64 is a good architecture for cross-compilation checking because it
|
||||
tends to use `unsigned long' for 64-bit quantities.
|
||||
|
||||
- Matches kernel coding style(!)
|
||||
|
||||
- Any new or modified CONFIG options don't muck up the config menu.
|
||||
|
||||
- All new Kconfig options have help text.
|
||||
|
||||
- Has been carefully reviewed with respect to relevant Kconfig
|
||||
combinations. This is very hard to get right with testing --
|
||||
brainpower pays off here.
|
||||
|
||||
- Check cleanly with sparse.
|
||||
|
||||
- Use 'make checkstack' and 'make namespacecheck' and fix any
|
||||
problems that they find. Note: checkstack does not point out
|
||||
problems explicitly, but any one function that uses more than
|
||||
512 bytes on the stack is a candidate for change.
|
||||
|
||||
- Include kernel-doc to document global kernel APIs. (Not required
|
||||
for static functions, but OK there also.) Use 'make htmldocs'
|
||||
or 'make mandocs' to check the kernel-doc and fix any issues.
|
||||
|
||||
- Has been tested with CONFIG_PREEMPT, CONFIG_DEBUG_PREEMPT,
|
||||
CONFIG_DEBUG_SLAB, CONFIG_DEBUG_PAGEALLOC, CONFIG_DEBUG_MUTEXES,
|
||||
CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_SPINLOCK_SLEEP all simultaneously
|
||||
enabled.
|
||||
|
||||
- Has been build- and runtime tested with and without CONFIG_SMP and
|
||||
CONFIG_PREEMPT.
|
||||
|
||||
- If the patch affects IO/Disk, etc: has been tested with and without
|
||||
CONFIG_LBD.
|
||||
|
||||
|
||||
2006-APR-27
|
||||
+116
-24
@@ -3,7 +3,7 @@
|
||||
|
||||
Maintained by Torben Mathiasen <device@lanana.org>
|
||||
|
||||
Last revised: 25 January 2005
|
||||
Last revised: 15 May 2006
|
||||
|
||||
This list is the Linux Device List, the official registry of allocated
|
||||
device numbers and /dev directory nodes for the Linux operating
|
||||
@@ -94,7 +94,6 @@ Your cooperation is appreciated.
|
||||
9 = /dev/urandom Faster, less secure random number gen.
|
||||
10 = /dev/aio Asyncronous I/O notification interface
|
||||
11 = /dev/kmsg Writes to this come out as printk's
|
||||
12 = /dev/oldmem Access to crash dump from kexec kernel
|
||||
1 block RAM disk
|
||||
0 = /dev/ram0 First RAM disk
|
||||
1 = /dev/ram1 Second RAM disk
|
||||
@@ -262,13 +261,13 @@ Your cooperation is appreciated.
|
||||
NOTE: These devices permit both read and write access.
|
||||
|
||||
7 block Loopback devices
|
||||
0 = /dev/loop0 First loopback device
|
||||
1 = /dev/loop1 Second loopback device
|
||||
0 = /dev/loop0 First loop device
|
||||
1 = /dev/loop1 Second loop device
|
||||
...
|
||||
|
||||
The loopback devices are used to mount filesystems not
|
||||
The loop devices are used to mount filesystems not
|
||||
associated with block devices. The binding to the
|
||||
loopback devices is handled by mount(8) or losetup(8).
|
||||
loop devices is handled by mount(8) or losetup(8).
|
||||
|
||||
8 block SCSI disk devices (0-15)
|
||||
0 = /dev/sda First SCSI disk whole disk
|
||||
@@ -943,7 +942,7 @@ Your cooperation is appreciated.
|
||||
240 = /dev/ftlp FTL on 16th Memory Technology Device
|
||||
|
||||
Partitions are handled in the same way as for IDE
|
||||
disks (see major number 3) expect that the partition
|
||||
disks (see major number 3) except that the partition
|
||||
limit is 15 rather than 63 per disk (same as SCSI.)
|
||||
|
||||
45 char isdn4linux ISDN BRI driver
|
||||
@@ -1168,7 +1167,7 @@ Your cooperation is appreciated.
|
||||
The filename of the encrypted container and the passwords
|
||||
are sent via ioctls (using the sdmount tool) to the master
|
||||
node which then activates them via one of the
|
||||
/dev/scramdisk/x nodes for loopback mounting (all handled
|
||||
/dev/scramdisk/x nodes for loop mounting (all handled
|
||||
through the sdmount tool).
|
||||
|
||||
Requested by: andy@scramdisklinux.org
|
||||
@@ -2538,18 +2537,32 @@ Your cooperation is appreciated.
|
||||
0 = /dev/usb/lp0 First USB printer
|
||||
...
|
||||
15 = /dev/usb/lp15 16th USB printer
|
||||
16 = /dev/usb/mouse0 First USB mouse
|
||||
...
|
||||
31 = /dev/usb/mouse15 16th USB mouse
|
||||
32 = /dev/usb/ez0 First USB firmware loader
|
||||
...
|
||||
47 = /dev/usb/ez15 16th USB firmware loader
|
||||
48 = /dev/usb/scanner0 First USB scanner
|
||||
...
|
||||
63 = /dev/usb/scanner15 16th USB scanner
|
||||
64 = /dev/usb/rio500 Diamond Rio 500
|
||||
65 = /dev/usb/usblcd USBLCD Interface (info@usblcd.de)
|
||||
66 = /dev/usb/cpad0 Synaptics cPad (mouse/LCD)
|
||||
96 = /dev/usb/hiddev0 1st USB HID device
|
||||
...
|
||||
111 = /dev/usb/hiddev15 16th USB HID device
|
||||
112 = /dev/usb/auer0 1st auerswald ISDN device
|
||||
...
|
||||
127 = /dev/usb/auer15 16th auerswald ISDN device
|
||||
128 = /dev/usb/brlvgr0 First Braille Voyager device
|
||||
...
|
||||
131 = /dev/usb/brlvgr3 Fourth Braille Voyager device
|
||||
132 = /dev/usb/idmouse ID Mouse (fingerprint scanner) device
|
||||
133 = /dev/usb/sisusbvga1 First SiSUSB VGA device
|
||||
...
|
||||
140 = /dev/usb/sisusbvga8 Eigth SISUSB VGA device
|
||||
144 = /dev/usb/lcd USB LCD device
|
||||
160 = /dev/usb/legousbtower0 1st USB Legotower device
|
||||
...
|
||||
175 = /dev/usb/legousbtower15 16th USB Legotower device
|
||||
240 = /dev/usb/dabusb0 First daubusb device
|
||||
...
|
||||
243 = /dev/usb/dabusb3 Fourth dabusb device
|
||||
|
||||
180 block USB block devices
|
||||
0 = /dev/uba First USB block device
|
||||
@@ -2710,6 +2723,17 @@ Your cooperation is appreciated.
|
||||
1 = /dev/cpu/1/msr MSRs on CPU 1
|
||||
...
|
||||
|
||||
202 block Xen Virtual Block Device
|
||||
0 = /dev/xvda First Xen VBD whole disk
|
||||
16 = /dev/xvdb Second Xen VBD whole disk
|
||||
32 = /dev/xvdc Third Xen VBD whole disk
|
||||
...
|
||||
240 = /dev/xvdp Sixteenth Xen VBD whole disk
|
||||
|
||||
Partitions are handled in the same way as for IDE
|
||||
disks (see major number 3) except that the limit on
|
||||
partitions is 15.
|
||||
|
||||
203 char CPU CPUID information
|
||||
0 = /dev/cpu/0/cpuid CPUID on CPU 0
|
||||
1 = /dev/cpu/1/cpuid CPUID on CPU 1
|
||||
@@ -2747,11 +2771,27 @@ Your cooperation is appreciated.
|
||||
46 = /dev/ttyCPM0 PPC CPM (SCC or SMC) - port 0
|
||||
...
|
||||
47 = /dev/ttyCPM5 PPC CPM (SCC or SMC) - port 5
|
||||
50 = /dev/ttyIOC40 Altix serial card
|
||||
50 = /dev/ttyIOC0 Altix serial card
|
||||
...
|
||||
81 = /dev/ttyIOC431 Altix serial card
|
||||
82 = /dev/ttyVR0 NEC VR4100 series SIU
|
||||
83 = /dev/ttyVR1 NEC VR4100 series DSIU
|
||||
81 = /dev/ttyIOC31 Altix serial card
|
||||
82 = /dev/ttyVR0 NEC VR4100 series SIU
|
||||
83 = /dev/ttyVR1 NEC VR4100 series DSIU
|
||||
84 = /dev/ttyIOC84 Altix ioc4 serial card
|
||||
...
|
||||
115 = /dev/ttyIOC115 Altix ioc4 serial card
|
||||
116 = /dev/ttySIOC0 Altix ioc3 serial card
|
||||
...
|
||||
147 = /dev/ttySIOC31 Altix ioc3 serial card
|
||||
148 = /dev/ttyPSC0 PPC PSC - port 0
|
||||
...
|
||||
153 = /dev/ttyPSC5 PPC PSC - port 5
|
||||
154 = /dev/ttyAT0 ATMEL serial port 0
|
||||
...
|
||||
169 = /dev/ttyAT15 ATMEL serial port 15
|
||||
170 = /dev/ttyNX0 Hilscher netX serial port 0
|
||||
...
|
||||
185 = /dev/ttyNX15 Hilscher netX serial port 15
|
||||
186 = /dev/ttyJ0 JTAG1 DCC protocol based serial port emulation
|
||||
|
||||
205 char Low-density serial ports (alternate device)
|
||||
0 = /dev/culu0 Callout device for ttyLU0
|
||||
@@ -2786,8 +2826,8 @@ Your cooperation is appreciated.
|
||||
50 = /dev/cuioc40 Callout device for ttyIOC40
|
||||
...
|
||||
81 = /dev/cuioc431 Callout device for ttyIOC431
|
||||
82 = /dev/cuvr0 Callout device for ttyVR0
|
||||
83 = /dev/cuvr1 Callout device for ttyVR1
|
||||
82 = /dev/cuvr0 Callout device for ttyVR0
|
||||
83 = /dev/cuvr1 Callout device for ttyVR1
|
||||
|
||||
|
||||
206 char OnStream SC-x0 tape devices
|
||||
@@ -2897,7 +2937,6 @@ Your cooperation is appreciated.
|
||||
...
|
||||
196 = /dev/dvb/adapter3/video0 first video decoder of fourth card
|
||||
|
||||
|
||||
216 char Bluetooth RFCOMM TTY devices
|
||||
0 = /dev/rfcomm0 First Bluetooth RFCOMM TTY device
|
||||
1 = /dev/rfcomm1 Second Bluetooth RFCOMM TTY device
|
||||
@@ -3002,12 +3041,43 @@ Your cooperation is appreciated.
|
||||
ioctl()'s can be used to rewind the tape regardless of
|
||||
the device used to access it.
|
||||
|
||||
231 char InfiniBand MAD
|
||||
231 char InfiniBand
|
||||
0 = /dev/infiniband/umad0
|
||||
1 = /dev/infiniband/umad1
|
||||
...
|
||||
...
|
||||
63 = /dev/infiniband/umad63 63rd InfiniBandMad device
|
||||
64 = /dev/infiniband/issm0 First InfiniBand IsSM device
|
||||
65 = /dev/infiniband/issm1 Second InfiniBand IsSM device
|
||||
...
|
||||
127 = /dev/infiniband/issm63 63rd InfiniBand IsSM device
|
||||
128 = /dev/infiniband/uverbs0 First InfiniBand verbs device
|
||||
129 = /dev/infiniband/uverbs1 Second InfiniBand verbs device
|
||||
...
|
||||
159 = /dev/infiniband/uverbs31 31st InfiniBand verbs device
|
||||
|
||||
232-239 UNASSIGNED
|
||||
232 char Biometric Devices
|
||||
0 = /dev/biometric/sensor0/fingerprint first fingerprint sensor on first device
|
||||
1 = /dev/biometric/sensor0/iris first iris sensor on first device
|
||||
2 = /dev/biometric/sensor0/retina first retina sensor on first device
|
||||
3 = /dev/biometric/sensor0/voiceprint first voiceprint sensor on first device
|
||||
4 = /dev/biometric/sensor0/facial first facial sensor on first device
|
||||
5 = /dev/biometric/sensor0/hand first hand sensor on first device
|
||||
...
|
||||
10 = /dev/biometric/sensor1/fingerprint first fingerprint sensor on second device
|
||||
...
|
||||
20 = /dev/biometric/sensor2/fingerprint first fingerprint sensor on third device
|
||||
...
|
||||
|
||||
233 char PathScale InfiniPath interconnect
|
||||
0 = /dev/ipath Primary device for programs (any unit)
|
||||
1 = /dev/ipath0 Access specifically to unit 0
|
||||
2 = /dev/ipath1 Access specifically to unit 1
|
||||
...
|
||||
4 = /dev/ipath3 Access specifically to unit 3
|
||||
129 = /dev/ipath_sma Device used by Subnet Management Agent
|
||||
130 = /dev/ipath_diag Device used by diagnostics programs
|
||||
|
||||
234-239 UNASSIGNED
|
||||
|
||||
240-254 char LOCAL/EXPERIMENTAL USE
|
||||
240-254 block LOCAL/EXPERIMENTAL USE
|
||||
@@ -3021,6 +3091,28 @@ Your cooperation is appreciated.
|
||||
This major is reserved to assist the expansion to a
|
||||
larger number space. No device nodes with this major
|
||||
should ever be created on the filesystem.
|
||||
(This is probaly not true anymore, but I'll leave it
|
||||
for now /Torben)
|
||||
|
||||
---LARGE MAJORS!!!!!---
|
||||
|
||||
256 char Equinox SST multi-port serial boards
|
||||
0 = /dev/ttyEQ0 First serial port on first Equinox SST board
|
||||
127 = /dev/ttyEQ127 Last serial port on first Equinox SST board
|
||||
128 = /dev/ttyEQ128 First serial port on second Equinox SST board
|
||||
...
|
||||
1027 = /dev/ttyEQ1027 Last serial port on eighth Equinox SST board
|
||||
|
||||
256 block Resident Flash Disk Flash Translation Layer
|
||||
0 = /dev/rfda First RFD FTL layer
|
||||
16 = /dev/rfdb Second RFD FTL layer
|
||||
...
|
||||
240 = /dev/rfdp 16th RFD FTL layer
|
||||
|
||||
257 char Phoenix Technologies Cryptographic Services Driver
|
||||
0 = /dev/ptlsec Crypto Services Driver
|
||||
|
||||
|
||||
|
||||
**** ADDITIONAL /dev DIRECTORY ENTRIES
|
||||
|
||||
|
||||
@@ -33,21 +33,6 @@ Who: Adrian Bunk <bunk@stusta.de>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: RCU API moves to EXPORT_SYMBOL_GPL
|
||||
When: April 2006
|
||||
Files: include/linux/rcupdate.h, kernel/rcupdate.c
|
||||
Why: Outside of Linux, the only implementations of anything even
|
||||
vaguely resembling RCU that I am aware of are in DYNIX/ptx,
|
||||
VM/XA, Tornado, and K42. I do not expect anyone to port binary
|
||||
drivers or kernel modules from any of these, since the first two
|
||||
are owned by IBM and the last two are open-source research OSes.
|
||||
So these will move to GPL after a grace period to allow
|
||||
people, who might be using implementations that I am not aware
|
||||
of, to adjust to this upcoming change.
|
||||
Who: Paul E. McKenney <paulmck@us.ibm.com>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: raw1394: requests of type RAW1394_REQ_ISO_SEND, RAW1394_REQ_ISO_LISTEN
|
||||
When: November 2006
|
||||
Why: Deprecated in favour of the new ioctl-based rawiso interface, which is
|
||||
|
||||
@@ -99,7 +99,7 @@ prototypes:
|
||||
int (*sync_fs)(struct super_block *sb, int wait);
|
||||
void (*write_super_lockfs) (struct super_block *);
|
||||
void (*unlockfs) (struct super_block *);
|
||||
int (*statfs) (struct super_block *, struct kstatfs *);
|
||||
int (*statfs) (struct dentry *, struct kstatfs *);
|
||||
int (*remount_fs) (struct super_block *, int *, char *);
|
||||
void (*clear_inode) (struct inode *);
|
||||
void (*umount_begin) (struct super_block *);
|
||||
@@ -142,15 +142,16 @@ see also dquot_operations section.
|
||||
|
||||
--------------------------- file_system_type ---------------------------
|
||||
prototypes:
|
||||
struct super_block *(*get_sb) (struct file_system_type *, int,
|
||||
const char *, void *);
|
||||
struct int (*get_sb) (struct file_system_type *, int,
|
||||
const char *, void *, struct vfsmount *);
|
||||
void (*kill_sb) (struct super_block *);
|
||||
locking rules:
|
||||
may block BKL
|
||||
get_sb yes yes
|
||||
kill_sb yes yes
|
||||
|
||||
->get_sb() returns error or a locked superblock (exclusive on ->s_umount).
|
||||
->get_sb() returns error or 0 with locked superblock attached to the vfsmount
|
||||
(exclusive on ->s_umount).
|
||||
->kill_sb() takes a write-locked superblock, does all shutdown work on it,
|
||||
unlocks and drops the reference.
|
||||
|
||||
|
||||
@@ -19,7 +19,7 @@ following procedure:
|
||||
|
||||
(2) Have the follow_link() op do the following steps:
|
||||
|
||||
(a) Call do_kern_mount() to call the appropriate filesystem to set up a
|
||||
(a) Call vfs_kern_mount() to call the appropriate filesystem to set up a
|
||||
superblock and gain a vfsmount structure representing it.
|
||||
|
||||
(b) Copy the nameidata provided as an argument and substitute the dentry
|
||||
|
||||
@@ -18,6 +18,14 @@ Non-privileged mount (or user mount):
|
||||
user. NOTE: this is not the same as mounts allowed with the "user"
|
||||
option in /etc/fstab, which is not discussed here.
|
||||
|
||||
Filesystem connection:
|
||||
|
||||
A connection between the filesystem daemon and the kernel. The
|
||||
connection exists until either the daemon dies, or the filesystem is
|
||||
umounted. Note that detaching (or lazy umounting) the filesystem
|
||||
does _not_ break the connection, in this case it will exist until
|
||||
the last reference to the filesystem is released.
|
||||
|
||||
Mount owner:
|
||||
|
||||
The user who does the mounting.
|
||||
@@ -86,16 +94,20 @@ Mount options
|
||||
The default is infinite. Note that the size of read requests is
|
||||
limited anyway to 32 pages (which is 128kbyte on i386).
|
||||
|
||||
Sysfs
|
||||
~~~~~
|
||||
Control filesystem
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
|
||||
FUSE sets up the following hierarchy in sysfs:
|
||||
There's a control filesystem for FUSE, which can be mounted by:
|
||||
|
||||
/sys/fs/fuse/connections/N/
|
||||
mount -t fusectl none /sys/fs/fuse/connections
|
||||
|
||||
where N is an increasing number allocated to each new connection.
|
||||
Mounting it under the '/sys/fs/fuse/connections' directory makes it
|
||||
backwards compatible with earlier versions.
|
||||
|
||||
For each connection the following attributes are defined:
|
||||
Under the fuse control filesystem each connection has a directory
|
||||
named by a unique number.
|
||||
|
||||
For each connection the following files exist within this directory:
|
||||
|
||||
'waiting'
|
||||
|
||||
@@ -110,7 +122,47 @@ For each connection the following attributes are defined:
|
||||
connection. This means that all waiting requests will be aborted an
|
||||
error returned for all aborted and new requests.
|
||||
|
||||
Only a privileged user may read or write these attributes.
|
||||
Only the owner of the mount may read or write these files.
|
||||
|
||||
Interrupting filesystem operations
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
If a process issuing a FUSE filesystem request is interrupted, the
|
||||
following will happen:
|
||||
|
||||
1) If the request is not yet sent to userspace AND the signal is
|
||||
fatal (SIGKILL or unhandled fatal signal), then the request is
|
||||
dequeued and returns immediately.
|
||||
|
||||
2) If the request is not yet sent to userspace AND the signal is not
|
||||
fatal, then an 'interrupted' flag is set for the request. When
|
||||
the request has been successfully transfered to userspace and
|
||||
this flag is set, an INTERRUPT request is queued.
|
||||
|
||||
3) If the request is already sent to userspace, then an INTERRUPT
|
||||
request is queued.
|
||||
|
||||
INTERRUPT requests take precedence over other requests, so the
|
||||
userspace filesystem will receive queued INTERRUPTs before any others.
|
||||
|
||||
The userspace filesystem may ignore the INTERRUPT requests entirely,
|
||||
or may honor them by sending a reply to the _original_ request, with
|
||||
the error set to EINTR.
|
||||
|
||||
It is also possible that there's a race between processing the
|
||||
original request and it's INTERRUPT request. There are two possibilities:
|
||||
|
||||
1) The INTERRUPT request is processed before the original request is
|
||||
processed
|
||||
|
||||
2) The INTERRUPT request is processed after the original request has
|
||||
been answered
|
||||
|
||||
If the filesystem cannot find the original request, it should wait for
|
||||
some timeout and/or a number of new requests to arrive, after which it
|
||||
should reply to the INTERRUPT request with an EAGAIN error. In case
|
||||
1) the INTERRUPT request will be requeued. In case 2) the INTERRUPT
|
||||
reply will be ignored.
|
||||
|
||||
Aborting a filesystem connection
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
@@ -139,8 +191,8 @@ the filesystem. There are several ways to do this:
|
||||
- Use forced umount (umount -f). Works in all cases but only if
|
||||
filesystem is still attached (it hasn't been lazy unmounted)
|
||||
|
||||
- Abort filesystem through the sysfs interface. Most powerful
|
||||
method, always works.
|
||||
- Abort filesystem through the FUSE control filesystem. Most
|
||||
powerful method, always works.
|
||||
|
||||
How do non-privileged mounts work?
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
@@ -304,25 +356,7 @@ Scenario 1 - Simple deadlock
|
||||
| | for "file"]
|
||||
| | *DEADLOCK*
|
||||
|
||||
The solution for this is to allow requests to be interrupted while
|
||||
they are in userspace:
|
||||
|
||||
| [interrupted by signal] |
|
||||
| <fuse_unlink() |
|
||||
| [release semaphore] | [semaphore acquired]
|
||||
| <sys_unlink() |
|
||||
| | >fuse_unlink()
|
||||
| | [queue req on fc->pending]
|
||||
| | [wake up fc->waitq]
|
||||
| | [sleep on req->waitq]
|
||||
|
||||
If the filesystem daemon was single threaded, this will stop here,
|
||||
since there's no other thread to dequeue and execute the request.
|
||||
In this case the solution is to kill the FUSE daemon as well. If
|
||||
there are multiple serving threads, you just have to kill them as
|
||||
long as any remain.
|
||||
|
||||
Moral: a filesystem which deadlocks, can soon find itself dead.
|
||||
The solution for this is to allow the filesystem to be aborted.
|
||||
|
||||
Scenario 2 - Tricky deadlock
|
||||
----------------------------
|
||||
@@ -355,24 +389,14 @@ but is caused by a pagefault.
|
||||
| | [lock page]
|
||||
| | * DEADLOCK *
|
||||
|
||||
Solution is again to let the the request be interrupted (not
|
||||
elaborated further).
|
||||
Solution is basically the same as above.
|
||||
|
||||
An additional problem is that while the write buffer is being
|
||||
copied to the request, the request must not be interrupted. This
|
||||
is because the destination address of the copy may not be valid
|
||||
after the request is interrupted.
|
||||
An additional problem is that while the write buffer is being copied
|
||||
to the request, the request must not be interrupted/aborted. This is
|
||||
because the destination address of the copy may not be valid after the
|
||||
request has returned.
|
||||
|
||||
This is solved with doing the copy atomically, and allowing
|
||||
interruption while the page(s) belonging to the write buffer are
|
||||
faulted with get_user_pages(). The 'req->locked' flag indicates
|
||||
when the copy is taking place, and interruption is delayed until
|
||||
this flag is unset.
|
||||
|
||||
Scenario 3 - Tricky deadlock with asynchronous read
|
||||
---------------------------------------------------
|
||||
|
||||
The same situation as above, except thread-1 will wait on page lock
|
||||
and hence it will be uninterruptible as well. The solution is to
|
||||
abort the connection with forced umount (if mount is attached) or
|
||||
through the abort attribute in sysfs.
|
||||
This is solved with doing the copy atomically, and allowing abort
|
||||
while the page(s) belonging to the write buffer are faulted with
|
||||
get_user_pages(). The 'req->locked' flag indicates when the copy is
|
||||
taking place, and abort is delayed until this flag is unset.
|
||||
|
||||
@@ -50,10 +50,11 @@ Turn your foo_read_super() into a function that would return 0 in case of
|
||||
success and negative number in case of error (-EINVAL unless you have more
|
||||
informative error value to report). Call it foo_fill_super(). Now declare
|
||||
|
||||
struct super_block foo_get_sb(struct file_system_type *fs_type,
|
||||
int flags, const char *dev_name, void *data)
|
||||
int foo_get_sb(struct file_system_type *fs_type,
|
||||
int flags, const char *dev_name, void *data, struct vfsmount *mnt)
|
||||
{
|
||||
return get_sb_bdev(fs_type, flags, dev_name, data, ext2_fill_super);
|
||||
return get_sb_bdev(fs_type, flags, dev_name, data, foo_fill_super,
|
||||
mnt);
|
||||
}
|
||||
|
||||
(or similar with s/bdev/nodev/ or s/bdev/single/, depending on the kind of
|
||||
|
||||
@@ -70,11 +70,13 @@ tmpfs mounts. See Documentation/filesystems/tmpfs.txt for more information.
|
||||
What is rootfs?
|
||||
---------------
|
||||
|
||||
Rootfs is a special instance of ramfs, which is always present in 2.6 systems.
|
||||
(It's used internally as the starting and stopping point for searches of the
|
||||
kernel's doubly-linked list of mount points.)
|
||||
Rootfs is a special instance of ramfs (or tmpfs, if that's enabled), which is
|
||||
always present in 2.6 systems. You can't unmount rootfs for approximately the
|
||||
same reason you can't kill the init process; rather than having special code
|
||||
to check for and handle an empty list, it's smaller and simpler for the kernel
|
||||
to just make sure certain lists can't become empty.
|
||||
|
||||
Most systems just mount another filesystem over it and ignore it. The
|
||||
Most systems just mount another filesystem over rootfs and ignore it. The
|
||||
amount of space an empty instance of ramfs takes up is tiny.
|
||||
|
||||
What is initramfs?
|
||||
@@ -92,14 +94,16 @@ out of that.
|
||||
|
||||
All this differs from the old initrd in several ways:
|
||||
|
||||
- The old initrd was a separate file, while the initramfs archive is linked
|
||||
into the linux kernel image. (The directory linux-*/usr is devoted to
|
||||
generating this archive during the build.)
|
||||
- The old initrd was always a separate file, while the initramfs archive is
|
||||
linked into the linux kernel image. (The directory linux-*/usr is devoted
|
||||
to generating this archive during the build.)
|
||||
|
||||
- The old initrd file was a gzipped filesystem image (in some file format,
|
||||
such as ext2, that had to be built into the kernel), while the new
|
||||
such as ext2, that needed a driver built into the kernel), while the new
|
||||
initramfs archive is a gzipped cpio archive (like tar only simpler,
|
||||
see cpio(1) and Documentation/early-userspace/buffer-format.txt).
|
||||
see cpio(1) and Documentation/early-userspace/buffer-format.txt). The
|
||||
kernel's cpio extraction code is not only extremely small, it's also
|
||||
__init data that can be discarded during the boot process.
|
||||
|
||||
- The program run by the old initrd (which was called /initrd, not /init) did
|
||||
some setup and then returned to the kernel, while the init program from
|
||||
@@ -124,13 +128,14 @@ Populating initramfs:
|
||||
|
||||
The 2.6 kernel build process always creates a gzipped cpio format initramfs
|
||||
archive and links it into the resulting kernel binary. By default, this
|
||||
archive is empty (consuming 134 bytes on x86). The config option
|
||||
CONFIG_INITRAMFS_SOURCE (for some reason buried under devices->block devices
|
||||
in menuconfig, and living in usr/Kconfig) can be used to specify a source for
|
||||
the initramfs archive, which will automatically be incorporated into the
|
||||
resulting binary. This option can point to an existing gzipped cpio archive, a
|
||||
directory containing files to be archived, or a text file specification such
|
||||
as the following example:
|
||||
archive is empty (consuming 134 bytes on x86).
|
||||
|
||||
The config option CONFIG_INITRAMFS_SOURCE (for some reason buried under
|
||||
devices->block devices in menuconfig, and living in usr/Kconfig) can be used
|
||||
to specify a source for the initramfs archive, which will automatically be
|
||||
incorporated into the resulting binary. This option can point to an existing
|
||||
gzipped cpio archive, a directory containing files to be archived, or a text
|
||||
file specification such as the following example:
|
||||
|
||||
dir /dev 755 0 0
|
||||
nod /dev/console 644 0 0 c 5 1
|
||||
@@ -146,23 +151,84 @@ as the following example:
|
||||
Run "usr/gen_init_cpio" (after the kernel build) to get a usage message
|
||||
documenting the above file format.
|
||||
|
||||
One advantage of the text file is that root access is not required to
|
||||
One advantage of the configuration file is that root access is not required to
|
||||
set permissions or create device nodes in the new archive. (Note that those
|
||||
two example "file" entries expect to find files named "init.sh" and "busybox" in
|
||||
a directory called "initramfs", under the linux-2.6.* directory. See
|
||||
Documentation/early-userspace/README for more details.)
|
||||
|
||||
The kernel does not depend on external cpio tools, gen_init_cpio is created
|
||||
from usr/gen_init_cpio.c which is entirely self-contained, and the kernel's
|
||||
boot-time extractor is also (obviously) self-contained. However, if you _do_
|
||||
happen to have cpio installed, the following command line can extract the
|
||||
generated cpio image back into its component files:
|
||||
The kernel does not depend on external cpio tools. If you specify a
|
||||
directory instead of a configuration file, the kernel's build infrastructure
|
||||
creates a configuration file from that directory (usr/Makefile calls
|
||||
scripts/gen_initramfs_list.sh), and proceeds to package up that directory
|
||||
using the config file (by feeding it to usr/gen_init_cpio, which is created
|
||||
from usr/gen_init_cpio.c). The kernel's build-time cpio creation code is
|
||||
entirely self-contained, and the kernel's boot-time extractor is also
|
||||
(obviously) self-contained.
|
||||
|
||||
The one thing you might need external cpio utilities installed for is creating
|
||||
or extracting your own preprepared cpio files to feed to the kernel build
|
||||
(instead of a config file or directory).
|
||||
|
||||
The following command line can extract a cpio image (either by the above script
|
||||
or by the kernel build) back into its component files:
|
||||
|
||||
cpio -i -d -H newc -F initramfs_data.cpio --no-absolute-filenames
|
||||
|
||||
The following shell script can create a prebuilt cpio archive you can
|
||||
use in place of the above config file:
|
||||
|
||||
#!/bin/sh
|
||||
|
||||
# Copyright 2006 Rob Landley <rob@landley.net> and TimeSys Corporation.
|
||||
# Licensed under GPL version 2
|
||||
|
||||
if [ $# -ne 2 ]
|
||||
then
|
||||
echo "usage: mkinitramfs directory imagename.cpio.gz"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if [ -d "$1" ]
|
||||
then
|
||||
echo "creating $2 from $1"
|
||||
(cd "$1"; find . | cpio -o -H newc | gzip) > "$2"
|
||||
else
|
||||
echo "First argument must be a directory"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
Note: The cpio man page contains some bad advice that will break your initramfs
|
||||
archive if you follow it. It says "A typical way to generate the list
|
||||
of filenames is with the find command; you should give find the -depth option
|
||||
to minimize problems with permissions on directories that are unwritable or not
|
||||
searchable." Don't do this when creating initramfs.cpio.gz images, it won't
|
||||
work. The Linux kernel cpio extractor won't create files in a directory that
|
||||
doesn't exist, so the directory entries must go before the files that go in
|
||||
those directories. The above script gets them in the right order.
|
||||
|
||||
External initramfs images:
|
||||
--------------------------
|
||||
|
||||
If the kernel has initrd support enabled, an external cpio.gz archive can also
|
||||
be passed into a 2.6 kernel in place of an initrd. In this case, the kernel
|
||||
will autodetect the type (initramfs, not initrd) and extract the external cpio
|
||||
archive into rootfs before trying to run /init.
|
||||
|
||||
This has the memory efficiency advantages of initramfs (no ramdisk block
|
||||
device) but the separate packaging of initrd (which is nice if you have
|
||||
non-GPL code you'd like to run from initramfs, without conflating it with
|
||||
the GPL licensed Linux kernel binary).
|
||||
|
||||
It can also be used to supplement the kernel's built-in initamfs image. The
|
||||
files in the external archive will overwrite any conflicting files in
|
||||
the built-in initramfs archive. Some distributors also prefer to customize
|
||||
a single kernel image with task-specific initramfs images, without recompiling.
|
||||
|
||||
Contents of initramfs:
|
||||
----------------------
|
||||
|
||||
An initramfs archive is a complete self-contained root filesystem for Linux.
|
||||
If you don't already understand what shared libraries, devices, and paths
|
||||
you need to get a minimal root filesystem up and running, here are some
|
||||
references:
|
||||
@@ -176,13 +242,36 @@ code against, along with some related utilities. It is BSD licensed.
|
||||
|
||||
I use uClibc (http://www.uclibc.org) and busybox (http://www.busybox.net)
|
||||
myself. These are LGPL and GPL, respectively. (A self-contained initramfs
|
||||
package is planned for the busybox 1.2 release.)
|
||||
package is planned for the busybox 1.3 release.)
|
||||
|
||||
In theory you could use glibc, but that's not well suited for small embedded
|
||||
uses like this. (A "hello world" program statically linked against glibc is
|
||||
over 400k. With uClibc it's 7k. Also note that glibc dlopens libnss to do
|
||||
name lookups, even when otherwise statically linked.)
|
||||
|
||||
A good first step is to get initramfs to run a statically linked "hello world"
|
||||
program as init, and test it under an emulator like qemu (www.qemu.org) or
|
||||
User Mode Linux, like so:
|
||||
|
||||
cat > hello.c << EOF
|
||||
#include <stdio.h>
|
||||
#include <unistd.h>
|
||||
|
||||
int main(int argc, char *argv[])
|
||||
{
|
||||
printf("Hello world!\n");
|
||||
sleep(999999999);
|
||||
}
|
||||
EOF
|
||||
gcc -static hello2.c -o init
|
||||
echo init | cpio -o -H newc | gzip > test.cpio.gz
|
||||
# Testing external initramfs using the initrd loading mechanism.
|
||||
qemu -kernel /boot/vmlinuz -initrd test.cpio.gz /dev/zero
|
||||
|
||||
When debugging a normal root filesystem, it's nice to be able to boot with
|
||||
"init=/bin/sh". The initramfs equivalent is "rdinit=/bin/sh", and it's
|
||||
just as useful.
|
||||
|
||||
Why cpio rather than tar?
|
||||
-------------------------
|
||||
|
||||
@@ -241,7 +330,7 @@ the above threads) is:
|
||||
Future directions:
|
||||
------------------
|
||||
|
||||
Today (2.6.14), initramfs is always compiled in, but not always used. The
|
||||
Today (2.6.16), initramfs is always compiled in, but not always used. The
|
||||
kernel falls back to legacy boot code that is reached only if initramfs does
|
||||
not contain an /init program. The fallback is legacy code, there to ensure a
|
||||
smooth transition and allowing early boot functionality to gradually move to
|
||||
@@ -258,8 +347,9 @@ and so on.
|
||||
|
||||
This kind of complexity (which inevitably includes policy) is rightly handled
|
||||
in userspace. Both klibc and busybox/uClibc are working on simple initramfs
|
||||
packages to drop into a kernel build, and when standard solutions are ready
|
||||
and widely deployed, the kernel's legacy early boot code will become obsolete
|
||||
and a candidate for the feature removal schedule.
|
||||
packages to drop into a kernel build.
|
||||
|
||||
But that's a while off yet.
|
||||
The klibc package has now been accepted into Andrew Morton's 2.6.17-mm tree.
|
||||
The kernel's current early boot code (partition detection, etc) will probably
|
||||
be migrated into a default initramfs, automatically created and used by the
|
||||
kernel build.
|
||||
|
||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user