Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6

This commit is contained in:
David Woodhouse
2006-06-26 16:35:44 +01:00
2601 changed files with 123934 additions and 101120 deletions
+1 -5
View File
@@ -1573,12 +1573,8 @@ S: 160 00 Praha 6
S: Czech Republic
N: Niels Kristian Bech Jensen
E: nkbj@image.dk
W: http://www.image.dk/~nkbj
E: nkbj1970@hotmail.com
D: Miscellaneous kernel updates and fixes.
S: Dr. Holsts Vej 34, lejl. 164
S: DK-8230 Åbyhøj
S: Denmark
N: Michael K. Johnson
E: johnsonm@redhat.com
+77
View File
@@ -0,0 +1,77 @@
This directory attempts to document the ABI between the Linux kernel and
userspace, and the relative stability of these interfaces. Due to the
everchanging nature of Linux, and the differing maturity levels, these
interfaces should be used by userspace programs in different ways.
We have four different levels of ABI stability, as shown by the four
different subdirectories in this location. Interfaces may change levels
of stability according to the rules described below.
The different levels of stability are:
stable/
This directory documents the interfaces that the developer has
defined to be stable. Userspace programs are free to use these
interfaces with no restrictions, and backward compatibility for
them will be guaranteed for at least 2 years. Most interfaces
(like syscalls) are expected to never change and always be
available.
testing/
This directory documents interfaces that are felt to be stable,
as the main development of this interface has been completed.
The interface can be changed to add new features, but the
current interface will not break by doing this, unless grave
errors or security problems are found in them. Userspace
programs can start to rely on these interfaces, but they must be
aware of changes that can occur before these interfaces move to
be marked stable. Programs that use these interfaces are
strongly encouraged to add their name to the description of
these interfaces, so that the kernel developers can easily
notify them if any changes occur (see the description of the
layout of the files below for details on how to do this.)
obsolete/
This directory documents interfaces that are still remaining in
the kernel, but are marked to be removed at some later point in
time. The description of the interface will document the reason
why it is obsolete and when it can be expected to be removed.
The file Documentation/feature-removal-schedule.txt may describe
some of these interfaces, giving a schedule for when they will
be removed.
removed/
This directory contains a list of the old interfaces that have
been removed from the kernel.
Every file in these directories will contain the following information:
What: Short description of the interface
Date: Date created
KernelVersion: Kernel version this feature first showed up in.
Contact: Primary contact for this interface (may be a mailing list)
Description: Long description of the interface and how to use it.
Users: All users of this interface who wish to be notified when
it changes. This is very important for interfaces in
the "testing" stage, so that kernel developers can work
with userspace developers to ensure that things do not
break in ways that are unacceptable. It is also
important to get feedback for these interfaces to make
sure they are working in a proper way and do not need to
be changed further.
How things move between levels:
Interfaces in stable may move to obsolete, as long as the proper
notification is given.
Interfaces may be removed from obsolete and the kernel as long as the
documented amount of time has gone by.
Interfaces in the testing state can move to the stable state when the
developers feel they are finished. They cannot be removed from the
kernel tree without going through the obsolete state first.
It's up to the developer to place their interfaces in the category they
wish for it to start out in.
+13
View File
@@ -0,0 +1,13 @@
What: devfs
Date: July 2005
Contact: Greg Kroah-Hartman <gregkh@suse.de>
Description:
devfs has been unmaintained for a number of years, has unfixable
races, contains a naming policy within the kernel that is
against the LSB, and can be replaced by using udev.
The files fs/devfs/*, include/linux/devfs_fs*.h will be removed,
along with the the assorted devfs function calls throughout the
kernel tree.
Users:
+10
View File
@@ -0,0 +1,10 @@
What: The kernel syscall interface
Description:
This interface matches much of the POSIX interface and is based
on it and other Unix based interfaces. It will only be added to
over time, and not have things removed from it.
Note that this interface is different for every architecture
that Linux supports. Please see the architecture-specific
documentation for details on the syscall numbers that are to be
mapped to each syscall.
+30
View File
@@ -0,0 +1,30 @@
What: /sys/module
Description:
The /sys/module tree consists of the following structure:
/sys/module/MODULENAME
The name of the module that is in the kernel. This
module name will show up either if the module is built
directly into the kernel, or if it is loaded as a
dyanmic module.
/sys/module/MODULENAME/parameters
This directory contains individual files that are each
individual parameters of the module that are able to be
changed at runtime. See the individual module
documentation as to the contents of these parameters and
what they accomplish.
Note: The individual parameter names and values are not
considered stable, only the fact that they will be
placed in this location within sysfs. See the
individual driver documentation for details as to the
stability of the different parameters.
/sys/module/MODULENAME/refcnt
If the module is able to be unloaded from the kernel, this file
will contain the current reference count of the module.
Note: If the module is built into the kernel, or if the
CONFIG_MODULE_UNLOAD kernel configuration value is not enabled,
this file will not be present.
+16
View File
@@ -0,0 +1,16 @@
What: /sys/class/
Date: Febuary 2006
Contact: Greg Kroah-Hartman <gregkh@suse.de>
Description:
The /sys/class directory will consist of a group of
subdirectories describing individual classes of devices
in the kernel. The individual directories will consist
of either subdirectories, or symlinks to other
directories.
All programs that use this directory tree must be able
to handle both subdirectories or symlinks in order to
work properly.
Users:
udev <linux-hotplug-devel@lists.sourceforge.net>
+25
View File
@@ -0,0 +1,25 @@
What: /sys/devices
Date: February 2006
Contact: Greg Kroah-Hartman <gregkh@suse.de>
Description:
The /sys/devices tree contains a snapshot of the
internal state of the kernel device tree. Devices will
be added and removed dynamically as the machine runs,
and between different kernel versions, the layout of the
devices within this tree will change.
Please do not rely on the format of this tree because of
this. If a program wishes to find different things in
the tree, please use the /sys/class structure and rely
on the symlinks there to point to the proper location
within the /sys/devices tree of the individual devices.
Or rely on the uevent messages to notify programs of
devices being added and removed from this tree to find
the location of those devices.
Note that sometimes not all devices along the directory
chain will have emitted uevent messages, so userspace
programs must be able to handle such occurrences.
Users:
udev <linux-hotplug-devel@lists.sourceforge.net>
+88 -12
View File
@@ -155,7 +155,83 @@ problem, which is called the function-growth-hormone-imbalance syndrome.
See next chapter.
Chapter 5: Functions
Chapter 5: Typedefs
Please don't use things like "vps_t".
It's a _mistake_ to use typedef for structures and pointers. When you see a
vps_t a;
in the source, what does it mean?
In contrast, if it says
struct virtual_container *a;
you can actually tell what "a" is.
Lots of people think that typedefs "help readability". Not so. They are
useful only for:
(a) totally opaque objects (where the typedef is actively used to _hide_
what the object is).
Example: "pte_t" etc. opaque objects that you can only access using
the proper accessor functions.
NOTE! Opaqueness and "accessor functions" are not good in themselves.
The reason we have them for things like pte_t etc. is that there
really is absolutely _zero_ portably accessible information there.
(b) Clear integer types, where the abstraction _helps_ avoid confusion
whether it is "int" or "long".
u8/u16/u32 are perfectly fine typedefs, although they fit into
category (d) better than here.
NOTE! Again - there needs to be a _reason_ for this. If something is
"unsigned long", then there's no reason to do
typedef unsigned long myflags_t;
but if there is a clear reason for why it under certain circumstances
might be an "unsigned int" and under other configurations might be
"unsigned long", then by all means go ahead and use a typedef.
(c) when you use sparse to literally create a _new_ type for
type-checking.
(d) New types which are identical to standard C99 types, in certain
exceptional circumstances.
Although it would only take a short amount of time for the eyes and
brain to become accustomed to the standard types like 'uint32_t',
some people object to their use anyway.
Therefore, the Linux-specific 'u8/u16/u32/u64' types and their
signed equivalents which are identical to standard types are
permitted -- although they are not mandatory in new code of your
own.
When editing existing code which already uses one or the other set
of types, you should conform to the existing choices in that code.
(e) Types safe for use in userspace.
In certain structures which are visible to userspace, we cannot
require C99 types and cannot use the 'u32' form above. Thus, we
use __u32 and similar types in all structures which are shared
with userspace.
Maybe there are other cases too, but the rule should basically be to NEVER
EVER use a typedef unless you can clearly match one of those rules.
In general, a pointer, or a struct that has elements that can reasonably
be directly accessed should _never_ be a typedef.
Chapter 6: Functions
Functions should be short and sweet, and do just one thing. They should
fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24,
@@ -183,7 +259,7 @@ and it gets confused. You know you're brilliant, but maybe you'd like
to understand what you did 2 weeks from now.
Chapter 6: Centralized exiting of functions
Chapter 7: Centralized exiting of functions
Albeit deprecated by some people, the equivalent of the goto statement is
used frequently by compilers in form of the unconditional jump instruction.
@@ -220,7 +296,7 @@ out:
return result;
}
Chapter 7: Commenting
Chapter 8: Commenting
Comments are good, but there is also a danger of over-commenting. NEVER
try to explain HOW your code works in a comment: it's much better to
@@ -240,7 +316,7 @@ When commenting the kernel API functions, please use the kerneldoc format.
See the files Documentation/kernel-doc-nano-HOWTO.txt and scripts/kernel-doc
for details.
Chapter 8: You've made a mess of it
Chapter 9: You've made a mess of it
That's OK, we all do. You've probably been told by your long-time Unix
user helper that "GNU emacs" automatically formats the C sources for
@@ -288,7 +364,7 @@ re-formatting you may want to take a look at the man page. But
remember: "indent" is not a fix for bad programming.
Chapter 9: Configuration-files
Chapter 10: Configuration-files
For configuration options (arch/xxx/Kconfig, and all the Kconfig files),
somewhat different indentation is used.
@@ -313,7 +389,7 @@ support for file-systems, for instance) should be denoted (DANGEROUS), other
experimental options should be denoted (EXPERIMENTAL).
Chapter 10: Data structures
Chapter 11: Data structures
Data structures that have visibility outside the single-threaded
environment they are created and destroyed in should always have
@@ -344,7 +420,7 @@ Remember: if another thread can find your data structure, and you don't
have a reference count on it, you almost certainly have a bug.
Chapter 11: Macros, Enums and RTL
Chapter 12: Macros, Enums and RTL
Names of macros defining constants and labels in enums are capitalized.
@@ -399,7 +475,7 @@ The cpp manual deals with macros exhaustively. The gcc internals manual also
covers RTL which is used frequently with assembly language in the kernel.
Chapter 12: Printing kernel messages
Chapter 13: Printing kernel messages
Kernel developers like to be seen as literate. Do mind the spelling
of kernel messages to make a good impression. Do not use crippled
@@ -410,7 +486,7 @@ Kernel messages do not have to be terminated with a period.
Printing numbers in parentheses (%d) adds no value and should be avoided.
Chapter 13: Allocating memory
Chapter 14: Allocating memory
The kernel provides the following general purpose memory allocators:
kmalloc(), kzalloc(), kcalloc(), and vmalloc(). Please refer to the API
@@ -429,7 +505,7 @@ from void pointer to any other pointer type is guaranteed by the C programming
language.
Chapter 14: The inline disease
Chapter 15: The inline disease
There appears to be a common misperception that gcc has a magic "make me
faster" speedup option called "inline". While the use of inlines can be
@@ -457,7 +533,7 @@ something it would have done anyway.
Chapter 15: References
Appendix I: References
The C Programming Language, Second Edition
by Brian W. Kernighan and Dennis M. Ritchie.
@@ -481,4 +557,4 @@ Kernel CodingStyle, by greg@kroah.com at OLS 2002:
http://www.kroah.com/linux/talks/ols_2002_kernel_codingstyle_talk/html/
--
Last updated on 30 December 2005 by a community effort on LKML.
Last updated on 30 April 2006.
+55 -2
View File
@@ -62,6 +62,8 @@
<sect1><title>Internal Functions</title>
!Ikernel/exit.c
!Ikernel/signal.c
!Iinclude/linux/kthread.h
!Ekernel/kthread.c
</sect1>
<sect1><title>Kernel objects manipulation</title>
@@ -114,9 +116,33 @@ X!Ilib/string.c
</sect1>
</chapter>
<chapter id="kernel-lib">
<title>Basic Kernel Library Functions</title>
<para>
The Linux kernel provides more basic utility functions.
</para>
<sect1><title>Bitmap Operations</title>
!Elib/bitmap.c
!Ilib/bitmap.c
</sect1>
<sect1><title>Command-line Parsing</title>
!Elib/cmdline.c
</sect1>
<sect1><title>CRC Functions</title>
!Elib/crc16.c
!Elib/crc32.c
!Elib/crc-ccitt.c
</sect1>
</chapter>
<chapter id="mm">
<title>Memory Management in Linux</title>
<sect1><title>The Slab Cache</title>
!Iinclude/linux/slab.h
!Emm/slab.c
</sect1>
<sect1><title>User Space Memory Access</title>
@@ -280,12 +306,13 @@ X!Ekernel/module.c
<sect1><title>MTRR Handling</title>
!Earch/i386/kernel/cpu/mtrr/main.c
</sect1>
<sect1><title>PCI Support Library</title>
!Edrivers/pci/pci.c
!Edrivers/pci/pci-driver.c
!Edrivers/pci/remove.c
!Edrivers/pci/pci-acpi.c
<!-- kerneldoc does not understand to __devinit
<!-- kerneldoc does not understand __devinit
X!Edrivers/pci/search.c
-->
!Edrivers/pci/msi.c
@@ -314,6 +341,13 @@ X!Earch/i386/kernel/mca.c
</sect1>
</chapter>
<chapter id="firmware">
<title>Firmware Interfaces</title>
<sect1><title>DMI Interfaces</title>
!Edrivers/firmware/dmi_scan.c
</sect1>
</chapter>
<chapter id="devfs">
<title>The Device File System</title>
!Efs/devfs/base.c
@@ -331,6 +365,18 @@ X!Earch/i386/kernel/mca.c
!Esecurity/security.c
</chapter>
<chapter id="audit">
<title>Audit Interfaces</title>
!Ekernel/audit.c
!Ikernel/auditsc.c
!Ikernel/auditfilter.c
</chapter>
<chapter id="accounting">
<title>Accounting Framework</title>
!Ikernel/acct.c
</chapter>
<chapter id="pmfuncs">
<title>Power Management</title>
!Ekernel/power/pm.c
@@ -390,7 +436,6 @@ X!Edrivers/pnp/system.c
</sect1>
</chapter>
<chapter id="blkdev">
<title>Block Devices</title>
!Eblock/ll_rw_blk.c
@@ -401,6 +446,14 @@ X!Edrivers/pnp/system.c
!Edrivers/char/misc.c
</chapter>
<chapter id="parportdev">
<title>Parallel Port Devices</title>
!Iinclude/linux/parport.h
!Edrivers/parport/ieee1284.c
!Edrivers/parport/share.c
!Idrivers/parport/daisy.c
</chapter>
<chapter id="viddev">
<title>Video4Linux</title>
!Edrivers/media/video/videodev.c
+80 -24
View File
@@ -169,6 +169,22 @@ void (*tf_read) (struct ata_port *ap, struct ata_taskfile *tf);
</sect2>
<sect2><title>PIO data read/write</title>
<programlisting>
void (*data_xfer) (struct ata_device *, unsigned char *, unsigned int, int);
</programlisting>
<para>
All bmdma-style drivers must implement this hook. This is the low-level
operation that actually copies the data bytes during a PIO data
transfer.
Typically the driver
will choose one of ata_pio_data_xfer_noirq(), ata_pio_data_xfer(), or
ata_mmio_data_xfer().
</para>
</sect2>
<sect2><title>ATA command execute</title>
<programlisting>
void (*exec_command)(struct ata_port *ap, struct ata_taskfile *tf);
@@ -204,11 +220,10 @@ command.
<programlisting>
u8 (*check_status)(struct ata_port *ap);
u8 (*check_altstatus)(struct ata_port *ap);
u8 (*check_err)(struct ata_port *ap);
</programlisting>
<para>
Reads the Status/AltStatus/Error ATA shadow register from
Reads the Status/AltStatus ATA shadow register from
hardware. On some hardware, reading the Status register has
the side effect of clearing the interrupt condition.
Most drivers for taskfile-based hardware use
@@ -269,23 +284,6 @@ void (*set_mode) (struct ata_port *ap);
</sect2>
<sect2><title>Reset ATA bus</title>
<programlisting>
void (*phy_reset) (struct ata_port *ap);
</programlisting>
<para>
The very first step in the probe phase. Actions vary depending
on the bus type, typically. After waking up the device and probing
for device presence (PATA and SATA), typically a soft reset
(SRST) will be performed. Drivers typically use the helper
functions ata_bus_reset() or sata_phy_reset() for this hook.
Many SATA drivers use sata_phy_reset() or call it from within
their own phy_reset() functions.
</para>
</sect2>
<sect2><title>Control PCI IDE BMDMA engine</title>
<programlisting>
void (*bmdma_setup) (struct ata_queued_cmd *qc);
@@ -354,16 +352,74 @@ int (*qc_issue) (struct ata_queued_cmd *qc);
</sect2>
<sect2><title>Timeout (error) handling</title>
<sect2><title>Exception and probe handling (EH)</title>
<programlisting>
void (*eng_timeout) (struct ata_port *ap);
void (*phy_reset) (struct ata_port *ap);
</programlisting>
<para>
This is a high level error handling function, called from the
error handling thread, when a command times out. Most newer
hardware will implement its own error handling code here. IDE BMDMA
drivers may use the helper function ata_eng_timeout().
Deprecated. Use ->error_handler() instead.
</para>
<programlisting>
void (*freeze) (struct ata_port *ap);
void (*thaw) (struct ata_port *ap);
</programlisting>
<para>
ata_port_freeze() is called when HSM violations or some other
condition disrupts normal operation of the port. A frozen port
is not allowed to perform any operation until the port is
thawed, which usually follows a successful reset.
</para>
<para>
The optional ->freeze() callback can be used for freezing the port
hardware-wise (e.g. mask interrupt and stop DMA engine). If a
port cannot be frozen hardware-wise, the interrupt handler
must ack and clear interrupts unconditionally while the port
is frozen.
</para>
<para>
The optional ->thaw() callback is called to perform the opposite of ->freeze():
prepare the port for normal operation once again. Unmask interrupts,
start DMA engine, etc.
</para>
<programlisting>
void (*error_handler) (struct ata_port *ap);
</programlisting>
<para>
->error_handler() is a driver's hook into probe, hotplug, and recovery
and other exceptional conditions. The primary responsibility of an
implementation is to call ata_do_eh() or ata_bmdma_drive_eh() with a set
of EH hooks as arguments:
</para>
<para>
'prereset' hook (may be NULL) is called during an EH reset, before any other actions
are taken.
</para>
<para>
'postreset' hook (may be NULL) is called after the EH reset is performed. Based on
existing conditions, severity of the problem, and hardware capabilities,
</para>
<para>
Either 'softreset' (may be NULL) or 'hardreset' (may be NULL) will be
called to perform the low-level EH reset.
</para>
<programlisting>
void (*post_internal_cmd) (struct ata_queued_cmd *qc);
</programlisting>
<para>
Perform any hardware-specific actions necessary to finish processing
after executing a probe-time or EH-time command via ata_exec_internal().
</para>
</sect2>
+41 -3
View File
@@ -144,9 +144,47 @@ over a rather long period of time, but improvements are always welcome!
whether the increased speed is worth it.
8. Although synchronize_rcu() is a bit slower than is call_rcu(),
it usually results in simpler code. So, unless update performance
is important or the updaters cannot block, synchronize_rcu()
should be used in preference to call_rcu().
it usually results in simpler code. So, unless update
performance is critically important or the updaters cannot block,
synchronize_rcu() should be used in preference to call_rcu().
An especially important property of the synchronize_rcu()
primitive is that it automatically self-limits: if grace periods
are delayed for whatever reason, then the synchronize_rcu()
primitive will correspondingly delay updates. In contrast,
code using call_rcu() should explicitly limit update rate in
cases where grace periods are delayed, as failing to do so can
result in excessive realtime latencies or even OOM conditions.
Ways of gaining this self-limiting property when using call_rcu()
include:
a. Keeping a count of the number of data-structure elements
used by the RCU-protected data structure, including those
waiting for a grace period to elapse. Enforce a limit
on this number, stalling updates as needed to allow
previously deferred frees to complete.
Alternatively, limit only the number awaiting deferred
free rather than the total number of elements.
b. Limiting update rate. For example, if updates occur only
once per hour, then no explicit rate limiting is required,
unless your system is already badly broken. The dcache
subsystem takes this approach -- updates are guarded
by a global lock, limiting their rate.
c. Trusted update -- if updates can only be done manually by
superuser or some other trusted user, then it might not
be necessary to automatically limit them. The theory
here is that superuser already has lots of ways to crash
the machine.
d. Use call_rcu_bh() rather than call_rcu(), in order to take
advantage of call_rcu_bh()'s faster grace periods.
e. Periodically invoke synchronize_rcu(), permitting a limited
number of updates per grace period.
9. All RCU list-traversal primitives, which include
list_for_each_rcu(), list_for_each_entry_rcu(),
+11 -2
View File
@@ -184,7 +184,17 @@ synchronize_rcu()
blocking, it registers a function and argument which are invoked
after all ongoing RCU read-side critical sections have completed.
This callback variant is particularly useful in situations where
it is illegal to block.
it is illegal to block or where update-side performance is
critically important.
However, the call_rcu() API should not be used lightly, as use
of the synchronize_rcu() API generally results in simpler code.
In addition, the synchronize_rcu() API has the nice property
of automatically limiting update rate should grace periods
be delayed. This property results in system resilience in face
of denial-of-service attacks. Code using call_rcu() should limit
update rate in order to gain this same sort of resilience. See
checklist.txt for some approaches to limiting the update rate.
rcu_assign_pointer()
@@ -790,7 +800,6 @@ RCU pointer update:
RCU grace period:
synchronize_kernel (deprecated)
synchronize_net
synchronize_sched
synchronize_rcu
+57
View File
@@ -0,0 +1,57 @@
Linux Kernel patch sumbittal checklist
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Here are some basic things that developers should do if they
want to see their kernel patch submittals accepted quicker.
These are all above and beyond the documentation that is provided
in Documentation/SubmittingPatches and elsewhere about submitting
Linux kernel patches.
- Builds cleanly with applicable or modified CONFIG options =y, =m, and =n.
No gcc warnings/errors, no linker warnings/errors.
- Passes allnoconfig, allmodconfig
- Builds on multiple CPU arch-es by using local cross-compile tools
or something like PLM at OSDL.
- ppc64 is a good architecture for cross-compilation checking because it
tends to use `unsigned long' for 64-bit quantities.
- Matches kernel coding style(!)
- Any new or modified CONFIG options don't muck up the config menu.
- All new Kconfig options have help text.
- Has been carefully reviewed with respect to relevant Kconfig
combinations. This is very hard to get right with testing --
brainpower pays off here.
- Check cleanly with sparse.
- Use 'make checkstack' and 'make namespacecheck' and fix any
problems that they find. Note: checkstack does not point out
problems explicitly, but any one function that uses more than
512 bytes on the stack is a candidate for change.
- Include kernel-doc to document global kernel APIs. (Not required
for static functions, but OK there also.) Use 'make htmldocs'
or 'make mandocs' to check the kernel-doc and fix any issues.
- Has been tested with CONFIG_PREEMPT, CONFIG_DEBUG_PREEMPT,
CONFIG_DEBUG_SLAB, CONFIG_DEBUG_PAGEALLOC, CONFIG_DEBUG_MUTEXES,
CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_SPINLOCK_SLEEP all simultaneously
enabled.
- Has been build- and runtime tested with and without CONFIG_SMP and
CONFIG_PREEMPT.
- If the patch affects IO/Disk, etc: has been tested with and without
CONFIG_LBD.
2006-APR-27
+116 -24
View File
@@ -3,7 +3,7 @@
Maintained by Torben Mathiasen <device@lanana.org>
Last revised: 25 January 2005
Last revised: 15 May 2006
This list is the Linux Device List, the official registry of allocated
device numbers and /dev directory nodes for the Linux operating
@@ -94,7 +94,6 @@ Your cooperation is appreciated.
9 = /dev/urandom Faster, less secure random number gen.
10 = /dev/aio Asyncronous I/O notification interface
11 = /dev/kmsg Writes to this come out as printk's
12 = /dev/oldmem Access to crash dump from kexec kernel
1 block RAM disk
0 = /dev/ram0 First RAM disk
1 = /dev/ram1 Second RAM disk
@@ -262,13 +261,13 @@ Your cooperation is appreciated.
NOTE: These devices permit both read and write access.
7 block Loopback devices
0 = /dev/loop0 First loopback device
1 = /dev/loop1 Second loopback device
0 = /dev/loop0 First loop device
1 = /dev/loop1 Second loop device
...
The loopback devices are used to mount filesystems not
The loop devices are used to mount filesystems not
associated with block devices. The binding to the
loopback devices is handled by mount(8) or losetup(8).
loop devices is handled by mount(8) or losetup(8).
8 block SCSI disk devices (0-15)
0 = /dev/sda First SCSI disk whole disk
@@ -943,7 +942,7 @@ Your cooperation is appreciated.
240 = /dev/ftlp FTL on 16th Memory Technology Device
Partitions are handled in the same way as for IDE
disks (see major number 3) expect that the partition
disks (see major number 3) except that the partition
limit is 15 rather than 63 per disk (same as SCSI.)
45 char isdn4linux ISDN BRI driver
@@ -1168,7 +1167,7 @@ Your cooperation is appreciated.
The filename of the encrypted container and the passwords
are sent via ioctls (using the sdmount tool) to the master
node which then activates them via one of the
/dev/scramdisk/x nodes for loopback mounting (all handled
/dev/scramdisk/x nodes for loop mounting (all handled
through the sdmount tool).
Requested by: andy@scramdisklinux.org
@@ -2538,18 +2537,32 @@ Your cooperation is appreciated.
0 = /dev/usb/lp0 First USB printer
...
15 = /dev/usb/lp15 16th USB printer
16 = /dev/usb/mouse0 First USB mouse
...
31 = /dev/usb/mouse15 16th USB mouse
32 = /dev/usb/ez0 First USB firmware loader
...
47 = /dev/usb/ez15 16th USB firmware loader
48 = /dev/usb/scanner0 First USB scanner
...
63 = /dev/usb/scanner15 16th USB scanner
64 = /dev/usb/rio500 Diamond Rio 500
65 = /dev/usb/usblcd USBLCD Interface (info@usblcd.de)
66 = /dev/usb/cpad0 Synaptics cPad (mouse/LCD)
96 = /dev/usb/hiddev0 1st USB HID device
...
111 = /dev/usb/hiddev15 16th USB HID device
112 = /dev/usb/auer0 1st auerswald ISDN device
...
127 = /dev/usb/auer15 16th auerswald ISDN device
128 = /dev/usb/brlvgr0 First Braille Voyager device
...
131 = /dev/usb/brlvgr3 Fourth Braille Voyager device
132 = /dev/usb/idmouse ID Mouse (fingerprint scanner) device
133 = /dev/usb/sisusbvga1 First SiSUSB VGA device
...
140 = /dev/usb/sisusbvga8 Eigth SISUSB VGA device
144 = /dev/usb/lcd USB LCD device
160 = /dev/usb/legousbtower0 1st USB Legotower device
...
175 = /dev/usb/legousbtower15 16th USB Legotower device
240 = /dev/usb/dabusb0 First daubusb device
...
243 = /dev/usb/dabusb3 Fourth dabusb device
180 block USB block devices
0 = /dev/uba First USB block device
@@ -2710,6 +2723,17 @@ Your cooperation is appreciated.
1 = /dev/cpu/1/msr MSRs on CPU 1
...
202 block Xen Virtual Block Device
0 = /dev/xvda First Xen VBD whole disk
16 = /dev/xvdb Second Xen VBD whole disk
32 = /dev/xvdc Third Xen VBD whole disk
...
240 = /dev/xvdp Sixteenth Xen VBD whole disk
Partitions are handled in the same way as for IDE
disks (see major number 3) except that the limit on
partitions is 15.
203 char CPU CPUID information
0 = /dev/cpu/0/cpuid CPUID on CPU 0
1 = /dev/cpu/1/cpuid CPUID on CPU 1
@@ -2747,11 +2771,27 @@ Your cooperation is appreciated.
46 = /dev/ttyCPM0 PPC CPM (SCC or SMC) - port 0
...
47 = /dev/ttyCPM5 PPC CPM (SCC or SMC) - port 5
50 = /dev/ttyIOC40 Altix serial card
50 = /dev/ttyIOC0 Altix serial card
...
81 = /dev/ttyIOC431 Altix serial card
82 = /dev/ttyVR0 NEC VR4100 series SIU
83 = /dev/ttyVR1 NEC VR4100 series DSIU
81 = /dev/ttyIOC31 Altix serial card
82 = /dev/ttyVR0 NEC VR4100 series SIU
83 = /dev/ttyVR1 NEC VR4100 series DSIU
84 = /dev/ttyIOC84 Altix ioc4 serial card
...
115 = /dev/ttyIOC115 Altix ioc4 serial card
116 = /dev/ttySIOC0 Altix ioc3 serial card
...
147 = /dev/ttySIOC31 Altix ioc3 serial card
148 = /dev/ttyPSC0 PPC PSC - port 0
...
153 = /dev/ttyPSC5 PPC PSC - port 5
154 = /dev/ttyAT0 ATMEL serial port 0
...
169 = /dev/ttyAT15 ATMEL serial port 15
170 = /dev/ttyNX0 Hilscher netX serial port 0
...
185 = /dev/ttyNX15 Hilscher netX serial port 15
186 = /dev/ttyJ0 JTAG1 DCC protocol based serial port emulation
205 char Low-density serial ports (alternate device)
0 = /dev/culu0 Callout device for ttyLU0
@@ -2786,8 +2826,8 @@ Your cooperation is appreciated.
50 = /dev/cuioc40 Callout device for ttyIOC40
...
81 = /dev/cuioc431 Callout device for ttyIOC431
82 = /dev/cuvr0 Callout device for ttyVR0
83 = /dev/cuvr1 Callout device for ttyVR1
82 = /dev/cuvr0 Callout device for ttyVR0
83 = /dev/cuvr1 Callout device for ttyVR1
206 char OnStream SC-x0 tape devices
@@ -2897,7 +2937,6 @@ Your cooperation is appreciated.
...
196 = /dev/dvb/adapter3/video0 first video decoder of fourth card
216 char Bluetooth RFCOMM TTY devices
0 = /dev/rfcomm0 First Bluetooth RFCOMM TTY device
1 = /dev/rfcomm1 Second Bluetooth RFCOMM TTY device
@@ -3002,12 +3041,43 @@ Your cooperation is appreciated.
ioctl()'s can be used to rewind the tape regardless of
the device used to access it.
231 char InfiniBand MAD
231 char InfiniBand
0 = /dev/infiniband/umad0
1 = /dev/infiniband/umad1
...
...
63 = /dev/infiniband/umad63 63rd InfiniBandMad device
64 = /dev/infiniband/issm0 First InfiniBand IsSM device
65 = /dev/infiniband/issm1 Second InfiniBand IsSM device
...
127 = /dev/infiniband/issm63 63rd InfiniBand IsSM device
128 = /dev/infiniband/uverbs0 First InfiniBand verbs device
129 = /dev/infiniband/uverbs1 Second InfiniBand verbs device
...
159 = /dev/infiniband/uverbs31 31st InfiniBand verbs device
232-239 UNASSIGNED
232 char Biometric Devices
0 = /dev/biometric/sensor0/fingerprint first fingerprint sensor on first device
1 = /dev/biometric/sensor0/iris first iris sensor on first device
2 = /dev/biometric/sensor0/retina first retina sensor on first device
3 = /dev/biometric/sensor0/voiceprint first voiceprint sensor on first device
4 = /dev/biometric/sensor0/facial first facial sensor on first device
5 = /dev/biometric/sensor0/hand first hand sensor on first device
...
10 = /dev/biometric/sensor1/fingerprint first fingerprint sensor on second device
...
20 = /dev/biometric/sensor2/fingerprint first fingerprint sensor on third device
...
233 char PathScale InfiniPath interconnect
0 = /dev/ipath Primary device for programs (any unit)
1 = /dev/ipath0 Access specifically to unit 0
2 = /dev/ipath1 Access specifically to unit 1
...
4 = /dev/ipath3 Access specifically to unit 3
129 = /dev/ipath_sma Device used by Subnet Management Agent
130 = /dev/ipath_diag Device used by diagnostics programs
234-239 UNASSIGNED
240-254 char LOCAL/EXPERIMENTAL USE
240-254 block LOCAL/EXPERIMENTAL USE
@@ -3021,6 +3091,28 @@ Your cooperation is appreciated.
This major is reserved to assist the expansion to a
larger number space. No device nodes with this major
should ever be created on the filesystem.
(This is probaly not true anymore, but I'll leave it
for now /Torben)
---LARGE MAJORS!!!!!---
256 char Equinox SST multi-port serial boards
0 = /dev/ttyEQ0 First serial port on first Equinox SST board
127 = /dev/ttyEQ127 Last serial port on first Equinox SST board
128 = /dev/ttyEQ128 First serial port on second Equinox SST board
...
1027 = /dev/ttyEQ1027 Last serial port on eighth Equinox SST board
256 block Resident Flash Disk Flash Translation Layer
0 = /dev/rfda First RFD FTL layer
16 = /dev/rfdb Second RFD FTL layer
...
240 = /dev/rfdp 16th RFD FTL layer
257 char Phoenix Technologies Cryptographic Services Driver
0 = /dev/ptlsec Crypto Services Driver
**** ADDITIONAL /dev DIRECTORY ENTRIES
@@ -33,21 +33,6 @@ Who: Adrian Bunk <bunk@stusta.de>
---------------------------
What: RCU API moves to EXPORT_SYMBOL_GPL
When: April 2006
Files: include/linux/rcupdate.h, kernel/rcupdate.c
Why: Outside of Linux, the only implementations of anything even
vaguely resembling RCU that I am aware of are in DYNIX/ptx,
VM/XA, Tornado, and K42. I do not expect anyone to port binary
drivers or kernel modules from any of these, since the first two
are owned by IBM and the last two are open-source research OSes.
So these will move to GPL after a grace period to allow
people, who might be using implementations that I am not aware
of, to adjust to this upcoming change.
Who: Paul E. McKenney <paulmck@us.ibm.com>
---------------------------
What: raw1394: requests of type RAW1394_REQ_ISO_SEND, RAW1394_REQ_ISO_LISTEN
When: November 2006
Why: Deprecated in favour of the new ioctl-based rawiso interface, which is
+5 -4
View File
@@ -99,7 +99,7 @@ prototypes:
int (*sync_fs)(struct super_block *sb, int wait);
void (*write_super_lockfs) (struct super_block *);
void (*unlockfs) (struct super_block *);
int (*statfs) (struct super_block *, struct kstatfs *);
int (*statfs) (struct dentry *, struct kstatfs *);
int (*remount_fs) (struct super_block *, int *, char *);
void (*clear_inode) (struct inode *);
void (*umount_begin) (struct super_block *);
@@ -142,15 +142,16 @@ see also dquot_operations section.
--------------------------- file_system_type ---------------------------
prototypes:
struct super_block *(*get_sb) (struct file_system_type *, int,
const char *, void *);
struct int (*get_sb) (struct file_system_type *, int,
const char *, void *, struct vfsmount *);
void (*kill_sb) (struct super_block *);
locking rules:
may block BKL
get_sb yes yes
kill_sb yes yes
->get_sb() returns error or a locked superblock (exclusive on ->s_umount).
->get_sb() returns error or 0 with locked superblock attached to the vfsmount
(exclusive on ->s_umount).
->kill_sb() takes a write-locked superblock, does all shutdown work on it,
unlocks and drops the reference.
@@ -19,7 +19,7 @@ following procedure:
(2) Have the follow_link() op do the following steps:
(a) Call do_kern_mount() to call the appropriate filesystem to set up a
(a) Call vfs_kern_mount() to call the appropriate filesystem to set up a
superblock and gain a vfsmount structure representing it.
(b) Copy the nameidata provided as an argument and substitute the dentry
+71 -47
View File
@@ -18,6 +18,14 @@ Non-privileged mount (or user mount):
user. NOTE: this is not the same as mounts allowed with the "user"
option in /etc/fstab, which is not discussed here.
Filesystem connection:
A connection between the filesystem daemon and the kernel. The
connection exists until either the daemon dies, or the filesystem is
umounted. Note that detaching (or lazy umounting) the filesystem
does _not_ break the connection, in this case it will exist until
the last reference to the filesystem is released.
Mount owner:
The user who does the mounting.
@@ -86,16 +94,20 @@ Mount options
The default is infinite. Note that the size of read requests is
limited anyway to 32 pages (which is 128kbyte on i386).
Sysfs
~~~~~
Control filesystem
~~~~~~~~~~~~~~~~~~
FUSE sets up the following hierarchy in sysfs:
There's a control filesystem for FUSE, which can be mounted by:
/sys/fs/fuse/connections/N/
mount -t fusectl none /sys/fs/fuse/connections
where N is an increasing number allocated to each new connection.
Mounting it under the '/sys/fs/fuse/connections' directory makes it
backwards compatible with earlier versions.
For each connection the following attributes are defined:
Under the fuse control filesystem each connection has a directory
named by a unique number.
For each connection the following files exist within this directory:
'waiting'
@@ -110,7 +122,47 @@ For each connection the following attributes are defined:
connection. This means that all waiting requests will be aborted an
error returned for all aborted and new requests.
Only a privileged user may read or write these attributes.
Only the owner of the mount may read or write these files.
Interrupting filesystem operations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If a process issuing a FUSE filesystem request is interrupted, the
following will happen:
1) If the request is not yet sent to userspace AND the signal is
fatal (SIGKILL or unhandled fatal signal), then the request is
dequeued and returns immediately.
2) If the request is not yet sent to userspace AND the signal is not
fatal, then an 'interrupted' flag is set for the request. When
the request has been successfully transfered to userspace and
this flag is set, an INTERRUPT request is queued.
3) If the request is already sent to userspace, then an INTERRUPT
request is queued.
INTERRUPT requests take precedence over other requests, so the
userspace filesystem will receive queued INTERRUPTs before any others.
The userspace filesystem may ignore the INTERRUPT requests entirely,
or may honor them by sending a reply to the _original_ request, with
the error set to EINTR.
It is also possible that there's a race between processing the
original request and it's INTERRUPT request. There are two possibilities:
1) The INTERRUPT request is processed before the original request is
processed
2) The INTERRUPT request is processed after the original request has
been answered
If the filesystem cannot find the original request, it should wait for
some timeout and/or a number of new requests to arrive, after which it
should reply to the INTERRUPT request with an EAGAIN error. In case
1) the INTERRUPT request will be requeued. In case 2) the INTERRUPT
reply will be ignored.
Aborting a filesystem connection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -139,8 +191,8 @@ the filesystem. There are several ways to do this:
- Use forced umount (umount -f). Works in all cases but only if
filesystem is still attached (it hasn't been lazy unmounted)
- Abort filesystem through the sysfs interface. Most powerful
method, always works.
- Abort filesystem through the FUSE control filesystem. Most
powerful method, always works.
How do non-privileged mounts work?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -304,25 +356,7 @@ Scenario 1 - Simple deadlock
| | for "file"]
| | *DEADLOCK*
The solution for this is to allow requests to be interrupted while
they are in userspace:
| [interrupted by signal] |
| <fuse_unlink() |
| [release semaphore] | [semaphore acquired]
| <sys_unlink() |
| | >fuse_unlink()
| | [queue req on fc->pending]
| | [wake up fc->waitq]
| | [sleep on req->waitq]
If the filesystem daemon was single threaded, this will stop here,
since there's no other thread to dequeue and execute the request.
In this case the solution is to kill the FUSE daemon as well. If
there are multiple serving threads, you just have to kill them as
long as any remain.
Moral: a filesystem which deadlocks, can soon find itself dead.
The solution for this is to allow the filesystem to be aborted.
Scenario 2 - Tricky deadlock
----------------------------
@@ -355,24 +389,14 @@ but is caused by a pagefault.
| | [lock page]
| | * DEADLOCK *
Solution is again to let the the request be interrupted (not
elaborated further).
Solution is basically the same as above.
An additional problem is that while the write buffer is being
copied to the request, the request must not be interrupted. This
is because the destination address of the copy may not be valid
after the request is interrupted.
An additional problem is that while the write buffer is being copied
to the request, the request must not be interrupted/aborted. This is
because the destination address of the copy may not be valid after the
request has returned.
This is solved with doing the copy atomically, and allowing
interruption while the page(s) belonging to the write buffer are
faulted with get_user_pages(). The 'req->locked' flag indicates
when the copy is taking place, and interruption is delayed until
this flag is unset.
Scenario 3 - Tricky deadlock with asynchronous read
---------------------------------------------------
The same situation as above, except thread-1 will wait on page lock
and hence it will be uninterruptible as well. The solution is to
abort the connection with forced umount (if mount is attached) or
through the abort attribute in sysfs.
This is solved with doing the copy atomically, and allowing abort
while the page(s) belonging to the write buffer are faulted with
get_user_pages(). The 'req->locked' flag indicates when the copy is
taking place, and abort is delayed until this flag is unset.
+4 -3
View File
@@ -50,10 +50,11 @@ Turn your foo_read_super() into a function that would return 0 in case of
success and negative number in case of error (-EINVAL unless you have more
informative error value to report). Call it foo_fill_super(). Now declare
struct super_block foo_get_sb(struct file_system_type *fs_type,
int flags, const char *dev_name, void *data)
int foo_get_sb(struct file_system_type *fs_type,
int flags, const char *dev_name, void *data, struct vfsmount *mnt)
{
return get_sb_bdev(fs_type, flags, dev_name, data, ext2_fill_super);
return get_sb_bdev(fs_type, flags, dev_name, data, foo_fill_super,
mnt);
}
(or similar with s/bdev/nodev/ or s/bdev/single/, depending on the kind of
@@ -70,11 +70,13 @@ tmpfs mounts. See Documentation/filesystems/tmpfs.txt for more information.
What is rootfs?
---------------
Rootfs is a special instance of ramfs, which is always present in 2.6 systems.
(It's used internally as the starting and stopping point for searches of the
kernel's doubly-linked list of mount points.)
Rootfs is a special instance of ramfs (or tmpfs, if that's enabled), which is
always present in 2.6 systems. You can't unmount rootfs for approximately the
same reason you can't kill the init process; rather than having special code
to check for and handle an empty list, it's smaller and simpler for the kernel
to just make sure certain lists can't become empty.
Most systems just mount another filesystem over it and ignore it. The
Most systems just mount another filesystem over rootfs and ignore it. The
amount of space an empty instance of ramfs takes up is tiny.
What is initramfs?
@@ -92,14 +94,16 @@ out of that.
All this differs from the old initrd in several ways:
- The old initrd was a separate file, while the initramfs archive is linked
into the linux kernel image. (The directory linux-*/usr is devoted to
generating this archive during the build.)
- The old initrd was always a separate file, while the initramfs archive is
linked into the linux kernel image. (The directory linux-*/usr is devoted
to generating this archive during the build.)
- The old initrd file was a gzipped filesystem image (in some file format,
such as ext2, that had to be built into the kernel), while the new
such as ext2, that needed a driver built into the kernel), while the new
initramfs archive is a gzipped cpio archive (like tar only simpler,
see cpio(1) and Documentation/early-userspace/buffer-format.txt).
see cpio(1) and Documentation/early-userspace/buffer-format.txt). The
kernel's cpio extraction code is not only extremely small, it's also
__init data that can be discarded during the boot process.
- The program run by the old initrd (which was called /initrd, not /init) did
some setup and then returned to the kernel, while the init program from
@@ -124,13 +128,14 @@ Populating initramfs:
The 2.6 kernel build process always creates a gzipped cpio format initramfs
archive and links it into the resulting kernel binary. By default, this
archive is empty (consuming 134 bytes on x86). The config option
CONFIG_INITRAMFS_SOURCE (for some reason buried under devices->block devices
in menuconfig, and living in usr/Kconfig) can be used to specify a source for
the initramfs archive, which will automatically be incorporated into the
resulting binary. This option can point to an existing gzipped cpio archive, a
directory containing files to be archived, or a text file specification such
as the following example:
archive is empty (consuming 134 bytes on x86).
The config option CONFIG_INITRAMFS_SOURCE (for some reason buried under
devices->block devices in menuconfig, and living in usr/Kconfig) can be used
to specify a source for the initramfs archive, which will automatically be
incorporated into the resulting binary. This option can point to an existing
gzipped cpio archive, a directory containing files to be archived, or a text
file specification such as the following example:
dir /dev 755 0 0
nod /dev/console 644 0 0 c 5 1
@@ -146,23 +151,84 @@ as the following example:
Run "usr/gen_init_cpio" (after the kernel build) to get a usage message
documenting the above file format.
One advantage of the text file is that root access is not required to
One advantage of the configuration file is that root access is not required to
set permissions or create device nodes in the new archive. (Note that those
two example "file" entries expect to find files named "init.sh" and "busybox" in
a directory called "initramfs", under the linux-2.6.* directory. See
Documentation/early-userspace/README for more details.)
The kernel does not depend on external cpio tools, gen_init_cpio is created
from usr/gen_init_cpio.c which is entirely self-contained, and the kernel's
boot-time extractor is also (obviously) self-contained. However, if you _do_
happen to have cpio installed, the following command line can extract the
generated cpio image back into its component files:
The kernel does not depend on external cpio tools. If you specify a
directory instead of a configuration file, the kernel's build infrastructure
creates a configuration file from that directory (usr/Makefile calls
scripts/gen_initramfs_list.sh), and proceeds to package up that directory
using the config file (by feeding it to usr/gen_init_cpio, which is created
from usr/gen_init_cpio.c). The kernel's build-time cpio creation code is
entirely self-contained, and the kernel's boot-time extractor is also
(obviously) self-contained.
The one thing you might need external cpio utilities installed for is creating
or extracting your own preprepared cpio files to feed to the kernel build
(instead of a config file or directory).
The following command line can extract a cpio image (either by the above script
or by the kernel build) back into its component files:
cpio -i -d -H newc -F initramfs_data.cpio --no-absolute-filenames
The following shell script can create a prebuilt cpio archive you can
use in place of the above config file:
#!/bin/sh
# Copyright 2006 Rob Landley <rob@landley.net> and TimeSys Corporation.
# Licensed under GPL version 2
if [ $# -ne 2 ]
then
echo "usage: mkinitramfs directory imagename.cpio.gz"
exit 1
fi
if [ -d "$1" ]
then
echo "creating $2 from $1"
(cd "$1"; find . | cpio -o -H newc | gzip) > "$2"
else
echo "First argument must be a directory"
exit 1
fi
Note: The cpio man page contains some bad advice that will break your initramfs
archive if you follow it. It says "A typical way to generate the list
of filenames is with the find command; you should give find the -depth option
to minimize problems with permissions on directories that are unwritable or not
searchable." Don't do this when creating initramfs.cpio.gz images, it won't
work. The Linux kernel cpio extractor won't create files in a directory that
doesn't exist, so the directory entries must go before the files that go in
those directories. The above script gets them in the right order.
External initramfs images:
--------------------------
If the kernel has initrd support enabled, an external cpio.gz archive can also
be passed into a 2.6 kernel in place of an initrd. In this case, the kernel
will autodetect the type (initramfs, not initrd) and extract the external cpio
archive into rootfs before trying to run /init.
This has the memory efficiency advantages of initramfs (no ramdisk block
device) but the separate packaging of initrd (which is nice if you have
non-GPL code you'd like to run from initramfs, without conflating it with
the GPL licensed Linux kernel binary).
It can also be used to supplement the kernel's built-in initamfs image. The
files in the external archive will overwrite any conflicting files in
the built-in initramfs archive. Some distributors also prefer to customize
a single kernel image with task-specific initramfs images, without recompiling.
Contents of initramfs:
----------------------
An initramfs archive is a complete self-contained root filesystem for Linux.
If you don't already understand what shared libraries, devices, and paths
you need to get a minimal root filesystem up and running, here are some
references:
@@ -176,13 +242,36 @@ code against, along with some related utilities. It is BSD licensed.
I use uClibc (http://www.uclibc.org) and busybox (http://www.busybox.net)
myself. These are LGPL and GPL, respectively. (A self-contained initramfs
package is planned for the busybox 1.2 release.)
package is planned for the busybox 1.3 release.)
In theory you could use glibc, but that's not well suited for small embedded
uses like this. (A "hello world" program statically linked against glibc is
over 400k. With uClibc it's 7k. Also note that glibc dlopens libnss to do
name lookups, even when otherwise statically linked.)
A good first step is to get initramfs to run a statically linked "hello world"
program as init, and test it under an emulator like qemu (www.qemu.org) or
User Mode Linux, like so:
cat > hello.c << EOF
#include <stdio.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
printf("Hello world!\n");
sleep(999999999);
}
EOF
gcc -static hello2.c -o init
echo init | cpio -o -H newc | gzip > test.cpio.gz
# Testing external initramfs using the initrd loading mechanism.
qemu -kernel /boot/vmlinuz -initrd test.cpio.gz /dev/zero
When debugging a normal root filesystem, it's nice to be able to boot with
"init=/bin/sh". The initramfs equivalent is "rdinit=/bin/sh", and it's
just as useful.
Why cpio rather than tar?
-------------------------
@@ -241,7 +330,7 @@ the above threads) is:
Future directions:
------------------
Today (2.6.14), initramfs is always compiled in, but not always used. The
Today (2.6.16), initramfs is always compiled in, but not always used. The
kernel falls back to legacy boot code that is reached only if initramfs does
not contain an /init program. The fallback is legacy code, there to ensure a
smooth transition and allowing early boot functionality to gradually move to
@@ -258,8 +347,9 @@ and so on.
This kind of complexity (which inevitably includes policy) is rightly handled
in userspace. Both klibc and busybox/uClibc are working on simple initramfs
packages to drop into a kernel build, and when standard solutions are ready
and widely deployed, the kernel's legacy early boot code will become obsolete
and a candidate for the feature removal schedule.
packages to drop into a kernel build.
But that's a while off yet.
The klibc package has now been accepted into Andrew Morton's 2.6.17-mm tree.
The kernel's current early boot code (partition detection, etc) will probably
be migrated into a default initramfs, automatically created and used by the
kernel build.

Some files were not shown because too many files have changed in this diff Show More