Merge branch 'linux-2.6' into for-linus

Paul Mackerras
2006-12-04 15:59:07 +11:00
2661 changed files with 91557 additions and 34728 deletions
+1
@@ -20,6 +20,7 @@
# Top-level generic files # Top-level generic files
# #
tags tags
TAGS
vmlinux* vmlinux*
System.map System.map
Module.symvers Module.symvers
+6 -5
@@ -45,7 +45,7 @@ S: Longford, Ireland
S: Sydney, Australia S: Sydney, Australia
N: Tigran A. Aivazian N: Tigran A. Aivazian
E: tigran@veritas.com E: tigran@aivazian.fsnet.co.uk
W: http://www.moses.uklinux.net/patches W: http://www.moses.uklinux.net/patches
D: BFS filesystem D: BFS filesystem
D: Intel IA32 CPU microcode update support D: Intel IA32 CPU microcode update support
@@ -2598,6 +2598,9 @@ S: Ucitelska 1576
S: Prague 8 S: Prague 8
S: 182 00 Czech Republic S: 182 00 Czech Republic
N: Rick Payne
D: RFC2385 Support for TCP
N: Barak A. Pearlmutter N: Barak A. Pearlmutter
E: bap@cs.unm.edu E: bap@cs.unm.edu
W: http://www.cs.unm.edu/~bap/ W: http://www.cs.unm.edu/~bap/
@@ -3511,14 +3514,12 @@ D: The Linux Support Team Erlangen
N: David Weinehall N: David Weinehall
E: tao@acc.umu.se E: tao@acc.umu.se
P: 1024D/DC47CA16 7ACE 0FB0 7A74 F994 9B36 E1D1 D14E 8526 DC47 CA16
W: http://www.acc.umu.se/~tao/ W: http://www.acc.umu.se/~tao/
W: http://www.acc.umu.se/~mcalinux/ D: v2.0 kernel maintainer
D: Fixes for the NE/2-driver D: Fixes for the NE/2-driver
D: Miscellaneous MCA-support D: Miscellaneous MCA-support
D: Cleanup of the Config-files D: Cleanup of the Config-files
S: Axtorpsvagen 40:20
S: S-903 37 UMEA
S: Sweden
N: Matt Welsh N: Matt Welsh
E: mdw@metalab.unc.edu E: mdw@metalab.unc.edu
+16 -1
@@ -21,7 +21,7 @@ Description:
these states. these states.
What: /sys/power/disk What: /sys/power/disk
Date: August 2006 Date: September 2006
Contact: Rafael J. Wysocki <rjw@sisk.pl> Contact: Rafael J. Wysocki <rjw@sisk.pl>
Description: Description:
The /sys/power/disk file controls the operating mode of the The /sys/power/disk file controls the operating mode of the
@@ -39,6 +39,19 @@ Description:
'reboot' - the memory image will be saved by the kernel and 'reboot' - the memory image will be saved by the kernel and
the system will be rebooted. the system will be rebooted.
Additionally, /sys/power/disk can be used to turn on one of the
two testing modes of the suspend-to-disk mechanism: 'testproc'
or 'test'. If the suspend-to-disk mechanism is in the
'testproc' mode, writing 'disk' to /sys/power/state will cause
the kernel to disable nonboot CPUs and freeze tasks, wait for 5
seconds, unfreeze tasks and enable nonboot CPUs. If it is in
the 'test' mode, writing 'disk' to /sys/power/state will cause
the kernel to disable nonboot CPUs and freeze tasks, shrink
memory, suspend devices, wait for 5 seconds, resume devices,
unfreeze tasks and enable nonboot CPUs. Then, we are able to
look in the log messages and work out, for example, which code
is being slow and which device drivers are misbehaving.
The suspend-to-disk method may be chosen by writing to this The suspend-to-disk method may be chosen by writing to this
file one of the accepted strings: file one of the accepted strings:
@@ -46,6 +59,8 @@ Description:
'platform' 'platform'
'shutdown' 'shutdown'
'reboot' 'reboot'
'testproc'
'test'
It will only change to 'firmware' or 'platform' if the system It will only change to 'firmware' or 'platform' if the system
supports that. supports that.
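
The two testing modes described in this hunk can be exercised from a shell. The following is a minimal sketch, assuming a kernel that carries this change and a root shell; the function name run_suspend_test and its optional second argument (an alternative directory, useful for dry runs) are hypothetical and not part of the kernel documentation.

```shell
#!/bin/sh
# Sketch: run one suspend-to-disk test cycle via the sysfs power interface.
# run_suspend_test MODE [POWER_DIR]
#   MODE      - 'testproc' or 'test', as described in the text above
#   POWER_DIR - hypothetical override for dry runs; defaults to /sys/power
run_suspend_test() {
    mode="$1"
    power_dir="${2:-/sys/power}"

    # Only the two testing modes are accepted by this helper.
    case "$mode" in
        testproc|test) ;;
        *) echo "unknown test mode: $mode" >&2; return 1 ;;
    esac

    # Select the testing mode, then start the cycle by writing 'disk'.
    echo "$mode" > "$power_dir/disk" || return 1
    echo disk > "$power_dir/state" || return 1
}
```

After a call such as run_suspend_test testproc returns, the kernel log (for example via dmesg) can be inspected to see which steps were slow.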
+1 -1
@@ -201,7 +201,7 @@ udev
---- ----
udev is a userspace application for populating /dev dynamically with udev is a userspace application for populating /dev dynamically with
only entries for devices actually present. udev replaces the basic only entries for devices actually present. udev replaces the basic
functionality of devfs, while allowing persistant device naming for functionality of devfs, while allowing persistent device naming for
devices. devices.
FUSE FUSE
+1 -1
@@ -489,7 +489,7 @@ size is the size of the area (must be multiples of PAGE_SIZE).
flags can be or'd together and are flags can be or'd together and are
DMA_MEMORY_MAP - request that the memory returned from DMA_MEMORY_MAP - request that the memory returned from
dma_alloc_coherent() be directly writeable. dma_alloc_coherent() be directly writable.
DMA_MEMORY_IO - request that the memory returned from DMA_MEMORY_IO - request that the memory returned from
dma_alloc_coherent() be addressable using read/write/memcpy_toio etc. dma_alloc_coherent() be addressable using read/write/memcpy_toio etc.
+1 -1
@@ -110,7 +110,7 @@ lock.
Once the DMA transfer is finished (or timed out) you should disable Once the DMA transfer is finished (or timed out) you should disable
the channel again. You should also check get_dma_residue() to make the channel again. You should also check get_dma_residue() to make
sure that all data has been transfered. sure that all data has been transferred.
Example: Example:
+1 -1
@@ -9,7 +9,7 @@
DOCBOOKS := wanbook.xml z8530book.xml mcabook.xml videobook.xml \ DOCBOOKS := wanbook.xml z8530book.xml mcabook.xml videobook.xml \
kernel-hacking.xml kernel-locking.xml deviceiobook.xml \ kernel-hacking.xml kernel-locking.xml deviceiobook.xml \
procfs-guide.xml writing_usb_driver.xml \ procfs-guide.xml writing_usb_driver.xml \
kernel-api.xml journal-api.xml lsm.xml usb.xml \ kernel-api.xml filesystems.xml lsm.xml usb.xml \
gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml \ gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml \
genericirq.xml genericirq.xml
@@ -2,9 +2,106 @@
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []> "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<book id="LinuxJBDAPI"> <book id="Linux-filesystems-API">
<bookinfo> <bookinfo>
<title>Linux Filesystems API</title>
<legalnotice>
<para>
This documentation is free software; you can redistribute
it and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later
version.
</para>
<para>
This program is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
</para>
<para>
You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA
</para>
<para>
For more details see the file COPYING in the source
distribution of Linux.
</para>
</legalnotice>
</bookinfo>
<toc></toc>
<chapter id="vfs">
<title>The Linux VFS</title>
<sect1><title>The Filesystem types</title>
!Iinclude/linux/fs.h
</sect1>
<sect1><title>The Directory Cache</title>
!Efs/dcache.c
!Iinclude/linux/dcache.h
</sect1>
<sect1><title>Inode Handling</title>
!Efs/inode.c
!Efs/bad_inode.c
</sect1>
<sect1><title>Registration and Superblocks</title>
!Efs/super.c
</sect1>
<sect1><title>File Locks</title>
!Efs/locks.c
!Ifs/locks.c
</sect1>
<sect1><title>Other Functions</title>
!Efs/mpage.c
!Efs/namei.c
!Efs/buffer.c
!Efs/bio.c
!Efs/seq_file.c
!Efs/filesystems.c
!Efs/fs-writeback.c
!Efs/block_dev.c
</sect1>
</chapter>
<chapter id="proc">
<title>The proc filesystem</title>
<sect1><title>sysctl interface</title>
!Ekernel/sysctl.c
</sect1>
<sect1><title>proc filesystem interface</title>
!Ifs/proc/base.c
</sect1>
</chapter>
<chapter id="sysfs">
<title>The Filesystem for Exporting Kernel Objects</title>
!Efs/sysfs/file.c
!Efs/sysfs/symlink.c
!Efs/sysfs/bin.c
</chapter>
<chapter id="debugfs">
<title>The debugfs filesystem</title>
<sect1><title>debugfs interface</title>
!Efs/debugfs/inode.c
!Efs/debugfs/file.c
</sect1>
</chapter>
<chapter id="LinuxJDBAPI">
<chapterinfo>
<title>The Linux Journalling API</title> <title>The Linux Journalling API</title>
<authorgroup> <authorgroup>
<author> <author>
<firstname>Roger</firstname> <firstname>Roger</firstname>
@@ -14,9 +111,9 @@
<email>rgammans@computer-surgery.co.uk</email> <email>rgammans@computer-surgery.co.uk</email>
</address> </address>
</affiliation> </affiliation>
</author> </author>
</authorgroup> </authorgroup>
<authorgroup> <authorgroup>
<author> <author>
<firstname>Stephen</firstname> <firstname>Stephen</firstname>
@@ -33,50 +130,21 @@
<year>2002</year> <year>2002</year>
<holder>Roger Gammans</holder> <holder>Roger Gammans</holder>
</copyright> </copyright>
</chapterinfo>
<legalnotice> <title>The Linux Journalling API</title>
<para>
This documentation is free software; you can redistribute
it and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later
version.
</para>
<para>
This program is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
</para>
<para>
You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA
</para>
<para>
For more details see the file COPYING in the source
distribution of Linux.
</para>
</legalnotice>
</bookinfo>
<toc></toc> <sect1>
<chapter id="Overview">
<title>Overview</title> <title>Overview</title>
<sect1> <sect2>
<title>Details</title> <title>Details</title>
<para> <para>
The journalling layer is easy to use. You need to The journalling layer is easy to use. You need to
first of all create a journal_t data structure. There are first of all create a journal_t data structure. There are
two calls to do this dependent on how you decide to allocate the physical two calls to do this dependent on how you decide to allocate the physical
media on which the journal resides. The journal_init_inode() call media on which the journal resides. The journal_init_inode() call
is for journals stored in filesystem inodes, or the journal_init_dev() is for journals stored in filesystem inodes, or the journal_init_dev()
call can be use for journal stored on a raw device (in a continuous range call can be use for journal stored on a raw device (in a continuous range
of blocks). A journal_t is a typedef for a struct pointer, so when of blocks). A journal_t is a typedef for a struct pointer, so when
you are finally finished make sure you call journal_destroy() on it you are finally finished make sure you call journal_destroy() on it
to free up any used kernel memory. to free up any used kernel memory.
@@ -91,27 +159,26 @@ need to call journal_create().
<para> <para>
Most of the time however your journal file will already have been created, but Most of the time however your journal file will already have been created, but
before you load it you must call journal_wipe() to empty the journal file. before you load it you must call journal_wipe() to empty the journal file.
Hang on, you say , what if the filesystem wasn't cleanly umount()'d . Well, it is the Hang on, you say , what if the filesystem wasn't cleanly umount()'d . Well, it is the
job of the client file system to detect this and skip the call to journal_wipe(). job of the client file system to detect this and skip the call to journal_wipe().
</para> </para>
<para> <para>
In either case the next call should be to journal_load() which prepares the In either case the next call should be to journal_load() which prepares the
journal file for use. Note that journal_wipe(..,0) calls journal_skip_recovery() journal file for use. Note that journal_wipe(..,0) calls journal_skip_recovery()
for you if it detects any outstanding transactions in the journal and similarly for you if it detects any outstanding transactions in the journal and similarly
journal_load() will call journal_recover() if necessary. journal_load() will call journal_recover() if necessary.
I would advise reading fs/ext3/super.c for examples on this stage. I would advise reading fs/ext3/super.c for examples on this stage.
[RGG: Why is the journal_wipe() call necessary - doesn't this needlessly [RGG: Why is the journal_wipe() call necessary - doesn't this needlessly
complicate the API. Or isn't a good idea for the journal layer to hide complicate the API. Or isn't a good idea for the journal layer to hide
dirty mounts from the client fs] dirty mounts from the client fs]
</para> </para>
<para> <para>
Now you can go ahead and start modifying the underlying Now you can go ahead and start modifying the underlying
filesystem. Almost. filesystem. Almost.
</para> </para>
<para> <para>
You still need to actually journal your filesystem changes, this You still need to actually journal your filesystem changes, this
@@ -138,10 +205,10 @@ individual buffers (blocks). Before you start to modify a buffer you
need to call journal_get_{create,write,undo}_access() as appropriate, need to call journal_get_{create,write,undo}_access() as appropriate,
this allows the journalling layer to copy the unmodified data if it this allows the journalling layer to copy the unmodified data if it
needs to. After all the buffer may be part of a previously uncommitted needs to. After all the buffer may be part of a previously uncommitted
transaction. transaction.
At this point you are at last ready to modify a buffer, and once At this point you are at last ready to modify a buffer, and once
you are have done so you need to call journal_dirty_{meta,}data(). you are have done so you need to call journal_dirty_{meta,}data().
Or if you've asked for access to a buffer you now know is now longer Or if you've asked for access to a buffer you now know is now longer
required to be pushed back on the device you can call journal_forget() required to be pushed back on the device you can call journal_forget()
in much the same way as you might have used bforget() in the past. in much the same way as you might have used bforget() in the past.
</para> </para>
@@ -156,7 +223,6 @@ Then at umount time , in your put_super() (2.4) or write_super() (2.5)
you can then call journal_destroy() to clean up your in-core journal object. you can then call journal_destroy() to clean up your in-core journal object.
</para> </para>
<para> <para>
Unfortunately there a couple of ways the journal layer can cause a deadlock. Unfortunately there a couple of ways the journal layer can cause a deadlock.
The first thing to note is that each task can only have The first thing to note is that each task can only have
@@ -164,19 +230,19 @@ a single outstanding transaction at any one time, remember nothing
commits until the outermost journal_stop(). This means commits until the outermost journal_stop(). This means
you must complete the transaction at the end of each file/inode/address you must complete the transaction at the end of each file/inode/address
etc. operation you perform, so that the journalling system isn't re-entered etc. operation you perform, so that the journalling system isn't re-entered
on another journal. Since transactions can't be nested/batched on another journal. Since transactions can't be nested/batched
across differing journals, and another filesystem other than across differing journals, and another filesystem other than
yours (say ext3) may be modified in a later syscall. yours (say ext3) may be modified in a later syscall.
</para> </para>
<para> <para>
The second case to bear in mind is that journal_start() can The second case to bear in mind is that journal_start() can
block if there isn't enough space in the journal for your transaction block if there isn't enough space in the journal for your transaction
(based on the passed nblocks param) - when it blocks it merely(!) needs to (based on the passed nblocks param) - when it blocks it merely(!) needs to
wait for transactions to complete and be committed from other tasks, wait for transactions to complete and be committed from other tasks,
so essentially we are waiting for journal_stop(). So to avoid so essentially we are waiting for journal_stop(). So to avoid
deadlocks you must treat journal_start/stop() as if they deadlocks you must treat journal_start/stop() as if they
were semaphores and include them in your semaphore ordering rules to prevent were semaphores and include them in your semaphore ordering rules to prevent
deadlocks. Note that journal_extend() has similar blocking behaviour to deadlocks. Note that journal_extend() has similar blocking behaviour to
journal_start() so you can deadlock here just as easily as on journal_start(). journal_start() so you can deadlock here just as easily as on journal_start().
</para> </para>
@@ -184,7 +250,7 @@ journal_start() so you can deadlock here just as easily as on journal_start().
<para> <para>
Try to reserve the right number of blocks the first time. ;-). This will Try to reserve the right number of blocks the first time. ;-). This will
be the maximum number of blocks you are going to touch in this transaction. be the maximum number of blocks you are going to touch in this transaction.
I advise having a look at at least ext3_jbd.h to see the basis on which I advise having a look at at least ext3_jbd.h to see the basis on which
ext3 uses to make these decisions. ext3 uses to make these decisions.
</para> </para>
@@ -193,13 +259,13 @@ Another wriggle to watch out for is your on-disk block allocation strategy.
why? Because, if you undo a delete, you need to ensure you haven't reused any why? Because, if you undo a delete, you need to ensure you haven't reused any
of the freed blocks in a later transaction. One simple way of doing this of the freed blocks in a later transaction. One simple way of doing this
is make sure any blocks you allocate only have checkpointed transactions is make sure any blocks you allocate only have checkpointed transactions
listed against them. Ext3 does this in ext3_test_allocatable(). listed against them. Ext3 does this in ext3_test_allocatable().
</para> </para>
<para> <para>
Lock is also providing through journal_{un,}lock_updates(), Lock is also providing through journal_{un,}lock_updates(),
ext3 uses this when it wants a window with a clean and stable fs for a moment. ext3 uses this when it wants a window with a clean and stable fs for a moment.
eg. eg.
</para> </para>
<programlisting> <programlisting>
@@ -230,19 +296,19 @@ extend it like this:-
struct journal_callback for_jbd; struct journal_callback for_jbd;
// Stuff for myfs allocated together. // Stuff for myfs allocated together.
myfs_inode* i_commited; myfs_inode* i_commited;
} }
</programlisting> </programlisting>
<para> <para>
this would be useful if you needed to know when data was committed to a this would be useful if you needed to know when data was committed to a
particular inode. particular inode.
</para> </para>
</sect1> </sect2>
<sect1> <sect2>
<title>Summary</title> <title>Summary</title>
<para> <para>
Using the journal is a matter of wrapping the different context changes, Using the journal is a matter of wrapping the different context changes,
being each mount, each modification (transaction) and each changed buffer being each mount, each modification (transaction) and each changed buffer
@@ -260,15 +326,15 @@ an example.
if (clean) journal_wipe(); if (clean) journal_wipe();
journal_load(); journal_load();
foreach(transaction) { /*transactions must be foreach(transaction) { /*transactions must be
completed before completed before
a syscall returns to a syscall returns to
userspace*/ userspace*/
handle_t *xact = journal_start(my_jrnl); handle_t *xact = journal_start(my_jrnl);
foreach(bh) { foreach(bh) {
journal_get_{create,write,undo}_access(xact,bh); journal_get_{create,write,undo}_access(xact,bh);
if ( myfs_modify(bh) ) { /* returns true if ( myfs_modify(bh) ) { /* returns true
if makes changes */ if makes changes */
journal_dirty_{meta,}data(xact,bh); journal_dirty_{meta,}data(xact,bh);
} else { } else {
@@ -279,55 +345,57 @@ an example.
} }
journal_destroy(my_jrnl); journal_destroy(my_jrnl);
</programlisting> </programlisting>
</sect1> </sect2>
</chapter> </sect1>
<chapter id="adt"> <sect1>
<title>Data Types</title> <title>Data Types</title>
<para> <para>
The journalling layer uses typedefs to 'hide' the concrete definitions The journalling layer uses typedefs to 'hide' the concrete definitions
of the structures used. As a client of the JBD layer you can of the structures used. As a client of the JBD layer you can
just rely on the using the pointer as a magic cookie of some sort. just rely on the using the pointer as a magic cookie of some sort.
Obviously the hiding is not enforced as this is 'C'.
</para>
<sect1><title>Structures</title>
!Iinclude/linux/jbd.h
</sect1>
</chapter>
<chapter id="calls"> Obviously the hiding is not enforced as this is 'C'.
</para>
<sect2><title>Structures</title>
!Iinclude/linux/jbd.h
</sect2>
</sect1>
<sect1>
<title>Functions</title> <title>Functions</title>
<para> <para>
The functions here are split into two groups those that The functions here are split into two groups those that
affect a journal as a whole, and those which are used to affect a journal as a whole, and those which are used to
manage transactions manage transactions
</para> </para>
<sect1><title>Journal Level</title> <sect2><title>Journal Level</title>
!Efs/jbd/journal.c !Efs/jbd/journal.c
!Ifs/jbd/recovery.c !Ifs/jbd/recovery.c
</sect1> </sect2>
<sect1><title>Transaction Level</title> <sect2><title>Transaction Level</title>
!Efs/jbd/transaction.c !Efs/jbd/transaction.c
</sect1> </sect2>
</chapter> </sect1>
<chapter> <sect1>
<title>See also</title> <title>See also</title>
<para> <para>
<citation> <citation>
<ulink url="ftp://ftp.uk.linux.org/pub/linux/sct/fs/jfs/journal-design.ps.gz"> <ulink url="ftp://ftp.uk.linux.org/pub/linux/sct/fs/jfs/journal-design.ps.gz">
Journaling the Linux ext2fs Filesystem,LinuxExpo 98, Stephen Tweedie Journaling the Linux ext2fs Filesystem, LinuxExpo 98, Stephen Tweedie
</ulink> </ulink>
</citation> </citation>
</para> </para>
<para> <para>
<citation> <citation>
<ulink url="http://olstrans.sourceforge.net/release/OLS2000-ext3/OLS2000-ext3.html"> <ulink url="http://olstrans.sourceforge.net/release/OLS2000-ext3/OLS2000-ext3.html">
Ext3 Journalling FileSystem , OLS 2000, Dr. Stephen Tweedie Ext3 Journalling FileSystem, OLS 2000, Dr. Stephen Tweedie
</ulink> </ulink>
</citation> </citation>
</para> </para>
</chapter> </sect1>
</chapter>
</book> </book>
-60
@@ -182,66 +182,6 @@ X!Ilib/string.c
</sect1> </sect1>
</chapter> </chapter>
<chapter id="vfs">
<title>The Linux VFS</title>
<sect1><title>The Filesystem types</title>
!Iinclude/linux/fs.h
</sect1>
<sect1><title>The Directory Cache</title>
!Efs/dcache.c
!Iinclude/linux/dcache.h
</sect1>
<sect1><title>Inode Handling</title>
!Efs/inode.c
!Efs/bad_inode.c
</sect1>
<sect1><title>Registration and Superblocks</title>
!Efs/super.c
</sect1>
<sect1><title>File Locks</title>
!Efs/locks.c
!Ifs/locks.c
</sect1>
<sect1><title>Other Functions</title>
!Efs/mpage.c
!Efs/namei.c
!Efs/buffer.c
!Efs/bio.c
!Efs/seq_file.c
!Efs/filesystems.c
!Efs/fs-writeback.c
!Efs/block_dev.c
</sect1>
</chapter>
<chapter id="proc">
<title>The proc filesystem</title>
<sect1><title>sysctl interface</title>
!Ekernel/sysctl.c
</sect1>
<sect1><title>proc filesystem interface</title>
!Ifs/proc/base.c
</sect1>
</chapter>
<chapter id="sysfs">
<title>The Filesystem for Exporting Kernel Objects</title>
!Efs/sysfs/file.c
!Efs/sysfs/symlink.c
!Efs/sysfs/bin.c
</chapter>
<chapter id="debugfs">
<title>The debugfs filesystem</title>
<sect1><title>debugfs interface</title>
!Efs/debugfs/inode.c
!Efs/debugfs/file.c
</sect1>
</chapter>
<chapter id="relayfs"> <chapter id="relayfs">
<title>relay interface support</title> <title>relay interface support</title>
@@ -345,8 +345,7 @@ static inline void skel_delete (struct usb_skel *dev)
usb_buffer_free (dev->udev, dev->bulk_out_size, usb_buffer_free (dev->udev, dev->bulk_out_size,
dev->bulk_out_buffer, dev->bulk_out_buffer,
dev->write_urb->transfer_dma); dev->write_urb->transfer_dma);
if (dev->write_urb != NULL) usb_free_urb (dev->write_urb);
usb_free_urb (dev->write_urb);
kfree (dev); kfree (dev);
} }
</programlisting> </programlisting>
+20
@@ -395,6 +395,26 @@ bugme-janitor mailing list (every change in the bugzilla is mailed here)
Managing bug reports
--------------------
One of the best ways to put your hacking skills into practice is by fixing
bugs reported by other people. Not only will you help to make the kernel
more stable, you'll also learn to fix real-world problems, improve your
skills, and make other developers aware of your presence. Fixing bugs is
one of the best ways to earn merit among other developers, because
not many people like wasting time fixing other people's bugs.
To work on already reported bugs, go to http://bugzilla.kernel.org.
If you want to be notified of future bug reports, you can subscribe to the
bugme-new mailing list (only new bug reports are mailed here) or to the
bugme-janitor mailing list (every change in the bugzilla is mailed here)
http://lists.osdl.org/mailman/listinfo/bugme-new
http://lists.osdl.org/mailman/listinfo/bugme-janitors
Mailing lists Mailing lists
------------- -------------
+63 -2
@@ -219,7 +219,7 @@ into the field vector of each element contained in a second argument.
Note that the pre-assigned IOAPIC dev->irq is valid only if the device Note that the pre-assigned IOAPIC dev->irq is valid only if the device
operates in PIN-IRQ assertion mode. In MSI-X mode, any attempt at operates in PIN-IRQ assertion mode. In MSI-X mode, any attempt at
using dev->irq by the device driver to request for interrupt service using dev->irq by the device driver to request for interrupt service
may result unpredictabe behavior. may result in unpredictable behavior.
For each MSI-X vector granted, a device driver is responsible for calling For each MSI-X vector granted, a device driver is responsible for calling
other functions like request_irq(), enable_irq(), etc. to enable other functions like request_irq(), enable_irq(), etc. to enable
@@ -470,7 +470,68 @@ LOC: 324553 325068
ERR: 0 ERR: 0
MIS: 0 MIS: 0
6. FAQ 6. MSI quirks
Several PCI chipsets or devices are known to not support MSI.
The PCI stack provides 3 possible levels of MSI disabling:
* on a single device
* on all devices behind a specific bridge
* globally
6.1. Disabling MSI on a single device
Under some circumstances, it might be required to disable MSI on a
single device. This may be achieved by either not calling pci_enable_msi()
at all, or setting the pci_dev->no_msi flag beforehand (most of the time
in a quirk).
6.2. Disabling MSI below a bridge
The vast majority of MSI quirks are required by PCI bridges not
being able to route MSI between busses. In this case, MSI have to be
disabled on all devices behind this bridge. This is achieved by setting
the PCI_BUS_FLAGS_NO_MSI flag in the pci_bus->bus_flags of the bridge's
subordinate bus. There is no need to set the same flag on bridges that
are below the broken bridge. When pci_enable_msi() is called to enable
MSI on a device, pci_msi_supported() takes care of checking the NO_MSI
flag in all parent busses of the device.
Some bridges actually support dynamic MSI support enabling/disabling
by changing some bits in their PCI configuration space (especially
the Hypertransport chipsets such as the nVidia nForce and Serverworks
HT2000). It may then be required to update the NO_MSI flag on the
corresponding devices in the sysfs hierarchy. To enable MSI support
on device "0000:00:0e", do:
echo 1 > /sys/bus/pci/devices/0000:00:0e/msi_bus
To disable MSI support, echo 0 instead of 1. Note that it should be
used with caution since changing this value might break interrupts.
6.3. Disabling MSI globally
Some extreme cases may require disabling MSI globally on the system.
For now, the only known case is a Serverworks PCI-X chipset (MSI are
not supported on several busses that are not all connected to the
chipset in the Linux PCI hierarchy). In the vast majority of other
cases, disabling only behind a specific bridge is enough.
For debugging purposes, the user may also pass pci=nomsi on the kernel
command-line to explicitly disable MSI globally. But, once the
appropriate quirks are added to the kernel, this option should not be
required anymore.
6.4. Finding why MSI cannot be enabled on a device
Assuming that MSI are not enabled on a device, you should look at
dmesg to find messages that quirks may output when disabling MSI
on some devices, some bridges or even globally.
Then, lspci -t gives the list of bridges above a device. Reading
/sys/bus/pci/devices/0000:00:0e/msi_bus will tell you whether MSI
are enabled (1) or disabled (0). If 0 is found in any single bridge
msi_bus file above the device, MSI cannot be enabled.
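
The check described in 6.4 can be scripted. The following is a minimal sketch, assuming that the resolved sysfs path of a device nests under the directories of the bridges above it and that each of those directories exposes an msi_bus file; the function name msi_blocked_by is hypothetical.

```shell
#!/bin/sh
# Sketch: find the first bridge above a device whose msi_bus file reads 0.
# msi_blocked_by DEVICE_DIR
#   Prints the directory of the blocking ancestor and returns 0 if found,
#   returns 1 if no ancestor reports 0, and 2 if the path cannot be resolved.
msi_blocked_by() {
    # /sys/bus/pci/devices/* entries are symlinks; resolve to the real path
    # so that parent directories correspond to the bridges above the device.
    dir=$(readlink -f "$1") || return 2

    # Start with the parent, i.e. the first bridge above the device.
    dir=$(dirname "$dir")
    while [ "$dir" != "/" ]; do
        if [ -r "$dir/msi_bus" ] && [ "$(cat "$dir/msi_bus")" = "0" ]; then
            echo "$dir"
            return 0
        fi
        dir=$(dirname "$dir")
    done
    return 1
}
```

For example, msi_blocked_by /sys/bus/pci/devices/0000:00:0e prints the blocking bridge directory, if any; printing nothing with a return status of 1 means no bridge above the device reports 0.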
7. FAQ
Q1. Are there any limitations on using the MSI? Q1. Are there any limitations on using the MSI?
+1 -1
@@ -49,7 +49,7 @@ __u64 stime, utime;
} }
/* Maximum size of response requested or message sent */ /* Maximum size of response requested or message sent */
#define MAX_MSG_SIZE 256 #define MAX_MSG_SIZE 1024
/* Maximum number of cpus expected to be specified in a cpumask */ /* Maximum number of cpus expected to be specified in a cpumask */
#define MAX_CPUS 32 #define MAX_CPUS 32
/* Maximum length of pathname to log file */ /* Maximum length of pathname to log file */
+5 -5
@@ -96,9 +96,9 @@ a) TASKSTATS_TYPE_AGGR_PID/TGID : attribute containing no payload but indicates
a pid/tgid will be followed by some stats. a pid/tgid will be followed by some stats.
b) TASKSTATS_TYPE_PID/TGID: attribute whose payload is the pid/tgid whose stats b) TASKSTATS_TYPE_PID/TGID: attribute whose payload is the pid/tgid whose stats
is being returned. are being returned.
c) TASKSTATS_TYPE_STATS: attribute with a struct taskstsats as payload. The c) TASKSTATS_TYPE_STATS: attribute with a struct taskstats as payload. The
same structure is used for both per-pid and per-tgid stats. same structure is used for both per-pid and per-tgid stats.
3. New message sent by kernel whenever a task exits. The payload consists of a 3. New message sent by kernel whenever a task exits. The payload consists of a
@@ -122,12 +122,12 @@ of atomicity).
However, maintaining per-process, in addition to per-task stats, within the However, maintaining per-process, in addition to per-task stats, within the
kernel has space and time overheads. To address this, the taskstats code kernel has space and time overheads. To address this, the taskstats code
accumalates each exiting task's statistics into a process-wide data structure. accumulates each exiting task's statistics into a process-wide data structure.
When the last task of a process exits, the process level data accumalated also When the last task of a process exits, the process level data accumulated also
gets sent to userspace (along with the per-task data). gets sent to userspace (along with the per-task data).
When a user queries to get per-tgid data, the sum of all other live threads in When a user queries to get per-tgid data, the sum of all other live threads in
the group is added up and added to the accumalated total for previously exited the group is added up and added to the accumulated total for previously exited
threads of the same thread group. threads of the same thread group.
Extending taskstats Extending taskstats
+5 -5
@@ -183,7 +183,7 @@ it, the pci dma mapping routines and associated data structures have now been
modified to accomplish a direct page -> bus translation, without requiring modified to accomplish a direct page -> bus translation, without requiring
a virtual address mapping (unlike the earlier scheme of virtual address a virtual address mapping (unlike the earlier scheme of virtual address
-> bus translation). So this works uniformly for high-memory pages (which -> bus translation). So this works uniformly for high-memory pages (which
do not have a correponding kernel virtual address space mapping) and do not have a corresponding kernel virtual address space mapping) and
low-memory pages. low-memory pages.
Note: Please refer to DMA-mapping.txt for a discussion on PCI high mem DMA Note: Please refer to DMA-mapping.txt for a discussion on PCI high mem DMA
@@ -391,7 +391,7 @@ forced such requests to be broken up into small chunks before being passed
on to the generic block layer, only to be merged by the i/o scheduler on to the generic block layer, only to be merged by the i/o scheduler
when the underlying device was capable of handling the i/o in one shot. when the underlying device was capable of handling the i/o in one shot.
Also, using the buffer head as an i/o structure for i/os that didn't originate Also, using the buffer head as an i/o structure for i/os that didn't originate
from the buffer cache unecessarily added to the weight of the descriptors from the buffer cache unnecessarily added to the weight of the descriptors
which were generated for each such chunk. which were generated for each such chunk.
The following were some of the goals and expectations considered in the The following were some of the goals and expectations considered in the
@@ -403,14 +403,14 @@ i. Should be appropriate as a descriptor for both raw and buffered i/o -
for raw i/o. for raw i/o.
ii. Ability to represent high-memory buffers (which do not have a virtual ii. Ability to represent high-memory buffers (which do not have a virtual
address mapping in kernel address space). address mapping in kernel address space).
iii.Ability to represent large i/os w/o unecessarily breaking them up (i.e iii.Ability to represent large i/os w/o unnecessarily breaking them up (i.e
greater than PAGE_SIZE chunks in one shot) greater than PAGE_SIZE chunks in one shot)
iv. At the same time, ability to retain independent identity of i/os from iv. At the same time, ability to retain independent identity of i/os from
different sources or i/o units requiring individual completion (e.g. for different sources or i/o units requiring individual completion (e.g. for
latency reasons) latency reasons)
v. Ability to represent an i/o involving multiple physical memory segments v. Ability to represent an i/o involving multiple physical memory segments
(including non-page aligned page fragments, as specified via readv/writev) (including non-page aligned page fragments, as specified via readv/writev)
without unecessarily breaking it up, if the underlying device is capable of without unnecessarily breaking it up, if the underlying device is capable of
handling it. handling it.
vi. Preferably should be based on a memory descriptor structure that can be vi. Preferably should be based on a memory descriptor structure that can be
passed around different types of subsystems or layers, maybe even passed around different types of subsystems or layers, maybe even
@@ -1013,7 +1013,7 @@ Characteristics:
i. Binary tree i. Binary tree
AS and deadline i/o schedulers use red black binary trees for disk position AS and deadline i/o schedulers use red black binary trees for disk position
sorting and searching, and a fifo linked list for time-based searching. This sorting and searching, and a fifo linked list for time-based searching. This
gives good scalability and good availablility of information. Requests are gives good scalability and good availability of information. Requests are
almost always dispatched in disk sort order, so a cache is kept of the next almost always dispatched in disk sort order, so a cache is kept of the next
request in sort order to prevent binary tree lookups. request in sort order to prevent binary tree lookups.
+2 -2
@@ -1,7 +1,7 @@
The cpufreq-nforce2 driver changes the FSB on nVidia nForce2 plattforms. The cpufreq-nforce2 driver changes the FSB on nVidia nForce2 platforms.
This works better than on other plattforms, because the FSB of the CPU This works better than on other platforms, because the FSB of the CPU
can be controlled independently from the PCI/AGP clock. can be controlled independently from the PCI/AGP clock.
The module has two options: The module has two options:
+72 -72
@@ -46,7 +46,7 @@ maxcpus=n Restrict boot time cpus to n. Say if you have 4 cpus, using
maxcpus=2 will only boot 2. You can choose to bring the maxcpus=2 will only boot 2. You can choose to bring the
other cpus later online, read FAQ's for more info. other cpus later online, read FAQ's for more info.
additional_cpus*=n Use this to limit hotpluggable cpus. This option sets additional_cpus=n (*) Use this to limit hotpluggable cpus. This option sets
cpu_possible_map = cpu_present_map + additional_cpus cpu_possible_map = cpu_present_map + additional_cpus
(*) Option valid only for following architectures (*) Option valid only for following architectures
@@ -54,8 +54,8 @@ additional_cpus*=n Use this to limit hotpluggable cpus. This option sets
ia64 and x86_64 use the number of disabled local apics in ACPI tables MADT ia64 and x86_64 use the number of disabled local apics in ACPI tables MADT
to determine the number of potentially hot-pluggable cpus. The implementation to determine the number of potentially hot-pluggable cpus. The implementation
should only rely on this to count the #of cpus, but *MUST* not rely on the should only rely on this to count the # of cpus, but *MUST* not rely on the
apicid values in those tables for disabled apics. In the event BIOS doesnt apicid values in those tables for disabled apics. In the event BIOS doesn't
mark such hot-pluggable cpus as disabled entries, one could use this mark such hot-pluggable cpus as disabled entries, one could use this
parameter "additional_cpus=x" to represent those cpus in the cpu_possible_map. parameter "additional_cpus=x" to represent those cpus in the cpu_possible_map.
@@ -101,15 +101,15 @@ cpu_possible_map/for_each_possible_cpu() to iterate.
Never use anything other than cpumask_t to represent bitmap of CPUs. Never use anything other than cpumask_t to represent bitmap of CPUs.
#include <linux/cpumask.h> #include <linux/cpumask.h>
for_each_possible_cpu - Iterate over cpu_possible_map for_each_possible_cpu - Iterate over cpu_possible_map
for_each_online_cpu - Iterate over cpu_online_map for_each_online_cpu - Iterate over cpu_online_map
for_each_present_cpu - Iterate over cpu_present_map for_each_present_cpu - Iterate over cpu_present_map
for_each_cpu_mask(x,mask) - Iterate over some random collection of cpu mask. for_each_cpu_mask(x,mask) - Iterate over some random collection of cpu mask.
#include <linux/cpu.h> #include <linux/cpu.h>
lock_cpu_hotplug() and unlock_cpu_hotplug(): lock_cpu_hotplug() and unlock_cpu_hotplug():
The above calls are used to inhibit cpu hotplug operations. While holding the The above calls are used to inhibit cpu hotplug operations. While holding the
cpucontrol mutex, cpu_online_map will not change. If you merely need to avoid cpucontrol mutex, cpu_online_map will not change. If you merely need to avoid
@@ -120,7 +120,7 @@ will work as long as stop_machine_run() is used to take a cpu down.
CPU Hotplug - Frequently Asked Questions. CPU Hotplug - Frequently Asked Questions.
Q: How to i enable my kernel to support CPU hotplug? Q: How to enable my kernel to support CPU hotplug?
A: When doing make defconfig, Enable CPU hotplug support A: When doing make defconfig, Enable CPU hotplug support
"Processor type and Features" -> Support for Hotpluggable CPUs "Processor type and Features" -> Support for Hotpluggable CPUs
@@ -141,39 +141,39 @@ A: You should now notice an entry in sysfs.
Check if sysfs is mounted, using the "mount" command. You should notice Check if sysfs is mounted, using the "mount" command. You should notice
an entry as shown below in the output. an entry as shown below in the output.
.... ....
none on /sys type sysfs (rw) none on /sys type sysfs (rw)
.... ....
if this is not mounted, do the following. If this is not mounted, do the following.
#mkdir /sysfs #mkdir /sysfs
#mount -t sysfs sys /sys #mount -t sysfs sys /sys
now you should see entries for all present cpu, the following is an example Now you should see entries for all present cpu, the following is an example
in a 8-way system. in a 8-way system.
#pwd #pwd
#/sys/devices/system/cpu #/sys/devices/system/cpu
#ls -l #ls -l
total 0 total 0
drwxr-xr-x 10 root root 0 Sep 19 07:44 . drwxr-xr-x 10 root root 0 Sep 19 07:44 .
drwxr-xr-x 13 root root 0 Sep 19 07:45 .. drwxr-xr-x 13 root root 0 Sep 19 07:45 ..
drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu0 drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu0
drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu1 drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu1
drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu2 drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu2
drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu3 drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu3
drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu4 drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu4
drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu5 drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu5
drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu6 drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu6
drwxr-xr-x 3 root root 0 Sep 19 07:48 cpu7 drwxr-xr-x 3 root root 0 Sep 19 07:48 cpu7
Under each directory you would find an "online" file which is the control Under each directory you would find an "online" file which is the control
file to logically online/offline a processor. file to logically online/offline a processor.
Q: Does hot-add/hot-remove refer to physical add/remove of cpus? Q: Does hot-add/hot-remove refer to physical add/remove of cpus?
A: The usage of hot-add/remove may not be very consistently used in the code. A: The usage of hot-add/remove may not be very consistently used in the code.
CONFIG_CPU_HOTPLUG enables logical online/offline capability in the kernel. CONFIG_HOTPLUG_CPU enables logical online/offline capability in the kernel.
To support physical addition/removal, one would need some BIOS hooks and To support physical addition/removal, one would need some BIOS hooks and
the platform should have something like an attention button in PCI hotplug. the platform should have something like an attention button in PCI hotplug.
CONFIG_ACPI_HOTPLUG_CPU enables ACPI support for physical add/remove of CPUs. CONFIG_ACPI_HOTPLUG_CPU enables ACPI support for physical add/remove of CPUs.
@@ -181,17 +181,17 @@ CONFIG_ACPI_HOTPLUG_CPU enables ACPI support for physical add/remove of CPUs.
Q: How do i logically offline a CPU? Q: How do i logically offline a CPU?
A: Do the following. A: Do the following.
#echo 0 > /sys/devices/system/cpu/cpuX/online #echo 0 > /sys/devices/system/cpu/cpuX/online
once the logical offline is successful, check Once the logical offline is successful, check
#cat /proc/interrupts #cat /proc/interrupts
you should now not see the CPU that you removed. Also online file will report You should now not see the CPU that you removed. Also online file will report
the state as 0 when a cpu if offline and 1 when its online. the state as 0 when a cpu if offline and 1 when its online.
#To display the current cpu state. #To display the current cpu state.
#cat /sys/devices/system/cpu/cpuX/online #cat /sys/devices/system/cpu/cpuX/online
Q: Why cant i remove CPU0 on some systems? Q: Why cant i remove CPU0 on some systems?
A: Some architectures may have some special dependency on a certain CPU. A: Some architectures may have some special dependency on a certain CPU.
@@ -234,8 +234,8 @@ Q: If i have some kernel code that needs to be aware of CPU arrival and
departure, how to i arrange for proper notification? departure, how to i arrange for proper notification?
A: This is what you would need in your kernel code to receive notifications. A: This is what you would need in your kernel code to receive notifications.
#include <linux/cpu.h> #include <linux/cpu.h>
static int __cpuinit foobar_cpu_callback(struct notifier_block *nfb, static int __cpuinit foobar_cpu_callback(struct notifier_block *nfb,
unsigned long action, void *hcpu) unsigned long action, void *hcpu)
{ {
unsigned int cpu = (unsigned long)hcpu; unsigned int cpu = (unsigned long)hcpu;
@@ -279,10 +279,10 @@ Q: I don't see my action being called for all CPUs already up and running?
A: Yes, CPU notifiers are called only when new CPUs are on-lined or offlined. A: Yes, CPU notifiers are called only when new CPUs are on-lined or offlined.
If you need to perform some action for each cpu already in the system, then If you need to perform some action for each cpu already in the system, then
for_each_online_cpu(i) { for_each_online_cpu(i) {
foobar_cpu_callback(&foobar_cpu_notifier, CPU_UP_PREPARE, i); foobar_cpu_callback(&foobar_cpu_notifier, CPU_UP_PREPARE, i);
foobar_cpu_callback(&foobar-cpu_notifier, CPU_ONLINE, i); foobar_cpu_callback(&foobar_cpu_notifier, CPU_ONLINE, i);
} }
Q: If i would like to develop cpu hotplug support for a new architecture, Q: If i would like to develop cpu hotplug support for a new architecture,
what do i need at a minimum? what do i need at a minimum?
@@ -307,38 +307,38 @@ Q: I need to ensure that a particular cpu is not removed when there is some
work specific to this cpu is in progress. work specific to this cpu is in progress.
A: First switch the current thread context to preferred cpu A: First switch the current thread context to preferred cpu
int my_func_on_cpu(int cpu) int my_func_on_cpu(int cpu)
{ {
cpumask_t saved_mask, new_mask = CPU_MASK_NONE; cpumask_t saved_mask, new_mask = CPU_MASK_NONE;
int curr_cpu, err = 0; int curr_cpu, err = 0;
saved_mask = current->cpus_allowed; saved_mask = current->cpus_allowed;
cpu_set(cpu, new_mask); cpu_set(cpu, new_mask);
err = set_cpus_allowed(current, new_mask); err = set_cpus_allowed(current, new_mask);
if (err) if (err)
return err; return err;
/* /*
* If we got scheduled out just after the return from * If we got scheduled out just after the return from
* set_cpus_allowed() before running the work, this ensures * set_cpus_allowed() before running the work, this ensures
* we stay locked. * we stay locked.
*/ */
curr_cpu = get_cpu(); curr_cpu = get_cpu();
if (curr_cpu != cpu) { if (curr_cpu != cpu) {
err = -EAGAIN; err = -EAGAIN;
goto ret; goto ret;
} else { } else {
/* /*
* Do work : But cant sleep, since get_cpu() disables preempt * Do work : But cant sleep, since get_cpu() disables preempt
*/ */
} }
ret: ret:
put_cpu(); put_cpu();
set_cpus_allowed(current, saved_mask); set_cpus_allowed(current, saved_mask);
return err; return err;
} }
Q: How do we determine how many CPUs are available for hotplug. Q: How do we determine how many CPUs are available for hotplug.
+4 -4
@@ -92,7 +92,7 @@ Your cooperation is appreciated.
7 = /dev/full Returns ENOSPC on write 7 = /dev/full Returns ENOSPC on write
8 = /dev/random Nondeterministic random number gen. 8 = /dev/random Nondeterministic random number gen.
9 = /dev/urandom Faster, less secure random number gen. 9 = /dev/urandom Faster, less secure random number gen.
10 = /dev/aio Asyncronous I/O notification interface 10 = /dev/aio Asynchronous I/O notification interface
11 = /dev/kmsg Writes to this come out as printk's 11 = /dev/kmsg Writes to this come out as printk's
1 block RAM disk 1 block RAM disk
0 = /dev/ram0 First RAM disk 0 = /dev/ram0 First RAM disk
@@ -1093,7 +1093,7 @@ Your cooperation is appreciated.
55 char DSP56001 digital signal processor 55 char DSP56001 digital signal processor
0 = /dev/dsp56k First DSP56001 0 = /dev/dsp56k First DSP56001
55 block Mylex DAC960 PCI RAID controller; eigth controller 55 block Mylex DAC960 PCI RAID controller; eighth controller
0 = /dev/rd/c7d0 First disk, whole disk 0 = /dev/rd/c7d0 First disk, whole disk
8 = /dev/rd/c7d1 Second disk, whole disk 8 = /dev/rd/c7d1 Second disk, whole disk
... ...
@@ -1456,7 +1456,7 @@ Your cooperation is appreciated.
1 = /dev/cum1 Callout device for ttyM1 1 = /dev/cum1 Callout device for ttyM1
... ...
79 block Compaq Intelligent Drive Array, eigth controller 79 block Compaq Intelligent Drive Array, eighth controller
0 = /dev/ida/c7d0 First logical drive whole disk 0 = /dev/ida/c7d0 First logical drive whole disk
16 = /dev/ida/c7d1 Second logical drive whole disk 16 = /dev/ida/c7d1 Second logical drive whole disk
... ...
@@ -1900,7 +1900,7 @@ Your cooperation is appreciated.
1 = /dev/av1 Second A/V card 1 = /dev/av1 Second A/V card
... ...
111 block Compaq Next Generation Drive Array, eigth controller 111 block Compaq Next Generation Drive Array, eighth controller
0 = /dev/cciss/c7d0 First logical drive, whole disk 0 = /dev/cciss/c7d0 First logical drive, whole disk
16 = /dev/cciss/c7d1 Second logical drive, whole disk 16 = /dev/cciss/c7d1 Second logical drive, whole disk
... ...
+108 -76
@@ -1,99 +1,131 @@
Platform Devices and Drivers Platform Devices and Drivers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
See <linux/platform_device.h> for the driver model interface to the
platform bus: platform_device, and platform_driver. This pseudo-bus
is used to connect devices on busses with minimal infrastructure,
like those used to integrate peripherals on many system-on-chip
processors, or some "legacy" PC interconnects; as opposed to large
formally specified ones like PCI or USB.
Platform devices Platform devices
~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~
Platform devices are devices that typically appear as autonomous Platform devices are devices that typically appear as autonomous
entities in the system. This includes legacy port-based devices and entities in the system. This includes legacy port-based devices and
host bridges to peripheral buses. host bridges to peripheral buses, and most controllers integrated
into system-on-chip platforms. What they usually have in common
is direct addressing from a CPU bus. Rarely, a platform_device will
be connected through a segment of some other kind of bus; but its
registers will still be directly addressable.
Platform devices are given a name, used in driver binding, and a
list of resources such as addresses and IRQs.
struct platform_device {
const char *name;
u32 id;
struct device dev;
u32 num_resources;
struct resource *resource;
};
Platform drivers Platform drivers
~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~
Drivers for platform devices are typically very simple and Platform drivers follow the standard driver model convention, where
unstructured. Either the device was present at a particular I/O port discovery/enumeration is handled outside the drivers, and drivers
and the driver was loaded, or it was not. There was no possibility provide probe() and remove() methods. They support power management
of hotplugging or alternative discovery besides probing at a specific and shutdown notifications using the standard conventions.
I/O address and expecting a specific response.
struct platform_driver {
int (*probe)(struct platform_device *);
int (*remove)(struct platform_device *);
void (*shutdown)(struct platform_device *);
int (*suspend)(struct platform_device *, pm_message_t state);
int (*suspend_late)(struct platform_device *, pm_message_t state);
int (*resume_early)(struct platform_device *);
int (*resume)(struct platform_device *);
struct device_driver driver;
};
Note that probe() should generally verify that the specified device hardware
actually exists; sometimes platform setup code can't be sure. The probing
can use device resources, including clocks, and device platform_data.
Platform drivers register themselves the normal way:
int platform_driver_register(struct platform_driver *drv);
Or, in common situations where the device is known not to be hot-pluggable,
the probe() routine can live in an init section to reduce the driver's
runtime memory footprint:
int platform_driver_probe(struct platform_driver *drv,
int (*probe)(struct platform_device *))
Other Architectures, Modern Firmware, and new Platforms Device Enumeration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~
These devices are not always at the legacy I/O ports. This is true on As a rule, platform specific (and often board-specific) setup code will
other architectures and on some modern architectures. In most cases, register platform devices:
the drivers are modified to discover the devices at other well-known
ports for the given platform. However, the firmware in these systems int platform_device_register(struct platform_device *pdev);
does usually know where exactly these devices reside, and in some
cases, it's the only way of discovering them. int platform_add_devices(struct platform_device **pdevs, int ndev);
The general rule is to register only those devices that actually exist,
but in some cases extra devices might be registered. For example, a kernel
might be configured to work with an external network adapter that might not
be populated on all boards, or likewise to work with an integrated controller
that some boards might not hook up to any peripherals.
In some cases, boot firmware will export tables describing the devices
that are populated on a given board. Without such tables, often the
only way for system setup code to set up the correct devices is to build
a kernel for a specific target board. Such board-specific kernels are
common with embedded and custom systems development.
In many cases, the memory and IRQ resources associated with the platform
device are not enough to let the device's driver work. Board setup code
will often supply that additional information through the device's
platform_data field.
Embedded systems frequently need one or more clocks for platform devices,
which are normally kept off until they're actively needed (to save power).
System setup also associates those clocks with the device, so that
calls to clk_get(&pdev->dev, clock_name) return them as needed.
The Platform Bus Device Naming and Driver Binding
~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A platform bus has been created to deal with these issues. First and The platform_device.dev.bus_id is the canonical name for the devices.
foremost, it groups all the legacy devices under a common bus, and It's built from two components:
gives them a common parent if they don't already have one.
But, besides the organizational benefits, the platform bus can also * platform_device.name ... which is also used to for driver matching.
accommodate firmware-based enumeration.
* platform_device.id ... the device instance number, or else "-1"
to indicate there's only one.
Device Discovery These are catenated, so name/id "serial"/0 indicates bus_id "serial.0", and
~~~~~~~~~~~~~~~~ "serial"/3 indicates bus_id "serial.3"; both would use the platform_driver
The platform bus has no concept of probing for devices. Devices named "serial". While "my_rtc"/-1 would be bus_id "my_rtc" (no instance id)
discovery is left up to either the legacy drivers or the and use the platform_driver called "my_rtc".
firmware. These entities are expected to notify the platform of
devices that it discovers via the bus's add() callback:
platform_bus.add(parent,bus_id). Driver binding is performed automatically by the driver core, invoking
driver probe() after finding a match between device and driver. If the
probe() succeeds, the driver and device are bound as usual. There are
three different ways to find such a match:
- Whenever a device is registered, the drivers for that bus are
checked for matches. Platform devices should be registered very
early during system boot.
Bus IDs - When a driver is registered using platform_driver_register(), all
~~~~~~~ unbound devices on that bus are checked for matches. Drivers
Bus IDs are the canonical names for the devices. There is no globally usually register later during booting, or by module loading.
standard addressing mechanism for legacy devices. In the IA-32 world,
we have Pnp IDs to use, as well as the legacy I/O ports. However,
neither tell what the device really is or have any meaning on other
platforms.
Since both PnP IDs and the legacy I/O ports (and other standard I/O - Registering a driver using platform_driver_probe() works just like
ports for specific devices) have a 1:1 mapping, we map the using platform_driver_register(), except that the driver won't
platform-specific name or identifier to a generic name (at least be probed later if another device registers. (Which is OK, since
within the scope of the kernel). this interface is only for use with non-hotpluggable devices.)
For example, a serial driver might find a device at I/O 0x3f8. The
ACPI firmware might also discover a device with PnP ID (_HID)
PNP0501. Both correspond to the same device and should be mapped to the
canonical name 'serial'.
The bus_id field should be a concatenation of the canonical name and
the instance of that type of device. For example, the device at I/O
port 0x3f8 should have a bus_id of "serial0". This places the
responsibility of enumerating devices of a particular type up to the
discovery mechanism. But, they are the entity that should know best
(as opposed to the platform bus driver).
Drivers
~~~~~~~
Drivers for platform devices should have a name that is the same as
the canonical name of the devices they support. This allows the
platform bus driver to do simple matching with the basic data
structures to determine if a driver supports a certain device.
For example, a legacy serial driver should have a name of 'serial' and
register itself with the platform bus.
Driver Binding
~~~~~~~~~~~~~~
Legacy drivers assume they are bound to the device once they start up
and probe an I/O port. Divorcing them from this will be a difficult
process. However, that shouldn't prevent us from implementing
firmware-based enumeration.
The firmware should notify the platform bus about devices before the
legacy drivers have had a chance to load. Once the drivers are loaded,
they driver model core will attempt to bind the driver to any
previously-discovered devices. Once that has happened, it will be free
to discover any other devices it pleases.
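
The naming rule in the new "Device Naming and Driver Binding" text can be
illustrated with a small userspace sketch. This is not kernel code: the
helpers format_bus_id() and names_match() are hypothetical names invented
for this illustration, and the id is taken as a plain int so that -1 can
be written directly. It only shows how name and id combine into a bus_id,
and that matching is done on the name alone, as the text describes.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel function): build a bus_id from a
 * platform device name and instance id. An id of -1 means "only one
 * instance", so the bus_id is just the name; otherwise it is "name.id".
 * Returns the value returned by snprintf().
 */
int format_bus_id(char *buf, size_t len, const char *name, int id)
{
	if (id == -1)
		return snprintf(buf, len, "%s", name);
	return snprintf(buf, len, "%s.%d", name, id);
}

/*
 * Hypothetical helper: a device and a driver are treated as a match
 * when their names are identical. The instance id plays no part.
 */
int names_match(const char *device_name, const char *driver_name)
{
	return strcmp(device_name, driver_name) == 0;
}
```

With these helpers, name/id "serial"/0 and "serial"/3 yield "serial.0" and
"serial.3", both matching a driver named "serial", while "my_rtc"/-1 yields
"my_rtc".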
+1 -1
@@ -92,7 +92,7 @@ struct device represents a single device. It mainly contains metadata
describing the relationship the device has to other entities. describing the relationship the device has to other entities.
- Embedd a struct device in the bus-specific device type. - Embed a struct device in the bus-specific device type.
struct pci_dev { struct pci_dev {

Some files were not shown because too many files have changed in this diff