Merge branch 'master' of /home/shaggy/git/linus-clean/

2026-05-01 15:00:59 -07:00 · 2009-02-02 13:40:55 -06:00
parent 1ad53a98c9 45c82b5a77
commit 8db0c5d5ef
2789 changed files with 127217 additions and 28739 deletions
@@ -3786,14 +3786,11 @@ S: The Netherlands

 N: David Woodhouse
 E: dwmw2@infradead.org
-D: ARCnet stuff, Applicom board driver, SO_BINDTODEVICE,
-D: some Alpha platform porting from 2.0, Memory Technology Devices,
-D: Acquire watchdog timer, PC speaker driver maintenance,
+D: JFFS2 file system, Memory Technology Device subsystem,
 D: various other stuff that annoyed me by not working.
-S: c/o Red Hat Engineering
-S: Rustat House
-S: 60 Clifton Road
-S: Cambridge. CB1 7EG
+S: c/o Intel Corporation
+S: Pipers Way
+S: Swindon. SN3 1RJ
 S: England

 N: Chris Wright
@@ -33,10 +33,12 @@ o  Gnu make               3.79.1                  # make --version
 o  binutils               2.12                    # ld -v
 o  util-linux             2.10o                   # fdformat --version
 o  module-init-tools      0.9.10                  # depmod -V
-o  e2fsprogs              1.29                    # tune2fs
+o  e2fsprogs              1.41.4                  # e2fsck -V
 o  jfsutils               1.1.3                   # fsck.jfs -V
 o  reiserfsprogs          3.6.3                   # reiserfsck -V 2>&1|grep reiserfsprogs
 o  xfsprogs               2.6.0                   # xfs_db -V
+o  squashfs-tools         4.0                     # mksquashfs -version
+o  btrfs-progs            0.18                    # btrfsck
 o  pcmciautils            004                     # pccardctl -V
 o  quota-tools            3.09                    # quota -V
 o  PPP                    2.4.0                   # pppd --version
@@ -483,17 +483,25 @@ values.  To do the latter, you can stick the following in your .emacs file:
    (* (max steps 1)
       c-basic-offset)))

+(add-hook 'c-mode-common-hook
+          (lambda ()
+            ;; Add kernel style
+            (c-add-style
+             "linux-tabs-only"
+             '("linux" (c-offsets-alist
+                        (arglist-cont-nonempty
+                         c-lineup-gcc-asm-reg
+                         c-lineup-arglist-tabs-only))))))
+
 (add-hook 'c-mode-hook
          (lambda ()
            (let ((filename (buffer-file-name)))
              ;; Enable kernel mode for the appropriate files
              (when (and filename
-                         (string-match "~/src/linux-trees" filename))
+                         (string-match (expand-file-name "~/src/linux-trees")
+                                       filename))
                (setq indent-tabs-mode t)
-                (c-set-style "linux")
-                (c-set-offset 'arglist-cont-nonempty
-                              '(c-lineup-gcc-asm-reg
-                                c-lineup-arglist-tabs-only))))))
+                (c-set-style "linux-tabs-only")))))

 This will make emacs go better with the kernel coding style for C
 files below ~/src/linux-trees.
@@ -5,7 +5,7 @@

 This document describes the DMA API.  For a more gentle introduction
 phrased in terms of the pci_ equivalents (and actual examples) see
-DMA-mapping.txt
+Documentation/PCI/PCI-DMA-mapping.txt.

 This API is split into two pieces.  Part I describes the API and the
 corresponding pci_ API.  Part II describes the extensions to the API
@@ -170,16 +170,15 @@ Returns: 0 if successful and a negative error if not.
 u64
 dma_get_required_mask(struct device *dev)

-After setting the mask with dma_set_mask(), this API returns the
-actual mask (within that already set) that the platform actually
-requires to operate efficiently.  Usually this means the returned mask
+This API returns the mask that the platform requires to
+operate efficiently.  Usually this means the returned mask
 is the minimum required to cover all of memory.  Examining the
 required mask gives drivers with variable descriptor sizes the
 opportunity to use smaller descriptors as necessary.

 Requesting the required mask does not alter the current mask.  If you
-wish to take advantage of it, you should issue another dma_set_mask()
-call to lower the mask again.
+wish to take advantage of it, you should issue a dma_set_mask()
+call to set the mask to the value returned.


 Part Id - Streaming DMA mappings
@@ -41,6 +41,12 @@ GPL version 2.
 </abstract>

 <revhistory>
+	<revision>
+	<revnumber>0.7</revnumber>
+	<date>2008-12-23</date>
+	<authorinitials>hjk</authorinitials>
+	<revremark>Added generic platform drivers and offset attribute.</revremark>
+	</revision>
 	<revision>
 	<revnumber>0.6</revnumber>
 	<date>2008-12-05</date>
@@ -312,6 +318,16 @@ interested in translating it, please email me
 	pointed to by addr.
 	</para>
 </listitem>
+<listitem>
+	<para>
+	<filename>offset</filename>: The offset, in bytes, that has to be
+	added to the pointer returned by <function>mmap()</function> to get
+	to the actual device memory. This is important if the device's memory
+	is not page aligned. Remember that pointers returned by
+	<function>mmap()</function> are always page aligned, so it is good
+	style to always add this offset.
+	</para>
+</listitem>
 </itemizedlist>

 <para>
@@ -594,6 +610,78 @@ framework to set up sysfs files for this region. Simply leave it alone.
 	</para>
 </sect1>

+<sect1 id="using_uio_pdrv">
+<title>Using uio_pdrv for platform devices</title>
+	<para>
+	In many cases, UIO drivers for platform devices can be handled in a
+	generic way. In the same place where you define your
+	<varname>struct platform_device</varname>, you simply also implement
+	your interrupt handler and fill your
+	<varname>struct uio_info</varname>. A pointer to this
+	<varname>struct uio_info</varname> is then used as
+	<varname>platform_data</varname> for your platform device.
+	</para>
+	<para>
+	You also need to set up an array of <varname>struct resource</varname>
+	containing addresses and sizes of your memory mappings. This
+	information is passed to the driver using the
+	<varname>.resource</varname> and <varname>.num_resources</varname>
+	elements of <varname>struct platform_device</varname>.
+	</para>
+	<para>
+	You now have to set the <varname>.name</varname> element of
+	<varname>struct platform_device</varname> to
+	<varname>"uio_pdrv"</varname> to use the generic UIO platform device
+	driver. This driver will fill the <varname>mem[]</varname> array
+	according to the resources given, and register the device.
+	</para>
+	<para>
+	The advantage of this approach is that you only have to edit a file
+	you need to edit anyway. You do not have to create an extra driver.
+	</para>
+</sect1>
+
+<sect1 id="using_uio_pdrv_genirq">
+<title>Using uio_pdrv_genirq for platform devices</title>
+	<para>
+	Especially in embedded devices, you frequently find chips where the
+	irq pin is tied to its own dedicated interrupt line. In such cases,
+	where you can be really sure the interrupt is not shared, we can take
+	the concept of <varname>uio_pdrv</varname> one step further and use a
+	generic interrupt handler. That's what
+	<varname>uio_pdrv_genirq</varname> does.
+	</para>
+	<para>
+	The setup for this driver is the same as described above for
+	<varname>uio_pdrv</varname>, except that you do not implement an
+	interrupt handler. The <varname>.handler</varname> element of
+	<varname>struct uio_info</varname> must remain
+	<varname>NULL</varname>. The  <varname>.irq_flags</varname> element
+	must not contain <varname>IRQF_SHARED</varname>.
+	</para>
+	<para>
+	You will set the <varname>.name</varname> element of
+	<varname>struct platform_device</varname> to
+	<varname>"uio_pdrv_genirq"</varname> to use this driver.
+	</para>
+	<para>
+	The generic interrupt handler of <varname>uio_pdrv_genirq</varname>
+	will simply disable the interrupt line using
+	<function>disable_irq_nosync()</function>. After doing its work,
+	userspace can reenable the interrupt by writing 0x00000001 to the UIO
+	device file. The driver already implements an
+	<function>irq_control()</function> to make this possible, you must not
+	implement your own.
+	</para>
+	<para>
+	Using <varname>uio_pdrv_genirq</varname> not only saves a few lines of
+	interrupt handler code. You also do not need to know anything about
+	the chip's internal registers to create the kernel part of the driver.
+	All you need to know is the irq number of the pin the chip is
+	connected to.
+	</para>
+</sect1>
+
 </chapter>

 <chapter id="userspace_driver" xreflabel="Writing a driver in user space">
@@ -1,6 +1,6 @@
 [ NOTE: The virt_to_bus() and bus_to_virt() functions have been
-	superseded by the functionality provided by the PCI DMA
-	interface (see Documentation/DMA-mapping.txt).  They continue
+	superseded by the functionality provided by the PCI DMA interface
+	(see Documentation/PCI/PCI-DMA-mapping.txt).  They continue
 	to be documented below for historical purposes, but new code
 	must not use them. --davidm 00/12/12 ]

@@ -392,6 +392,10 @@ int main(int argc, char *argv[])
 			goto err;
 		}
 	}
+	if (!maskset && !tid && !containerset) {
+		usage();
+		goto err;
+	}

 	do {
 		int i;
@@ -186,8 +186,9 @@ a virtual address mapping (unlike the earlier scheme of virtual address
 do not have a corresponding kernel virtual address space mapping) and
 low-memory pages.

-Note: Please refer to DMA-mapping.txt for a discussion on PCI high mem DMA
-aspects and mapping of scatter gather lists, and support for 64 bit PCI.
+Note: Please refer to Documentation/PCI/PCI-DMA-mapping.txt for a discussion
+on PCI high mem DMA aspects and mapping of scatter gather lists, and support
+for 64 bit PCI.

 Special handling is required only for cases where i/o needs to happen on
 pages at physical memory addresses beyond what the device can support. In these
@@ -953,14 +954,14 @@ elevator_allow_merge_fn		called whenever the block layer determines
 				results in some sort of conflict internally,
 				this hook allows it to do that.

-elevator_dispatch_fn		fills the dispatch queue with ready requests.
+elevator_dispatch_fn*		fills the dispatch queue with ready requests.
 				I/O schedulers are free to postpone requests by
 				not filling the dispatch queue unless @force
 				is non-zero.  Once dispatched, I/O schedulers
 				are not allowed to manipulate the requests -
 				they belong to generic dispatch queue.

-elevator_add_req_fn		called to add a new request into the scheduler
+elevator_add_req_fn*		called to add a new request into the scheduler

 elevator_queue_empty_fn		returns true if the merge queue is empty.
 				Drivers shouldn't use this, but rather check
@@ -990,7 +991,7 @@ elevator_activate_req_fn	Called when device driver first sees a request.
 elevator_deactivate_req_fn	Called when device driver decides to delay
 				a request by requeueing it.

-elevator_init_fn
+elevator_init_fn*
 elevator_exit_fn		Allocate and free any elevator specific storage
 				for a queue.

@@ -1,7 +1,8 @@
 				CGROUPS
 				-------

-Written by Paul Menage <menage@google.com> based on Documentation/cpusets.txt
+Written by Paul Menage <menage@google.com> based on
+Documentation/cgroups/cpusets.txt

 Original copyright statements from cpusets.txt:
 Portions Copyright (C) 2004 BULL SA.
@@ -68,7 +69,7 @@ On their own, the only use for cgroups is for simple job
 tracking. The intention is that other subsystems hook into the generic
 cgroup support to provide new attributes for cgroups, such as
 accounting/limiting the resources which processes in a cgroup can
-access. For example, cpusets (see Documentation/cpusets.txt) allows
+access. For example, cpusets (see Documentation/cgroups/cpusets.txt) allows
 you to associate a set of CPUs and a set of memory nodes with the
 tasks in each cgroup.

@@ -1,12 +1,12 @@
 Memory Resource Controller(Memcg)  Implementation Memo.
-Last Updated: 2008/12/15
-Base Kernel Version: based on 2.6.28-rc8-mm.
+Last Updated: 2009/1/19
+Base Kernel Version: based on 2.6.29-rc2.

 Because VM is getting complex (one of reasons is memcg...), memcg's behavior
 is complex. This is a document for memcg's internal behavior.
 Please note that implementation details can be changed.

-(*) Topics on API should be in Documentation/controllers/memory.txt)
+(*) Topics on API should be in Documentation/cgroups/memory.txt)

 0. How to record usage ?
   2 objects are used.
@@ -340,3 +340,23 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
 	# mount -t cgroup none /cgroup -t cpuset,memory,cpu,devices

 	and do task move, mkdir, rmdir etc...under this.
+
+ 9.7 swapoff.
+	Besides management of swap is one of complicated parts of memcg,
+	call path of swap-in at swapoff is not same as usual swap-in path..
+	It's worth to be tested explicitly.
+
+	For example, test like following is good.
+	(Shell-A)
+	# mount -t cgroup none /cgroup -t memory
+	# mkdir /cgroup/test
+	# echo 40M > /cgroup/test/memory.limit_in_bytes
+	# echo 0 > /cgroup/test/tasks
+	Run malloc(100M) program under this. You'll see 60M of swaps.
+	(Shell-B)
+	# move all tasks in /cgroup/test to /cgroup
+	# /sbin/swapoff -a
+	# rmdir /test/cgroup
+	# kill malloc task.
+
+	Of course, tmpfs v.s. swapoff test should be tested, too.
@@ -13,9 +13,9 @@
 3.6 Constraints
 3.7 Example

-4 DRIVER DEVELOPER NOTES
+4 DMAENGINE DRIVER DEVELOPER NOTES
 4.1 Conformance points
-4.2 "My application needs finer control of hardware channels"
+4.2 "My application needs exclusive control of hardware channels"

 5 SOURCE

@@ -150,6 +150,7 @@ ops_run_* and ops_complete_* routines in drivers/md/raid5.c for more
 implementation examples.

 4 DRIVER DEVELOPMENT NOTES
+
 4.1 Conformance points:
 There are a few conformance points required in dmaengine drivers to
 accommodate assumptions made by applications using the async_tx API:
@@ -158,58 +159,49 @@ accommodate assumptions made by applications using the async_tx API:
 3/ Use async_tx_run_dependencies() in the descriptor clean up path to
   handle submission of dependent operations

-4.2 "My application needs finer control of hardware channels"
-This requirement seems to arise from cases where a DMA engine driver is
-trying to support device-to-memory DMA.  The dmaengine and async_tx
-implementations were designed for offloading memory-to-memory
-operations; however, there are some capabilities of the dmaengine layer
-that can be used for platform-specific channel management.
-Platform-specific constraints can be handled by registering the
-application as a 'dma_client' and implementing a 'dma_event_callback' to
-apply a filter to the available channels in the system.  Before showing
-how to implement a custom dma_event callback some background of
-dmaengine's client support is required.
+4.2 "My application needs exclusive control of hardware channels"
+Primarily this requirement arises from cases where a DMA engine driver
+is being used to support device-to-memory operations.  A channel that is
+performing these operations cannot, for many platform specific reasons,
+be shared.  For these cases the dma_request_channel() interface is
+provided.

-The following routines in dmaengine support multiple clients requesting
-use of a channel:
- dma_async_client_register(struct dma_client *client)
- dma_async_client_chan_request(struct dma_client *client)
+The interface is:
+struct dma_chan *dma_request_channel(dma_cap_mask_t mask,
+				     dma_filter_fn filter_fn,
+				     void *filter_param);

-dma_async_client_register takes a pointer to an initialized dma_client
-structure.  It expects that the 'event_callback' and 'cap_mask' fields
-are already initialized.
+Where dma_filter_fn is defined as:
+typedef bool (*dma_filter_fn)(struct dma_chan *chan, void *filter_param);

-dma_async_client_chan_request triggers dmaengine to notify the client of
-all channels that satisfy the capability mask.  It is up to the client's
-event_callback routine to track how many channels the client needs and
-how many it is currently using.  The dma_event_callback routine returns a
-dma_state_client code to let dmaengine know the status of the
-allocation.
+When the optional 'filter_fn' parameter is set to NULL
+dma_request_channel simply returns the first channel that satisfies the
+capability mask.  Otherwise, when the mask parameter is insufficient for
+specifying the necessary channel, the filter_fn routine can be used to
+disposition the available channels in the system. The filter_fn routine
+is called once for each free channel in the system.  Upon seeing a
+suitable channel filter_fn returns DMA_ACK which flags that channel to
+be the return value from dma_request_channel.  A channel allocated via
+this interface is exclusive to the caller, until dma_release_channel()
+is called.

-Below is the example of how to extend this functionality for
-platform-specific filtering of the available channels beyond the
-standard capability mask:
+The DMA_PRIVATE capability flag is used to tag dma devices that should
+not be used by the general-purpose allocator.  It can be set at
+initialization time if it is known that a channel will always be
+private.  Alternatively, it is set when dma_request_channel() finds an
+unused "public" channel.

-static enum dma_state_client
-my_dma_client_callback(struct dma_client *client,
-			struct dma_chan *chan, enum dma_state state)
-{
-	struct dma_device *dma_dev;
-	struct my_platform_specific_dma *plat_dma_dev;
-	
-	dma_dev = chan->device;
-	plat_dma_dev = container_of(dma_dev,
-				    struct my_platform_specific_dma,
-				    dma_dev);
-
-	if (!plat_dma_dev->platform_specific_capability)
-		return DMA_DUP;
-
-	. . .
-}
+A couple caveats to note when implementing a driver and consumer:
+1/ Once a channel has been privately allocated it will no longer be
+   considered by the general-purpose allocator even after a call to
+   dma_release_channel().
+2/ Since capabilities are specified at the device level a dma_device
+   with multiple channels will either have all channels public, or all
+   channels private.

 5 SOURCE
-include/linux/dmaengine.h: core header file for DMA drivers and clients
+
+include/linux/dmaengine.h: core header file for DMA drivers and api users
 drivers/dma/dmaengine.c: offload engine channel management routines
 drivers/dma/: location for offload engine drivers
 include/linux/async_tx.h: core header file for the async_tx api
@@ -0,0 +1 @@
+See Documentation/crypto/async-tx-api.txt
@@ -97,8 +97,8 @@ prototypes:
 	void (*put_super) (struct super_block *);
 	void (*write_super) (struct super_block *);
 	int (*sync_fs)(struct super_block *sb, int wait);
-	void (*write_super_lockfs) (struct super_block *);
-	void (*unlockfs) (struct super_block *);
+	int (*freeze_fs) (struct super_block *);
+	int (*unfreeze_fs) (struct super_block *);
 	int (*statfs) (struct dentry *, struct kstatfs *);
 	int (*remount_fs) (struct super_block *, int *, char *);
 	void (*clear_inode) (struct inode *);
@@ -119,8 +119,8 @@ delete_inode:		no
 put_super:		yes	yes	no
 write_super:		no	yes	read
 sync_fs:		no	no	read
-write_super_lockfs:	?
-unlockfs:		?
+freeze_fs:		?
+unfreeze_fs:		?
 statfs:			no	no	no
 remount_fs:		yes	yes	maybe		(see below)
 clear_inode:		no
@@ -0,0 +1,91 @@
+
+	BTRFS
+	=====
+
+Btrfs is a new copy on write filesystem for Linux aimed at
+implementing advanced features while focusing on fault tolerance,
+repair and easy administration. Initially developed by Oracle, Btrfs
+is licensed under the GPL and open for contribution from anyone.
+
+Linux has a wealth of filesystems to choose from, but we are facing a
+number of challenges with scaling to the large storage subsystems that
+are becoming common in today's data centers. Filesystems need to scale
+in their ability to address and manage large storage, and also in
+their ability to detect, repair and tolerate errors in the data stored
+on disk.  Btrfs is under heavy development, and is not suitable for
+any uses other than benchmarking and review. The Btrfs disk format is
+not yet finalized.
+
+The main Btrfs features include:
+
+    * Extent based file storage (2^64 max file size)
+    * Space efficient packing of small files
+    * Space efficient indexed directories
+    * Dynamic inode allocation
+    * Writable snapshots
+    * Subvolumes (separate internal filesystem roots)
+    * Object level mirroring and striping
+    * Checksums on data and metadata (multiple algorithms available)
+    * Compression
+    * Integrated multiple device support, with several raid algorithms
+    * Online filesystem check (not yet implemented)
+    * Very fast offline filesystem check
+    * Efficient incremental backup and FS mirroring (not yet implemented)
+    * Online filesystem defragmentation
+
+
+
+	MAILING LIST
+	============
+
+There is a Btrfs mailing list hosted on vger.kernel.org. You can
+find details on how to subscribe here:
+
+http://vger.kernel.org/vger-lists.html#linux-btrfs
+
+Mailing list archives are available from gmane:
+
+http://dir.gmane.org/gmane.comp.file-systems.btrfs
+
+
+
+	IRC
+	===
+
+Discussion of Btrfs also occurs on the #btrfs channel of the Freenode
+IRC network.
+
+
+
+	UTILITIES
+	=========
+
+Userspace tools for creating and manipulating Btrfs file systems are
+available from the git repository at the following location:
+
+ http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs-unstable.git
+ git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git
+
+These include the following tools:
+
+mkfs.btrfs: create a filesystem
+
+btrfsctl: control program to create snapshots and subvolumes:
+
+	mount /dev/sda2 /mnt
+	btrfsctl -s new_subvol_name /mnt
+	btrfsctl -s snapshot_of_default /mnt/default
+	btrfsctl -s snapshot_of_new_subvol /mnt/new_subvol_name
+	btrfsctl -s snapshot_of_a_snapshot /mnt/snapshot_of_new_subvol
+	ls /mnt
+	default snapshot_of_a_snapshot snapshot_of_new_subvol
+	new_subvol_name snapshot_of_default
+
+	Snapshots and subvolumes cannot be deleted right now, but you can
+	rm -rf all the files and directories inside them.
+
+btrfsck: do a limited check of the FS extent trees.
+
+btrfs-debug-tree: print all of the FS metadata in text form.  Example:
+
+	btrfs-debug-tree /dev/sda2 >& big_output_file
@@ -251,7 +251,7 @@ NFS/RDMA Setup

    Instruct the server to listen on the RDMA transport:

-    $ echo rdma 2050 > /proc/fs/nfsd/portlist
+    $ echo rdma 20049 > /proc/fs/nfsd/portlist

  - On the client system

@@ -263,7 +263,7 @@ NFS/RDMA Setup
    Regardless of how the client was built (module or built-in), use this
    command to mount the NFS/RDMA server:

-    $ mount -o rdma,port=2050 <IPoIB-server-name-or-address>:/<export> /mnt
+    $ mount -o rdma,port=20049 <IPoIB-server-name-or-address>:/<export> /mnt

    To verify that the mount is using RDMA, run "cat /proc/mounts" and check
    the "proto" field for the given mount.
--- a/Show More
+++ b/Show More