Commit Graph

7175 Commits

Author SHA1 Message Date
Miao Xie 8118a859dc sysfs: fix off-by-one error in fill_read_buffer()
I found that there is a off-by-one problem in the following code.

Version:	2.6.24-rc2
File:		fs/sysfs/file.c:118-122
Function:	fill_read_buffer
--------------------------------------------------------------------
	count = ops->show(kobj, attr_sd->s_attr.attr, buffer->page);

	sysfs_put_active_two(attr_sd);

	BUG_ON(count > (ssize_t)PAGE_SIZE);
--------------------------------------------------------------------

Because according to the specification of the sysfs and the implement of
the show methods, the show methods return the number of bytes which would
be generated for the given input, excluding the trailing null.So if the
return value of the show methods equals PAGE_SIZE - 1, the buffer is full
in fact.  And if the return value equals PAGE_SIZE, the resulting string
was already truncated,or buffer overflow occurred.

This patch fixes an off-by-one error in fill_read_buffer.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Tejun Heo <teheo@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-28 13:53:53 -08:00
Mark Fasheh b1967d0edd ocfs2: reverse inline-data truncate args
ocfs2_truncate() and ocfs2_remove_inode_range() had reversed their "set
i_size" arguments to ocfs2_truncate_inline(). Fix things so that truncate
sets i_size, and punching a hole ignores it.

This exposed a problem where punching a hole in an inline-data file wasn't
updating the page cache, so fix that too.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27 16:47:03 -08:00
Mark Fasheh 0d8a4e0cd6 ocfs2: Fix comparison in ocfs2_size_fits_inline_data()
This was causing us to prematurely push out inline data by one byte.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27 16:47:03 -08:00
Mark Fasheh bccb9dad89 ocfs2: Remove bug statement in ocfs2_dentry_iput()
The existing bug statement didn't take into account unhashed dentries which
might not have a cluster lock on them. This could happen if a node exporting
the file system via NFS is rebooted, re-exported to nfs clients and then
unmounted. It's fine in this case to not have a dentry cluster lock.

Just remove the bug statement and replace it with an error print, which
does the proper checks. Though we want to know if something has happened
which might have prevented a cluster lock from being created, it's
definitely not necessary to panic the machine for this.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27 16:47:02 -08:00
Jan Kara 5a58c3ef22 [PATCH] ocfs2: Remove expensive bitmap scanning
Enable expensive bitmap scanning only if DEBUG option is enabled.
The bitmap scanning quite loads the CPU and on my machine the write
throughput of dd if=/dev/zero of=/ocfs2/file bs=1M count=500 conv=sync
improves from 37 MB/s to 45.4 MB/s in local mode...

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27 16:47:02 -08:00
Mark Fasheh a46043e08f ocfs2: log valid inode # on bad inode
If the inode block isn't valid then we don't want to print the value from
that, instead print the block number which was passed in (which should
always be correct). Also, turn this into a debug print for now - folks who
hit an actual problem always have other logs indicating what the source is.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27 16:47:02 -08:00
Mark Fasheh ef9f86ceb6 ocfs2: Filter -ENOSPC in mlog_errno()
It's almost never worth printing in that situation and we keep forgetting to
manually filter it out.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27 16:47:01 -08:00
Joe Perches 2759236f84 [PATCH] fs/ocfs2: Add missing "space"
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27 16:47:01 -08:00
Mark Fasheh e001e796e4 ocfs2: Reset journal parameters after s_mount_opt update
Right now we're just setting them from the existing parameters, not the
new ones that a remount specified.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27 16:47:01 -08:00
Linus Torvalds 423eaf8f00 Merge git://git.linux-nfs.org/pub/linux/nfs-2.6
* git://git.linux-nfs.org/pub/linux/nfs-2.6:
  NFS: Clean up new multi-segment direct I/O changes
  NFS: Ensure we return zero if applications attempt to write zero bytes
  NFS: Support multiple segment iovecs in the NFS direct I/O path
  NFS: Introduce iovec I/O helpers to fs/nfs/direct.c
  SUNRPC: Add missing "space" to net/sunrpc/auth_gss.c
  SUNRPC: make sunrpc/xprtsock.c:xs_setup_{udp,tcp}() static
  NFS: fs/nfs/dir.c should #include "internal.h"
  NFS: make nfs_wb_page_priority() static
  NFS: mount failure causes bad page state
  SUNRPC: remove NFS/RDMA client's binary sysctls
  kernel BUG at fs/nfs/namespace.c:108! - can be triggered by bad server
  sunrpc: rpc_pipe_poll may miss available data in some cases
  sunrpc: return error if unsupported enctype or cksumtype is encountered
  sunrpc: gss_pipe_downcall(), don't assume all errors are transient
  NFS: Fix the ustat() regression
2007-11-26 19:42:59 -08:00
Linus Torvalds 0685ab4fb8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched
* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched:
  sched: bump version of kernel/sched_debug.c
  sched: fix minimum granularity tunings
  sched: fix RLIMIT_CPU comment
  sched: fix kernel/acct.c comment
  sched: fix prev_stime calculation
  sched: don't forget to unlock uids_mutex on error paths
2007-11-26 19:42:08 -08:00
Chuck Lever 02fe494619 NFS: Clean up new multi-segment direct I/O changes
Simplify calling sequence of nfs_direct_{read,write}_schedule(), and
rename them to reflect their new role.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26 16:32:40 -05:00
Chuck Lever b9148c6b80 NFS: Ensure we return zero if applications attempt to write zero bytes
A zero byte count direct write request should be a successful no-op, not an
error.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26 16:32:38 -05:00
Chuck Lever c216fd708e NFS: Support multiple segment iovecs in the NFS direct I/O path
Allow applications to perform asynchronous scatter-gather direct I/O
to NFS files.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26 16:32:36 -05:00
Chuck Lever 19f737879c NFS: Introduce iovec I/O helpers to fs/nfs/direct.c
Add helpers that iterate over multi-segment iovecs.  These will
be used to support multi-segment scatter/gather direct I/O in a
later patch.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26 16:32:35 -05:00
Adrian Bunk 4c30d56edc NFS: fs/nfs/dir.c should #include "internal.h"
Every file should include the headers containing the prototypes for its global
functions (in this case nfs_access_cache_shrinker()).

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26 16:24:49 -05:00
Adrian Bunk 5334eb13d4 NFS: make nfs_wb_page_priority() static
nfs_wb_page_priority() can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26 16:24:48 -05:00
Russell King f16c960332 NFS: mount failure causes bad page state
While testing a kernel based upon ecd744eec3
(with wrong boot arguments), I got the following bad page state entry while
NFS was trying to mount it's rootfs:

IP-Config: Complete:
      device=eth0, addr=192.168.1.101, mask=255.255.255.0, gw=255.255.255.255,
     host=192.168.1.101, domain=, nis-domain=(none),
     bootserver=192.168.1.100, rootserver=192.168.1.100, rootpath=
Looking up port of RPC 100003/2 on 192.168.1.100
rpcbind: server 192.168.1.100 not responding, timed out
Root-NFS: Unable to get nfsd port number from server, using default
Looking up port of RPC 100005/1 on 192.168.1.100
rpcbind: server 192.168.1.100 not responding, timed out
Root-NFS: Unable to get mountd port number from server, using default
mount: server 192.168.1.100 not responding, timed out
Root-NFS: Server returned error -5 while mounting /nfs/rootfs/
VFS: Unable to mount root fs via NFS, trying floppy.
Bad page state in process 'swapper'
page:c02b1260 flags:0x00000400 mapping:00000000 mapcount:0 count:0
Trying to fix it up, but a reboot is needed
Backtrace:
[<c0023e34>] (dump_stack+0x0/0x14) from [<c0062570>] (bad_page+0x70/0xac)
[<c0062500>] (bad_page+0x0/0xac) from [<c0064914>] (free_hot_cold_page+0x80/0x178)
[<c0064894>] (free_hot_cold_page+0x0/0x178) from [<c0064a74>] (free_hot_page+0x14/0x18)
[<c0064a60>] (free_hot_page+0x0/0x18) from [<c0067078>] (put_page+0xf8/0x154)
[<c0066f80>] (put_page+0x0/0x154) from [<c007dbc8>] (kfree+0xc8/0xd0)
[<c007db00>] (kfree+0x0/0xd0) from [<c00cbb54>] (nfs_get_sb+0x230/0x710)
[<c00cb924>] (nfs_get_sb+0x0/0x710) from [<c0084334>] (vfs_kern_mount+0x58/0xac)[<c00842dc>] (vfs_kern_mount+0x0/0xac) from [<c00843c0>] (do_kern_mount+0x38/0xf4)
[<c0084388>] (do_kern_mount+0x0/0xf4) from [<c0099c7c>] (do_mount+0x1e8/0x614)
...

This seems to be caused by use of an uninitialised structure due to NULL
options being passed to nfs_validate_mount_data().  Ensure that the
parsed mount data is always initialised.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
     (Trond: added fix for the same bug in nfs4_validate_mount_data()).
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26 16:24:22 -05:00
Ingo Molnar 08e4570a4a sched: fix prev_stime calculation
Srivatsa Vaddagiri noticed occasionally incorrect CPU usage
values in top and tracked it down to stime going below 0 in
task_stime(). Negative values are possible there due to the
sampled nature of stime/utime.

Fix suggested by Balbir Singh.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Reviewed-by: Balbir Singh <balbir@linux.vnet.ibm.com>
2007-11-26 21:21:49 +01:00
Steve French 2b83457bde [CIFS] Fix check after use error in ACL code
Spotted by the coverity scanner.

CC: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-11-25 10:01:00 +00:00
Steve French 058250a0d5 Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6 2007-11-25 09:53:27 +00:00
Jeff Layton cea218054a [CIFS] Fix potential data corruption when writing out cached dirty pages
Fix RedHat bug 329431

The idea here is separate "conscious" from "unconscious" flushes.
Conscious flushes are those due to a fsync() or close(). Unconscious
ones are flushes that occur as a side effect of some other operation or
due to memory pressure.

Currently, when an error occurs during an unconscious flush (ENOSPC or
EIO), we toss out the page and don't preserve that error to report to
the user when a conscious flush occurs. If after the unconscious flush,
there are no more dirty pages for the inode, the conscious flush will
simply return success even though there were previous errors when writing
out pages. This can lead to data corruption.

The easiest way to reproduce this is to mount up a CIFS share that's
very close to being full or where the user is very close to quota. mv
a file to the share that's slightly larger than the quota allows. The
writes will all succeed (since they go to pagecache). The mv will do a
setattr to set the new file's attributes. This calls
filemap_write_and_wait,
which will return an error since all of the pages can't be written out.
Then later, when the flush and release ops occur, there are no more
dirty pages in pagecache for the file and those operations return 0. mv
then assumes that the file was written out correctly and deletes the
original.

CIFS already has a write_behind_rc variable where it stores the results
from earlier flushes, but that value is only reported in cifs_close.
Since the VFS ignores the return value from the release operation, this
isn't helpful. We should be reporting this error during the flush
operation.

This patch does the following:

1) changes cifs_fsync to use filemap_write_and_wait and cifs_flush and also
sync to check its return code. If it returns successful, they then check
the value of write_behind_rc to see if an earlier flush had reported any
errors. If so, they return that error and clear write_behind_rc.

2) sets write_behind_rc in a few other places where pages are written
out as a side effect of other operations and the code waits on them.

3) changes cifs_setattr to only call filemap_write_and_wait for
ATTR_SIZE changes.

4) makes cifs_writepages accurately distinguish between EIO and ENOSPC
errors when writing out pages.

Some simple testing indicates that the patch works as expected and that
it fixes the reproduceable known problem.

Acked-by: Dave Kleikamp <shaggy@austin.rr.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-11-20 23:19:03 +00:00
Petr Tesarik 2a97468024 [CIFS] Fix spurious reconnect on 2nd peek from read of SMB length
When retrying kernel_recvmsg() because of a short read, check returned
length against the remaining length, not against total length. This
avoids unneeded session reconnects which would otherwise occur when
kernel_recvmsg() finally returns zero when asked to read zero bytes.

Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-11-20 02:24:08 +00:00
Neil Brown 4c1fe2f78a kernel BUG at fs/nfs/namespace.c:108! - can be triggered by bad server
Hi Trond,

I have discovered that the BUG_ON in nfs_follow_mountpoint:

	BUG_ON(IS_ROOT(dentry));

can be triggered by a misbehaving server.

What happens is the client does a lookup and discoveres that the named
directory has a different fsid, so it initiates a mount.
It then performs a GETATTR on the mounted directory and gets a
different fsid again (due to a bug in the NFS server).
This causes nfs_follow_mountpoint to be called on the newly mounted
root, which triggers the BUG_ON.

To duplicate this, have a directory which contains some mountpoints,
and export that directory with the "crossmnt" flag using nfs-utils
1.1.1 (or 1.1.0 I think)

The GETATTR on the root of the mounted filesystem will return the
information for the top exportpoint, while a lookup will return the
correct information.  This difference causes the NFS client to BUG.

I think the best way to fix this is to trap this possibility early, so
just before completing the mount in the NFS client, check that it isn't
going to use nfs_mountpoint_inode_operations.
As long as i_op will never change once set (is that true?), this
should be adequately safe.

The following patch shows a possible approach, and it works for me.
i.e. when the NFS server is misbehaving, I get ESTALE on those
mountpoints, while when the NFS server is working correctly, I get
correct behaviour on the client.

NeilBrown

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-17 13:08:48 -05:00
Trond Myklebust b09b9417d0 NFS: Fix the ustat() regression
Since 2.6.18, the superblock sb->s_root has been a dummy dentry with a
dummy inode. This breaks ustat(), which actually uses sb->s_root in a
vfstat() call.

Fix this by making the s_root a dummy alias to the directory inode that was
used when creating the superblock.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-17 13:08:44 -05:00