New helpers: user_statfs() and fd_statfs(), taking userland pathname and
descriptor resp. and filling struct kstatfs. Syscalls of statfs family
(native, compat and foreign - osf and hpux on alpha and parisc resp.)
switched to those. Removes some boilerplate code, simplifies cleanup
on errors...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Fix for a dumb preadv()/pwritev() compat bug - unlike the native
variants, compat_... ones forget to check FMODE_P{READ,WRITE}, so e.g.
on pipe the native preadv() will fail with -ESPIPE and compat one will
act as readv() and succeed. Not critical, but it's a clear bug with trivial
fix.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
f_flags and f_spare fields were not copied to userspace when
compat_sys_[f]statfs64 called.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The commit 7ed1ee6118 ("Take statfs variants to fs/statfs.c")
separates out statfs syscalls from fs/open.c. Thus the comment
should be changed also.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Jiri Kosina <trivial@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Note: this patch targets 2.6.37 and tries to be as simple as possible.
That is why it adds more copy-and-paste horror into fs/compat.c and
uglifies fs/exec.c, this will be cleanuped later.
compat_copy_strings() plays with bprm->vma/mm directly and thus has
two problems: it lacks the RLIMIT_STACK check and argv/envp memory
is not visible to oom killer.
Export acct_arg_size() and get_arg_page(), change compat_copy_strings()
to use get_arg_page(), change compat_do_execve() to do acct_arg_size(0)
as do_execve() does.
Add the fatal_signal_pending/cond_resched checks into compat_count() and
compat_copy_strings(), this matches the code in fs/exec.c and certainly
makes sense.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The definition of PAGE_CACHE_MASK in <linux/pagemap.h> is needed to use
MAX_RW_COUNT, and on x86-64 that gets done indirectly through the
architecture header includes. But on MIPS and s390 that doesn't happen,
and we need to make sure that fs/compat.c includes pagemap.h explicitly.
Introduced in commit 435f49a518 ("readv/writev: do the same
MAX_RW_COUNT truncation that read/write does").
Reported-by: Sachin Sant <sachinp@in.ibm.com> (S390)
Reported-by: wu zhangjin <wuzhangjin@gmail.com> (MIPS)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We used to protect against overflow, but rather than return an error, do
what read/write does, namely to limit the total size to MAX_RW_COUNT.
This is not only more consistent, but it also means that any broken
low-level read/write routine that still keeps counts in 'int' can't
break.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This merges the staging-next tree to Linus's tree and resolves
some conflicts that were present due to changes in other trees that were
affected by files here.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'for-2.6.37' of git://linux-nfs.org/~bfields/linux: (99 commits)
svcrpc: svc_tcp_sendto XPT_DEAD check is redundant
svcrpc: no need for XPT_DEAD check in svc_xprt_enqueue
svcrpc: assume svc_delete_xprt() called only once
svcrpc: never clear XPT_BUSY on dead xprt
nfsd4: fix connection allocation in sequence()
nfsd4: only require krb5 principal for NFSv4.0 callbacks
nfsd4: move minorversion to client
nfsd4: delay session removal till free_client
nfsd4: separate callback change and callback probe
nfsd4: callback program number is per-session
nfsd4: track backchannel connections
nfsd4: confirm only on succesful create_session
nfsd4: make backchannel sequence number per-session
nfsd4: use client pointer to backchannel session
nfsd4: move callback setup into session init code
nfsd4: don't cache seq_misordered replies
SUNRPC: Properly initialize sock_xprt.srcaddr in all cases
SUNRPC: Use conventional switch statement when reclassifying sockets
sunrpc/xprtrdma: clean up workqueue usage
sunrpc: Turn list_for_each-s into the ..._entry-s
...
Fix up trivial conflicts (two different deprecation notices added in
separate branches) in Documentation/feature-removal-schedule.txt
smbfs has been scheduled for removal in 2.6.27, so
maybe we can now move it to drivers/staging on the
way out.
smbfs still uses the big kernel lock and nobody
is going to fix that, so we should be getting
rid of it soon.
This removes the 32 bit compat mount and ioctl
handling code, which is implemented in common fs
code, and moves all smbfs related files into
drivers/staging/smbfs.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In 32-bit compatibility mode, the error handling for
compat_do_readv_writev() may free an uninitialized pointer, potentially
leading to all sorts of ugly memory corruption. This is reliably
triggerable by unprivileged users by invoking the readv()/writev()
syscalls with an invalid iovec pointer. The below patch fixes this to
emulate the non-compat version.
Introduced by commit b83733639a ("compat: factor out
compat_rw_copy_check_uvector from compat_do_readv_writev")
Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
Cc: stable@kernel.org (2.6.35)
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add CONFIG_NFSD_DEPRECATED, default to y.
Only include deprecated interface if this is defined.
This allows distros to remove this interface before the official
removal, and allows developers to test without it.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Mark arguments to certain system calls as being const where they should be but
aren't. The list includes:
(*) The filename arguments of various stat syscalls, execve(), various utimes
syscalls and some mount syscalls.
(*) The filename arguments of some syscall helpers relating to the above.
(*) The buffer argument of various write syscalls.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.infradead.org/users/eparis/notify: (132 commits)
fanotify: use both marks when possible
fsnotify: pass both the vfsmount mark and inode mark
fsnotify: walk the inode and vfsmount lists simultaneously
fsnotify: rework ignored mark flushing
fsnotify: remove global fsnotify groups lists
fsnotify: remove group->mask
fsnotify: remove the global masks
fsnotify: cleanup should_send_event
fanotify: use the mark in handler functions
audit: use the mark in handler functions
dnotify: use the mark in handler functions
inotify: use the mark in handler functions
fsnotify: send fsnotify_mark to groups in event handling functions
fsnotify: Exchange list heads instead of moving elements
fsnotify: srcu to protect read side of inode and vfsmount locks
fsnotify: use an explicit flag to indicate fsnotify_destroy_mark has been called
fsnotify: use _rcu functions for mark list traversal
fsnotify: place marks on object in order of group memory address
vfs/fsnotify: fsnotify_close can delay the final work in fput
fsnotify: store struct file not struct path
...
Fix up trivial delete/modify conflict in fs/notify/inotify/inotify.c.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits)
no need for list_for_each_entry_safe()/resetting with superblock list
Fix sget() race with failing mount
vfs: don't hold s_umount over close_bdev_exclusive() call
sysv: do not mark superblock dirty on remount
sysv: do not mark superblock dirty on mount
btrfs: remove junk sb_dirt change
BFS: clean up the superblock usage
AFFS: wait for sb synchronization when needed
AFFS: clean up dirty flag usage
cifs: truncate fallout
mbcache: fix shrinker function return value
mbcache: Remove unused features
add f_flags to struct statfs(64)
pass a struct path to vfs_statfs
update VFS documentation for method changes.
All filesystems that need invalidate_inode_buffers() are doing that explicitly
convert remaining ->clear_inode() to ->evict_inode()
Make ->drop_inode() just return whether inode needs to be dropped
fs/inode.c:clear_inode() is gone
fs/inode.c:evict() doesn't care about delete vs. non-delete paths now
...
Fix up trivial conflicts in fs/nilfs2/super.c
Using:
gcc (GCC) 4.5.0 20100610 (prerelease)
The following warnings appear:
fs/readdir.c: In function `filldir64':
fs/readdir.c:240:15: warning: `dirent' is used uninitialized in this function
fs/readdir.c: In function `filldir':
fs/readdir.c:155:15: warning: `dirent' is used uninitialized in this function
fs/compat.c: In function `compat_filldir64':
fs/compat.c:1071:11: warning: `dirent' is used uninitialized in this function
fs/compat.c: In function `compat_filldir':
fs/compat.c:984:15: warning: `dirent' is used uninitialized in this function
The warnings are related to the use of the NAME_OFFSET() macro. Luckily,
it appears as though the standard offsetof() macro is what is being
implemented by NAME_OFFSET(), thus we can fix the warning and use a more
standard code construct at the same time.
Signed-off-by: Kevin Winchester <kjwinchester@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We'll need the path to implement the flags field for statvfs support.
We do have it available in all callers except:
- ecryptfs_statfs. This one doesn't actually need vfs_statfs but just
needs to do a caller to the lower filesystem statfs method.
- sys_ustat. Add a non-exported statfs_by_dentry helper for it which
doesn't won't be able to fill out the flags field later on.
In addition rename the helpers for statfs vs fstatfs to do_*statfs instead
of the misleading vfs prefix.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
fanotify, the upcoming notification system actually needs a struct path so it can
do opens in the context of listeners, and it needs a file so it can get f_flags
from the original process. Close was the only operation that already was passing
a struct file to the notification hook. This patch passes a file for access,
modify, and open as well as they are easily available to these hooks.
Signed-off-by: Eric Paris <eparis@redhat.com>
A call to access_ok is missing a compat_ptr conversion. Introduced with
b83733639a "compat: factor out
compat_rw_copy_check_uvector from compat_do_readv_writev"
fs/compat.c: In function 'compat_rw_copy_check_uvector':
fs/compat.c:629: warning: passing argument 1 of '__access_ok' makes pointer from integer without a cast
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The aio compat code was not converting the struct iovecs from 32bit to
64bit pointers, causing either EINVAL to be returned from io_getevents, or
EFAULT as the result of the I/O. This patch passes a compat flag to
io_submit to signal that pointer conversion is necessary for a given iocb
array.
A variant of this was tested by Michael Tokarev. I have also updated the
libaio test harness to exercise this code path with good success.
Further, I grabbed a copy of ltp and ran the
testcases/kernel/syscall/readv and writev tests there (compiled with -m32
on my 64bit system). All seems happy, but extra eyes on this would be
welcome.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix CONFIG_COMPAT=n build]
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: <stable@kernel.org> [2.6.35.1]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It was reported in http://lkml.org/lkml/2010/3/8/309 that 32 bit readv and
writev AIO operations were not functioning properly. It turns out that
the code to convert the 32bit io vectors to 64 bits was never written.
The results of that can be pretty bad, but in my testing, it mostly ended
up in generating EFAULT as we walked off the list of I/O vectors provided.
This patch set fixes the problem in my environment. are greatly
appreciated.
This patch:
Factor out code that will be used by both compat_do_readv_writev and the
compat aio submission code paths.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: <stable@kernel.org> [2.6.35.1]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Originally, commit d899bf7b ("procfs: provide stack information for
threads") attempted to introduce a new feature for showing where the
threadstack was located and how many pages are being utilized by the
stack.
Commit c44972f1 ("procfs: disable per-task stack usage on NOMMU") was
applied to fix the NO_MMU case.
Commit 89240ba0 ("x86, fs: Fix x86 procfs stack information for threads on
64-bit") was applied to fix a bug in ia32 executables being loaded.
Commit 9ebd4eba7 ("procfs: fix /proc/<pid>/stat stack pointer for kernel
threads") was applied to fix a bug which had kernel threads printing a
userland stack address.
Commit 1306d603f ('proc: partially revert "procfs: provide stack
information for threads"') was then applied to revert the stack pages
being used to solve a significant performance regression.
This patch nearly undoes the effect of all these patches.
The reason for reverting these is it provides an unusable value in
field 28. For x86_64, a fork will result in the task->stack_start
value being updated to the current user top of stack and not the stack
start address. This unpredictability of the stack_start value makes
it worthless. That includes the intended use of showing how much stack
space a thread has.
Other architectures will get different values. As an example, ia64
gets 0. The do_fork() and copy_process() functions appear to treat the
stack_start and stack_size parameters as architecture specific.
I only partially reverted c44972f1 ("procfs: disable per-task stack usage
on NOMMU") . If I had completely reverted it, I would have had to change
mm/Makefile only build pagewalk.o when CONFIG_PROC_PAGE_MONITOR is
configured. Since I could not test the builds without significant effort,
I decided to not change mm/Makefile.
I only partially reverted 89240ba0 ("x86, fs: Fix x86 procfs stack
information for threads on 64-bit") . I left the KSTK_ESP() change in
place as that seemed worthwhile.
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>