Jack Lin reports that the error return from dup3() for the RLIMIT_NOFILE
case changed incorrectly after 3.6.
The culprit is commit f33ff9927f ("take rlimit check to callers of
expand_files()") which when it moved the "return -EMFILE" out to the
caller, didn't notice that the dup3() had special code to turn the
EMFILE return into EBADF.
The replace_fd() helper that got added later then inherited the bug too.
Reported-by: Jack Lin <linliangjie@huawei.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[ Noted more bugs, wrote proper changelog, fixed up typos - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I have tested the attached patch to fix the dup3 regression.
Rich.
From 0944e30e12dec6544b3602626b60ff412375c78f Mon Sep 17 00:00:00 2001
From: "Richard W.M. Jones" <rjones@redhat.com>
Date: Tue, 9 Oct 2012 14:42:45 +0100
Subject: [PATCH] dup3: Return an error when oldfd == newfd.
The following commit:
commit fe17f22d7f
Author: Al Viro <viro@zeniv.linux.org.uk>
Date: Tue Aug 21 11:48:11 2012 -0400
take purely descriptor-related stuff from fcntl.c to file.c
was supposed to be just code motion, but it dropped the following two
lines:
if (unlikely(oldfd == newfd))
return -EINVAL;
from the dup3 system call. dup3 is not specified by POSIX, so Linux
can do what it likes. However the POSIX proposal for dup3 [1] states
that it should return an error if oldfd == newfd.
[1] http://austingroupbugs.net/view.php?id=411
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
descriptor-related parts of daemonize, done right. As the
result we simplify the locking rules for ->files - we
hold task_lock in *all* cases when we modify ->files.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
iterates through the opened files in given descriptor table,
calling a supplied function; we stop once non-zero is returned.
Callback gets struct file *, descriptor number and const void *
argument passed to iterator. It is called with files->file_lock
held, so it is not allowed to block.
tty_io, netprio_cgroup and selinux flush_unauthorized_files()
converted to its use.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Similar situation to that of __alloc_fd(); do not use unless you
really have to. You should not touch any descriptor table other
than your own; it's a sure sign of a really bad API design.
As with __alloc_fd(), you *must* use a first-class reference to
struct files_struct; something obtained by get_files_struct(some task)
(let alone direct task->files) will not do. It must be either
current->files, or obtained by get_files_struct(current) by the
owner of that sucker and given to you.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
At that point nobody can see us anyway; everything that
looks at files_fdtable(files) is separated from the
guts of put_files_struct(files) - either since files is
current->files or because we fetched it under task_lock()
and hadn't dropped that yet, or because we'd bumped
files->count while holding task_lock()...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Essentially, alloc_fd() in a files_struct we own a reference to.
Most of the time wanting to use it is a sign of lousy API
design (such as android/binder). It's *not* a general-purpose
interface; better that than open-coding its guts, but again,
playing with other process' descriptor table is a sign of bad
design.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
... except for one in android, where the check is different
and already done in caller. No need to recalculate rlimit
many times in alloc_fd() either.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull x32 support for x86-64 from Ingo Molnar:
"This tree introduces the X32 binary format and execution mode for x86:
32-bit data space binaries using 64-bit instructions and 64-bit kernel
syscalls.
This allows applications whose working set fits into a 32 bits address
space to make use of 64-bit instructions while using a 32-bit address
space with shorter pointers, more compressed data structures, etc."
Fix up trivial context conflicts in arch/x86/{Kconfig,vdso/vma.c}
* 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
x32: Fix alignment fail in struct compat_siginfo
x32: Fix stupid ia32/x32 inversion in the siginfo format
x32: Add ptrace for x32
x32: Switch to a 64-bit clock_t
x32: Provide separate is_ia32_task() and is_x32_task() predicates
x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls
x86/x32: Fix the binutils auto-detect
x32: Warn and disable rather than error if binutils too old
x32: Only clear TIF_X32 flag once
x32: Make sure TS_COMPAT is cleared for x32 tasks
fs: Remove missed ->fds_bits from cessation use of fd_set structs internally
fs: Fix close_on_exec pointer in alloc_fdtable
x32: Drop non-__vdso weak symbols from the x32 VDSO
x32: Fix coding style violations in the x32 VDSO code
x32: Add x32 VDSO support
x32: Allow x32 to be configured
x32: If configured, add x32 system calls to system call tables
x32: Handle process creation
x32: Signal-related system calls
x86: Add #ifdef CONFIG_COMPAT to <asm/sys_ia32.h>
...
For files only using THIS_MODULE and/or EXPORT_SYMBOL, map
them onto including export.h -- or if the file isn't even
using those, then just delete the include. Fix up any implicit
include dependencies that were being masked by module.h along
the way.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Replace the fd_sets in struct fdtable with an array of unsigned longs and then
use the standard non-atomic bit operations rather than the FD_* macros.
This:
(1) Removes the abuses of struct fd_set:
(a) Since we don't want to allocate a full fd_set the vast majority of the
time, we actually, in effect, just allocate a just-big-enough array of
unsigned longs and cast it to an fd_set type - so why bother with the
fd_set at all?
(b) Some places outside of the core fdtable handling code (such as
SELinux) want to look inside the array of unsigned longs hidden inside
the fd_set struct for more efficient iteration over the entire set.
(2) Eliminates the use of FD_*() macros in the kernel completely.
(3) Permits the __FD_*() macros to be deleted entirely where not exposed to
userspace.
Signed-off-by: David Howells <dhowells@redhat.com>
Link: http://lkml.kernel.org/r/20120216174954.23314.48147.stgit@warthog.procyon.org.uk
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Wrap accesses to the fd_sets in struct fdtable (for recording open files and
close-on-exec flags) so that we can move away from using fd_sets since we
abuse the fd_set structs by not allocating the full-sized structure under
normal circumstances and by non-core code looking at the internals of the
fd_sets.
The first abuse means that use of FD_ZERO() on these fd_sets is not permitted,
since that cannot be told about their abnormal lengths.
This introduces six wrapper functions for setting, clearing and testing
close-on-exec flags and fd-is-open flags:
void __set_close_on_exec(int fd, struct fdtable *fdt);
void __clear_close_on_exec(int fd, struct fdtable *fdt);
bool close_on_exec(int fd, const struct fdtable *fdt);
void __set_open_fd(int fd, struct fdtable *fdt);
void __clear_open_fd(int fd, struct fdtable *fdt);
bool fd_is_open(int fd, const struct fdtable *fdt);
Note that I've prepended '__' to the names of the set/clear functions because
they require the caller to hold a lock to use them.
Note also that I haven't added wrappers for looking behind the scenes at the
the array. Possibly that should exist too.
Signed-off-by: David Howells <dhowells@redhat.com>
Link: http://lkml.kernel.org/r/20120216174942.23314.1364.stgit@warthog.procyon.org.uk
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>