Commit Graph

17041 Commits

Author SHA1 Message Date
Eric W. Biederman f8d4f618fe sysfs: Serialize updates to the vfs inode
The vfs depends upon filesystem methods to update the
vfs inode.   Sysfs adds to the normal number of places
where the vfs inode is updated by also updatng the
vfs inode in sysfs_refresh_inode.

Typically the inode mutex is used to serialize updates
to the vfs inode, but grabbing the inode mutex in
sysfs_permission and sysfs_getattr causes deadlocks,
because sometimes the vfs calls those operations with
the inode mutex held.  Therefore sysfs  can not use the
inode mutex to serial updates to the vfs inode.

The sysfs_mutex is acquired in all of the routines
where sysfs updates the vfs inode, and with a small
change we can consistently protext sysfs vfs inode
updates with the sysfs_mutex. To protect the sysfs
vfs inode updates with the sysfs_mutex simply requires
extending the scope of sysfs_mutex in sysfs_setattr
over inode_setattr, and over inode_change_ok (so we
have an unchanging inode when we perform the check).

Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-07 17:04:52 -08:00
Eric W. Biederman 6992f53349 sysfs: Use one lockdep class per sysfs attribute.
Acknowledge that the logical sysfs rwsem has one instance per
sysfs attribute with different locking depencencies for different
attributes.

There is a sysfs idiom where writing to one sysfs file causes the
addition or removal of other sysfs files.   Lumping all of the
sysfs attributes together in one lock class causes lockdep to
generate lots of false positives.

This introduces the requirement that non-static sysfs attributes
need to be initialized with sysfs_attr_init or sysfs_bin_attr_init.
Strictly speaking this requirement only exists when lockdep is
enabled, and when lockdep is enabled we get a bit fat warning
if this requirement is not met.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-07 17:04:51 -08:00
Eric W. Biederman a2db684287 sysfs: Only take active references on attributes.
If we exclude directories and symlinks from the set of sysfs
dirents where we need active references we are left with
sysfs attributes (binary or not).

- Tweak sysfs_deactivate to only do something on attributes
- Move lockdep initialization into sysfs_file_add_mode to
  limit it to just attributes.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-07 17:04:51 -08:00
Eric W. Biederman e72ceb8cca sysfs: Remove sysfs_get/put_active_two
It turns out that holding an active reference on a directory is
pointless.  The purpose of the active references are to allows us to
block when removing sysfs entries that have custom methods so we don't
remove modules while running modular code and to keep those custom
methods from accessing data structures after the files have been
removed.  Further sysfs_remove_dir remove all elements in the
directory before removing the directory itself, so there is no chance
we will remove a directory with active children.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-07 17:04:51 -08:00
Emese Revfy 52cf25d0ab Driver core: Constify struct sysfs_ops in struct kobj_type
Constify struct sysfs_ops.

This is part of the ops structure constification
effort started by Arjan van de Ven et al.

Benefits of this constification:

 * prevents modification of data that is shared
   (referenced) by many other structure instances
   at runtime

 * detects/prevents accidental (but not intentional)
   modification attempts on archs that enforce
   read-only kernel data at runtime

 * potentially better optimized code as the compiler
   can assume that the const data cannot be changed

 * the compiler/linker move const data into .rodata
   and therefore exclude them from false sharing

Signed-off-by: Emese Revfy <re.emese@gmail.com>
Acked-by: David Teigland <teigland@redhat.com>
Acked-by: Matt Domsch <Matt_Domsch@dell.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-07 17:04:49 -08:00
Emese Revfy 9cd43611cc kobject: Constify struct kset_uevent_ops
Constify struct kset_uevent_ops.

This is part of the ops structure constification
effort started by Arjan van de Ven et al.

Benefits of this constification:

 * prevents modification of data that is shared
   (referenced) by many other structure instances
   at runtime

 * detects/prevents accidental (but not intentional)
   modification attempts on archs that enforce
   read-only kernel data at runtime

 * potentially better optimized code as the compiler
   can assume that the const data cannot be changed

 * the compiler/linker move const data into .rodata
   and therefore exclude them from false sharing

Signed-off-by: Emese Revfy <re.emese@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-07 17:04:49 -08:00
Eric W. Biederman 1e5289c97b sysfs: Cache the last sysfs_dirent to improve readdir scalability v2
When sysfs_readdir stops short we now cache the next
sysfs_dirent to return to user space in filp->private_data.
There is no impact on the rest of sysfs by doing this and
in the common case it allows us to pick up exactly where
we left off with no seeking.

Additionally I drop and regrab the sysfs_mutex around
filldir to avoid a page fault abritrarily increasing the
hold time on the sysfs_mutex.

v2: Returned to using INT_MAX as the EOF condition.
    seekdir is ambiguous unless all directory entries have
    a unique f_pos value.

Fixes http://bugzilla.kernel.org/show_bug.cgi?id=14949

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-07 17:04:48 -08:00
Andi Kleen 1c205ae18d sysfs: Add sysfs_add/remove_files utility functions
Adding/Removing a whole array of attributes is very common. Add a standard
utility function to do this with a simple function call, instead of
requiring drivers to open code this.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-07 17:04:47 -08:00
Randy Dunlap 138860b953 seq_file: fix new kernel-doc warnings
Fix kernel-doc notation in new seq-file functions and
correct spelling.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-07 15:48:26 -08:00
Linus Torvalds b8fa05719b Revert "lib: build list_sort() only if needed"
This reverts commit a069c266ae.

It turns ou that not only was it missing a case (XFS) that needed it,
but perhaps more importantly, people sometimes want to enable new
modules that they hadn't had enabled before, and if such a module uses
list_sort(), it can't easily be inserted any more.

So rather than add a "select LIST_SORT" to the XFS case, just leave it
compiled in.  It's not all _that_ big, after all, and the inconvenience
isn't worth it.

Requested-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Don Mullis <don.mullis@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-07 09:54:44 -08:00
Linus Torvalds 66b89159c2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/joern/logfs
* git://git.kernel.org/pub/scm/linux/kernel/git/joern/logfs:
  [LogFS] Change magic number
  [LogFS] Remove h_version field
  [LogFS] Check feature flags
  [LogFS] Only write journal if dirty
  [LogFS] Fix bdev erases
  [LogFS] Silence gcc
  [LogFS] Prevent 64bit divisions in hash_index
  [LogFS] Plug memory leak on error paths
  [LogFS] Add MAINTAINERS entry
  [LogFS] add new flash file system

Fixed up trivial conflict in lib/Kconfig, and a semantic conflict in
fs/logfs/inode.c introduced by write_inode() being changed to use
writeback_control' by commit a9185b41a4
("pass writeback_control to ->write_inode")
2010-03-06 13:18:03 -08:00
Linus Torvalds 66ce3cf84d Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs: (21 commits)
  xfs: return inode fork offset in bulkstat for fsr
  xfs: Increase the default size of the reserved blocks pool
  xfs: truncate delalloc extents when IO fails in writeback
  xfs: check for more work before sleeping in xfssyncd
  xfs: Fix a build warning in xfs_aops.c
  xfs: fix locking for inode cache radix tree tag updates
  xfs: remove xfs_ipin/xfs_iunpin
  xfs: cleanup xfs_iunpin_wait/xfs_iunpin_nowait
  xfs: kill xfs_lrw.h
  xfs: factor common xfs_trans_bjoin code
  xfs: stop passing opaque handles to xfs_log.c routines
  xfs: split xfs_bmap_btalloc
  xfs: fix xfs_fsblock_t tracing
  xfs: fix inode pincount check in fsync
  xfs: Non-blocking inode locking in IO completion
  xfs: implement optimized fdatasync
  xfs: remove wrapper for the fsync file operation
  xfs: remove wrappers for read/write file operations
  xfs: merge xfs_lrw.c into xfs_file.c
  xfs: fix dquota trace format
  ...
2010-03-06 11:32:21 -08:00
Linus Torvalds 05c5cb31ec Merge branch 'for-2.6.34' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.34' of git://linux-nfs.org/~bfields/linux: (22 commits)
  nfsd4: fix minor memory leak
  svcrpc: treat uid's as unsigned
  nfsd: ensure sockets are closed on error
  Revert "sunrpc: move the close processing after do recvfrom method"
  Revert "sunrpc: fix peername failed on closed listener"
  sunrpc: remove unnecessary svc_xprt_put
  NFSD: NFSv4 callback client should use RPC_TASK_SOFTCONN
  xfs_export_operations.commit_metadata
  commit_metadata export operation replacing nfsd_sync_dir
  lockd: don't clear sm_monitored on nsm_reboot_lookup
  lockd: release reference to nsm_handle in nlm_host_rebooted
  nfsd: Use vfs_fsync_range() in nfsd_commit
  NFSD: Create PF_INET6 listener in write_ports
  SUNRPC: NFS kernel APIs shouldn't return ENOENT for "transport not found"
  SUNRPC: Bury "#ifdef IPV6" in svc_create_xprt()
  NFSD: Support AF_INET6 in svc_addsock() function
  SUNRPC: Use rpc_pton() in ip_map_parse()
  nfsd: 4.1 has an rfc number
  nfsd41: Create the recovery entry for the NFSv4.1 client
  nfsd: use vfs_fsync for non-directories
  ...
2010-03-06 11:31:38 -08:00
Neil Horman 76595f79d7 coredump: suppress uid comparison test if core output files are pipes
Modify uid check in do_coredump so as to not apply it in the case of
pipes.

This just got noticed in testing.  The end of do_coredump validates the
uid of the inode for the created file against the uid of the crashing
process to ensure that no one can pre-create a core file with different
ownership and grab the information contained in the core when they
shouldn' tbe able to.  This causes failures when using pipes for a core
dumps if the crashing process is not root, which is the uid of the pipe
when it is created.

The fix is simple.  Since the check for matching uid's isn't relevant for
pipes (a process can't create a pipe that the uermodehelper code will open
anyway), we can just just skip it in the event ispipe is non-zero

Reverts a pipe-affecting change which was accidentally made in

: commit c46f739dd3
: Author:     Ingo Molnar <mingo@elte.hu>
: AuthorDate: Wed Nov 28 13:59:18 2007 +0100
: Commit:     Linus Torvalds <torvalds@woody.linux-foundation.org>
: CommitDate: Wed Nov 28 10:58:01 2007 -0800
:
:     vfs: coredumping fix

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-06 11:26:46 -08:00
Oleg Nesterov 5c99cbf49a coredump: set ->group_exit_code for other CLONE_VM tasks too
User visible change.

do_coredump() kills all threads which share the same ->mm but only the
coredumping process gets the proper exit_code.  Other tasks which share
the same ->mm die "silently" and return status == 0 to parent.

This is historical behaviour, not actually a bug.  But I think Frank
Heckenbach rightly dislikes the current behaviour.  Simple test-case:

	#include <stdio.h>
	#include <unistd.h>
	#include <signal.h>
	#include <sys/wait.h>

	int main(void)
	{
		int stat;

		if (!fork()) {
			if (!vfork())
				kill(getpid(), SIGQUIT);
		}

		wait(&stat);
		printf("stat=%x\n", stat);
		return 0;
	}

Before this patch it prints "stat=0" despite the fact the child was killed
by SIGQUIT.  After this patch the output is "stat=3" which obviously makes
more sense.

Even with this patch, only the task which originates the coredumping gets
"|= 0x80" if the core was actually dumped, but at least the coredumping
signal is visible to do_wait/etc.

Reported-by: Frank Heckenbach <f.heckenbach@fh-soft.de>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-06 11:26:46 -08:00
Masami Hiramatsu 30736a4d43 coredump: pass mm->flags as a coredump parameter for consistency
Pass mm->flags as a coredump parameter for consistency.

 ---
1787         if (mm->core_state || !get_dumpable(mm)) {  <- (1)
1788                 up_write(&mm->mmap_sem);
1789                 put_cred(cred);
1790                 goto fail;
1791         }
1792
[...]
1798         if (get_dumpable(mm) == 2) {    /* Setuid core dump mode */ <-(2)
1799                 flag = O_EXCL;          /* Stop rewrite attacks */
1800                 cred->fsuid = 0;        /* Dump root private */
1801         }
 ---

Since dumpable bits are not protected by lock, there is a chance to change
these bits between (1) and (2).

To solve this issue, this patch copies mm->flags to
coredump_params.mm_flags at the beginning of do_coredump() and uses it
instead of get_dumpable() while dumping core.

This copy is also passed to binfmt->core_dump, since elf*_core_dump() uses
dump_filter bits in mm->flags.

[akpm@linux-foundation.org: fix merge]
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-06 11:26:46 -08:00
Daisuke HATAYAMA 8d9032bbe4 elf coredump: add extended numbering support
The current ELF dumper implementation can produce broken corefiles if
program headers exceed 65535.  This number is determined by the number of
vmas which the process have.  In particular, some extreme programs may use
more than 65535 vmas.  (If you google max_map_count, you can find some
users facing this problem.) This kind of program never be able to generate
correct coredumps.

This patch implements ``extended numbering'' that uses sh_info field of
the first section header instead of e_phnum field in order to represent
upto 4294967295 vmas.

This is supported by
AMD64-ABI(http://www.x86-64.org/documentation.html) and
Solaris(http://docs.sun.com/app/docs/doc/817-1984/).
Of course, we are preparing patches for gdb and binutils.

Signed-off-by: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-06 11:26:46 -08:00
Daisuke HATAYAMA 93eb211e6c elf coredump: make offset calculation process and writing process explicit
By the next patch, elf_core_dump() and elf_fdpic_core_dump() will support
extended numbering and so will produce the corefiles with section header
table in a special case.

The problem is the process of writing a file header offset of the section
header table into e_shoff field of the ELF header.  ELF header is
positioned at the beginning of the corefile, while section header at the
end.  So, we need to take which of the following ways:

 1. Seek backward to retry writing operation for ELF header
    after writing process for a whole part

 2. Make offset calculation process and writing process
    totally sequential

The clause 1.  is not always possible: one cannot assume that file system
supports seek function.  Consider the no_llseek case.

Therefore, this patch adopts the clause 2.

Signed-off-by: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-06 11:26:46 -08:00
Daisuke HATAYAMA 1fcccbac89 elf coredump: replace ELF_CORE_EXTRA_* macros by functions
elf_core_dump() and elf_fdpic_core_dump() use #ifdef and the corresponding
macro for hiding _multiline_ logics in functions.  This patch removes
#ifdef and replaces ELF_CORE_EXTRA_* by corresponding functions.  For
architectures not implemeonting ELF_CORE_EXTRA_*, we use weak functions in
order to reduce a range of modification.

This cleanup is for my next patches, but I think this cleanup itself is
worth doing regardless of my firnal purpose.

Signed-off-by: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-06 11:26:45 -08:00
Daisuke HATAYAMA 088e7af73a coredump: move dump_write() and dump_seek() into a header file
My next patch will replace ELF_CORE_EXTRA_* macros by functions, putting
them into other newly created *.c files.  Then, each files will contain
dump_write(), where each pair of binfmt_*.c and elfcore.c should be the
same.  So, this patch moves them into a header file with dump_seek().
Also, the patch deletes confusing DUMP_WRITE macros in each files.

Signed-off-by: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-06 11:26:45 -08:00
Daisuke HATAYAMA 05f47fda9f coredump: unify dump_seek() implementations for each binfmt_*.c
The current ELF dumper can produce broken corefiles if program headers
exceed 65535.  In particular, the program in 64-bit environment often
demands more than 65535 mmaps.  If you google max_map_count, then you can
find many users facing this problem.

Solaris has already dealt with this issue, and other OSes have also
adopted the same method as in Solaris.  Currently, Sun's document and AMD
64 ABI include the description for the extension, where they call the
extension Extended Numbering.  See Reference for further information.

I believe that linux kernel should adopt the same way as they did, so I've
written this patch.

I am also preparing for patches of GDB and binutils.

How to fix
==========

In new dumping process, there are two cases according to weather or
not the number of program headers is equal to or more than 65535.

 - if less than 65535, the produced corefile format is exactly the same
   as the ordinary one.

 - if equal to or more than 65535, then e_phnum field is set to newly
   introduced constant PN_XNUM(0xffff) and the actual number of program
   headers is set to sh_info field of the section header at index 0.

Compatibility Concern
=====================

 * As already mentioned in Summary, Sun and AMD64 has already adopted
   this.  See Reference.

 * There are four combinations according to whether kernel and userland
   tools are respectively modified or not.  The next table summarizes
   shortly for each combination.

                  ---------------------------------------------
                     Original Kernel    |   Modified Kernel
                  ---------------------------------------------
    	            < 65535  | >= 65535 | < 65535  | >= 65535
  -------------------------------------------------------------
   Original Tools |    OK    |  broken  |   OK     | broken (#)
  -------------------------------------------------------------
   Modified Tools |    OK    |  broken  |   OK     |    OK
  -------------------------------------------------------------

  Note that there is no case that `OK' changes to `broken'.

  (#) Although this case remains broken, O-M behaves better than
  O-O. That is, while in O-O case e_phnum field would be extremely
  small due to integer overflow, in O-M case it is guaranteed to be at
  least 65535 by being set to PN_XNUM(0xFFFF), much closer to the
  actual correct value than the O-O case.

Test Program
============

Here is a test program mkmmaps.c that is useful to produce the
corefile with many mmaps. To use this, please take the following
steps:

$ ulimit -c unlimited
$ sysctl vm.max_map_count=70000 # default 65530 is too small
$ sysctl fs.file-max=70000
$ mkmmaps 65535

Then, the program will abort and a corefile will be generated.

If failed, there are two cases according to the error message
displayed.

 * ``out of memory'' means vm.max_map_count is still smaller

 * ``too many open files'' means fs.file-max is still smaller

So, please change it to a larger value, and then retry it.

mkmmaps.c
==
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
int main(int argc, char **argv)
{
	int maps_num;
	if (argc < 2) {
		fprintf(stderr, "mkmmaps [number of maps to be created]\n");
		exit(1);
	}
	if (sscanf(argv[1], "%d", &maps_num) == EOF) {
		perror("sscanf");
		exit(2);
	}
	if (maps_num < 0) {
		fprintf(stderr, "%d is invalid\n", maps_num);
		exit(3);
	}
	for (; maps_num > 0; --maps_num) {
		if (MAP_FAILED == mmap((void *)NULL, (size_t) 1, PROT_READ,
					MAP_SHARED | MAP_ANONYMOUS, (int) -1,
					(off_t) NULL)) {
			perror("mmap");
			exit(4);
		}
	}
	abort();
	{
		char buffer[128];
		sprintf(buffer, "wc -l /proc/%u/maps", getpid());
		system(buffer);
	}
	return 0;
}

Tested on i386, ia64 and um/sys-i386.
Built on sh4 (which covers fs/binfmt_elf_fdpic.c)

References
==========

 - Sun microsystems: Linker and Libraries.
   Part No: 817-1984-17, September 2008.
   URL: http://docs.sun.com/app/docs/doc/817-1984

 - System V ABI AMD64 Architecture Processor Supplement
   Draft Version 0.99., May 11, 2009.
   URL: http://www.x86-64.org/

This patch:

There are three different definitions for dump_seek() functions in
binfmt_aout.c, binfmt_elf.c and binfmt_elf_fdpic.c, respectively.  The
only for binfmt_elf.c.

My next patch will move dump_seek() into a header file in order to share
the same implementations for dump_write() and dump_seek().  As the first
step, this patch unify these three definitions for dump_seek() by applying
the past commits that have been applied only for binfmt_elf.c.

Specifically, the modification made here is part of the following commits:

  * d025c9db7f
  * 7f14daa19e

This patch does not change a shape of corefiles.

Signed-off-by: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-06 11:26:45 -08:00
Alexey Dobriyan 12bac0d9f4 proc: warn on non-existing proc entries
* warn if creation goes on to non-existent directory
* warn if removal goes on from non-existing directory
* warn if non-existing proc entry is removed

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-06 11:26:45 -08:00
Alexey Dobriyan e17a5765f2 proc: do translation + unlink atomically at remove_proc_entry()
remove_proc_entry() does

	lock
	lookup parent
	unlock
	lock
	unlink proc entry from lists
	unlock

which can be made bit more correct by doing parent translation + unlink
without dropping lock.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-06 11:26:45 -08:00
Andrew Morton 45bf5cd7be fs/compat_ioctl.c: suppress two warnings
fs/compat_ioctl.c: In function 'do_ioctl_trans':
fs/compat_ioctl.c:534: warning: 'karg' may be used uninitialized in this function
fs/compat_ioctl.c:533: warning: 'kcmd' may be used uninitialized in this function
fs/compat_ioctl.c:656: warning: 'ret' may be used uninitialized in this function

Reduces text size by 44 bytes.

If someone calls one of these functions with an unexpected argument, the
code's buggy as-is.

Amerigo Wang <amwang@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-06 11:26:35 -08:00
Don Mullis a069c266ae lib: build list_sort() only if needed
Build list_sort() only for configs that need it -- those that don't save
~581 bytes (i386).

Signed-off-by: Don Mullis <don.mullis@gmail.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Artem Bityutskiy <dedekind@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-06 11:26:35 -08:00