Commit Graph

310704 Commits

Author SHA1 Message Date
Konstantin Khlebnikov
b1d4d9e0cb proc/smaps: carefully handle migration entries
Currently smaps reports migration entries as "swap", as result "swap" can
appears in shared mapping.

This patch converts migration entries into pages and handles them as usual.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:29 -07:00
Konstantin Khlebnikov
052fb0d635 proc: report file/anon bit in /proc/pid/pagemap
This is an implementation of Andrew's proposal to extend the pagemap file
bits to report what is missing about tasks' working set.

The problem with the working set detection is multilateral.  In the criu
(checkpoint/restore) project we dump the tasks' memory into image files
and to do it properly we need to detect which pages inside mappings are
really in use.  The mincore syscall I though could help with this did not.
 First, it doesn't report swapped pages, thus we cannot find out which
parts of anonymous mappings to dump.  Next, it does report pages from page
cache as present even if they are not mapped, and it doesn't make that has
not been cow-ed.

Note, that issue with swap pages is critical -- we must dump swap pages to
image file.  But the issues with file pages are optimization -- we can
take all file pages to image, this would be correct, but if we know that a
page is not mapped or not cow-ed, we can remove them from dump file.  The
dump would still be self-consistent, though significantly smaller in size
(up to 10 times smaller on real apps).

Andrew noticed, that the proc pagemap file solved 2 of 3 above issues --
it reports whether a page is present or swapped and it doesn't report not
mapped page cache pages.  But, it doesn't distinguish cow-ed file pages
from not cow-ed.

I would like to make the last unused bit in this file to report whether the
page mapped into respective pte is PageAnon or not.

[comment stolen from Pavel Emelyanov's v1 patch]

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:29 -07:00
Jan Engelhardt
715be1fce0 procfs: use more apprioriate types when dumping /proc/N/stat
- use int fpr priority and nice, since task_nice()/task_prio() return that

- field 24: get_mm_rss() returns unsigned long

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:29 -07:00
Alexey Dobriyan
af5e617143 proc: pass "fd" by value in /proc/*/{fd,fdinfo} code
Pass "fd" directly, not via pointer -- one less memory read.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:29 -07:00
Alexey Dobriyan
f05ed3f1ab proc: don't do dummy rcu_read_lock/rcu_read_unlock on error path
rcu_read_lock()/rcu_read_unlock() is nop for TINY_RCU, but is not a nop
for, say, PREEMPT_RCU.

proc_fill_cache() is called without RCU lock, there is no need to
lock/unlock on error path, simply jump out of the loop.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:29 -07:00
Cong Wang
2344bec788 proc: use mm_access() instead of ptrace_may_access()
mm_access() handles this much better, and avoids some race conditions.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:29 -07:00
Cong Wang
e7dcd9990e proc: remove mm_for_maps()
mm_for_maps() is a simple wrapper for mm_access(), and the name is
misleading, so just remove it and use mm_access() directly.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:28 -07:00
Cong Wang
b409e578d9 proc: clean up /proc/<pid>/environ handling
Similar to e268337dfe ("proc: clean up and fix /proc/<pid>/mem
handling"), move the check of permission to open(), this will simplify
read() code.

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:28 -07:00
Tim Bird
168eeccbc9 stack usage: add pid to warning printk in check_stack_usage
In embedded systems, sometimes the same program (busybox) is the cause of
multiple warnings.  Outputting the pid with the program name in the
warning printk helps distinguish which instances of a program are using
the stack most.

This is a small patch, but useful.

Signed-off-by: Tim Bird <tim.bird@am.sony.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:28 -07:00
Oleg Nesterov
43e13cc107 cred: remove task_is_dead() from __task_cred() validation
Commit 8f92054e7c ("CRED: Fix __task_cred()'s lockdep check and banner
comment"):

    add the following validation condition:

        task->exit_state >= 0

    to permit the access if the target task is dead and therefore
    unable to change its own credentials.

OK, but afaics currently this can only help wait_task_zombie() which calls
__task_cred() without rcu lock.

Remove this validation and change wait_task_zombie() to use task_uid()
instead.  This means we do rcu_read_lock() only to shut up the lockdep,
but we already do the same in, say, wait_task_stopped().

task_is_dead() should die, task->exit_state != 0 means that this task has
passed exit_notify(), only do_wait-like code paths should use this.

Unfortunately, we can't kill task_is_dead() right now, it has already
acquired buggy users in drivers/staging.  The fix already exists.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:28 -07:00
Randy Dunlap
9b3c98cd66 kmod.c: fix kernel-doc warning
Warning(kernel/kmod.c:419): No description found for parameter 'depth'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:28 -07:00
Boaz Harrosh
785042f2e2 kmod: move call_usermodehelper_fns() to .c file and unexport all it's helpers
If we move call_usermodehelper_fns() to kmod.c file and EXPORT_SYMBOL it
we can avoid exporting all it's helper functions:
	call_usermodehelper_setup
	call_usermodehelper_setfns
	call_usermodehelper_exec
And make all of them static to kmod.c

Since the optimizer will see all these as a single call site it will
inline them inside call_usermodehelper_fns().  So we loose the call to
_fns but gain 3 calls to the helpers.  (Not that it matters)

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:28 -07:00
Boaz Harrosh
81ab6e7b26 kmod: convert two call sites to call_usermodehelper_fns()
Both kernel/sys.c && security/keys/request_key.c where inlining the exact
same code as call_usermodehelper_fns(); So simply convert these sites to
directly use call_usermodehelper_fns().

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:28 -07:00
Boaz Harrosh
ae3cef7300 kmod: unexport call_usermodehelper_freeinfo()
call_usermodehelper_freeinfo() is not used outside of kmod.c.  So unexport
it, and make it static to kmod.c

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:28 -07:00
Namjae Jeon
f0aac6162e fat: use fat_msg_ratelimit() in fat__get_entry()
If an application tries to lookup (opendir/readdir/stat) 5000 files on a
fatfs USB device and the device is unplugged, many message occur, shown
below.  This makes the application slow.  So use the new
fat_msg_ratelimit() decrease the messaging rate.

  #> ./file_lookup_testcase ./files_directory/
  usb 2-1.4: USB disconnect, device number 4
  FAT-fs (sda1): FAT read failed (blocknr 2631)
  FAT-fs (sda1): Directory bread(block 396816) failed
  FAT-fs (sda1): Directory bread(block 396817) failed
  FAT-fs (sda1): Directory bread(block 396818) failed
  FAT-fs (sda1): Directory bread(block 396819) failed
  FAT-fs (sda1): Directory bread(block 396820) failed
  FAT-fs (sda1): Directory bread(block 396821) failed
  FAT-fs (sda1): Directory bread(block 396822) failed
  FAT-fs (sda1): Directory bread(block 396823) failed
  FAT-fs (sda1): Directory bread(block 406824) failed
  FAT-fs (sda1): Directory bread(block 406825) failed
  FAT-fs (sda1): Directory bread(block 406826) failed
  FAT-fs (sda1): Directory bread(block 406827) failed
  FAT-fs (sda1): Directory bread(block 406828) failed
  FAT-fs (sda1): Directory bread(block 406829) failed
  FAT-fs (sda1): Directory bread(block 406830) failed
  FAT-fs (sda1): Directory bread(block 406831) failed
  FAT-fs (sda1): Directory bread(block 417696) failed
  FAT-fs (sda1): Directory bread(block 417697) failed
  FAT-fs (sda1): Directory bread(block 417698) failed
  FAT-fs (sda1): Directory bread(block 417699) failed
  FAT-fs (sda1): Directory bread(block 417700) failed
  FAT-fs (sda1): Directory bread(block 417701) failed
  FAT-fs (sda1): Directory bread(block 417702) failed
  FAT-fs (sda1): Directory bread(block 417703) failed
  FAT-fs (sda1): FAT read failed (blocknr 2631)
  FAT-fs (sda1): Directory bread(block 396816) failed
  FAT-fs (sda1): Directory bread(block 396817) failed
  FAT-fs (sda1): Directory bread(block 396818) failed
  FAT-fs (sda1): Directory bread(block 396819) failed
  FAT-fs (sda1): Directory bread(block 396820) failed
  FAT-fs (sda1): Directory bread(block 396821) failed

Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Amit Sahrawat <amit.sahrawat83@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:28 -07:00
Namjae Jeon
b742c34153 fat: add fat_msg_ratelimit()
Add a fat_msg_ratelimit() to limit the message generation rate.

Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Amit Sahrawat <amit.sahrawat83@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:28 -07:00
Artem Bityutskiy
78491189dd fat: switch to fsinfo_inode
Currently FAT file-system maps the VFS "superblock" abstraction to the
FSINFO block.  The FSINFO block contains non-essential data about the
amount of free clusters and the next free cluster.  FAT file-system can
always find out this information by scanning the FAT table, but having it
in the FSINFO block may speed things up sometimes.  So FAT file-system
relies on the VFS superblock write-out services to make sure the FSINFO
block is written out to the media from time to time.

The whole "superblock write-out" VFS infrastructure is served by the
'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds
and writes out all dirty superblock using the '->write_super()' call-back.
 But the problem with this thread is that it wastes power by waking up the
system every 5 seconds no matter what.  So we want to kill it completely
and thus, we need to make file-systems to stop using the '->write_super'
VFS service, and then remove it together with the kernel thread.

This patch switches the FAT FSINFO block management from
'->write_super()'/'->s_dirt' to 'fsinfo_inode'/'->write_inode'.  Now,
instead of setting the 's_dirt' flag, we just mark the special
'fsinfo_inode' inode as dirty and let VFS invoke the '->write_inode'
call-back when needed, where we write-out the FSINFO block.

This patch also makes sure we do not mark the 'fsinfo_inode' inode as
dirty if we are not FAT32 (FAT16 and FAT12 do not have the FSINFO block)
or if we are in R/O mode.

As a bonus, we can also remove the '->sync_fs()' and '->write_super()' FAT
call-back function because they become unneeded.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:28 -07:00
Artem Bityutskiy
330fe3c4c6 fat: mark superblock as dirty less often
Preparation for further changes.  It touches few functions in fatent.c and
prevents them from marking the superblock as dirty unnecessarily often.
Namely, instead of marking it as dirty in the internal tight loops - do it
only once at the end of the functions.  And instead of marking it as dirty
while holding the FAT table lock, do it outside the lock.

The reason for this patch is that marking the superblock as dirty will
soon become a little bit heavier operation, so it is cleaner to do this
only when it is necessary.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:28 -07:00
Artem Bityutskiy
90b436657e fat: introduce mark_fsinfo_dirty helper
A preparation patch which introduces a 'mark_fsinfo_dirty()' helper
function which just sets the 's_dirt' flag to 1 so far.  I'll add more
code to this helper later, so I do not mark it as 'inline'.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:27 -07:00
Artem Bityutskiy
020ac5b6be fat: introduce special inode for managing the FSINFO block
This is patchset makes fatfs stop using the VFS '->write_super()' method
for writing out the FSINFO block.

The final goal is to get rid of the 'sync_supers()' kernel thread.  This
kernel thread wakes up every 5 seconds (by default) and calls
'->write_super()' for all mounted file-systems.  And the bad thing is that
this is done even if all the superblocks are clean.  Moreover, some
file-systems do not even need this end they do not register the
'->write_super()' method at all (e.g., btrfs).

So 'sync_supers()' most often just generates useless wake-ups and wastes
power.  I am trying to make all file-systems independent of
'->write_super()' and plan to remove 'sync_supers()' and '->write_super'
completely once there are no more users.

The '->write_supers()' method is mostly used by baroque file-systems like
hfs, udf, etc.  Modern file-systems like btrfs and xfs do not use it.
This justifies removing this stuff from VFS completely and make every FS
self-manage own superblock.

Tested with xfstests.

This patch:

Preparation for further changes.  It introduces a special inode
('fsinfo_inode') in FAT file-system which we'll later use for managing the
FSINFO block.  Note, this there is already one special inode ('fat_inode')
which is used for managing the FAT tables.

Introduce new 'MSDOS_FSINFO_INO' constant for this special inode.  It is
safe to do because FAT file-system does not store inode numbers on the
media but generates them run-time.

I've also cleaned up the comment to existing 'MSDOS_ROOT_INO' constant,
while on it.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:27 -07:00
Dan Carpenter
7bc1bac77a HPFS: remove PRINTK() macro
The PRINTK() macro isn't really used.  Let's just remove it because it
is ugly and out of date.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:27 -07:00
Ryusuke Konishi
11475975dd nilfs2: flush disk caches in syncing
There are two cases that the cache flush is needed to avoid data loss
against unexpected hang or power failure.  One is sync file function (i.e.
 nilfs_sync_file) and another is checkpointing ioctl.

This issues a cache flush request to device for such cases if barrier
mount option is enabled, and makes sure data really is on persistent
storage on their completion.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:27 -07:00
Will Deacon
a1d494495c pipe: return -ENOIOCTLCMD instead of -EINVAL on unknown ioctl command
As described in commit 07d106d0a3 ("vfs: fix up ENOIOCTLCMD error
handling"), drivers should return -ENOIOCTLCMD if they receive an ioctl
command which they don't understand.  Doing so will result in -ENOTTY
being returned to userspace, which matches the behaviour of the compat
layer if it fails to translate an ioctl command.

This patch fixes the pipe ioctl to return -ENOIOCTLCMD instead of -EINVAL
when passed an unknown ioctl command.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Alan Cox <alan@linux.intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:27 -07:00
H Hartley Sweeten
c67e5382fb init: disable sparse checking of the mount.o source files
The init/mount.o source files produce a number of sparse warnings of the
type:

warning: incorrect type in argument 1 (different address spaces)
   expected char [noderef] <asn:1>*dev_name
   got char *name

This is due to the syscalls expecting some of the arguments to be user
pointers but they are being passed as kernel pointers.  This is harmless
but adds a lot of noise to a sparse build.

To limit the noise just disable the sparse checking in the relevant source
files, but still display a warning so that the user knows this has been
done.

Since the sparse checking has been disabled we can also remove the __user
__force casts that are scattered thru the source.

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:27 -07:00
Joe Perches
243f3803cf checkpatch: suggest pr_<level> over printk(KERN_<LEVEL>
Suggest the shorter pr_<level> instead of printk(KERN_<LEVEL>.

Prefer to use pr_<level> over bare printks.
Prefer to use pr_warn over pr_warning.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:27 -07:00