Adds an additional unsigned long field to struct mem_size_stats called
'referenced'. For each pte walked in the smaps code, this field is
incremented by PAGE_SIZE if it has pte-reference bits.
An additional line was added to the /proc/pid/smaps output for each VMA to
indicate how many pages within it are currently marked as referenced or
accessed.
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Extracts the pmd walker from smaps-specific code in fs/proc/task_mmu.c.
The new struct pmd_walker includes the struct vm_area_struct of the memory to
walk over. Iteration begins at the vma->vm_start and completes at
vma->vm_end. A pointer to another data structure may be stored in the private
field such as struct mem_size_stats, which acts as the smaps accumulator. For
each pmd in the VMA, the action function is called with a pointer to its
struct vm_area_struct, a pointer to the pmd_t, its start and end addresses,
and the private field.
The interface for walking pmd's in a VMA for fs/proc/task_mmu.c is now:
void for_each_pmd(struct vm_area_struct *vma,
void (*action)(struct vm_area_struct *vma,
pmd_t *pmd, unsigned long addr,
unsigned long end,
void *private),
void *private);
Since the pmd walker is now extracted from the smaps code, smaps_one_pmd() is
invoked for each pmd in the VMA. Its behavior and efficiency is identical to
the existing implementation.
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The specific case I am encountering is kdump under Xen with a 64 bit
hypervisor and 32 bit kernel/userspace. The dump created is 64 bit due to
the hypervisor but the dump kernel is 32 bit for maximum compatibility.
It's possibly less likely to be useful in a purely native scenario but I
see no reason to disallow it.
[akpm@linux-foundation.org: build fix]
Signed-off-by: Ian Campbell <ian.campbell@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Horms <horms@verge.net.au>
Cc: Magnus Damm <magnus.damm@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
We're using #ifdef CONFIG_SYSCTL, but we should be using CONFIG_PROC_SYSCTL,
so we get
fs/built-in.o: In function `proc_root_init':
/usr/src/linux/fs/proc/root.c:83: undefined reference to `proc_sys_init'
Fix that up and remove an ifdef-in-C.
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Helge Hafting <helgehaf@aitel.hist.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Without attached patch against current -git I get following with
!PROC_SYSCTL (with EMBEDDED and PROC_FS set):
CC init/version.o
LD init/built-in.o
LD vmlinux
fs/built-in.o: In function `do_proc_sys_lookup':
proc_sysctl.c:(.text+0x26583): undefined reference to `sysctl_head_next'
fs/built-in.o: In function `proc_sys_revalidate':
proc_sysctl.c:(.text+0x265bb): undefined reference to `sysctl_head_finish'
fs/built-in.o: In function `proc_sys_readdir':
proc_sysctl.c:(.text+0x26720): undefined reference to `sysctl_head_next'
proc_sysctl.c:(.text+0x267d8): undefined reference to `sysctl_head_finish'
proc_sysctl.c:(.text+0x268e7): undefined reference to `sysctl_head_next'
proc_sysctl.c:(.text+0x26910): undefined reference to `sysctl_head_finish'
fs/built-in.o: In function `proc_sys_write':
proc_sysctl.c:(.text+0x2695d): undefined reference to `sysctl_perm'
proc_sysctl.c:(.text+0x2699c): undefined reference to `sysctl_head_finish'
fs/built-in.o: In function `proc_sys_read':
proc_sysctl.c:(.text+0x269e9): undefined reference to `sysctl_perm'
proc_sysctl.c:(.text+0x26a25): undefined reference to `sysctl_head_finish'
fs/built-in.o: In function `proc_sys_permission':
proc_sysctl.c:(.text+0x26ad1): undefined reference to `sysctl_perm'
proc_sysctl.c:(.text+0x26adb): undefined reference to `sysctl_head_finish'
fs/built-in.o: In function `proc_sys_lookup':
proc_sysctl.c:(.text+0x26b39): undefined reference to `sysctl_head_finish'
make: *** [vmlinux] Virhe 1
All those functions are in fs/proc/proc_sysctl.c, which has no CONFIG_
#define's in it, so the patch makes the compilation of that file to depend
on CONFIG_PROC_SYSCTL (the simplest choice).
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since the security checks are applied on each read and write of a sysctl file,
just like they are applied when calling sys_sysctl, they are redundant on the
standard VFS constructs. Since it is difficult to compute the security labels
on the standard VFS constructs we just mark the sysctl inodes in proc private
so selinux won't even bother with them.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With this change the sysctl inodes can be cached and nothing needs to be done
when removing a sysctl table.
For a cost of 2K code we will save about 4K of static tables (when we remove
de from ctl_table) and 70K in proc_dir_entries that we will not allocate, or
about half that on a 32bit arch.
The speed feels about the same, even though we can now cache the sysctl
dentries :(
We get the core advantage that we don't need to have a 1 to 1 mapping between
ctl table entries and proc files. Making it possible to have /proc/sys vary
depending on the namespace you are in. The currently merged namespaces don't
have an issue here but the network namespace under /proc/sys/net needs to have
different directories depending on which network adapters are visible. By
simply being a cache different directories being visible depending on who you
are is trivial to implement.
[akpm@osdl.org: fix uninitialised var]
[akpm@osdl.org: fix ARM build]
[bunk@stusta.de: make things static]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
binfmt_misc has a mount point in the middle of the sysctl and that mount point
is created as a proc_generic directory.
Doing it that way gets in the way of cleaning up the sysctl proc support as it
continues the existence of a horrible hack. So instead simply create the
directory as an ordinary sysctl directory. At least that removes the magic
special case.
[akpm@osdl.org: warning fix]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch is inspired by Arjan's "Patch series to mark struct
file_operations and struct inode_operations const".
Compile tested with gcc & sparse.
Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Many struct inode_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Many struct file_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Of kernel subsystems that work with pids the tty layer is probably the largest
consumer. But it has the nice virtue that the assiation with a session only
lasts until the session leader exits. Which means that no reference counting
is required. So using struct pid winds up being a simple optimization to
avoid hash table lookups.
In the long term the use of pid_nr also ensures that when we have multiple pid
spaces mixed everything will work correctly.
Signed-off-by: Eric W. Biederman <eric@maxwell.lnxi.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
They are fat: 4x8 bytes in task_struct.
They are uncoditionally updated in every fork, read, write and sendfile.
They are used only if you have some "extended acct fields feature".
And please, please, please, read(2) knows about bytes, not characters,
why it is called "rchar"?
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As reported by Andy Whitcroft, at least the SLES9 initrd build process
depends on getting the kernel version from the kernel binary. It does
that by simply trawling the binary and looking for the signature of the
"linux_banner" string (the string "Linux version " to be exact. Which
is really broken in itself, but whatever..)
That got broken when the string was changed to allow /proc/version to
change the UTS release information dynamically, and "get_kernel_version"
thus returned "%s" (see commit a2ee8649ba:
"[PATCH] Fix linux banner utsname information").
This just restores "linux_banner" as a static string, which should fix
the version finding. And /proc/version simply uses a different string.
To avoid wasting even that miniscule amount of memory, the early boot
string should really be marked __initdata, but that just causes the same
bug in SLES9 to re-appear, since it will then find other occurrences of
"Linux version " first.
Cc: Andy Whitcroft <apw@shadowen.org>
Acked-by: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Steve Fox <drfickle@us.ibm.com>
Acked-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a simple /proc/pid/io to show the IO accounting fields.
Maybe this shouldn't be merged in mainline - the preferred reporting channel
is taskstats. But given the poor state of our userspace support for
taskstats, this is useful for developer-testing, at least. And it improves
the changes that the procps developers will wire it up into top(1). Opinions
are sought.
The patch also wires up the existing IO-accounting fields.
It's a bit racy on 32-bit machines: if process A reads process B's
/proc/pid/io while process B is updating one of those 64-bit counters, process
A could see an intermediate result.
Cc: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: David Wright <daw@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
fs/proc/proc_misc.c: In function `version_read_proc':
fs/proc/proc_misc.c:256: warning: implicit declaration of function `utsname'
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>