Commit Graph

31 Commits

Author SHA1 Message Date
Bron Gondwana 195cf453d2 mm/page-writeback: highmem_is_dirtyable option
Add vm.highmem_is_dirtyable toggle

A 32 bit machine with HIGHMEM64 enabled running DCC has an MMAPed file of
approximately 2Gb size which contains a hash format that is written
randomly by the dbclean process.  On 2.6.16 this process took a few
minutes.  With lowmem only accounting of dirty ratios, this takes about 12
hours of 100% disk IO, all random writes.

Include a toggle in /proc/sys/vm/highmem_is_dirtyable which can be set to 1 to
add the highmem back to the total available memory count.

[akpm@linux-foundation.org: Fix the CONFIG_DETECT_SOFTLOCKUP=y build]
Signed-off-by: Bron Gondwana <brong@fastmail.fm>
Cc: Ethan Solomita <solo@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: WU Fengguang <wfg@mail.ustc.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:18 -08:00
Leonardo Chiquitto 2e01e00e76 Documentation: missing proc/$PID/stat field
There's a missing field in the /proc/$PID/stat output documented in
Documentation/filesystems/proc.txt.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 16:17:16 +02:00
Eric Paris de6bbd1d30 [AUDIT] break large execve argument logging into smaller messages
execve arguments can be quite large.  There is no limit on the number of
arguments and a 4G limit on the size of an argument.

this patch prints those aruguments in bite sized pieces.  a userspace size
limitation of 8k was discovered so this keeps messages around 7.5k

single arguments larger than 7.5k in length are split into multiple records
and can be identified as aX[Y]=

Signed-off-by: Eric Paris <eparis@redhat.com>
2008-02-01 14:23:55 -05:00
Eric Dumazet 29e75252da [IPV4] route cache: Introduce rt_genid for smooth cache invalidation
Current ip route cache implementation is not suited to large caches.

We can consume a lot of CPU when cache must be invalidated, since we
currently need to evict all cache entries, and this eviction is
sometimes asynchronous. min_delay & max_delay can somewhat control this
asynchronism behavior, but whole thing is a kludge, regularly triggering
infamous soft lockup messages. When entries are still in use, this also
consumes a lot of ram, filling dst_garbage.list.

A better scheme is to use a generation identifier on each entry,
so that cache invalidation can be performed by changing the table
identifier, without having to scan all entries.
No more delayed flushing, no more stalling when secret_interval expires.

Invalidated entries will then be freed at GC time (controled by
ip_rt_gc_timeout or stress), or when an invalidated entry is found
in a chain when an insert is done.
Thus we keep a normal equilibrium.

This patch :
- renames rt_hash_rnd to rt_genid (and makes it an atomic_t)
- Adds a new rt_genid field to 'struct rtable' (filling a hole on 64bit)
- Checks entry->rt_genid at appropriate places :
2008-01-31 19:28:27 -08:00
Alex Tomas c9de560ded ext4: Add multi block allocator for ext4
Signed-off-by: Alex Tomas <alex@clusterfs.com>
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-01-29 00:19:52 -05:00
Leonardo Chiquitto b68f2c3a98 proc.txt: Add /proc/stat field
This patch updates the "cat /proc/stat" output found
in Documentation/filesystems/proc.txt.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-20 03:03:38 +02:00
Joe Korty 38e760a133 x86: expand /proc/interrupts to include missing vectors, v2
Add missing IRQs and IRQ descriptions to /proc/interrupts.

/proc/interrupts is most useful when it displays every IRQ vector in use by
the system, not just those somebody thought would be interesting.

This patch inserts the following vector displays to the i386 and x86_64
platforms, as appropriate:

	rescheduling interrupts
	TLB flush interrupts
	function call interrupts
	thermal event interrupts
	threshold interrupts
	spurious interrupts

A threshold interrupt occurs when ECC memory correction is occuring at too
high a frequency.  Thresholds are used by the ECC hardware as occasional
ECC failures are part of normal operation, but long sequences of ECC
failures usually indicate a memory chip that is about to fail.

Thermal event interrupts occur when a temperature threshold has been
exceeded for some CPU chip.  IIRC, a thermal interrupt is also generated
when the temperature drops back to a normal level.

A spurious interrupt is an interrupt that was raised then lowered by the
device before it could be fully processed by the APIC.  Hence the apic sees
the interrupt but does not know what device it came from.  For this case
the APIC hardware will assume a vector of 0xff.

Rescheduling, call, and TLB flush interrupts are sent from one CPU to
another per the needs of the OS.  Typically, their statistics would be used
to discover if an interrupt flood of the given type has been occuring.

AK: merged v2 and v4 which had some more tweaks
AK: replace Local interrupts with Local timer interrupts
AK: Fixed description of interrupt types.

[ tglx: arch/x86 adaptation ]
[ mingo: small cleanup ]

Signed-off-by: Joe Korty <joe.korty@ccur.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Tim Hockin <thockin@hockin.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-17 20:16:53 +02:00
Kawai, Hidehiro bb90110dcb coredump masking: documentation for /proc/pid/coredump_filter
This patch adds the documentation for /proc/<pid>/coredump_filter.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:47 -07:00
Peter Zijlstra bdf4c48af2 audit: rework execve audit
The purpose of audit_bprm() is to log the argv array to a userspace daemon at
the end of the execve system call.  Since user-space hasn't had time to run,
this array is still in pristine state on the process' stack; so no need to
copy it, we can just grab it from there.

In order to minimize the damage to audit_log_*() copy each string into a
temporary kernel buffer first.

Currently the audit code requires that the full argument vector fits in a
single packet.  So currently it does clip the argv size to a (sysctl) limit,
but only when execve auditing is enabled.

If the audit protocol gets extended to allow for multiple packets this check
can be removed.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ollie Wild <aaw@google.com>
Cc: <linux-audit@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:45 -07:00
Mel Gorman ed7ed36517 handle kernelcore=: generic
This patch adds the kernelcore= parameter for x86.

Once all patches are applied, a new command-line parameter exist and a new
sysctl.  This patch adds the necessary documentation.

From: Yasunori Goto <y-goto@jp.fujitsu.com>

  When "kernelcore" boot option is specified, kernel can't boot up on ia64
  because of an infinite loop.  In addition, the parsing code can be handled
  in an architecture-independent manner.

  This patch uses common code to handle the kernelcore= parameter.  It is
  only available to architectures that support arch-independent zone-sizing
  (i.e.  define CONFIG_ARCH_POPULATES_NODE_MAP).  Other architectures will
  ignore the boot parameter.

[bunk@stusta.de: make cmdline_parse_kernelcore() static]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:22:59 -07:00
Kees Cook 18d96779d9 Documentation: /proc/$pid/stat files
Documentation for the /proc/$pid/stat file.

Signed-off-by: Kees Cook <kees@outflux.net>
Cc: Rob Landley <rob@landley.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:45 -07:00
Randy Dunlap 8b60756a62 Fix more "deprecated" spellos.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-09 07:19:14 +02:00
Kees Cook 5096add84b proc: maps protection
The /proc/pid/ "maps", "smaps", and "numa_maps" files contain sensitive
information about the memory location and usage of processes.  Issues:

- maps should not be world-readable, especially if programs expect any
  kind of ASLR protection from local attackers.
- maps cannot just be 0400 because "-D_FORTIFY_SOURCE=2 -O2" makes glibc
  check the maps when %n is in a *printf call, and a setuid(getuid())
  process wouldn't be able to read its own maps file.  (For reference
  see http://lkml.org/lkml/2006/1/22/150)
- a system-wide toggle is needed to allow prior behavior in the case of
  non-root applications that depend on access to the maps contents.

This change implements a check using "ptrace_may_attach" before allowing
access to read the maps contents.  To control this protection, the new knob
/proc/sys/kernel/maps_protect has been added, with corresponding updates to
the procfs documentation.

[akpm@linux-foundation.org: build fixes]
[akpm@linux-foundation.org: New sysctl numbers are old hat]
Signed-off-by: Kees Cook <kees@outflux.net>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:02 -07:00
David Rientjes b813e931b4 smaps: add clear_refs file to clear reference
Adds /proc/pid/clear_refs.  When any non-zero number is written to this file,
pte_mkold() and ClearPageReferenced() is called for each pte and its
corresponding page, respectively, in that task's VMAs.  This file is only
writable by the user who owns the task.

It is now possible to measure _approximately_ how much memory a task is using
by clearing the reference bits with

	echo 1 > /proc/pid/clear_refs

and checking the reference count for each VMA from the /proc/pid/smaps output
at a measured time interval.  For example, to observe the approximate change
in memory footprint for a task, write a script that clears the references
(echo 1 > /proc/pid/clear_refs), sleeps, and then greps for Pgs_Referenced and
extracts the size in kB.  Add the sizes for each VMA together for the total
referenced footprint.  Moments later, repeat the process and observe the
difference.

For example, using an efficient Mozilla:

	accumulated time		referenced memory
	----------------		-----------------
		 0 s				 408 kB
		 1 s				 408 kB
		 2 s				 556 kB
		 3 s				1028 kB
		 4 s				 872 kB
		 5 s				1956 kB
		 6 s				 416 kB
		 7 s				1560 kB
		 8 s				2336 kB
		 9 s				1044 kB
		10 s				 416 kB

This is a valuable tool to get an approximate measurement of the memory
footprint for a task.

Cc: Hugh Dickins <hugh@veritas.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
[akpm@linux-foundation.org: build fixes]
[mpm@selenic.com: rename for_each_pmd]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:52 -07:00
Stephen Hemminger a2a316fd06 [NET]: Replace CONFIG_NET_DEBUG with sysctl.
Covert network warning messages from a compile time to runtime choice.
Removes kernel config option and replaces it with new /proc/sys/net/core/warnings.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:24:05 -07:00
Roland Kletzing f9c99463b0 [PATCH] Documentation for io-accounting / reporting via procfs
Add some documentation for the new and very useful io-accounting feature.
It's being added to Documentation/filesystems/proc.txt

Signed-off-by: Roland Kletzing <devzero@web.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-05 07:57:54 -08:00
Matt LaPlante 4ae0edc21b Fix typos in /Documentation : 'U-Z'
This patch fixes typos in various Documentation txts. The patch addresses some
+words starting with the letters 'U-Z'.

Looks like I made it through the alphabet...just in time to start over again
+too!  Maybe I can fit more profound fixes into the next round...?  Time will
+tell. :)

Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-30 04:58:40 +01:00
Matt LaPlante fa00e7e152 Fix typos in /Documentation : 'T''
This patch fixes typos in various Documentation txts. The patch addresses some
+words starting with the letter 'T'.

Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-30 04:55:36 +01:00
Matt LaPlante 53cb47268e Fix typos in Documentation/: 'S'
This patch fixes typos in various Documentation txts. The patch addresses
some words starting with the letter 'S'.

Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Acked-by: Alan Cox <alan@redhat.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-03 22:55:17 +02:00
Matt LaPlante 84eb8d0608 Fix "can not" in Documentation and Kconfig
Randy brought it to my attention that in proper english "can not" should always
be written "cannot". I donot see any reason to argue, even if I mightnot
understand why this rule exists.  This patch fixes "can not" in several
Documentation files as well as three Kconfigs.

Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-03 22:53:09 +02:00
Matt LaPlante 2fe0ae78c6 Fix typos in Documentation/: 'H'-'M'
This patch fixes typos in various Documentation txts. The patch addresses
some words starting with the letters 'H'-'M'.

Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-03 22:50:39 +02:00
Matt LaPlante 3f6dee9b2a Fix some typos in Documentation/: 'A'
This patch fixes typos in various Documentation txts.
This patch addresses some words starting with the letter 'A'.

Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-03 22:45:33 +02:00
Jan-Frode Myklebust d7ff0dbf45 [PATCH] oom_adj/oom_score documentation
I was looking for the a way around an OOM-problem, and found a couple of
undocumented new features for tuning the OOM-score of individual processes.
 Here's a small documentation patch for /proc/<pid>/oom_adj and
/proc/<pid>/oom_score.

Signed-off-by: Jan-Frode Myklebust <mykleb@no.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:10 -07:00
Don Zickus e33e89ab1a [PATCH] x86: Add abilty to enable/disable nmi watchdog from procfs (update)
Adds a new /proc/sys/kernel/nmi_watchdog call that will enable/disable the
nmi watchdog.

By entering a non-zero value here, a user can enable the nmi watchdog to
monitor the online cpus in the system.  By entering a zero value here, a
user can disable the nmi watchdog and free up a performance counter which
could then be utilized by the oprofile subsystem, otherwise oprofile may be
short a counter when in use.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-09-26 10:52:27 +02:00
Uwe Zeisberger c30fe7f731 fix typos "wich" -> "which"
Signed-off-by: Uwe Zeisberger <zeisberg@informatik.uni-freiburg.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-03-24 18:23:14 +01:00