Commit Graph

184 Commits

Author SHA1 Message Date
Christoph Lameter f8891e5e1f [PATCH] Light weight event counters
The remaining counters in page_state after the zoned VM counter patches
have been applied are all just for show in /proc/vmstat.  They have no
essential function for the VM.

We use a simple increment of per cpu variables.  In order to avoid the most
severe races we disable preempt.  Preempt does not prevent the race between
an increment and an interrupt handler incrementing the same statistics
counter.  However, that race is exceedingly rare, we may only loose one
increment or so and there is no requirement (at least not in kernel) that
the vm event counters have to be accurate.

In the non preempt case this results in a simple increment for each
counter.  For many architectures this will be reduced by the compiler to a
single instruction.  This single instruction is atomic for i386 and x86_64.
 And therefore even the rare race condition in an interrupt is avoided for
both architectures in most cases.

The patchset also adds an off switch for embedded systems that allows a
building of linux kernels without these counters.

The implementation of these counters is through inline code that hopefully
results in only a single instruction increment instruction being emitted
(i386, x86_64) or in the increment being hidden though instruction
concurrency (EPIC architectures such as ia64 can get that done).

Benefits:
- VM event counter operations usually reduce to a single inline instruction
  on i386 and x86_64.
- No interrupt disable, only preempt disable for the preempt case.
  Preempt disable can also be avoided by moving the counter into a spinlock.
- Handling is similar to zoned VM counters.
- Simple and easily extendable.
- Can be omitted to reduce memory use for embedded use.

References:

RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=113512330605497&w=2
RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=114988082814934&w=2
local_t http://marc.theaimsgroup.com/?l=linux-kernel&m=114991748606690&w=2
V2 http://marc.theaimsgroup.com/?t=115014808400007&r=1&w=2
V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767022346&w=2
V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115047968808926&w=2

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:36 -07:00
Christoph Lameter 347ce434d5 [PATCH] zoned vm counters: conversion of nr_pagecache to per zone counter
Currently a single atomic variable is used to establish the size of the page
cache in the whole machine.  The zoned VM counters have the same method of
implementation as the nr_pagecache code but also allow the determination of
the pagecache size per zone.

Remove the special implementation for nr_pagecache and make it a zoned counter
named NR_FILE_PAGES.

Updates of the page cache counters are always performed with interrupts off.
We can therefore use the __ variant here.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:34 -07:00
Gerald Schaefer 5b5dd21a8e [S390] appldata enhancements.
Add CPU ID and steal time, and make OS record size variable.

Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2006-06-29 15:08:35 +02:00
Peter Oberparleiter 585c3047a8 [S390] Add vmpanic parameter.
Implementation of new kernel parameter vmpanic that provides a means to
perform a z/VM CP command after a kernel panic occurred.

Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2006-06-29 15:08:25 +02:00
Martin Schwidefsky 8f27766a88 [S390] remove export of sys_call_table
Remove export of the sys_call_table symbol to prevent the misuse of it.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2006-06-29 15:03:54 +02:00
Martin Schwidefsky 65b73c69c5 [S390] remove unused macros from binfmt_elf32.c
The two macros NEW_TO_OLD_UID and NEW_TO_OLD_GID in binfmt_elf32.c
are not used by any code. Remove them.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2006-06-29 15:03:48 +02:00
Serge E. Hallyn 8e0474f3b4 [S390] fix duplicate export of overflow{ug}id
overflowuid and overflowgid were exported twice.  Remove the export
from s390_ksyms.c

Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2006-06-29 15:03:42 +02:00
Gerald Schaefer 3ee526841b [S390] avenrun export in appdata_base.c
Remove EXPORT_SYMBOL_GPL(avenrun) from appdata_base.c, since it is
already exported in kernel/timer.c

Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2006-06-29 15:03:28 +02:00
Heiko Carstens b1b7030691 [S390] head.S code moving.
There is almost no room left for any new code between 0x10000
and 0x10480. Move the code from 0x10000 to 0x11000.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2006-06-29 14:58:17 +02:00
Martin Schwidefsky 63b1224664 [S390] virtual cpu accounting vs. machine checks.
If a machine checks interrupts the external or the i/o interrupt
handler before they have completed the cpu time calculations, the
accounting goes wrong. After the cpu returned from the machine check
handler to the interrupted interrupt handler, a negative cpu time delta
can occur.  If the accumulated cpu time in lowcore is small enough
this value can get negative as well. The next jiffy interrupt will pick
up that negative value, shift it by 12 and add the now huge positive
value to the cpu time of the process.
To solve this the machine check handler is modified not to change any
of the timestamps in the lowcore if the machine check interrupted kernel
context.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2006-06-29 14:58:05 +02:00
Gerald Schaefer 9faf06547e [S390] add __cpuinit to appldata cpu hotplug notifier.
Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2006-06-29 14:57:58 +02:00
Martin Schwidefsky 06fa46a2fc [S390] console_unblank woes.
The software watchdog calls machine_restart from a timer function.
The s390 machine_restart calls console_unblank to flush the console
output. This is needed for panic to get the panic message printed.
If console_unblank is called in interrupt a BUG is triggered in
acquire_console_sem. That makes the software watchdog panic instead
of restarting the machine. To get around this problem the call to
console_unblank is made conditionally on !in_interrupt() ||
oops_in_progress.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2006-06-29 14:57:32 +02:00
Heiko Carstens d7d2370255 [S390] memory detection.
The wrong base register is used to read a value from the sclp data
structure. The value is used to calculate the memory size.
Use correct register %r4.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2006-06-29 14:56:32 +02:00
Heiko Carstens 7380534314 [S390] incomplete stack traces.
show_stack() passes a pointer to the current stack frame to show_trace().
Because of tail call optimization the pointer doesn't point to the original
stack frame anymory and therefore traces are wrong. Don't pass the pointer
of the current stack frame to show_trace(). Instead let show_trace()
calculate the pointer on its own.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2006-06-29 14:56:23 +02:00
Chandra Seetharaman 054cc8a2d8 [PATCH] cpu hotplug: revert initdata patch submitted for 2.6.17
This patch reverts notifier_block changes made in 2.6.17

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:41 -07:00
KAMEZAWA Hiroyuki 76b67ed9dc [PATCH] node hotplug: register cpu: remove node struct
With Goto-san's patch, we can add new pgdat/node at runtime.  I'm now
considering node-hot-add with cpu + memory on ACPI.

I found acpi container, which describes node, could evaluate cpu before
memory. This means cpu-hot-add occurs before memory hot add.

In most part, cpu-hot-add doesn't depend on node hot add.  But register_cpu(),
which creates symbolic link from node to cpu, requires that node should be
onlined before register_cpu().  When a node is onlined, its pgdat should be
there.

This patch-set holds off creating symbolic link from node to cpu
until node is onlined.

This removes node arguments from register_cpu().

Now, register_cpu() requires 'struct node' as its argument.  But the array of
struct node is now unified in driver/base/node.c now (By Goto's node hotplug
patch).  We can get struct node in generic way.  So, this argument is not
necessary now.

This patch also guarantees add cpu under node only when node is onlined.  It
is necessary for node-hot-add vs.  cpu-hot-add patch following this.

Moreover, register_cpu calculates cpu->node_id by cpu_to_node() without regard
to its 'struct node *root' argument.  This patch removes it.

Also modify callers of register_cpu()/unregister_cpu, whose args are changed
by register-cpu-remove-node-struct patch.

[Brice.Goglin@ens-lyon.org: fix it]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Brice Goglin <Brice.Goglin@ens-lyon.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:37 -07:00
Linus Torvalds da206c9e68 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
  typo fixes
  Clean up 'inline is not at beginning' warnings for usb storage
  Storage class should be first
  i386: Trivial typo fixes
  ixj: make ixj_set_tone_off() static
  spelling fixes
  fix paniced->panicked typos
  Spelling fixes for Documentation/atomic_ops.txt
  move acknowledgment for Mark Adler to CREDITS
  remove the bouncing email address of David Campbell
2006-06-26 13:33:14 -07:00
Tobias Klauser 2efe55a9ce Storage class should be first
Storage class should be before const

Signed-off-by: Tobias Klauser <tklauser@nuerscht.ch>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-26 18:57:34 +02:00
Andreas Mohr d6e05edc59 spelling fixes
acquired (aquired)
contiguous (contigious)
successful (succesful, succesfull)
surprise (suprise)
whether (weather)
some other misspellings

Signed-off-by: Andreas Mohr <andi@lisas.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-26 18:35:02 +02:00
Herbert Xu 6c2bb98bc3 [CRYPTO] all: Pass tfm instead of ctx to algorithms
Up until now algorithms have been happy to get a context pointer since
they know everything that's in the tfm already (e.g., alignment, block
size).

However, once we have parameterised algorithms, such information will
be specific to each tfm.  So the algorithm API needs to be changed to
pass the tfm structure instead of the context pointer.

This patch is basically a text substitution.  The only tricky bit is
the assembly routines that need to get the context pointer offset
through asm-offsets.h.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2006-06-26 17:34:39 +10:00
Herbert Xu 43600106e3 [CRYPTO] digest: Remove unnecessary zeroing during init
Various digest algorithms operate one block at a time and therefore
keep a temporary buffer of partial blocks.  This buffer does not need
to be initialised since there is a counter which indicates what is and
isn't valid in it.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2006-06-26 17:34:38 +10:00
Heiko Carstens cc13ad6217 [PATCH] s390: setup.c cleanup + build fix
Cleanup & fix 31 bit compilation:

  CC      arch/s390/kernel/setup.o
arch/s390/kernel/setup.c:83: error: initializer element is not computable at
                                    load time
arch/s390/kernel/setup.c:83: error: (near initialization for
                                    'code_resource.start')
Not sure which patch in the -mm tree breaks this, but since this can be
considered a cleanup it can be merged anyway.

Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:25 -07:00
Andrew Morton a5cf4b9a02 [PATCH] s390_hypfs filesystem: get_sb_single() fix
Update hypfs for dhowells API changes.

Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Ingo Oeser <ioe-lkml@rameria.de>
Cc: Joern Engel <joern@wohnheim.fh-wedel.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 08:47:27 -07:00
Michael Holzheu 24bbb1faf3 [PATCH] s390_hypfs filesystem
On zSeries machines there exists an interface which allows the operating
system to retrieve LPAR hypervisor accounting data.  For example, it is
possible to get usage data for physical and virtual cpus.  In order to
provide this information to user space programs, I implemented a new
virtual Linux file system named 's390_hypfs' using the Linux 2.6 libfs
framework.  The name 's390_hypfs' stands for 'S390 Hypervisor Filesystem'.
All the accounting information is put into different virtual files which
can be accessed from user space.  All data is represented as ASCII strings.

When the file system is mounted the accounting information is retrieved and
a file system tree is created with the attribute files containing the cpu
information.  The content of the files remains unchanged until a new update
is made.  An update can be triggered from user space through writing
'something' into a special purpose update file.

We create the following directory structure:

<mount-point>/
        update
        cpus/
                <cpu-id>
                        type
                        mgmtime
                <cpu-id>
                        ...
        hyp/
                type
        systems/
                <lpar-name>
                        cpus/
                                <cpu-id>
                                        type
                                        mgmtime
                                        cputime
                                        onlinetime
                                <cpu-id>
                                        ...
                <lpar-name>
                        cpus/
                                ...

- update: File to trigger update
- cpus/: Directory for all physical cpus
- cpus/<cpu-id>/: Directory for one physical cpu.
- cpus/<cpu-id>/type: Type name of physical zSeries cpu.
- cpus/<cpu-id>/mgmtime: Physical-LPAR-management time in microseconds.
- hyp/: Directory for hypervisor information
- hyp/type: Typ of hypervisor (currently only 'LPAR Hypervisor')
- systems/: Directory for all LPARs
- systems/<lpar-name>/: Directory for one LPAR.
- systems/<lpar-name>/cpus/<cpu-id>/: Directory for the virtual cpus
- systems/<lpar-name>/cpus/<cpu-id>/type: Typ of cpu.
- systems/<lpar-name>/cpus/<cpu-id>/mgmtime:
Accumulated number of microseconds during which a physical
CPU was assigned to the logical cpu and the cpu time was
consumed by the hypervisor and was not provided to
the LPAR (LPAR overhead).

- systems/<lpar-name>/cpus/<cpu-id>/cputime:
Accumulated number of microseconds during which a physical CPU
was assigned to the logical cpu and the cpu time was consumed
by the LPAR.

- systems/<lpar-name>/cpus/<cpu-id>/onlinetime:
Accumulated number of microseconds during which the logical CPU
has been online.

As mount point for the filesystem /sys/hypervisor/s390 is created.

The update process is triggered when writing 'something' into the
'update' file at the top level hypfs directory. You can do this e.g.
with 'echo 1 > update'. During the update the whole directory structure
is deleted and built up again.

Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Ingo Oeser <ioe-lkml@rameria.de>
Cc: Joern Engel <joern@wohnheim.fh-wedel.de>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:02 -07:00
Martin Schwidefsky 705af30950 [PATCH] s390: fix typo in stop_hz_timer.
Add missing parentheses for type cast to u64.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-25 12:09:55 -07:00