Commit Graph

2909 Commits

Author SHA1 Message Date
Herbert Xu
a98ce5c6fe Fix synchronize_irq races with IRQ handler
As it is some callers of synchronize_irq rely on memory barriers
to provide synchronisation against the IRQ handlers.  For example,
the tg3 driver does

	tp->irq_sync = 1;
	smp_mb();
	synchronize_irq();

and then in the IRQ handler:

	if (!tp->irq_sync)
		netif_rx_schedule(dev, &tp->napi);

Unfortunately memory barriers only work well when they come in
pairs.  Because we don't actually have memory barriers on the
IRQ path, the memory barrier before the synchronize_irq() doesn't
actually protect us.

In particular, synchronize_irq() may return followed by the
result of netif_rx_schedule being made visible.

This patch (mostly written by Linus) fixes this by using spin
locks instead of memory barries on the synchronize_irq() path.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-23 09:01:31 -07:00
Randy Dunlap
481968f44e auditsc: fix kernel-doc param warnings
Fix kernel-doc for auditsc parameter changes.

Warning(linux-2.6.23-git17//kernel/auditsc.c:1623): No description found for parameter 'dentry'
Warning(linux-2.6.23-git17//kernel/auditsc.c:1666): No description found for parameter 'dentry'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22 19:40:02 -07:00
Linus Torvalds
0fd56c7033 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
  KVM: Use new smp_call_function_mask() in kvm_flush_remote_tlbs()
  sched: don't clear PF_VCPU in scheduler
  KVM: Improve local apic timer wraparound handling
  KVM: Fix local apic timer divide by zero
  KVM: Move kvm_guest_exit() after local_irq_enable()
  KVM: x86 emulator: fix access registers for instructions with ModR/M byte and Mod = 3
  KVM: VMX: Force vm86 mode if setting flags during real mode
  KVM: x86 emulator: implement 'movnti mem, reg'
  KVM: VMX: Reset mmu context when entering real mode
  KVM: VMX: Handle NMIs before enabling interrupts and preemption
  KVM: MMU: Set shadow pte atomically in mmu_pte_write_zap_pte()
  KVM: x86 emulator: fix repne/repnz decoding
  KVM: x86 emulator: fix merge screwup due to emulator split
2007-10-22 19:24:17 -07:00
Eric W. Biederman
5081dba658 Fix appletalk sysctl entry name
Gabriel C reported that modprobing appletalk on current git gives a
warning in dmesg :

   "sysctl table check failed: /net/appletalk .3.7 procname does not match binary path procname"

Oops.  My apologies it appears I made a mistake when creating my table
to check up on sysctl values.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Tested-by: Gabriel C <nix.or.die@googlemail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22 19:15:59 -07:00
Laurent Vivier
83d87d1673 sched: don't clear PF_VCPU in scheduler
KVM clears it by itself now, and for s390 this is plain wrong.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-22 12:03:29 +02:00
Al Viro
74c3cbe33b [PATCH] audit: watching subtrees
New kind of audit rule predicates: "object is visible in given subtree".
The part that can be sanely implemented, that is.  Limitations:
	* if you have hardlink from outside of tree, you'd better watch
it too (or just watch the object itself, obviously)
	* if you mount something under a watched tree, tell audit
that new chunk should be added to watched subtrees
	* if you umount something in a watched tree and it's still mounted
elsewhere, you will get matches on events happening there.  New command
tells audit to recalculate the trees, trimming such sources of false
positives.

Note that it's _not_ about path - if something mounted in several places
(multiple mount, bindings, different namespaces, etc.), the match does
_not_ depend on which one we are using for access.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2007-10-21 02:37:45 -04:00
Al Viro
5a190ae697 [PATCH] pass dentry to audit_inode()/audit_inode_child()
makes caller simpler *and* allows to scan ancestors

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2007-10-21 02:37:18 -04:00
Linus Torvalds
c00046c279 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (74 commits)
  fix do_sys_open() prototype
  sysfs: trivial: fix sysfs_create_file kerneldoc spelling mistake
  Documentation: Fix typo in SubmitChecklist.
  Typo: depricated -> deprecated
  Add missing profile=kvm option to Documentation/kernel-parameters.txt
  fix typo about TBI in e1000 comment
  proc.txt: Add /proc/stat field
  small documentation fixes
  Fix compiler warning in smount example program from sharedsubtree.txt
  docs/sysfs: add missing word to sysfs attribute explanation
  documentation/ext3: grammar fixes
  Documentation/java.txt: typo and grammar fixes
  Documentation/filesystems/vfs.txt: typo fix
  include/asm-*/system.h: remove unused set_rmb(), set_wmb() macros
  trivial copy_data_pages() tidy up
  Fix typo in arch/x86/kernel/tsc_32.c
  file link fix for Pegasus USB net driver help
  remove unused return within void return function
  Typo fixes retrun -> return
  x86 hpet.h: remove broken links
  ...
2007-10-19 20:36:17 -07:00
Eric W. Biederman
c1cb8e48bd sysctl: Don't compile sysctl_check when !CONFIG_SYSCTL
Weird I thought I had written the makefile so this would be handled.  Oh
well this should fix it.

Sorry about that.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-and-tested-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 18:04:22 -07:00
Fengguang Wu
df7c487250 trivial copy_data_pages() tidy up
Change the loop style of copy_data_pages() to remove a duplicate condition.

Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-20 02:26:04 +02:00
Uwe Kleine-König
6506f2aa66 fix comment: unlock_hrtimer_base is the counterpart of lock_hrtimer_base
Signed-off-by: Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-20 01:56:53 +02:00
Michael Neuling
6888c1ecd6 kernel/sched.c: remove bogus comment from account_user_time
hardirq_offset is no longer needed.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-20 01:41:05 +02:00
Daniel Roesen
9aa5e993fa trivial comment wording/typo fix regarding taint flags
Signed-off-by: Daniel Roesen <dr@cluenet.de>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-20 00:30:06 +02:00
Robert P. J. Day
3a4fa0a25d Fix misspellings of "system", "controller", "interrupt" and "necessary".
Fix the various misspellings of "system", controller", "interrupt" and
"[un]necessary".

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-19 23:10:43 +02:00
Linus Torvalds
c4ec207173 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (41 commits)
  ACPICA: hw: Don't carry spinlock over suspend
  ACPICA: hw: remove use_lock flag from acpi_hw_register_{read, write}
  ACPI: cpuidle: port idle timer suspend/resume workaround to cpuidle
  ACPI: clean up acpi_enter_sleep_state_prep
  Hibernation: Make sure that ACPI is enabled in acpi_hibernation_finish
  ACPI: suppress uninitialized var warning
  cpuidle: consolidate 2.6.22 cpuidle branch into one patch
  ACPI: thinkpad-acpi: skip blanks before the data when parsing sysfs
  ACPI: AC: Add sysfs interface
  ACPI: SBS: Add sysfs alarm
  ACPI: SBS: Add ACPI_PROCFS around procfs handling code.
  ACPI: SBS: Add support for power_supply class (and sysfs)
  ACPI: SBS: Make SBS reads table-driven.
  ACPI: SBS: Simplify data structures in SBS
  ACPI: SBS: Split host controller (ACPI0001) from SBS driver (ACPI0002)
  ACPI: EC: Add new query handler to list head.
  ACPI: Add acpi_bus_generate_event4() function
  ACPI: Battery: add sysfs alarm
  ACPI: Battery: Add sysfs support
  ACPI: Battery: Misc clean-ups, no functional changes
  ...

Fix up conflicts in drivers/misc/thinkpad_acpi.[ch] manually
2007-10-19 13:12:46 -07:00
Alexey Dobriyan
a39bc51691 Uninline fork.c/exit.c
Save ~650 bytes here.

add/remove: 4/0 grow/shrink: 0/7 up/down: 430/-1088 (-658)
function                                     old     new   delta
__copy_fs_struct                               -     202    +202
__put_fs_struct                                -     112    +112
__exit_fs                                      -      58     +58
__exit_files                                   -      58     +58
exit_files                                    58       2     -56
put_fs_struct                                112       5    -107
exit_fs                                      161       2    -159
sys_unshare                                  774     590    -184
copy_process                                4031    3840    -191
do_exit                                     1791    1597    -194
copy_fs_struct                               202       5    -197

No difference in lmbench lat_proc tests on 2-way Opteron 246.
Smaaaal degradation on UP P4 (within errors).

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:56 -07:00
Mariusz Kozlowski
a24efe62dd kernel/fork.c: remove unneeded variable initialization in copy_process()
This initialization of is not needed so just remove it.

Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:56 -07:00
Mathieu Desnoyers
8256e47cdc Linux Kernel Markers
The marker activation functions sits in kernel/marker.c.  A hash table is used
to keep track of the registered probes and armed markers, so the markers
within a newly loaded module that should be active can be activated at module
load time.

marker_query has been removed. marker_get_first, marker_get_next and
marker_release should be used as iterators on the markers.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Mike Mason <mmlnx@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:54 -07:00
Mathieu Desnoyers
09cadedbdc Combine instrumentation menus in kernel/Kconfig.instrumentation
Quoting Randy:

"It seems sad that this patch sources Kconfig.marker, a 7-line file,
20-something times.  Yes, you (we) don't want to put those 7 lines into
20-something different files, so sourcing is the right thing.

However, what you did for avr32 seems more on the right track to me: make
_one_ Instrumentation support menu that includes PROFILING, OPROFILE, KPROBES,
and MARKERS and then use (source) that in all of the arches."

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:54 -07:00
Srivatsa Vaddagiri
68318b8e0b Hook up group scheduler with control groups
Enable "cgroup" (formerly containers) based fair group scheduling.  This
will let administrator create arbitrary groups of tasks (using "cgroup"
pseudo filesystem) and control their cpu bandwidth usage.

[akpm@linux-foundation.org: fix cpp condition]
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:51 -07:00
Bernhard Walle
cba63c3089 Extended crashkernel command line
This patch adds a extended crashkernel syntax that makes the value of reserved
system RAM dependent on the system RAM itself:

    crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]
    range=start-[end]

For example:

    crashkernel=512M-2G:64M,2G-:128M

The motivation comes from distributors that configure their crashkernel
command line automatically with some configuration tool (YaST, you know ;)).
Of course that tool knows the value of System RAM, but if the user removes
RAM, then the system becomes unbootable or at least unusable and error
handling is very difficult.

This series implements this change for i386, x86_64, ia64, ppc64 and sh.  That
should be all platforms that support kdump in current mainline.  I tested all
platforms except sh due to the lack of a sh processor.

This patch:

This is the generic part of the patch.  It adds a parse_crashkernel() function
in kernel/kexec.c that is called by the architecture specific code that
actually reserves the memory.  That function takes the whole command line and
looks itself for "crashkernel=" in it.

If there are multiple occurrences, then the last one is taken.  The advantage
is that if you have a bootloader like lilo or elilo which allows you to append
a command line parameter but not to remove one (like in GRUB), then you can
add another crashkernel value for testing at the boot command line and this
one overwrites the command line in the configuration then.

Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:49 -07:00
KAMEZAWA Hiroyuki
73e753a50d CPU HOTPLUG: avoid hotadd when proper possible_map isn't specified
cpu-hot-add should be fail if cpu is not set in cpu_possible_map.  If go
ahead, the system will panic soon.

Especially, arch which requires additional_cpus= parameter should handle
this.  Tested on ia64.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:44 -07:00
Cliff Wickman
470fd64644 hotplug cpu: migrate a task within its cpuset
When a cpu is disabled, move_task_off_dead_cpu() is called for tasks that have
been running on that cpu.

Currently, such a task is migrated:
 1) to any cpu on the same node as the disabled cpu, which is both online
    and among that task's cpus_allowed
 2) to any cpu which is both online and among that task's cpus_allowed

It is typical of a multithreaded application running on a large NUMA system to
have its tasks confined to a cpuset so as to cluster them near the memory that
they share.  Furthermore, it is typical to explicitly place such a task on a
specific cpu in that cpuset.  And in that case the task's cpus_allowed
includes only a single cpu.

This patch would insert a preference to migrate such a task to some cpu within
its cpuset (and set its cpus_allowed to its entire cpuset).

With this patch, migrate the task to:
 1) to any cpu on the same node as the disabled cpu, which is both online
    and among that task's cpus_allowed
 2) to any online cpu within the task's cpuset
 3) to any cpu which is both online and among that task's cpus_allowed

In order to do this, move_task_off_dead_cpu() must make a call to
cpuset_cpus_allowed_locked(), a new subset of cpuset_cpus_allowed(), that will
not block.  (name change - per Oleg's suggestion)

Calls are made to cpuset_lock() and cpuset_unlock() in migration_call() to set
the cpuset mutex during the whole migrate_live_tasks() and
migrate_dead_tasks() procedure.

[akpm@linux-foundation.org: build fix]
[pj@sgi.com: Fix indentation and spacing]
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:44 -07:00
Paul Menage
bd89aabc67 Control groups: Replace "cont" with "cgrp" and other misc renaming
Replace "cont" with "cgrp" and other misc renaming

This patch finishes some of the names that got missed in the great
"task containers" -> "control groups" rename. Primarily it renames
the local variable "cont" to "cgrp" in a number of places, and renames
the CONT_* enum members to CGRP_*.

This patch is not intended to have any effect on the generated code;
the output of "objdump -d kernel/cgroup.o" is unchanged.

Signed-off-by: Paul Menage <menage@google.com>
Acked-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:43 -07:00
Pavel Emelyanov
69cccb887a Use task_pid_nr() instead of pid_nr(task_pid())
There are two places that do so - the cgroups subsystem and the autofs
code.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Ian Kent <raven@themaw.net>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:43 -07:00