Commit Graph

10713 Commits

Author SHA1 Message Date
Li Zefan
3ec762ad8b cgroups: Fix a lockdep warning at cgroup removal
Commit 2fd6b7f5 ("fs: dcache scale subdirs") forgot to annotate a dentry
lock, which caused a lockdep warning.

Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-01-14 08:46:29 +00:00
Nick Piggin
fb045adb99 fs: dcache reduce branches in lookup path
Reduce some branches and memory accesses in dcache lookup by adding dentry
flags to indicate common d_ops are set, rather than having to check them.
This saves a pointer memory access (dentry->d_op) in common path lookup
situations, and saves another pointer load and branch in cases where we
have d_op but not the particular operation.

Patched with:

git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:28 +11:00
Nick Piggin
dc0474be3e fs: dcache rationalise dget variants
dget_locked was a shortcut to avoid the lazy lru manipulation when we already
held dcache_lock (lru manipulation was relatively cheap at that point).
However, how that the lru lock is an innermost one, we never hold it at any
caller, so the lock cost can now be avoided. We already have well working lazy
dcache LRU, so it should be fine to defer LRU manipulations to scan time.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:24 +11:00
Nick Piggin
b5c84bf6f6 fs: dcache remove dcache_lock
dcache_lock no longer protects anything. remove it.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:23 +11:00
Nick Piggin
2fd6b7f507 fs: dcache scale subdirs
Protect d_subdirs and d_child with d_lock, except in filesystems that aren't
using dcache_lock for these anyway (eg. using i_mutex).

Note: if we change the locking rule in future so that ->d_child protection is
provided only with ->d_parent->d_lock, it may allow us to reduce some locking.
But it would be an exception to an otherwise regular locking scheme, so we'd
have to see some good results. Probably not worthwhile.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:21 +11:00
Nick Piggin
b7ab39f631 fs: dcache scale dentry refcount
Make d_count non-atomic and protect it with d_lock. This allows us to ensure a
0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when
we start protecting many other dentry members with d_lock.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:21 +11:00
Nick Piggin
fe15ce446b fs: change d_delete semantics
Change d_delete from a dentry deletion notification to a dentry caching
advise, more like ->drop_inode. Require it to be constant and idempotent,
and not take d_lock. This is how all existing filesystems use the callback
anyway.

This makes fine grained dentry locking of dput and dentry lru scanning
much simpler.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:18 +11:00
Nick Piggin
5adcee1d8d cgroup fs: avoid switching ->d_op on live dentry
Switching d_op on a live dentry is racy in general, so avoid it. In this case
it is a negative dentry, which is safer, but there are still concurrent ops
which may be called on d_op in that case (eg. d_revalidate). So in general
a filesystem may not do this. Fix cgroupfs so as not to do this.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:17 +11:00
Ben Hutchings
551423748a watchdog: Improve initialisation error message and documentation
The error message 'NMI watchdog failed to create perf event...'
does not make it clear that this is a fatal error for the
watchdog.  It also currently prints the error value as a
pointer, rather than extracting the error code with PTR_ERR().
Fix that.

Add a note to the description of the 'nowatchdog' kernel
parameter to associate it with this message.

Reported-by: Cesare Leonardi <celeonar@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: 599368@bugs.debian.org
Cc: 608138@bugs.debian.org
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: <stable@kernel.org> # .37.x and later
LKML-Reference: <1294009362.3167.126.camel@localhost>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-03 05:25:52 +01:00
Hillf Danton
4ef9e11d68 fix freeing user_struct in user cache
When racing on adding into user cache, the new allocated from mm slab
is freed without putting user namespace.

Since the user namespace is already operated by getting, putting has
to be issued.

Signed-off-by: Hillf Danton <dhillf@gmail.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-12-29 11:31:38 -08:00
Linus Torvalds
6f7f41851c Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  ring_buffer: Off-by-one and duplicate events in ring_buffer_read_page
2010-12-28 15:54:24 -08:00
Linus Torvalds
a4790c9457 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu: print out alloc information with KERN_DEBUG instead of KERN_INFO
  kthread_work: make lockdep happy
2010-12-24 12:59:09 -08:00
David Sharp
e1e3592735 ring_buffer: Off-by-one and duplicate events in ring_buffer_read_page
Fix two related problems in the event-copying loop of
ring_buffer_read_page.

The loop condition for copying events is off-by-one.
"len" is the remaining space in the caller-supplied page.
"size" is the size of the next event (or two events).
If len == size, then there is just enough space for the next event.

size was set to rb_event_ts_length, which may include the size of two
events if the first event is a time-extend, in order to assure time-
extends are kept together with the event after it. However,
rb_advance_reader always advances by one event. This would result in the
event after any time-extend being duplicated. Instead, get the size of
a single event for the memcpy, but use rb_event_ts_length for the loop
condition.

Signed-off-by: David Sharp <dhsharp@google.com>
LKML-Reference: <1293064704-8101-1-git-send-email-dhsharp@google.com>
LKML-Reference: <AANLkTin7nLrRPc9qGjdjHbeVDDWiJjAiYyb-L=gH85bx@mail.gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-12-23 12:09:30 -05:00
Jeff Mahoney
4be2c95d1f taskstats: pad taskstats netlink response for aligment issues on ia64
The taskstats structure is internally aligned on 8 byte boundaries but the
layout of the aggregrate reply, with two NLA headers and the pid (each 4
bytes), actually force the entire structure to be unaligned.  This causes
the kernel to issue unaligned access warnings on some architectures like
ia64.  Unfortunately, some software out there doesn't properly unroll the
NLA packet and assumes that the start of the taskstats structure will
always be 20 bytes from the start of the netlink payload.  Aligning the
start of the taskstats structure breaks this software, which we don't
want.  So, for now the alignment only happens on architectures that
require it and those users will have to update to fixed versions of those
packages.  Space is reserved in the packet only when needed.  This ifdef
should be removed in several years e.g.  2012 once we can be confident
that fixed versions are installed on most systems.  We add the padding
before the aggregate since the aggregate is already a defined type.

Commit 85893120 ("delayacct: align to 8 byte boundary on 64-bit systems")
previously addressed the alignment issues by padding out the pid field.
This was supposed to be a compatible change but the circumstances
described above mean that it wasn't.  This patch backs out that change,
since it was a hack, and introduces a new NULL attribute type to provide
the padding.  Padding the response with 4 bytes avoids allocating an
aligned taskstats structure and copying it back.  Since the structure
weighs in at 328 bytes, it's too big to do it on the stack.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reported-by: Brian Rogers <brian@xyzw.org>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Guillaume Chazarain <guichaz@gmail.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-12-22 19:43:34 -08:00
Yong Zhang
4f32e9b1f8 kthread_work: make lockdep happy
spinlock in kthread_worker and wait_queue_head in kthread_work both
should be lockdep sensible, so change the interface to make it
suiltable for CONFIG_LOCKDEP.

tj: comment update

Reported-by: Nicolas <nicolas.mailhot@laposte.net>
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Signed-off-by: Andy Walls <awalls@md.metrocast.net>
Tested-by: Andy Walls <awalls@md.metrocast.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2010-12-22 10:27:53 +01:00
Linus Torvalds
de5e9d5820 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Remove debugging check
2010-12-20 09:05:26 -08:00
Ingo Molnar
050c6c9b89 sched: Remove debugging check
Linus reported that the new warning introduced by commit f26f9aff6a
"Sched: fix skip_clock_update optimization" triggers. The need_resched
flag can be set by other CPUs asynchronously so this debug check is
bogus - remove it.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <AANLkTinJ8hAG1TpyC+CSYPR47p48+1=E7fiC45hMXT_1@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-19 23:24:27 +01:00
Linus Torvalds
55ec86f848 Merge branches 'x86-fixes-for-linus' and 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86-32: Make sure we can map all of lowmem if we need to
  x86, vt-d: Handle previous faults after enabling fault handling
  x86: Enable the intr-remap fault handling after local APIC setup
  x86, vt-d: Fix the vt-d fault handling irq migration in the x2apic mode
  x86, vt-d: Quirk for masking vtd spec errors to platform error handling logic
  x86, xsave: Use alloc_bootmem_align() instead of alloc_bootmem()
  bootmem: Add alloc_bootmem_align()
  x86, gcc-4.6: Use gcc -m options when building vdso
  x86: HPET: Chose a paranoid safe value for the ETIME check
  x86: io_apic: Avoid unused variable warning when CONFIG_GENERIC_PENDING_IRQ=n

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf: Fix off by one in perf_swevent_init()
  perf: Fix duplicate events with multiple-pmu vs software events
  ftrace: Have recordmcount honor endianness in fn_ELF_R_INFO
  scripts/tags.sh: Add magic for trace-events
  tracing: Fix panic when lseek() called on "trace" opened for writing
2010-12-19 10:44:54 -08:00
Linus Torvalds
21228e4557 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Fix the irqtime code for 32bit
  sched: Fix the irqtime code to deal with u64 wraps
  nohz: Fix get_next_timer_interrupt() vs cpu hotplug
  Sched: fix skip_clock_update optimization
  sched: Cure more NO_HZ load average woes
2010-12-19 10:37:37 -08:00
Linus Torvalds
46bdfe6a50 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  x86: avoid high BIOS area when allocating address space
  x86: avoid E820 regions when allocating address space
  x86: avoid low BIOS area when allocating address space
  resources: add arch hook for preventing allocation in reserved areas
  Revert "resources: support allocating space within a region from the top down"
  Revert "PCI: allocate bus resources from the top down"
  Revert "x86/PCI: allocate space from the end of a region, not the beginning"
  Revert "x86: allocate space within a region top-down"
  Revert "PCI: fix pci_bus_alloc_resource() hang, prefer positive decode"
  PCI: Update MCP55 quirk to not affect non HyperTransport variants
2010-12-18 10:13:24 -08:00
Bjorn Helgaas
fcb119183c resources: add arch hook for preventing allocation in reserved areas
This adds arch_remove_reservations(), which an arch can implement if it
needs to protect part of the address space from allocation.

Sometimes that can be done by just putting a region in the resource tree,
but there are cases where that doesn't work well.  For example, x86 BIOS
E820 reservations are not related to devices, so they may overlap part of,
all of, or more than a device resource, so they may not end up at the
correct spot in the resource tree.

Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-12-17 10:01:09 -08:00
Bjorn Helgaas
c0f5ac5426 Revert "resources: support allocating space within a region from the top down"
This reverts commit e7f8567db9.

Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-12-17 10:00:59 -08:00
Rafael J. Wysocki
be8cd644c4 PM / Hibernate: Restore old swap signature to avoid user space breakage
Commit 3624eb0 (PM / Hibernate: Modify signature used to mark swap)
attempted to modify hibernate signature used to mark swap partitions
containing hibernation images, so that old kernels don't try to
handle compressed images.  However, this change broke resume from
hibernation on Fedora 14 that apparently doesn't pass the resume=
argument to the kernel and tries to trigger resume from early user
space.  This doesn't work, because the signature is now different,
so the old signature has to be restored to avoid the problem.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=22732 .

Reported-by: Dr. David Alan Gilbert <linux@treblig.org>
Reported-by: Zhang Rui <rui.zhang@intel.com>
Reported-by: Pascal Chapperon <pascal.chapperon@wanadoo.fr>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-12-16 17:08:58 +01:00
Takashi Iwai
1497dd1d29 PM / Hibernate: Fix PM_POST_* notification with user-space suspend
The user-space hibernation sends a wrong notification after the image
restoration because of thinko for the file flag check.  RDONLY
corresponds to hibernation and WRONLY to restoration, confusingly.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: stable@kernel.org
2010-12-16 17:08:43 +01:00
Ingo Molnar
d949750fed Merge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent 2010-12-16 11:21:24 +01:00