1325 Commits

Author SHA1 Message Date
Paul E. McKenney 8bfb2f8e65 rcu: Make RCU's CPU-stall detector be default
The RCU_CPU_STALL_DETECTOR costs almost nothing and has located
some bugs that might otherwise have been difficult to track
down.  Make it be default for the TREE RCU implementations.

The vmlinux size impact is limited (on 64-bit x86 defconfig):

   text	   data	    bss	    dec	    hex	filename
   8440248	1260076	 995588	10695912	 a334e8	vmlinux.before
   8440774	1260060	 995588	10696422	 a336e6	vmlinux.after

+526 bytes of text (+510 bytes in total) - an acceptable default cost.

For RAM-starved systems, TINY_RCU does not support CPU-stall detection
and is much smaller, but then again it is uniprocessor-only...

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12597846162906-git-send-email->
[ v2: added image size calculations to the changelog ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-03 11:35:27 +01:00
David Howells 285e728b0a FS-Cache: Don't delete pending pages from the page-store tracking tree
Don't delete pending pages from the page-store tracking tree, but rather send
them for another write as they've presumably been updated.

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:11:29 +00:00
David Howells b34df792b4 FS-Cache: Use radix tree preload correctly in tracking of pages to be stored
__fscache_write_page() attempts to load the radix tree preallocation pool for
the CPU it is on before calling radix_tree_insert(), as the insertion must be
done inside a pair of spinlocks.

Use of the preallocation pool, however, is contingent on the radix tree being
initialised without __GFP_WAIT specified.  __fscache_acquire_cookie() was
passing GFP_NOFS to INIT_RADIX_TREE() - but that includes __GFP_WAIT.

The solution is to AND out __GFP_WAIT.
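
The masking step can be sketched in a few lines of plain C. This is an illustration only: the DEMO_* flag values and both helper names are hypothetical stand-ins, not the kernel's definitions. The sketch only models the relationship described above, namely that the NOFS mask includes the WAIT bit.

```c
#include <stdbool.h>

/*
 * Hypothetical flag bits, for illustration only.  The real definitions
 * live in the kernel headers; the only property modelled here is that
 * the NOFS mask includes the WAIT bit.
 */
#define DEMO_GFP_WAIT 0x10u
#define DEMO_GFP_IO   0x40u
#define DEMO_GFP_NOFS (DEMO_GFP_WAIT | DEMO_GFP_IO)

/* The preload pool is only usable when the tree's mask cannot wait. */
static bool demo_tree_uses_preload(unsigned int tree_gfp_mask)
{
	return !(tree_gfp_mask & DEMO_GFP_WAIT);
}

/* AND out the WAIT bit before handing the mask to the tree initialiser. */
static unsigned int demo_tree_gfp_mask(unsigned int gfp)
{
	return gfp & ~DEMO_GFP_WAIT;
}
```

With these stand-ins, a tree initialised with DEMO_GFP_NOFS would not use the preload pool, while one initialised with demo_tree_gfp_mask(DEMO_GFP_NOFS) would.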

Additionally, the banner comment to radix_tree_preload() is altered to make
note of this prerequisite.  Possibly there should be a WARN_ON() too.

Without this fix, I have seen the following recursive deadlock:
radix_tree_insert() attempted to allocate memory inside the spinlocked
region, which resulted in FS-Cache being called back to release memory -
which in turn required the spinlock that was already held.

=============================================
[ INFO: possible recursive locking detected ]
2.6.32-rc6-cachefs #24
---------------------------------------------
nfsiod/7916 is trying to acquire lock:
 (&cookie->lock){+.+.-.}, at: [<ffffffffa0076872>] __fscache_uncache_page+0xdb/0x160 [fscache]

but task is already holding lock:
 (&cookie->lock){+.+.-.}, at: [<ffffffffa0076acc>] __fscache_write_page+0x15c/0x3f3 [fscache]

other info that might help us debug this:
5 locks held by nfsiod/7916:
 #0:  (nfsiod){+.+.+.}, at: [<ffffffff81048290>] worker_thread+0x19a/0x2e2
 #1:  (&task->u.tk_work#2){+.+.+.}, at: [<ffffffff81048290>] worker_thread+0x19a/0x2e2
 #2:  (&cookie->lock){+.+.-.}, at: [<ffffffffa0076acc>] __fscache_write_page+0x15c/0x3f3 [fscache]
 #3:  (&object->lock#2){+.+.-.}, at: [<ffffffffa0076b07>] __fscache_write_page+0x197/0x3f3 [fscache]
 #4:  (&cookie->stores_lock){+.+...}, at: [<ffffffffa0076b0f>] __fscache_write_page+0x19f/0x3f3 [fscache]

stack backtrace:
Pid: 7916, comm: nfsiod Not tainted 2.6.32-rc6-cachefs #24
Call Trace:
 [<ffffffff8105ac7f>] __lock_acquire+0x1649/0x16e3
 [<ffffffff81059ded>] ? __lock_acquire+0x7b7/0x16e3
 [<ffffffff8100e27d>] ? dump_trace+0x248/0x257
 [<ffffffff8105ad70>] lock_acquire+0x57/0x6d
 [<ffffffffa0076872>] ? __fscache_uncache_page+0xdb/0x160 [fscache]
 [<ffffffff8135467c>] _spin_lock+0x2c/0x3b
 [<ffffffffa0076872>] ? __fscache_uncache_page+0xdb/0x160 [fscache]
 [<ffffffffa0076872>] __fscache_uncache_page+0xdb/0x160 [fscache]
 [<ffffffffa0077eb7>] ? __fscache_check_page_write+0x0/0x71 [fscache]
 [<ffffffffa00b4755>] nfs_fscache_release_page+0x86/0xc4 [nfs]
 [<ffffffffa00907f0>] nfs_release_page+0x3c/0x41 [nfs]
 [<ffffffff81087ffb>] try_to_release_page+0x32/0x3b
 [<ffffffff81092c2b>] shrink_page_list+0x316/0x4ac
 [<ffffffff81058a9b>] ? mark_held_locks+0x52/0x70
 [<ffffffff8135451b>] ? _spin_unlock_irq+0x2b/0x31
 [<ffffffff81093153>] shrink_inactive_list+0x392/0x67c
 [<ffffffff81058a9b>] ? mark_held_locks+0x52/0x70
 [<ffffffff810934ca>] shrink_list+0x8d/0x8f
 [<ffffffff81093744>] shrink_zone+0x278/0x33c
 [<ffffffff81052c70>] ? ktime_get_ts+0xad/0xba
 [<ffffffff8109453b>] try_to_free_pages+0x22e/0x392
 [<ffffffff8109184c>] ? isolate_pages_global+0x0/0x212
 [<ffffffff8108e16b>] __alloc_pages_nodemask+0x3dc/0x5cf
 [<ffffffff810ae24a>] cache_alloc_refill+0x34d/0x6c1
 [<ffffffff811bcf74>] ? radix_tree_node_alloc+0x52/0x5c
 [<ffffffff810ae929>] kmem_cache_alloc+0xb2/0x118
 [<ffffffff811bcf74>] radix_tree_node_alloc+0x52/0x5c
 [<ffffffff811bcfd5>] radix_tree_insert+0x57/0x19c
 [<ffffffffa0076b53>] __fscache_write_page+0x1e3/0x3f3 [fscache]
 [<ffffffffa00b4248>] __nfs_readpage_to_fscache+0x58/0x11e [nfs]
 [<ffffffffa009bb77>] nfs_readpage_release+0x34/0x9b [nfs]
 [<ffffffffa009c0d9>] nfs_readpage_release_full+0x32/0x4b [nfs]
 [<ffffffffa0006cff>] rpc_release_calldata+0x12/0x14 [sunrpc]
 [<ffffffffa0006e2d>] rpc_free_task+0x59/0x61 [sunrpc]
 [<ffffffffa0006f03>] rpc_async_release+0x10/0x12 [sunrpc]
 [<ffffffff810482e5>] worker_thread+0x1ef/0x2e2
 [<ffffffff81048290>] ? worker_thread+0x19a/0x2e2
 [<ffffffff81352433>] ? thread_return+0x3e/0x101
 [<ffffffffa0006ef3>] ? rpc_async_release+0x0/0x12 [sunrpc]
 [<ffffffff8104bff5>] ? autoremove_wake_function+0x0/0x34
 [<ffffffff81058d25>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff810480f6>] ? worker_thread+0x0/0x2e2
 [<ffffffff8104bd21>] kthread+0x7a/0x82
 [<ffffffff8100beda>] child_rip+0xa/0x20
 [<ffffffff8100b87c>] ? restore_args+0x0/0x30
 [<ffffffff8104c2b9>] ? add_wait_queue+0x15/0x44
 [<ffffffff8104bca7>] ? kthread+0x0/0x82
 [<ffffffff8100bed0>] ? child_rip+0x0/0x20

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:11:14 +00:00
Linus Torvalds a414f01ac2 strcmp: fix overflow and possibly signedness error
Computing the strcmp return value as

	signed char __res = *cs - *ct;

is wrong for two reasons.  The subtraction can overflow because __res
doesn't use a type big enough.  Moreover the compared bytes should be
interpreted as unsigned char as specified by POSIX.

The same problem is fixed in strncmp.
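
A minimal userspace sketch of a comparison that avoids both problems is shown here. It is not the code from this commit; demo_strcmp is a hypothetical name, and the body simply compares the bytes as unsigned char and returns the sign, so no subtraction result has to fit into a narrow type.

```c
/*
 * Compare two NUL-terminated strings byte by byte.
 * Bytes are read as unsigned char, and the result is derived from a
 * comparison rather than a subtraction stored in a signed char.
 */
int demo_strcmp(const char *cs, const char *ct)
{
	unsigned char c1, c2;

	for (;;) {
		c1 = (unsigned char)*cs++;
		c2 = (unsigned char)*ct++;
		if (c1 != c2)
			return c1 < c2 ? -1 : 1;
		if (!c1)
			break;	/* both strings ended at the same position */
	}
	return 0;
}
```

A byte such as 0x80 now sorts after the terminating NUL, which is the case a signed char difference gets wrong.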

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Michael Buesch <mb@bu3sch.de>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-18 17:18:13 -08:00
Eric W. Biederman bb9074ff58 Merge commit 'v2.6.32-rc7'
Resolve the conflict between v2.6.32-rc7 where dn_def_dev_handler
gets a small bug fix and the sysctl tree where I am removing all
sysctl strategy routines.
2009-11-17 01:01:34 -08:00
Thomas Gleixner dc186ad741 workqueue: Add debugobjects support
Add debugobjects support to track the lifetime of work_structs.

While at it, remove duplicate definition of
INIT_DELAYED_WORK_ON_STACK().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
2009-11-16 01:09:48 +09:00
FUJITA Tomonori b18485e7ac swiotlb: Remove the swiotlb variable usage
POWERPC doesn't expect it to be used.

This fixes the linux-next build failure reported by
Stephen Rothwell:

  lib/swiotlb.c: In function 'setup_io_tlb_npages':
  lib/swiotlb.c:114: error: 'swiotlb' undeclared (first use in this function)

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: peterz@infradead.org
LKML-Reference: <20091112000258F.fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-11 16:51:18 +01:00
Eric W. Biederman 83ac201b4f sysctl: Remove dead code from sysctl_check
Now that sys_sysctl is a compatibility wrapper around
/proc/sys we can remove much of sysctl_check and reduce it
to a few remaining sanity checks.  This completely decouples
it from the binary sysctl system call.

Little things like ensuring that the sysctl has not already
been registered are all that remain.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2009-11-11 00:42:53 -08:00
FUJITA Tomonori 75f1cdf1dd x86: Handle HW IOMMU initialization failure gracefully
If HW IOMMU initialization fails (Intel VT-d often does this,
typically due to BIOS bugs), we fall back to nommu. That doesn't
work for the majority of systems, since nowadays they have more
than 4GB of memory, so we must use swiotlb instead of nommu.

The problem is that it's too late to initialize swiotlb when HW
IOMMU initialization fails. We need to allocate swiotlb memory
earlier from bootmem allocator. Chris explained the issue in
detail:

  http://marc.info/?l=linux-kernel&m=125657444317079&w=2

The current x86 IOMMU initialization sequence is too complicated
and handling the above issue makes it more hacky.

This patch changes x86 IOMMU initialization sequence to handle
the above issue cleanly.

The new x86 IOMMU initialization sequence is:

1. we initialize swiotlb (and set swiotlb to 1) in the case
   of (max_pfn > MAX_DMA32_PFN && !no_iommu). dma_ops is set to
   swiotlb_dma_ops or nommu_dma_ops. If swiotlb usage is forced by
   the boot option, we finish here.

2. we call the detection functions of all the IOMMUs

3. the detection function sets x86_init.iommu.iommu_init to the
   IOMMU initialization function (so we can avoid calling the
   initialization functions of all the IOMMUs needlessly).

4. if the IOMMU initialization function doesn't need swiotlb,
   it sets swiotlb to zero (e.g. when the initialization is
   successful).

5. if we find that swiotlb is set to zero, we free the swiotlb
   resources.
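
The five steps above can be sketched as a small, self-contained C model. It is a reading of the description, not the patch itself: every demo_* name is hypothetical, the detection step is reduced to a function pointer passed in by the caller, and "freeing" is modelled by clearing a flag. Treating the forced boot option as implying an early swiotlb setup is also an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

static int demo_swiotlb;            /* stands in for the swiotlb variable */
static bool demo_swiotlb_allocated; /* true while the bounce buffer is held */

/* Hypothetical IOMMU init functions: success clears swiotlb (step 4). */
static int demo_hw_iommu_init_ok(void)   { demo_swiotlb = 0; return 0; }
static int demo_hw_iommu_init_fail(void) { return -1; }

static void demo_iommu_setup(bool above_4gb, bool no_iommu, bool forced,
			     int (*detected_init)(void))
{
	int (*iommu_init)(void) = NULL;

	demo_swiotlb = 0;
	demo_swiotlb_allocated = false;

	/* Step 1: set up swiotlb early, while early allocation is possible. */
	if (forced || (above_4gb && !no_iommu)) {
		demo_swiotlb = 1;
		demo_swiotlb_allocated = true;
	}
	if (forced)
		return;

	/* Steps 2-3: detection records at most one init function. */
	iommu_init = detected_init;

	/* Step 4: only the detected IOMMU is initialised. */
	if (iommu_init)
		iommu_init();

	/* Step 5: release the early allocation if it turned out unneeded. */
	if (!demo_swiotlb && demo_swiotlb_allocated)
		demo_swiotlb_allocated = false;
}
```

In this model a failing hardware init leaves demo_swiotlb at 1 with the buffer still held, which is the graceful fallback the sequence is meant to provide.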

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
LKML-Reference: <1257849980-22640-10-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-10 12:32:07 +01:00
FUJITA Tomonori ad32e8cb86 swiotlb: Defer swiotlb init printing, export swiotlb_print_info()
This enables us to avoid printing swiotlb memory info when we
initialize swiotlb. After swiotlb initialization, we could find
that we don't need swiotlb.

This patch removes the code to print swiotlb memory info in
swiotlb_init() and exports the function to do that.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
Cc: tony.luck@intel.com
Cc: benh@kernel.crashing.org
LKML-Reference: <1257849980-22640-9-git-send-email-fujita.tomonori@lab.ntt.co.jp>
[ -v2: merge up conflict ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-10 12:32:00 +01:00
FUJITA Tomonori 5740afdb68 swiotlb: Add swiotlb_free() function
The swiotlb_free() function frees all memory allocated for swiotlb.

We need to initialize swiotlb before IOMMU initialization (x86
and powerpc need to allocate memory from the bootmem allocator). If
IOMMU initialization is successful, we need to free the swiotlb
resources (we don't want to waste 64MB).

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
LKML-Reference: <1257849980-22640-8-git-send-email-fujita.tomonori@lab.ntt.co.jp>
[ -v2: build fix for the !CONFIG_SWIOTLB case ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-10 12:31:52 +01:00
Randy Dunlap b60e26a2f0 kmemleak: fix kconfig for crc32 build error
kmemleak uses crc32 functions so it needs to select CRC32.
Fixes build error:

kmemleak.c:(.text+0x7ce62): undefined reference to `crc32_le'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2009-11-06 23:16:04 +00:00
Linus Torvalds 411094acb7 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, fs: Fix x86 procfs stack information for threads on 64-bit
  x86: Add reboot quirk for 3 series Mac mini
  x86: Fix printk message typo in mtrr cleanup code
  dma-debug: Fix compile warning with PAE enabled
  x86/amd-iommu: Un__init function required on shutdown
  x86/amd-iommu: Workaround for erratum 63
2009-11-05 10:54:08 -08:00
Bjorn Helgaas c7dabef8a2 vsprintf: use %pR, %pr instead of %pRt, %pRf
Jesse accidentally applied v1 [1] of the patchset instead of v2 [2].  This
is the diff between v1 and v2.

The changes in this patch are:
    - tidied vsprintf stack buffer to shrink and compute size more
      accurately
    - use %pR for decoding and %pr for "raw" (with type and flags) instead
      of adding %pRt and %pRf

[1] http://lkml.org/lkml/2009/10/6/491
[2] http://lkml.org/lkml/2009/10/13/441

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 13:06:41 -08:00
Bjorn Helgaas fd95541e23 vsprintf: add %pRt, %pRf to print struct resource details
This adds support for printing struct resource type and flag information.
For example, "%pRt" looks like "[mem 0x80080000000-0x8008001ffff 64bit pref]",
and "%pRf" looks like "[mem 0xff5e2000-0xff5e2007 pref flags 0x1]".

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 08:47:17 -08:00
Bjorn Helgaas c91d3376e5 vsprintf: add %pR support for IRQ and DMA resources
Print addresses (IO port numbers and memory addresses) in hex, but print
others (IRQs and DMA channels) in decimal.  Only print the end if it's
different from the start.
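
As a rough userspace illustration of these two rules, the sketch below formats a range with snprintf(). The enum, the function name and the exact output layout are hypothetical; the kernel's %pR output has its own decoration, which is not reproduced here.

```c
#include <stdio.h>
#include <stdbool.h>

enum demo_res_type { DEMO_RES_IO, DEMO_RES_MEM, DEMO_RES_IRQ, DEMO_RES_DMA };

/*
 * Addresses (I/O ports, memory) are printed in hex, everything else
 * (IRQs, DMA channels) in decimal.  The end is only printed when it
 * differs from the start.
 */
static int demo_format_res(char *buf, size_t len, enum demo_res_type type,
			   unsigned long long start, unsigned long long end)
{
	bool hex = (type == DEMO_RES_IO || type == DEMO_RES_MEM);

	if (start == end)
		return hex ? snprintf(buf, len, "0x%llx", start)
			   : snprintf(buf, len, "%llu", start);

	return hex ? snprintf(buf, len, "0x%llx-0x%llx", start, end)
		   : snprintf(buf, len, "%llu-%llu", start, end);
}
```

A single IRQ line therefore comes out as a bare decimal number, while an I/O port range keeps both ends in hex.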

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 08:47:16 -08:00
Bjorn Helgaas 2840537228 vsprintf: fix io/mem resource width
The leading "0x" consumes field width, so leave space for it in addition to
the 4 or 8 hex digits.  This means we'll print "0x0000-0x01df" rather than
"0x00-0x1df", for example.

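The same width accounting can be observed with the C library's snprintf(), shown here as a hedged userspace analogue rather than the kernel's vsprintf code. demo_format_hex is a hypothetical helper. Note that the C library omits the "0x" prefix for a value of zero, so the sketch is only meaningful for nonzero values.

```c
#include <stdio.h>

/*
 * With the '#' flag the "0x" prefix counts towards the field width,
 * so a request for `digits` zero-padded hex digits has to ask for a
 * width of digits + 2.
 */
static int demo_format_hex(char *buf, size_t len, unsigned int value,
			   int digits)
{
	return snprintf(buf, len, "%#0*x", digits + 2, value);
}
```

Asking for a width of 4 directly ("%#04x") leaves no room for padding once the prefix is counted, which is the effect the commit compensates for.
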
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 08:47:15 -08:00
Jens Axboe 2058297d2d Merge branch 'for-linus' into for-2.6.33
Conflicts:
	block/cfq-iosched.c

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-11-03 21:14:39 +01:00
Arnd Bergmann 0a5549ed16 lib/checksum: fix one more thinko
When do_csum gets unaligned data, we really need to treat
the first byte as an even byte, not an odd byte, because
we swap the two halves later.

Found by Mike's checksum-selftest module.

Reported-by: Mike Frysinger <vapier.adi@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2009-11-03 16:06:53 +01:00
Arnd Bergmann 20c1f641bb lib/checksum.c: make do_csum optional
Mike Frysinger suggested that do_csum should be optional
so that an architecture can use the generic checksum code
but still provide an optimized fast-path for the most
critical function.

This can mean an implementation using inline assembly,
or in case of Alpha one using 64-bit arithmetic in C.

Cc: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2009-11-03 16:06:52 +01:00
Arnd Bergmann c44ba9f668 lib/checksum.c: use 32-bit arithmetic consistently
The use of 'unsigned long' variables in the 32-bit part of do_csum()
is confusing at best, and potentially broken for long input on 64-bit
machines.

This changes the code to use 'unsigned int' instead, which makes
the code behave in the same (correct) way on both 32 and 64 bit
machines.
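
To illustrate why a fixed-width accumulator matters, here is a small folding helper of the kind a checksum routine needs. It is a sketch with a hypothetical name, not the function from lib/checksum.c, and it assumes that unsigned int is 32 bits wide, which holds on the usual 32-bit and 64-bit Linux targets.

```c
/*
 * Fold a 32-bit ones' complement accumulator down to 16 bits.
 * The second pass absorbs the carry that the first pass can produce.
 * Because the variable is 'unsigned int', the arithmetic has the same
 * width on 32-bit and 64-bit machines.
 */
static unsigned short demo_fold32to16(unsigned int x)
{
	x = (x & 0xffff) + (x >> 16);
	x = (x & 0xffff) + (x >> 16);
	return (unsigned short)x;
}
```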

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2009-11-03 16:06:52 +01:00
Joerg Roedel 59a40e7045 dma-debug: Fix compile warning with PAE enabled
When PAE is enabled in the kernel configuration, the size of
phys_addr_t differs from the size of a void pointer, and gcc
prints a warning about that in the dma-debug code.
This patch fixes the warning by converting the output to
unsigned long long instead of a pointer.
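
The general pattern looks roughly like this in userspace C. The typedef is a hypothetical stand-in for phys_addr_t on a PAE configuration, and the format string is illustrative rather than the one used in dma-debug.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical stand-in: a physical address wider than a 32-bit pointer. */
typedef uint64_t demo_phys_addr_t;

/*
 * Print the address through an explicit cast to unsigned long long,
 * which matches %llx regardless of how wide pointers are.
 */
static int demo_format_phys(char *buf, size_t len, demo_phys_addr_t addr)
{
	return snprintf(buf, len, "0x%016llx", (unsigned long long)addr);
}
```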

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-10-29 16:25:50 +01:00
Ingo Molnar 9de09ace8d Merge branch 'tracing/urgent' into tracing/core
Merge reason: Pick up fixes and move base from -rc1 to -rc5.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-29 09:02:20 +01:00
Kumar Gala 01deab98e3 powerpc: Minor cleanup to lib/Kconfig.debug
We don't need an explicit PPC64 in the DEBUG_PREEMPT dependencies as all
PPC platforms now support TRACE_IRQFLAGS_SUPPORT.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-27 16:42:42 +11:00
Christian Borntraeger 5c82871335 ratelimit: Make suppressed output messages more useful
Today I got:

  [39648.224782] Registered led device: iwl-phy0::TX
  [40676.545099] __ratelimit: 246 callbacks suppressed
  [40676.545103] abcdef[23675]: segfault at 0 ...

As you can see, the ratelimit message contains a function prefix.
Since this is always __ratelimit, this won't help much.

This patch changes __ratelimit and printk_ratelimit to print the
function name that calls ratelimit.

This will pinpoint the responsible function, as long as several
different places do not call ratelimit with the same ratelimit state
at the same time. In that case we catch only one random function that
calls ratelimit after the wait period.
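
The idea of passing the caller's name down can be sketched in userspace C with __func__ and a wrapper macro. Everything here is hypothetical (the struct layout, the tick-based clock, the demo_* names); it is a simplified model of the behaviour described above, not the kernel implementation.

```c
#include <stdio.h>
#include <stdbool.h>

struct demo_ratelimit_state {
	long interval;	/* window length, in arbitrary ticks */
	int burst;	/* messages allowed per window */
	int printed;
	int missed;
	long begin;
	bool started;
};

/* Returns true if the caller may print; reports suppressions via msg. */
static bool demo_ratelimit(struct demo_ratelimit_state *rs, long now,
			   const char *func, char *msg, size_t len)
{
	if (!rs->started) {
		rs->begin = now;
		rs->started = true;
	}
	if (now - rs->begin >= rs->interval) {
		/* Window expired: name the function that called us now. */
		if (rs->missed)
			snprintf(msg, len, "%s: %d callbacks suppressed",
				 func, rs->missed);
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->burst > rs->printed) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}

/* The macro captures the name of the function it is expanded in. */
#define demo_ratelimit_here(rs, now, msg, len) \
	demo_ratelimit((rs), (now), __func__, (msg), (len))

static bool demo_noisy_caller(struct demo_ratelimit_state *rs, long now,
			      char *msg, size_t len)
{
	return demo_ratelimit_here(rs, now, msg, len);
}
```

As in the commit's caveat, the reported name is simply whichever caller happens to arrive first after the window expires.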

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Young <hidave.darkstar@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <200910231458.11832.borntraeger@de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-23 17:26:37 +02:00