Commit Graph

554 Commits

Author SHA1 Message Date
Nick Piggin
e2848a0efe radix-tree: avoid atomic allocations for preloaded insertions
Most pagecache (and some other) radix tree insertions have the great
opportunity to preallocate a few nodes with relaxed gfp flags.  But the
preallocation is squandered when it comes time to allocate a node, we
default to first attempting a GFP_ATOMIC allocation -- that doesn't
normally fail, but it can eat into atomic memory reserves that we don't
need to be using.

Another upshot of this is that it removes the sometimes highly contended
zone->lock from underneath tree_lock.  Pagecache insertions are always
performed with a radix tree preload, and after this change, such a
situation will never fall back to kmem_cache_alloc within
radix_tree_node_alloc.

David Miller reports seeing this allocation fail on a highly threaded
sparc64 system:

[527319.459981] dd: page allocation failure. order:0, mode:0x20
[527319.460403] Call Trace:
[527319.460568]  [00000000004b71e0] __slab_alloc+0x1b0/0x6a8
[527319.460636]  [00000000004b7bbc] kmem_cache_alloc+0x4c/0xa8
[527319.460698]  [000000000055309c] radix_tree_node_alloc+0x20/0x90
[527319.460763]  [0000000000553238] radix_tree_insert+0x12c/0x260
[527319.460830]  [0000000000495cd0] add_to_page_cache+0x38/0xb0
[527319.460893]  [00000000004e4794] mpage_readpages+0x6c/0x134
[527319.460955]  [000000000049c7fc] __do_page_cache_readahead+0x170/0x280
[527319.461028]  [000000000049cc88] ondemand_readahead+0x208/0x214
[527319.461094]  [0000000000496018] do_generic_mapping_read+0xe8/0x428
[527319.461152]  [0000000000497948] generic_file_aio_read+0x108/0x170
[527319.461217]  [00000000004badac] do_sync_read+0x88/0xd0
[527319.461292]  [00000000004bb5cc] vfs_read+0x78/0x10c
[527319.461361]  [00000000004bb920] sys_read+0x34/0x60
[527319.461424]  [0000000000406294] linux_sparc_syscall32+0x3c/0x40

The calltrace is significant: __do_page_cache_readahead allocates a number
of pages with GFP_KERNEL, and hence it should have reclaimed sufficient
memory to satisfy GFP_ATOMIC allocations.  However after the list of pages
goes to mpage_readpages, there can be significant intervals (including disk
IO) before all the pages are inserted into the radix-tree.  So the reserves
can easily be depleted at that point.  The patch is confirmed to fix the
problem.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:17 -08:00
FUJITA Tomonori
681cc5cd3e iommu sg merging: swiotlb: respect the segment boundary limits
This patch makes swiotlb not allocate a memory area spanning LLD's segment
boundary.

is_span_boundary() judges whether a memory area spans LLD's segment boundary.
If map_single finds such a area, map_single tries to find the next available
memory area.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Greg KH <greg@kroah.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:12 -08:00
FUJITA Tomonori
0291df8cc9 iommu sg: add IOMMU helper functions for the free area management
This adds IOMMU helper functions for the free area management.  These
functions take care of LLD's segment boundary limit for IOMMUs.  They would be
useful for IOMMUs that use bitmap for the free area management.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:11 -08:00
Linus Torvalds
f5bb3a5e9d Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (79 commits)
  Jesper Juhl is the new trivial patches maintainer
  Documentation: mention email-clients.txt in SubmittingPatches
  fs/binfmt_elf.c: spello fix
  do_invalidatepage() comment typo fix
  Documentation/filesystems/porting fixes
  typo fixes in net/core/net_namespace.c
  typo fix in net/rfkill/rfkill.c
  typo fixes in net/sctp/sm_statefuns.c
  lib/: Spelling fixes
  kernel/: Spelling fixes
  include/scsi/: Spelling fixes
  include/linux/: Spelling fixes
  include/asm-m68knommu/: Spelling fixes
  include/asm-frv/: Spelling fixes
  fs/: Spelling fixes
  drivers/watchdog/: Spelling fixes
  drivers/video/: Spelling fixes
  drivers/ssb/: Spelling fixes
  drivers/serial/: Spelling fixes
  drivers/scsi/: Spelling fixes
  ...
2008-02-04 07:58:52 -08:00
Linus Torvalds
519cb68807 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild:
  scsi: fix dependency bug in aic7 Makefile
  kbuild: add svn revision information to setlocalversion
  kbuild: do not warn about __*init/__*exit symbols being exported
  Move Kconfig.instrumentation to arch/Kconfig and init/Kconfig
  Add HAVE_KPROBES
  Add HAVE_OPROFILE
  Create arch/Kconfig
  Fix ARM to play nicely with generic Instrumentation menu
  kconfig: ignore select of unknown symbol
  kconfig: mark config as changed when loading an alternate config
  kbuild: Spelling/grammar fixes for config DEBUG_SECTION_MISMATCH
  Remove __INIT_REFOK and __INITDATA_REFOK
  kbuild: print only total number of section mismatces found
2008-02-04 07:56:17 -08:00
Joe Perches
643d1f7fe3 lib/: Spelling fixes
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 17:48:52 +02:00
Geert Uytterhoeven
d6fbfa4fce kbuild: Spelling/grammar fixes for config DEBUG_SECTION_MISMATCH
Including additional fixes from Randy Dunlap <randy.dunlap@oracle.com>

Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-02-03 08:58:07 +01:00
Sam Ravnborg
e5f95c8b77 kbuild: print only total number of section mismatces found
We have too many section mismatches detected at the moment.
So silence modpost and prevent the option from being
set in a typical allyesconfig build.

Tell the user how to see all the deteils in the summary
message from modpost.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-02-03 08:58:07 +01:00
Dave Young
f70701a34e kobject: kerneldoc comment fix
Fix kerneldoc comment of kobject_create.

Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-02-02 15:14:48 -08:00
Heiko Carstens
aa7d93506c latencytop: Change Kconfig dependency.
Change latencytop Kconfig entry so it doesn't list the archictectures
that support it. Instead introduce HAVE_LATENCY_SUPPORT which any
architecture can set. Should reduce patch conflicts.

Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Holger Wolf <wolf@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-01 17:45:14 +01:00
Bernhard Kaindl
f212ec4b7b x86: early boot debugging via FireWire (ohci1394_dma=early)
This patch adds a new configuration option, which adds support for a new
early_param which gets checked in arch/x86/kernel/setup_{32,64}.c:setup_arch()
to decide wether OHCI-1394 FireWire controllers should be initialized and
enabled for physical DMA access to allow remote debugging of early problems
like issues ACPI or other subsystems which are executed very early.

If the config option is not enabled, no code is changed, and if the boot
paramenter is not given, no new code is executed, and independent of that,
all new code is freed after boot, so the config option can be even enabled
in standard, non-debug kernels.

With specialized tools, it is then possible to get debugging information
from machines which have no serial ports (notebooks) such as the printk
buffer contents, or any data which can be referenced from global pointers,
if it is stored below the 4GB limit and even memory dumps of of the physical
RAM region below the 4GB limit can be taken without any cooperation from the
CPU of the host, so the machine can be crashed early, it does not matter.

In the extreme, even kernel debuggers can be accessed in this way. I wrote
a small kgdb module and an accompanying gdb stub for FireWire which allows
to gdb to talk to kgdb using remote remory reads and writes over FireWire.

An version of the gdb stub fore FireWire is able to read all global data
from a system which is running a a normal kernel without any kernel debugger,
without any interruption or support of the system's CPU. That way, e.g. the
task struct and so on can be read and even manipulated when the physical DMA
access is granted.

A HOWTO is included in this patch, in Documentation/debugging-via-ohci1394.txt
and I've put a copy online at
ftp://ftp.suse.de/private/bk/firewire/docs/debugging-via-ohci1394.txt

It also has links to all the tools which are available to make use of it
another copy of it is online at:
ftp://ftp.suse.de/private/bk/firewire/kernel/ohci1394_dma_early-v2.diff

Signed-Off-By: Bernhard Kaindl <bk@suse.de>
Tested-By: Thomas Renninger <trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:11 +01:00
Arjan van de Ven
6dab27784b x86: add a simple backtrace test module
During the work on the x86 32 and 64 bit backtrace code I found it useful
to have a simple test module to test a process and irq context backtrace.
Since the existing backtrace code was buggy, I figure it might be useful
to have such a test module in the kernel so that maybe we can even
detect such bugs earlier..

[ mingo@elte.hu: build fix ]

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:08 +01:00
Ingo Molnar
d50efc6c40 x86: fix UML and -regparm=3
introduce the "asmregparm" calling convention: for functions
implemented in assembly with a fixed regparm input parameters
calling convention.

mark the semaphore and rwsem slowpath functions with that.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:00 +01:00
Ananth N Mavinakayanahalli
8c1c935642 x86: kprobes: add kprobes smoke tests that run on boot
Here is a quick and naive smoke test for kprobes. This is intended to
just verify if some unrelated change broke the *probes subsystem. It is
self contained, architecture agnostic and isn't of any great use by itself.

This needs to be built in the kernel and runs a basic set of tests to
verify if kprobes, jprobes and kretprobes run fine on the kernel. In case
of an error, it'll print out a message with a "BUG" prefix.

This is a start; we intend to add more tests to this bucket over time.

Thanks to Jim Keniston and Masami Hiramatsu for comments and suggestions.

Tested on x86 (32/64) and powerpc.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30 13:32:53 +01:00
Linus Torvalds
0ba6c33bcd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25: (1470 commits)
  [IPV6] ADDRLABEL: Fix double free on label deletion.
  [PPP]: Sparse warning fixes.
  [IPV4] fib_trie: remove unneeded NULL check
  [IPV4] fib_trie: More whitespace cleanup.
  [NET_SCHED]: Use nla_policy for attribute validation in ematches
  [NET_SCHED]: Use nla_policy for attribute validation in actions
  [NET_SCHED]: Use nla_policy for attribute validation in classifiers
  [NET_SCHED]: Use nla_policy for attribute validation in packet schedulers
  [NET_SCHED]: sch_api: introduce constant for rate table size
  [NET_SCHED]: Use typeful attribute parsing helpers
  [NET_SCHED]: Use typeful attribute construction helpers
  [NET_SCHED]: Use NLA_PUT_STRING for string dumping
  [NET_SCHED]: Use nla_nest_start/nla_nest_end
  [NET_SCHED]: Propagate nla_parse return value
  [NET_SCHED]: act_api: use PTR_ERR in tcf_action_init/tcf_action_get
  [NET_SCHED]: act_api: use nlmsg_parse
  [NET_SCHED]: act_api: fix netlink API conversion bug
  [NET_SCHED]: sch_netem: use nla_parse_nested_compat
  [NET_SCHED]: sch_atm: fix format string warning
  [NETNS]: Add namespace for ICMP replying code.
  ...
2008-01-29 22:54:01 +11:00
Linus Torvalds
5ea293a904 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild: (79 commits)
  Remove references to "make dep"
  kconfig: document use of HAVE_*
  Introduce new section reference annotations tags: __ref, __refdata, __refconst
  kbuild: warn about ld added unique sections
  kbuild: add verbose option to Section mismatch reporting in modpost
  kconfig: tristate choices with mixed tristate and boolean values
  asm-generic/vmlix.lds.h: simplify __mem{init,exit}* dependencies
  remove __attribute_used__
  kbuild: support ARCH=x86 in buildtar
  kconfig: remove "enable"
  kbuild: simplified warning report in modpost
  kbuild: introduce a few helpers in modpost
  kbuild: use simpler section mismatch warnings in modpost
  kbuild: link vmlinux.o before kallsyms passes
  kbuild: introduce new option to enhance section mismatch analysis
  Use separate sections for __dev/__cpu/__mem code/data
  compiler.h: introduce __section()
  all archs: consolidate init and exit sections in vmlinux.lds.h
  kbuild: check section names consistently in modpost
  kbuild: introduce blacklisting in modpost
  ...
2008-01-29 22:46:14 +11:00
Aneesh Kumar K.V
aa02ad67d9 ext4: Add ext4_find_next_bit()
This function is used by the ext4 multi block allocator patches.

Also add generic_find_next_le_bit

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2008-01-28 23:58:27 -05:00
Eric Dumazet
571e768202 [LIB] pcounter : unline too big functions
Before pushing pcounter to Linus tree, I would like to make some adjustments.

Goal is to reduce kernel text size, by unlining too big functions.

When a pcounter is bound to a statically defined per_cpu variable,
we define two small helpers functions. (No more folding function
using the fat for_each_possible_cpu(cpu) ... )

static DEFINE_PER_CPU(int, NAME##_pcounter_values);
static void NAME##_pcounter_add(struct pcounter *self, int val)
{
       __get_cpu_var(NAME##_pcounter_values) += val;
}
static int NAME##_pcounter_getval(const struct pcounter *self, int cpu)
{
       return per_cpu(NAME##_pcounter_values, cpu);
}

Fast path is therefore unchanged, while folding/alloc/free is now unlined.

This saves 228 bytes on i386

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:35 -08:00
Arnaldo Carvalho de Melo
de4d1db369 [LIB]: Introduce struct pcounter
This just generalises what was introduced by Eric Dumazet for the struct proto
inuse field in 286ab3d460:

    [NET]: Define infrastructure to keep 'inuse' changes in an efficent SMP/NUMA way.

Please look at the comment in there to see the rationale.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:39 -08:00
Sam Ravnborg
588ccd732b kbuild: add verbose option to Section mismatch reporting in modpost
If the config option CONFIG_SECTION_MISMATCH is not set and
we see a Section mismatch present the following to the user:

modpost: Found 1 section mismatch(es).
To see additional details select "Enable full Section mismatch analysis"
in the Kernel Hacking menu (CONFIG_SECTION_MISMATCH).

If the option CONFIG_SECTION_MISMATCH is selected
then be verbose in the Section mismatch reporting from mdopost.
Sample outputs:

WARNING: o-x86_64/vmlinux.o(.text+0x7396): Section mismatch in reference from the function discover_ebda() to the variable .init.data:ebda_addr
The function  discover_ebda() references
the variable __initdata ebda_addr.
This is often because discover_ebda lacks a __initdata
annotation or the annotation of ebda_addr is wrong.

WARNING: o-x86_64/vmlinux.o(.data+0x74d58): Section mismatch in reference from the variable pci_serial_quirks to the function .devexit.text:pci_plx9050_exit()
The variable pci_serial_quirks references
the function __devexit pci_plx9050_exit()
If the reference is valid then annotate the
variable with __exit* (see linux/init.h) or name the variable:
*driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console,

WARNING: o-x86_64/vmlinux.o(__ksymtab+0x630): Section mismatch in reference from the variable __ksymtab_arch_register_cpu to the function .cpuinit.text:arch_register_cpu()
The symbol arch_register_cpu is exported and annotated __cpuinit
Fix this by removing the __cpuinit annotation of arch_register_cpu or drop the export.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-01-28 23:21:18 +01:00
Sam Ravnborg
91341d4b2c kbuild: introduce new option to enhance section mismatch analysis
Setting the option DEBUG_SECTION_MISMATCH will
report additional section mismatch'es but this
should in the end makes it possible to get rid of
all of them.

See help text in lib/Kconfig.debug for details.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-01-28 23:21:18 +01:00
James Bottomley
7cedb1f17f SG: work with the SCSI fixed maximum allocations.
SCSI sg table allocation has a maximum size (of SCSI_MAX_SG_SEGMENTS,
currently 128) and this will cause a BUG_ON() in SCSI if something
tries an allocation over it.  This patch adds a size limit to the
chaining allocator to allow the specification of the maximum
allocation size for chaining, so we always chain in units of the
maximum SCSI allocation size.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:54:49 +01:00
Jens Axboe
0db9299f48 SG: Move functions to lib/scatterlist.c and add sg chaining allocator helpers
Manually doing chained sg lists is not trivial, so add some helpers
to make sure that drivers get it right.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:05:27 +01:00
Arjan van de Ven
9745512ce7 sched: latencytop support
LatencyTOP kernel infrastructure; it measures latencies in the
scheduler and tracks it system wide and per process.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:34 +01:00
Ingo Molnar
6478d8800b sched: remove the !PREEMPT_BKL code
remove the !PREEMPT_BKL code.

this removes 160 lines of legacy code.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:33 +01:00