Commit Graph

180179 Commits

Author SHA1 Message Date
Michael Ellerman
24551f64d4 lmb: Add lmb_free()
We can free memory allocated with lmb_alloc() by removing it from the
list of reserved LMBs. Rework lmb_remove() to allow that possibility
and add lmb_free() which exploits it.

BenH: Removed some useless parenthesis

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-03 17:39:50 +11:00
Anton Blanchard
859aefc5af powerpc: Increase NR_IRQS Kconfig maximum to 32768
With dynamic irq descriptors the overhead of a large NR_IRQS is much lower
than it used to be. With more MSI-X capable adapters and drivers exploiting
multiple vectors we may as well allow the user to increase it beyond the
current maximum of 512.

32768 seems large enough that we'd never have to bump it again (although I bet
my prediction is horribly wrong). It boot tests OK and the vmlinux footprint
increase is only around 500kB due to:

struct irq_map_entry irq_map[NR_IRQS];

We format /proc/interrupts correctly with the previous changes:

             CPU0       CPU1       CPU2       CPU3       CPU4       CPU5
  286:          0          0          0          0          0          0
  516:          0          0          0          0          0          0
16689:       1833          0          0          0          0          0
17157:          0          0          0          0          0          0
17158:        319          0          0          0          0          0
25092:          0          0          0          0          0          0

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-03 17:39:50 +11:00
Jean Delvare
98ceb75c7c macintosh/hwmon/ams: Fix device removal sequence
Some code that is in ams_exit() (the module exit code) should instead
be called when the device (not module) is removed. It probably doesn't
make much of a difference in the PMU case, but in the I2C case it does
matter.

I make no guarantee that my fix isn't racy, I'm not familiar enough
with the ams driver code to tell for sure.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Stelian Pop <stelian@popies.net>
Cc: Michael Hanselmann <linux-kernel@hansmi.ch>
Cc: stable@kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-03 17:39:49 +11:00
Jean Delvare
33a470f6d5 macintosh/therm_adt746x: Fix sysfs attributes lifetime
Looking at drivers/macintosh/therm_adt746x.c, the sysfs files are
created in thermostat_init() and removed in thermostat_exit(), which
are the driver's init and exit functions. These files are backed-up by
a per-device structure, so it looks like the wrong thing to do: the
sysfs files have a lifetime longer than the data structure that is
backing it up.

I think that sysfs files creation should be moved to the end of
probe_thermostat() and sysfs files removal should be moved to the
beginning of remove_thermostat().

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Colin Leroy <colin@colino.net>
Cc: stable@kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-03 17:39:49 +11:00
Amit Shah
119ea10947 hvc_console: Remove __devinit annotation from hvc_alloc
Virtio consoles can be hotplugged, so hvc_alloc gets called from
multiple sites: from the initial probe() routine as well as later on
from workqueue handlers which aren't __devinit code.

So, drop the __devinit annotation for hvc_alloc.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Cc: linuxppc-dev@ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-03 17:39:49 +11:00
Rusty Russell
b511306858 hvc_console: Make the ops pointer const.
This is nicer for modern R/O protection.  And noone needs it non-const, so
constify the callers as well.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
To: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: linuxppc-dev@ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-03 17:39:49 +11:00
Peter Tyser
7b62922a07 powerpc/85xx: Fix SMP when "cpu-release-addr" is in lowmem
Recent U-Boot commit 5ccd29c3679b3669b0bde5c501c1aa0f325a7acb caused
the "cpu-release-addr" device tree property to contain the physical RAM
location that secondary cores were spinning at.  Previously, the
"cpu-release-addr" property contained a value referencing the boot page
translation address range of 0xfffffxxx, which then indirectly accessed
RAM.

The "cpu-release-addr" is currently ioremapped and the secondary cores
kicked.  However, due to the recent change in "cpu-release-addr", it
sometimes points to a memory location in low memory that cannot be
ioremapped.  For example on a P2020-based board with 512MB of RAM the
following error occurs on bootup:

  <...>
  mpic: requesting IPIs ...
  __ioremap(): phys addr 0x1ffff000 is RAM lr c05df9a0
  Unable to handle kernel paging request for data at address 0x00000014
  Faulting instruction address: 0xc05df9b0
  Oops: Kernel access of bad area, sig: 11 [#1]
  SMP NR_CPUS=2 P2020 RDB
  Modules linked in:
  <... eventual kernel panic>

Adding logic to conditionally ioremap or access memory directly resolves
the issue.

Signed-off-by: Peter Tyser <ptyser@xes-inc.com>
Signed-off-by: Nate Case <ncase@xes-inc.com>
Reported-by: Dipen Dudhat <B09055@freescale.com>
Tested-by: Dipen Dudhat <B09055@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-03 17:39:49 +11:00
Anton Blanchard
5be3492f97 powerpc: Mark some variables in the page fault path __read_mostly
Using perf to trace L1 dcache misses and dumping data addresses I found a few
variables taking a lot of misses. Since they are almost never written, they
should go into the __read_mostly section.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-03 17:39:48 +11:00
Anton Blanchard
61c03ddbdf powerpc: Replace per_cpu(, smp_processor_id()) with __get_cpu_var()
The cputime code has a few places that do per_cpu(, smp_processor_id()).
Replace them with __get_cpu_var().

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-03 17:39:48 +11:00
Robert P. J. Day
4ba525d134 powerpc: Simplify param.h by including <asm-generic/param.h>
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-03 17:39:48 +11:00
Joe Perches
dc942cee2f powerpc/viodasd: Remove VIOD_KERN_<level> macros for printks
Use #define pr_fmt(fmt) "viod: " fmt
Remove #define VIOD_KERN_WARNING and VIOD_KERN_INFO
Convert printk(VIOD_KERN_<level> to pr_<level>
Coalesce long format strings

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>

 drivers/block/viodasd.c |   86 +++++++++++++++++++---------------------------
 1 files changed, 36 insertions(+), 50 deletions(-)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-03 17:39:48 +11:00
Linus Torvalds
1a45dcfe25 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  cfq-iosched: Do not idle on async queues
  blk-cgroup: Fix potential deadlock in blk-cgroup
  block: fix bugs in bio-integrity mempool usage
  block: fix bio_add_page for non trivial merge_bvec_fn case
  drbd: null dereference bug
  drbd: fix max_segment_size initialization
2010-02-02 12:54:37 -08:00
Nick Piggin
02b709df81 mm: purge fragmented percpu vmap blocks
Improve handling of fragmented per-CPU vmaps.  We previously don't free
up per-CPU maps until all its addresses have been used and freed.  So
fragmented blocks could fill up vmalloc space even if they actually had
no active vmap regions within them.

Add some logic to allow all CPUs to have these blocks purged in the case
of failure to allocate a new vm area, and also put some logic to trim
such blocks of a current CPU if we hit them in the allocation path (so
as to avoid a large build up of them).

Christoph reported some vmap allocation failures when using the per CPU
vmap APIs in XFS, which cannot be reproduced after this patch and the
previous bug fix.

Cc: linux-mm@kvack.org
Cc: stable@kernel.org
Tested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
--
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-02 12:50:47 -08:00
Nick Piggin
de5604231c mm: percpu-vmap fix RCU list walking
RCU list walking of the per-cpu vmap cache was broken.  It did not use
RCU primitives, and also the union of free_list and rcu_head is
obviously wrong (because free_list is indeed the list we are RCU
walking).

While we are there, remove a couple of unused fields from an earlier
iteration.

These APIs aren't actually used anywhere, because of problems with the
XFS conversion.  Christoph has now verified that the problems are solved
with these patches.  Also it is an exported interface, so I think it
will be good to be merged now (and Christoph wants to get the XFS
changes into their local tree).

Cc: stable@kernel.org
Cc: linux-mm@kvack.org
Tested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
--
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-02 12:50:47 -08:00
Linus Torvalds
489b24f2cb Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  random: Remove unused inode variable
  crypto: padlock-sha - Add import/export support
  random: drop weird m_time/a_time manipulation
2010-02-02 12:48:49 -08:00
Linus Torvalds
4dab75ec3e Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes:
  GFS2: Use GFP_NOFS for alloc structure
  GFS2: Fix previous patch
  GFS2: Don't withdraw on partial rindex entries
  GFS2: Fix refcnt leak on gfs2_follow_link() error path
2010-02-02 12:48:26 -08:00
Linus Torvalds
7fbcca25c0 Merge branch 'sh/for-2.6.33' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6
* 'sh/for-2.6.33' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
  sh: Fix access to released memory in clk_debugfs_register_one()
  sh: Fix access to released memory in dwarf_unwinder_cleanup()
  usb: r8a66597-hdc disable interrupts fix
  spi: spi_sh_msiof: Fixed data sampling on the correct edge
2010-02-02 12:47:51 -08:00
Linus Torvalds
e770a0f115 Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
  MIPS: 64-bit: Detect virtual memory size
  MIPS: AR7: Fix USB slave mem range typo
  MIPS: Alchemy: Fix dbdma ring destruction memory debugcheck.
2010-02-02 12:45:33 -08:00
Linus Torvalds
7ab02af428 Fix 'flush_old_exec()/setup_new_exec()' split
Commit 221af7f87b ("Split 'flush_old_exec' into two functions") split
the function at the point of no return - ie right where there were no
more error cases to check.  That made sense from a technical standpoint,
but when we then also combined it with the actual personality setting
going in between flush_old_exec() and setup_new_exec(), it needs to be a
bit more careful.

In particular, we need to make sure that we really flush the old
personality bits in the 'flush' stage, rather than later in the 'setup'
stage, since otherwise we might be flushing the _new_ personality state
that we're just setting up.

So this moves the flags and personality flushing (and 'flush_thread()',
which is the arch-specific function that generally resets lazy FP state
etc) of the old process into flush_old_exec(), so that it doesn't affect
any state that execve() is setting up for the new process environment.

This was reported by Michal Simek as breaking his Microblaze qemu
environment.

Reported-and-tested-by: Michal Simek <michal.simek@petalogix.com>
Cc: Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-02 12:37:44 -08:00
Vivek Goyal
1efe8fe1c2 cfq-iosched: Do not idle on async queues
Few weeks back, Shaohua Li had posted similar patch. I am reposting it
with more test results.

This patch does two things.

- Do not idle on async queues.

- It also changes the write queue depth CFQ drives (cfq_may_dispatch()).
  Currently, we seem to driving queue depth of 1 always for WRITES. This is
  true even if there is only one write queue in the system and all the logic
  of infinite queue depth in case of single busy queue as well as slowly
  increasing queue depth based on last delayed sync request does not seem to
  be kicking in at all.

This patch will allow deeper WRITE queue depths (subjected to the other
WRITE queue depth contstraints like cfq_quantum and last delayed sync
request).

Shaohua Li had reported getting more out of his SSD. For me, I have got
one Lun exported from an HP EVA and when pure buffered writes are on, I
can get more out of the system. Following are test results of pure
buffered writes (with end_fsync=1) with vanilla and patched kernel. These
results are average of 3 sets of run with increasing number of threads.

AVERAGE[bufwfs][vanilla]
-------
job       Set NR  ReadBW(KB/s)   MaxClat(us)    WriteBW(KB/s)  MaxClat(us)
---       --- --  ------------   -----------    -------------  -----------
bufwfs    3   1   0              0              95349          474141
bufwfs    3   2   0              0              100282         806926
bufwfs    3   4   0              0              109989         2.7301e+06
bufwfs    3   8   0              0              116642         3762231
bufwfs    3   16  0              0              118230         6902970

AVERAGE[bufwfs] [patched kernel]
-------
bufwfs    3   1   0              0              270722         404352
bufwfs    3   2   0              0              206770         1.06552e+06
bufwfs    3   4   0              0              195277         1.62283e+06
bufwfs    3   8   0              0              260960         2.62979e+06
bufwfs    3   16  0              0              299260         1.70731e+06

I also ran buffered writes along with some sequential reads and some
buffered reads going on in the system on a SATA disk because the potential
risk could be that we should not be driving queue depth higher in presence
of sync IO going to keep the max clat low.

With some random and sequential reads going on in the system on one SATA
disk I did not see any significant increase in max clat. So it looks like
other WRITE queue depth control logic is doing its job. Here are the
results.

AVERAGE[brr, bsr, bufw together] [vanilla]
-------
job       Set NR  ReadBW(KB/s)   MaxClat(us)    WriteBW(KB/s)  MaxClat(us)
---       --- --  ------------   -----------    -------------  -----------
brr       3   1   850            546345         0              0
bsr       3   1   14650          729543         0              0
bufw      3   1   0              0              23908          8274517

brr       3   2   981.333        579395         0              0
bsr       3   2   14149.7        1175689        0              0
bufw      3   2   0              0              21921          1.28108e+07

brr       3   4   898.333        1.75527e+06    0              0
bsr       3   4   12230.7        1.40072e+06    0              0
bufw      3   4   0              0              19722.3        2.4901e+07

brr       3   8   900            3160594        0              0
bsr       3   8   9282.33        1.91314e+06    0              0
bufw      3   8   0              0              18789.3        23890622

AVERAGE[brr, bsr, bufw mixed] [patched kernel]
-------
job       Set NR  ReadBW(KB/s)   MaxClat(us)    WriteBW(KB/s)  MaxClat(us)
---       --- --  ------------   -----------    -------------  -----------
brr       3   1   837            417973         0              0
bsr       3   1   14357.7        591275         0              0
bufw      3   1   0              0              24869.7        8910662

brr       3   2   1038.33        543434         0              0
bsr       3   2   13351.3        1205858        0              0
bufw      3   2   0              0              18626.3        13280370

brr       3   4   913            1.86861e+06    0              0
bsr       3   4   12652.3        1430974        0              0
bufw      3   4   0              0              15343.3        2.81305e+07

brr       3   8   890            2.92695e+06    0              0
bsr       3   8   9635.33        1.90244e+06    0              0
bufw      3   8   0              0              17200.3        24424392

So looks like it might make sense to include this patch.

Thanks
Vivek

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-02-02 20:46:10 +01:00
Guenter Roeck
91dfc423cc MIPS: 64-bit: Detect virtual memory size
Linux kernel 2.6.32 and later allocate address space from the top of the
kernel virtual memory address space.

This patch implements virtual memory size detection for 64 bit MIPS CPUs
to avoid resulting crashes.

Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Cc: linux-mips@linux-mips.org
Patchwork: http://patchwork.linux-mips.org/patch/935/
Reviewed-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2010-02-02 19:56:23 +01:00
Marek Skuczynski
bc10e875d4 sh: Fix access to released memory in clk_debugfs_register_one()
Signed-off-by: Marek Skuczynski <mareksk7@gmail.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-02-02 11:32:23 +09:00
Marek Skuczynski
00b3e0a2e0 sh: Fix access to released memory in dwarf_unwinder_cleanup()
Signed-off-by: Marek Skuczynski <mareksk7@gmail.com>
Acked-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-02-02 11:32:22 +09:00
Magnus Damm
e5ff15bec9 usb: r8a66597-hdc disable interrupts fix
This patch improves disable_controller() in the r8a66597-hdc
driver to disable all interrupts and clear status flags. It
also makes sure that disable_controller() is called during
probe(). This fixes the relatively rare case of unexpected
pending interrupts after kexec reboot.

Signed-off-by: Magnus Damm <damm@opensource.se>
Acked-by: Yoshihiro Shimoda <shimoda.yoshihiro@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-02-02 11:30:45 +09:00
Markus Pietrek
e8708ef7e8 spi: spi_sh_msiof: Fixed data sampling on the correct edge
The spi_sh_msiof.c driver presently misconfigures REDG and TEDG. TEDG==0
outputs data at the **rising edge** of the clock and REDG==0 samples data
at the **falling edge** of the clock. Therefore for SPI, TEDG must be
equal to REDG, otherwise the last byte received is not sampled in SPI
mode 3.

This brings the driver in line with the SH7723 HW Reference Manual
settings documented in Figures 20.20 and 20.21 ("SPI Clock and data
timing").

Signed-off-by: Markus Pietrek <Markus.Pietrek@emtrion.de>
Acked-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-02-02 11:29:15 +09:00