We can free memory allocated with lmb_alloc() by removing it from the
list of reserved LMBs. Rework lmb_remove() to allow that possibility
and add lmb_free() which exploits it.
BenH: Removed some useless parenthesis
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
With dynamic irq descriptors the overhead of a large NR_IRQS is much lower
than it used to be. With more MSI-X capable adapters and drivers exploiting
multiple vectors we may as well allow the user to increase it beyond the
current maximum of 512.
32768 seems large enough that we'd never have to bump it again (although I bet
my prediction is horribly wrong). It boot tests OK and the vmlinux footprint
increase is only around 500kB due to:
struct irq_map_entry irq_map[NR_IRQS];
We format /proc/interrupts correctly with the previous changes:
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5
286: 0 0 0 0 0 0
516: 0 0 0 0 0 0
16689: 1833 0 0 0 0 0
17157: 0 0 0 0 0 0
17158: 319 0 0 0 0 0
25092: 0 0 0 0 0 0
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Some code that is in ams_exit() (the module exit code) should instead
be called when the device (not module) is removed. It probably doesn't
make much of a difference in the PMU case, but in the I2C case it does
matter.
I make no guarantee that my fix isn't racy, I'm not familiar enough
with the ams driver code to tell for sure.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Stelian Pop <stelian@popies.net>
Cc: Michael Hanselmann <linux-kernel@hansmi.ch>
Cc: stable@kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Looking at drivers/macintosh/therm_adt746x.c, the sysfs files are
created in thermostat_init() and removed in thermostat_exit(), which
are the driver's init and exit functions. These files are backed-up by
a per-device structure, so it looks like the wrong thing to do: the
sysfs files have a lifetime longer than the data structure that is
backing it up.
I think that sysfs files creation should be moved to the end of
probe_thermostat() and sysfs files removal should be moved to the
beginning of remove_thermostat().
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Colin Leroy <colin@colino.net>
Cc: stable@kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Virtio consoles can be hotplugged, so hvc_alloc gets called from
multiple sites: from the initial probe() routine as well as later on
from workqueue handlers which aren't __devinit code.
So, drop the __devinit annotation for hvc_alloc.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Cc: linuxppc-dev@ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Recent U-Boot commit 5ccd29c3679b3669b0bde5c501c1aa0f325a7acb caused
the "cpu-release-addr" device tree property to contain the physical RAM
location that secondary cores were spinning at. Previously, the
"cpu-release-addr" property contained a value referencing the boot page
translation address range of 0xfffffxxx, which then indirectly accessed
RAM.
The "cpu-release-addr" is currently ioremapped and the secondary cores
kicked. However, due to the recent change in "cpu-release-addr", it
sometimes points to a memory location in low memory that cannot be
ioremapped. For example on a P2020-based board with 512MB of RAM the
following error occurs on bootup:
<...>
mpic: requesting IPIs ...
__ioremap(): phys addr 0x1ffff000 is RAM lr c05df9a0
Unable to handle kernel paging request for data at address 0x00000014
Faulting instruction address: 0xc05df9b0
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=2 P2020 RDB
Modules linked in:
<... eventual kernel panic>
Adding logic to conditionally ioremap or access memory directly resolves
the issue.
Signed-off-by: Peter Tyser <ptyser@xes-inc.com>
Signed-off-by: Nate Case <ncase@xes-inc.com>
Reported-by: Dipen Dudhat <B09055@freescale.com>
Tested-by: Dipen Dudhat <B09055@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Using perf to trace L1 dcache misses and dumping data addresses I found a few
variables taking a lot of misses. Since they are almost never written, they
should go into the __read_mostly section.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The cputime code has a few places that do per_cpu(, smp_processor_id()).
Replace them with __get_cpu_var().
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
cfq-iosched: Do not idle on async queues
blk-cgroup: Fix potential deadlock in blk-cgroup
block: fix bugs in bio-integrity mempool usage
block: fix bio_add_page for non trivial merge_bvec_fn case
drbd: null dereference bug
drbd: fix max_segment_size initialization
Improve handling of fragmented per-CPU vmaps. We previously don't free
up per-CPU maps until all its addresses have been used and freed. So
fragmented blocks could fill up vmalloc space even if they actually had
no active vmap regions within them.
Add some logic to allow all CPUs to have these blocks purged in the case
of failure to allocate a new vm area, and also put some logic to trim
such blocks of a current CPU if we hit them in the allocation path (so
as to avoid a large build up of them).
Christoph reported some vmap allocation failures when using the per CPU
vmap APIs in XFS, which cannot be reproduced after this patch and the
previous bug fix.
Cc: linux-mm@kvack.org
Cc: stable@kernel.org
Tested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
--
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
RCU list walking of the per-cpu vmap cache was broken. It did not use
RCU primitives, and also the union of free_list and rcu_head is
obviously wrong (because free_list is indeed the list we are RCU
walking).
While we are there, remove a couple of unused fields from an earlier
iteration.
These APIs aren't actually used anywhere, because of problems with the
XFS conversion. Christoph has now verified that the problems are solved
with these patches. Also it is an exported interface, so I think it
will be good to be merged now (and Christoph wants to get the XFS
changes into their local tree).
Cc: stable@kernel.org
Cc: linux-mm@kvack.org
Tested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
--
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'sh/for-2.6.33' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
sh: Fix access to released memory in clk_debugfs_register_one()
sh: Fix access to released memory in dwarf_unwinder_cleanup()
usb: r8a66597-hdc disable interrupts fix
spi: spi_sh_msiof: Fixed data sampling on the correct edge
Commit 221af7f87b ("Split 'flush_old_exec' into two functions") split
the function at the point of no return - ie right where there were no
more error cases to check. That made sense from a technical standpoint,
but when we then also combined it with the actual personality setting
going in between flush_old_exec() and setup_new_exec(), it needs to be a
bit more careful.
In particular, we need to make sure that we really flush the old
personality bits in the 'flush' stage, rather than later in the 'setup'
stage, since otherwise we might be flushing the _new_ personality state
that we're just setting up.
So this moves the flags and personality flushing (and 'flush_thread()',
which is the arch-specific function that generally resets lazy FP state
etc) of the old process into flush_old_exec(), so that it doesn't affect
any state that execve() is setting up for the new process environment.
This was reported by Michal Simek as breaking his Microblaze qemu
environment.
Reported-and-tested-by: Michal Simek <michal.simek@petalogix.com>
Cc: Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Few weeks back, Shaohua Li had posted similar patch. I am reposting it
with more test results.
This patch does two things.
- Do not idle on async queues.
- It also changes the write queue depth CFQ drives (cfq_may_dispatch()).
Currently, we seem to driving queue depth of 1 always for WRITES. This is
true even if there is only one write queue in the system and all the logic
of infinite queue depth in case of single busy queue as well as slowly
increasing queue depth based on last delayed sync request does not seem to
be kicking in at all.
This patch will allow deeper WRITE queue depths (subjected to the other
WRITE queue depth contstraints like cfq_quantum and last delayed sync
request).
Shaohua Li had reported getting more out of his SSD. For me, I have got
one Lun exported from an HP EVA and when pure buffered writes are on, I
can get more out of the system. Following are test results of pure
buffered writes (with end_fsync=1) with vanilla and patched kernel. These
results are average of 3 sets of run with increasing number of threads.
AVERAGE[bufwfs][vanilla]
-------
job Set NR ReadBW(KB/s) MaxClat(us) WriteBW(KB/s) MaxClat(us)
--- --- -- ------------ ----------- ------------- -----------
bufwfs 3 1 0 0 95349 474141
bufwfs 3 2 0 0 100282 806926
bufwfs 3 4 0 0 109989 2.7301e+06
bufwfs 3 8 0 0 116642 3762231
bufwfs 3 16 0 0 118230 6902970
AVERAGE[bufwfs] [patched kernel]
-------
bufwfs 3 1 0 0 270722 404352
bufwfs 3 2 0 0 206770 1.06552e+06
bufwfs 3 4 0 0 195277 1.62283e+06
bufwfs 3 8 0 0 260960 2.62979e+06
bufwfs 3 16 0 0 299260 1.70731e+06
I also ran buffered writes along with some sequential reads and some
buffered reads going on in the system on a SATA disk because the potential
risk could be that we should not be driving queue depth higher in presence
of sync IO going to keep the max clat low.
With some random and sequential reads going on in the system on one SATA
disk I did not see any significant increase in max clat. So it looks like
other WRITE queue depth control logic is doing its job. Here are the
results.
AVERAGE[brr, bsr, bufw together] [vanilla]
-------
job Set NR ReadBW(KB/s) MaxClat(us) WriteBW(KB/s) MaxClat(us)
--- --- -- ------------ ----------- ------------- -----------
brr 3 1 850 546345 0 0
bsr 3 1 14650 729543 0 0
bufw 3 1 0 0 23908 8274517
brr 3 2 981.333 579395 0 0
bsr 3 2 14149.7 1175689 0 0
bufw 3 2 0 0 21921 1.28108e+07
brr 3 4 898.333 1.75527e+06 0 0
bsr 3 4 12230.7 1.40072e+06 0 0
bufw 3 4 0 0 19722.3 2.4901e+07
brr 3 8 900 3160594 0 0
bsr 3 8 9282.33 1.91314e+06 0 0
bufw 3 8 0 0 18789.3 23890622
AVERAGE[brr, bsr, bufw mixed] [patched kernel]
-------
job Set NR ReadBW(KB/s) MaxClat(us) WriteBW(KB/s) MaxClat(us)
--- --- -- ------------ ----------- ------------- -----------
brr 3 1 837 417973 0 0
bsr 3 1 14357.7 591275 0 0
bufw 3 1 0 0 24869.7 8910662
brr 3 2 1038.33 543434 0 0
bsr 3 2 13351.3 1205858 0 0
bufw 3 2 0 0 18626.3 13280370
brr 3 4 913 1.86861e+06 0 0
bsr 3 4 12652.3 1430974 0 0
bufw 3 4 0 0 15343.3 2.81305e+07
brr 3 8 890 2.92695e+06 0 0
bsr 3 8 9635.33 1.90244e+06 0 0
bufw 3 8 0 0 17200.3 24424392
So looks like it might make sense to include this patch.
Thanks
Vivek
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch improves disable_controller() in the r8a66597-hdc
driver to disable all interrupts and clear status flags. It
also makes sure that disable_controller() is called during
probe(). This fixes the relatively rare case of unexpected
pending interrupts after kexec reboot.
Signed-off-by: Magnus Damm <damm@opensource.se>
Acked-by: Yoshihiro Shimoda <shimoda.yoshihiro@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
The spi_sh_msiof.c driver presently misconfigures REDG and TEDG. TEDG==0
outputs data at the **rising edge** of the clock and REDG==0 samples data
at the **falling edge** of the clock. Therefore for SPI, TEDG must be
equal to REDG, otherwise the last byte received is not sampled in SPI
mode 3.
This brings the driver in line with the SH7723 HW Reference Manual
settings documented in Figures 20.20 and 20.21 ("SPI Clock and data
timing").
Signed-off-by: Markus Pietrek <Markus.Pietrek@emtrion.de>
Acked-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>