Create a kernel/rcutree_plugin.h file that contains definitions
for preemptable RCU (or, under the #else branch of the #ifdef,
empty definitions for the classic non-preemptable semantics).
These definitions fit into plugins defined in kernel/rcutree.c
for this purpose.
This variant of preemptable RCU uses a new algorithm whose
read-side expense is roughly that of classic hierarchical RCU
under CONFIG_PREEMPT. This new algorithm's update-side expense
is similar to that of classic hierarchical RCU, and, in absence
of read-side preemption or blocking, is exactly that of classic
hierarchical RCU. Perhaps more important, this new algorithm
has a much simpler implementation, saving well over 1,000 lines
of code compared to mainline's implementation of preemptable
RCU, which will hopefully be retired in favor of this new
algorithm.
The simplifications are obtained by maintaining per-task
nesting state for running tasks, and using a simple
lock-protected algorithm to handle accounting when tasks block
within RCU read-side critical sections, making use of lessons
learned while creating numerous user-level RCU implementations
over the past 18 months.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: akpm@linux-foundation.org
Cc: mathieu.desnoyers@polymtl.ca
Cc: josht@linux.vnet.ibm.com
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
LKML-Reference: <12509746134003-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When 'and'ing two bitmasks (where 'andnot' is a variation on it), some
cases want to know whether the result is the empty set or not. In
particular, the TLB IPI sending code wants to do cpumask operations and
determine if there are any CPU's left in the final set.
So this just makes the bitmask (and cpumask) functions return a boolean
for whether the result has any bits set.
Cc: stable@kernel.org (2.6.30, needed by TLB shootdown fix)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
swiotlb_full() in lib/swiotlb.c throws one of two panic messages
based on whether the direction of transfer is from the device
or to the device. The logic around this is somewhat weird in
the case of bidirectional transfers. It appears to want to
throw both in succession, but since its a panic only the first
makes it.
This patch adds a third, separate error for DMA_BIDIRECTIONAL
to make things a bit clearer.
Signed-off-by: Casey Dahlin <cdahlin@redhat.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Becky Bruce <beckyb@kernel.crashing.org>
[ further fixed the error message ]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <200908202327.n7KNRuqK001504@imap1.linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
While it's debatable whether or not a NULL device argument to
the DMA API functions is valid... since it certainly isn't
valid on devices with an IOMMU... dma-debug really shouldn't be
dereferencing null pointers either.
Guard against that in err_printk and the driver_filter
functions. A Fedora rawhide user was seeing this in one of the
dvb drivers resulting in an oops on boot.
[ A patch has been sent for testing to the driver, but I feel
the dma debugging support should be fixed as well. (There's
still a pile of legacy garbage in the kernel passing null
pointers to dma_{alloc,free}_*. :( ]
Signed-off-by: Kyle McMartin <kyle@redhat.com>
Cc: mchehab@infradead.org
Cc: Joerg Roedel <joerg.roedel@amd.com>
LKML-Reference: <20090820011708.GP25206@bombadil.infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
arch/sparc/kernel/smp_64.c
arch/x86/kernel/cpu/perf_counter.c
arch/x86/kernel/setup_percpu.c
drivers/cpufreq/cpufreq_ondemand.c
mm/percpu.c
Conflicts in core and arch percpu codes are mostly from commit
ed78e1e078dd44249f88b1dd8c76dafb39567161 which substituted many
num_possible_cpus() with nr_cpu_ids. As for-next branch has moved all
the first chunk allocators into mm/percpu.c, the changes are moved
from arch code to mm/percpu.c.
Signed-off-by: Tejun Heo <tj@kernel.org>
decompress_bunzip2 and decompress_unlzma have a nasty hack that subtracts
4 from the input length if being called in the pre-boot environment.
This is a nasty hack because it relies on the fact that flush = NULL only
when called from the pre-boot environment (i.e.
arch/x86/boot/compressed/misc.c). initramfs.c/do_mounts_rd.c pass in a
flush buffer (flush != NULL).
This hack prevents the decompressors from being used with flush = NULL by
other callers unless knowledge of the hack is propagated to them.
This patch removes the hack by making decompress (called only from the
pre-boot environment) a wrapper function that subtracts 4 from the input
length before calling the decompressor.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix and improve comments in decompress/generic.h that describe the
decompressor API. Also remove an unused definition, and rename INBUF_LEN
in lib/decompress_inflate.c to conform to bzip2/lzma naming.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sg_miter_start() is currently unaware of the direction of the copy
process (to or from the scatter list). It is important to know the
direction because the page has to be flushed in case the data written
is seen on a different mapping in user land on cache incoherent
architectures.
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Pierre Ossman <pierre@ossman.eu>
Once a structure goes over PAGE_SIZE*2, we see occasional allocation
failures. Some people have chosen to switch over to things like vmalloc()
that will let them keep array-like access to such a large structures.
But, vmalloc() has plenty of downsides.
Here's an alternative. I think it's what Andrew was suggesting here:
http://lkml.org/lkml/2009/7/2/518
I call it a flexible array. It does all of its work in PAGE_SIZE bits, so
never does an order>0 allocation. The base level has
PAGE_SIZE-2*sizeof(int) bytes of storage for pointers to the second level.
So, with a 32-bit arch, you get about 4MB (4183112 bytes) of total
storage when the objects pack nicely into a page. It is half that on
64-bit because the pointers are twice the size. There's a table detailing
this in the code.
There are kerneldocs for the functions, but here's an
overview:
flex_array_alloc() - dynamically allocate a base structure
flex_array_free() - free the array and all of the
second-level pages
flex_array_free_parts() - free the second-level pages, but
not the base (for static bases)
flex_array_put() - copy into the array at the given index
flex_array_get() - copy out of the array at the given index
flex_array_prealloc() - preallocate the second-level pages
between the given indexes to
guarantee no allocs will occur at
put() time.
We could also potentially just pass the "element_size" into each of the
API functions instead of storing it internally. That would get us one
more base pointer on 32-bit.
I've been testing this by running it in userspace. The header and patch
that I've been using are here, as well as the little script I'm using to
generate the size table which goes in the kerneldocs.
http://sr71.net/~dave/linux/flexarray/
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The generic atomic64_t implementation in lib/ did not export the functions
it defined, which means that modules that use atomic64_t would not link on
platforms (such as 32-bit powerpc). For example, trying to build a kernel
with CONFIG_NET_RDS on such a platform would fail with:
ERROR: "atomic64_read" [net/rds/rds.ko] undefined!
ERROR: "atomic64_set" [net/rds/rds.ko] undefined!
Fix this by exporting the atomic64_t functions to modules. (I export the
entire API even if it's not all currently used by in-tree modules to avoid
having to continue fixing this in dribs and drabs)
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This converts swiotlb to use phys_to_dma and dma_to_phys instead of
swiotlb_phys_to_bus() and swiotlb_bus_to_phys().
swiotlb_phys_to_bus() and swiotlb_bus_to_phys() are not necessary so
this patch also removes them.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Becky Bruce <beckyb@kernel.crashing.org>
This converts swiotlb to use dma_capable() instead of
swiotlb_arch_address_needs_mapping() and is_buffer_dma_capable().
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Becky Bruce <beckyb@kernel.crashing.org>
is_current_single_threaded() can safely miss a freshly forked CLONE_VM
task, but in this case it must not miss its parent. That is why we take
mm->mmap_sem for writing to make sure a thread/task with the same ->mm
can't pass exit_mm() and disappear.
However we can avoid ->mmap_sem and rely on rcu/barriers:
- if we do not see the exiting parent on thread/process list
we see the result of list_del_rcu(), in this case we must
also see the result of list_add_rcu() which does wmb().
- if we do see the parent but its ->mm == NULL, we need rmb()
to make sure we can't miss the child.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>