Introduce __iowrite64_copy. It will be used by the Myri-10G Ethernet
driver to post requests to the NIC. This driver will be submitted soon.
__iowrite64_copy copies to I/O memory in units of 64 bits when possible (on
64 bit architectures). It reverts to __iowrite32_copy on 32 bit
architectures.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* git://git.infradead.org/~dwmw2/rbtree-2.6:
[RBTREE] Switch rb_colour() et al to en_US spelling of 'color' for consistency
Update UML kernel/physmem.c to use rb_parent() accessor macro
[RBTREE] Update hrtimers to use rb_parent() accessor macro.
[RBTREE] Add explicit alignment to sizeof(long) for struct rb_node.
[RBTREE] Merge colour and parent fields of struct rb_node.
[RBTREE] Remove dead code in rb_erase()
[RBTREE] Update JFFS2 to use rb_parent() accessor macro.
[RBTREE] Update eventpoll.c to use rb_parent() accessor macro.
[RBTREE] Update key.c to use rb_parent() accessor macro.
[RBTREE] Update ext3 to use rb_parent() accessor macro.
[RBTREE] Change rbtree off-tree marking in I/O schedulers.
[RBTREE] Add accessor macros for colour and parent fields of rb_node
Since rb_insert_color() is part of the _public_ API, while the others are
purely internal, switch to be consistent with that.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
People don't like released kernels yelling at them, no matter how real the
error might be. So only report it if CONFIG_KOBJECT_DEBUG is enabled.
Sent on request of Andrew Morton.
(akpm: should bring this back post-2.6.17)
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
For sparc32 we need R_SPARC_UA32 relocation support, for
sparc64 we need the handle R_SPARC_DISP32 relocations.
Based upon reports and initial patch by Martin Habets.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch contains the following possible cleanups:
- #if 0 the following unused global function:
- subsys_remove_file()
- remove the following unused EXPORT_SYMBOL's:
- kset_find_obj
- subsystem_init
- remove the following unused EXPORT_SYMBOL_GPL:
- kobject_add_dir
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We only used a single bit for colour information, so having a whole
machine word of space allocated for it was a bit wasteful. Instead,
store it in the lowest bit of the 'parent' pointer, since that was
always going to be aligned anyway.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Observe rb_erase(), when the victim node 'old' has two children so
neither of the simple cases at the beginning are taken.
Observe that it effectively does an 'rb_next()' operation to find the
next (by value) node in the tree. That is; we go to the victim's
right-hand child and then follow left-hand pointers all the way
down the tree as far as we can until we find the next node 'node'. We
end up with 'node' being either the same immediate right-hand child of
'old', or one of its descendants on the far left-hand side.
For a start, we _know_ that 'node' has a parent. We can drop that check.
We also know that if 'node's parent is 'old', then 'node' is the
right-hand child of its parent. And that if 'node's parent is _not_
'old', then 'node' is the left-hand child of its parent.
So instead of checking for 'node->rb_parent == old' in one place and
also checking 'node's heritage separately when we're trying to change
its link from its parent, we can shuffle things around a bit and do
it like this...
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
DEBUG_MUTEX flag is on by default in current kernel configuration.
During performance testing, we saw mutex debug functions like
mutex_debug_check_no_locks_freed (called by kfree()) is expensive as it
goes through a global list of memory areas with mutex lock and do the
checking. For benchmarks such as Volanomark and Hackbench, we have seen
more than 40% drop in performance on some platforms. We suggest to set
DEBUG_MUTEX off by default. Or at least do that later when we feel that
the mutex changes in the current code have stabilized.
Signed-off-by: Tim Chen <tim.c.chen@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It works like this:
Open the file
Read all the contents.
Call poll requesting POLLERR or POLLPRI (so select/exceptfds works)
When poll returns,
close the file and go to top of loop.
or lseek to start of file and go back to the 'read'.
Events are signaled by an object manager calling
sysfs_notify(kobj, dir, attr);
If the dir is non-NULL, it is used to find a subdirectory which
contains the attribute (presumably created by sysfs_create_group).
This has a cost of one int per attribute, one wait_queuehead per kobject,
one int per open file.
The name "sysfs_notify" may be confused with the inotify
functionality. Maybe it would be nice to support inotify for sysfs
attributes as well?
This patch also uses sysfs_notify to allow /sys/block/md*/md/sync_action
to be pollable
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Some string functions were safely overrideable in lib/string.c, but their
corresponding declarations in linux/string.h were not. Correct this, and
make strcspn overrideable.
Odds of someone wanting to do optimized assembly of these are small, but
for the sake of cleanliness, might as well bring them into line with the
rest of the file.
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
While cleaning up parisc_ksyms.c earlier, I noticed that strpbrk wasn't
being exported from lib/string.c. Investigating further, I noticed a
changeset that removed its export and added it to _ksyms.c on a few more
architectures. The justification was that "other arches do it."
I think this is wrong, since no architecture currently defines
__HAVE_ARCH_STRPBRK, there's no reason for any of them to be exporting it
themselves. Therefore, consolidate the export to lib/string.c.
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove CONFIG_DEBUG_IOREMAP, it's now obsolete and won't work anyway.
Remove it from lib/KConfig since it was only available on parisc.
Signed-off-by: Helge Deller <deller@parisc-linux.org>
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
<linux@horizon.com> wrote:
This is an extremely well-known technique. You can see a similar version that
uses a multiply for the last few steps at
http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel whch
refers to "Software Optimization Guide for AMD Athlon 64 and Opteron
Processors"
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF
It's section 8.6, "Efficient Implementation of Population-Count Function in
32-bit Mode", pages 179-180.
It uses the name that I am more familiar with, "popcount" (population count),
although "Hamming weight" also makes sense.
Anyway, the proof of correctness proceeds as follows:
b = a - ((a >> 1) & 0x55555555);
c = (b & 0x33333333) + ((b >> 2) & 0x33333333);
d = (c + (c >> 4)) & 0x0f0f0f0f;
#if SLOW_MULTIPLY
e = d + (d >> 8)
f = e + (e >> 16);
return f & 63;
#else
/* Useful if multiply takes at most 4 cycles */
return (d * 0x01010101) >> 24;
#endif
The input value a can be thought of as 32 1-bit fields each holding their own
hamming weight. Now look at it as 16 2-bit fields. Each 2-bit field a1..a0
has the value 2*a1 + a0. This can be converted into the hamming weight of the
2-bit field a1+a0 by subtracting a1.
That's what the (a >> 1) & mask subtraction does. Since there can be no
borrows, you can just do it all at once.
Enumerating the 4 possible cases:
0b00 = 0 -> 0 - 0 = 0
0b01 = 1 -> 1 - 0 = 1
0b10 = 2 -> 2 - 1 = 1
0b11 = 3 -> 3 - 1 = 2
The next step consists of breaking up b (made of 16 2-bir fields) into
even and odd halves and adding them into 4-bit fields. Since the largest
possible sum is 2+2 = 4, which will not fit into a 4-bit field, the 2-bit
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
"which will not fit into a 2-bit field"
fields have to be masked before they are added.
After this point, the masking can be delayed. Each 4-bit field holds a
population count from 0..4, taking at most 3 bits. These numbers can be added
without overflowing a 4-bit field, so we can compute c + (c >> 4), and only
then mask off the unwanted bits.
This produces d, a number of 4 8-bit fields, each in the range 0..8. From
this point, we can shift and add d multiple times without overflowing an 8-bit
field, and only do a final mask at the end.
The number to mask with has to be at least 63 (so that 32 on't be truncated),
but can also be 128 or 255. The x86 has a special encoding for signed
immediate byte values -128..127, so the value of 255 is slower. On other
processors, a special "sign extend byte" instruction might be faster.
On a processor with fast integer multiplies (Athlon but not P4), you can
reduce the final few serially dependent instructions to a single integer
multiply. Consider d to be 3 8-bit values d3, d2, d1 and d0, each in the
range 0..8. The multiply forms the partial products:
d3 d2 d1 d0
d3 d2 d1 d0
d3 d2 d1 d0
+ d3 d2 d1 d0
----------------------
e3 e2 e1 e0
Where e3 = d3 + d2 + d1 + d0. e2, e1 and e0 obviously cannot generate
any carries.
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
By defining generic hweight*() routines
- hweight64() will be defined on all architectures
- hweight_long() will use architecture optimized hweight32() or hweight64()
I found two possible cleanups by these reasons.
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch introduces the C-language equivalents of the functions below:
int ext2_set_bit(int nr, volatile unsigned long *addr);
int ext2_clear_bit(int nr, volatile unsigned long *addr);
int ext2_test_bit(int nr, const volatile unsigned long *addr);
unsigned long ext2_find_first_zero_bit(const unsigned long *addr,
unsigned long size);
unsinged long ext2_find_next_zero_bit(const unsigned long *addr,
unsigned long size);
In include/asm-generic/bitops/ext2-non-atomic.h
This code largely copied from:
include/asm-powerpc/bitops.h
include/asm-parisc/bitops.h
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch introduces the C-language equivalents of the functions below:
unsigned int hweight32(unsigned int w);
unsigned int hweight16(unsigned int w);
unsigned int hweight8(unsigned int w);
unsigned long hweight64(__u64 w);
In include/asm-generic/bitops/hweight.h
This code largely copied from: include/linux/bitops.h
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch introduces the C-language equivalents of the functions below:
unsigned logn find_next_bit(const unsigned long *addr, unsigned long size,
unsigned long offset);
unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
unsigned long offset);
unsigned long find_first_zero_bit(const unsigned long *addr,
unsigned long size);
unsigned long find_first_bit(const unsigned long *addr, unsigned long size);
In include/asm-generic/bitops/find.h
This code largely copied from: arch/powerpc/lib/bitops.c
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
DEBUG_KERNEL is often enabled just for sysrq, but this doesn't
mean the user wants more heavyweight debugging information.
Cc: jbeulich@novell.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (21 commits)
BUG_ON() Conversion in drivers/video/
BUG_ON() Conversion in drivers/parisc/
BUG_ON() Conversion in drivers/block/
BUG_ON() Conversion in sound/sparc/cs4231.c
BUG_ON() Conversion in drivers/s390/block/dasd.c
BUG_ON() Conversion in lib/swiotlb.c
BUG_ON() Conversion in kernel/cpu.c
BUG_ON() Conversion in ipc/msg.c
BUG_ON() Conversion in block/elevator.c
BUG_ON() Conversion in fs/coda/
BUG_ON() Conversion in fs/binfmt_elf_fdpic.c
BUG_ON() Conversion in input/serio/hil_mlc.c
BUG_ON() Conversion in md/dm-hw-handler.c
BUG_ON() Conversion in md/bitmap.c
The comment describing how MS_ASYNC works in msync.c is confusing
rcu: undeclared variable used in documentation
fix typos "wich" -> "which"
typo patch for fs/ufs/super.c
Fix simple typos
tabify drivers/char/Makefile
...
text data bss dec hex filename
before: 3605597 1363528 363328 5332453 515de5 vmlinux
after: 3605295 1363612 363200 5332107 515c8b vmlinux
218 bytes saved.
Also, optimise any_online_cpu() out of existence on CONFIG_SMP=n.
This function seems inefficient. Can't we simply AND the two masks, then use
find_first_bit()?
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>