Commit Graph

854 Commits

Author SHA1 Message Date
Linus Torvalds
79d8a8f736 Merge branch 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu, x86: Add arch-specific this_cpu_cmpxchg_double() support
  percpu: Generic support for this_cpu_cmpxchg_double()
  alpha: use L1_CACHE_BYTES for cacheline size in the linker script
  percpu: align percpu readmostly subsection to cacheline

Fix up trivial conflict in arch/x86/kernel/vmlinux.lds.S due to the
percpu alignment having changed ("x86: Reduce back the alignment of the
per-CPU data section")
2011-03-16 08:22:41 -07:00
Linus Torvalds
9620639b7e Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (26 commits)
  sched: Resched proper CPU on yield_to()
  sched: Allow users with sufficient RLIMIT_NICE to change from SCHED_IDLE policy
  sched: Allow SCHED_BATCH to preempt SCHED_IDLE tasks
  sched: Clean up the IRQ_TIME_ACCOUNTING code
  sched: Add #ifdef around irq time accounting functions
  sched, autogroup: Stop claiming ownership of the root task group
  sched, autogroup: Stop going ahead if autogroup is disabled
  sched, autogroup, sysctl: Use proc_dointvec_minmax() instead
  sched: Fix the group_imb logic
  sched: Clean up some f_b_g() comments
  sched: Clean up remnants of sd_idle
  sched: Wholesale removal of sd_idle logic
  sched: Add yield_to(task, preempt) functionality
  sched: Use a buddy to implement yield_task_fair()
  sched: Limit the scope of clear_buddies
  sched: Check the right ->nr_running in yield_task_fair()
  sched: Avoid expensive initial update_cfs_load(), on UP too
  sched: Fix switch_from_fair()
  sched: Simplify the idle scheduling class
  softirqs: Account ksoftirqd time as cpustat softirq
  ...
2011-03-15 18:37:30 -07:00
Linus Torvalds
a926021cb1 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (184 commits)
  perf probe: Clean up probe_point_lazy_walker() return value
  tracing: Fix irqoff selftest expanding max buffer
  tracing: Align 4 byte ints together in struct tracer
  tracing: Export trace_set_clr_event()
  tracing: Explain about unstable clock on resume with ring buffer warning
  ftrace/graph: Trace function entry before updating index
  ftrace: Add .ref.text as one of the safe areas to trace
  tracing: Adjust conditional expression latency formatting.
  tracing: Fix event alignment: skb:kfree_skb
  tracing: Fix event alignment: mce:mce_record
  tracing: Fix event alignment: kvm:kvm_hv_hypercall
  tracing: Fix event alignment: module:module_request
  tracing: Fix event alignment: ftrace:context_switch and ftrace:wakeup
  tracing: Remove lock_depth from event entry
  perf header: Stop using 'self'
  perf session: Use evlist/evsel for managing perf.data attributes
  perf top: Don't let events to eat up whole header line
  perf top: Fix events overflow in top command
  ring-buffer: Remove unused #include <linux/trace_irq.h>
  tracing: Add an 'overwrite' trace_option.
  ...
2011-03-15 18:31:30 -07:00
Linus Torvalds
b80cd62b7d Merge branch 'core-futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  arm: Remove bogus comment in futex_atomic_cmpxchg_inatomic()
  futex: Deobfuscate handle_futex_death()
  plist: Add priority list test
  plist: Shrink struct plist_head
  futex,plist: Remove debug lock assignment from plist_node
  futex,plist: Pass the real head of the priority list to plist_del()
  futex: Sanitize futex ops argument types
  futex: Sanitize cmpxchg_futex_value_locked API
  futex: Remove redundant pagefault_disable in futex_atomic_cmpxchg_inatomic()
  futex: Avoid redudant evaluation of task_pid_vnr()
  futex: Update futex_wait_setup comments about locking
2011-03-15 18:23:52 -07:00
Al Viro
1abf0c718f New kind of open files - "location only".
New flag for open(2) - O_PATH.  Semantics:
	* pathname is resolved, but the file itself is _NOT_ opened
as far as filesystem is concerned.
	* almost all operations on the resulting descriptors shall
fail with -EBADF.  Exceptions are:
	1) operations on descriptors themselves (i.e.
		close(), dup(), dup2(), dup3(), fcntl(fd, F_DUPFD),
		fcntl(fd, F_DUPFD_CLOEXEC, ...), fcntl(fd, F_GETFD),
		fcntl(fd, F_SETFD, ...))
	2) fcntl(fd, F_GETFL), for a common non-destructive way to
		check if descriptor is open
	3) "dfd" arguments of ...at(2) syscalls, i.e. the starting
		points of pathname resolution
	* closing such descriptor does *NOT* affect dnotify or
posix locks.
	* permissions are checked as usual along the way to file;
no permission checks are applied to the file itself.  Of course,
giving such thing to syscall will result in permission checks (at
the moment it means checking that starting point of ....at() is
a directory and caller has exec permissions on it).

fget() and fget_light() return NULL on such descriptors; use of
fget_raw() and fget_raw_light() is needed to get them.  That protects
existing code from dealing with those things.

There are two things still missing (they come in the next commits):
one is handling of symlinks (right now we refuse to open them that
way; see the next commit for semantics related to those) and another
is descriptor passing via SCM_RIGHTS datagrams.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-15 02:21:45 -04:00
Aneesh Kumar K.V
a51571ccb8 unistd.h: Add new syscalls numbers to asm-generic
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-15 02:21:45 -04:00
Michel Lespinasse
8d7718aa08 futex: Sanitize futex ops argument types
Change futex_atomic_op_inuser and futex_atomic_cmpxchg_inatomic
prototypes to use u32 types for the futex as this is the data type the
futex core code uses all over the place.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Darren Hart <darren@dvhart.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <20110311025058.GD26122@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-11 12:23:31 +01:00
Michel Lespinasse
37a9d912b2 futex: Sanitize cmpxchg_futex_value_locked API
The cmpxchg_futex_value_locked API was funny in that it returned either
the original, user-exposed futex value OR an error code such as -EFAULT.
This was confusing at best, and could be a source of livelocks in places
that retry the cmpxchg_futex_value_locked after trying to fix the issue
by running fault_in_user_writeable().
    
This change makes the cmpxchg_futex_value_locked API more similar to the
get_futex_value_locked one, returning an error code and updating the
original value through a reference argument.
    
Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>  [tile]
Acked-by: Tony Luck <tony.luck@intel.com>  [ia64]
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michal Simek <monstr@monstr.eu>  [microblaze]
Acked-by: David Howells <dhowells@redhat.com> [frv]
Cc: Darren Hart <darren@dvhart.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <20110311024851.GC26122@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-11 12:23:08 +01:00
Jiri Olsa
ea7145477a x86: Separate out entry text section
Put x86 entry code into a separate link section: .entry.text.

Separating the entry text section seems to have performance
benefits - caused by more efficient instruction cache usage.

Running hackbench with perf stat --repeat showed that the change
compresses the icache footprint. The icache load miss rate went
down by about 15%:

 before patch:
         19417627  L1-icache-load-misses      ( +-   0.147% )

 after patch:
         16490788  L1-icache-load-misses      ( +-   0.180% )

The motivation of the patch was to fix a particular kprobes
bug that relates to the entry text section, the performance
advantage was discovered accidentally.

Whole perf output follows:

 - results for current tip tree:

  Performance counter stats for './hackbench/hackbench 10' (500 runs):

         19417627  L1-icache-load-misses      ( +-   0.147% )
       2676914223  instructions             #      0.497 IPC     ( +- 0.079% )
       5389516026  cycles                     ( +-   0.144% )

      0.206267711  seconds time elapsed   ( +-   0.138% )

 - results for current tip tree with the patch applied:

  Performance counter stats for './hackbench/hackbench 10' (500 runs):

         16490788  L1-icache-load-misses      ( +-   0.180% )
       2717734941  instructions             #      0.502 IPC     ( +- 0.079% )
       5414756975  cycles                     ( +-   0.148% )

      0.206747566  seconds time elapsed   ( +-   0.137% )

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: masami.hiramatsu.pt@hitachi.com
Cc: ananth@in.ibm.com
Cc: davem@davemloft.net
Cc: 2nddept-manager@sdl.hitachi.co.jp
LKML-Reference: <20110307181039.GB15197@jolsa.redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-08 17:22:11 +01:00
Ingo Molnar
e0a92c1747 Merge branch 'sched/urgent' into sched/core
Merge reason: Add fixes before applying dependent patches.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-04 11:12:26 +01:00
Tobias Klauser
9b4f06fae4 asm-generic/user.h: Fix spelling in comment
Fix two misspellings.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-03-01 15:49:39 +01:00
Ben Hutchings
fbd7184485 mm: <asm-generic/pgtable.h> must include <linux/mm_types.h>
Commit e2cda32264 ("thp: add pmd mangling generic functions") replaced
some macros in <asm-generic/pgtable.h> with inline functions.

If the functions are to be defined (not all architectures need them)
then struct vm_area_struct must be defined first.  So include
<linux/mm_types.h>.

Fixes a build failure seen in Debian:

    CC [M]  drivers/media/dvb/mantis/mantis_pci.o
  In file included from arch/arm/include/asm/pgtable.h:460,
                   from drivers/media/dvb/mantis/mantis_pci.c:25:
  include/asm-generic/pgtable.h: In function 'ptep_test_and_clear_young':
  include/asm-generic/pgtable.h:29: error: dereferencing pointer to incomplete type

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-28 17:46:49 -08:00
Greg Kroah-Hartman
f227e08b71 Merge 2.6.38-rc6 into tty-next
This was to resolve a merge issue with drivers/char/Makefile and
drivers/tty/serial/68328serial.c

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-24 11:36:31 -08:00
Kay Sievers
3c95c985fa tty: add TIOCVHANGUP to allow clean tty shutdown of all ttys
This is useful for system management software so that it can kick
off things like gettys and everything that's started from a tty,
before we reuse it from/for something else or shut it down.

Without this ioctl it would have to temporarily become the owner of
the tty, then call vhangup() and then give it up again.

Cc: Lennart Poettering <lennart@poettering.net>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-17 14:16:30 -08:00
Ingo Molnar
48fa4b8ecf Merge commit 'v2.6.38-rc5' into sched/core
Merge reason: Pick up upstream fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-16 13:31:55 +01:00
Steven Rostedt
3d56e331b6 tracing: Replace syscall_meta_data struct array with pointer array
Currently the syscall_meta structures for the syscall tracepoints are
placed in the __syscall_metadata section, and at link time, the linker
makes one large array of all these syscall metadata structures. On boot
up, this array is read (much like the initcall sections) and the syscall
data is processed.

The problem is that there is no guarantee that gcc will place complex
structures nicely together in an array format. Two structures in the
same file may be placed awkwardly, because gcc has no clue that they
are suppose to be in an array.

A hack was used previous to force the alignment to 4, to pack the
structures together. But this caused alignment issues with other
architectures (sparc).

Instead of packing the structures into an array, the structures' addresses
are now put into the __syscall_metadata section. As pointers are always the
natural alignment, gcc should always pack them tightly together
(otherwise initcall, extable, etc would also fail).

By having the pointers to the structures in the section, we can still
iterate the trace_events without causing unnecessary alignment problems
with other architectures, or depending on the current behaviour of
gcc that will likely change in the future just to tick us kernel developers
off a little more.

The __syscall_metadata section is also moved into the .init.data section
as it is now only needed at boot up.

Suggested-by: David Miller <davem@davemloft.net>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-02-03 09:29:06 -05:00
Mathieu Desnoyers
6549864629 tracepoints: Fix section alignment using pointer array
Make the tracepoints more robust, making them solid enough to handle compiler
changes by not relying on anything based on compiler-specific behavior with
respect to structure alignment. Implement an approach proposed by David Miller:
use an array of const pointers to refer to the individual structures, and export
this pointer array through the linker script rather than the structures per se.
It will consume 32 extra bytes per tracepoint (24 for structure padding and 8
for the pointers), but are less likely to break due to compiler changes.

History:

commit 7e066fb8 tracepoints: add DECLARE_TRACE() and DEFINE_TRACE()
added the aligned(32) type and variable attribute to the tracepoint structures
to deal with gcc happily aligning statically defined structures on 32-byte
multiples.

One attempt was to use a 8-byte alignment for tracepoint structures by applying
both the variable and type attribute to tracepoint structures definitions and
declarations. It worked fine with gcc 4.5.1, but broke with gcc 4.4.4 and 4.4.5.

The reason is that the "aligned" attribute only specify the _minimum_ alignment
for a structure, leaving both the compiler and the linker free to align on
larger multiples. Because tracepoint.c expects the structures to be placed as an
array within each section, up-alignment cause NULL-pointer exceptions due to the
extra unexpected padding.

(this patch applies on top of -tip)

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: David S. Miller <davem@davemloft.net>
LKML-Reference: <20110126222622.GA10794@Krystal>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-02-03 09:28:46 -05:00
Steven Rostedt
e4a9ea5ee7 tracing: Replace trace_event struct array with pointer array
Currently the trace_event structures are placed in the _ftrace_events
section, and at link time, the linker makes one large array of all
the trace_event structures. On boot up, this array is read (much like
the initcall sections) and the events are processed.

The problem is that there is no guarantee that gcc will place complex
structures nicely together in an array format. Two structures in the
same file may be placed awkwardly, because gcc has no clue that they
are suppose to be in an array.

A hack was used previous to force the alignment to 4, to pack the
structures together. But this caused alignment issues with other
architectures (sparc).

Instead of packing the structures into an array, the structures' addresses
are now put into the _ftrace_event section. As pointers are always the
natural alignment, gcc should always pack them tightly together
(otherwise initcall, extable, etc would also fail).

By having the pointers to the structures in the section, we can still
iterate the trace_events without causing unnecessary alignment problems
with other architectures, or depending on the current behaviour of
gcc that will likely change in the future just to tick us kernel developers
off a little more.

The _ftrace_event section is also moved into the .init.data section
as it is now only needed at boot up.

Suggested-by: David Miller <davem@davemloft.net>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-02-02 21:37:13 -05:00
Venkatesh Pallipadi
a1dabb6bff time: Add nsecs_to_cputime64 interface for asm-generic
Add nsecs_to_cputime64 interface. This is used in following patches that
updates cpu irq stat based on ns granularity info in IRQ_TIME_ACCOUNTING.

Tested-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1292980144-28796-3-git-send-email-venki@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-26 12:33:20 +01:00
Tejun Heo
19df0c2fef percpu: align percpu readmostly subsection to cacheline
Currently percpu readmostly subsection may share cachelines with other
percpu subsections which may result in unnecessary cacheline bounce
and performance degradation.

This patch adds @cacheline parameter to PERCPU() and PERCPU_VADDR()
linker macros, makes each arch linker scripts specify its cacheline
size and use it to align percpu subsections.

This is based on Shaohua's x86 only patch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Shaohua Li <shaohua.li@intel.com>
2011-01-25 14:26:50 +01:00
Dmitry Torokhov
e94965ed5b module: show version information for built-in modules in sysfs
Currently only drivers that are built as modules have their versions
shown in /sys/module/<module_name>/version, but this information might
also be useful for built-in drivers as well. This especially important
for drivers that do not define any parameters - such drivers, if
built-in, are completely invisible from userspace.

This patch changes MODULE_VERSION() macro so that in case when we are
compiling built-in module, version information is stored in a separate
section. Kernel then uses this data to create 'version' sysfs attribute
in the same fashion it creates attributes for module parameters.

Signed-off-by: Dmitry Torokhov <dtor@vmware.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-01-24 14:32:51 +10:30
Andrea Arcangeli
b3697c0255 fix non-x86 build failure in pmdp_get_and_clear
pmdp_get_and_clear/pmdp_clear_flush/pmdp_splitting_flush were trapped as
BUG() and they were defined only to diminish the risk of build issues on
not-x86 archs and to be consistent with the generic pte methods previously
defined in include/asm-generic/pgtable.h.

But they are causing more trouble than they were supposed to solve, so
it's simpler not to define them when THP is off.

This is also correcting the export of pmdp_splitting_flush which is
currently unused (x86 isn't using the generic implementation in
mm/pgtable-generic.c and no other arch needs that [yet]).

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Sam Ravnborg <sam@ravnborg.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-16 15:05:44 -08:00
Andrea Arcangeli
1ddd6db43a thp: mm: define MADV_NOHUGEPAGE
Define MADV_NOHUGEPAGE.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:47 -08:00
Andrea Arcangeli
e2cda32264 thp: add pmd mangling generic functions
Some are needed to build but not actually used on archs not supporting
transparent hugepages.  Others like pmdp_clear_flush are used by x86 too.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:40 -08:00
Andrea Arcangeli
5f6e8da70a thp: special pmd_trans_* functions
These returns 0 at compile time when the config option is disabled, to
allow gcc to eliminate the transparent hugepage function calls at compile
time without additional #ifdefs (only the export of those functions have
to be visible to gcc but they won't be required at link time and
huge_memory.o can be not built at all).

_PAGE_BIT_UNUSED1 is never used for pmd, only on pte.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:40 -08:00