Commit Graph

11425 Commits

Author SHA1 Message Date
Anton Blanchard
228e548e60 net: Add sendmmsg socket system call
This patch adds a multiple message send syscall and is the send
version of the existing recvmmsg syscall. This is heavily
based on the patch by Arnaldo that added recvmmsg.

I wrote a microbenchmark to test the performance gains of using
this new syscall:

http://ozlabs.org/~anton/junkcode/sendmmsg_test.c

The test was run on a ppc64 box with a 10 Gbit network card. The
benchmark can send both UDP and RAW ethernet packets.

64B UDP

batch   pkts/sec
1       804570
2       872800 (+ 8 %)
4       916556 (+14 %)
8       939712 (+17 %)
16      952688 (+18 %)
32      956448 (+19 %)
64      964800 (+20 %)

64B raw socket

batch   pkts/sec
1       1201449
2       1350028 (+12 %)
4       1461416 (+22 %)
8       1513080 (+26 %)
16      1541216 (+28 %)
32      1553440 (+29 %)
64      1557888 (+30 %)

We see a 20% improvement in throughput on UDP send and 30%
on raw socket send.

[ Add sparc syscall entries. -DaveM ]

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-05 11:10:14 -07:00
Randy Dunlap
f9fa0bc1fa signal.c: fix erroneous syscall kernel-doc
Fix erroneous syscall kernel-doc comments in kernel/signal.c.

Reported-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-08 11:05:24 -07:00
Linus Torvalds
8b9686ff4d Merge branches 'x86-fixes-for-linus', 'sched-fixes-for-linus', 'timers-fixes-for-linus', 'irq-fixes-for-linus' and 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86-32, fpu: Fix FPU exception handling on non-SSE systems
  x86, hibernate: Initialize mmu_cr4_features during boot
  x86-32, NUMA: Fix ACPI NUMA init broken by recent x86-64 change
  x86: visws: Fixup irq overhaul fallout

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Clean up rebalance_domains() load-balance interval calculation

* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86/mrst/vrtc: Fix boot crash in mrst_rtc_init()
  rtc, x86/mrst/vrtc: Fix boot crash in rtc_read_alarm()

* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  genirq: Fix cpumask leak in __setup_irq()

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf probe: Fix listing incorrect line number with inline function
  perf probe: Fix to find recursively inlined function
  perf probe: Fix multiple --vars options behavior
  perf probe: Fix to remove redundant close
  perf probe: Fix to ensure function declared file
2011-04-07 12:12:58 -07:00
Linus Torvalds
42933bac11 Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6
* 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6:
  Fix common misspellings
2011-04-07 11:14:49 -07:00
Peter Zijlstra
49c022e657 sched: Clean up rebalance_domains() load-balance interval calculation
Instead of the possible multiple-evaluation of num_online_cpus()
in rebalance_domains() that Linus reported, avoid it altogether
in the normal case since it's implemented with a Hamming weight
function over a cpu bitmask which can be darn expensive for those
with big iron.

This also makes it cleaner, smaller and documents the code.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1301991265.2225.12.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-05 10:29:36 +02:00
Randy Dunlap
41c57892a2 kernel/signal.c: add kernel-doc notation to syscalls
Add kernel-doc to syscalls in signal.c.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-04 17:51:46 -07:00
Randy Dunlap
5aba085ede kernel/signal.c: fix typos and coding style
General coding style and comment fixes; no code changes:

 - Use multi-line-comment coding style.
 - Put some function signatures completely on one line.
 - Hyphenate some words.
 - Spell Posix as POSIX.
 - Correct typos & spellos in some comments.
 - Drop trailing whitespace.
 - End sentences with periods.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-04 17:51:46 -07:00
Linus Torvalds
148086bb64 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Fix rebalance interval calculation
  sched, doc: Beef up load balancing description
  sched: Leave sched_setscheduler() earlier if possible, do not disturb SCHED_FIFO tasks
2011-04-04 08:36:58 -07:00
Linus Torvalds
4da7e90e65 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf: Fix task_struct reference leak
  perf: Fix task context scheduling
  perf: mmap 512 kiB by default
  perf: Rebase max unprivileged mlock threshold on top of page size
  perf tools: Fix NO_NEWT=1 python build error
  perf symbols: Properly align symbol_conf.priv_size
  perf tools: Emit clearer message for sys_perf_event_open ENOENT return
  perf tools: Fixup exit path when not able to open events
  perf symbols: Fix vsyscall symbol lookup
  oprofile, x86: Allow setting EDGE/INV/CMASK for counter events
2011-04-04 08:36:40 -07:00
Richard Cochran
4352d9d44b ntp: fix non privileged system time shifting
The ADJ_SETOFFSET bit added in commit 094aa188 ("ntp: Add ADJ_SETOFFSET
mode bit") also introduced a way for any user to change the system time.
Sneaky or buggy calls to adjtimex() could set

    ADJ_OFFSET_SS_READ | ADJ_SETOFFSET

which would result in a successful call to timekeeping_inject_offset().
This patch fixes the issue by adding the capability check.

Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-04 08:31:23 -07:00
Xiaotian Feng
4f5058c3b7 genirq: Fix cpumask leak in __setup_irq()
The allocated cpumask should be freed in __setup_irq().

Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
LKML-Reference: <1301744375-6812-1-git-send-email-dfeng@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-04-02 21:26:20 +02:00
Anton Blanchard
c0bb9e45f3 kdump: Allow shrinking of kdump region to be overridden
On ppc64 the crashkernel region almost always overlaps an area of firmware.
This works fine except when using the sysfs interface to reduce the kdump
region. If we free the firmware area we are guaranteed to crash.

Rename free_reserved_phys_range to crash_free_reserved_phys_range and make
it a weak function so we can override it.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-04-01 16:14:30 +11:00
Lucas De Marchi
25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
Peter Zijlstra
fd1edb3aa2 perf: Fix task_struct reference leak
sys_perf_event_open() had an imbalance in the number of task refs it
took causing memory leakage

Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: stable@kernel.org # .37+
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-31 13:02:56 +02:00
Frederic Weisbecker
20443384fe perf: Rebase max unprivileged mlock threshold on top of page size
Ensure we allow 512 kiB + 1 page for user control without
assuming a 4096 bytes page size.

Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: <stable@kernel.org>
LKML-Reference: <1301535209-9679-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-31 13:02:54 +02:00
Sisir Koppaka
3436ae1298 sched: Fix rebalance interval calculation
The interval for checking scheduling domains if they are due to be
balanced currently depends on boot state NR_CPUS, which may not
accurately reflect the number of online CPUs at the time of check.

Thus replace NR_CPUS with num_online_cpus().

 (ed: Should only affect those who set NR_CPUS really high, such as 4096
      or so :-)

Signed-off-by: Sisir Koppaka <sisir.koppaka@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <AANLkTikqHWid2Q93F5U5Qw5snJH8C5PXoa7J6=6hYO94@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-31 13:00:37 +02:00
Dario Faggioli
a51e919818 sched: Leave sched_setscheduler() earlier if possible, do not disturb SCHED_FIFO tasks
sched_setscheduler() (in sched.c) is called in order of changing the
scheduling policy and/or the real-time priority of a task. Thus,
if we find out that neither of those are actually being modified, it
is possible to return earlier and save the overhead of a full
deactivate+activate cycle of the task in question.

Beside that, if we have more than one SCHED_FIFO task with the same
priority on the same rq (which means they share the same priority queue)
having one of them changing its position in the priority queue because of
a sched_setscheduler (as it happens by means of the deactivate+activate)
that does not actually change the priority violates POSIX which states,
for SCHED_FIFO:

  "If a thread whose policy or priority has been modified by
   pthread_setschedprio() is a running thread or is runnable, the effect on
   its position in the thread list depends on the direction of the
   modification, as follows: a. <...> b. If the priority is unchanged, the
   thread does not change position in the thread list. c. <...>"

     http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_08.html

 (ed: And the POSIX specification here does, briefly and somewhat unexpectedly,
      match what common sense tells us as well. )

Signed-off-by: Dario Faggioli <raistlin@linux.it>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1300971618.3960.82.camel@Palantir>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-31 13:00:34 +02:00
Thomas Gleixner
78c8982564 genirq: Remove the now obsolete config options and select statements
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-30 14:13:23 +02:00
Thomas Gleixner
353c8ed44f genirq: Fix misnamed label in handle_edge_eoi_irq
Reported-by: michael@ellerman.id.au
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linuxppc-dev@lists.ozlabs.org
2011-03-29 22:24:05 +02:00
Thomas Gleixner
851d7cf647 genirq: Remove move_*irq leftovers
All users converted to new interface.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-29 14:50:32 +02:00
Thomas Gleixner
0c6f8a8b91 genirq: Remove compat code
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-29 14:48:19 +02:00
Thomas Gleixner
a6e120ed42 alpha: Use generic show_interrupts()
The only subtle difference is that alpha uses ACTUAL_NR_IRQS and
prints the IRQF_DISABLED flag.

Change the generic implementation to deal with ACTUAL_NR_IRQS if
defined.

The IRQF_DISABLED printing is pointless, as we nowadays run all
interrupts with irqs disabled.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-29 14:47:58 +02:00
Thomas Gleixner
cd22c0e44b genirq: Fix harmless typo
The late night fixup missed to convert the data type from irq_desc to
irq_data, which results in a harmless but annoying warning.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-29 11:36:05 +02:00
Linus Torvalds
e5217fb8ae Merge branches 'irq-cleanup-for-linus' and 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-cleanup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  vlynq: Convert irq functions

* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  genirq; Fix cleanup fallout
  genirq: Fix typo and remove unused variable
  genirq: Fix new kernel-doc warnings
  genirq: Add setter for AFFINITY_SET in irq_data state
  genirq: Provide setter inline for IRQD_IRQ_INPROGRESS
  genirq: Remove handle_IRQ_event
  arm: Ns9xxx: Remove private irq flow handler
  powerpc: cell: Use the core flow handler
  genirq: Provide edge_eoi flow handler
  genirq: Move INPROGRESS, MASKED and DISABLED state flags to irq_data
  genirq: Split irq_set_affinity() so it can be called with lock held.
  genirq: Add chip flag for restricting cpu_on/offline calls
  genirq: Add chip hooks for taking CPUs on/off line.
  genirq: Add irq disabled flag to irq_data state
  genirq: Reserve the irq when calling irq_set_chip()
2011-03-28 17:39:54 -07:00
Thomas Gleixner
0ef5ca1e1f genirq; Fix cleanup fallout
I missed the CONFIG_GENERIC_PENDING_IRQ dependency in the affinity
related functions and the IRQ_LEVEL propagation into irq_data
state. Did not pop up on my main test platforms. :(

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: David Daney <ddaney@caviumnetworks.com>
2011-03-29 01:41:22 +02:00