Commit Graph

7555 Commits

Author SHA1 Message Date
Thomas Gleixner
aa715284b4 futex: request only one page from get_user_pages()
Yanmin noticed that fault_in_user_writeable() requests 4 pages instead
of one.

That's the result of blindly trusting Linus' proposal :) I even looked
up the prototype to verify the correctness: the argument in question
is confusingly enough named "len" while in reality it means number of
pages.

Pointed-out-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-06-25 14:33:46 +02:00
Linus Torvalds
c622304825 Merge branches 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/{vfs-2.6,audit-current}
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  another race fix in jfs_check_acl()
  Get "no acls for this inode" right, fix shmem breakage
  inline functions left without protection of ifdef (acl)

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
  audit: inode watches depend on CONFIG_AUDIT not CONFIG_AUDIT_SYSCALL
2009-06-24 14:17:14 -07:00
Eric Paris
3a6a6c16be audit: inode watches depend on CONFIG_AUDIT not CONFIG_AUDIT_SYSCALL
Even though one cannot make use of the audit watch code without
CONFIG_AUDIT_SYSCALL the spaghetti nature of the audit code means that
the audit rule filtering requires that it at least be compiled.

Thus build the audit_watch code when we build auditfilter like it was
before cfcad62c74

Clearly this is a point of potential future cleanup..

Reported-by: Frans Pop <elendil@planet.nl>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-24 16:42:05 -04:00
Thomas Gleixner
d0725992c8 futex: Fix the write access fault problem for real
commit 64d1304a64 (futex: setup writeable mapping for futex ops which
modify user space data) did address only half of the problem of write
access faults.

The patch was made on two wrong assumptions:

1) access_ok(VERIFY_WRITE,...) would actually check write access.

   On x86 it does _NOT_. It's a pure address range check.

2) a RW mapped region can not go away under us.

   That's wrong as well. Nobody can prevent another thread to call
   mprotect(PROT_READ) on that region where the futex resides. If that
   call hits between the get_user_pages_fast() verification and the
   actual write access in the atomic region we are toast again.

The solution is to not rely on access_ok and get_user() for any write
access related fault on private and shared futexes. Instead we need to
fault it in with verification of write access.

There is no generic non destructive write mechanism which would fault
the user page in trough a #PF, but as we already know that we will
fault we can as well call get_user_pages() directly and avoid the #PF
overhead.

If get_user_pages() returns -EFAULT we know that we can not fix it
anymore and need to bail out to user space.

Remove a bunch of confusing comments on this issue as well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
2009-06-24 21:27:35 +02:00
Al Viro
916d75761c Fix rule eviction order for AUDIT_DIR
If syscall removes the root of subtree being watched, we
definitely do not want the rules refering that subtree
to be destroyed without the syscall in question having
a chance to match them.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-24 00:02:38 -04:00
Eric Paris
9d96098510 Audit: clean up all op= output to include string quoting
A number of places in the audit system we send an op= followed by a string
that includes spaces.  Somehow this works but it's just wrong.  This patch
moves all of those that I could find to be quoted.

Example:

Change From: type=CONFIG_CHANGE msg=audit(1244666690.117:31): auid=0 ses=1
subj=unconfined_u:unconfined_r:auditctl_t:s0-s0:c0.c1023 op=remove rule
key="number2" list=4 res=0

Change To: type=CONFIG_CHANGE msg=audit(1244666690.117:31): auid=0 ses=1
subj=unconfined_u:unconfined_r:auditctl_t:s0-s0:c0.c1023 op="remove rule"
key="number2" list=4 res=0

Signed-off-by: Eric Paris <eparis@redhat.com>
2009-06-24 00:00:52 -04:00
Eric Paris
35fe4d0b1b Audit: move audit_get_nd completely into audit_watch
audit_get_nd() is only used  by audit_watch and could be more cleanly
implemented by having the audit watch functions call it when needed rather
than making the generic audit rule parsing code deal with those objects.

Signed-off-by: Eric Paris <eparis@redhat.com>
2009-06-23 23:51:05 -04:00
Eric Paris
cfcad62c74 audit: seperate audit inode watches into a subfile
In preparation for converting audit to use fsnotify instead of inotify we
seperate the inode watching code into it's own file.  This is similar to
how the audit tree watching code is already seperated into audit_tree.c

Signed-off-by: Eric Paris <eparis@redhat.com>
2009-06-23 23:50:59 -04:00
Eric Paris
ea7ae60bfe Audit: clean up audit_receive_skb
audit_receive_skb is hard to clearly parse what it is doing to the netlink
message.  Clean the function up so it is easy and clear to see what is going
on.

Signed-off-by: Eric Paris <eparis@redhat.com>
2009-06-23 23:50:40 -04:00
Eric Paris
ee080e6ce9 Audit: cleanup netlink mesg handling
The audit handling of netlink messages is all over the place.  Clean things
up, use predetermined macros, generally make it more readable.

Signed-off-by: Eric Paris <eparis@redhat.com>
2009-06-23 23:50:39 -04:00
Eric Paris
038cbcf65f Audit: unify the printk of an skb when auditd not around
Remove code duplication of skb printk when auditd is not around in userspace
to deal with this message.

Signed-off-by: Eric Paris <eparis@redhat.com>
2009-06-23 23:50:37 -04:00
Eric Paris
e85188f424 Audit: dereferencing krule as if it were an audit_watch
audit_update_watch() runs all of the rules for a given watch and duplicates
them, attaches a new watch to them, and then when it finishes that process
and has called free on all of the old rules (ok maybe still inside the rcu
grace period) it proceeds to use the last element from list_for_each_entry_safe()
as if it were a krule rather than being the audit_watch which was anchoring
the list to output a message about audit rules changing.

This patch unfies the audit message from two different places into a helper
function and calls it from the correct location in audit_update_rules().  We
will now get an audit message about the config changing for each rule (with
each rules filterkey) rather than the previous garbage.

Signed-off-by: Eric Paris <eparis@redhat.com>
2009-06-23 23:50:36 -04:00
Eric Paris
b87ce6e418 Audit: better estimation of execve record length
The audit execve record splitting code estimates the length of the message
generated.  But it forgot to include the "" that wrap each string in its
estimation.  This means that execve messages with lots of tiny (1-2 byte)
arguments could still cause records greater than 8k to be emitted.  Simply
fix the estimate.

Signed-off-by: Eric Paris <eparis@redhat.com>
2009-06-23 23:50:34 -04:00
Eric Paris
35aa901c0b Audit: fix audit watch use after free
When an audit watch is added to a parent the temporary watch inside the
original krule from userspace is freed.  Yet the original watch is used after
the real watch was created in audit_add_rules()

Signed-off-by: Eric Paris <eparis@redhat.com>
2009-06-23 23:50:33 -04:00
Linus Torvalds
31950eb66f mm/init: cpu_hotplug_init() must be initialized before SLAB
SLAB uses get/put_online_cpus() which use a mutex which is itself only
initialized when cpu_hotplug_init() is called.  Currently we hang suring
boot in SLAB due to doing that too late.

Reported by James Bottomley and Sachin Sant (and possibly others).
Debugged by Benjamin Herrenschmidt.

This just removes the dynamic initialization of the data structures, and
replaces it with a static one, avoiding this dependency entirely, and
removing one unnecessary special initcall.

Tested-by: Sachin Sant <sachinp@in.ibm.com>
Tested-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Tested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-22 21:18:12 -07:00
Linus Torvalds
2453d6ff6f Merge branch 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  genirq, irq.h: Fix kernel-doc warnings
  genirq: fix comment to say IRQ_WAKE_THREAD
2009-06-20 11:30:01 -07:00
Linus Torvalds
12e24f34cb Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits)
  perfcounter: Handle some IO return values
  perf_counter: Push perf_sample_data through the swcounter code
  perf_counter tools: Define and use our own u64, s64 etc. definitions
  perf_counter: Close race in perf_lock_task_context()
  perf_counter, x86: Improve interactions with fast-gup
  perf_counter: Simplify and fix task migration counting
  perf_counter tools: Add a data file header
  perf_counter: Update userspace callchain sampling uses
  perf_counter: Make callchain samples extensible
  perf report: Filter to parent set by default
  perf_counter tools: Handle lost events
  perf_counter: Add event overlow handling
  fs: Provide empty .set_page_dirty() aop for anon inodes
  perf_counter: tools: Makefile tweaks for 64-bit powerpc
  perf_counter: powerpc: Add processor back-end for MPC7450 family
  perf_counter: powerpc: Make powerpc perf_counter code safe for 32-bit kernels
  perf_counter: powerpc: Change how processor-specific back-ends get selected
  perf_counter: powerpc: Use unsigned long for register and constraint values
  perf_counter: powerpc: Enable use of software counters on 32-bit powerpc
  perf_counter tools: Add and use isprint()
  ...
2009-06-20 11:29:32 -07:00
Linus Torvalds
1eb51c33b2 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Fix out of scope variable access in sched_slice()
  sched: Hide runqueues from direct refer at source code level
  sched: Remove unneeded __ref tag
  sched, x86: Fix cpufreq + sched_clock() TSC scaling
2009-06-20 10:57:40 -07:00
Linus Torvalds
b0b7065b64 Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (24 commits)
  tracing/urgent: warn in case of ftrace_start_up inbalance
  tracing/urgent: fix unbalanced ftrace_start_up
  function-graph: add stack frame test
  function-graph: disable when both x86_32 and optimize for size are configured
  ring-buffer: have benchmark test print to trace buffer
  ring-buffer: do not grab locks in nmi
  ring-buffer: add locks around rb_per_cpu_empty
  ring-buffer: check for less than two in size allocation
  ring-buffer: remove useless compile check for buffer_page size
  ring-buffer: remove useless warn on check
  ring-buffer: use BUF_PAGE_HDR_SIZE in calculating index
  tracing: update sample event documentation
  tracing/filters: fix race between filter setting and module unload
  tracing/filters: free filter_string in destroy_preds()
  ring-buffer: use commit counters for commit pointer accounting
  ring-buffer: remove unused variable
  ring-buffer: have benchmark test handle discarded events
  ring-buffer: prevent adding write in discarded area
  tracing/filters: strloc should be unsigned short
  tracing/filters: operand can be negative
  ...

Fix up kmemcheck-induced conflict in kernel/trace/ring_buffer.c manually
2009-06-20 10:56:46 -07:00
Linus Torvalds
38df92b8ce Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  NOHZ: Properly feed cpufreq ondemand governor
2009-06-20 10:51:44 -07:00
Ingo Molnar
d4c4038343 Merge branch 'tip/tracing/urgent-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/urgent 2009-06-20 18:26:48 +02:00
Ingo Molnar
3daeb4da9a Merge branch 'tip/tracing/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/urgent 2009-06-20 17:25:49 +02:00
Peter Zijlstra
92bf309a9c perf_counter: Push perf_sample_data through the swcounter code
Push the perf_sample_data further outwards to the swcounter interface,
to abstract it away some more.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-20 12:30:30 +02:00
Frederic Weisbecker
9ea1a153a4 tracing/urgent: warn in case of ftrace_start_up inbalance
Prevent from further ftrace_start_up inbalances so that we avoid
future nop patching omissions with dynamic ftrace.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2009-06-20 06:52:21 +02:00
Frederic Weisbecker
c85a17e226 tracing/urgent: fix unbalanced ftrace_start_up
Perfcounter reports the following stats for a wide system
profiling:

 #
 # (2364 samples)
 #
 # Overhead  Symbol
 # ........  ......
 #
    15.40%  [k] mwait_idle_with_hints
     8.29%  [k] read_hpet
     5.75%  [k] ftrace_caller
     3.60%  [k] ftrace_call
     [...]

This snapshot has been taken while neither the function tracer nor
the function graph tracer was running.
With dynamic ftrace, such results show a wrong ftrace behaviour
because all calls to ftrace_caller or ftrace_graph_caller (the patched
calls to mcount) are supposed to be patched into nop if none of those
tracers are running.

The problem occurs after the first run of the function tracer. Once we
launch it a second time, the callsites will never be nopped back,
unless you set custom filters.
For example it happens during the self tests at boot time.
The function tracer selftest runs, and then the dynamic tracing is
tested too. After that, the callsites are left un-nopped.

This is because the reset callback of the function tracer tries to
unregister two ftrace callbacks in once: the common function tracer
and the function tracer with stack backtrace, regardless of which
one is currently in use.
It then creates an unbalance on ftrace_start_up value which is expected
to be zero when the last ftrace callback is unregistered. When it
reaches zero, the FTRACE_DISABLE_CALLS is set on the next ftrace
command, triggering the patching into nop. But since it becomes
unbalanced, ie becomes lower than zero, if the kernel functions
are patched again (as in every further function tracer runs), they
won't ever be nopped back.

Note that ftrace_call and ftrace_graph_call are still patched back
to ftrace_stub in the off case, but not the callers of ftrace_call
and ftrace_graph_caller. It means that the tracing is well deactivated
but we waste a useless call into every kernel function.

This patch just unregisters the right ftrace_ops for the function
tracer on its reset callback and ignores the other one which is
not registered, fixing the unbalance. The problem also happens
is .30

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: stable@kernel.org
2009-06-20 06:28:46 +02:00