Commit Graph

288555 Commits

Author SHA1 Message Date
Namhyung Kim
fde0eeaba7 perf report: Document --symbol-filter option
Add missing description of --symbol-filter in Documentation/perf-report.txt.

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung.kim@lge.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1332125628-23088-1-git-send-email-namhyung.kim@lge.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-03-19 12:13:56 -03:00
Namhyung Kim
f7e5410920 perf ui browser: Clean lines inside of the input window
As Arnaldo pointed out, it should be cleared to prevent the window from
displaying overlapped strings on the region.

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung.kim@lge.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1332125180-23041-1-git-send-email-namhyung.kim@lge.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-03-19 12:07:30 -03:00
Namhyung Kim
6db6127c4d perf report: Treat an argument as a symbol filter
As Ingo requested, it'd be better off treating first (and the only)
argument as a symbol filter, so that user doesn't need to input the
symbol on the dialog window on TUI.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1331887855-874-5-git-send-email-namhyung.kim@lge.com
Signed-off-by: Namhyung Kim <namhyung.kim@lge.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-03-16 16:44:36 -03:00
Namhyung Kim
b14ffaca44 perf report: Add --symbol-filter option
Add new --symbol-filter command line option to set appropriate filter
string.

Its short version is missing as I couldn't find an ideal one and
--filter option of perf record also has no short version.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1331887855-874-4-git-send-email-namhyung.kim@lge.com
Signed-off-by: Namhyung Kim <namhyung.kim@lge.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-03-16 16:41:30 -03:00
Namhyung Kim
938a23ae7f perf ui browser: Add 's' key to filter by symbol name
Now user can enter symbol name interested via ui_browser__input_window,
and perf can process it using hists__filter_by_symbol(). Giving empty
symbol (by pressing 's' followed by ENTER) will disable the filtering.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1331887855-874-3-git-send-email-namhyung.kim@lge.com
Signed-off-by: Namhyung Kim <namhyung.kim@lge.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-03-16 16:34:13 -03:00
Namhyung Kim
aa49f6ec99 perf ui browser: Introduce ui_browser__input_window
The ui_browser__input_window() function is to get user's key input.
Current implementation can handle maximum 49 characters.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1331887855-874-2-git-send-email-namhyung.kim@lge.com
Signed-off-by: Namhyung Kim <namhyung.kim@lge.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-03-16 16:32:36 -03:00
Namhyung Kim
e94d53ebec perf hists: Add hists__filter_by_symbol
This function will be used for simple (sub-)string matching filter based
on user input.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1331887855-874-1-git-send-email-namhyung.kim@lge.com
Signed-off-by: Namhyung Kim <namhyung.kim@lge.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-03-16 16:31:09 -03:00
Namhyung Kim
5090c6aea8 perf tools: Do not disable members of group event
When event group is enabled for forked task (i.e. no target task/cpu
was specified) all events were disabled and marked ->enable_on_exec.
However they wouldn't be counted at all since only group leader will
be enabled on exec actually.

In contrast to perf stat, perf record doesn't have a real problem
as it enables all the event before proceeding. But it needs to be
fixed anyway IMHO.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1331887340-32448-2-git-send-email-namhyung.kim@lge.com
Signed-off-by: Namhyung Kim <namhyung.kim@lge.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-03-16 16:29:33 -03:00
Namhyung Kim
4c19ea453d perf stat: Fix event grouping on forked task
When event group is enabled for forked task (i.e. no target task was
specified) all events were disabled and marked ->enable_on_exec.
However they are not counted at all since only group leader will be
enabled on exec actually. So the result looked like below:

 $ ./perf stat --group -- sleep 1

 Performance counter stats for 'sleep 1':

          0.554926 task-clock                #    0.001 CPUs utilized
     <not counted> context-switches
     <not counted> CPU-migrations
     <not counted> page-faults
     <not counted> cycles
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
     <not counted> instructions
     <not counted> branches
     <not counted> branch-misses

       1.001228093 seconds time elapsed

Fix it by disabling group leader only.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1331887340-32448-1-git-send-email-namhyung.kim@lge.com
Signed-off-by: Namhyung Kim <namhyung.kim@lge.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-03-16 16:13:45 -03:00
Jiri Olsa
5f537a2659 perf tools: Add support to specify pmu style event
Added new event rule to the event definition grammar:

event_def: event_pmu |
           ...
event_pmu: PE_NAME '/' event_config '/'

Using this rule, event could be now specified like:
  cpu/config=1,config1=2,config2=3/u

where pmu name 'cpu' is looked up via following path:
  ${sysfs_mount}/bus/event_source/devices/${pmu}

and config options are bound to the pmu's format definiton:
  ${sysfs_mount}/bus/event_source/devices/${pmu}/format

The hardcoded config options still stays and have precedence
over any format field defined with same name.

Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/n/tip-50d8nr94f8k4wkezutrxvthe@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-03-16 14:30:13 -03:00
Jiri Olsa
cd82a32e99 perf tools: Add perf pmu object to access pmu format definition
Adding pmu object which provides interface to pmu's sysfs
event format definition located at:
  ${sysfs_mount}/bus/event_source/devices/${pmu}/format

Following interface is exported:
  struct perf_pmu* perf_pmu__find(char *name);
  - this function returns pmu object, which is then
    passed as a handle to other interface functions

  int perf_pmu__config(struct perf_pmu *pmu, struct perf_event_attr *attr,
                       struct list_head *head_terms);
  - this function configures perf_event_attr struct based
    on pmu's format definitions and config terms data,
    containined in head_terms list.

Parser generator is used to retrive the pmu's format definition.
The generated parser is part of the patch. Added makefile rule
'pmu-parser' to generate the parser code out of the bison/flex
sources.

Added builtin test 'Test perf pmu format parsing', which could
be run like:
	perf test pmu

Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/n/tip-errz96u1668gj9wlop1zhpht@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-03-16 14:29:35 -03:00
Jiri Olsa
8f707d843c perf tools: Add config options support for event parsing
Adding a new rule to the event grammar to be able to specify
values of additional attributes of symbolic event.

The new syntax for event symbolic definition is:

event_legacy_symbol:  PE_NAME_SYM '/' event_config '/' |
                      PE_NAME_SYM sep_slash_dc

event_config:         event_config ',' event_term | event_term

event_term:           PE_NAME '=' PE_NAME |
                      PE_NAME '=' PE_VALUE
                      PE_NAME

sep_slash_dc: '/' | ':' |

At the moment the config options are hardcoded to be used for legacy
symbol events to define several perf_event_attr fields. It is:

  'config'   to define perf_event_attr::config
  'config1'  to define perf_event_attr::config1
  'config2'  to define perf_event_attr::config2
  'period'   to define perf_event_attr::sample_period

Legacy events could be now specified as:
  cycles/period=100000/

If term is specified without the value assignment, then 1 is
assigned by default.

Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/n/tip-mgkavww9790jbt2jdkooyv4q@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-03-16 14:26:06 -03:00
Jiri Olsa
89812fc81f perf tools: Add parser generator for events parsing
Changing event parsing to use flex/bison parse generator.
The event syntax stays as it was.

grammar description:

events: events ',' event | event

event:  event_def PE_MODIFIER_EVENT | event_def

event_def: event_legacy_symbol sep_dc     |
           event_legacy_cache sep_dc      |
           event_legacy_breakpoint sep_dc |
           event_legacy_tracepoint sep_dc |
           event_legacy_numeric sep_dc    |
           event_legacy_raw sep_dc

event_legacy_symbol:      PE_NAME_SYM

event_legacy_cache:       PE_NAME_CACHE_TYPE '-' PE_NAME_CACHE_OP_RESULT '-' PE_NAME_CACHE_OP_RESULT |
                          PE_NAME_CACHE_TYPE '-' PE_NAME_CACHE_OP_RESULT  |
                          PE_NAME_CACHE_TYPE

event_legacy_raw:         PE_SEP_RAW PE_VALUE

event_legacy_numeric:     PE_VALUE ':' PE_VALUE

event_legacy_breakpoint:  PE_SEP_BP ':' PE_VALUE ':' PE_MODIFIER_BP

event_breakpoint_type:    PE_MODIFIER_BPTYPE | empty

PE_NAME_SYM:              cpu-cycles|cycles                              |
                          stalled-cycles-frontend|idle-cycles-frontend   |
                          stalled-cycles-backend|idle-cycles-backend     |
                          instructions                                   |
                          cache-references                               |
                          cache-misses                                   |
                          branch-instructions|branches                   |
                          branch-misses                                  |
                          bus-cycles                                     |
                          cpu-clock                                      |
                          task-clock                                     |
                          page-faults|faults                             |
                          minor-faults                                   |
                          major-faults                                   |
                          context-switches|cs                            |
                          cpu-migrations|migrations                      |
                          alignment-faults                               |
                          emulation-faults

PE_NAME_CACHE_TYPE:       L1-dcache|l1-d|l1d|L1-data             |
                          L1-icache|l1-i|l1i|L1-instruction      |
                          LLC|L2                                 |
                          dTLB|d-tlb|Data-TLB                    |
                          iTLB|i-tlb|Instruction-TLB             |
                          branch|branches|bpu|btb|bpc            |
                          node

PE_NAME_CACHE_OP_RESULT:  load|loads|read                        |
                          store|stores|write                     |
                          prefetch|prefetches                    |
                          speculative-read|speculative-load      |
                          refs|Reference|ops|access              |
                          misses|miss

PE_MODIFIER_EVENT:        [ukhp]{0,5}

PE_MODIFIER_BP:           [rwx]

PE_SEP_BP:                'mem'

PE_SEP_RAW:               'r'

sep_dc:                   ':' |

Added flex/bison files for event grammar parsing. The generated
parser is part of the patch. Added makefile rule 'event-parser'
to generate the parser code out of the bison/flex sources.

Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/n/tip-u4pfig5waq3ll2bfcdex8fgi@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-03-16 14:20:21 -03:00
Jiri Olsa
641cc93881 perf: Adding sysfs group format attribute for pmu device
Adding sysfs group 'format' attribute for pmu device that
contains a syntax description on how to construct raw events.

The event configuration is described in following
struct pefr_event_attr attributes:

  config
  config1
  config2

Each sysfs attribute within the format attribute group,
describes mapping of name and bitfield definition within
one of above attributes.

eg:
  "/sys/...<dev>/format/event" contains "config:0-7"
  "/sys/...<dev>/format/umask" contains "config:8-15"
  "/sys/...<dev>/format/usr"   contains "config:16"

the attribute value syntax is:

  line:      config ':' bits
  config:    'config' | 'config1' | 'config2"
  bits:      bits ',' bit_term | bit_term
  bit_term:  VALUE '-' VALUE | VALUE

Adding format attribute definitions for x86 cpu pmus.

Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/n/tip-vhdk5y2hyype9j63prymty36@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-03-16 14:06:06 -03:00
Jan Beulich
4a3d2d9bfb perf tools: Adjust make rules
Add rules to generate pre-processed files (just like are available for
the normal kernel build), and adjust the rule to create assembly files
from C ones to produce its output in the output directory rather than
in the source tree.

E.g.

  $ make -C tools/perf/ O=/tmp/perf /tmp/perf/builtin-top.i
  make: Entering directory `/home/git/linux/tools/perf'
      CC /tmp/perf/builtin-top.i
  make: Leaving directory `/home/git/linux/tools/perf'
  $ wc -l /tmp/perf/builtin-top.i
  31379 /tmp/perf/builtin-top.i
  $

Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4F588A0802000078000770FF@nat28.tlf.novell.com
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-03-15 14:23:28 -03:00
Ingo Molnar
bea95c152d Merge branch 'perf/hw-branch-sampling' into perf/core
Merge reason: The 'perf record -b' hardware branch sampling feature is ready for upstream.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-12 20:47:05 +01:00
Stephane Eranian
24bff2dc0f perf report: Fix annotate double quit issue in branch view mode
This patch fixes perf report to not go back two levels when
pressing the 'q' key while annotating in branch view mode.

When pressing 'q' in annotate mode and if the branch source
and target belong to different functions, perf now brings
up the annotation popup menu again to offer the option to
annotate the other branch source or target.

As part of the code restructuring in perf_evsel__hists_browse()
we also fix a memory leak on options[] in case of error.

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: acme@redhat.com
Cc: asharma@fb.com
Cc: ravitillo@lbl.gov
Cc: vweaver1@eecs.utk.edu
Cc: khandual@linux.vnet.ibm.com
Cc: dsahern@gmail.com
Link: http://lkml.kernel.org/r/1331565210-10865-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-12 20:46:16 +01:00
Stephane Eranian
8bcd65fd29 perf report: Remove duplicate annotate choice in branch view mode
This patch removes the duplicated annotate selection when
browsing in branch view mode. If the sym and dso oof the branch
source and target are the same, then only one annotate choice is
proposed.

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: acme@redhat.com
Cc: asharma@fb.com
Cc: ravitillo@lbl.gov
Cc: vweaver1@eecs.utk.edu
Cc: khandual@linux.vnet.ibm.com
Cc: dsahern@gmail.com
Link: http://lkml.kernel.org/r/1331565210-10865-2-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-12 20:46:16 +01:00
Peter Zijlstra
f9b4eeb809 perf/x86: Prettify pmu config literals
I got somewhat tired of having to decode hex numbers..

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephane Eranian <eranian@google.com>
Cc: Robert Richter <robert.richter@amd.com>
Link: http://lkml.kernel.org/n/tip-0vsy1sgywc4uar3mu1szm0rg@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-12 20:44:54 +01:00
Ingo Molnar
35239e23c6 Merge branch 'perf/urgent' into perf/core
Merge reason: We are going to queue up a dependent patch.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-12 20:44:11 +01:00
Peter Zijlstra
87e24f4b67 perf/x86: Fix local vs remote memory events for NHM/WSM
Verified using the below proglet.. before:

[root@westmere ~]# perf stat -e node-stores -e node-store-misses ./numa 0
remote write

 Performance counter stats for './numa 0':

         2,101,554 node-stores
         2,096,931 node-store-misses

       5.021546079 seconds time elapsed

[root@westmere ~]# perf stat -e node-stores -e node-store-misses ./numa 1
local write

 Performance counter stats for './numa 1':

           501,137 node-stores
               199 node-store-misses

       5.124451068 seconds time elapsed

After:

[root@westmere ~]# perf stat -e node-stores -e node-store-misses ./numa 0
remote write

 Performance counter stats for './numa 0':

         2,107,516 node-stores
         2,097,187 node-store-misses

       5.012755149 seconds time elapsed

[root@westmere ~]# perf stat -e node-stores -e node-store-misses ./numa 1
local write

 Performance counter stats for './numa 1':

         2,063,355 node-stores
               165 node-store-misses

       5.082091494 seconds time elapsed

#define _GNU_SOURCE

#include <sched.h>
#include <stdio.h>
#include <errno.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <dirent.h>
#include <signal.h>
#include <unistd.h>
#include <numaif.h>
#include <stdlib.h>

#define SIZE (32*1024*1024)

volatile int done;

void sig_done(int sig)
{
	done = 1;
}

int main(int argc, char **argv)
{
	cpu_set_t *mask, *mask2;
	size_t size;
	int i, err, t;
	int nrcpus = 1024;
	char *mem;
	unsigned long nodemask = 0x01; /* node 0 */
	DIR *node;
	struct dirent *de;
	int read = 0;
	int local = 0;

	if (argc < 2) {
		printf("usage: %s [0-3]\n", argv[0]);
		printf("  bit0 - local/remote\n");
		printf("  bit1 - read/write\n");
		exit(0);
	}

	switch (atoi(argv[1])) {
	case 0:
		printf("remote write\n");
		break;
	case 1:
		printf("local write\n");
		local = 1;
		break;
	case 2:
		printf("remote read\n");
		read = 1;
		break;
	case 3:
		printf("local read\n");
		local = 1;
		read = 1;
		break;
	}

	mask = CPU_ALLOC(nrcpus);
	size = CPU_ALLOC_SIZE(nrcpus);
	CPU_ZERO_S(size, mask);

	node = opendir("/sys/devices/system/node/node0/");
	if (!node)
		perror("opendir");
	while ((de = readdir(node))) {
		int cpu;

		if (sscanf(de->d_name, "cpu%d", &cpu) == 1)
			CPU_SET_S(cpu, size, mask);
	}
	closedir(node);

	mask2 = CPU_ALLOC(nrcpus);
	CPU_ZERO_S(size, mask2);
	for (i = 0; i < size; i++)
		CPU_SET_S(i, size, mask2);
	CPU_XOR_S(size, mask2, mask2, mask); // invert

	if (!local)
		mask = mask2;

	err = sched_setaffinity(0, size, mask);
	if (err)
		perror("sched_setaffinity");

	mem = mmap(0, SIZE, PROT_READ|PROT_WRITE,
			MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
	err = mbind(mem, SIZE, MPOL_BIND, &nodemask, 8*sizeof(nodemask), MPOL_MF_MOVE);
	if (err)
		perror("mbind");

	signal(SIGALRM, sig_done);
	alarm(5);

	if (!read) {
		while (!done) {
			for (i = 0; i < SIZE; i++)
				mem[i] = 0x01;
		}
	} else {
		while (!done) {
			for (i = 0; i < SIZE; i++)
				t += *(volatile char *)(mem + i);
		}
	}

	return 0;
}

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/n/tip-tq73sxus35xmqpojf7ootxgs@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-12 20:43:41 +01:00
Linus Torvalds
fde7d9049e Linux 3.3-rc7 2012-03-10 13:49:52 -08:00
Al Viro
c7b2855505 aio: fix the "too late munmap()" race
Current code has put_ioctx() called asynchronously from aio_fput_routine();
that's done *after* we have killed the request that used to pin ioctx,
so there's nothing to stop io_destroy() waiting in wait_for_all_aios()
from progressing.  As the result, we can end up with async call of
put_ioctx() being the last one and possibly happening during exit_mmap()
or elf_core_dump(), neither of which expects stray munmap() being done
to them...

We do need to prevent _freeing_ ioctx until aio_fput_routine() is done
with that, but that's all we care about - neither io_destroy() nor
exit_aio() will progress past wait_for_all_aios() until aio_fput_routine()
does really_put_req(), so the ioctx teardown won't be done until then
and we don't care about the contents of ioctx past that point.

Since actual freeing of these suckers is RCU-delayed, we don't need to
bump ioctx refcount when request goes into list for async removal.
All we need is rcu_read_lock held just over the ->ctx_lock-protected
area in aio_fput_routine().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-09 18:59:59 -08:00
Al Viro
86b62a2cb4 aio: fix io_setup/io_destroy race
Have ioctx_alloc() return an extra reference, so that caller would drop it
on success and not bother with re-grabbing it on failure exit.  The current
code is obviously broken - io_destroy() from another thread that managed
to guess the address io_setup() would've returned would free ioctx right
under us; gets especially interesting if aio_context_t * we pass to
io_setup() points to PROT_READ mapping, so put_user() fails and we end
up doing io_destroy() on kioctx another thread has just got freed...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-09 18:59:59 -08:00
Linus Torvalds
86e0600833 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
 "I have two additional and btrfs fixes in my for-linus branch.  One is
  a casting error that leads to memory corruption on i386 during scrub,
  and the other fixes a corner case in the backref walking code (also
  triggered by scrub)."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix casting error in scrub reada code
  btrfs: fix locking issues in find_parent_nodes()
2012-03-09 18:09:18 -08:00