Commit Graph

103 Commits

Author SHA1 Message Date
Linus Torvalds
ade0899b29 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "This tree includes some late late perf items that missed the first
  round:

  tools:

   - Bash auto completion improvements, now we can auto complete the
     tools long options, tracepoint event names, etc, from Namhyung Kim.

   - Look up thread using tid instead of pid in 'perf sched'.

   - Move global variables into a perf_kvm struct, from David Ahern.

   - Hists refactorings, preparatory for improved 'diff' command, from
     Jiri Olsa.

   - Hists refactorings, preparatory for event group viewieng work, from
     Namhyung Kim.

   - Remove double negation on optional feature macro definitions, from
     Namhyung Kim.

   - Remove several cases of needless global variables, on most
     builtins.

   - misc fixes

  kernel:

   - sysfs support for IBS on AMD CPUs, from Robert Richter.

   - Support for an upcoming Intel CPU, the Xeon-Phi / Knights Corner
     HPC blade PMU, from Vince Weaver.

   - misc fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
  perf: Fix perf_cgroup_switch for sw-events
  perf: Clarify perf_cpu_context::active_pmu usage by renaming it to ::unique_pmu
  perf/AMD/IBS: Add sysfs support
  perf hists: Add more helpers for hist entry stat
  perf hists: Move he->stat.nr_events initialization to a template
  perf hists: Introduce struct he_stat
  perf diff: Removing the total_period argument from output code
  perf tool: Add hpp interface to enable/disable hpp column
  perf tools: Removing hists pair argument from output path
  perf hists: Separate overhead and baseline columns
  perf diff: Refactor diff displacement possition info
  perf hists: Add struct hists pointer to struct hist_entry
  perf tools: Complete tracepoint event names
  perf/x86: Add support for Intel Xeon-Phi Knights Corner PMU
  perf evlist: Remove some unused methods
  perf evlist: Introduce add_newtp method
  perf kvm: Move global variables into a perf_kvm struct
  perf tools: Convert to BACKTRACE_SUPPORT
  perf tools: Long option completion support for each subcommands
  perf tools: Complete long option names of perf command
  ...
2012-10-13 10:20:11 +09:00
Konstantin Khlebnikov
314e51b985 mm: kill vma flag VM_RESERVED and mm->reserved_vm counter
A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA,
currently it lost original meaning but still has some effects:

 | effect                 | alternative flags
-+------------------------+---------------------------------------------
1| account as reserved_vm | VM_IO
2| skip in core dump      | VM_IO, VM_DONTDUMP
3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
4| do not mlock           | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP

This patch removes reserved_vm counter from mm_struct.  Seems like nobody
cares about it, it does not exported into userspace directly, it only
reduces total_vm showed in proc.

Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP.

remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP.
remap_vmalloc_range() set VM_DONTEXPAND | VM_DONTDUMP.

[akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:19 +09:00
Peter Zijlstra
95cf59ea72 perf: Fix perf_cgroup_switch for sw-events
Jiri reported that he could trigger the WARN_ON_ONCE() in
perf_cgroup_switch() using sw-events. This is because sw-events share
a cpuctx with multiple PMUs.

Use the ->unique_pmu pointer to limit the pmu iteration to unique
cpuctx instances.

Reported-and-Tested-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-so7wi2zf3jjzrwcutm2mkz0j@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-05 13:59:07 +02:00
Peter Zijlstra
3f1f33206c perf: Clarify perf_cpu_context::active_pmu usage by renaming it to ::unique_pmu
Stephane thought the perf_cpu_context::active_pmu name confusing and
suggested using 'unique_pmu' instead.

This pointer is a pointer to a 'random' pmu sharing the cpuctx
instance, therefore limiting a for_each_pmu loop to those where
cpuctx->unique_pmu matches the pmu we get a loop over unique cpuctx
instances.

Suggested-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-kxyjqpfj2fn9gt7kwu5ag9ks@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-05 13:59:06 +02:00
Linus Torvalds
aab174f0df Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs update from Al Viro:

 - big one - consolidation of descriptor-related logics; almost all of
   that is moved to fs/file.c

   (BTW, I'm seriously tempted to rename the result to fd.c.  As it is,
   we have a situation when file_table.c is about handling of struct
   file and file.c is about handling of descriptor tables; the reasons
   are historical - file_table.c used to be about a static array of
   struct file we used to have way back).

   A lot of stray ends got cleaned up and converted to saner primitives,
   disgusting mess in android/binder.c is still disgusting, but at least
   doesn't poke so much in descriptor table guts anymore.  A bunch of
   relatively minor races got fixed in process, plus an ext4 struct file
   leak.

 - related thing - fget_light() partially unuglified; see fdget() in
   there (and yes, it generates the code as good as we used to have).

 - also related - bits of Cyrill's procfs stuff that got entangled into
   that work; _not_ all of it, just the initial move to fs/proc/fd.c and
   switch of fdinfo to seq_file.

 - Alex's fs/coredump.c spiltoff - the same story, had been easier to
   take that commit than mess with conflicts.  The rest is a separate
   pile, this was just a mechanical code movement.

 - a few misc patches all over the place.  Not all for this cycle,
   there'll be more (and quite a few currently sit in akpm's tree)."

Fix up trivial conflicts in the android binder driver, and some fairly
simple conflicts due to two different changes to the sock_alloc_file()
interface ("take descriptor handling from sock_alloc_file() to callers"
vs "net: Providing protocol type via system.sockprotoname xattr of
/proc/PID/fd entries" adding a dentry name to the socket)

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
  MAX_LFS_FILESIZE should be a loff_t
  compat: fs: Generic compat_sys_sendfile implementation
  fs: push rcu_barrier() from deactivate_locked_super() to filesystems
  btrfs: reada_extent doesn't need kref for refcount
  coredump: move core dump functionality into its own file
  coredump: prevent double-free on an error path in core dumper
  usb/gadget: fix misannotations
  fcntl: fix misannotations
  ceph: don't abuse d_delete() on failure exits
  hypfs: ->d_parent is never NULL or negative
  vfs: delete surplus inode NULL check
  switch simple cases of fget_light to fdget
  new helpers: fdget()/fdput()
  switch o2hb_region_dev_write() to fget_light()
  proc_map_files_readdir(): don't bother with grabbing files
  make get_file() return its argument
  vhost_set_vring(): turn pollstart/pollstop into bool
  switch prctl_set_mm_exe_file() to fget_light()
  switch xfs_find_handle() to fget_light()
  switch xfs_swapext() to fget_light()
  ...
2012-10-02 20:25:04 -07:00
Linus Torvalds
68d47a137c Merge branch 'for-3.7-hierarchy' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup hierarchy update from Tejun Heo:
 "Currently, different cgroup subsystems handle nested cgroups
  completely differently.  There's no consistency among subsystems and
  the behaviors often are outright broken.

  People at least seem to agree that the broken hierarhcy behaviors need
  to be weeded out if any progress is gonna be made on this front and
  that the fallouts from deprecating the broken behaviors should be
  acceptable especially given that the current behaviors don't make much
  sense when nested.

  This patch makes cgroup emit warning messages if cgroups for
  subsystems with broken hierarchy behavior are nested to prepare for
  fixing them in the future.  This was put in a separate branch because
  more related changes were expected (didn't make it this round) and the
  memory cgroup wanted to pull in this and make changes on top."

* 'for-3.7-hierarchy' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
2012-10-02 10:52:28 -07:00
Linus Torvalds
7e92daaefa Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf update from Ingo Molnar:
 "Lots of changes in this cycle as well, with hundreds of commits from
  over 30 contributors.  Most of the activity was on the tooling side.

  Higher level changes:

   - New 'perf kvm' analysis tool, from Xiao Guangrong.

   - New 'perf trace' system-wide tracing tool

   - uprobes fixes + cleanups from Oleg Nesterov.

   - Lots of patches to make perf build on Android out of box, from
     Irina Tirdea

   - Extend ftrace function tracing utility to be more dynamic for its
     users.  It allows for data passing to the callback functions, as
     well as reading regs as if a breakpoint were to trigger at function
     entry.

     The main goal of this patch series was to allow kprobes to use
     ftrace as an optimized probe point when a probe is placed on an
     ftrace nop.  With lots of help from Masami Hiramatsu, and going
     through lots of iterations, we finally came up with a good
     solution.

   - Add cpumask for uncore pmu, use it in 'stat', from Yan, Zheng.

   - Various tracing updates from Steve Rostedt

   - Clean up and improve 'perf sched' performance by elliminating lots
     of needless calls to libtraceevent.

   - Event group parsing support, from Jiri Olsa

   - UI/gtk refactorings and improvements from Namhyung Kim

   - Add support for non-tracepoint events in perf script python, from
     Feng Tang

   - Add --symbols to 'script', similar to the one in 'report', from
     Feng Tang.

  Infrastructure enhancements and fixes:

   - Convert the trace builtins to use the growing evsel/evlist
     tracepoint infrastructure, removing several open coded constructs
     like switch like series of strcmp to dispatch events, etc.
     Basically what had already been showcased in 'perf sched'.

   - Add evsel constructor for tracepoints, that uses libtraceevent just
     to parse the /format events file, use it in a new 'perf test' to
     make sure the libtraceevent format parsing regressions can be more
     readily caught.

   - Some strange errors were happening in some builds, but not on the
     next, reported by several people, problem was some parser related
     files, generated during the build, didn't had proper make deps, fix
     from Eric Sandeen.

   - Introduce struct and cache information about the environment where
     a perf.data file was captured, from Namhyung Kim.

   - Fix handling of unresolved samples when --symbols is used in
     'report', from Feng Tang.

   - Add union member access support to 'probe', from Hyeoncheol Lee.

   - Fixups to die() removal, from Namhyung Kim.

   - Render fixes for the TUI, from Namhyung Kim.

   - Don't enable annotation in non symbolic view, from Namhyung Kim.

   - Fix pipe mode in 'report', from Namhyung Kim.

   - Move related stats code from stat to util/, will be used by the
     'stat' kvm tool, from Xiao Guangrong.

   - Remove die()/exit() calls from several tools.

   - Resolve vdso callchains, from Jiri Olsa

   - Don't pass const char pointers to basename, so that we can
     unconditionally use libgen.h and thus avoid ifdef BIONIC lines,
     from David Ahern

   - Refactor hist formatting so that it can be reused with the GTK
     browser, From Namhyung Kim

   - Fix build for another rbtree.c change, from Adrian Hunter.

   - Make 'perf diff' command work with evsel hists, from Jiri Olsa.

   - Use the only field_sep var that is set up: symbol_conf.field_sep,
     fix from Jiri Olsa.

   - .gitignore compiled python binaries, from Namhyung Kim.

   - Get rid of die() in more libtraceevent places, from Namhyung Kim.

   - Rename libtraceevent 'private' struct member to 'priv' so that it
     works in C++, from Steven Rostedt

   - Remove lots of exit()/die() calls from tools so that the main perf
     exit routine can take place, from David Ahern

   - Fix x86 build on x86-64, from David Ahern.

   - {int,str,rb}list fixes from Suzuki K Poulose

   - perf.data header fixes from Namhyung Kim

   - Allow user to indicate objdump path, needed in cross environments,
     from Maciek Borzecki

   - Fix hardware cache event name generation, fix from Jiri Olsa

   - Add round trip test for sw, hw and cache event names, catching the
     problem Jiri fixed, after Jiri's patch, the test passes
     successfully.

   - Clean target should do clean for lib/traceevent too, fix from David
     Ahern

   - Check the right variable for allocation failure, fix from Namhyung
     Kim

   - Set up evsel->tp_format regardless of evsel->name being set
     already, fix from Namhyung Kim

   - Oprofile fixes from Robert Richter.

   - Remove perf_event_attr needless version inflation, from Jiri Olsa

   - Introduce libtraceevent strerror like error reporting facility,
     from Namhyung Kim

   - Add pmu mappings to perf.data header and use event names from cmd
     line, from Robert Richter

   - Fix include order for bison/flex-generated C files, from Ben
     Hutchings

   - Build fixes and documentation corrections from David Ahern

   - Assorted cleanups from Robert Richter

   - Let O= makes handle relative paths, from Steven Rostedt

   - perf script python fixes, from Feng Tang.

   - Initial bash completion support, from Frederic Weisbecker

   - Allow building without libelf, from Namhyung Kim.

   - Support DWARF CFI based unwind to have callchains when %bp based
     unwinding is not possible, from Jiri Olsa.

   - Symbol resolution fixes, while fixing support PPC64 files with an
     .opt ELF section was the end goal, several fixes for code that
     handles all architectures and cleanups are included, from Cody
     Schafer.

   - Assorted fixes for Documentation and build in 32 bit, from Robert
     Richter

   - Cache the libtraceevent event_format associated to each evsel
     early, so that we avoid relookups, i.e.  calling pevent_find_event
     repeatedly when processing tracepoint events.

     [ This is to reduce the surface contact with libtraceevents and
        make clear what is that the perf tools needs from that lib: so
        far parsing the common and per event fields.  ]

   - Don't stop the build if the audit libraries are not installed, fix
     from Namhyung Kim.

   - Fix bfd.h/libbfd detection with recent binutils, from Markus
     Trippelsdorf.

   - Improve warning message when libunwind devel packages not present,
     from Jiri Olsa"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (282 commits)
  perf trace: Add aliases for some syscalls
  perf probe: Print an enum type variable in "enum variable-name" format when showing accessible variables
  perf tools: Check libaudit availability for perf-trace builtin
  perf hists: Add missing period_* fields when collapsing a hist entry
  perf trace: New tool
  perf evsel: Export the event_format constructor
  perf evsel: Introduce rawptr() method
  perf tools: Use perf_evsel__newtp in the event parser
  perf evsel: The tracepoint constructor should store sys:name
  perf evlist: Introduce set_filter() method
  perf evlist: Renane set_filters method to apply_filters
  perf test: Add test to check we correctly parse and match syscall open parms
  perf evsel: Handle endianity in intval method
  perf evsel: Know if byte swap is needed
  perf tools: Allow handling a NULL cpu_map as meaning "all cpus"
  perf evsel: Improve tracepoint constructor setup
  tools lib traceevent: Fix error path on pevent_parse_event
  perf test: Fix build failure
  trace: Move trace event enable from fs_initcall to core_initcall
  tracing: Add an option for disabling markers
  ...
2012-10-01 10:28:49 -07:00
Al Viro
2903ff019b switch simple cases of fget_light to fdget
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-26 22:20:08 -04:00
Al Viro
ab72a7028c events: don't use get_unused_fd_flags() when get_unused_fd() will do
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-26 21:08:52 -04:00
Tejun Heo
8c7f6edbda cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
Currently, cgroup hierarchy support is a mess.  cpu related subsystems
behave correctly - configuration, accounting and control on a parent
properly cover its children.  blkio and freezer completely ignore
hierarchy and treat all cgroups as if they're directly under the root
cgroup.  Others show yet different behaviors.

These differing interpretations of cgroup hierarchy make using cgroup
confusing and it impossible to co-mount controllers into the same
hierarchy and obtain sane behavior.

Eventually, we want full hierarchy support from all subsystems and
probably a unified hierarchy.  Users using separate hierarchies
expecting completely different behaviors depending on the mounted
subsystem is deterimental to making any progress on this front.

This patch adds cgroup_subsys.broken_hierarchy and sets it to %true
for controllers which are lacking in hierarchy support.  The goal of
this patch is two-fold.

* Move users away from using hierarchy on currently non-hierarchical
  subsystems, so that implementing proper hierarchy support on those
  doesn't surprise them.

* Keep track of which controllers are broken how and nudge the
  subsystems to implement proper hierarchy support.

For now, start with a single warning message.  We can whine louder
later on.

v2: Fixed a typo spotted by Michal. Warning message updated.

v3: Updated memcg part so that it doesn't generate warning in the
    cases where .use_hierarchy=false doesn't make the behavior
    different from root.use_hierarchy=true.  Fixed a typo spotted by
    Glauber.

v4: Check ->broken_hierarchy after cgroup creation is complete so that
    ->create() can affect the result per Michal.  Dropped unnecessary
    memcg root handling per Michal.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Turner <pjt@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2012-09-14 12:01:16 -07:00
K.Prasad
500ad2d8b0 perf/hwpb: Invoke __perf_event_disable() if interrupts are already disabled
While debugging a warning message on PowerPC while using hardware
breakpoints, it was discovered that when perf_event_disable is invoked
through hw_breakpoint_handler function with interrupts disabled, a
subsequent IPI in the code path would trigger a WARN_ON_ONCE message in
smp_call_function_single function.

This patch calls __perf_event_disable() when interrupts are already
disabled, instead of perf_event_disable().

Reported-by: Edjunior Barbosa Machado <emachado@linux.vnet.ibm.com>
Signed-off-by: K.Prasad <Prasad.Krishnan@gmail.com>
[naveen.n.rao@linux.vnet.ibm.com: v3: Check to make sure we target current task]
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120802081635.5811.17737.stgit@localhost.localdomain
[ Fixed build error on MIPS. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-04 17:29:53 +02:00
Al Viro
a6fa941d94 perf_event: Switch to internal refcount, fix race with close()
Don't mess with file refcounts (or keep a reference to file, for
that matter) in perf_event.  Use explicit refcount of its own
instead.  Deal with the race between the final reference to event
going away and new children getting created for it by use of
atomic_long_inc_not_zero() in inherit_event(); just have the
latter free what it had allocated and return NULL, that works
out just fine (children of siblings of something doomed are
created as singletons, same as if the child of leader had been
created and immediately killed).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120820135925.GG23464@ZenIV.linux.org.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-04 17:29:22 +02:00
Jiri Olsa
c5ebcedb56 perf: Add ability to attach user stack dump to sample
Introducing PERF_SAMPLE_STACK_USER sample type bit to trigger the dump
of the user level stack on sample. The size of the dump is specified by
sample_stack_user value.

Being able to dump parts of the user stack, starting from the stack
pointer, will be useful to make a post mortem dwarf CFI based stack
unwinding.

Added HAVE_PERF_USER_STACK_DUMP config option to determine if the
architecture provides user stack dump on perf event samples.  This needs
access to the user stack pointer which is not unified across
architectures. Enabling this for x86 architecture.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Original-patch-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Arun Sharma <asharma@fb.com>
Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/r/1344345647-11536-6-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-08-10 12:17:58 -03:00
Jiri Olsa
4018994f3d perf: Add ability to attach user level registers dump to sample
Introducing PERF_SAMPLE_REGS_USER sample type bit to trigger the dump of
user level registers on sample. Registers we want to dump are specified
by sample_regs_user bitmask.

Only user level registers are dumped at the moment. Meaning the register
values of the user space context as it was before the user entered the
kernel for whatever reason (syscall, irq, exception, or a PMI happening
in userspace).

The layout of the sample_regs_user bitmap is described in
asm/perf_regs.h for archs that support register dump.

This is going to be useful to bring Dwarf CFI based stack unwinding on
top of samples.

Original-patch-by: Frederic Weisbecker <fweisbec@gmail.com>
[ Dump registers ABI specification. ]
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Suggested-by: Stephane Eranian <eranian@google.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Arun Sharma <asharma@fb.com>
Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/r/1344345647-11536-3-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-08-10 11:31:26 -03:00
Andrew Vagin
e6dab5ffab perf/trace: Add ability to set a target task for events
A few events are interesting not only for a current task.
For example, sched_stat_* events are interesting for a task
which wakes up. For this reason, it will be good if such
events will be delivered to a target task too.

Now a target task can be set by using __perf_task().

The original idea and a draft patch belongs to Peter Zijlstra.

I need these events for profiling sleep times. sched_switch is used for
getting callchains and sched_stat_* is used for getting time periods.
These events are combined in user space, then it can be analyzed by
perf tools.

Inspired-by: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arun Sharma <asharma@fb.com>
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1342016098-213063-1-git-send-email-avagin@openvz.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-31 17:02:05 +02:00
Yan, Zheng
0cda4c0231 perf: Introduce perf_pmu_migrate_context()
Originally from Peter Zijlstra. The helper migrates perf events
from one cpu to another cpu.

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1339741902-8449-5-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-18 12:13:21 +02:00
Yan, Zheng
e2d37cd213 perf: Allow the PMU driver to choose the CPU on which to install events
Allow the pmu->event_init callback to change event->cpu, so the PMU driver
can choose the CPU on which to install events.

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1339741902-8449-4-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-18 12:13:21 +02:00
Yan, Zheng
fbfc623f82 perf: Avoid race between cpu hotplug and installing event
perf_event_open() requires the cpu on which to install event is online,
but the cpu can go offline after perf_event_open checks that. Add a
get_online_cpus()/put_online_cpus() pair to avoid the race.

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1339741902-8449-3-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-18 12:13:20 +02:00
Salman Qazi
9c5da09d26 perf: Use css_tryget() to avoid propping up css refcount
An rmdir pushes css's ref count to zero.  However, if the associated
directory is open at the time, the dentry ref count is non-zero.  If
the fd for this directory is then passed into perf_event_open, it
does a css_get().  This bounces the ref count back up from zero.  This
is a problem by itself.  But what makes it turn into a crash is the
fact that we end up doing an extra dput, since we perform a dput
when css_put sees the ref count go down to zero.

css_tryget() does not fall into that trap. So, we use that instead.

Reproduction test-case for the bug:

 #include <unistd.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <linux/unistd.h>
 #include <linux/perf_event.h>
 #include <string.h>
 #include <errno.h>
 #include <stdio.h>

 #define PERF_FLAG_PID_CGROUP    (1U << 2)

 int perf_event_open(struct perf_event_attr *hw_event_uptr,
                     pid_t pid, int cpu, int group_fd, unsigned long flags) {
         return syscall(__NR_perf_event_open,hw_event_uptr, pid, cpu,
                 group_fd, flags);
 }

 /*
  * Directly poke at the perf_event bug, since it's proving hard to repro
  * depending on where in the kernel tree.  what moved?
  */
 int main(int argc, char **argv)
 {
        int fd;
        struct perf_event_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.exclude_kernel = 1;
        attr.size = sizeof(attr);
        mkdir("/dev/cgroup/perf_event/blah", 0777);
        fd = open("/dev/cgroup/perf_event/blah", O_RDONLY);
        perror("open");
        rmdir("/dev/cgroup/perf_event/blah");
        sleep(2);
        perf_event_open(&attr, fd, 0, -1,  PERF_FLAG_PID_CGROUP);
        perror("perf_event_open");
        close(fd);
        return 0;
 }

Signed-off-by: Salman Qazi <sqazi@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20120614223108.1025.2503.stgit@dungbeetle.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-18 11:45:57 +02:00
Namhyung Kim
cb7225feec perf: Remove duplicate invocation on perf_event_for_each
The @func callback was invoked twice for group leader when
perf_event_for_each() called. It seems the commit 75f937f24b
("perf_counter: Fix ctx->mutex vs counter ->mutex inversion") made the
mistake during the change.

Signed-off-by: Namhyung Kim <namhyung.kim@lge.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1338443506-25009-1-git-send-email-namhyung.kim@lge.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-31 14:01:00 -03:00
Jiri Olsa
ab0cce560e Revert "sched, perf: Use a single callback into the scheduler"
This reverts commit cb04ff9ac4 ("sched, perf: Use a single
callback into the scheduler").

Before this change was introduced, the process switch worked
like this (wrt. to perf event schedule):

     schedule (prev, next)
       - schedule out all perf events for prev
       - switch to next
       - schedule in all perf events for current (next)

After the commit, the process switch looks like:

     schedule (prev, next)
       - schedule out all perf events for prev
       - schedule in all perf events for (next)
       - switch to next

The problem is, that after we schedule perf events in, the pmu
is enabled and we can receive events even before we make the
switch to next - so "current" still being prev process (event
SAMPLE data are filled based on the value of the "current"
process).

Thats exactly what we see for test__PERF_RECORD test. We receive
SAMPLES with PID of the process that our tracee is scheduled
from.

Discussed with Peter Zijlstra:

 > Bah!, yeah I guess reverting is the right thing for now. Sad
 > though.
 >
 > So by having the two hooks we have a black-spot between them
 > where we receive no events at all, this black-spot covers the
 > hand-over of current and we thus don't receive the 'wrong'
 > events.
 >
 > I rather liked we could do away with both that black-spot and
 > clean up the code a little, but apparently people rely on it.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: acme@redhat.com
Cc: paulus@samba.org
Cc: cjashfor@linux.vnet.ibm.com
Cc: fweisbec@gmail.com
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/20120523111302.GC1638@m.brq.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-23 17:40:51 +02:00
Arnaldo Carvalho de Melo
16ee6576e2 Merge remote-tracking branch 'tip/perf/urgent' into perf/core
Merge reason: We are going to queue up a dependent patch:

"perf tools: Move parse event automated tests to separated object"

That depends on:

commit e7c72d8
perf tools: Add 'G' and 'H' modifiers to event parsing

Conflicts:
	tools/perf/builtin-stat.c

Conflicted with the recent 'perf_target' patches when checking the
result of perf_evsel open routines to see if a retry is needed to cope
with older kernels where the exclude guest/host perf_event_attr bits
were not used.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-18 13:13:33 -03:00
Peter Zijlstra
cb04ff9ac4 sched, perf: Use a single callback into the scheduler
We can easily use a single callback for both sched-in and sched-out. This
reduces the code footprint in the scheduler path as well as removes
the PMU black spot otherwise present between the out and in callback.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-o56ajxp1edwqg6x9d31wb805@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-09 15:23:17 +02:00
Robert Richter
fd0d000b2c perf: Pass last sampling period to perf_sample_data_init()
We always need to pass the last sample period to
perf_sample_data_init(), otherwise the event distribution will be
wrong. Thus, modifiyng the function interface with the required period
as argument. So basically a pattern like this:

        perf_sample_data_init(&data, ~0ULL);
        data.period = event->hw.last_period;

will now be like that:

        perf_sample_data_init(&data, ~0ULL, event->hw.last_period);

Avoids unininitialized data.period and simplifies code.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1333390758-10893-3-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-09 15:23:12 +02:00
Robert Richter
33b07b8be7 perf: Use static variant of perf_event_overflow in core.c
No need to have an additional function layer.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1333643084-26776-4-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-04-26 13:52:52 +02:00