Commit Graph

785 Commits

Author SHA1 Message Date
Linus Torvalds
68cf8d0c72 Merge branch 'for-3.12/core' of git://git.kernel.dk/linux-block
Pull block IO fixes from Jens Axboe:
 "After merge window, no new stuff this time only a collection of neatly
  confined and simple fixes"

* 'for-3.12/core' of git://git.kernel.dk/linux-block:
  cfq: explicitly use 64bit divide operation for 64bit arguments
  block: Add nr_bios to block_rq_remap tracepoint
  If the queue is dying then we only call the rq->end_io callout. This leaves bios setup on the request, because the caller assumes when the blk_execute_rq_nowait/blk_execute_rq call has completed that the rq->bios have been cleaned up.
  bio-integrity: Fix use of bs->bio_integrity_pool after free
  blkcg: relocate root_blkg setting and clearing
  block: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node(...)
  block: trace all devices plug operation
2013-09-22 15:00:11 -07:00
Linus Torvalds
0fbf2cc983 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "These are mostly bug fixes and a two small performance fixes.  The
  most important of the bunch are Josef's fix for a snapshotting
  regression and Mark's update to fix compile problems on arm"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (25 commits)
  Btrfs: create the uuid tree on remount rw
  btrfs: change extent-same to copy entire argument struct
  Btrfs: dir_inode_operations should use btrfs_update_time also
  btrfs: Add btrfs: prefix to kernel log output
  btrfs: refuse to remount read-write after abort
  Btrfs: btrfs_ioctl_default_subvol: Revert back to toplevel subvolume when arg is 0
  Btrfs: don't leak transaction in btrfs_sync_file()
  Btrfs: add the missing mutex unlock in write_all_supers()
  Btrfs: iput inode on allocation failure
  Btrfs: remove space_info->reservation_progress
  Btrfs: kill delay_iput arg to the wait_ordered functions
  Btrfs: fix worst case calculator for space usage
  Revert "Btrfs: rework the overcommit logic to be based on the total size"
  Btrfs: improve replacing nocow extents
  Btrfs: drop dir i_size when adding new names on replay
  Btrfs: replay dir_index items before other items
  Btrfs: check roots last log commit when checking if an inode has been logged
  Btrfs: actually log directory we are fsync()'ing
  Btrfs: actually limit the size of delalloc range
  Btrfs: allocate the free space by the existed max extent size when ENOSPC
  ...
2013-09-22 14:58:49 -07:00
Jun'ichi Nomura
75afb35299 block: Add nr_bios to block_rq_remap tracepoint
Adding the number of bios in a remapped request to 'block_rq_remap'
tracepoint.

Request remapper clones bios in a request to track the completion
status of each bio. So the number of bios can be useful information
for investigation.

Related discussions:
  http://www.redhat.com/archives/dm-devel/2013-August/msg00084.html
  http://www.redhat.com/archives/dm-devel/2013-September/msg00024.html

Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-09-21 13:57:47 -06:00
David Sterba
13fd8da98f btrfs: add lockdep and tracing annotations for uuid tree
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21 10:58:56 -04:00
Linus Torvalds
26935fb06e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile 4 from Al Viro:
 "list_lru pile, mostly"

This came out of Andrew's pile, Al ended up doing the merge work so that
Andrew didn't have to.

Additionally, a few fixes.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (42 commits)
  super: fix for destroy lrus
  list_lru: dynamically adjust node arrays
  shrinker: Kill old ->shrink API.
  shrinker: convert remaining shrinkers to count/scan API
  staging/lustre/libcfs: cleanup linux-mem.h
  staging/lustre/ptlrpc: convert to new shrinker API
  staging/lustre/obdclass: convert lu_object shrinker to count/scan API
  staging/lustre/ldlm: convert to shrinkers to count/scan API
  hugepage: convert huge zero page shrinker to new shrinker API
  i915: bail out earlier when shrinker cannot acquire mutex
  drivers: convert shrinkers to new count/scan API
  fs: convert fs shrinkers to new scan/count API
  xfs: fix dquot isolation hang
  xfs-convert-dquot-cache-lru-to-list_lru-fix
  xfs: convert dquot cache lru to list_lru
  xfs: rework buffer dispose list tracking
  xfs-convert-buftarg-lru-to-generic-code-fix
  xfs: convert buftarg LRU to generic code
  fs: convert inode and dentry shrinking to be node aware
  vmscan: per-node deferred work
  ...
2013-09-12 15:01:38 -07:00
Linus Torvalds
b7c09ad401 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
 "This is against 3.11-rc7, but was pulled and tested against your tree
  as of yesterday.  We do have two small incrementals queued up, but I
  wanted to get this bunch out the door before I hop on an airplane.

  This is a fairly large batch of fixes, performance improvements, and
  cleanups from the usual Btrfs suspects.

  We've included Stefan Behren's work to index subvolume UUIDs, which is
  targeted at speeding up send/receive with many subvolumes or snapshots
  in place.  It closes a long standing performance issue that was built
  in to the disk format.

  Mark Fasheh's offline dedup work is also here.  In this case offline
  means the FS is mounted and active, but the dedup work is not done
  inline during file IO.  This is a building block where utilities are
  able to ask the FS to dedup a series of extents.  The kernel takes
  care of verifying the data involved really is the same.  Today this
  involves reading both extents, but we'll continue to evolve the
  patches"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (118 commits)
  Btrfs: optimize key searches in btrfs_search_slot
  Btrfs: don't use an async starter for most of our workers
  Btrfs: only update disk_i_size as we remove extents
  Btrfs: fix deadlock in uuid scan kthread
  Btrfs: stop refusing the relocation of chunk 0
  Btrfs: fix memory leak of uuid_root in free_fs_info
  btrfs: reuse kbasename helper
  btrfs: return btrfs error code for dev excl ops err
  Btrfs: allow partial ordered extent completion
  Btrfs: convert all bug_ons in free-space-cache.c
  Btrfs: add support for asserts
  Btrfs: adjust the fs_devices->missing count on unmount
  Btrf: cleanup: don't check for root_refs == 0 twice
  Btrfs: fix for patch "cleanup: don't check the same thing twice"
  Btrfs: get rid of one BUG() in write_all_supers()
  Btrfs: allocate prelim_ref with a slab allocater
  Btrfs: pass gfp_t to __add_prelim_ref() to avoid always using GFP_ATOMIC
  Btrfs: fix race conditions in BTRFS_IOC_FS_INFO ioctl
  Btrfs: fix race between removing a dev and writing sbs
  Btrfs: remove ourselves from the cluster list under lock
  ...
2013-09-12 09:58:51 -07:00
Srivatsa S. Bhat
f92310c187 mm/page_alloc.c: fix the value of fallback_migratetype in alloc_extfrag tracepoint()
In the current code, the value of fallback_migratetype that is printed
using the mm_page_alloc_extfrag tracepoint, is the value of the
migratetype *after* it has been set to the preferred migratetype (if the
ownership was changed).  Obviously that wouldn't have been the original
intent.  (We already have a separate 'change_ownership' field to tell
whether the ownership of the pageblock was changed from the
fallback_migratetype to the preferred type.)

The intent of the fallback_migratetype field is to show the migratetype
from which we borrowed pages in order to satisfy the allocation request.
So fix the code to print that value correctly.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:57:19 -07:00
Dave Chinner
a0b02131c5 shrinker: Kill old ->shrink API.
There are no more users of this API, so kill it dead, dead, dead and
quietly bury the corpse in a shallow, unmarked grave in a dark forest deep
in the hills...

[glommer@openvz.org: added flowers to the grave]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Reviewed-by: Greg Thelen <gthelen@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10 18:56:32 -04:00
Linus Torvalds
bf97293eb8 Merge tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
 "Highlights include:

   - Fix NFSv4 recovery so that it doesn't recover lost locks in cases
     such as lease loss due to a network partition, where doing so may
     result in data corruption.  Add a kernel parameter to control
     choice of legacy behaviour or not.
   - Performance improvements when 2 processes are writing to the same
     file.
   - Flush data to disk when an RPCSEC_GSS session timeout is imminent.
   - Implement NFSv4.1 SP4_MACH_CRED state protection to prevent other
     NFS clients from being able to manipulate our lease and file
     locking state.
   - Allow sharing of RPCSEC_GSS caches between different rpc clients.
   - Fix the broken NFSv4 security auto-negotiation between client and
     server.
   - Fix rmdir() to wait for outstanding sillyrename unlinks to complete
   - Add a tracepoint framework for debugging NFSv4 state recovery
     issues.
   - Add tracing to the generic NFS layer.
   - Add tracing for the SUNRPC socket connection state.
   - Clean up the rpc_pipefs mount/umount event management.
   - Merge more patches from Chuck in preparation for NFSv4 migration
     support"

* tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (107 commits)
  NFSv4: use mach cred for SECINFO_NO_NAME w/ integrity
  NFS: nfs_compare_super shouldn't check the auth flavour unless 'sec=' was set
  NFSv4: Allow security autonegotiation for submounts
  NFSv4: Disallow security negotiation for lookups when 'sec=' is specified
  NFSv4: Fix security auto-negotiation
  NFS: Clean up nfs_parse_security_flavors()
  NFS: Clean up the auth flavour array mess
  NFSv4.1 Use MDS auth flavor for data server connection
  NFS: Don't check lock owner compatability unless file is locked (part 2)
  NFS: Don't check lock owner compatibility in writes unless file is locked
  nfs4: Map NFS4ERR_WRONG_CRED to EPERM
  nfs4.1: Add SP4_MACH_CRED write and commit support
  nfs4.1: Add SP4_MACH_CRED stateid support
  nfs4.1: Add SP4_MACH_CRED secinfo support
  nfs4.1: Add SP4_MACH_CRED cleanup support
  nfs4.1: Add state protection handler
  nfs4.1: Minimal SP4_MACH_CRED implementation
  SUNRPC: Replace pointer values with task->tk_pid and rpc_clnt->cl_clid
  SUNRPC: Add an identifier for struct rpc_clnt
  SUNRPC: Ensure rpc_task->tk_pid is available for tracepoints
  ...
2013-09-09 09:19:15 -07:00
Trond Myklebust
92cb6c5be8 SUNRPC: Replace pointer values with task->tk_pid and rpc_clnt->cl_clid
Instead of the pointer values, use the task and client identifier values
for tracing purposes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05 10:13:16 -04:00
Linus Torvalds
ae67d9a888 Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
 "New features for 3.12:

   - Added aggressive extent caching using the extent status tree.  This
     can actually decrease memory usage in read-mostly workloads since
     the information is much more compactly stored in the extent status
     tree than if we had to keep the extent tree metadata blocks in the
     buffer cache.  This also improves Asynchronous I/O since it is it
     makes much less likely that we need to do metadata I/O to lookup
     the extent tree information.

   - Improve the recovery after corrupted allocation bitmaps are found
     when running in errors=ignore mode.

  Also fixed some writeback vs truncate races when using a blocksize
  less than the page size"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (25 commits)
  ext4: allow specifying external journal by pathname mount option
  ext4: mark group corrupt on group descriptor checksum
  ext4: mark block group as corrupt on inode bitmap error
  ext4: mark block group as corrupt on block bitmap error
  ext4: fix type declaration of ext4_validate_block_bitmap
  ext4: error out if verifying the block bitmap fails
  jbd2: Fix endian mixing problems in the checksumming code
  ext4: isolate ext4_extents.h file
  ext4: Fix misspellings using 'codespell' tool
  ext4: convert write_begin methods to stable_page_writes semantics
  ext4: fix use of potentially uninitialized variables in debugging code
  ext4: fix lost truncate due to race with writeback
  ext4: simplify truncation code in ext4_setattr()
  ext4: fix ext4_writepages() in presence of truncate
  ext4: move test whether extent to map can be extended to one place
  ext4: fix warning in ext4_da_update_reserve_space()
  quota: provide interface for readding allocated space into reserved space
  ext4: avoid reusing recently deleted inodes in no journal mode
  ext4: allocate delayed allocation blocks before rename
  ext4: start handle at least possible moment when renaming files
  ...
2013-09-04 17:19:27 -07:00
Linus Torvalds
6832d9652f Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timers/nohz changes from Ingo Molnar:
 "It mostly contains fixes and full dynticks off-case optimizations, by
  Frederic Weisbecker"

* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  nohz: Include local CPU in full dynticks global kick
  nohz: Optimize full dynticks's sched hooks with static keys
  nohz: Optimize full dynticks state checks with static keys
  nohz: Rename a few state variables
  vtime: Always debug check snapshot source _before_ updating it
  vtime: Always scale generic vtime accounting results
  vtime: Optimize full dynticks accounting off case with static keys
  vtime: Describe overriden functions in dedicated arch headers
  m68k: hardirq_count() only need preempt_mask.h
  hardirq: Split preempt count mask definitions
  context_tracking: Split low level state headers
  vtime: Fix racy cputime delta update
  vtime: Remove a few unneeded generic vtime state checks
  context_tracking: User/kernel broundary cross trace events
  context_tracking: Optimize context switch off case with static keys
  context_tracking: Optimize guest APIs off case with static key
  context_tracking: Optimize main APIs off case with static key
  context_tracking: Ground setup for static key use
  context_tracking: Remove full dynticks' hacky dependency on wide context tracking
  nohz: Only enable context tracking on full dynticks CPUs
  ...
2013-09-04 09:36:54 -07:00
Trond Myklebust
40b5ea0c25 SUNRPC: Add tracepoints to help debug socket connection issues
Add client side debugging to help trace socket connection/disconnection
and unexpected state change issues.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-04 12:26:31 -04:00
Linus Torvalds
0d99b70873 Merge branches 'perf-urgent-for-linus' and 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf changes from Ingo Molnar:
 "As a first remark I'd like to point out that the obsolete '-f'
  (--force) option, which has not done anything for several releases,
  has been removed from 'perf record' and related utilities.  Everyone
  please update muscle memory accordingly! :-)

  Main changes on the perf kernel side:

   - Performance optimizations:
        . for trace events, by Steve Rostedt.
        . for time values, by Peter Zijlstra

   - New hardware support:
        . for Intel Silvermont (22nm Atom) CPUs, by Zheng Yan
        . for Intel SNB-EP uncore PMUs, by Zheng Yan

   - Enhanced hardware support:
        . for Intel uncore PMUs: add filter support for QPI boxes, by Zheng Yan

   - Core perf events code enhancements and fixes:
        . for full-nohz feature handling, by Frederic Weisbecker
        . for group events, by Jiri Olsa
        . for call chains, by Frederic Weisbecker
        . for event stream parsing, by Adrian Hunter

   - New ABI details:
        . Add attr->mmap2 attribute, by Stephane Eranian
        . Add PERF_EVENT_IOC_ID ioctl to return event ID, by Jiri Olsa
        . Export u64 time_zero on the mmap header page to allow TSC
          calculation, by Adrian Hunter
        . Add dummy software event, by Adrian Hunter.
        . Add a new PERF_SAMPLE_IDENTIFIER to make samples always
          parseable, by Adrian Hunter.
        . Make Power7 events available via sysfs, by Runzhen Wang.

   - Code cleanups and refactorings:
        . for nohz-full, by Frederic Weisbecker
        . for group events, by Jiri Olsa

   - Documentation updates:
        . for perf_event_type, by Peter Zijlstra

  Main changes on the perf tooling side (some of these tooling changes
  utilize the above kernel side changes):

   - Lots of 'perf trace' enhancements:

        . Make 'perf trace' command line arguments consistent with
          'perf record', by David Ahern.

        . Allow specifying syscalls a la strace, by Arnaldo Carvalho de Melo.

        . Add --verbose and -o/--output options, by Arnaldo Carvalho de Melo.

        . Support ! in -e expressions, to filter a list of syscalls,
          by Arnaldo Carvalho de Melo.

        . Arg formatting improvements to allow masking arguments in
          syscalls such as futex and open, where the some arguments are
          ignored and thus should not be printed depending on other args,
          by Arnaldo Carvalho de Melo.

        . Beautify futex open, openat, open_by_handle_at, lseek and futex
          syscalls, by Arnaldo Carvalho de Melo.

        . Add option to analyze events in a file versus live, so that
          one can do:

           [root@zoo ~]# perf record -a -e raw_syscalls:* sleep 1
           [ perf record: Woken up 0 times to write data ]
           [ perf record: Captured and wrote 25.150 MB perf.data (~1098836 samples) ]
           [root@zoo ~]# perf trace -i perf.data -e futex --duration 1
              17.799 ( 1.020 ms): 7127 futex(uaddr: 0x7fff3f6c6674, op: 393, val: 1, utime: 0x7fff3f6c6470, ua
             113.344 (95.429 ms): 7127 futex(uaddr: 0x7fff3f6c6674, op: 393, val: 1, utime: 0x7fff3f6c6470, uaddr2: 0x7fff3f6c6648, val3: 4294967
             133.778 ( 1.042 ms): 18004 futex(uaddr: 0x7fff3f6c6674, op: 393, val: 1, utime: 0x7fff3f6c6470, uaddr2: 0x7fff3f6c6648, val3: 429496
           [root@zoo ~]#

          By David Ahern.

        . Honor target pid / tid options when analyzing a file, by David Ahern.

        . Introduce better formatting of syscall arguments, including so
          far beautifiers for mmap, madvise, syscall return values,
          by Arnaldo Carvalho de Melo.

        . Handle HUGEPAGE defines in the mmap beautifier, by David Ahern.

   - 'perf report/top' enhancements:

        . Do annotation using /proc/kcore and /proc/kallsyms when
          available, removing the forced need for a vmlinux file kernel
          assembly annotation. This also improves this use case because
          vmlinux has just the initial kernel image, not what is actually
          in use after various code patchings by things like alternatives.
          By Adrian Hunter.

        . Add --ignore-callees=<regex> option to collapse undesired parts
          of call graphs, by Greg Price.

        . Simplify symbol filtering by doing it at machine class level,
          by Adrian Hunter.

        . Add support for callchains in the gtk UI, by Namhyung Kim.

        . Add --objdump option to 'perf top', by Sukadev Bhattiprolu.

   - 'perf kvm' enhancements:

        . Add option to print only events that exceed a specified time
          duration, by David Ahern.

        . Improve stack trace printing, by David Ahern.

        . Update documentation of the live command, by David Ahern

        . Add perf kvm stat live mode that combines aspects of 'perf kvm
          stat' record and report, by David Ahern.

        . Add option to analyze specific VM in perf kvm stat report, by
          David Ahern.

        . Do not require /lib/modules/* on a guest, by Jason Wessel.

   - 'perf script' enhancements:

        . Fix symbol offset computation for some dsos, by David Ahern.

        . Fix named threads support, by David Ahern.

        . Don't install scripting files files when perl/python support
          is disabled, by Arnaldo Carvalho de Melo.

   - 'perf test' enhancements:

        . Add various improvements and fixes to the "vmlinux matches
          kallsyms" 'perf test' entry, related to the /proc/kcore
          annotation feature. By Adrian Hunter.

        . Add sample parsing test, by Adrian Hunter.

        . Add test for reading object code, by Adrian Hunter.

        . Add attr record group sampling test, by Jiri Olsa.

        . Misc testing infrastructure improvements and other details,
          by Jiri Olsa.

   - 'perf list' enhancements:

        . Skip unsupported hardware events, by Namhyung Kim.

        . List pmu events, by Andi Kleen.

   - 'perf diff' enhancements:

        . Add support for more than two files comparison, by Jiri Olsa.

   - 'perf sched' enhancements:

        . Various improvements, including removing reliance on some
          scheduler tracepoints that provide the same information as the
          PERF_RECORD_{FORK,EXIT} events. By David Ahern.

        . Remove odd build stall by moving a large struct initialization
          from a local variable to a global one, by Namhyung Kim.

   - 'perf stat' enhancements:

        . Add --initial-delay option to skip measuring for a defined
          startup phase, by Andi Kleen.

   - Generic perf tooling infrastructure/plumbing changes:

        . Tidy up sample parsing validation, by Adrian Hunter.

        . Fix up jobserver setup in libtraceevent Makefile.
          by Arnaldo Carvalho de Melo.

        . Debug improvements, by Adrian Hunter.

        . Fix correlation of samples coming after PERF_RECORD_EXIT event,
          by David Ahern.

        . Improve robustness of the topology parsing code,
          by Stephane Eranian.

        . Add group leader sampling, that allows just one event in a group
          to sample while the other events have just its values read,
          by Jiri Olsa.

        . Add support for a new modifier "D", which requests that the
          event, or group of events, be pinned to the PMU.
          By Michael Ellerman.

        . Support callchain sorting based on addresses, by Andi Kleen

        . Prep work for multi perf data file storage, by Jiri Olsa.

        . libtraceevent cleanups, by Namhyung Kim.

  And lots and lots of other fixes and code reorganizations that did not
  make it into the list, see the shortlog, diffstat and the Git log for
  details!"

[ Also merge a leftover from the 3.11 cycle ]

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Prevent race in unthrottling code

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (237 commits)
  perf trace: Tell arg formatters the arg index
  perf trace: Add beautifier for open's flags arg
  perf trace: Add beautifier for lseek's whence arg
  perf tools: Fix symbol offset computation for some dsos
  perf list: Skip unsupported events
  perf tests: Add 'keep tracking' test
  perf tools: Add support for PERF_COUNT_SW_DUMMY
  perf: Add a dummy software event to keep tracking
  perf trace: Add beautifier for futex 'operation' parm
  perf trace: Allow syscall arg formatters to mask args
  perf: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node()
  perf: Export struct perf_branch_entry to userspace
  perf: Add attr->mmap2 attribute to an event
  perf/x86: Add Silvermont (22nm Atom) support
  perf/x86: use INTEL_UEVENT_EXTRA_REG to define MSR_OFFCORE_RSP_X
  perf trace: Handle missing HUGEPAGE defines
  perf trace: Honor target pid / tid options when analyzing a file
  perf trace: Add option to analyze events in a file versus live
  perf evlist: Add tracepoint lookup by name
  perf tests: Add a sample parsing test
  ...
2013-09-04 08:25:35 -07:00
Linus Torvalds
b854e4de0b Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "Main RCU changes this cycle were:

   - Full-system idle detection.  This is for use by Frederic
     Weisbecker's adaptive-ticks mechanism.  Its purpose is to allow the
     timekeeping CPU to shut off its tick when all other CPUs are idle.

   - Miscellaneous fixes.

   - Improved rcutorture test coverage.

   - Updated RCU documentation"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
  nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU
  nohz_full: Add full-system-idle state machine
  jiffies: Avoid undefined behavior from signed overflow
  rcu: Simplify _rcu_barrier() processing
  rcu: Make rcutorture emit online failures if verbose
  rcu: Remove unused variable from rcu_torture_writer()
  rcu: Sort rcutorture module parameters
  rcu: Increase rcutorture test coverage
  rcu: Add duplicate-callback tests to rcutorture
  doc: Fix memory-barrier control-dependency example
  rcu: Update RTFP documentation
  nohz_full: Add full-system-idle arguments to API
  nohz_full: Add full-system idle states and variables
  nohz_full: Add per-CPU idle-state tracking
  nohz_full: Add rcu_dyntick data for scalable detection of all-idle state
  nohz_full: Add Kconfig parameter for scalable detection of all-idle state
  nohz_full: Add testing information to documentation
  rcu: Eliminate unused APIs intended for adaptive ticks
  rcu: Select IRQ_WORK from TREE_PREEMPT_RCU
  rculist: list_first_or_null_rcu() should use list_entry_rcu()
  ...
2013-09-04 08:17:12 -07:00
Ingo Molnar
7d992feb76 Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:

"
 * Update RCU documentation.  These were posted to LKML at
   https://lkml.org/lkml/2013/8/19/611.

 * Miscellaneous fixes.  These were posted to LKML at
   https://lkml.org/lkml/2013/8/19/619.

 * Full-system idle detection.  This is for use by Frederic
   Weisbecker's adaptive-ticks mechanism.  Its purpose is
   to allow the timekeeping CPU to shut off its tick when
   all other CPUs are idle.  These were posted to LKML at
   https://lkml.org/lkml/2013/8/19/648.

 * Improve rcutorture test coverage.  These were posted to LKML at
   https://lkml.org/lkml/2013/8/19/675.
"

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-03 07:41:11 +02:00
Liu Bo
599c75ec3f Btrfs/tracepoint: update delayed ref tracepoints
This shows exactly how btrfs processes the delayed refs onto disks,
which is very helpful on understanding delayed ref mechanism and
debugging related bugs.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 07:57:39 -04:00
Zheng Liu
d7b2a00c2e ext4: isolate ext4_extents.h file
After applied the commit (4a092d73), we have reduced the number of
source files that need to #include ext4_extents.h.  But we can do
better.

This commit defines ext4_zeroout_es() in extents.c and move
EXT_MAX_BLOCKS into ext4.h in order not to include ext4_extents.h in
indirect.c and ioctl.c.  Meanwhile we just need to include this file in
extent_status.c when ES_AGGRESSIVE_TEST is defined.  Otherwise, this
commit removes a duplicated declaration in trace/events/ext4.h.

After applied this patch, we just need to include ext4_extents.h file
in {super,migrate,move_extents,extents}.c, and it is easy for us to
define a new extent disk layout.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-08-28 14:47:06 -04:00
Rafael J. Wysocki
e894245c78 Merge branch 'pm-sleep'
* pm-sleep:
  PM / Sleep: new trace event to print device suspend and resume times
  PM / Sleep: increase ftrace coverage in suspend/resume
2013-08-27 01:41:47 +02:00
Theodore Ts'o
107a7bd31a ext4: cache all of an extent tree's leaf block upon reading
When we read in an extent tree leaf block from disk, arrange to have
all of its entries cached.  In nearly all cases the in-memory
representation will be more compact than the on-disk representation in
the buffer cache, and it allows us to get the information without
having to traverse the extent tree for successive extents.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
2013-08-16 21:23:41 -04:00
Theodore Ts'o
3be78c7317 ext4: use unsigned int for es_status values
Don't use an unsigned long long for the es_status flags; this requires
that we pass 64-bit values around which is painful on 32-bit systems.
Instead pass the extent status flags around using the low 4 bits of an
unsigned int, and shift them into place when we are reading or writing
es_pblk.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
2013-08-16 21:22:41 -04:00
Frederic Weisbecker
1b6a259aa5 context_tracking: User/kernel broundary cross trace events
This can be useful to track all kernel/user round trips.
And it's also helpful to debug the context tracking subsystem.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
2013-08-14 17:14:48 +02:00
Oleg Nesterov
d027e6a9c8 tracing/perf: Avoid perf_trace_buf_*() in perf_trace_##call() when possible
perf_trace_buf_prepare() + perf_trace_buf_submit(task => NULL)
make no sense if hlist_empty(head). Change perf_trace_##call()
to check ->perf_events beforehand and do nothing if it is empty.

This removes the overhead for tasks without events associated
with them. For example, "perf record -e sched:sched_switch -p1"
attaches the counter(s) to the single task, but every task in
system will do perf_trace_buf_prepare/submit() just to realize
that it was not attached to this event.

However, we can only do this if __task == NULL, so we also add
the __builtin_constant_p(__task) check.

With this patch "perf bench sched pipe" shows approximately 4%
improvement when "perf record -p1" runs in parallel, many thanks
to Steven for the testing.

Link: http://lkml.kernel.org/r/20130806160847.GA2746@redhat.com

Tested-by: David Ahern <dsahern@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-08-13 21:06:30 -04:00
Oleg Nesterov
12473965c3 tracing/perf: Reimplement TP_perf_assign() logic
The next patch tries to avoid the costly perf_trace_buf_* calls
when possible but there is a problem. We can only do this if
__task == NULL, perf_tp_event(task != NULL) has the additional
code for this case.

Unfortunately, TP_perf_assign/__perf_xxx which changes the default
values of __count/__task variables for perf_trace_buf_submit() is
called "too late", after we already did perf_trace_buf_prepare(),
and the optimization above can't work.

So this patch simply embeds __perf_xxx() into TP_ARGS(), this way
DECLARE_EVENT_CLASS() can use the result of assignments hidden in
"args" right after ftrace_get_offsets_##call() which is mostly
trivial. This allows us to have the fast-path "__task != NULL"
check at the start, see the next patch.

Link: http://lkml.kernel.org/r/20130806160844.GA2739@redhat.com

Tested-by: David Ahern <dsahern@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-08-13 21:05:57 -04:00
Oleg Nesterov
36009d07b7 tracing/perf: Expand TRACE_EVENT(sched_stat_runtime)
To simplify the review of the next patches:

1. We are going to reimplent __perf_task/counter and embedd them
   into TP_ARGS(). expand TRACE_EVENT(sched_stat_runtime) into
   DECLARE_EVENT_CLASS() + DEFINE_EVENT(), this way they can use
   different TP_ARGS's.

2. Change perf_trace_##call() macro to do perf_fetch_caller_regs()
   right before perf_trace_buf_prepare().

   This way it evaluates TP_ARGS() asap, the next patch explores
   this fact.

   Note: after 87f44bbc perf_trace_buf_prepare() doesn't need
   "struct pt_regs *regs", perhaps it makes sense to remove this
   argument. And perhaps we can teach perf_trace_buf_submit()
   to accept regs == NULL and do fetch_caller_regs(CALLER_ADDR1)
   in this case.

3. Cosmetic, but the typecast from "void*" buys nothing. It just
   adds the noise, remove it.

Link: http://lkml.kernel.org/r/20130806160841.GA2736@redhat.com

Acked-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-08-13 21:05:12 -04:00