The code to update the print formats for events requires a vprintf
format in the trace_seq. This patch adds that interface.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
TRACE_EVENT is a more generic way to define tracepoints. Doing so adds
these new capabilities to this tracepoint:
- zero-copy and per-cpu splice() tracing
- binary tracing without printf overhead
- structured logging records exposed under /debug/tracing/events
- trace events embedded in function tracer output and other plugins
- user-defined, per tracepoint filter expressions
...
Cons:
- no dev_t info for the output of plug, unplug_timer and unplug_io events.
no dev_t info for getrq and sleeprq events if bio == NULL.
no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL.
This is mainly because we can't get the deivce from a request queue.
But this may change in the future.
- A packet command is converted to a string in TP_assign, not TP_print.
While blktrace do the convertion just before output.
Since pc requests should be rather rare, this is not a big issue.
- In blktrace, an event can have 2 different print formats, but a TRACE_EVENT
has a unique format, which means we have some unused data in a trace entry.
The overhead is minimized by using __dynamic_array() instead of __array().
I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing:
dd dd + ioctl blktrace dd + TRACE_EVENT (splice)
1 7.36s, 42.7 MB/s 7.50s, 42.0 MB/s 7.41s, 42.5 MB/s
2 7.43s, 42.3 MB/s 7.48s, 42.1 MB/s 7.43s, 42.4 MB/s
3 7.38s, 42.6 MB/s 7.45s, 42.2 MB/s 7.41s, 42.5 MB/s
So the overhead of tracing is very small, and no regression when using
those trace events vs blktrace.
And the binary output of TRACE_EVENT is much smaller than blktrace:
# ls -l -h
-rw-r--r-- 1 root root 8.8M 06-09 13:24 sda.blktrace.0
-rw-r--r-- 1 root root 195K 06-09 13:24 sda.blktrace.1
-rw-r--r-- 1 root root 2.7M 06-09 13:25 trace_splice.out
Following are some comparisons between TRACE_EVENT and blktrace:
plug:
kjournald-480 [000] 303.084981: block_plug: [kjournald]
kjournald-480 [000] 303.084981: 8,0 P N [kjournald]
unplug_io:
kblockd/0-118 [000] 300.052973: block_unplug_io: [kblockd/0] 1
kblockd/0-118 [000] 300.052974: 8,0 U N [kblockd/0] 1
remap:
kjournald-480 [000] 303.085042: block_remap: 8,0 W 102736992 + 8 <- (8,8) 33384
kjournald-480 [000] 303.085043: 8,0 A W 102736992 + 8 <- (8,8) 33384
bio_backmerge:
kjournald-480 [000] 303.085086: block_bio_backmerge: 8,0 W 102737032 + 8 [kjournald]
kjournald-480 [000] 303.085086: 8,0 M W 102737032 + 8 [kjournald]
getrq:
kjournald-480 [000] 303.084974: block_getrq: 8,0 W 102736984 + 8 [kjournald]
kjournald-480 [000] 303.084975: 8,0 G W 102736984 + 8 [kjournald]
bash-2066 [001] 1072.953770: 8,0 G N [bash]
bash-2066 [001] 1072.953773: block_getrq: 0,0 N 0 + 0 [bash]
rq_complete:
konsole-2065 [001] 300.053184: block_rq_complete: 8,0 W () 103669040 + 16 [0]
konsole-2065 [001] 300.053191: 8,0 C W 103669040 + 16 [0]
ksoftirqd/1-7 [001] 1072.953811: 8,0 C N (5a 00 08 00 00 00 00 00 24 00) [0]
ksoftirqd/1-7 [001] 1072.953813: block_rq_complete: 0,0 N (5a 00 08 00 00 00 00 00 24 00) 0 + 0 [0]
rq_insert:
kjournald-480 [000] 303.084985: block_rq_insert: 8,0 W 0 () 102736984 + 8 [kjournald]
kjournald-480 [000] 303.084986: 8,0 I W 102736984 + 8 [kjournald]
Changelog from v2 -> v3:
- use the newly introduced __dynamic_array().
Changelog from v1 -> v2:
- use __string() instead of __array() to minimize the memory required
to store hex dump of rq->cmd().
- support large pc requests.
- add missing blk_fill_rwbs_rq() in block_rq_requeue TRACE_EVENT.
- some cleanups.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4A2DF669.5070905@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
On Sun, 7 Jun 2009, Ingo Molnar wrote:
> Testing tracer sched_switch: <6>Starting ring buffer hammer
> PASSED
> Testing tracer sysprof: PASSED
> Testing tracer function: PASSED
> Testing tracer irqsoff:
> =============================================
> PASSED
> Testing tracer preemptoff: PASSED
> Testing tracer preemptirqsoff: [ INFO: possible recursive locking detected ]
> PASSED
> Testing tracer branch: 2.6.30-rc8-tip-01972-ge5b9078-dirty #5760
> ---------------------------------------------
> rb_consumer/431 is trying to acquire lock:
> (&cpu_buffer->reader_lock){......}, at: [<c109eef7>] ring_buffer_reset_cpu+0x37/0x70
>
> but task is already holding lock:
> (&cpu_buffer->reader_lock){......}, at: [<c10a019e>] ring_buffer_consume+0x7e/0xc0
>
> other info that might help us debug this:
> 1 lock held by rb_consumer/431:
> #0: (&cpu_buffer->reader_lock){......}, at: [<c10a019e>] ring_buffer_consume+0x7e/0xc0
The ring buffer is a generic structure, and can be used outside of
ftrace. If ftrace traces within the use of the ring buffer, it can produce
false positives with lockdep.
This patch passes in a static lock key into the allocation of the ring
buffer, so that different ring buffers will have their own lock class.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1244477919.13761.9042.camel@twins>
[ store key in ring buffer descriptor ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The trace_pipe did not recognize the latency format flag and would produce
different output than the trace file. The problem was partly due that
the trace flags in the iterator was not set as well as the trace_pipe
zeros out part of the iterator (including the flags) to be able to use
the same routines as the trace file. trace_flags of the iterator should
not cause any problems when not zeroed out by for trace_pipe.
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This patch adds __print_symbolic which is similar to __print_flags but
works for an enumeration type instead. That is, there is only a one to one
mapping between the values and the symbols. When a match is made, then
it is printed, otherwise the hex value is outputed.
[ Impact: add interface for showing symbol names in events ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Developers have been asking for the ability in the ftrace event tracer
to display names of bits in a flags variable.
Instead of printing out c2, it would be easier to read FOO|BAR|GOO,
assuming that FOO is bit 1, BAR is bit 6 and GOO is bit 7.
Some examples where this would be useful are the state flags in a context
switch, kmalloc flags, and even permision flags in accessing files.
[
v2 changes include:
Frederic Weisbecker's idea of using a mask instead of bits,
thus we can output GFP_KERNEL instead of GPF_WAIT|GFP_IO|GFP_FS.
Li Zefan's idea of allowing the caller of __print_flags to add their
own delimiter (or no delimiter) where we can get for file permissions
rwx instead of r|w|x.
]
[
v3 changes:
Christoph Hellwig's idea of using an array instead of va_args.
]
[ Impact: better displaying of flags in trace output ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
This patch provides new heuristics for parsing both the form factor and
media rotation rate ATA IDENFITY words.
The reported ATA version must be 7 or greater and the device must return
values defined as valid in the standard. Only then are the
characteristics reported to SCSI via the VPD B1 page.
This seems like a reasonable compromise to me considering that we have
been shipping several kernel releases that key off the rotation rate bit
without any version checking whatsoever. With no complaints so far.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
Revert "mm: add /proc controls for pdflush threads"
viocd: needs to depend on BLOCK
block: fix the bio_vec array index out-of-bounds test
This reverts commit fafd688e4c.
Work is progressing to switch away from pdflush as the process backing
for flushing out dirty data. So it seems pointless to add more knobs
to control pdflush threads. The original author of the patch did not
have any specific use cases for adding the knobs, so we can easily
revert this before 2.6.30 to avoid having to maintain this API
forever.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx:
dma: fix ipu_idmac.c to not discard the last queued buffer
ioatdma: fix "ioatdma frees DMA memory with wrong function"
ipu_idmac: Use disable_irq_nosync() from within irq handlers.
dmatest: fix max channels handling
as reported by Alexander Beregalov <a.beregalov@gmail.com>
ioatdma 0000:00:08.0: DMA-API: device driver frees DMA memory with
wrong function [device address=0x000000007f76f800] [size=2000 bytes]
[map
ped as single] [unmapped as page]
The ioatdma driver was unmapping all regions
(either allocated as page or single) using unmap_page.
This patch lets dma driver recognize if unmap_single or unmap_page should be used.
It introduces two new dma control flags:
DMA_COMPL_SRC_UNMAP_SINGLE and DMA_COMPL_DEST_UNMAP_SINGLE.
They should be set to indicate dma driver to do dma-unmapping as single
(first one for the source, tha latter for the destination).
If respective flag is not set, the driver assumes dma-unmapping as page.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Tested-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In order to build the generic syscall table, we need a declaration for
every system call. sys_pipe2 was added without a proper declaration, so
add this to syscalls.h now.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
Revert driver core: move platform_data into platform_device
Revert driver core: fix passing platform_data
Remove old PRINTK_DEBUG config item
Doc/sysfs-rules: Swap the order of the words so the sentence makes more sense
Driver core: platform: fix kernel-doc warnings
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (22 commits)
Fix the race between capifs remount and node creation
Fix races around the access to ->s_options
switch ufs directories to ufs_sync_file()
Switch open_exec() and sys_uselib() to do_open_filp()
Make open_exec() and sys_uselib() use may_open(), instead of duplicating its parts
Reduce path_lookup() abuses
Make checkpatch.pl shut up on fs/inode.c
NULL noise in fs/super.c:kill_bdev_super()
romfs: cleanup romfs_fs.h
ROMFS: romfs_dev_read() error ignored
fs: dcache fix LRU ordering
ocfs2: Use nd_set_link().
Fix deadlock in ipathfs ->get_sb()
Fix a leak in failure exit in 9p ->get_sb()
Convert obvious places to deactivate_locked_super()
New helper: deactivate_locked_super()
reiserfs: remove privroot hiding in lookup
reiserfs: dont associate security.* with xattr files
reiserfs: fixup xattr_root caching
Always lookup priv_root on reiserfs mount and keep it
...
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (26 commits)
bonding: fix panic if initialization fails
IXP4xx: complete Ethernet netdev setup before calling register_netdev().
IXP4xx: use "ENODEV" instead of "ENOSYS" in module initialization.
ipvs: Fix IPv4 FWMARK virtual services
ipv4: Make INET_LRO a bool instead of tristate.
net: remove stale reference to fastroute from Kconfig help text
net: update skb_recycle_check() for hardware timestamping changes
bnx2: Fix panic in bnx2_poll_work().
net-sched: fix bfifo default limit
igb: resolve panic on shutdown when SR-IOV is enabled
wimax: oops: wimax_dev_add() is the only one that can initialize the state
wimax: fix oops if netlink fails to add attribute
Bluetooth: Move dev_set_name() to a context that can sleep
netfilter: ctnetlink: fix wrong message type in user updates
netfilter: xt_cluster: fix use of cluster match with 32 nodes
netfilter: ip6t_ipv6header: fix match on packets ending with NEXTHDR_NONE
netfilter: add missing linux/types.h include to xt_LED.h
mac80211: pid, fix memory corruption
mac80211: minstrel, fix memory corruption
cfg80211: fix comment on regulatory hint processing
...
Put generic_show_options read access to s_options under rcu_read_lock,
split save_mount_options() into "we are setting it the first time"
(uses in foo_fill_super()) and "we are relacing and freeing the old one",
synchronize_rcu() before kfree() in the latter.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
There's no kernel-only content in it anymore, so move it to header-y
and remove the superflous #ifdef __KERNEL__.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Does equivalent of up_write(&s->s_umount); deactivate_super(s);
However, it does not does not unlock it until it's all over.
As the result, it's safe to use to dispose of new superblock on ->get_sb()
failure exits - nobody will see the sucker until it's all over.
Equivalent using up_write/deactivate_super is safe for that purpose
if superblock is either safe to use or has NULL ->s_root when we unlock.
Normally filesystems take the required precautions, but
a) we do have bugs in that area in some of them.
b) up_write/deactivate_super sequence is extremely common,
so the helper makes sense anyway.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
With Al Viro's patch to move privroot lookup to fs mount, there's no need
to have special code to hide the privroot in reiserfs_lookup.
I've also cleaned up the privroot hiding in reiserfs_readdir_dentry and
removed the last user of reiserfs_xattrs().
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The xattr_root caching was broken from my previous patch set. It wouldn't
cause corruption, but could cause decreased performance due to allocating
a larger chunk of the journal (~ 27 blocks) than it would actually use.
This patch loads the xattr root dentry at xattr initialization and creates
it on-demand. Since we're using the cached dentry, there's no point
in keeping lookup_or_create_dir around, so that's removed.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
... even if it's a negative dentry. That way we can set ->d_op on
root before anyone could race with us. Simplify d_compare(), while
we are at it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This reverts commit 006f4571a1:
This patch moves platform_data from struct device into
struct platform_device, based on the two ideas:
1. Now all platform_driver is registered by platform_driver_register,
which makes probe()/release()/... of platform_driver passed parameter
of platform_device *, so platform driver can get platform_data from
platform_device;
2. Other kind of devices do not need to use platform_data, we can
decrease size of device if moving it to platform_device.
Taking into consideration of thousands of files to be fixed and they
can't be finished in one night(maybe it will take a long time), so we
keep platform_data in device to allow two kind of cases coexist until
all platform devices pass its platfrom data from
platform_device->platform_data.
All patches to do this kind of conversion are welcome.
As we don't really want to do it, it was a bad idea.
Cc: David Brownell <david-b@pacbell.net>
Cc: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>