When CONFIG_XFS_POSIX_ACL is not set "xfs_check_acl" is #defined
to NULL - which breaks the code attempting to add a tracepoint
on this function.
Only define the tracepoint when the function exists.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
map_len is unsigned. Checking map_len <= 0 is buggy when it should be
below zero. So, check exact expression instead of map_len.
Signed-off-by: Kulikov Vasiliy <segooon@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Avoid a lockdep warning by preventing page cache allocation from
recursing back into the filesystem during memory reclaim.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Calling into memory reclaim with a locked inode buffer can deadlock
if memory reclaim tries to lock the inode buffer during inode
teardown. Convert the relevant memory allocations to use KM_NOFS to
avoid this deadlock condition.
Reported-by: Peter Watkins <treestem@gmail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
xfs_ireclaim has to get and put te pag structure because it is only
called with the inode to reclaim. The one caller of this function
already has a reference on the pag and a pointer to is, so move the
radix tree delete to the caller and remove xfs_ireclaim completely.
This avoids a xfs_perag_get/put on every inode being reclaimed.
The overhead was noticed in a bug report at:
https://bugzilla.kernel.org/show_bug.cgi?id=16348
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
xfs_buf_read() fails to detect dispatch errors before attempting to
wait on sychronous IO. If there was an error, it will get stuck
forever, waiting for an I/O that was never started. Make sure the
error is detected correctly.
Further, such a failure can leave locked pages in the page cache
which will cause a later operation to hang on the page. Ensure that
we correctly process pages in the buffers when we get a dispatch
error.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
I missed Dave Chinner's second revision of this change, and pushed
his first version out to the repository instead.
commit a476c59ebb279d738718edc0e3fb76aab3687114
Author: Dave Chinner <dchinner@redhat.com>
This commit compensates for that by moving a block of code up a bit
further, with a result that matches the the effect of Dave's second
version.
Dave's first version was:
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Dave's second version was:
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
The open_exec file operation is only added by the external dmapi
patch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
These days we always have buffers thanks to ->page_mkwrite. And we
already have an assert a few lines above tripping in case that was
not true due to a bug.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
We do need a barrier for the first buffer of a split log write.
Otherwise we might incorrectly stamp the tail LSN into transactions
in the first part of the split write, or not flush data I/O before
updating the inode size.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Currently we don't remove the XFS mount from the shrinker list until
late in the unmount path. By this time, we have already torn down
the internals of the filesystem (e.g. the per-ag structures), and
hence if the shrinker is executed between the teardown and the
unregistering, the shrinker will get NULL per-ag structure pointers
and panic trying to dereference them.
Fix this by removing the xfs mount from the shrinker list before
tearing down it's internal structures.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Replace the xfs_itrace_entry catchall with specific trace points. For
most simple callers we now use the simple inode class, which used to
be the iget class, but add more details tracing for namespace events,
which now includes the name of the directory entries manipulated.
Remove the xfs_inactive trace point, which is a duplicate of the clear_inode
one, and the xfs_change_file_space trace point, which is immediately
followed by the more specific alloc/free space trace points.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
xfs_iput is just a small wrapper for xfs_iunlock + IRELE. Having this
out of line wrapper means the trace events in those two can't track
their caller properly. So just remove the wrapper and opencode the
unlock + rele in the few callers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
We never get an i_mode of 0 or a locked VFS inode until we pass in the
XFS_IGET_CREATE flag to xfs_iget, which makes xfs_iput_new equivalent to
xfs_iput for the only caller. In addition to that xfs_nfs_get_inode
does not even need to lock the inode given that the generation never changes
for a life inode, so just pass a 0 lock_flags to xfs_iget and release
the inode using IRELE in the error path.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
The xfs_iget_alloc/found tracepoints are a bit misnamed and misplaced.
Rename them to xfs_iget_hit/xfs_iget_miss and move them to the beggining
of the xfs_iget_cache_hit/miss functions. Add a new xfs_iget_reclaim_fail
tracepoint for the case where we fail to re-initialize a VFS inode,
and add a second instance of the xfs_iget_skip tracepoint for the case
of a failed igrab() call.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
The tracing code can't print flags defined as enums. Most flags that
we want to print are defines as macros already, but move the few remaining
ones over to make the trace output more useful.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
On the final put of a superblock the VFS already calls sync_filesystem
for us to write out all data and wait for it. No need to start another
asynchronous writeback inside ->put_super.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Remove the flags argument to __xfs_get_blocks as we can easily derive
it from the direct argument, and remove the unused BMAPI_MMAP flag.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
xfs_iomap passes a xfs_bmbt_irec pointer to xfs_iomap_write_direct and
xfs_iomap_write_allocate to give them the results of our read-only
xfs_bmapi query. Instead of allocating a new xfs_bmbt_irec on stack
for the next call to xfs_bmapi re use the one we got passed as it's not
used after this point.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
We already rely on the fact that the sync code will cause a synchronous
log force later on (currently via xfs_fs_sync_fs -> xfs_quiesce_data ->
xfs_sync_data), so no need to do this here. This allows us to avoid
a lot of synchronous log forces during sync, which pays of especially
with delayed logging enabled. Some compilebench numbers that show
this:
xfs (delayed logging, 256k logbufs)
===================================
intial create 25.94 MB/s 25.75 MB/s 25.64 MB/s
create 8.54 MB/s 9.12 MB/s 9.15 MB/s
patch 2.47 MB/s 2.47 MB/s 3.17 MB/s
compile 29.65 MB/s 30.51 MB/s 27.33 MB/s
clean 90.92 MB/s 98.83 MB/s 128.87 MB/s
read tree 11.90 MB/s 11.84 MB/s 8.56 MB/s
read compiled 28.75 MB/s 29.96 MB/s 24.25 MB/s
delete tree 8.39 seconds 8.12 seconds 8.46 seconds
delete compiled 8.35 seconds 8.44 seconds 5.11 seconds
stat tree 6.03 seconds 5.59 seconds 5.19 seconds
stat compiled tree 9.00 seconds 9.52 seconds 8.49 seconds
xfs + write_inode log_force removal
===================================
intial create 25.87 MB/s 25.76 MB/s 25.87 MB/s
create 15.18 MB/s 14.80 MB/s 14.94 MB/s
patch 3.13 MB/s 3.14 MB/s 3.11 MB/s
compile 36.74 MB/s 37.17 MB/s 36.84 MB/s
clean 226.02 MB/s 222.58 MB/s 217.94 MB/s
read tree 15.14 MB/s 15.02 MB/s 15.14 MB/s
read compiled tree 29.30 MB/s 29.31 MB/s 29.32 MB/s
delete tree 6.22 seconds 6.14 seconds 6.15 seconds
delete compiled tree 5.75 seconds 5.92 seconds 5.81 seconds
stat tree 4.60 seconds 4.51 seconds 4.56 seconds
stat compiled tree 4.07 seconds 3.87 seconds 3.96 seconds
In addition to that also remove the delwri inode flush that is unessecary
now that bulkstat is always coherent.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
The writepage implementation in XFS still tries to deal with dirty but
unmapped buffers which used to caused by writes through shared mmaps. Since
the introduction of ->page_mkwrite these can't happen anymore, so remove the
code dealing with them.
Note that the all_bh variable which causes us to start I/O on all buffers on
the pages was controlled by the count of unmapped buffers, which also
included those not actually dirty. It's now unconditionally initialized to
0 but set to 1 for the case of small file size extensions. It probably can
be removed entirely, but that's left for another patch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Currently the xfs releasepage implementation has code to deal with converting
delayed allocated and unwritten space. But we never get called for those as
we always convert delayed and unwritten space when cleaning a page, or drop
the state from the buffers in block_invalidatepage. We still keep a WARN_ON
on those cases for now, but remove all the case dealing with it, which allows
to fold xfs_page_state_convert into xfs_vm_writepage and remove the !startio
case from the whole writeback path.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>