Commit Graph

2641 Commits

Author SHA1 Message Date
Dave Chinner 19de7351a8 xfs: split out symlink code into it's own file.
The symlink code is about to get more complicated when CRCs are
added for remote symlink blocks. The symlink management code is
mostly self contained, so move it to it's own files so that all the
new code and the existing symlink code will not be intermingled
with other unrelated code.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-04-21 15:38:04 -05:00
Christoph Hellwig 93848a999c xfs: add version 3 inode format with CRCs
Add a new inode version with a larger core.  The primary objective is
to allow for a crc of the inode, and location information (uuid and ino)
to verify it was written in the right place.  We also extend it by:

	a creation time (for Samba);
	a changecount (for NFSv4);
	a flush sequence (in LSN format for recovery);
	an additional inode flags field; and
	some additional padding.

These additional fields are not implemented yet, but already laid
out in the structure.

[dchinner@redhat.com] Added LSN and flags field, some factoring and rework to
capture all the necessary information in the crc calculation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-04-21 15:03:33 -05:00
Christoph Hellwig 3fe58f30b4 xfs: add CRC checks for quota blocks
Use the reserved space in struct xfs_dqblk to store a UUID and a crc
for the quota blocks.

[dchinner@redhat.com] Add a LSN field and update for current verifier
infrastructure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-04-21 14:58:22 -05:00
Dave Chinner 983d09ffe3 xfs: add CRC checks to the AGI
Same set of changes made to the AGF need to be made to the AGI.
This patch has a similar history to the AGF, hence a similar
sign-off chain.

Signed-off-by: Dave Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <dgc@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-04-21 14:57:43 -05:00
Christoph Hellwig 77c95bba01 xfs: add CRC checks to the AGFL
Add CRC checks, location information and a magic number to the AGFL.
Previously the AGFL was just a block containing nothing but the
free block pointers.  The new AGFL has a real header with the usual
boilerplate instead, so that we can verify it's not corrupted and
written into the right place.

[dchinner@redhat.com] Added LSN field, reworked significantly to fit
into new verifier structure and growfs structure, enabled full
verifier functionality now there is a header to verify and we can
guarantee an initialised AGFL.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-04-21 14:55:34 -05:00
Dave Chinner 4e0e6040c4 xfs: add CRC checks to the AGF
The AGF already has some self identifying fields (e.g. the sequence
number) so we only need to add the uuid to it to identify the
filesystem it belongs to. The location is fixed based on the
sequence number, so there's no need to add a block number, either.

Hence the only additional fields are the CRC and LSN fields. These
are unlogged, so place some space between the end of the logged
fields and them so that future expansion of the AGF for logged
fields can be placed adjacent to the existing logged fields and
hence not complicate the field-derived range based logging we
currently have.

Based originally on a patch from myself, modified further by
Christoph Hellwig and then modified again to fit into the
verifier structure with additional fields by myself. The multiple
signed-off-by tags indicate the age and history of this patch.

Signed-off-by: Dave Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-04-21 14:54:46 -05:00
Christoph Hellwig ee1a47ab0e xfs: add support for large btree blocks
Add support for larger btree blocks that contains a CRC32C checksum,
a filesystem uuid and block number for detecting filesystem
consistency and out of place writes.

[dchinner@redhat.com] Also include an owner field to allow reverse
mappings to be implemented for improved repairability and a LSN
field to so that log recovery can easily determine the last
modification that made it to disk for each buffer.

[dchinner@redhat.com] Add buffer log format flags to indicate the
type of buffer to recovery so that we don't have to do blind magic
number tests to determine what the buffer is.

[dchinner@redhat.com] Modified to fit into the verifier structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-04-21 14:53:46 -05:00
Dave Chinner a2050646f6 xfs: increase hexdump output in xfs_corruption_error
Currently xfs_corruption_error() dumps the first 16 bytes of the
buffer that is passed to it when a corruption occurs. This is not
large enough to see the entire state of the header of the block that
was determined to be corrupt.  increase the output to 64 bytes to
capture the majority of all headers in all types of metadata blocks.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-04-21 14:48:41 -05:00
Jeff Liu 7fe3258c50 xfs: Update xfs_log_commit_cil() comments
xfs_log_commit_iclog() function has been removed by commits 93b8a585:
	xfs: remove the deprecated nodelaylog option

Beginning from Linux 3.3, only delayed logging is supported so that
we call xfs_log_commit_cil() at xfs_trans_commit() only, remove the
useless comments so.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-04-16 13:20:03 -05:00
Jeff Liu d4fd0e92fb xfs: Remove the obsolete XLOG_CIL_HARD_SPACE_LIMIT() macros
There is no more users of this Macro, so it's time to kill it dead.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-04-16 13:18:33 -05:00
Dave Chinner 666d644cd7 xfs: don't free EFIs before the EFDs are committed
Filesystems are occasionally being shut down with this error:

xfs_trans_ail_delete_bulk: attempting to delete a log item that is
not in the AIL.

It was diagnosed to be related to the EFI/EFD commit order when the
EFI and EFD are in different checkpoints and the EFD is committed
before the EFI here:

http://oss.sgi.com/archives/xfs/2013-01/msg00082.html

The real problem is that a single bit cannot fully describe the
states that the EFI/EFD processing can be in. These completion
states are:

EFI			EFI in AIL	EFD		Result
committed/unpinned	Yes		committed	OK
committed/pinned	No		committed	Shutdown
uncommitted		No		committed	Shutdown


Note that the "result" field is what should happen, not what does
happen. The current logic is broken and handles the first two cases
correctly by luck.  That is, the code will free the EFI if the
XFS_EFI_COMMITTED bit is *not* set, rather than if it is set. The
inverted logic "works" because if both EFI and EFD are committed,
then the first __xfs_efi_release() call clears the XFS_EFI_COMMITTED
bit, and the second frees the EFI item. Hence as long as
xfs_efi_item_committed() has been called, everything appears to be
fine.

It is the third case where the logic fails - where
xfs_efd_item_committed() is called before xfs_efi_item_committed(),
and that results in the EFI being freed before it has been
committed. That is the bug that triggered the shutdown, and hence
keeping track of whether the EFI has been committed or not is
insufficient to correctly order the EFI/EFD operations w.r.t. the
AIL.

What we really want is this: the EFI is always placed into the
AIL before the last reference goes away. The only way to guarantee
that is that the EFI is not freed until after it has been unpinned
*and* the EFD has been committed. That is, restructure the logic so
that the only case that can occur is the first case.

This can be done easily by replacing the XFS_EFI_COMMITTED with an
EFI reference count. The EFI is initialised with it's own count, and
that is not released until it is unpinned. However, there is a
complication to this method - the high level EFI/EFD code in
xfs_bmap_finish() does not hold direct references to the EFI
structure, and runs a transaction commit between the EFI and EFD
processing. Hence the EFI can be freed even before the EFD is
created using such a method.

Further, log recovery uses the AIL for tracking EFI/EFDs that need
to be recovered, but it uses the AIL *differently* to the EFI
transaction commit. Hence log recovery never pins or unpins EFIs, so
we can't drop the EFI reference count indirectly to free the EFI.

However, this doesn't prevent us from using a reference count here.
There is a 1:1 relationship between EFIs and EFDs, so when we
initialise the EFI we can take a reference count for the EFD as
well. This solves the xfs_bmap_finish() issue - the EFI will never
be freed until the EFD is processed. In terms of log recovery,
during the committing of the EFD we can look for the
XFS_EFI_RECOVERED bit being set and drop the EFI reference as well,
thereby ensuring everything works correctly there as well.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-04-05 13:25:35 -05:00
Rich Johnston 3d6e036193 xfs: Add ratelimited printk for different alert levels
Ratelimited printk will be useful in printing xfs messages which are otherwise
not required to be printed always due to their high rate (to prevent kernel ring
buffer from overflowing), while at the same time required to be printed.

Signed-off-by: Raghavendra D Prabhu <rprabhu@wnohang.net>
Reviewed-by: Rich Johnston <rjohnston@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-04-03 13:20:39 -05:00
Jan Kara ff9a28f6c2 xfs: Fix WARN_ON(delalloc) in xfs_vm_releasepage()
When a dirty page is truncated from a file but reclaim gets to it before
truncate_inode_pages(), we hit WARN_ON(delalloc) in
xfs_vm_releasepage(). This is because reclaim tries to write the page,
xfs_vm_writepage() just bails out (leaving page clean) and thus reclaim
thinks it can continue and calls xfs_vm_releasepage() on page with dirty
buffers.

Fix the issue by redirtying the page in xfs_vm_writepage(). This makes
reclaim stop reclaiming the page and also logically it keeps page in a
more consistent state where page with dirty buffers has PageDirty set.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-03-22 16:12:37 -05:00
Brian Foster 19cb7e3854 xfs: xfs_iomap_prealloc_size() tracepoint
Add a tracepoint to provide some feedback on preallocation size
calculation.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-03-22 16:07:56 -05:00
Brian Foster 76a4202a38 xfs: add quota-driven speculative preallocation throttling
Introduce the need_throttle() and calc_throttle() functions to
independently check whether throttling is required for a particular
dquot and if so, calculate the associated throttling metrics based
on the state of the quota. We use the same general algorithm to
calculate the throttle shift as for global free space with the
exception of using three stages rather than five.

Update xfs_iomap_prealloc_size() to use the smallest available
prealloc size based on each of the constraints and apply the
maximum shift to obtain the throttled preallocation size.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-03-22 16:07:21 -05:00
Brian Foster b136645116 xfs: xfs_dquot prealloc throttling watermarks and low free space
Enable tracking of high and low watermarks for preallocation
throttling of files under quota restrictions. These values are
calculated when the quota limit is read from disk or modified and
cached for later use by the throttling algorithm.

The high watermark specifies when preallocation is disabled, the
low watermark specifies when throttling is enabled and the low free
space data structure contains precalculated low free space limits
to serve as input to determine the level of throttling required.

Note that the low free space data structure is based on the
existing global low free space data structure with the exception of
using three stages (5%, 3% and 1%) rather than five to reduce the
impact of xfs_dquot memory overhead.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-03-22 16:06:30 -05:00
Brian Foster 4b6eae2e6a xfs: pass xfs_dquot to xfs_qm_adjust_dqlimits() instead of xfs_disk_dquot_t
Modify xfs_qm_adjust_dqlimits() to take the xfs_dquot as a
parameter instead of just the xfs_disk_dquot_t so we can update
in-memory fields if necessary.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-03-22 16:05:52 -05:00
Brian Foster c9bdbdc074 xfs: push rounddown_pow_of_two() to after prealloc throttle
The round down occurs towards the beginning of the function. Push
it down after throttling has occurred. This is to support adding
further transformations to 'alloc_blocks' that might not preserve
power-of-two alignment (and thus could lead to rounding down
multiple times).

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-03-22 16:05:00 -05:00
Brian Foster 3c58b5f809 xfs: reorganize xfs_iomap_prealloc_size to remove indentation
The majority of xfs_iomap_prealloc_size() executes within the
check for lack of default I/O size. Reverse the logic to remove the
extra indentation.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-03-22 16:04:23 -05:00
Christoph Hellwig 56cea2d088 xfs: take inode version into account in XFS_LITINO
Add a version argument to XFS_LITINO so that it can return different values
depending on the inode version.  This is required for the upcoming v3 inodes
with a larger fixed layout dinode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-03-14 16:19:14 -05:00
Dave Chinner c163f9a176 xfs: ensure we capture IO errors correctly
Failed buffer readahead can leave the buffer in the cache marked
with an error. Most callers that then issue a subsequent read on the
buffer do not zero the b_error field out, and so we may incorectly
detect an error during IO completion due to the stale error value
left on the buffer.

Avoid this problem by zeroing the error before IO submission. This
ensures that the only IO errors that are detected those captured
from are those captured from bio submission or completion.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-03-14 15:56:53 -05:00
Jeff Liu d8ddfe81c7 xfs: Remove obsoleted m_inode_shrink from xfs_mount structure
Looks the old m_inode_shrink is obsoleted as we perform inodes reclaim per AG via
m_reclaim_workqueue, this patch remove it from the xfs_mount structure if so.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Cc: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-03-14 15:55:32 -05:00
Dave Chinner 9e5987a779 xfs: rearrange some code in xfs_bmap for better locality
xfs_bmap.c is a big file, and some of the related code is spread all
throughout the file requiring function prototypes for static
function and jumping all through the file to follow a single call
path. Rearrange the code so that:

	a) related functionality is grouped together; and
	b) functions are grouped in call dependency order

While the diffstat is large, there are no code changes in the patch;
it is just moving the functionality around and removing the function
prototypes at the top of the file. The resulting layout of the code
is as follows (top of file to bottom):

	- miscellaneous helper functions
	- extent tree block counting routines
	- debug/sanity checking code
	- bmap free list manipulation functions
	- inode fork format manipulation functions
	- internal/external extent tree seach functions
	- extent tree manipulation functions used during allocation
	- functions used during extent read/allocate/removal
	  operations (i.e. xfs_bmapi_write, xfs_bmapi_read,
	  xfs_bunmapi and xfs_getbmap)

This means that following logic paths through the bmapi code is much
simpler - most of the code relevant to a specific operation is now
clustered together rather than spread all over the file....

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-03-07 12:35:22 -06:00
Akinobu Mita ecb3403de1 xfs: rename random32() to prandom_u32()
Use more preferable function name which implies using a pseudo-random
number generator.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: <bpm@sgi.com>
Cc: Ben Myers <bpm@sgi.com>
Cc: Alex Elder <elder@kernel.org>
Cc: xfs@oss.sgi.com
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-03-07 12:33:57 -06:00
Dave Chinner d5929de833 xfs: don't verify buffers after IO errors
When we read a buffer, we might get an error from the underlying
block device and not the real data. Hence if we get an IO error, we
shouldn't run the verifier but instead just pass the IO error
straight through.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-03-07 12:31:02 -06:00