generic: Test filesystem lockup on full overprovisioned dm-thin

With thin devices, it's possible to have a virtual device larger
than the physical device itself, and such situation can cause
problems to filesystems, once the filesystem 'believe' to have more
space than it actually has.

This can lead the filesystem to several weird behaviors. The one
tested here is filesystem lockup.

In case of XFS, it locks up when trying to writeback AIL metadata
back to the filesystem, but, once there is no physical space
available, XFS locks up and do not gracefuly handle this case.

Other filesystems usually are remounted as read-only, so they
already have this situation covered.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eryu Guan <eguan@redhat.com>
This commit is contained in:
Carlos Maiolino
2017-07-13 15:12:08 +02:00
committed by Eryu Guan
parent 37d7344ab6
commit 461dad511f
3 changed files with 154 additions and 0 deletions
+151
View File
@@ -0,0 +1,151 @@
#! /bin/bash
# FS QA Test 459
#
# Test buffer filesystem error recovery during a full overcommited dm-thin device.
#
# When a dm-thin device reaches its full capacity, but the virtual device still
# shows available space, the filesystem should be able to handle such cases
# failing its operation without locking up.
#
# This test has been created first to cover a XFS problem where it loops
# indefinitely in xfsaild due items still in AIL. The buffers containing such
# items couldn't be resubmitted because the items were flush locked.
# But, once this doesn't require any special filesystem feature to be executed,
# this has been integrated as a generic test.
#
# This test might hang the filesystem when ran on an unpatched kernel
#
#-----------------------------------------------------------------------
# Copyright (c) 2017 Red Hat, Inc. All Rights Reserved.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it would be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write the Free Software Foundation,
# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
#-----------------------------------------------------------------------
#
seq=`basename $0`
seqres=$RESULT_DIR/$seq
echo "QA output created by $seq"
here=`pwd`
tmp=/tmp/$$
status=1 # failure is the default!
trap "_cleanup; exit \$status" 0 1 2 3 15
_cleanup()
{
cd /
rm -f $tmp.*
$UMOUNT_PROG $SCRATCH_MNT >>$seqres.full 2>&1
$LVM_PROG vgremove -ff $vgname >>$seqres.full 2>&1
$LVM_PROG pvremove -ff $SCRATCH_DEV >>$seqres.full 2>&1
}
# get standard environment, filters and checks
. ./common/rc
# real QA test starts here
# This tests for filesystem lockup not consistency, so don't check for fs
# consistency after test
_supported_fs generic
_supported_os Linux
_require_scratch_nocheck
_require_dm_target thin-pool
_require_command $LVM_PROG lvm
_require_freeze
# remove previous $seqres.full before test
rm -f $seqres.full
vgname=vg_$seq
lvname=lv_$seq
poolname=pool_$seq
snapname=snap_$seq
origpsize=100
virtsize=200
newpsize=200
# Ensure we have enough disk space
_scratch_mkfs_sized $((250 * 1024 * 1024)) >>$seqres.full 2>&1
# Create a 100MB dm-thin POOL
$LVM_PROG pvcreate -f $SCRATCH_DEV >>$seqres.full 2>&1
$LVM_PROG vgcreate -f $vgname $SCRATCH_DEV >>$seqres.full 2>&1
$LVM_PROG lvcreate --thinpool $poolname --errorwhenfull y \
--zero n -L $origpsize \
--poolmetadatasize 4M $vgname >>$seqres.full 2>&1
# Create a overprovisioned 200MB dm-thin virt. device
$LVM_PROG lvcreate --virtualsize $virtsize \
-T $vgname/$poolname \
-n $lvname >>$seqres.full 2>&1
_mkfs_dev /dev/mapper/$vgname-$lvname >>$seqres.full 2>&1
# Running the test over the original volume doesn't reproduce the problem
# reliably, while, running it over a snapshot, makes the problem 100%
# reproducible, so, create a snapshot and run the test over it.
$LVM_PROG lvcreate -k n -s $vgname/$lvname \
-n $snapname >>$seqres.full 2>&1
_mount /dev/mapper/$vgname-$snapname $SCRATCH_MNT
# Consume all space available in the volume and freeze to ensure everything
# required to make the fs consistent is flushed to disk.
$XFS_IO_PROG -f -d -c 'pwrite -b 1m 0 120m' $SCRATCH_MNT/f1 >>$seqres.full 2>&1
# In XFS, this freeze will never complete until the dm-thin POOL device is
# extended. It is expected, and is only used so xfsaild is triggered to
# flush AIL items, other filesystems usually get remounted as read-only during
# the above write process.
fsfreeze -f $SCRATCH_MNT >>$seqres.full 2>&1 &
freezeid=$!
# Wait enough so xfsaild can run
sleep 10
# Make some extra space available so the freeze above can proceed
$LVM_PROG lvextend -L $newpsize $vgname/$poolname >>$seqres.full 2>&1
wait $freezeid
ret=$?
# Different filesystems will handle the lack of real space in different ways,
# some will remount the filesystem in read-only mode, some will not. These tests
# will check if:
# - The filesystem turns into Read-Only and reject further writes
# - The filesystem stays in Read-Write mode, but can be frozen/thawed
# without getting stuck.
ISRO=$(_fs_options /dev/mapper/$vgname-$snapname | grep -w "ro")
if [ $ret -ne 0 ]; then
if [ -n "$ISRO" ]; then
echo "Test OK"
else
echo "Freeze failed and FS isn't Read-Only. Test Failed"
status=1
exit
fi
else
# Try to thaw the filesystem, and complete test if if succeed.
# NOTE: This will hang on affected XFS filesystems.
fsfreeze -u $SCRATCH_MNT >>$seqres.full 2>&1
echo "Test OK"
fi
status=0
exit
+2
View File
@@ -0,0 +1,2 @@
QA output created by 459
Test OK
+1
View File
@@ -461,3 +461,4 @@
456 auto quick metadata
457 auto log replay clone
458 auto quick clone
459 auto dangerous