Commit Graph

152 Commits

Author SHA1 Message Date
Ritesh Harjani 9b298bdf54 check: add CLI option to repeat and stop tests in case of failure
Currently the -i <n> option runs the tests for <n> iterations regardless
of the results. When we want to stop iterating as soon as a test fails,
it is much easier to have an option that checks the failure status and
stops the run from proceeding any further.

This patch adds such an option (-I <n>) thereby extending the -i <n>
option functionality.

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2021-03-07 23:53:36 +08:00
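
Note on 9b298bdf54: the stop-on-failure loop described above can be sketched
as follows. This is only an illustration; the helper name run_iterations and
the way the test command is passed in are assumptions, not the code in ./check.

```shell
# Hypothetical helper: run the command given after the count up to $1 times
# and stop at the first iteration that fails.
run_iterations() {
	iter_max=$1
	shift
	iter=1
	while [ "$iter" -le "$iter_max" ]; do
		if ! "$@"; then
			echo "iteration $iter failed, stopping"
			return 1
		fi
		iter=$((iter + 1))
	done
	echo "all $iter_max iterations passed"
}
```

For example, "run_iterations 100 ./some-test" (a made-up command) would stop
at the first failing run instead of completing all 100.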
Darrick J. Wong 49c09b5bad check: run tests in exactly the order specified
Introduce a new --exact-order switch to disable all sorting, filtering
of repeated lines, and shuffling of test order.  The goal of this is to
be able to run tests in a specific order, namely to try to reproduce
test failures that could be the result of a -r(andomize) run getting
lucky.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2021-03-07 22:36:15 +08:00
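
Note on 49c09b5bad: a rough sketch of the difference between the default
ordering and an exact-order run. The helper name order_tests and its "exact"
argument are hypothetical; only the behaviour (sorting and filtering of
repeated lines versus leaving the list untouched) comes from the message.

```shell
# Hypothetical helper: read test names on stdin and print the run order.
# "exact" keeps the list exactly as given; anything else sorts it and
# drops repeated lines.
order_tests() {
	if [ "$1" = "exact" ]; then
		cat
	else
		sort -u
	fi
}
```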
Darrick J. Wong 3d79e8ea24 check: don't abort on non-existent excluded groups
Don't abort the whole test run if we asked to exclude groups that aren't
included in the candidate group list, since we actually /are/ satisfying
the user's request.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2021-03-07 22:36:15 +08:00
Darrick J. Wong 5baeea6fe8 check: allow '-e testid' to exclude a single test
This enables us to mask off specific tests.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2021-03-07 22:36:15 +08:00
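
Note on 5baeea6fe8: masking off a single test amounts to filtering one id out
of the candidate list. A minimal sketch, assuming the list is one test id per
line; the helper name exclude_one is made up for illustration.

```shell
# Hypothetical helper: drop the test id given in $1 from the list on stdin.
# -x matches whole lines only, -F treats the id as a fixed string, so
# excluding generic/001 does not also drop generic/0011.
exclude_one() {
	grep -v -x -F -- "$1"
}
```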
Eryu Guan 4767884aff check: source common/rc again if TEST_DEV was recreated
If TEST_DEV is recreated by check, the FSTYP previously derived from
TEST_DEV may have changed as well and may no longer reflect reality.
So source common/rc again with the correct FSTYP to get fs-specific
configs, e.g. common/xfs.

For example, with the config file below, running section ext4 first
and then section xfs produces:

our local _scratch_mkfs routine ...
./common/rc: line 825: _scratch_mkfs_xfs: command not found
check: failed to mkfs $SCRATCH_DEV using specified options

local.config:
[default]
RECREATE_TEST_DEV=true
TEST_DEV=/dev/sda5
SCRATCH_DEV=/dev/sda6
TEST_DIR=/mnt/test
SCRATCH_MNT=/mnt/scratch

[ext4]
MKFS_OPTIONS="-b 4096"
FSTYP=ext4

[xfs]
FSTYP=xfs
MKFS_OPTIONS="-f -b size=4k"

Tested-by: Ritesh Harjani <riteshh@linux.ibm.com>
Signed-off-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
2020-12-06 22:15:08 +08:00
Darrick J. Wong c0be121e12 check: run tests in a systemd scope for mandatory test cleanup
TLDR: If systemd is available, run each test in its own temporary
systemd scope.  This enables the test harness to forcibly clean up all
of the test's child processes (if it does not do so itself) so that we
can move into the post-test unmount and check cleanly.

I frequently run fstests in "low" memory situations (2GB!) to force the
kernel to do interesting things.  There are a few tests like generic/224
and generic/561 that put processes in the background and occasionally
trigger the OOM killer.  Most of the time the OOM killer correctly
shoots down fsstress or duperemove, but once in a while it's stupid
enough to shoot down the test control process (i.e. tests/generic/224)
instead.  fsstress is still running in the background, and the one
process that knew about that is dead.

When the control process dies, ./check moves on to the post-test fsck,
which fails because fsstress is still running and we can't unmount.
After fsck fails, ./check moves on to the next test, which fails because
fsstress is /still/ writing to the filesystem and we can't unmount or
format.

The end result is that that one OOM kill causes cascading test failures,
and I have to re-start fstests to see if I get a clean(er) run.

So, the solution I present in this patch is to teach ./check to try to
run the test script in a systemd scope.  If that succeeds, ./check will
tell systemd to kill the scope when the test script exits and returns
control to ./check.  Concretely, this means that systemd creates a new
cgroup, stuffs the processes in that cgroup, and when we kill the scope,
systemd kills all the processes in that cgroup and deletes the cgroup.

The end result is that fstests now has an easy way to ensure that /all/
child processes of a test are dead before we try to unmount the test and
scratch devices.  I've designed this to be optional, because not
everyone does or wants or likes to run systemd, but it makes QA easier.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2020-11-22 22:16:06 +08:00
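
Note on c0be121e12: the scope handling described above can be sketched roughly
as below. This is not the code in ./check; the USE_SCOPE switch, the helper
name and the unit name are assumptions. systemd-run --scope runs the command
inside a transient scope unit, and stopping that unit kills whatever is still
in its cgroup.

```shell
# Sketch only: run a test command in a transient systemd scope when asked
# to, otherwise run it directly. USE_SCOPE and the unit name are made up.
run_test_scoped() {
	scope_unit="fstests-demo-$$.scope"
	if [ "$USE_SCOPE" = "yes" ] && command -v systemd-run >/dev/null 2>&1; then
		systemd-run --quiet --unit "$scope_unit" --scope "$@"
		scope_ret=$?
		# Stopping the scope kills anything the test left running.
		systemctl stop "$scope_unit" >/dev/null 2>&1
		return $scope_ret
	fi
	"$@"
}
```

Keeping the direct-run fallback mirrors the point that the feature is
optional for systems without systemd.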
Filipe Manana 2eb35a8594 check: fix misspelled variable name for sections
We have some places that refer to the variable OPTIONS_HAVE_SECTIONS
as OPTIONS_HAVE_SECIONS, obviously a typo. So fix them.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2020-10-21 23:23:12 +08:00
Josef Bacik 61e4eead3c fstests: drop check.log and check.time into section specific results dir
Right now we only track check.log and check.time globally; it would
be nice to do it per-section as well.  This makes it easier to parse
results from systems that run a bunch of different configurations at
once.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2020-10-18 19:30:51 +08:00
Darrick J. Wong 9a6005c31f check: try reloading modules
Optionally reload the module between each test to try to pinpoint slab
cache errors and whatnot.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2020-09-21 01:15:20 +08:00
Lukas Czerner c67ea23474 check: clear WARN_ONCE state before each test
Clear WARN_ONCE state before each test to allow a potential problem
to be reported for each test.

[Eryu: replace "/sys/kernel/debug" with $DEBUGFS_MNT ]

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2020-07-20 01:22:08 +08:00
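
Note on c67ea23474: the kernel exposes a clear_warn_once file in debugfs, and
writing 1 to it resets the WARN_ONCE state. A minimal sketch, assuming the
debugfs mount point is passed in (DEBUGFS_MNT in fstests); the helper name is
illustrative and may not match what ./check does.

```shell
# Sketch: reset WARN_ONCE state if the debugfs control file is writable.
# $1 is the debugfs mount point.
clear_warn_once() {
	warn_ctl="$1/clear_warn_once"
	if [ -w "$warn_ctl" ]; then
		echo 1 > "$warn_ctl"
	fi
}
```

Doing nothing when the file is missing keeps the sketch usable on kernels or
setups without debugfs.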
Dave Chinner ffa3e99f22 check: add CLI option to repeat tests
Frequently when trying to reproduce a problem I want to run a set of
specific tests in a loop, over and over again. I run fstests from a
set of run scripts that have non-trivial overhead (e.g. patterning
block devices before the runs start), so if all I want to do is run
the same test 100x, using a shell loop over the entire run
scripts reduces the iteration rate substantially.

Hence add an option to check to allow fstests to loop a number of
times over the configured test set without stopping.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2020-07-20 00:49:26 +08:00
QI Fuli d28bf722e0 fstests: Add virtio-fs shared file system support
This patch adds support for virtio-fs shared file system that lets
virtual machines access a directory tree on the host.

To run xfstests on it, first, start the virtiofsd daemon on the host:
 ./virtiofsd -o vhost_user_socket=/tmp/vhostqemu0 -o source=$DIR0 -o cache=always
 ./virtiofsd -o vhost_user_socket=/tmp/vhostqemu1 -o source=$DIR1 -o cache=always

second, launch QEMU with:
 -chardev socket,id=char0,path=/tmp/vhostqemu0
 -device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=myfs0
 -chardev socket,id=char1,path=/tmp/vhostqemu1
 -device vhost-user-fs-pci,queue-size=1024,chardev=char1,tag=myfs1
 -m 8G
 -object memory-backend-file,id=mem,size=8G,mem-path=/dev/shm,share=on
 -numa node,memdev=mem

then, inside the VM run xfstests with:
 export TEST_DEV=myfs0
 export TEST_DIR=$TESTDIR
 export SCRATCH_DEV=myfs1
 export SCRATCH_MNT=$SCRATCHMNT
 export MOUNT_OPTIONS=""
 export TEST_FS_MOUNT_OPTS=""

Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Misono Tomohiro <misono.tomohiro@fujitsu.com>
Signed-off-by: QI Fuli <qi.fuli@fujitsu.com>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2019-10-18 17:16:04 +08:00
Po-Hsu Lin 771f879c1a check: convert spaces to tabs in help msg
The help message mixes spaces and tabs:
$ ./check --help
Usage: ./check [options] [testlist]

check options
    -nfs                test NFS
    -glusterfs                test GlusterFS
    -cifs               test CIFS
    -9p			test 9p
    -overlay		test overlay
    -pvfs2          test PVFS2
    -tmpfs              test TMPFS
    -ubifs              test ubifs

Unify them with tabs.

Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2019-10-06 01:51:12 +08:00
Theodore Ts'o 43df23e1b5 check: add ext4 group list when testing ext2 and ext3
Modern kernels use the ext4 implementation to support ext2 and ext3
mounts, and a number of the ext4 tests are actually suitable for
ext2 and ext3.  We're trying to move tests out of shared anyway, so
instead of moving tests from ext4/NNN to shared, let's just include
the ext4 group list when FSTYP is ext2 or ext3.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2019-07-05 15:30:48 +08:00
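
Note on 43df23e1b5: the idea can be sketched as a mapping from FSTYP to the
test directories whose group lists get included. The helper name
group_dirs_for is hypothetical, and only the ext2/ext3 case is taken from the
message above.

```shell
# Hypothetical helper: print the test directories whose group lists apply
# to the filesystem type in $1.
group_dirs_for() {
	case "$1" in
	ext2|ext3)
		# ext2 and ext3 mounts also pick up the ext4 group list.
		echo "$1 ext4"
		;;
	*)
		echo "$1"
		;;
	esac
}
```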
Darrick J. Wong 8ca445c46e check: try to insulate the test framework from oom killer
Some of the tests in xfstests (e.g. generic/224 with 512M of memory)
consume a lot of memory, and when this happens the OOM killer will
run around stomping on processes.  Sometimes it kills the ./check
process before it kills the actual test, which means that the test
run doesn't complete.

Therefore, make the ./check process OOM-proof while bumping up the
attractiveness of the test itself, in the hopes that even if the
test OOMs we'll still be able to continue on our way.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2019-06-07 20:29:10 +08:00
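
Note on 8ca445c46e: on Linux the per-process knob for this is
/proc/<pid>/oom_score_adj, which accepts values from -1000 (never kill) to
1000 (kill first). A minimal sketch; the helper name and the example values
are illustrative, and lowering the score normally requires privilege.

```shell
# Sketch: write an OOM score adjustment if the control file is writable.
# $1 = value, $2 = path (normally /proc/<pid>/oom_score_adj).
adjust_oom_score() {
	if [ -w "$2" ]; then
		echo "$1" > "$2"
	fi
}

# Example use (not executed here):
#   adjust_oom_score -1000 /proc/$$/oom_score_adj    # protect the harness
#   adjust_oom_score 250 /proc/self/oom_score_adj    # in the test, before exec
```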
Darrick J. Wong d0e484ac69 check: wipe scratch devices between tests
Wipe the scratch devices in between each test to ensure that tests are
formatting them and not making assumptions about previous contents.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2019-04-21 23:37:19 +08:00
Darrick J. Wong 5bb196119d check: remove require_{test,scratch}* after a test fails
Remove the require_{test,scratch}* sentinel files after a test fails.
This eliminates false fsck corruption reports such as the following:

1. Test A calls _require_scratch, which creates the sentinel file
$RESULT_DIR/require_scratch to facilitate fsck after the test completes.

2. Test A runs some test, which corrupts the scratch filesystem due to
a kernel bug or something.

3. Test A calls _fail because of the errors in (2).  Note that the test
case returned 1, so ./check unmounts the test and scratch filesystems
without checking them or removing $RESULT_DIR/require_scratch

4. Test B starts up, but does not call _require_scratch.  The
$RESULT_DIR/require_scratch file is still there.

5. Test B completes successfully.

6. ./check calls _check_filesystems, which sees the
$RESULT_DIR/require_scratch file and runs fsck.

7. fsck reports the corrupt scratch device (which is associated with
test B) even though B did not ever touch the scratch device and it was
actually test A that corrupted the filesystem.

Note that with the "check: wipe scratch devices between tests" patch
applied, we can also reproduce this problem by running xfs/172 and
xfs/195 with a scratch device small enough that the files created in 172
span multiple AGs and therefore cause 172 to fail.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2019-04-21 23:37:19 +08:00
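
Note on 5bb196119d: the fix for the sequence above boils down to deleting the
sentinel files whenever a test exits with a failure. A minimal sketch, with a
made-up helper name and the result directory and exit status passed in as
arguments.

```shell
# Sketch: after a failed test, remove the sentinel files so that a later
# test which never touched the scratch device is not blamed for its state.
# $1 = result directory (RESULT_DIR in fstests), $2 = test exit status.
cleanup_sentinels() {
	if [ "$2" -ne 0 ]; then
		rm -f "$1"/require_test* "$1"/require_scratch*
	fi
}
```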
Darrick J. Wong 97fd33c767 check: really improve test list randomization
coreutils provides the shuf(1) utility that randomizes the order of a
list and seeds its random number generator with /dev/urandom.  It's a
bit speedier than awk, so use it if available.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2019-03-23 21:27:12 +08:00
Darrick J. Wong 07094a9652 check: improve test list randomization
awk doesn't have a particularly good random number generator -- it seeds
from the Unix epoch time in seconds, which means that the run order
across a bunch of VMs started at exactly the same time is unsettlingly
predictable.  Therefore, at least try to seed it with bash's $RANDOM,
which is slightly less predictable.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2019-03-23 21:27:12 +08:00
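
Note on 07094a9652 and 97fd33c767: the two randomization commits combine into
"use shuf(1) if it is there, otherwise awk with a shell-provided seed". The
sketch below is an illustration, not the code in ./check: the helper name is
made up, the awk branch uses a Fisher-Yates shuffle chosen for this example,
and the ${RANDOM:-$$} fallback is an assumption for shells without $RANDOM.

```shell
# Sketch: shuffle the lines read from stdin.
shuffle_tests() {
	if command -v shuf >/dev/null 2>&1; then
		shuf
	else
		awk -v seed="${RANDOM:-$$}" '
			BEGIN { srand(seed) }
			{ lines[NR] = $0 }
			END {
				# Fisher-Yates shuffle over the collected lines.
				for (i = NR; i > 1; i--) {
					j = int(rand() * i) + 1
					tmp = lines[i]; lines[i] = lines[j]; lines[j] = tmp
				}
				for (i = 1; i <= NR; i++)
					print lines[i]
			}'
	fi
}
```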
Darrick J. Wong 0c24aa077f common: fix kmemleak to work with sections
Refactor the kmemleak code to work correctly with sections.  This
requires changing the location of the "is kmemleak enabled?" flag to
use /tmp instead of RESULT_BASE, scanning for leaks after every
test, and clarifying which functions get used when.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2019-02-16 19:10:38 +08:00
Nikolay Borisov f3a1213c24 Revert "common/config: create $RESULT_BASE before dumping kmemleak leaks"
The reverted commit tried to fix the brokenness of the kmemleak support, but
it inadvertently broke the creation of the RESULT_BASE directory, which led to
problems creating the check.time file. It turns out that kmemleak support in
xfstests has more problems and needs a major refactoring, and the reverted
commit doesn't really solve the problem. For the time being, just revert it to
at least allow older configuration files that explicitly set RESULT_BASE to work.

This reverts commit 7fc034868d.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2019-01-06 22:40:40 +08:00
Hou Tao 3f95ea9fb3 check: use _try_scratch_mount instead of _scratch_mount to mount SCRATCH_DEV
Else there won't be any error messages when mounting SCRATCH_DEV
failed, because _scratch_mount exits early by invoking _fail.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2018-12-29 15:50:55 +08:00
David Disseldorp 3a9ba205f2 check: fix -X exclude_file behaviour
It is currently processed before FSTYP has been properly set,
leading to xfs, btrfs, etc. specific exclude_files being ignored.

Signed-off-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2018-12-22 20:51:16 +08:00
Johannes Thumshirn 7fc034868d common/config: create $RESULT_BASE before dumping kmemleak leaks
In _init_kmemleak() we're touching a check_kmemleak file in
${RESULT_BASE} if ${DEBUGFS_MNT}/kmemleak exists, as a marker that we
have to check for kmemleak output after running a test.

In 'check' we're calling _init_kmemleak() at around 60% of the file,
but ${RESULT_BASE} is created later at around 62% of the file,
causing the 'touch' in _init_kmemleak() to fail.

Create the ${RESULT_BASE} just after assigning the default value in
get_next_config()

[Eryu: check for mkdir failure and remove the $RESULT_BASE creation
in check.]

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2018-12-22 20:02:26 +08:00
Dave Chinner 168bae3958 check: use full paths for diff on error
I don't run fstests from the source directory, so when I get a
golden image mismatch the relative path to the golden output is
not useful:

(Run 'diff -u tests/generic/013.out /home/dave/src/xfstests-dev/results//xfs_64k/generic/013.out.bad'  to see the entire diff)

Change the output to emit the real path for the golden out so this
can be cut and pasted and run from anywhere.

(Run 'diff -u /home/dave/src/xfstests-dev/tests/generic/013.out /home/dave/src/xfstests-dev/results//xfs_64k/generic/013.out.bad'  to see the entire diff)

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2018-11-03 16:49:12 +08:00
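
Note on 168bae3958: one way to turn the relative golden-output path into a
full path before printing the hint is sketched below. The helper name
diff_hint is hypothetical, and the sketch assumes the directory holding the
golden output exists.

```shell
# Sketch: print the diff hint with an absolute path to the golden output.
# $1 = golden output (may be relative), $2 = the .out.bad file.
diff_hint() {
	golden_abs=$(cd "$(dirname "$1")" && pwd)/$(basename "$1")
	echo "(Run 'diff -u $golden_abs $2' to see the entire diff)"
}
```

Resolving the directory with cd and pwd avoids depending on realpath(1) or
readlink -f being installed.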