If TEST_DEV is recreated by check, FSTYP derived from TEST_DEV
previously could be changed too and might not reflect the reality.
So source common/rc again with correct FSTYP to get fs-specific
configs, e.g. common/xfs.
For example, using this config-section config file, and run section
ext4 first then xfs, you can see:
our local _scratch_mkfs routine ...
./common/rc: line 825: _scratch_mkfs_xfs: command not found
check: failed to mkfs $SCRATCH_DEV using specified options
local.config:
[default]
RECREATE_TEST_DEV=true
TEST_DEV=/dev/sda5
SCRATCH_DEV=/dev/sda6
TEST_DIR=/mnt/test
SCRATCH_MNT=/mnt/scratch
[ext4]
MKFS_OPTIONS="-b 4096"
FSTYP=ext4
[xfs]
FSTYP=xfs
MKFS_OPTIONS="-f -b size=4k"
Tested-by: Ritesh Harjani <riteshh@linux.ibm.com>
Signed-off-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
TLDR: If systemd is available, run each test in its own temporary
systemd scope. This enables the test harness to forcibly clean up all
of the test's child processes (if it does not do so itself) so that we
can move into the post-test unmount and check cleanly.
I frequently run fstests in "low" memory situations (2GB!) to force the
kernel to do interesting things. There are a few tests like generic/224
and generic/561 that put processes in the background and occasionally
trigger the OOM killer. Most of the time the OOM killer correctly
shoots down fsstress or duperemove, but once in a while it's stupid
enough to shoot down the test control process (i.e. tests/generic/224)
instead. fsstress is still running in the background, and the one
process that knew about that is dead.
When the control process dies, ./check moves on to the post-test fsck,
which fails because fsstress is still running and we can't unmount.
After fsck fails, ./check moves on to the next test, which fails because
fsstress is /still/ writing to the filesystem and we can't unmount or
format.
The end result is that that one OOM kill causes cascading test failures,
and I have to re-start fstests to see if I get a clean(er) run.
So, the solution I present in this patch is to teach ./check to try to
run the test script in a systemd scope. If that succeeds, ./check will
tell systemd to kill the scope when the test script exits and returns
control to ./check. Concretely, this means that systemd creates a new
cgroup, stuffs the processes in that cgroup, and when we kill the scope,
systemd kills all the processes in that cgroup and deletes the cgroup.
The end result is that fstests now has an easy way to ensure that /all/
child processes of a test are dead before we try to unmount the test and
scratch devices. I've designed this to be optional, because not
everyone does or wants or likes to run systemd, but it makes QA easier.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
We have some places that refer to the variable OPTIONS_HAVE_SECTIONS
has OPTIONS_HAVE_SECIONS, obviously a typo. So fix them.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Right now we only track check.log and check.time globally, it would
be nice to do it per-section as well. This makes it easier to parse
results from systems that run a bunch of different configurations at
once.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Optionally reload the module between each test to try to pinpoint slab
cache errors and whatnot.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
clear WARN_ONCE state before each test to allow a potential problem
to be reported for each test
[Eryu: replace "/sys/kernel/debug" with $DEBUGFS_MNT ]
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Frequently when trying to reproduce a problem I want to run a set of
specific tests in a loop, over and over again. I run fstests from a
set of run scripts that have non-trivial overhead (e.g. patterning
block devices before the runs start), so if all I want to do is run
the same test 100x, using a shell loop over the entire run
scripts reduces the iteration rate substantially.
Hence add an option to check to allow fstests to loop a number of
times over the configured test set without stopping.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
There are some mix use of spaces and tabs in the help message:
$ ./check --help
Usage: ./check [options] [testlist]
check options
-nfs test NFS
-glusterfs test GlusterFS
-cifs test CIFS
-9p test 9p
-overlay test overlay
-pvfs2 test PVFS2
-tmpfs test TMPFS
-ubifs test ubifs
unify them with tabs.
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Modern kernels use the ext4 implementation to support ext2 and ext3
mounts, and a number of the ext4 tests are actually suitable for
ext2 and ext3. We're trying to move tests out of shared anyway, so
instead of moving tests from ext4/NNN to shared, let's just include
the ext4 group list when FSTYP is ext2 or ext3.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Some of the tests in xfstests (e.g. generic/224 with 512M of memory)
consume a lot of memory, and when this happens the OOM killer will
run around stomping on processes. Sometimes it kills the ./check
process before it kills the actual test, which means that the test
run doesn't complete.
Therefore, make the ./check process OOM-proof while bumping up the
attractiveness of the test itself, in the hopes that even if the
test OOMs we'll still be able to continue on our way.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Wipe the scratch devices in between each test to ensure that tests are
formatting them and not making assumptions about previous contents.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Remove the require_{test,scratch]* sentinel files after a test fails.
This eliminates false fsck corruption reports such as the following:
1. Test A calls _require_scratch, which creates the sentinel file
$RESULT_DIR/require_scratch to facilitate fsck after the test completes.
2. Test A runs some test, which corrupts the scratch filesystem due to
kernel bug or something.
3. Test A calls _fail because of the errors in (2). Note that the test
case returned 1, so ./check unmounts the test and scratch filesystems
without checking them or removing $RESULT_DIR/require_scratch
4. Test B starts up, but does not call _require_scratch. The
$RESULT_DIR/require_scratch file is still there.
5. Test B completes successfully.
6. ./check calls _check_filesystems, which sees the
$RESULT_DIR/require_scratch file and runs fsck.
7. fsck reports the corrupt scratch device (which is associated with
test B) even though B did not ever touch the scratch device and it was
actually test A that corrupted the filesystem.
Note that with the "check: wipe scratch devices between tests" patch
applied, we can also reproduce this problem by running xfs/172 and
xfs/195 with a scratch device small enough that the files created in 172
span multiple AGs and therefore cause 172 to fail.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
coreutils provides the shuf(1) utility that randomizes the order of a
list and seeds its random number generator with /dev/urandom. It's a
bit speedier than awk, so use it if available.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
awk doesn't have a particularly good random number generator -- it seeds
from the Unix epoch time in seconds, which means that the run order
across a bunch of VMs started at exactly the same time are unsettlingly
predictable. Therefore, at least try to seed it with bash's $RANDOM,
which is slightly less predictable.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Refactor the kmemleak code to work correctly with sections. This
requires changing the location of the "is kmemleak enabled?" flag to
use /tmp instead of RESULT_BASE, scanning for leaks after every
test, and clarifying which functions get used when.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
This commit tried to fix the brokennes of the kmemleak support but it
inadvertently broke the creation of the RESULT_BASE directory which lead to
problems creating check.time file. Turns out kmemleak support in xfstests has
more problems and it needs to be majorly refactor and this commit doesn't
really solve the problem. For the time being just revert to at least allow
older configuration files, which have explicitly set RESULT_BASE to work.
This reverts commit 7fc034868d.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Else there won't be any error messages when mounting SCRATCH_DEV
failed, because _scratch_mount exits early by invoking _fail.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
It is currently processed before FSTYP has been properly set,
leading to xfs, btrfs, etc. specific exclude_files being ignored.
Signed-off-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
In _init_kmemleak() we're touching a check_kmemleak file in
${RESULT_BASE} if ${DEBUGFS_MNT/kmemleak} exists as a marker that we
have to check for kmemleak output after running a test.
In 'check' we're calling _init_kmemleak() at around 60% of the file,
but ${RESULT_BASE} is created later at around 62% of the file,
causing the 'touch' in _init_kmemleak() to fail.
Create the ${RESULT_BASE} just after assigning the default value in
get_next_config()
[Eryu: check for mkdir failure and remove the $RESULT_BASE creation
in check.]
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
i don't run fstests from the source directory, so when I get a
golden image mismatch the relative path to the golden output is
not useful:
(Run 'diff -u tests/generic/013.out /home/dave/src/xfstests-dev/results//xfs_64k/generic/013.out.bad' to see the entire diff)
Change the output to emit the real path for the golden out so this
can be cut and pasted and run from anywhere.
(Run 'diff -u /home/dave/src/xfstests-dev/tests/generic/013.out /home/dave/src/xfstests-dev/results//xfs_64k/generic/013.out.bad' to see the entire diff)
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Scripted conversion, see script in initial SPDX license commit
message. Many files required touch-ups after the script had run
because of the old and widely different formats. most touchups were
to remove excess empty comment lines the script left behind.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Currently a test appears to pass even if it leaves a corrupt
filesystem behind, or a splat in the system logs that should not be
there. While the test is internally tracked as failed (and the
summary reports it as failed) the per-test output exits with a
success and so emits a completion time before the post-test checks
are run by the test harness. Rework the check code to report
post-test check failures as specific test failures rather than as
separate failure line items in the overall harness output.
Reworking where we emit the errors this also allows us to include
the post-test filesystem checking in the test runtime. This is
currently not accounted to the test and can be substantial. Hence
the real elapsed time of each test is not accurately reflected in
the time stats being reported and so regressions in filesystem
checking performance go unnoticed.
Changing the output reporting requires a complete reworking of the
main test check loop. It's a bunch of spaghetti at the moment
because it has post test reporting code at the end of the loop which
must run regardless of the test result. By moving the post test
reporting to the start of the next loop iteration, we can clean up
the code substantially by using continue directives where
appropriate.
Also, for cases where we haven't run the test or it's already been
marked as failed, don't bother running the filesystem/dmesg checks
for failure as we're already going to report the test as failed.
This touches almost all of the loop, so get rid of the remaining
4 space indents inside the loop while moving all this code around.
[Eryu: fixed wrong test seq name issue in xUnit report when test hit
"continue" in the check loop, e.g. notrun, with Dave ACKing the fix]
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Currently 'new' script sources common/config which tries to find
mkfs and fails if not found (which is likely for non-root user).
This is inconvenient as development usually does not happen as root.
In fact the vast majority of setup in common/config and common/rc is
not necessary for 'new'. Split out the necessary bits into new
common/test_names and use it in 'new'. Cleanup common/rc and
common/config now that they're only used from 'check' and 'setup'.
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Finishing xfs/132 left a shutdown scratch fs and the test harness
didn't unmount the fs(because we told it not to check the fs) so the
test harness called by subsequent xfs/133 tried to "test -d
$SCRATCH_MNT" and received the IO error from the dead fs.
i.e. Running xfs/132 and xfs/133 together got the following error:
------------------------------------------------------------
...
xfs/132 1s ... 1s
xfs/133 1s ... [failed, exit status 1] - output mismatch (see /var/lib/xfstests/results//xfs/133.out.bad)
...
QA output created by 133
-Format and mount
-Corrupt filesystem
-Remount, try to append
-Write did not succeed (ok).
+SCRATCH_DEV=/dev/sda11 is mounted but not on SCRATCH_MNT=common/config: - aborting
+Already mounted result:
+/dev/sda11 /mnt/xfstests/scratch
...
------------------------------------------------------------
Even if we don't check fs, the test harness is supposed to unmount
fs and return an initial state before running the next test.
Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>