On kernel 5.10.178, when a squashfs file is stored on an EXT4 filesystem
backed by a dm-crypt volume, dissecting fails:
$ SYSTEMD_LOG_LEVEL=debug systemd-dissect /var/foo/bar.raw
Opened '/var/foo/bar.raw' in O_RDONLY access mode, with O_DIRECT enabled.
Couldn't find any partition table to derive sector size of.
loop2: Acquired exclusive lock.
Could not enable direct IO mode, proceeding in buffered IO mode.
Successfully acquired /dev/loop2, devno=7:2, nr=2, diskseq=87
Opened /dev/loop2 (fd=3, whole_block_devnum=7:2, diskseq=87).
Name: bar.raw
Size: 67.2M
Sec. Size: 512
Arch.: n/a
Successfully forked off '(sd-dissect)' as PID 4110.
Mounting /proc/self/fd/3 (squashfs) on /tmp/dissect-Zk3K5F (MS_RDONLY|MS_NODEV "")...
Failed to mount /proc/self/fd/3 (type squashfs) on /tmp/dissect-Zk3K5F (MS_RDONLY|MS_NODEV ""): Input/output error
Failed to mount dissected image: Input/output error
Failed to read /etc/hostname of image: No such file or directory
/etc/machine-id file of image is empty.
Failed to read has-init-system boolean: Input/output error
(sd-dissect) failed with exit status 1.
Failed to acquire image metadata: Input/output error
The kernel shows I/O errors:
kernel: blk_update_request: I/O error, dev loop2, sector 0 op 0x0:(READ) flags 0x800 phys_seg 1 prio class 0
kernel: SQUASHFS error: Failed to read block 0x0: -5
kernel: unable to read squashfs_super_block
This is independent of a particular filesystem and can be reproduced
reliably in my setup, starting from freshly formatted disks.
Instead of continuing when enabling O_DIRECT fails, restart the setup
process without the flag, including opening a new FD, to make the
kernel happy.
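A minimal sketch of that restart, not the actual patch. The names
open_and_setup(), setup_fn and setup_buffered_only() are hypothetical, and
the setup callback merely stands in for the loop device configuration. The
point it illustrates is that the O_DIRECT file descriptor is closed and a
fresh one is opened before the buffered attempt:

```c
#define _GNU_SOURCE /* for O_DIRECT */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical setup step: returns 0 on success or a negative errno.
 * direct_io tells it whether fd was opened with O_DIRECT. */
typedef int (*setup_fn)(int fd, bool direct_io);

/* Stand-in for a kernel that rejects direct I/O on this backing file. */
static int setup_buffered_only(int fd, bool direct_io) {
        (void) fd;
        return direct_io ? -EIO : 0;
}

/* Returns a fully set up fd (>= 0) or a negative errno. */
static int open_and_setup(const char *path, setup_fn setup) {
        const bool modes[2] = { true, false }; /* direct first, then buffered */

        assert(path);
        assert(setup);

        for (int i = 0; i < 2; i++) {
                bool direct = modes[i];
                int fd, r;

                fd = open(path, O_RDONLY | O_CLOEXEC | (direct ? O_DIRECT : 0));
                if (fd < 0) {
                        if (direct)
                                continue; /* retry the whole procedure without O_DIRECT */
                        return -errno;
                }

                r = setup(fd, direct);
                if (r >= 0)
                        return fd;

                /* Never reuse the O_DIRECT fd for the buffered attempt. */
                (void) close(fd);
                if (!direct)
                        return r;
        }

        return -EIO; /* not reached */
}
```

For example, open_and_setup("/var/foo/bar.raw", setup_buffered_only) would
end up returning a file descriptor that was opened without O_DIRECT.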
But the directories are changed from /dev/loop/by-ref/ to /dev/disk/by-loop-ref/
and from /dev/loop/by-inode/ to /dev/disk/by-loop-inode/,
as /dev/loop/ is used by the losetup command for other purposes.
See issue #28475.
This effectively reverts commits 9915cc6086,
5022fab15f, and
c0d998248e.
We use usec_t for storing time values, which is 64-bit.
However, usleep() takes useconds_t, which is (typically?) 32-bit.
Also, usleep() may only support the range [0, 1000000].
This introduces usleep_safe(), which takes usec_t.
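One way such a helper could look, as a sketch only. It is built on
nanosleep() here, which is an assumption about the implementation, and the
retry on EINTR is likewise a choice made for this example:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

typedef uint64_t usec_t; /* microseconds, 64-bit */

#define USEC_PER_SEC  ((usec_t) 1000000ULL)
#define NSEC_PER_USEC 1000L

/* Sleeps for usec microseconds. Returns 0 on success or a negative errno. */
static int usleep_safe(usec_t usec) {
        /* Splitting into seconds and nanoseconds avoids both the 32-bit
         * truncation and the [0, 1000000] range limit of usleep(). */
        struct timespec ts = {
                .tv_sec = (time_t) (usec / USEC_PER_SEC),
                .tv_nsec = (long) (usec % USEC_PER_SEC) * NSEC_PER_USEC,
        };

        /* On EINTR, nanosleep() stores the remaining time back into ts. */
        while (nanosleep(&ts, &ts) < 0)
                if (errno != EINTR)
                        return -errno;

        return 0;
}
```

A caller can then pass a 64-bit value directly, for example
usleep_safe(5 * USEC_PER_SEC).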
The kernel may be syncing a file system or doing something else that requires
more time. So make the delay a bit longer, but provide some feedback and also
grow the delay exponentially (though with a long exponent). If the kernel is
doing something else, no need to repeat so often. With 38 attempts, we get a
total of slightly above 5000 ms.
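To illustrate the arithmetic, here is a sketch with assumed parameters: a
first delay of 1 ms that grows by 20% per attempt. These values are not taken
from the patch; they were picked for this example because 38 such delays add
up to a little over 5000 ms:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t usec_t;

#define USEC_PER_MSEC ((usec_t) 1000ULL)

/* Assumed parameters for illustration only. */
#define DELAY_INITIAL (1 * USEC_PER_MSEC)
#define N_ATTEMPTS    38U

/* Grow the delay by one fifth, i.e. multiply by 1.2 (integer arithmetic). */
static usec_t next_delay(usec_t delay) {
        return delay + delay / 5;
}

/* Sum of the first n_attempts delays, in microseconds. */
static usec_t total_delay(unsigned n_attempts) {
        usec_t delay = DELAY_INITIAL, total = 0;

        for (unsigned i = 0; i < n_attempts; i++) {
                total += delay;
                delay = next_delay(delay);
        }

        return total;
}
```

In a real retry loop each iteration would sleep for the current delay and
log progress, rather than only summing the values as total_delay() does.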
I wrote this when I thought that the delay was not long enough. It turned
out that we were blocking the file system on the loop device, so waiting longer
wasn't helpful. But I think it's nicer to do it this way anyway.
On top of taking a directory file descriptor, we use xopenat() so
that the function can also be used to work on existing file
descriptors to image files, including all the logic to use O_DIRECT
and fall back to O_RDONLY if needed.
-1 was used everywhere, but -EBADF or -EBADFD started being used in various
places. Let's make things consistent in the new style.
Note that there are two candidates:
EBADF 9 Bad file descriptor
EBADFD 77 File descriptor in bad state
Since we're initializing the fd, we're just assigning a value that means
"no fd yet", so it's just a bad file descriptor, and the first errno fits
better. If instead we had a valid file descriptor that became invalid because
of some operation or state change, the other errno would fit better.
In some places, initialization is dropped if unnecessary.
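A small sketch of the convention. The helper name close_and_invalidate() is
hypothetical and only serves to show how a variable can be reset to the
"no fd yet" value in a single statement:

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Closes fd if it refers to something and returns -EBADF, so that the
 * caller can write: fd = close_and_invalidate(fd); */
static int close_and_invalidate(int fd) {
        if (fd >= 0)
                (void) close(fd);

        return -EBADF;
}
```

A variable would then start out as "int fd = -EBADF;" and any check of the
form "fd >= 0" or "fd < 0" keeps working exactly as it did with -1.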
flock(2) works with file descriptors opened with O_RDONLY.
This affects SELinux systems where access to block devices is quite
restricted to avoid bypasses on filesystem objects.