Commit Graph

65 Commits

Author SHA1 Message Date
Omar Sandoval
dcb77a9ae8 btrfs: add definitions and documentation for encoded I/O ioctls
In order to allow sending and receiving compressed data without
decompressing it, we need an interface to write pre-compressed data
directly to the filesystem and the matching interface to read compressed
data without decompressing it. This adds the definitions for ioctls to
do that and detailed explanations of how to use them.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-14 13:13:51 +01:00
Josef Bacik
2c7d2a2302 btrfs: add definition for EXTENT_TREE_V2
This adds the initial definition of the EXTENT_TREE_V2 incompat feature
flag.  This also hides the support behind CONFIG_BTRFS_DEBUG.

THIS IS A IN DEVELOPMENT FORMAT CHANGE, DO NOT USE UNLESS YOU ARE A
DEVELOPER OR A TESTER.

The format is in flux and will be added in stages, any fs will need to
be re-made between updates to the format.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-14 13:13:48 +01:00
David Sterba
e77fbf9903 btrfs: send: prepare for v2 protocol
This is preparatory work for send protocol update to version 2 and
higher.

We have many pending protocol update requests but still don't have the
basic protocol rev in place, the first thing that must happen is to do
the actual versioning support.

The protocol version is u32 and is a new member in the send ioctl
struct. Validity of the version field is backed by a new flag bit. Old
kernels would fail when a higher version is requested. Version protocol
0 will pick the highest supported version, BTRFS_SEND_STREAM_VERSION,
  that's also exported in sysfs.

The version is still unchanged and will be increased once we have new
incompatible commands or stream updates.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-29 12:38:43 +02:00
Boris Burkov
146054090b btrfs: initial fsverity support
Add support for fsverity in btrfs. To support the generic interface in
fs/verity, we add two new item types in the fs tree for inodes with
verity enabled. One stores the per-file verity descriptor and btrfs
verity item and the other stores the Merkle tree data itself.

Verity checking is done in end_page_read just before a page is marked
uptodate. This naturally handles a variety of edge cases like holes,
preallocated extents, and inline extents. Some care needs to be taken to
not try to verity pages past the end of the file, which are accessed by
the generic buffered file reading code under some circumstances like
reading to the end of the last page and trying to read again. Direct IO
on a verity file falls back to buffered reads.

Verity relies on PageChecked for the Merkle tree data itself to avoid
re-walking up shared paths in the tree. For this reason, we need to
cache the Merkle tree data. Since the file is immutable after verity is
turned on, we can cache it at an index past EOF.

Use the new inode ro_flags to store verity on the inode item, so that we
can enable verity on a file, then rollback to an older kernel and still
mount the file system and read the file. Since we can't safely write the
file anymore without ruining the invariants of the Merkle tree, we mark
a ro_compat flag on the file system when a file has verity enabled.

Acked-by: Eric Biggers <ebiggers@google.com>
Co-developed-by: Chris Mason <clm@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:19:09 +02:00
David Sterba
1a9fd4172d btrfs: fix typos in comments
Fix typos that have snuck in since the last round. Found by codespell.

Signed-off-by: David Sterba <dsterba@suse.com>
2021-06-22 14:11:57 +02:00
Naohiro Aota
7b3d5a90cb btrfs: introduce ZONED feature flag
This patch introduces the ZONED incompat flag. The flag indicates that
the volume management will satisfy the constraints imposed by
host-managed zoned block devices (aligned chunk allocation, append-only
updates, reset zone after filled).

As the zoned support will happen incrementally due to enhancing some
core infrastructure like super block writes, tree-log, raid support, the
feature will appear in sysfs only on debug builds. It will be enabled
once the support is feature complete and applications can reliably check
whether zoned support is present or not.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08 15:54:16 +01:00
Johannes Thumshirn
49bac89768 btrfs: add metadata_uuid to FS_INFO ioctl
Add retrieval of the filesystem's metadata UUID to the fsinfo ioctl.
This is driven by setting the BTRFS_FS_INFO_FLAG_METADATA_UUID flag in
btrfs_ioctl_fs_info_args::flags.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27 12:55:43 +02:00
Johannes Thumshirn
0fb408a558 btrfs: add filesystem generation to FS_INFO ioctl
Add retrieval of the filesystem's generation to the fsinfo ioctl. This is
driven by setting the BTRFS_FS_INFO_FLAG_GENERATION flag in
btrfs_ioctl_fs_info_args::flags.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27 12:55:43 +02:00
Johannes Thumshirn
137c541821 btrfs: pass checksum type via BTRFS_IOC_FS_INFO ioctl
With the recent addition of filesystem checksum types other than CRC32c,
it is not anymore hard-coded which checksum type a btrfs filesystem uses.

Up to now there is no good way to read the filesystem checksum, apart from
reading the filesystem UUID and then query sysfs for the checksum type.

Add a new csum_type and csum_size fields to the BTRFS_IOC_FS_INFO ioctl
command which usually is used to query filesystem features. Also add a
flags member indicating that the kernel responded with a set csum_type and
csum_size field.

For compatibility reasons, only return the csum_type and csum_size if
the BTRFS_FS_INFO_FLAG_CSUM_INFO flag was passed to the kernel. Also
clear any unknown flags so we don't pass false positives to user-space
newer than the kernel.

To simplify further additions to the ioctl, also switch the padding to a
u8 array. Pahole was used to verify the result of this switch:

The csum members are added before flags, which might look odd, but this
is to keep the alignment requirements and not to introduce holes in the
structure.

  $ pahole -C btrfs_ioctl_fs_info_args fs/btrfs/btrfs.ko
  struct btrfs_ioctl_fs_info_args {
	  __u64                      max_id;               /*     0     8 */
	  __u64                      num_devices;          /*     8     8 */
	  __u8                       fsid[16];             /*    16    16 */
	  __u32                      nodesize;             /*    32     4 */
	  __u32                      sectorsize;           /*    36     4 */
	  __u32                      clone_alignment;      /*    40     4 */
	  __u16                      csum_type;            /*    44     2 */
	  __u16                      csum_size;            /*    46     2 */
	  __u64                      flags;                /*    48     8 */
	  __u8                       reserved[968];        /*    56   968 */

	  /* size: 1024, cachelines: 16, members: 10 */
  };

Fixes: 3951e7f050 ("btrfs: add xxhash64 to checksumming algorithms")
Fixes: 3831bf0094 ("btrfs: add sha256 to checksumming algorithm")
CC: stable@vger.kernel.org # 5.5+
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27 12:55:43 +02:00
Eugene Syromiatnikov
34c51814b2 btrfs: re-instantiate the removed BTRFS_SUBVOL_CREATE_ASYNC definition
The commit 9c1036fdb1 ("btrfs: Remove BTRFS_SUBVOL_CREATE_ASYNC
support") breaks strace build with the kernel headers from git:

    btrfs.c: In function "btrfs_test_subvol_ioctls":
    btrfs.c:531:23: error: "BTRFS_SUBVOL_CREATE_ASYNC" undeclared (first use
    in this function)
       vol_args_v2.flags = BTRFS_SUBVOL_CREATE_ASYNC;

Moreover, it is improper to break UAPI, strace uses the definitions to
decode ioctls that are considered part of public API.

Restore the macro definition and put it under "#ifndef __KERNEL__"
in order to prevent inadvertent in-kernel usage.

Fixes: 9c1036fdb1 ("btrfs: Remove BTRFS_SUBVOL_CREATE_ASYNC support")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-04-10 18:48:27 +02:00
Nikolay Borisov
9c1036fdb1 btrfs: Remove BTRFS_SUBVOL_CREATE_ASYNC support
This functionality was deprecated in kernel 5.4. Since no one has
complained of the impending removal it's time we did so.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add comment ]
Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23 17:02:00 +01:00
Marcos Paulo de Souza
949964c928 btrfs: add new BTRFS_IOC_SNAP_DESTROY_V2 ioctl
This ioctl will be responsible for deleting a subvolume using its id.
This can be used when a system has a file system mounted from a
subvolume, rather than the root file system, like below:

/
@subvol1/
@subvol2/
@subvol_default/

If only @subvol_default is mounted, we have no path to reach @subvol1
and @subvol2, thus no way to delete them. Current subvolume delete ioctl
takes a file handle point as argument, and if @subvol_default is
mounted, we can't reach @subvol1 and @subvol2 from the same mount point.

This patch introduces a new ioctl BTRFS_IOC_SNAP_DESTROY_V2 that takes
the extended structure with flags to allow to delete subvolume using
subvolid.

Now, we can use this new ioctl specifying the subvolume id and refer to
the same mount point. It doesn't matter which subvolume was mounted,
since we can reach to the desired one using the subvolume id, and then
delete it.

The full path to the subvolume id is resolved internally and access is
verified as if the subvolume was accessed by path.

The volume args v2 structure is extended to use the existing union for
subvolume id specification, that's valid in case the
BTRFS_SUBVOL_SPEC_BY_ID is set.

Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23 17:01:42 +01:00
David Sterba
eed0269053 btrfs: define support masks for ioctl volume args v2
The ioctl data for devices or subvolumes can be passed via
btrfs_ioctl_vol_args or btrfs_ioctl_vol_args_v2. The latter is more
versatile and needs some caution as some of the flags make sense only
for some ioctls.

As we're going to extend the flags, define support masks for each ioctl
class separately.

Reviewed-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23 17:01:42 +01:00
David Sterba
cfbb825c76 btrfs: add incompat for raid1 with 3, 4 copies
The new raid1c3 and raid1c4 profiles are backward incompatible and the
name shall be 'raid1c34', the status can be found in the global
supported features in /sys/fs/btrfs/features or in the per-filesystem
directory.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 17:51:49 +01:00
David Sterba
8d6fac0087 btrfs: add support for 4-copy replication (raid1c4)
Add new block group profile to store 4 copies in a simliar way that
current RAID1 does.  The profile attributes and constraints are defined
in the raid table and used by the same code that already handles the 2-
and 3-copy RAID1.

The minimum number of devices is 4, the maximum number of devices/chunks
that can be lost/damaged is 3. There is no comparable traditional RAID
level, the profile is added for future needs to accompany triple-parity
and beyond.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 17:51:49 +01:00
David Sterba
47e6f7423b btrfs: add support for 3-copy replication (raid1c3)
Add new block group profile to store 3 copies in a simliar way that
current RAID1 does. The profile attributes and constraints are defined
in the raid table and used by the same code that already handles the
2-copy RAID1.

The minimum number of devices is 3, the maximum number of devices/chunks
that can be lost/damaged is 2. Like RAID6 but with 33% space
utilization.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 17:51:49 +01:00
Hans van Kranenburg
73a3ca2093 btrfs: clarify btrfs_ioctl_get_dev_stats padding
In commit c11d2c236c ("Btrfs: add ioctl to get and reset the device
stats") the get_dev_stats ioctl was added.

Shortly thereafter, in commit b27f7c0c15 ("btrfs: join DEV_STATS
ioctls to one") , the flags field was added.  However, the calculation
for unused padding space was not updated, which also invalidated the
comment.

Clarify what happened to reduce confusion and wasted time for anyone
implementing this.

Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09 14:59:05 +02:00
Eric Sandeen
40cf931fa8 btrfs: use common vfs LABEL ioctl definitions
I lifted the btrfs label get/set ioctls to the vfs some time ago, but
never followed up to use those common definitions directly in btrfs.

This patch does that.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09 14:58:59 +02:00
Anand Jain
228a73abde btrfs: introduce new ioctl to unregister a btrfs device
Support for a new command that can be used eg. as a command

  $ btrfs device scan --forget [dev]'
(the final name may change though)

to undo the effects of 'btrfs device scan [dev]'. For this purpose
this patch proposes to use ioctl #5 as it was empty and is next to the
SCAN ioctl.

The new ioctl BTRFS_IOC_FORGET_DEV works only on the control device
(/dev/btrfs-control) to unregister one or all devices, devices that are
not mounted.

The argument is struct btrfs_ioctl_vol_args, ::name specifies the device
path. To unregister all device, the path is an empty string.

Again, the devices are removed only if they aren't part of a mounte
filesystem.

This new ioctl provides:

- release of unwanted btrfs_fs_devices and btrfs_devices structures
  from memory if the device is not going to be mounted

- ability to mount filesystem in degraded mode, when one devices is
  corrupted like in split brain raid1

- running test cases which would require reloading the kernel module
  but this is not possible eg. due to mounted filesystem or built-in

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-25 14:13:30 +01:00
Nikolay Borisov
7239ff4b2b btrfs: Introduce support for FSID change without metadata rewrite
This field is going to be used when the user wants to change the UUID
of the filesystem without having to rewrite all metadata blocks. This
field adds another level of indirection such that when the FSID is
changed what really happens is the current UUID (the one with which the
fs was created) is copied to the 'metadata_uuid' field in the superblock
as well as a new incompat flag is set METADATA_UUID. When the kernel
detects this flag is set it knows that the superblock in fact has 2
UUIDs:

1. Is the UUID which is user-visible, currently known as FSID.
2. Metadata UUID - this is the UUID which is stamped into all on-disk
   datastructures belonging to this file system.

When the new incompat flag is present device scanning checks whether
both fsid/metadata_uuid of the scanned device match any of the
registered filesystems. When the flag is not set then both UUIDs are
equal and only the FSID is retained on disk, metadata_uuid is set only
in-memory during mount.

Additionally a new metadata_uuid field is also added to the fs_info
struct. It's initialised either with the FSID in case METADATA_UUID
incompat flag is not set or with the metdata_uuid of the superblock
otherwise.

This commit introduces the new fields as well as the new incompat flag
and switches all users of the fsid to the new logic.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ minor updates in comments ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17 14:51:37 +01:00
Tomohiro Misono
23d0b79dfa btrfs: Add unprivileged version of ino_lookup ioctl
Add unprivileged version of ino_lookup ioctl BTRFS_IOC_INO_LOOKUP_USER
to allow normal users to call "btrfs subvolume list/show" etc. in
combination with BTRFS_IOC_GET_SUBVOL_INFO/BTRFS_IOC_GET_SUBVOL_ROOTREF.

This can be used like BTRFS_IOC_INO_LOOKUP but the argument is
different. This is  because it always searches the fs/file tree
correspoinding to the fd with which this ioctl is called and also
returns the name of bottom subvolume.

The main differences from original ino_lookup ioctl are:

  1. Read + Exec permission will be checked using inode_permission()
     during path construction. -EACCES will be returned in case
     of failure.
  2. Path construction will be stopped at the inode number which
     corresponds to the fd with which this ioctl is called. If
     constructed path does not exist under fd's inode, -EACCES
     will be returned.
  3. The name of bottom subvolume is also searched and filled.

Note that the maximum length of path is shorter 256 (BTRFS_VOL_NAME_MAX+1)
bytes than ino_lookup ioctl because of space of subvolume's name.

Reviewed-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Tested-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
[ style fixes ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-31 11:35:24 +02:00
Tomohiro Misono
42e4b520c8 btrfs: Add unprivileged ioctl which returns subvolume's ROOT_REF
Add unprivileged ioctl BTRFS_IOC_GET_SUBVOL_ROOTREF which returns
ROOT_REF information of the subvolume containing this inode except the
subvolume name (this is because to prevent potential name leak). The
subvolume name will be gained by user version of ino_lookup ioctl
(BTRFS_IOC_INO_LOOKUP_USER) which also performs permission check.

The min id of root ref's subvolume to be searched is specified by
@min_id in struct btrfs_ioctl_get_subvol_rootref_args. After the search
ends, @min_id is set to the last searched root ref's subvolid + 1. Also,
if there are more root refs than BTRFS_MAX_ROOTREF_BUFFER_NUM,
-EOVERFLOW is returned. Therefore the caller can just call this ioctl
again without changing the argument to continue search.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Tested-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
[ style fixes and struct item renames ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-31 11:35:23 +02:00
Tomohiro Misono
b64ec075bd btrfs: Add unprivileged ioctl which returns subvolume information
Add new unprivileged ioctl BTRFS_IOC_GET_SUBVOL_INFO which returns
the information of subvolume containing this inode.
(i.e. returns the information in ROOT_ITEM and ROOT_BACKREF.)

Reviewed-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Tested-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
[ minor style fixes, update struct comments ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-31 11:35:23 +02:00
Anand Jain
ad8bc4d005 btrfs: put btrfs_ioctl_vol_args_v2 related defines together
Just a code spatial rearrangement, no functional change.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-01-22 16:08:16 +01:00
Linus Torvalds
5cea7647e6 Merge branch 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
 "There are some new user features and the usual load of invisible
  enhancements or cleanups.

  New features:

   - extend mount options to specify zlib compression level, -o
     compress=zlib:9

   - v2 of ioctl "extent to inode mapping", addressing a usecase where
     we want to retrieve more but inaccurate results and do the
     postprocessing in userspace, aiding defragmentation or
     deduplication tools

   - populate compression heuristics logic, do data sampling and try to
     guess compressibility by: looking for repeated patterns, counting
     unique byte values and distribution, calculating Shannon entropy;
     this will need more benchmarking and possibly fine tuning, but the
     base should be good enough

   - enable indexing for btrfs as lower filesystem in overlayfs

   - speedup page cache readahead during send on large files

  Internal enhancements:

   - more sanity checks of b-tree items when reading them from disk

   - more EINVAL/EUCLEAN fixups, missing BLK_STS_* conversion, other
     errno or error handling fixes

   - remove some homegrown IO-related logic, that's been obsoleted by
     core block layer changes (batching, plug/unplug, own counters)

   - add ref-verify, optional debugging feature to verify extent
     reference accounting

   - simplify code handling outstanding extents, make it more clear
     where and how the accounting is done

   - make delalloc reservations per-inode, simplify the code and make
     the logic more straightforward

   - extensive cleanup of delayed refs code

  Notable fixes:

   - fix send ioctl on 32bit with 64bit kernel"

* 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (102 commits)
  btrfs: Fix bug for misused dev_t when lookup in dev state hash table.
  Btrfs: heuristic: add Shannon entropy calculation
  Btrfs: heuristic: add byte core set calculation
  Btrfs: heuristic: add byte set calculation
  Btrfs: heuristic: add detection of repeated data patterns
  Btrfs: heuristic: implement sampling logic
  Btrfs: heuristic: add bucket and sample counters and other defines
  Btrfs: compression: separate heuristic/compression workspaces
  btrfs: move btrfs_truncate_block out of trans handle
  btrfs: don't call btrfs_start_delalloc_roots in flushoncommit
  btrfs: track refs in a rb_tree instead of a list
  btrfs: add a comp_refs() helper
  btrfs: switch args for comp_*_refs
  btrfs: make the delalloc block rsv per inode
  btrfs: add tracepoints for outstanding extents mods
  Btrfs: rework outstanding_extents
  btrfs: increase output size for LOGICAL_INO_V2 ioctl
  btrfs: add a flags argument to LOGICAL_INO and call it LOGICAL_INO_V2
  btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents
  btrfs: send: remove unused code
  ...
2017-11-14 13:35:29 -08:00