ext4, doc: remove unnecessary escaping

Signed-off-by: Wang Jianjian <wangjianjian3@huawei.com>
Link: https://lore.kernel.org/r/20220520022255.2120576-2-wangjianjian3@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Wang Jianjian
2022-05-20 10:22:55 +08:00
committed by Theodore Ts'o
parent 48e02e6113
commit 3103084afc
17 changed files with 810 additions and 810 deletions

@@ -13,8 +13,8 @@ disappeared as of Linux 3.0.
There are two places where extended attributes can be found. The first
place is between the end of each inode entry and the beginning of the
next inode entry. For example, if inode.i\_extra\_isize = 28 and
sb.inode\_size = 256, then there are 256 - (128 + 28) = 100 bytes
next inode entry. For example, if inode.i_extra_isize = 28 and
sb.inode_size = 256, then there are 256 - (128 + 28) = 100 bytes
available for in-inode extended attribute storage. The second place
where extended attributes can be found is in the block pointed to by
``inode.i_file_acl``. As of Linux 3.11, it is not possible for this
@@ -38,8 +38,8 @@ Extended attributes, when stored after the inode, have a header
- Name
- Description
* - 0x0
- \_\_le32
- h\_magic
- __le32
- h_magic
- Magic number for identification, 0xEA020000. This value is set by the
Linux driver, though e2fsprogs doesn't seem to check it(?)
@@ -55,28 +55,28 @@ The beginning of an extended attribute block is in
- Name
- Description
* - 0x0
- \_\_le32
- h\_magic
- __le32
- h_magic
- Magic number for identification, 0xEA020000.
* - 0x4
- \_\_le32
- h\_refcount
- __le32
- h_refcount
- Reference count.
* - 0x8
- \_\_le32
- h\_blocks
- __le32
- h_blocks
- Number of disk blocks used.
* - 0xC
- \_\_le32
- h\_hash
- __le32
- h_hash
- Hash value of all attributes.
* - 0x10
- \_\_le32
- h\_checksum
- __le32
- h_checksum
- Checksum of the extended attribute block.
* - 0x14
- \_\_u32
- h\_reserved[3]
- __u32
- h_reserved[3]
- Zero.
The checksum is calculated against the FS UUID, the 64-bit block number
@@ -100,46 +100,46 @@ Attributes stored inside an inode do not need be stored in sorted order.
- Name
- Description
* - 0x0
- \_\_u8
- e\_name\_len
- __u8
- e_name_len
- Length of name.
* - 0x1
- \_\_u8
- e\_name\_index
- __u8
- e_name_index
- Attribute name index. There is a discussion of this below.
* - 0x2
- \_\_le16
- e\_value\_offs
- __le16
- e_value_offs
- Location of this attribute's value on the disk block where it is stored.
Multiple attributes can share the same value. For an inode attribute
this value is relative to the start of the first entry; for a block this
value is relative to the start of the block (i.e. the header).
* - 0x4
- \_\_le32
- e\_value\_inum
- __le32
- e_value_inum
- The inode where the value is stored. Zero indicates the value is in the
same block as this entry. This field is only used if the
INCOMPAT\_EA\_INODE feature is enabled.
INCOMPAT_EA_INODE feature is enabled.
* - 0x8
- \_\_le32
- e\_value\_size
- __le32
- e_value_size
- Length of attribute value.
* - 0xC
- \_\_le32
- e\_hash
- __le32
- e_hash
- Hash value of attribute name and attribute value. The kernel doesn't
update the hash for in-inode attributes, so for that case this value
must be zero, because e2fsck validates any non-zero hash regardless of
where the xattr lives.
* - 0x10
- char
- e\_name[e\_name\_len]
- e_name[e_name_len]
- Attribute name. Does not include trailing NULL.
Attribute values can follow the end of the entry table. There appears to
be a requirement that they be aligned to 4-byte boundaries. The values
are stored starting at the end of the block and grow towards the
xattr\_header/xattr\_entry table. When the two collide, the overflow is
xattr_header/xattr_entry table. When the two collide, the overflow is
put into a separate disk block. If the disk block fills up, the
filesystem returns -ENOSPC.
@@ -167,15 +167,15 @@ the key name. Here is a map of name index values to key prefixes:
* - 1
- “user.”
* - 2
- “system.posix\_acl\_access”
- “system.posix_acl_access”
* - 3
- “system.posix\_acl\_default”
- “system.posix_acl_default”
* - 4
- “trusted.”
* - 6
- “security.”
* - 7
- “system.” (inline\_data only?)
- “system.” (inline_data only?)
* - 8
- “system.richacl” (SuSE kernels only?)
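
As a rough illustration of the entry layout and the name index table documented in this file, the following Python sketch decodes the fixed 16-byte part of one extended attribute entry and rebuilds the full key name. The names `XATTR_ENTRY`, `NAME_PREFIXES`, `parse_xattr_entry` and `in_inode_xattr_space`, as well as the sample bytes, are assumptions made for this example; they are not part of the kernel or e2fsprogs.

```python
import struct

# Fixed-size part of the xattr entry (16 bytes, little-endian):
# e_name_len (u8), e_name_index (u8), e_value_offs (le16),
# e_value_inum (le32), e_value_size (le32), e_hash (le32).
XATTR_ENTRY = struct.Struct("<BBHIII")

# Name index -> key prefix, copied from the table in this file.
NAME_PREFIXES = {
    1: "user.",
    2: "system.posix_acl_access",
    3: "system.posix_acl_default",
    4: "trusted.",
    6: "security.",
    7: "system.",
    8: "system.richacl",
}


def parse_xattr_entry(buf, offset=0):
    """Decode one entry starting at `offset` (hypothetical helper)."""
    (name_len, name_index, value_offs,
     value_inum, value_size, e_hash) = XATTR_ENTRY.unpack_from(buf, offset)
    start = offset + XATTR_ENTRY.size
    # e_name is not NUL-terminated; its length comes from e_name_len.
    suffix = buf[start:start + name_len].decode("ascii", errors="replace")
    return {
        "name": NAME_PREFIXES.get(name_index, "") + suffix,
        "value_offs": value_offs,
        "value_inum": value_inum,
        "value_size": value_size,
        "hash": e_hash,
    }


def in_inode_xattr_space(inode_size, extra_isize):
    """Bytes left after the 128-byte base inode and its extra fields."""
    return inode_size - (128 + extra_isize)


# Made-up sample entry: index 1 ("user.") with the name suffix "mime".
sample = XATTR_ENTRY.pack(4, 1, 0x20, 0, 5, 0) + b"mime"
parsed = parse_xattr_entry(sample)
```

With the values from the text, `in_inode_xattr_space(256, 28)` reproduces the 100 bytes quoted for in-inode storage.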

@@ -23,7 +23,7 @@ means that a block group addresses 32 gigabytes instead of 128 megabytes,
also shrinking the amount of file system overhead for metadata.
The administrator can set a block cluster size at mkfs time (which is
stored in the s\_log\_cluster\_size field in the superblock); from then
stored in the s_log_cluster_size field in the superblock); from then
on, the block bitmaps track clusters, not individual blocks. This means
that block groups can be several gigabytes in size (instead of just
128MiB); however, the minimum allocation unit becomes a cluster, not a

@@ -9,15 +9,15 @@ group.
The inode bitmap records which entries in the inode table are in use.
As with most bitmaps, one bit represents the usage status of one data
block or inode table entry. This implies a block group size of 8 \*
number\_of\_bytes\_in\_a\_logical\_block.
block or inode table entry. This implies a block group size of 8 *
number_of_bytes_in_a_logical_block.
NOTE: If ``BLOCK_UNINIT`` is set for a given block group, various parts
of the kernel and e2fsprogs code pretends that the block bitmap contains
zeros (i.e. all blocks in the group are free). However, it is not
necessarily the case that no blocks are in use -- if ``meta_bg`` is set,
the bitmaps and group descriptor live inside the group. Unfortunately,
ext2fs\_test\_block\_bitmap2() will return '0' for those locations,
ext2fs_test_block_bitmap2() will return '0' for those locations,
which produces confusing debugfs output.
Inode Table

@@ -56,39 +56,39 @@ established that the super block and the group descriptor table, if
present, will be at the beginning of the block group. The bitmaps and
the inode table can be anywhere, and it is quite possible for the
bitmaps to come after the inode table, or for both to be in different
groups (flex\_bg). Leftover space is used for file data blocks, indirect
groups (flex_bg). Leftover space is used for file data blocks, indirect
block maps, extent tree blocks, and extended attributes.
Flexible Block Groups
---------------------
Starting in ext4, there is a new feature called flexible block groups
(flex\_bg). In a flex\_bg, several block groups are tied together as one
(flex_bg). In a flex_bg, several block groups are tied together as one
logical block group; the bitmap spaces and the inode table space in the
first block group of the flex\_bg are expanded to include the bitmaps
and inode tables of all other block groups in the flex\_bg. For example,
if the flex\_bg size is 4, then group 0 will contain (in order) the
first block group of the flex_bg are expanded to include the bitmaps
and inode tables of all other block groups in the flex_bg. For example,
if the flex_bg size is 4, then group 0 will contain (in order) the
superblock, group descriptors, data block bitmaps for groups 0-3, inode
bitmaps for groups 0-3, inode tables for groups 0-3, and the remaining
space in group 0 is for file data. The effect of this is to group the
block group metadata close together for faster loading, and to enable
large files to be continuous on disk. Backup copies of the superblock
and group descriptors are always at the beginning of block groups, even
if flex\_bg is enabled. The number of block groups that make up a
flex\_bg is given by 2 ^ ``sb.s_log_groups_per_flex``.
if flex_bg is enabled. The number of block groups that make up a
flex_bg is given by 2 ^ ``sb.s_log_groups_per_flex``.
Meta Block Groups
-----------------
Without the option META\_BG, for safety concerns, all block group
Without the option META_BG, for safety concerns, all block group
descriptors copies are kept in the first block group. Given the default
128MiB(2^27 bytes) block group size and 64-byte group descriptors, ext4
can have at most 2^27/64 = 2^21 block groups. This limits the entire
filesystem size to 2^21 * 2^27 = 2^48bytes or 256TiB.
The solution to this problem is to use the metablock group feature
(META\_BG), which is already in ext3 for all 2.6 releases. With the
META\_BG feature, ext4 filesystems are partitioned into many metablock
(META_BG), which is already in ext3 for all 2.6 releases. With the
META_BG feature, ext4 filesystems are partitioned into many metablock
groups. Each metablock group is a cluster of block groups whose group
descriptor structures can be stored in a single disk block. For ext4
filesystems with 4 KB block size, a single metablock group partition
@@ -110,7 +110,7 @@ bytes, a meta-block group contains 32 block groups for filesystems with
a 1KB block size, and 128 block groups for filesystems with a 4KB
blocksize. Filesystems can either be created using this new block group
descriptor layout, or existing filesystems can be resized on-line, and
the field s\_first\_meta\_bg in the superblock will indicate the first
the field s_first_meta_bg in the superblock will indicate the first
block group using this new layout.
Please see an important note about ``BLOCK_UNINIT`` in the section about
@@ -121,15 +121,15 @@ Lazy Block Group Initialization
A new feature for ext4 are three block group descriptor flags that
enable mkfs to skip initializing other parts of the block group
metadata. Specifically, the INODE\_UNINIT and BLOCK\_UNINIT flags mean
metadata. Specifically, the INODE_UNINIT and BLOCK_UNINIT flags mean
that the inode and block bitmaps for that group can be calculated and
therefore the on-disk bitmap blocks are not initialized. This is
generally the case for an empty block group or a block group containing
only fixed-location block group metadata. The INODE\_ZEROED flag means
only fixed-location block group metadata. The INODE_ZEROED flag means
that the inode table has been initialized; mkfs will unset this flag and
rely on the kernel to initialize the inode tables in the background.
By not writing zeroes to the bitmaps and inode table, mkfs time is
reduced considerably. Note the feature flag is RO\_COMPAT\_GDT\_CSUM,
but the dumpe2fs output prints this as “uninit\_bg”. They are the same
reduced considerably. Note the feature flag is RO_COMPAT_GDT_CSUM,
but the dumpe2fs output prints this as “uninit_bg”. They are the same
thing.
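
The size limits quoted in this file follow from simple arithmetic, which the Python sketch below reproduces. The function names are invented for this example. The 32-byte descriptor size passed to `groups_per_meta_bg` is an assumption chosen because it matches the 32 and 128 block group figures given for 1KB and 4KB block sizes.

```python
def max_block_groups(group_bytes=2**27, desc_size=64):
    """Without META_BG, all descriptors must fit in the first block group."""
    return group_bytes // desc_size


def max_fs_bytes_without_meta_bg(group_bytes=2**27, desc_size=64):
    """Largest filesystem addressable by that many block groups."""
    return max_block_groups(group_bytes, desc_size) * group_bytes


def groups_per_flex(s_log_groups_per_flex):
    """A flex_bg spans 2 ^ sb.s_log_groups_per_flex block groups."""
    return 1 << s_log_groups_per_flex


def groups_per_meta_bg(block_size, desc_size):
    """Descriptors of one meta block group fit in a single disk block."""
    return block_size // desc_size


TIB = 2**40
limit_tib = max_fs_bytes_without_meta_bg() // TIB
```

Here `max_block_groups()` gives 2^21 groups and `limit_tib` comes out as 256, matching the 256TiB limit stated in the Meta Block Groups section.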

@@ -1,7 +1,7 @@
.. SPDX-License-Identifier: GPL-2.0
+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| i.i\_block Offset | Where It Points |
| i.i_block Offset | Where It Points |
+=====================+==============================================================================================================================================================================================================================+
| 0 to 11 | Direct map to file blocks 0 to 11. |
+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

@@ -4,7 +4,7 @@ Checksums
---------
Starting in early 2012, metadata checksums were added to all major ext4
and jbd2 data structures. The associated feature flag is metadata\_csum.
and jbd2 data structures. The associated feature flag is metadata_csum.
The desired checksum algorithm is indicated in the superblock, though as
of October 2012 the only supported algorithm is crc32c. Some data
structures did not have space to fit a full 32-bit checksum, so only the
@@ -20,7 +20,7 @@ encounters directory blocks that lack sufficient empty space to add a
checksum, it will request that you run ``e2fsck -D`` to have the
directories rebuilt with checksums. This has the added benefit of
removing slack space from the directory files and rebalancing the htree
indexes. If you \_ignore\_ this step, your directories will not be
indexes. If you _ignore_ this step, your directories will not be
protected by a checksum!
The following table describes the data elements that go into each type
@@ -35,39 +35,39 @@ of checksum. The checksum function is whatever the superblock describes
- Length
- Ingredients
* - Superblock
- \_\_le32
- __le32
- The entire superblock up to the checksum field. The UUID lives inside
the superblock.
* - MMP
- \_\_le32
- __le32
- UUID + the entire MMP block up to the checksum field.
* - Extended Attributes
- \_\_le32
- __le32
- UUID + the entire extended attribute block. The checksum field is set to
zero.
* - Directory Entries
- \_\_le32
- __le32
- UUID + inode number + inode generation + the directory block up to the
fake entry enclosing the checksum field.
* - HTREE Nodes
- \_\_le32
- __le32
- UUID + inode number + inode generation + all valid extents + HTREE tail.
The checksum field is set to zero.
* - Extents
- \_\_le32
- __le32
- UUID + inode number + inode generation + the entire extent block up to
the checksum field.
* - Bitmaps
- \_\_le32 or \_\_le16
- __le32 or __le16
- UUID + the entire bitmap. Checksums are stored in the group descriptor,
and truncated if the group descriptor size is 32 bytes (i.e. ^64bit)
* - Inodes
- \_\_le32
- __le32
- UUID + inode number + inode generation + the entire inode. The checksum
field is set to zero. Each inode has its own checksum.
* - Group Descriptors
- \_\_le16
- If metadata\_csum, then UUID + group number + the entire descriptor;
else if gdt\_csum, then crc16(UUID + group number + the entire
- __le16
- If metadata_csum, then UUID + group number + the entire descriptor;
else if gdt_csum, then crc16(UUID + group number + the entire
descriptor). In all cases, only the lower 16 bits are stored.

@@ -42,24 +42,24 @@ is at most 263 bytes long, though on disk you'll need to reference
- Name
- Description
* - 0x0
- \_\_le32
- __le32
- inode
- Number of the inode that this directory entry points to.
* - 0x4
- \_\_le16
- rec\_len
- __le16
- rec_len
- Length of this directory entry. Must be a multiple of 4.
* - 0x6
- \_\_le16
- name\_len
- __le16
- name_len
- Length of the file name.
* - 0x8
- char
- name[EXT4\_NAME\_LEN]
- name[EXT4_NAME_LEN]
- File name.
Since file names cannot be longer than 255 bytes, the new directory
entry format shortens the name\_len field and uses the space for a file
entry format shortens the name_len field and uses the space for a file
type flag, probably to avoid having to load every inode during directory
tree traversal. This format is ``ext4_dir_entry_2``, which is at most
263 bytes long, though on disk you'll need to reference
@@ -74,24 +74,24 @@ tree traversal. This format is ``ext4_dir_entry_2``, which is at most
- Name
- Description
* - 0x0
- \_\_le32
- __le32
- inode
- Number of the inode that this directory entry points to.
* - 0x4
- \_\_le16
- rec\_len
- __le16
- rec_len
- Length of this directory entry.
* - 0x6
- \_\_u8
- name\_len
- __u8
- name_len
- Length of the file name.
* - 0x7
- \_\_u8
- file\_type
- __u8
- file_type
- File type code, see ftype_ table below.
* - 0x8
- char
- name[EXT4\_NAME\_LEN]
- name[EXT4_NAME_LEN]
- File name.
.. _ftype:
@@ -137,19 +137,19 @@ entry uses this extension, it may be up to 271 bytes.
- Name
- Description
* - 0x0
- \_\_le32
- __le32
- hash
- The hash of the directory name
* - 0x4
- \_\_le32
- minor\_hash
- __le32
- minor_hash
- The minor hash of the directory name
In order to add checksums to these classic directory blocks, a phony
``struct ext4_dir_entry`` is placed at the end of each leaf block to
hold the checksum. The directory entry is 12 bytes long. The inode
number and name\_len fields are set to zero to fool old software into
number and name_len fields are set to zero to fool old software into
ignoring an apparently empty directory entry, and the checksum is stored
in the place where the name normally goes. The structure is
``struct ext4_dir_entry_tail``:
@@ -163,24 +163,24 @@ in the place where the name normally goes. The structure is
- Name
- Description
* - 0x0
- \_\_le32
- det\_reserved\_zero1
- __le32
- det_reserved_zero1
- Inode number, which must be zero.
* - 0x4
- \_\_le16
- det\_rec\_len
- __le16
- det_rec_len
- Length of this directory entry, which must be 12.
* - 0x6
- \_\_u8
- det\_reserved\_zero2
- __u8
- det_reserved_zero2
- Length of the file name, which must be zero.
* - 0x7
- \_\_u8
- det\_reserved\_ft
- __u8
- det_reserved_ft
- File type, which must be 0xDE.
* - 0x8
- \_\_le32
- det\_checksum
- __le32
- det_checksum
- Directory leaf block checksum.
The leaf directory block checksum is calculated against the FS UUID, the
@@ -194,7 +194,7 @@ Hash Tree Directories
A linear array of directory entries isn't great for performance, so a
new feature was added to ext3 to provide a faster (but peculiar)
balanced tree keyed off a hash of the directory entry name. If the
EXT4\_INDEX\_FL (0x1000) flag is set in the inode, this directory uses a
EXT4_INDEX_FL (0x1000) flag is set in the inode, this directory uses a
hashed btree (htree) to organize and find directory entries. For
backwards read-only compatibility with ext2, this tree is actually
hidden inside the directory file, masquerading as “empty” directory data
@@ -206,14 +206,14 @@ rest of the directory block is empty so that it moves on.
The root of the tree always lives in the first data block of the
directory. By ext2 custom, the '.' and '..' entries must appear at the
beginning of this first block, so they are put here as two
``struct ext4_dir_entry_2``\ s and not stored in the tree. The rest of
``struct ext4_dir_entry_2`` s and not stored in the tree. The rest of
the root node contains metadata about the tree and finally a hash->block
map to find nodes that are lower in the htree. If
``dx_root.info.indirect_levels`` is non-zero then the htree has two
levels; the data block pointed to by the root node's map is an interior
node, which is indexed by a minor hash. Interior nodes in this tree
contains a zeroed out ``struct ext4_dir_entry_2`` followed by a
minor\_hash->block map to find leafe nodes. Leaf nodes contain a linear
minor_hash->block map to find leafe nodes. Leaf nodes contain a linear
array of all ``struct ext4_dir_entry_2``; all of these entries
(presumably) hash to the same value. If there is an overflow, the
entries simply overflow into the next leaf node, and the
@@ -245,83 +245,83 @@ of a data block:
- Name
- Description
* - 0x0
- \_\_le32
- __le32
- dot.inode
- inode number of this directory.
* - 0x4
- \_\_le16
- dot.rec\_len
- __le16
- dot.rec_len
- Length of this record, 12.
* - 0x6
- u8
- dot.name\_len
- dot.name_len
- Length of the name, 1.
* - 0x7
- u8
- dot.file\_type
- dot.file_type
- File type of this entry, 0x2 (directory) (if the feature flag is set).
* - 0x8
- char
- dot.name[4]
- “.\\0\\0\\0”
- “.\0\0\0”
* - 0xC
- \_\_le32
- __le32
- dotdot.inode
- inode number of parent directory.
* - 0x10
- \_\_le16
- dotdot.rec\_len
- block\_size - 12. The record length is long enough to cover all htree
- __le16
- dotdot.rec_len
- block_size - 12. The record length is long enough to cover all htree
data.
* - 0x12
- u8
- dotdot.name\_len
- dotdot.name_len
- Length of the name, 2.
* - 0x13
- u8
- dotdot.file\_type
- dotdot.file_type
- File type of this entry, 0x2 (directory) (if the feature flag is set).
* - 0x14
- char
- dotdot\_name[4]
- “..\\0\\0”
- dotdot_name[4]
- “..\0\0”
* - 0x18
- \_\_le32
- struct dx\_root\_info.reserved\_zero
- __le32
- struct dx_root_info.reserved_zero
- Zero.
* - 0x1C
- u8
- struct dx\_root\_info.hash\_version
- struct dx_root_info.hash_version
- Hash type, see dirhash_ table below.
* - 0x1D
- u8
- struct dx\_root\_info.info\_length
- struct dx_root_info.info_length
- Length of the tree information, 0x8.
* - 0x1E
- u8
- struct dx\_root\_info.indirect\_levels
- Depth of the htree. Cannot be larger than 3 if the INCOMPAT\_LARGEDIR
- struct dx_root_info.indirect_levels
- Depth of the htree. Cannot be larger than 3 if the INCOMPAT_LARGEDIR
feature is set; cannot be larger than 2 otherwise.
* - 0x1F
- u8
- struct dx\_root\_info.unused\_flags
- struct dx_root_info.unused_flags
-
* - 0x20
- \_\_le16
- __le16
- limit
- Maximum number of dx\_entries that can follow this header, plus 1 for
- Maximum number of dx_entries that can follow this header, plus 1 for
the header itself.
* - 0x22
- \_\_le16
- __le16
- count
- Actual number of dx\_entries that follow this header, plus 1 for the
- Actual number of dx_entries that follow this header, plus 1 for the
header itself.
* - 0x24
- \_\_le32
- __le32
- block
- The block number (within the directory file) that goes with hash=0.
* - 0x28
- struct dx\_entry
- struct dx_entry
- entries[0]
- As many 8-byte ``struct dx_entry`` as fits in the rest of the data block.
@@ -362,38 +362,38 @@ also the full length of a data block:
- Name
- Description
* - 0x0
- \_\_le32
- __le32
- fake.inode
- Zero, to make it look like this entry is not in use.
* - 0x4
- \_\_le16
- fake.rec\_len
- The size of the block, in order to hide all of the dx\_node data.
- __le16
- fake.rec_len
- The size of the block, in order to hide all of the dx_node data.
* - 0x6
- u8
- name\_len
- name_len
- Zero. There is no name for this “unused” directory entry.
* - 0x7
- u8
- file\_type
- file_type
- Zero. There is no file type for this “unused” directory entry.
* - 0x8
- \_\_le16
- __le16
- limit
- Maximum number of dx\_entries that can follow this header, plus 1 for
- Maximum number of dx_entries that can follow this header, plus 1 for
the header itself.
* - 0xA
- \_\_le16
- __le16
- count
- Actual number of dx\_entries that follow this header, plus 1 for the
- Actual number of dx_entries that follow this header, plus 1 for the
header itself.
* - 0xE
- \_\_le32
- __le32
- block
- The block number (within the directory file) that goes with the lowest
hash value of this block. This value is stored in the parent block.
* - 0x12
- struct dx\_entry
- struct dx_entry
- entries[0]
- As many 8-byte ``struct dx_entry`` as fits in the rest of the data block.
@@ -410,11 +410,11 @@ long:
- Name
- Description
* - 0x0
- \_\_le32
- __le32
- hash
- Hash code.
* - 0x4
- \_\_le32
- __le32
- block
- Block number (within the directory file, not filesystem blocks) of the
next node in the htree.
@@ -423,13 +423,13 @@ long:
author.)
If metadata checksums are enabled, the last 8 bytes of the directory
block (precisely the length of one dx\_entry) are used to store a
block (precisely the length of one dx_entry) are used to store a
``struct dx_tail``, which contains the checksum. The ``limit`` and
``count`` entries in the dx\_root/dx\_node structures are adjusted as
necessary to fit the dx\_tail into the block. If there is no space for
the dx\_tail, the user is notified to run e2fsck -D to rebuild the
``count`` entries in the dx_root/dx_node structures are adjusted as
necessary to fit the dx_tail into the block. If there is no space for
the dx_tail, the user is notified to run e2fsck -D to rebuild the
directory index (which will ensure that there's space for the checksum.
The dx\_tail structure is 8 bytes long and looks like this:
The dx_tail structure is 8 bytes long and looks like this:
.. list-table::
:widths: 8 8 24 40
@@ -441,13 +441,13 @@ The dx\_tail structure is 8 bytes long and looks like this:
- Description
* - 0x0
- u32
- dt\_reserved
- dt_reserved
- Zero.
* - 0x4
- \_\_le32
- dt\_checksum
- __le32
- dt_checksum
- Checksum of the htree directory block.
The checksum is calculated against the FS UUID, the htree index header
(dx\_root or dx\_node), all of the htree indices (dx\_entry) that are in
use, and the tail block (dx\_tail).
(dx_root or dx_node), all of the htree indices (dx_entry) that are in
use, and the tail block (dx_tail).
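
To make the linear directory layout in this file more concrete, here is a minimal Python sketch that walks the ``ext4_dir_entry_2`` records of one leaf block and recognises the 12-byte ``ext4_dir_entry_tail`` from its fixed field values. The helper names (`make_entry`, `iter_dir_entries`, `parse_dir_tail`) and the 64-byte sample block are assumptions for illustration only. The checksum is read back, not verified.

```python
import struct

DIR_ENTRY_2 = struct.Struct("<IHBB")      # inode, rec_len, name_len, file_type
DIR_ENTRY_TAIL = struct.Struct("<IHBBI")  # zero1, rec_len, zero2, ft, checksum


def make_entry(inode, rec_len, file_type, name):
    """Build one record, padding the name out to rec_len (sample data only)."""
    header = DIR_ENTRY_2.pack(inode, rec_len, len(name), file_type)
    return header + name.ljust(rec_len - DIR_ENTRY_2.size, b"\0")


def iter_dir_entries(block):
    """Yield (inode, file_type, name) for every in-use entry in the block."""
    offset = 0
    while offset + DIR_ENTRY_2.size <= len(block):
        inode, rec_len, name_len, file_type = DIR_ENTRY_2.unpack_from(block, offset)
        if rec_len < DIR_ENTRY_2.size:
            break  # corrupt record length; stop instead of looping forever
        if inode != 0:  # inode 0 marks an unused entry (or the checksum tail)
            start = offset + DIR_ENTRY_2.size
            yield inode, file_type, block[start:start + name_len]
        offset += rec_len


def parse_dir_tail(block):
    """Return the stored checksum if the block ends in a valid tail, else None."""
    if len(block) < DIR_ENTRY_TAIL.size:
        return None
    inode, rec_len, name_len, file_type, checksum = DIR_ENTRY_TAIL.unpack_from(
        block, len(block) - DIR_ENTRY_TAIL.size)
    if inode == 0 and rec_len == 12 and name_len == 0 and file_type == 0xDE:
        return checksum
    return None


# Made-up 64-byte block: ".", "..", one file "a", then the checksum tail.
block = (make_entry(2, 12, 0x2, b".")
         + make_entry(2, 12, 0x2, b"..")
         + make_entry(12, 28, 0x1, b"a")
         + DIR_ENTRY_TAIL.pack(0, 12, 0, 0xDE, 0xDEADBEEF))
entries = list(iter_dir_entries(block))
```

Because the tail has a zero inode number, the same loop that skips unused entries also skips it. This is the "fool old software" effect the text describes.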

@@ -5,14 +5,14 @@ Large Extended Attribute Values
To enable ext4 to store extended attribute values that do not fit in the
inode or in the single extended attribute block attached to an inode,
the EA\_INODE feature allows us to store the value in the data blocks of
the EA_INODE feature allows us to store the value in the data blocks of
a regular file inode. This “EA inode” is linked only from the extended
attribute name index and must not appear in a directory entry. The
inode's i\_atime field is used to store a checksum of the xattr value;
and i\_ctime/i\_version store a 64-bit reference count, which enables
inode's i_atime field is used to store a checksum of the xattr value;
and i_ctime/i_version store a 64-bit reference count, which enables
sharing of large xattr values between multiple owning inodes. For
backward compatibility with older versions of this feature, the
i\_mtime/i\_generation *may* store a back-reference to the inode number
and i\_generation of the **one** owning inode (in cases where the EA
i_mtime/i_generation *may* store a back-reference to the inode number
and i_generation of the **one** owning inode (in cases where the EA
inode is not referenced by multiple inodes) to verify that the EA inode
is the correct one being accessed.

@@ -7,34 +7,34 @@ Each block group on the filesystem has one of these descriptors
associated with it. As noted in the Layout section above, the group
descriptors (if present) are the second item in the block group. The
standard configuration is for each block group to contain a full copy of
the block group descriptor table unless the sparse\_super feature flag
the block group descriptor table unless the sparse_super feature flag
is set.
Notice how the group descriptor records the location of both bitmaps and
the inode table (i.e. they can float). This means that within a block
group, the only data structures with fixed locations are the superblock
and the group descriptor table. The flex\_bg mechanism uses this
and the group descriptor table. The flex_bg mechanism uses this
property to group several block groups into a flex group and lay out all
of the groups' bitmaps and inode tables into one long run in the first
group of the flex group.
If the meta\_bg feature flag is set, then several block groups are
grouped together into a meta group. Note that in the meta\_bg case,
If the meta_bg feature flag is set, then several block groups are
grouped together into a meta group. Note that in the meta_bg case,
however, the first and last two block groups within the larger meta
group contain only group descriptors for the groups inside the meta
group.
flex\_bg and meta\_bg do not appear to be mutually exclusive features.
flex_bg and meta_bg do not appear to be mutually exclusive features.
In ext2, ext3, and ext4 (when the 64bit feature is not enabled), the
block group descriptor was only 32 bytes long and therefore ends at
bg\_checksum. On an ext4 filesystem with the 64bit feature enabled, the
bg_checksum. On an ext4 filesystem with the 64bit feature enabled, the
block group descriptor expands to at least the 64 bytes described below;
the size is stored in the superblock.
If gdt\_csum is set and metadata\_csum is not set, the block group
If gdt_csum is set and metadata_csum is not set, the block group
checksum is the crc16 of the FS UUID, the group number, and the group
descriptor structure. If metadata\_csum is set, then the block group
descriptor structure. If metadata_csum is set, then the block group
checksum is the lower 16 bits of the checksum of the FS UUID, the group
number, and the group descriptor structure. Both block and inode bitmap
checksums are calculated against the FS UUID, the group number, and the
@@ -51,59 +51,59 @@ The block group descriptor is laid out in ``struct ext4_group_desc``.
- Name
- Description
* - 0x0
- \_\_le32
- bg\_block\_bitmap\_lo
- __le32
- bg_block_bitmap_lo
- Lower 32-bits of location of block bitmap.
* - 0x4
- \_\_le32
- bg\_inode\_bitmap\_lo
- __le32
- bg_inode_bitmap_lo
- Lower 32-bits of location of inode bitmap.
* - 0x8
- \_\_le32
- bg\_inode\_table\_lo
- __le32
- bg_inode_table_lo
- Lower 32-bits of location of inode table.
* - 0xC
- \_\_le16
- bg\_free\_blocks\_count\_lo
- __le16
- bg_free_blocks_count_lo
- Lower 16-bits of free block count.
* - 0xE
- \_\_le16
- bg\_free\_inodes\_count\_lo
- __le16
- bg_free_inodes_count_lo
- Lower 16-bits of free inode count.
* - 0x10
- \_\_le16
- bg\_used\_dirs\_count\_lo
- __le16
- bg_used_dirs_count_lo
- Lower 16-bits of directory count.
* - 0x12
- \_\_le16
- bg\_flags
- __le16
- bg_flags
- Block group flags. See the bgflags_ table below.
* - 0x14
- \_\_le32
- bg\_exclude\_bitmap\_lo
- __le32
- bg_exclude_bitmap_lo
- Lower 32-bits of location of snapshot exclusion bitmap.
* - 0x18
- \_\_le16
- bg\_block\_bitmap\_csum\_lo
- __le16
- bg_block_bitmap_csum_lo
- Lower 16-bits of the block bitmap checksum.
* - 0x1A
- \_\_le16
- bg\_inode\_bitmap\_csum\_lo
- __le16
- bg_inode_bitmap_csum_lo
- Lower 16-bits of the inode bitmap checksum.
* - 0x1C
- \_\_le16
- bg\_itable\_unused\_lo
- __le16
- bg_itable_unused_lo
- Lower 16-bits of unused inode count. If set, we needn't scan past the
``(sb.s_inodes_per_group - gdt.bg_itable_unused)``\ th entry in the
``(sb.s_inodes_per_group - gdt.bg_itable_unused)`` th entry in the
inode table for this group.
* - 0x1E
- \_\_le16
- bg\_checksum
- Group descriptor checksum; crc16(sb\_uuid+group\_num+bg\_desc) if the
RO\_COMPAT\_GDT\_CSUM feature is set, or
crc32c(sb\_uuid+group\_num+bg\_desc) & 0xFFFF if the
RO\_COMPAT\_METADATA\_CSUM feature is set. The bg\_checksum
field in bg\_desc is skipped when calculating crc16 checksum,
- __le16
- bg_checksum
- Group descriptor checksum; crc16(sb_uuid+group_num+bg_desc) if the
RO_COMPAT_GDT_CSUM feature is set, or
crc32c(sb_uuid+group_num+bg_desc) & 0xFFFF if the
RO_COMPAT_METADATA_CSUM feature is set. The bg_checksum
field in bg_desc is skipped when calculating crc16 checksum,
and set to zero if crc32c checksum is used.
* -
-
@@ -111,48 +111,48 @@ The block group descriptor is laid out in ``struct ext4_group_desc``.
- These fields only exist if the 64bit feature is enabled and s_desc_size
> 32.
* - 0x20
- \_\_le32
- bg\_block\_bitmap\_hi
- __le32
- bg_block_bitmap_hi
- Upper 32-bits of location of block bitmap.
* - 0x24
- \_\_le32
- bg\_inode\_bitmap\_hi
- __le32
- bg_inode_bitmap_hi
- Upper 32-bits of location of inodes bitmap.
* - 0x28
- \_\_le32
- bg\_inode\_table\_hi
- __le32
- bg_inode_table_hi
- Upper 32-bits of location of inodes table.
* - 0x2C
- \_\_le16
- bg\_free\_blocks\_count\_hi
- __le16
- bg_free_blocks_count_hi
- Upper 16-bits of free block count.
* - 0x2E
- \_\_le16
- bg\_free\_inodes\_count\_hi
- __le16
- bg_free_inodes_count_hi
- Upper 16-bits of free inode count.
* - 0x30
- \_\_le16
- bg\_used\_dirs\_count\_hi
- __le16
- bg_used_dirs_count_hi
- Upper 16-bits of directory count.
* - 0x32
- \_\_le16
- bg\_itable\_unused\_hi
- __le16
- bg_itable_unused_hi
- Upper 16-bits of unused inode count.
* - 0x34
- \_\_le32
- bg\_exclude\_bitmap\_hi
- __le32
- bg_exclude_bitmap_hi
- Upper 32-bits of location of snapshot exclusion bitmap.
* - 0x38
- \_\_le16
- bg\_block\_bitmap\_csum\_hi
- __le16
- bg_block_bitmap_csum_hi
- Upper 16-bits of the block bitmap checksum.
* - 0x3A
- \_\_le16
- bg\_inode\_bitmap\_csum\_hi
- __le16
- bg_inode_bitmap_csum_hi
- Upper 16-bits of the inode bitmap checksum.
* - 0x3C
- \_\_u32
- bg\_reserved
- __u32
- bg_reserved
- Padding to 64 bytes.
.. _bgflags:
@@ -166,8 +166,8 @@ Block group flags can be any combination of the following:
* - Value
- Description
* - 0x1
- inode table and bitmap are not initialized (EXT4\_BG\_INODE\_UNINIT).
- inode table and bitmap are not initialized (EXT4_BG_INODE_UNINIT).
* - 0x2
- block bitmap is not initialized (EXT4\_BG\_BLOCK\_UNINIT).
- block bitmap is not initialized (EXT4_BG_BLOCK_UNINIT).
* - 0x4
- inode table is zeroed (EXT4\_BG\_INODE\_ZEROED).
- inode table is zeroed (EXT4_BG_INODE_ZEROED).
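
The following Python sketch illustrates how the first 32 bytes of the group descriptor table entry described in this file could be decoded, how the flag bits map to names, and how a ``_lo``/``_hi`` pair combines into one value. `GROUP_DESC_32`, `parse_group_desc_32`, `decode_bg_flags`, `combine_lo_hi` and the sample values are hypothetical names and data chosen for this example. No checksum is verified.

```python
import struct

# First 32 bytes of the descriptor, offsets 0x0 through 0x1E, little-endian.
GROUP_DESC_32 = struct.Struct("<IIIHHHHIHHHH")
FIELDS = (
    "bg_block_bitmap_lo", "bg_inode_bitmap_lo", "bg_inode_table_lo",
    "bg_free_blocks_count_lo", "bg_free_inodes_count_lo",
    "bg_used_dirs_count_lo", "bg_flags", "bg_exclude_bitmap_lo",
    "bg_block_bitmap_csum_lo", "bg_inode_bitmap_csum_lo",
    "bg_itable_unused_lo", "bg_checksum",
)

BG_FLAGS = {
    0x1: "EXT4_BG_INODE_UNINIT",
    0x2: "EXT4_BG_BLOCK_UNINIT",
    0x4: "EXT4_BG_INODE_ZEROED",
}


def parse_group_desc_32(buf, offset=0):
    """Map the 32-byte descriptor prefix to a dict keyed by field name."""
    return dict(zip(FIELDS, GROUP_DESC_32.unpack_from(buf, offset)))


def decode_bg_flags(bg_flags):
    """List the names of all flag bits set in bg_flags."""
    return [name for bit, name in sorted(BG_FLAGS.items()) if bg_flags & bit]


def combine_lo_hi(lo, hi, lo_bits):
    """Join a split field, e.g. bg_block_bitmap_lo/_hi with lo_bits=32."""
    return (hi << lo_bits) | lo


# Made-up descriptor: flags 0x5 = INODE_UNINIT | INODE_ZEROED.
raw = GROUP_DESC_32.pack(100, 101, 102, 500, 200, 3, 0x5, 0, 0, 0, 10, 0xBEEF)
desc = parse_group_desc_32(raw)
flags = decode_bg_flags(desc["bg_flags"])
```

The ``_hi`` halves are only present when the 64bit feature is enabled and the descriptor is larger than 32 bytes. A reader of a 32-byte descriptor would pass `hi=0` to `combine_lo_hi`.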

@@ -1,6 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0
The Contents of inode.i\_block
The Contents of inode.i_block
------------------------------
Depending on the type of file an inode describes, the 60 bytes of
@@ -47,7 +47,7 @@ In ext4, the file to logical block map has been replaced with an extent
tree. Under the old scheme, allocating a contiguous run of 1,000 blocks
requires an indirect block to map all 1,000 entries; with extents, the
mapping is reduced to a single ``struct ext4_extent`` with
``ee_len = 1000``. If flex\_bg is enabled, it is possible to allocate
``ee_len = 1000``. If flex_bg is enabled, it is possible to allocate
very large files with a single extent, at a considerable reduction in
metadata block use, and some improvement in disk efficiency. The inode
must have the extents flag (0x80000) flag set for this feature to be in
@@ -76,28 +76,28 @@ which is 12 bytes long:
- Name
- Description
* - 0x0
- \_\_le16
- eh\_magic
- __le16
- eh_magic
- Magic number, 0xF30A.
* - 0x2
- \_\_le16
- eh\_entries
- __le16
- eh_entries
- Number of valid entries following the header.
* - 0x4
- \_\_le16
- eh\_max
- __le16
- eh_max
- Maximum number of entries that could follow the header.
* - 0x6
- \_\_le16
- eh\_depth
- __le16
- eh_depth
- Depth of this extent node in the extent tree. 0 = this extent node
points to data blocks; otherwise, this extent node points to other
extent nodes. The extent tree can be at most 5 levels deep: a logical
block number can be at most ``2^32``, and the smallest ``n`` that
satisfies ``4*(((blocksize - 12)/12)^n) >= 2^32`` is 5.
* - 0x8
- \_\_le32
- eh\_generation
- __le32
- eh_generation
- Generation of the tree. (Used by Lustre, but not standard ext4).
Internal nodes of the extent tree, also known as index nodes, are
@@ -112,22 +112,22 @@ recorded as ``struct ext4_extent_idx``, and are 12 bytes long:
- Name
- Description
* - 0x0
- \_\_le32
- ei\_block
- __le32
- ei_block
- This index node covers file blocks from 'block' onward.
* - 0x4
- \_\_le32
- ei\_leaf\_lo
- __le32
- ei_leaf_lo
- Lower 32-bits of the block number of the extent node that is the next
level lower in the tree. The tree node pointed to can be either another
internal node or a leaf node, described below.
* - 0x8
- \_\_le16
- ei\_leaf\_hi
- __le16
- ei_leaf_hi
- Upper 16-bits of the previous field.
* - 0xA
- \_\_u16
- ei\_unused
- __u16
- ei_unused
-
Leaf nodes of the extent tree are recorded as ``struct ext4_extent``,
@@ -142,24 +142,24 @@ and are also 12 bytes long:
- Name
- Description
* - 0x0
- \_\_le32
- ee\_block
- __le32
- ee_block
- First file block number that this extent covers.
* - 0x4
- \_\_le16
- ee\_len
- __le16
- ee_len
- Number of blocks covered by extent. If the value of this field is <=
32768, the extent is initialized. If the value of the field is > 32768,
the extent is uninitialized and the actual extent length is ``ee_len`` -
32768. Therefore, the maximum length of a initialized extent is 32768
blocks, and the maximum length of an uninitialized extent is 32767.
* - 0x6
- \_\_le16
- ee\_start\_hi
- __le16
- ee_start_hi
- Upper 16-bits of the block number to which this extent points.
* - 0x8
- \_\_le32
- ee\_start\_lo
- __le32
- ee_start_lo
- Lower 32-bits of the block number to which this extent points.
Prior to the introduction of metadata checksums, the extent header +
@@ -182,8 +182,8 @@ including) the checksum itself.
- Name
- Description
* - 0x0
- \_\_le32
- eb\_checksum
- __le32
- eb_checksum
- Checksum of the extent block, crc32c(uuid+inum+igeneration+extentblock)
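
The extent structures in the hunks above can be exercised with a short sketch. It assumes the usual little-endian ext4 on-disk byte order; the function names are hypothetical and this is not the kernel's implementation. ``max_tree_depth`` simply evaluates the inequality quoted in the ``eh_depth`` description.

```python
import struct

EXT4_EXT_MAGIC = 0xF30A   # eh_magic value from the header table
UNINIT_BIAS = 32768       # ee_len values above this mark an uninitialized extent


def parse_extent_header(buf):
    """Decode the 12-byte extent header at the start of buf."""
    eh_magic, eh_entries, eh_max, eh_depth, eh_generation = struct.unpack_from(
        "<HHHHI", buf, 0)
    if eh_magic != EXT4_EXT_MAGIC:
        raise ValueError("not an extent header")
    return {"entries": eh_entries, "max": eh_max,
            "depth": eh_depth, "generation": eh_generation}


def parse_extent(buf, offset=12):
    """Decode one 12-byte leaf entry (struct ext4_extent) at offset."""
    ee_block, ee_len, ee_start_hi, ee_start_lo = struct.unpack_from(
        "<IHHI", buf, offset)
    initialized = ee_len <= UNINIT_BIAS
    length = ee_len if initialized else ee_len - UNINIT_BIAS
    return {"first_file_block": ee_block,
            "length": length,
            "initialized": initialized,
            "start": (ee_start_hi << 32) | ee_start_lo}


def max_tree_depth(blocksize):
    """Smallest n with 4 * ((blocksize - 12) // 12) ** n >= 2 ** 32."""
    per_block = (blocksize - 12) // 12
    n = 1
    while 4 * per_block ** n < 2 ** 32:
        n += 1
    return n
```

Evaluated this way, the bound of 5 quoted in the table corresponds to a 1 KiB block size; larger block sizes satisfy the inequality with a smaller ``n``.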
Inline Data

View File

@@ -11,12 +11,12 @@ file is smaller than 60 bytes, then the data are stored inline in
attribute space, then it might be found as an extended attribute
“system.data” within the inode body (“ibody EA”). This of course
constrains the amount of extended attributes one can attach to an inode.
If the data size increases beyond i\_block + ibody EA, a regular block
If the data size increases beyond i_block + ibody EA, a regular block
is allocated and the contents moved to that block.
Pending a change to compact the extended attribute key used to store
inline data, one ought to be able to store 160 bytes of data in a
256-byte inode (as of June 2015, when i\_extra\_isize is 28). Prior to
256-byte inode (as of June 2015, when i_extra_isize is 28). Prior to
that, the limit was 156 bytes due to inefficient use of inode space.
The inline data feature requires the presence of an extended attribute
@@ -25,12 +25,12 @@ for “system.data”, even if the attribute value is zero length.
Inline Directories
~~~~~~~~~~~~~~~~~~
The first four bytes of i\_block are the inode number of the parent
The first four bytes of i_block are the inode number of the parent
directory. Following that is a 56-byte space for an array of directory
entries; see ``struct ext4_dir_entry``. If there is a “system.data”
attribute in the inode body, the EA value is an array of
``struct ext4_dir_entry`` as well. Note that for inline directories, the
i\_block and EA space are treated as separate dirent blocks; directory
i_block and EA space are treated as separate dirent blocks; directory
entries cannot span the two.
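
The inline directory layout described above can be sketched as follows. The function name is hypothetical, and the little-endian byte order of the parent inode number is assumed from the rest of the ext4 on-disk format.

```python
import struct

I_BLOCK_SIZE = 60  # bytes of inode.i_block


def split_inline_dir(i_block):
    """Split i_block of an inline directory into its two parts.

    Returns (parent inode number, 56 bytes of directory entry space).
    The EA space, if any, is a separate dirent block and is not
    handled here.
    """
    if len(i_block) != I_BLOCK_SIZE:
        raise ValueError("i_block must be exactly 60 bytes")
    (parent_ino,) = struct.unpack_from("<I", i_block, 0)
    return parent_ino, i_block[4:]
```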
Inline directory entries are not checksummed, as the inode checksum

File diff suppressed because it is too large

View File

@@ -63,8 +63,8 @@ Generally speaking, the journal has this format:
:header-rows: 1
* - Superblock
- descriptor\_block (data\_blocks or revocation\_block) [more data or
revocations] commmit\_block
- descriptor_block (data_blocks or revocation_block) [more data or
revocations] commmit_block
- [more transactions...]
* -
- One transaction
@@ -93,8 +93,8 @@ superblock.
* - 1024 bytes of padding
- ext4 Superblock
- Journal Superblock
- descriptor\_block (data\_blocks or revocation\_block) [more data or
revocations] commmit\_block
- descriptor_block (data_blocks or revocation_block) [more data or
revocations] commmit_block
- [more transactions...]
* -
-
@@ -117,17 +117,17 @@ Every block in the journal starts with a common 12-byte header
- Name
- Description
* - 0x0
- \_\_be32
- h\_magic
- __be32
- h_magic
- jbd2 magic number, 0xC03B3998.
* - 0x4
- \_\_be32
- h\_blocktype
- __be32
- h_blocktype
- Description of what this block contains. See the jbd2_blocktype_ table
below.
* - 0x8
- \_\_be32
- h\_sequence
- __be32
- h_sequence
- The transaction ID that goes with this block.
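
A minimal sketch of reading the common 12-byte header described in the table above. Only the offsets, the big-endian field types and the magic number come from the table; the function name and the returned dictionary are illustrative.

```python
import struct

JBD2_MAGIC = 0xC03B3998  # h_magic value from the table


def parse_journal_header(block):
    """Decode the common header at the start of a journal block.

    Returns None when the magic number does not match, which is how
    a reader might tell that a block is not journal metadata.
    """
    h_magic, h_blocktype, h_sequence = struct.unpack_from(">III", block, 0)
    if h_magic != JBD2_MAGIC:
        return None
    return {"blocktype": h_blocktype, "sequence": h_sequence}
```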
.. _jbd2_blocktype:
@@ -177,99 +177,99 @@ which is 1024 bytes long:
-
- Static information describing the journal.
* - 0x0
- journal\_header\_t (12 bytes)
- s\_header
- journal_header_t (12 bytes)
- s_header
- Common header identifying this as a superblock.
* - 0xC
- \_\_be32
- s\_blocksize
- __be32
- s_blocksize
- Journal device block size.
* - 0x10
- \_\_be32
- s\_maxlen
- __be32
- s_maxlen
- Total number of blocks in this journal.
* - 0x14
- \_\_be32
- s\_first
- __be32
- s_first
- First block of log information.
* -
-
-
- Dynamic information describing the current state of the log.
* - 0x18
- \_\_be32
- s\_sequence
- __be32
- s_sequence
- First commit ID expected in log.
* - 0x1C
- \_\_be32
- s\_start
- __be32
- s_start
- Block number of the start of log. Contrary to the comments, this field
being zero does not imply that the journal is clean!
* - 0x20
- \_\_be32
- s\_errno
- Error value, as set by jbd2\_journal\_abort().
- __be32
- s_errno
- Error value, as set by jbd2_journal_abort().
* -
-
-
- The remaining fields are only valid in a v2 superblock.
* - 0x24
- \_\_be32
- s\_feature\_compat;
- __be32
- s_feature_compat;
- Compatible feature set. See the table jbd2_compat_ below.
* - 0x28
- \_\_be32
- s\_feature\_incompat
- __be32
- s_feature_incompat
- Incompatible feature set. See the table jbd2_incompat_ below.
* - 0x2C
- \_\_be32
- s\_feature\_ro\_compat
- __be32
- s_feature_ro_compat
- Read-only compatible feature set. There aren't any of these currently.
* - 0x30
- \_\_u8
- s\_uuid[16]
- __u8
- s_uuid[16]
- 128-bit uuid for journal. This is compared against the copy in the ext4
super block at mount time.
* - 0x40
- \_\_be32
- s\_nr\_users
- __be32
- s_nr_users
- Number of file systems sharing this journal.
* - 0x44
- \_\_be32
- s\_dynsuper
- __be32
- s_dynsuper
- Location of dynamic super block copy. (Not used?)
* - 0x48
- \_\_be32
- s\_max\_transaction
- __be32
- s_max_transaction
- Limit of journal blocks per transaction. (Not used?)
* - 0x4C
- \_\_be32
- s\_max\_trans\_data
- __be32
- s_max_trans_data
- Limit of data blocks per transaction. (Not used?)
* - 0x50
- \_\_u8
- s\_checksum\_type
- __u8
- s_checksum_type
- Checksum algorithm used for the journal. See jbd2_checksum_type_ for
more info.
* - 0x51
- \_\_u8[3]
- s\_padding2
- __u8[3]
- s_padding2
-
* - 0x54
- \_\_be32
- s\_num\_fc\_blocks
- __be32
- s_num_fc_blocks
- Number of fast commit blocks in the journal.
* - 0x58
- \_\_u32
- s\_padding[42]
- __u32
- s_padding[42]
-
* - 0xFC
- \_\_be32
- s\_checksum
- __be32
- s_checksum
- Checksum of the entire superblock, with this field set to zero.
* - 0x100
- \_\_u8
- s\_users[16\*48]
- __u8
- s_users[16*48]
- ids of all file systems sharing the log. e2fsprogs/Linux don't allow
shared external journals, but I imagine Lustre (or ocfs2?), which use
the jbd2 code, might.
@@ -286,7 +286,7 @@ The journal compat features are any combination of the following:
- Description
* - 0x1
- Journal maintains checksums on the data blocks.
(JBD2\_FEATURE\_COMPAT\_CHECKSUM)
(JBD2_FEATURE_COMPAT_CHECKSUM)
.. _jbd2_incompat:
@@ -299,23 +299,23 @@ The journal incompat features are any combination of the following:
* - Value
- Description
* - 0x1
- Journal has block revocation records. (JBD2\_FEATURE\_INCOMPAT\_REVOKE)
- Journal has block revocation records. (JBD2_FEATURE_INCOMPAT_REVOKE)
* - 0x2
- Journal can deal with 64-bit block numbers.
(JBD2\_FEATURE\_INCOMPAT\_64BIT)
(JBD2_FEATURE_INCOMPAT_64BIT)
* - 0x4
- Journal commits asynchronously. (JBD2\_FEATURE\_INCOMPAT\_ASYNC\_COMMIT)
- Journal commits asynchronously. (JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT)
* - 0x8
- This journal uses v2 of the checksum on-disk format. Each journal
metadata block gets its own checksum, and the block tags in the
descriptor table contain checksums for each of the data blocks in the
journal. (JBD2\_FEATURE\_INCOMPAT\_CSUM\_V2)
journal. (JBD2_FEATURE_INCOMPAT_CSUM_V2)
* - 0x10
- This journal uses v3 of the checksum on-disk format. This is the same as
v2, but the journal block tag size is fixed regardless of the size of
block numbers. (JBD2\_FEATURE\_INCOMPAT\_CSUM\_V3)
block numbers. (JBD2_FEATURE_INCOMPAT_CSUM_V3)
* - 0x20
- Journal has fast commit blocks. (JBD2\_FEATURE\_INCOMPAT\_FAST\_COMMIT)
- Journal has fast commit blocks. (JBD2_FEATURE_INCOMPAT_FAST_COMMIT)
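
To tie the superblock offsets to the feature tables, here is a hedged sketch that reads the fixed fields up to ``s_feature_ro_compat`` and names the incompat bits. The function and dictionary names are hypothetical; the offsets and flag values are taken from the tables in this diff.

```python
import struct

# Values and names copied from the jbd2_incompat table.
JBD2_INCOMPAT = {
    0x1: "JBD2_FEATURE_INCOMPAT_REVOKE",
    0x2: "JBD2_FEATURE_INCOMPAT_64BIT",
    0x4: "JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT",
    0x8: "JBD2_FEATURE_INCOMPAT_CSUM_V2",
    0x10: "JBD2_FEATURE_INCOMPAT_CSUM_V3",
    0x20: "JBD2_FEATURE_INCOMPAT_FAST_COMMIT",
}

# Field names in on-disk order, starting at offset 0xC.
_SB_FIELDS = ("s_blocksize", "s_maxlen", "s_first", "s_sequence", "s_start",
              "s_errno", "s_feature_compat", "s_feature_incompat",
              "s_feature_ro_compat")


def parse_journal_superblock(block):
    """Decode the __be32 fields at offsets 0xC through 0x2C."""
    values = struct.unpack_from(">9I", block, 0xC)
    return dict(zip(_SB_FIELDS, values))


def incompat_names(s_feature_incompat):
    """Return the names of all known incompat bits that are set."""
    return [name for bit, name in JBD2_INCOMPAT.items()
            if s_feature_incompat & bit]
```

The feature fields are only meaningful in a v2 superblock, so a caller would check the block type in the common header before trusting them.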
.. _jbd2_checksum_type:
@@ -355,11 +355,11 @@ Descriptor blocks consume at least 36 bytes, but use a full block:
- Name
- Descriptor
* - 0x0
- journal\_header\_t
- journal_header_t
- (open coded)
- Common block header.
* - 0xC
- struct journal\_block\_tag\_s
- struct journal_block_tag_s
- open coded array[]
- Enough tags either to fill up the block or to describe all the data
blocks that follow this descriptor block.
@@ -367,7 +367,7 @@ Descriptor blocks consume at least 36 bytes, but use a full block:
Journal block tags have any of the following formats, depending on which
journal feature and block tag flags are set.
If JBD2\_FEATURE\_INCOMPAT\_CSUM\_V3 is set, the journal block tag is
If JBD2_FEATURE_INCOMPAT_CSUM_V3 is set, the journal block tag is
defined as ``struct journal_block_tag3_s``, which looks like the
following. The size is 16 or 32 bytes.
@@ -380,24 +380,24 @@ following. The size is 16 or 32 bytes.
- Name
- Descriptor
* - 0x0
- \_\_be32
- t\_blocknr
- __be32
- t_blocknr
- Lower 32-bits of the location of where the corresponding data block
should end up on disk.
* - 0x4
- \_\_be32
- t\_flags
- __be32
- t_flags
- Flags that go with the descriptor. See the table jbd2_tag_flags_ for
more info.
* - 0x8
- \_\_be32
- t\_blocknr\_high
- __be32
- t_blocknr_high
- Upper 32-bits of the location of where the corresponding data block
should end up on disk. This is zero if JBD2\_FEATURE\_INCOMPAT\_64BIT is
should end up on disk. This is zero if JBD2_FEATURE_INCOMPAT_64BIT is
not enabled.
* - 0xC
- \_\_be32
- t\_checksum
- __be32
- t_checksum
- Checksum of the journal UUID, the sequence number, and the data block.
* -
-
@@ -433,7 +433,7 @@ The journal tag flags are any combination of the following:
* - 0x8
- This is the last tag in this descriptor block.
If JBD2\_FEATURE\_INCOMPAT\_CSUM\_V3 is NOT set, the journal block tag
If JBD2_FEATURE_INCOMPAT_CSUM_V3 is NOT set, the journal block tag
is defined as ``struct journal_block_tag_s``, which looks like the
following. The size is 8, 12, 24, or 28 bytes:
@@ -446,18 +446,18 @@ following. The size is 8, 12, 24, or 28 bytes:
- Name
- Descriptor
* - 0x0
- \_\_be32
- t\_blocknr
- __be32
- t_blocknr
- Lower 32-bits of the location of where the corresponding data block
should end up on disk.
* - 0x4
- \_\_be16
- t\_checksum
- __be16
- t_checksum
- Checksum of the journal UUID, the sequence number, and the data block.
Note that only the lower 16 bits are stored.
* - 0x6
- \_\_be16
- t\_flags
- __be16
- t_flags
- Flags that go with the descriptor. See the table jbd2_tag_flags_ for
more info.
* -
@@ -466,8 +466,8 @@ following. The size is 8, 12, 24, or 28 bytes:
- This next field is only present if the super block indicates support for
64-bit block numbers.
* - 0x8
- \_\_be32
- t\_blocknr\_high
- __be32
- t_blocknr_high
- Upper 32-bits of the location of where the corresponding data block
should end up on disk.
* -
@@ -483,8 +483,8 @@ following. The size is 8, 12, 24, or 28 bytes:
``j_uuid`` field in ``struct journal_s``, but only tune2fs touches that
field.
If JBD2\_FEATURE\_INCOMPAT\_CSUM\_V2 or
JBD2\_FEATURE\_INCOMPAT\_CSUM\_V3 are set, the end of the block is a
If JBD2_FEATURE_INCOMPAT_CSUM_V2 or
JBD2_FEATURE_INCOMPAT_CSUM_V3 are set, the end of the block is a
``struct jbd2_journal_block_tail``, which looks like this:
.. list-table::
@@ -496,8 +496,8 @@ JBD2\_FEATURE\_INCOMPAT\_CSUM\_V3 are set, the end of the block is a
- Name
- Descriptor
* - 0x0
- \_\_be32
- t\_checksum
- __be32
- t_checksum
- Checksum of the journal UUID + the descriptor block, with this field set
to zero.
@@ -538,25 +538,25 @@ length, but use a full block:
- Name
- Description
* - 0x0
- journal\_header\_t
- r\_header
- journal_header_t
- r_header
- Common block header.
* - 0xC
- \_\_be32
- r\_count
- __be32
- r_count
- Number of bytes used in this block.
* - 0x10
- \_\_be32 or \_\_be64
- __be32 or __be64
- blocks[0]
- Blocks to revoke.
After r\_count is a linear array of block numbers that are effectively
After r_count is a linear array of block numbers that are effectively
revoked by this transaction. The size of each block number is 8 bytes if
the superblock advertises 64-bit block number support, or 4 bytes
otherwise.
If JBD2\_FEATURE\_INCOMPAT\_CSUM\_V2 or
JBD2\_FEATURE\_INCOMPAT\_CSUM\_V3 are set, the end of the revocation
If JBD2_FEATURE_INCOMPAT_CSUM_V2 or
JBD2_FEATURE_INCOMPAT_CSUM_V3 are set, the end of the revocation
block is a ``struct jbd2_journal_revoke_tail``, which has this format:
.. list-table::
@@ -568,8 +568,8 @@ block is a ``struct jbd2_journal_revoke_tail``, which has this format:
- Name
- Description
* - 0x0
- \_\_be32
- r\_checksum
- __be32
- r_checksum
- Checksum of the journal UUID + revocation block
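
The revocation block layout above can be walked with a few lines of code. This is a sketch under one stated assumption: ``r_count`` is taken to count bytes from the start of the block, so it includes the 16 bytes that precede ``blocks[0]``. The function name and the ``is_64bit`` parameter are hypothetical.

```python
import struct

R_COUNT_OFFSET = 0xC   # r_count follows the 12-byte common header
BLOCKS_OFFSET = 0x10   # first revoked block number


def parse_revoke_block(block, is_64bit):
    """Return the list of revoked block numbers in a revocation block.

    is_64bit should be True when the journal superblock advertises
    64-bit block number support, in which case each entry is 8 bytes.
    Assumption: r_count includes the 16 bytes before blocks[0].
    """
    (r_count,) = struct.unpack_from(">I", block, R_COUNT_OFFSET)
    size, fmt = (8, ">Q") if is_64bit else (4, ">I")
    revoked = []
    offset = BLOCKS_OFFSET
    while offset + size <= r_count:
        revoked.append(struct.unpack_from(fmt, block, offset)[0])
        offset += size
    return revoked
```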
Commit Block
@@ -592,38 +592,38 @@ bytes long (but uses a full block):
- Name
- Descriptor
* - 0x0
- journal\_header\_s
- journal_header_s
- (open coded)
- Common block header.
* - 0xC
- unsigned char
- h\_chksum\_type
- h_chksum_type
- The type of checksum to use to verify the integrity of the data blocks
in the transaction. See jbd2_checksum_type_ for more info.
* - 0xD
- unsigned char
- h\_chksum\_size
- h_chksum_size
- The number of bytes used by the checksum. Most likely 4.
* - 0xE
- unsigned char
- h\_padding[2]
- h_padding[2]
-
* - 0x10
- \_\_be32
- h\_chksum[JBD2\_CHECKSUM\_BYTES]
- __be32
- h_chksum[JBD2_CHECKSUM_BYTES]
- 32 bytes of space to store checksums. If
JBD2\_FEATURE\_INCOMPAT\_CSUM\_V2 or JBD2\_FEATURE\_INCOMPAT\_CSUM\_V3
JBD2_FEATURE_INCOMPAT_CSUM_V2 or JBD2_FEATURE_INCOMPAT_CSUM_V3
are set, the first ``__be32`` is the checksum of the journal UUID and
the entire commit block, with this field zeroed. If
JBD2\_FEATURE\_COMPAT\_CHECKSUM is set, the first ``__be32`` is the
JBD2_FEATURE_COMPAT_CHECKSUM is set, the first ``__be32`` is the
crc32 of all the blocks already written to the transaction.
* - 0x30
- \_\_be64
- h\_commit\_sec
- __be64
- h_commit_sec
- The time that the transaction was committed, in seconds since the epoch.
* - 0x38
- \_\_be32
- h\_commit\_nsec
- __be32
- h_commit_nsec
- Nanoseconds component of the above timestamp.
Fast commits

View File

@@ -7,8 +7,8 @@ Multiple mount protection (MMP) is a feature that protects the
filesystem against multiple hosts trying to use the filesystem
simultaneously. When a filesystem is opened (for mounting, or fsck,
etc.), the MMP code running on the node (call it node A) checks a
sequence number. If the sequence number is EXT4\_MMP\_SEQ\_CLEAN, the
open continues. If the sequence number is EXT4\_MMP\_SEQ\_FSCK, then
sequence number. If the sequence number is EXT4_MMP_SEQ_CLEAN, the
open continues. If the sequence number is EXT4_MMP_SEQ_FSCK, then
fsck is (hopefully) running, and open fails immediately. Otherwise, the
open code will wait for twice the specified MMP check interval and check
the sequence number again. If the sequence number has changed, then the
@@ -40,38 +40,38 @@ The MMP structure (``struct mmp_struct``) is as follows:
- Name
- Description
* - 0x0
- \_\_le32
- mmp\_magic
- __le32
- mmp_magic
- Magic number for MMP, 0x004D4D50 (“MMP”).
* - 0x4
- \_\_le32
- mmp\_seq
- __le32
- mmp_seq
- Sequence number, updated periodically.
* - 0x8
- \_\_le64
- mmp\_time
- __le64
- mmp_time
- Time that the MMP block was last updated.
* - 0x10
- char[64]
- mmp\_nodename
- mmp_nodename
- Hostname of the node that opened the filesystem.
* - 0x50
- char[32]
- mmp\_bdevname
- mmp_bdevname
- Block device name of the filesystem.
* - 0x70
- \_\_le16
- mmp\_check\_interval
- __le16
- mmp_check_interval
- The MMP re-check interval, in seconds.
* - 0x72
- \_\_le16
- mmp\_pad1
- __le16
- mmp_pad1
- Zero.
* - 0x74
- \_\_le32[226]
- mmp\_pad2
- __le32[226]
- mmp_pad2
- Zero.
* - 0x3FC
- \_\_le32
- mmp\_checksum
- __le32
- mmp_checksum
- Checksum of the MMP block.
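
For orientation, the MMP table above maps onto the following sketch. Offsets, types and the magic number are from the table; the function name, the NUL-trimming of the name fields and the returned dictionary are illustrative choices, not the kernel's code.

```python
import struct

MMP_MAGIC = 0x004D4D50      # mmp_magic value from the table
MMP_CHECKSUM_OFFSET = 0x3FC  # mmp_checksum


def _cstr(raw):
    """Trim a fixed-size char[] field at its first NUL byte."""
    return raw.split(b"\0", 1)[0].decode("ascii", "replace")


def parse_mmp_block(block):
    """Decode struct mmp_struct from the start of an MMP block."""
    (magic, seq, mmp_time, nodename, bdevname,
     check_interval, _pad1) = struct.unpack_from("<IIQ64s32sHH", block, 0)
    if magic != MMP_MAGIC:
        raise ValueError("bad MMP magic")
    (checksum,) = struct.unpack_from("<I", block, MMP_CHECKSUM_OFFSET)
    return {"seq": seq, "time": mmp_time,
            "nodename": _cstr(nodename), "bdevname": _cstr(bdevname),
            "check_interval": check_interval, "checksum": checksum}
```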

View File

@@ -7,7 +7,7 @@ An ext4 file system is split into a series of block groups. To reduce
performance difficulties due to fragmentation, the block allocator tries
very hard to keep each file's blocks within the same group, thereby
reducing seek times. The size of a block group is specified in
``sb.s_blocks_per_group`` blocks, though it can also calculated as 8 \*
``sb.s_blocks_per_group`` blocks, though it can also calculated as 8 *
``block_size_in_bytes``. With the default block size of 4KiB, each group
will contain 32,768 blocks, for a length of 128MiB. The number of block
groups is the size of the device divided by the size of a block group.
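
The arithmetic in this paragraph can be checked with a small worked example. The function name is hypothetical; the formula is the ``8 * block_size_in_bytes`` rule quoted above.

```python
def block_group_geometry(block_size_in_bytes):
    """Return (blocks per group, bytes per group) for a block size."""
    blocks_per_group = 8 * block_size_in_bytes
    group_bytes = blocks_per_group * block_size_in_bytes
    return blocks_per_group, group_bytes


if __name__ == "__main__":
    blocks, size = block_group_geometry(4096)
    # With 4KiB blocks: 32768 blocks per group, 128 MiB per group.
    print(blocks, size // (1024 * 1024), "MiB")
```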

View File

@@ -34,7 +34,7 @@ ext4 reserves some inode for special features, as follows:
* - 10
- Replica inode, used for some non-upstream feature?
* - 11
- Traditional first non-reserved inode. Usually this is the lost+found directory. See s\_first\_ino in the superblock.
- Traditional first non-reserved inode. Usually this is the lost+found directory. See s_first_ino in the superblock.
Note that there are also some inodes allocated from non-reserved inode numbers
for other filesystem features which are not referenced from standard directory
@@ -47,9 +47,9 @@ hierarchy. These are generally reference from the superblock. They are:
* - Superblock field
- Description
* - s\_lpf\_ino
* - s_lpf_ino
- Inode number of lost+found directory.
* - s\_prj\_quota\_inum
* - s_prj_quota_inum
- Inode number of quota file tracking project quotas
* - s\_orphan\_file\_inum
* - s_orphan_file_inum
- Inode number of file tracking orphan inodes.

File diff suppressed because it is too large