You've already forked linux-rockchip
mirror of
https://github.com/armbian/linux-rockchip.git
synced 2026-01-06 11:08:10 -08:00
Merge tag 'irqchip-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip/genirq updates from Marc Zyngier:
* Core code update:
- Non-SMP IRQ affinity fixes, allowing UP kernel to behave similarly
to SMP ones for the purpose of interrupt affinity
- Let irq_set_chip_handler_name_locked() take a const struct irq_chip *
- Tidy-up the NOMAP irqdomain API variant
- Teach action_show() to use for_each_action_of_desc()
- Make irq_chip_request_resources_parent() allow the parent callback
to be optional
- Remove dynamic allocations from populate_parent_alloc_arg()
* New drivers:
- Merge the long awaited IRQ support for the LoongArch architecture,
with the provisional ACPICA update (to be reverted once the official
support lands)
- New Renesas RZ/G2L IRQC driver, equipped with its companion GPIO
driver
* Driver updates
- Optimise the hot path operations for the SiFive PLIC, trading the
locking for per-CPU priority masking masking operations which are
apparently faster
- Work around broken PLIC implementations that deal pretty badly with
edge-triggered interrupts. Flag two implementations as affected.
- Simplify the irq-stm32-exti driver, particularly the table that
remaps the interrupts from exti to the GIC, reducing the memory usage
- Convert the ocelot irq_chip to being immutable
- Check ioremap() return value in the MIPS GIC driver
- Move MMP driver init function declarations into the common .h
- The obligatory typo fixes
Link: https://lore.kernel.org/all/20220727192356.1860546-1-maz@kernel.org
This commit is contained in:
@@ -526,6 +526,7 @@ What: /sys/devices/system/cpu/vulnerabilities
|
||||
/sys/devices/system/cpu/vulnerabilities/srbds
|
||||
/sys/devices/system/cpu/vulnerabilities/tsx_async_abort
|
||||
/sys/devices/system/cpu/vulnerabilities/itlb_multihit
|
||||
/sys/devices/system/cpu/vulnerabilities/mmio_stale_data
|
||||
Date: January 2018
|
||||
Contact: Linux kernel mailing list <linux-kernel@vger.kernel.org>
|
||||
Description: Information about CPU vulnerabilities
|
||||
|
||||
@@ -17,3 +17,4 @@ are configurable at compile, boot or run time.
|
||||
special-register-buffer-data-sampling.rst
|
||||
core-scheduling.rst
|
||||
l1d_flush.rst
|
||||
processor_mmio_stale_data.rst
|
||||
|
||||
246
Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst
Normal file
246
Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst
Normal file
@@ -0,0 +1,246 @@
|
||||
=========================================
|
||||
Processor MMIO Stale Data Vulnerabilities
|
||||
=========================================
|
||||
|
||||
Processor MMIO Stale Data Vulnerabilities are a class of memory-mapped I/O
|
||||
(MMIO) vulnerabilities that can expose data. The sequences of operations for
|
||||
exposing data range from simple to very complex. Because most of the
|
||||
vulnerabilities require the attacker to have access to MMIO, many environments
|
||||
are not affected. System environments using virtualization where MMIO access is
|
||||
provided to untrusted guests may need mitigation. These vulnerabilities are
|
||||
not transient execution attacks. However, these vulnerabilities may propagate
|
||||
stale data into core fill buffers where the data can subsequently be inferred
|
||||
by an unmitigated transient execution attack. Mitigation for these
|
||||
vulnerabilities includes a combination of microcode update and software
|
||||
changes, depending on the platform and usage model. Some of these mitigations
|
||||
are similar to those used to mitigate Microarchitectural Data Sampling (MDS) or
|
||||
those used to mitigate Special Register Buffer Data Sampling (SRBDS).
|
||||
|
||||
Data Propagators
|
||||
================
|
||||
Propagators are operations that result in stale data being copied or moved from
|
||||
one microarchitectural buffer or register to another. Processor MMIO Stale Data
|
||||
Vulnerabilities are operations that may result in stale data being directly
|
||||
read into an architectural, software-visible state or sampled from a buffer or
|
||||
register.
|
||||
|
||||
Fill Buffer Stale Data Propagator (FBSDP)
|
||||
-----------------------------------------
|
||||
Stale data may propagate from fill buffers (FB) into the non-coherent portion
|
||||
of the uncore on some non-coherent writes. Fill buffer propagation by itself
|
||||
does not make stale data architecturally visible. Stale data must be propagated
|
||||
to a location where it is subject to reading or sampling.
|
||||
|
||||
Sideband Stale Data Propagator (SSDP)
|
||||
-------------------------------------
|
||||
The sideband stale data propagator (SSDP) is limited to the client (including
|
||||
Intel Xeon server E3) uncore implementation. The sideband response buffer is
|
||||
shared by all client cores. For non-coherent reads that go to sideband
|
||||
destinations, the uncore logic returns 64 bytes of data to the core, including
|
||||
both requested data and unrequested stale data, from a transaction buffer and
|
||||
the sideband response buffer. As a result, stale data from the sideband
|
||||
response and transaction buffers may now reside in a core fill buffer.
|
||||
|
||||
Primary Stale Data Propagator (PSDP)
|
||||
------------------------------------
|
||||
The primary stale data propagator (PSDP) is limited to the client (including
|
||||
Intel Xeon server E3) uncore implementation. Similar to the sideband response
|
||||
buffer, the primary response buffer is shared by all client cores. For some
|
||||
processors, MMIO primary reads will return 64 bytes of data to the core fill
|
||||
buffer including both requested data and unrequested stale data. This is
|
||||
similar to the sideband stale data propagator.
|
||||
|
||||
Vulnerabilities
|
||||
===============
|
||||
Device Register Partial Write (DRPW) (CVE-2022-21166)
|
||||
-----------------------------------------------------
|
||||
Some endpoint MMIO registers incorrectly handle writes that are smaller than
|
||||
the register size. Instead of aborting the write or only copying the correct
|
||||
subset of bytes (for example, 2 bytes for a 2-byte write), more bytes than
|
||||
specified by the write transaction may be written to the register. On
|
||||
processors affected by FBSDP, this may expose stale data from the fill buffers
|
||||
of the core that created the write transaction.
|
||||
|
||||
Shared Buffers Data Sampling (SBDS) (CVE-2022-21125)
|
||||
----------------------------------------------------
|
||||
After propagators may have moved data around the uncore and copied stale data
|
||||
into client core fill buffers, processors affected by MFBDS can leak data from
|
||||
the fill buffer. It is limited to the client (including Intel Xeon server E3)
|
||||
uncore implementation.
|
||||
|
||||
Shared Buffers Data Read (SBDR) (CVE-2022-21123)
|
||||
------------------------------------------------
|
||||
It is similar to Shared Buffer Data Sampling (SBDS) except that the data is
|
||||
directly read into the architectural software-visible state. It is limited to
|
||||
the client (including Intel Xeon server E3) uncore implementation.
|
||||
|
||||
Affected Processors
|
||||
===================
|
||||
Not all the CPUs are affected by all the variants. For instance, most
|
||||
processors for the server market (excluding Intel Xeon E3 processors) are
|
||||
impacted by only Device Register Partial Write (DRPW).
|
||||
|
||||
Below is the list of affected Intel processors [#f1]_:
|
||||
|
||||
=================== ============ =========
|
||||
Common name Family_Model Steppings
|
||||
=================== ============ =========
|
||||
HASWELL_X 06_3FH 2,4
|
||||
SKYLAKE_L 06_4EH 3
|
||||
BROADWELL_X 06_4FH All
|
||||
SKYLAKE_X 06_55H 3,4,6,7,11
|
||||
BROADWELL_D 06_56H 3,4,5
|
||||
SKYLAKE 06_5EH 3
|
||||
ICELAKE_X 06_6AH 4,5,6
|
||||
ICELAKE_D 06_6CH 1
|
||||
ICELAKE_L 06_7EH 5
|
||||
ATOM_TREMONT_D 06_86H All
|
||||
LAKEFIELD 06_8AH 1
|
||||
KABYLAKE_L 06_8EH 9 to 12
|
||||
ATOM_TREMONT 06_96H 1
|
||||
ATOM_TREMONT_L 06_9CH 0
|
||||
KABYLAKE 06_9EH 9 to 13
|
||||
COMETLAKE 06_A5H 2,3,5
|
||||
COMETLAKE_L 06_A6H 0,1
|
||||
ROCKETLAKE 06_A7H 1
|
||||
=================== ============ =========
|
||||
|
||||
If a CPU is in the affected processor list, but not affected by a variant, it
|
||||
is indicated by new bits in MSR IA32_ARCH_CAPABILITIES. As described in a later
|
||||
section, mitigation largely remains the same for all the variants, i.e. to
|
||||
clear the CPU fill buffers via VERW instruction.
|
||||
|
||||
New bits in MSRs
|
||||
================
|
||||
Newer processors and microcode update on existing affected processors added new
|
||||
bits to IA32_ARCH_CAPABILITIES MSR. These bits can be used to enumerate
|
||||
specific variants of Processor MMIO Stale Data vulnerabilities and mitigation
|
||||
capability.
|
||||
|
||||
MSR IA32_ARCH_CAPABILITIES
|
||||
--------------------------
|
||||
Bit 13 - SBDR_SSDP_NO - When set, processor is not affected by either the
|
||||
Shared Buffers Data Read (SBDR) vulnerability or the sideband stale
|
||||
data propagator (SSDP).
|
||||
Bit 14 - FBSDP_NO - When set, processor is not affected by the Fill Buffer
|
||||
Stale Data Propagator (FBSDP).
|
||||
Bit 15 - PSDP_NO - When set, processor is not affected by Primary Stale Data
|
||||
Propagator (PSDP).
|
||||
Bit 17 - FB_CLEAR - When set, VERW instruction will overwrite CPU fill buffer
|
||||
values as part of MD_CLEAR operations. Processors that do not
|
||||
enumerate MDS_NO (meaning they are affected by MDS) but that do
|
||||
enumerate support for both L1D_FLUSH and MD_CLEAR implicitly enumerate
|
||||
FB_CLEAR as part of their MD_CLEAR support.
|
||||
Bit 18 - FB_CLEAR_CTRL - Processor supports read and write to MSR
|
||||
IA32_MCU_OPT_CTRL[FB_CLEAR_DIS]. On such processors, the FB_CLEAR_DIS
|
||||
bit can be set to cause the VERW instruction to not perform the
|
||||
FB_CLEAR action. Not all processors that support FB_CLEAR will support
|
||||
FB_CLEAR_CTRL.
|
||||
|
||||
MSR IA32_MCU_OPT_CTRL
|
||||
---------------------
|
||||
Bit 3 - FB_CLEAR_DIS - When set, VERW instruction does not perform the FB_CLEAR
|
||||
action. This may be useful to reduce the performance impact of FB_CLEAR in
|
||||
cases where system software deems it warranted (for example, when performance
|
||||
is more critical, or the untrusted software has no MMIO access). Note that
|
||||
FB_CLEAR_DIS has no impact on enumeration (for example, it does not change
|
||||
FB_CLEAR or MD_CLEAR enumeration) and it may not be supported on all processors
|
||||
that enumerate FB_CLEAR.
|
||||
|
||||
Mitigation
|
||||
==========
|
||||
Like MDS, all variants of Processor MMIO Stale Data vulnerabilities have the
|
||||
same mitigation strategy to force the CPU to clear the affected buffers before
|
||||
an attacker can extract the secrets.
|
||||
|
||||
This is achieved by using the otherwise unused and obsolete VERW instruction in
|
||||
combination with a microcode update. The microcode clears the affected CPU
|
||||
buffers when the VERW instruction is executed.
|
||||
|
||||
Kernel reuses the MDS function to invoke the buffer clearing:
|
||||
|
||||
mds_clear_cpu_buffers()
|
||||
|
||||
On MDS affected CPUs, the kernel already invokes CPU buffer clear on
|
||||
kernel/userspace, hypervisor/guest and C-state (idle) transitions. No
|
||||
additional mitigation is needed on such CPUs.
|
||||
|
||||
For CPUs not affected by MDS or TAA, mitigation is needed only for the attacker
|
||||
with MMIO capability. Therefore, VERW is not required for kernel/userspace. For
|
||||
virtualization case, VERW is only needed at VMENTER for a guest with MMIO
|
||||
capability.
|
||||
|
||||
Mitigation points
|
||||
-----------------
|
||||
Return to user space
|
||||
^^^^^^^^^^^^^^^^^^^^
|
||||
Same mitigation as MDS when affected by MDS/TAA, otherwise no mitigation
|
||||
needed.
|
||||
|
||||
C-State transition
|
||||
^^^^^^^^^^^^^^^^^^
|
||||
Control register writes by CPU during C-state transition can propagate data
|
||||
from fill buffer to uncore buffers. Execute VERW before C-state transition to
|
||||
clear CPU fill buffers.
|
||||
|
||||
Guest entry point
|
||||
^^^^^^^^^^^^^^^^^
|
||||
Same mitigation as MDS when processor is also affected by MDS/TAA, otherwise
|
||||
execute VERW at VMENTER only for MMIO capable guests. On CPUs not affected by
|
||||
MDS/TAA, guest without MMIO access cannot extract secrets using Processor MMIO
|
||||
Stale Data vulnerabilities, so there is no need to execute VERW for such guests.
|
||||
|
||||
Mitigation control on the kernel command line
|
||||
---------------------------------------------
|
||||
The kernel command line allows to control the Processor MMIO Stale Data
|
||||
mitigations at boot time with the option "mmio_stale_data=". The valid
|
||||
arguments for this option are:
|
||||
|
||||
========== =================================================================
|
||||
full If the CPU is vulnerable, enable mitigation; CPU buffer clearing
|
||||
on exit to userspace and when entering a VM. Idle transitions are
|
||||
protected as well. It does not automatically disable SMT.
|
||||
full,nosmt Same as full, with SMT disabled on vulnerable CPUs. This is the
|
||||
complete mitigation.
|
||||
off Disables mitigation completely.
|
||||
========== =================================================================
|
||||
|
||||
If the CPU is affected and mmio_stale_data=off is not supplied on the kernel
|
||||
command line, then the kernel selects the appropriate mitigation.
|
||||
|
||||
Mitigation status information
|
||||
-----------------------------
|
||||
The Linux kernel provides a sysfs interface to enumerate the current
|
||||
vulnerability status of the system: whether the system is vulnerable, and
|
||||
which mitigations are active. The relevant sysfs file is:
|
||||
|
||||
/sys/devices/system/cpu/vulnerabilities/mmio_stale_data
|
||||
|
||||
The possible values in this file are:
|
||||
|
||||
.. list-table::
|
||||
|
||||
* - 'Not affected'
|
||||
- The processor is not vulnerable
|
||||
* - 'Vulnerable'
|
||||
- The processor is vulnerable, but no mitigation enabled
|
||||
* - 'Vulnerable: Clear CPU buffers attempted, no microcode'
|
||||
- The processor is vulnerable, but microcode is not updated. The
|
||||
mitigation is enabled on a best effort basis.
|
||||
* - 'Mitigation: Clear CPU buffers'
|
||||
- The processor is vulnerable and the CPU buffer clearing mitigation is
|
||||
enabled.
|
||||
|
||||
If the processor is vulnerable then the following information is appended to
|
||||
the above information:
|
||||
|
||||
======================== ===========================================
|
||||
'SMT vulnerable' SMT is enabled
|
||||
'SMT disabled' SMT is disabled
|
||||
'SMT Host state unknown' Kernel runs in a VM, Host SMT state unknown
|
||||
======================== ===========================================
|
||||
|
||||
References
|
||||
----------
|
||||
.. [#f1] Affected Processors
|
||||
https://www.intel.com/content/www/us/en/developer/topic-technology/software-security-guidance/processors-affected-consolidated-product-cpu-model.html
|
||||
@@ -2469,7 +2469,6 @@
|
||||
|
||||
protected: nVHE-based mode with support for guests whose
|
||||
state is kept private from the host.
|
||||
Not valid if the kernel is running in EL2.
|
||||
|
||||
Defaults to VHE/nVHE based on hardware support. Setting
|
||||
mode to "protected" will disable kexec and hibernation
|
||||
@@ -3176,6 +3175,7 @@
|
||||
srbds=off [X86,INTEL]
|
||||
no_entry_flush [PPC]
|
||||
no_uaccess_flush [PPC]
|
||||
mmio_stale_data=off [X86]
|
||||
|
||||
Exceptions:
|
||||
This does not have any effect on
|
||||
@@ -3197,6 +3197,7 @@
|
||||
Equivalent to: l1tf=flush,nosmt [X86]
|
||||
mds=full,nosmt [X86]
|
||||
tsx_async_abort=full,nosmt [X86]
|
||||
mmio_stale_data=full,nosmt [X86]
|
||||
|
||||
mminit_loglevel=
|
||||
[KNL] When CONFIG_DEBUG_MEMORY_INIT is set, this
|
||||
@@ -3206,6 +3207,40 @@
|
||||
log everything. Information is printed at KERN_DEBUG
|
||||
so loglevel=8 may also need to be specified.
|
||||
|
||||
mmio_stale_data=
|
||||
[X86,INTEL] Control mitigation for the Processor
|
||||
MMIO Stale Data vulnerabilities.
|
||||
|
||||
Processor MMIO Stale Data is a class of
|
||||
vulnerabilities that may expose data after an MMIO
|
||||
operation. Exposed data could originate or end in
|
||||
the same CPU buffers as affected by MDS and TAA.
|
||||
Therefore, similar to MDS and TAA, the mitigation
|
||||
is to clear the affected CPU buffers.
|
||||
|
||||
This parameter controls the mitigation. The
|
||||
options are:
|
||||
|
||||
full - Enable mitigation on vulnerable CPUs
|
||||
|
||||
full,nosmt - Enable mitigation and disable SMT on
|
||||
vulnerable CPUs.
|
||||
|
||||
off - Unconditionally disable mitigation
|
||||
|
||||
On MDS or TAA affected machines,
|
||||
mmio_stale_data=off can be prevented by an active
|
||||
MDS or TAA mitigation as these vulnerabilities are
|
||||
mitigated with the same mechanism so in order to
|
||||
disable this mitigation, you need to specify
|
||||
mds=off and tsx_async_abort=off too.
|
||||
|
||||
Not specifying this option is equivalent to
|
||||
mmio_stale_data=full.
|
||||
|
||||
For details see:
|
||||
Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst
|
||||
|
||||
module.sig_enforce
|
||||
[KNL] When CONFIG_MODULE_SIG is set, this means that
|
||||
modules without (valid) signatures will fail to load.
|
||||
|
||||
@@ -40,9 +40,8 @@ properties:
|
||||
value to be used for converting remote channel measurements to
|
||||
temperature.
|
||||
$ref: /schemas/types.yaml#/definitions/int32
|
||||
items:
|
||||
minimum: -128
|
||||
maximum: 127
|
||||
minimum: -128
|
||||
maximum: 127
|
||||
|
||||
ti,beta-compensation:
|
||||
description:
|
||||
|
||||
@@ -0,0 +1,134 @@
|
||||
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
|
||||
%YAML 1.2
|
||||
---
|
||||
$id: http://devicetree.org/schemas/interrupt-controller/renesas,rzg2l-irqc.yaml#
|
||||
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||
|
||||
title: Renesas RZ/G2L (and alike SoC's) Interrupt Controller (IA55)
|
||||
|
||||
maintainers:
|
||||
- Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
|
||||
- Geert Uytterhoeven <geert+renesas@glider.be>
|
||||
|
||||
description: |
|
||||
IA55 performs various interrupt controls including synchronization for the external
|
||||
interrupts of NMI, IRQ, and GPIOINT and the interrupts of the built-in peripheral
|
||||
interrupts output by each IP. And it notifies the interrupt to the GIC
|
||||
- IRQ sense select for 8 external interrupts, mapped to 8 GIC SPI interrupts
|
||||
- GPIO pins used as external interrupt input pins, mapped to 32 GIC SPI interrupts
|
||||
- NMI edge select (NMI is not treated as NMI exception and supports fall edge and
|
||||
stand-up edge detection interrupts)
|
||||
|
||||
allOf:
|
||||
- $ref: /schemas/interrupt-controller.yaml#
|
||||
|
||||
properties:
|
||||
compatible:
|
||||
items:
|
||||
- enum:
|
||||
- renesas,r9a07g044-irqc # RZ/G2{L,LC}
|
||||
- renesas,r9a07g054-irqc # RZ/V2L
|
||||
- const: renesas,rzg2l-irqc
|
||||
|
||||
'#interrupt-cells':
|
||||
description: The first cell should contain external interrupt number (IRQ0-7) and the
|
||||
second cell is used to specify the flag.
|
||||
const: 2
|
||||
|
||||
'#address-cells':
|
||||
const: 0
|
||||
|
||||
interrupt-controller: true
|
||||
|
||||
reg:
|
||||
maxItems: 1
|
||||
|
||||
interrupts:
|
||||
maxItems: 41
|
||||
|
||||
clocks:
|
||||
maxItems: 2
|
||||
|
||||
clock-names:
|
||||
items:
|
||||
- const: clk
|
||||
- const: pclk
|
||||
|
||||
power-domains:
|
||||
maxItems: 1
|
||||
|
||||
resets:
|
||||
maxItems: 1
|
||||
|
||||
required:
|
||||
- compatible
|
||||
- '#interrupt-cells'
|
||||
- '#address-cells'
|
||||
- interrupt-controller
|
||||
- reg
|
||||
- interrupts
|
||||
- clocks
|
||||
- clock-names
|
||||
- power-domains
|
||||
- resets
|
||||
|
||||
unevaluatedProperties: false
|
||||
|
||||
examples:
|
||||
- |
|
||||
#include <dt-bindings/interrupt-controller/arm-gic.h>
|
||||
#include <dt-bindings/clock/r9a07g044-cpg.h>
|
||||
|
||||
irqc: interrupt-controller@110a0000 {
|
||||
compatible = "renesas,r9a07g044-irqc", "renesas,rzg2l-irqc";
|
||||
reg = <0x110a0000 0x10000>;
|
||||
#interrupt-cells = <2>;
|
||||
#address-cells = <0>;
|
||||
interrupt-controller;
|
||||
interrupts = <GIC_SPI 0 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 1 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 2 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 3 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 4 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 5 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 6 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 7 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 8 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 444 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 445 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 446 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 447 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 448 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 449 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 450 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 451 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 452 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 453 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 454 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 455 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 456 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 457 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 458 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 459 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 460 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 461 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 462 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 463 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 464 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 465 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 466 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 467 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 468 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 469 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 470 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 471 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 472 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 473 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 474 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 475 IRQ_TYPE_LEVEL_HIGH>;
|
||||
clocks = <&cpg CPG_MOD R9A07G044_IA55_CLK>,
|
||||
<&cpg CPG_MOD R9A07G044_IA55_PCLK>;
|
||||
clock-names = "clk", "pclk";
|
||||
power-domains = <&cpg>;
|
||||
resets = <&cpg R9A07G044_IA55_RESETN>;
|
||||
};
|
||||
@@ -26,9 +26,14 @@ description:
|
||||
with priority below this threshold will not cause the PLIC to raise its
|
||||
interrupt line leading to the context.
|
||||
|
||||
While the PLIC supports both edge-triggered and level-triggered interrupts,
|
||||
interrupt handlers are oblivious to this distinction and therefore it is not
|
||||
specified in the PLIC device-tree binding.
|
||||
The PLIC supports both edge-triggered and level-triggered interrupts. For
|
||||
edge-triggered interrupts, the RISC-V PLIC spec allows two responses to edges
|
||||
seen while an interrupt handler is active; the PLIC may either queue them or
|
||||
ignore them. In the first case, handlers are oblivious to the trigger type, so
|
||||
it is not included in the interrupt specifier. In the second case, software
|
||||
needs to know the trigger type, so it can reorder the interrupt flow to avoid
|
||||
missing interrupts. This special handling is needed by at least the Renesas
|
||||
RZ/Five SoC (AX45MP AndesCore with a NCEPLIC100) and the T-HEAD C900 PLIC.
|
||||
|
||||
While the RISC-V ISA doesn't specify a memory layout for the PLIC, the
|
||||
"sifive,plic-1.0.0" device is a concrete implementation of the PLIC that
|
||||
@@ -47,6 +52,10 @@ maintainers:
|
||||
properties:
|
||||
compatible:
|
||||
oneOf:
|
||||
- items:
|
||||
- enum:
|
||||
- renesas,r9a07g043-plic
|
||||
- const: andestech,nceplic100
|
||||
- items:
|
||||
- enum:
|
||||
- sifive,fu540-c000-plic
|
||||
@@ -64,8 +73,7 @@ properties:
|
||||
'#address-cells':
|
||||
const: 0
|
||||
|
||||
'#interrupt-cells':
|
||||
const: 1
|
||||
'#interrupt-cells': true
|
||||
|
||||
interrupt-controller: true
|
||||
|
||||
@@ -82,6 +90,12 @@ properties:
|
||||
description:
|
||||
Specifies how many external interrupts are supported by this controller.
|
||||
|
||||
clocks: true
|
||||
|
||||
power-domains: true
|
||||
|
||||
resets: true
|
||||
|
||||
required:
|
||||
- compatible
|
||||
- '#address-cells'
|
||||
@@ -91,6 +105,47 @@ required:
|
||||
- interrupts-extended
|
||||
- riscv,ndev
|
||||
|
||||
allOf:
|
||||
- if:
|
||||
properties:
|
||||
compatible:
|
||||
contains:
|
||||
enum:
|
||||
- andestech,nceplic100
|
||||
- thead,c900-plic
|
||||
|
||||
then:
|
||||
properties:
|
||||
'#interrupt-cells':
|
||||
const: 2
|
||||
|
||||
else:
|
||||
properties:
|
||||
'#interrupt-cells':
|
||||
const: 1
|
||||
|
||||
- if:
|
||||
properties:
|
||||
compatible:
|
||||
contains:
|
||||
const: renesas,r9a07g043-plic
|
||||
|
||||
then:
|
||||
properties:
|
||||
clocks:
|
||||
maxItems: 1
|
||||
|
||||
power-domains:
|
||||
maxItems: 1
|
||||
|
||||
resets:
|
||||
maxItems: 1
|
||||
|
||||
required:
|
||||
- clocks
|
||||
- power-domains
|
||||
- resets
|
||||
|
||||
additionalProperties: false
|
||||
|
||||
examples:
|
||||
|
||||
@@ -30,6 +30,7 @@ properties:
|
||||
- socionext,uniphier-ld11-aidet
|
||||
- socionext,uniphier-ld20-aidet
|
||||
- socionext,uniphier-pxs3-aidet
|
||||
- socionext,uniphier-nx1-aidet
|
||||
|
||||
reg:
|
||||
maxItems: 1
|
||||
|
||||
@@ -47,6 +47,17 @@ properties:
|
||||
gpio-ranges:
|
||||
maxItems: 1
|
||||
|
||||
interrupt-controller: true
|
||||
|
||||
'#interrupt-cells':
|
||||
const: 2
|
||||
description:
|
||||
The first cell contains the global GPIO port index, constructed using the
|
||||
RZG2L_GPIO() helper macro in <dt-bindings/pinctrl/rzg2l-pinctrl.h> and the
|
||||
second cell is used to specify the flag.
|
||||
E.g. "interrupts = <RZG2L_GPIO(43, 0) IRQ_TYPE_EDGE_FALLING>;" if P43_0 is
|
||||
being used as an interrupt.
|
||||
|
||||
clocks:
|
||||
maxItems: 1
|
||||
|
||||
@@ -110,6 +121,8 @@ required:
|
||||
- gpio-controller
|
||||
- '#gpio-cells'
|
||||
- gpio-ranges
|
||||
- interrupt-controller
|
||||
- '#interrupt-cells'
|
||||
- clocks
|
||||
- power-domains
|
||||
- resets
|
||||
@@ -126,6 +139,8 @@ examples:
|
||||
gpio-controller;
|
||||
#gpio-cells = <2>;
|
||||
gpio-ranges = <&pinctrl 0 0 392>;
|
||||
interrupt-controller;
|
||||
#interrupt-cells = <2>;
|
||||
clocks = <&cpg CPG_MOD R9A07G044_GPIO_HCLK>;
|
||||
resets = <&cpg R9A07G044_GPIO_RSTN>,
|
||||
<&cpg R9A07G044_GPIO_PORT_RESETN>,
|
||||
|
||||
@@ -13,8 +13,8 @@ disappeared as of Linux 3.0.
|
||||
|
||||
There are two places where extended attributes can be found. The first
|
||||
place is between the end of each inode entry and the beginning of the
|
||||
next inode entry. For example, if inode.i\_extra\_isize = 28 and
|
||||
sb.inode\_size = 256, then there are 256 - (128 + 28) = 100 bytes
|
||||
next inode entry. For example, if inode.i_extra_isize = 28 and
|
||||
sb.inode_size = 256, then there are 256 - (128 + 28) = 100 bytes
|
||||
available for in-inode extended attribute storage. The second place
|
||||
where extended attributes can be found is in the block pointed to by
|
||||
``inode.i_file_acl``. As of Linux 3.11, it is not possible for this
|
||||
@@ -38,8 +38,8 @@ Extended attributes, when stored after the inode, have a header
|
||||
- Name
|
||||
- Description
|
||||
* - 0x0
|
||||
- \_\_le32
|
||||
- h\_magic
|
||||
- __le32
|
||||
- h_magic
|
||||
- Magic number for identification, 0xEA020000. This value is set by the
|
||||
Linux driver, though e2fsprogs doesn't seem to check it(?)
|
||||
|
||||
@@ -55,28 +55,28 @@ The beginning of an extended attribute block is in
|
||||
- Name
|
||||
- Description
|
||||
* - 0x0
|
||||
- \_\_le32
|
||||
- h\_magic
|
||||
- __le32
|
||||
- h_magic
|
||||
- Magic number for identification, 0xEA020000.
|
||||
* - 0x4
|
||||
- \_\_le32
|
||||
- h\_refcount
|
||||
- __le32
|
||||
- h_refcount
|
||||
- Reference count.
|
||||
* - 0x8
|
||||
- \_\_le32
|
||||
- h\_blocks
|
||||
- __le32
|
||||
- h_blocks
|
||||
- Number of disk blocks used.
|
||||
* - 0xC
|
||||
- \_\_le32
|
||||
- h\_hash
|
||||
- __le32
|
||||
- h_hash
|
||||
- Hash value of all attributes.
|
||||
* - 0x10
|
||||
- \_\_le32
|
||||
- h\_checksum
|
||||
- __le32
|
||||
- h_checksum
|
||||
- Checksum of the extended attribute block.
|
||||
* - 0x14
|
||||
- \_\_u32
|
||||
- h\_reserved[3]
|
||||
- __u32
|
||||
- h_reserved[3]
|
||||
- Zero.
|
||||
|
||||
The checksum is calculated against the FS UUID, the 64-bit block number
|
||||
@@ -100,46 +100,46 @@ Attributes stored inside an inode do not need be stored in sorted order.
|
||||
- Name
|
||||
- Description
|
||||
* - 0x0
|
||||
- \_\_u8
|
||||
- e\_name\_len
|
||||
- __u8
|
||||
- e_name_len
|
||||
- Length of name.
|
||||
* - 0x1
|
||||
- \_\_u8
|
||||
- e\_name\_index
|
||||
- __u8
|
||||
- e_name_index
|
||||
- Attribute name index. There is a discussion of this below.
|
||||
* - 0x2
|
||||
- \_\_le16
|
||||
- e\_value\_offs
|
||||
- __le16
|
||||
- e_value_offs
|
||||
- Location of this attribute's value on the disk block where it is stored.
|
||||
Multiple attributes can share the same value. For an inode attribute
|
||||
this value is relative to the start of the first entry; for a block this
|
||||
value is relative to the start of the block (i.e. the header).
|
||||
* - 0x4
|
||||
- \_\_le32
|
||||
- e\_value\_inum
|
||||
- __le32
|
||||
- e_value_inum
|
||||
- The inode where the value is stored. Zero indicates the value is in the
|
||||
same block as this entry. This field is only used if the
|
||||
INCOMPAT\_EA\_INODE feature is enabled.
|
||||
INCOMPAT_EA_INODE feature is enabled.
|
||||
* - 0x8
|
||||
- \_\_le32
|
||||
- e\_value\_size
|
||||
- __le32
|
||||
- e_value_size
|
||||
- Length of attribute value.
|
||||
* - 0xC
|
||||
- \_\_le32
|
||||
- e\_hash
|
||||
- __le32
|
||||
- e_hash
|
||||
- Hash value of attribute name and attribute value. The kernel doesn't
|
||||
update the hash for in-inode attributes, so for that case this value
|
||||
must be zero, because e2fsck validates any non-zero hash regardless of
|
||||
where the xattr lives.
|
||||
* - 0x10
|
||||
- char
|
||||
- e\_name[e\_name\_len]
|
||||
- e_name[e_name_len]
|
||||
- Attribute name. Does not include trailing NULL.
|
||||
|
||||
Attribute values can follow the end of the entry table. There appears to
|
||||
be a requirement that they be aligned to 4-byte boundaries. The values
|
||||
are stored starting at the end of the block and grow towards the
|
||||
xattr\_header/xattr\_entry table. When the two collide, the overflow is
|
||||
xattr_header/xattr_entry table. When the two collide, the overflow is
|
||||
put into a separate disk block. If the disk block fills up, the
|
||||
filesystem returns -ENOSPC.
|
||||
|
||||
@@ -167,15 +167,15 @@ the key name. Here is a map of name index values to key prefixes:
|
||||
* - 1
|
||||
- “user.”
|
||||
* - 2
|
||||
- “system.posix\_acl\_access”
|
||||
- “system.posix_acl_access”
|
||||
* - 3
|
||||
- “system.posix\_acl\_default”
|
||||
- “system.posix_acl_default”
|
||||
* - 4
|
||||
- “trusted.”
|
||||
* - 6
|
||||
- “security.”
|
||||
* - 7
|
||||
- “system.” (inline\_data only?)
|
||||
- “system.” (inline_data only?)
|
||||
* - 8
|
||||
- “system.richacl” (SuSE kernels only?)
|
||||
|
||||
|
||||
@@ -23,7 +23,7 @@ means that a block group addresses 32 gigabytes instead of 128 megabytes,
|
||||
also shrinking the amount of file system overhead for metadata.
|
||||
|
||||
The administrator can set a block cluster size at mkfs time (which is
|
||||
stored in the s\_log\_cluster\_size field in the superblock); from then
|
||||
stored in the s_log_cluster_size field in the superblock); from then
|
||||
on, the block bitmaps track clusters, not individual blocks. This means
|
||||
that block groups can be several gigabytes in size (instead of just
|
||||
128MiB); however, the minimum allocation unit becomes a cluster, not a
|
||||
|
||||
@@ -9,15 +9,15 @@ group.
|
||||
The inode bitmap records which entries in the inode table are in use.
|
||||
|
||||
As with most bitmaps, one bit represents the usage status of one data
|
||||
block or inode table entry. This implies a block group size of 8 \*
|
||||
number\_of\_bytes\_in\_a\_logical\_block.
|
||||
block or inode table entry. This implies a block group size of 8 *
|
||||
number_of_bytes_in_a_logical_block.
|
||||
|
||||
NOTE: If ``BLOCK_UNINIT`` is set for a given block group, various parts
|
||||
of the kernel and e2fsprogs code pretends that the block bitmap contains
|
||||
zeros (i.e. all blocks in the group are free). However, it is not
|
||||
necessarily the case that no blocks are in use -- if ``meta_bg`` is set,
|
||||
the bitmaps and group descriptor live inside the group. Unfortunately,
|
||||
ext2fs\_test\_block\_bitmap2() will return '0' for those locations,
|
||||
ext2fs_test_block_bitmap2() will return '0' for those locations,
|
||||
which produces confusing debugfs output.
|
||||
|
||||
Inode Table
|
||||
|
||||
@@ -56,39 +56,39 @@ established that the super block and the group descriptor table, if
|
||||
present, will be at the beginning of the block group. The bitmaps and
|
||||
the inode table can be anywhere, and it is quite possible for the
|
||||
bitmaps to come after the inode table, or for both to be in different
|
||||
groups (flex\_bg). Leftover space is used for file data blocks, indirect
|
||||
groups (flex_bg). Leftover space is used for file data blocks, indirect
|
||||
block maps, extent tree blocks, and extended attributes.
|
||||
|
||||
Flexible Block Groups
|
||||
---------------------
|
||||
|
||||
Starting in ext4, there is a new feature called flexible block groups
|
||||
(flex\_bg). In a flex\_bg, several block groups are tied together as one
|
||||
(flex_bg). In a flex_bg, several block groups are tied together as one
|
||||
logical block group; the bitmap spaces and the inode table space in the
|
||||
first block group of the flex\_bg are expanded to include the bitmaps
|
||||
and inode tables of all other block groups in the flex\_bg. For example,
|
||||
if the flex\_bg size is 4, then group 0 will contain (in order) the
|
||||
first block group of the flex_bg are expanded to include the bitmaps
|
||||
and inode tables of all other block groups in the flex_bg. For example,
|
||||
if the flex_bg size is 4, then group 0 will contain (in order) the
|
||||
superblock, group descriptors, data block bitmaps for groups 0-3, inode
|
||||
bitmaps for groups 0-3, inode tables for groups 0-3, and the remaining
|
||||
space in group 0 is for file data. The effect of this is to group the
|
||||
block group metadata close together for faster loading, and to enable
|
||||
large files to be continuous on disk. Backup copies of the superblock
|
||||
and group descriptors are always at the beginning of block groups, even
|
||||
if flex\_bg is enabled. The number of block groups that make up a
|
||||
flex\_bg is given by 2 ^ ``sb.s_log_groups_per_flex``.
|
||||
if flex_bg is enabled. The number of block groups that make up a
|
||||
flex_bg is given by 2 ^ ``sb.s_log_groups_per_flex``.
|
||||
|
||||
Meta Block Groups
|
||||
-----------------
|
||||
|
||||
Without the option META\_BG, for safety concerns, all block group
|
||||
Without the option META_BG, for safety concerns, all block group
|
||||
descriptors copies are kept in the first block group. Given the default
|
||||
128MiB(2^27 bytes) block group size and 64-byte group descriptors, ext4
|
||||
can have at most 2^27/64 = 2^21 block groups. This limits the entire
|
||||
filesystem size to 2^21 * 2^27 = 2^48bytes or 256TiB.
|
||||
|
||||
The solution to this problem is to use the metablock group feature
|
||||
(META\_BG), which is already in ext3 for all 2.6 releases. With the
|
||||
META\_BG feature, ext4 filesystems are partitioned into many metablock
|
||||
(META_BG), which is already in ext3 for all 2.6 releases. With the
|
||||
META_BG feature, ext4 filesystems are partitioned into many metablock
|
||||
groups. Each metablock group is a cluster of block groups whose group
|
||||
descriptor structures can be stored in a single disk block. For ext4
|
||||
filesystems with 4 KB block size, a single metablock group partition
|
||||
@@ -110,7 +110,7 @@ bytes, a meta-block group contains 32 block groups for filesystems with
|
||||
a 1KB block size, and 128 block groups for filesystems with a 4KB
|
||||
blocksize. Filesystems can either be created using this new block group
|
||||
descriptor layout, or existing filesystems can be resized on-line, and
|
||||
the field s\_first\_meta\_bg in the superblock will indicate the first
|
||||
the field s_first_meta_bg in the superblock will indicate the first
|
||||
block group using this new layout.
|
||||
|
||||
Please see an important note about ``BLOCK_UNINIT`` in the section about
|
||||
@@ -121,15 +121,15 @@ Lazy Block Group Initialization
|
||||
|
||||
A new feature for ext4 are three block group descriptor flags that
|
||||
enable mkfs to skip initializing other parts of the block group
|
||||
metadata. Specifically, the INODE\_UNINIT and BLOCK\_UNINIT flags mean
|
||||
metadata. Specifically, the INODE_UNINIT and BLOCK_UNINIT flags mean
|
||||
that the inode and block bitmaps for that group can be calculated and
|
||||
therefore the on-disk bitmap blocks are not initialized. This is
|
||||
generally the case for an empty block group or a block group containing
|
||||
only fixed-location block group metadata. The INODE\_ZEROED flag means
|
||||
only fixed-location block group metadata. The INODE_ZEROED flag means
|
||||
that the inode table has been initialized; mkfs will unset this flag and
|
||||
rely on the kernel to initialize the inode tables in the background.
|
||||
|
||||
By not writing zeroes to the bitmaps and inode table, mkfs time is
|
||||
reduced considerably. Note the feature flag is RO\_COMPAT\_GDT\_CSUM,
|
||||
but the dumpe2fs output prints this as “uninit\_bg”. They are the same
|
||||
reduced considerably. Note the feature flag is RO_COMPAT_GDT_CSUM,
|
||||
but the dumpe2fs output prints this as “uninit_bg”. They are the same
|
||||
thing.
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|
||||
| i.i\_block Offset | Where It Points |
|
||||
| i.i_block Offset | Where It Points |
|
||||
+=====================+==============================================================================================================================================================================================================================+
|
||||
| 0 to 11 | Direct map to file blocks 0 to 11. |
|
||||
+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|
||||
|
||||
@@ -4,7 +4,7 @@ Checksums
|
||||
---------
|
||||
|
||||
Starting in early 2012, metadata checksums were added to all major ext4
|
||||
and jbd2 data structures. The associated feature flag is metadata\_csum.
|
||||
and jbd2 data structures. The associated feature flag is metadata_csum.
|
||||
The desired checksum algorithm is indicated in the superblock, though as
|
||||
of October 2012 the only supported algorithm is crc32c. Some data
|
||||
structures did not have space to fit a full 32-bit checksum, so only the
|
||||
@@ -20,7 +20,7 @@ encounters directory blocks that lack sufficient empty space to add a
|
||||
checksum, it will request that you run ``e2fsck -D`` to have the
|
||||
directories rebuilt with checksums. This has the added benefit of
|
||||
removing slack space from the directory files and rebalancing the htree
|
||||
indexes. If you \_ignore\_ this step, your directories will not be
|
||||
indexes. If you _ignore_ this step, your directories will not be
|
||||
protected by a checksum!
|
||||
|
||||
The following table describes the data elements that go into each type
|
||||
@@ -35,39 +35,39 @@ of checksum. The checksum function is whatever the superblock describes
|
||||
- Length
|
||||
- Ingredients
|
||||
* - Superblock
|
||||
- \_\_le32
|
||||
- __le32
|
||||
- The entire superblock up to the checksum field. The UUID lives inside
|
||||
the superblock.
|
||||
* - MMP
|
||||
- \_\_le32
|
||||
- __le32
|
||||
- UUID + the entire MMP block up to the checksum field.
|
||||
* - Extended Attributes
|
||||
- \_\_le32
|
||||
- __le32
|
||||
- UUID + the entire extended attribute block. The checksum field is set to
|
||||
zero.
|
||||
* - Directory Entries
|
||||
- \_\_le32
|
||||
- __le32
|
||||
- UUID + inode number + inode generation + the directory block up to the
|
||||
fake entry enclosing the checksum field.
|
||||
* - HTREE Nodes
|
||||
- \_\_le32
|
||||
- __le32
|
||||
- UUID + inode number + inode generation + all valid extents + HTREE tail.
|
||||
The checksum field is set to zero.
|
||||
* - Extents
|
||||
- \_\_le32
|
||||
- __le32
|
||||
- UUID + inode number + inode generation + the entire extent block up to
|
||||
the checksum field.
|
||||
* - Bitmaps
|
||||
- \_\_le32 or \_\_le16
|
||||
- __le32 or __le16
|
||||
- UUID + the entire bitmap. Checksums are stored in the group descriptor,
|
||||
and truncated if the group descriptor size is 32 bytes (i.e. ^64bit)
|
||||
* - Inodes
|
||||
- \_\_le32
|
||||
- __le32
|
||||
- UUID + inode number + inode generation + the entire inode. The checksum
|
||||
field is set to zero. Each inode has its own checksum.
|
||||
* - Group Descriptors
|
||||
- \_\_le16
|
||||
- If metadata\_csum, then UUID + group number + the entire descriptor;
|
||||
else if gdt\_csum, then crc16(UUID + group number + the entire
|
||||
- __le16
|
||||
- If metadata_csum, then UUID + group number + the entire descriptor;
|
||||
else if gdt_csum, then crc16(UUID + group number + the entire
|
||||
descriptor). In all cases, only the lower 16 bits are stored.
|
||||
|
||||
|
||||
@@ -42,24 +42,24 @@ is at most 263 bytes long, though on disk you'll need to reference
|
||||
- Name
|
||||
- Description
|
||||
* - 0x0
|
||||
- \_\_le32
|
||||
- __le32
|
||||
- inode
|
||||
- Number of the inode that this directory entry points to.
|
||||
* - 0x4
|
||||
- \_\_le16
|
||||
- rec\_len
|
||||
- __le16
|
||||
- rec_len
|
||||
- Length of this directory entry. Must be a multiple of 4.
|
||||
* - 0x6
|
||||
- \_\_le16
|
||||
- name\_len
|
||||
- __le16
|
||||
- name_len
|
||||
- Length of the file name.
|
||||
* - 0x8
|
||||
- char
|
||||
- name[EXT4\_NAME\_LEN]
|
||||
- name[EXT4_NAME_LEN]
|
||||
- File name.
|
||||
|
||||
Since file names cannot be longer than 255 bytes, the new directory
|
||||
entry format shortens the name\_len field and uses the space for a file
|
||||
entry format shortens the name_len field and uses the space for a file
|
||||
type flag, probably to avoid having to load every inode during directory
|
||||
tree traversal. This format is ``ext4_dir_entry_2``, which is at most
|
||||
263 bytes long, though on disk you'll need to reference
|
||||
@@ -74,24 +74,24 @@ tree traversal. This format is ``ext4_dir_entry_2``, which is at most
|
||||
- Name
|
||||
- Description
|
||||
* - 0x0
|
||||
- \_\_le32
|
||||
- __le32
|
||||
- inode
|
||||
- Number of the inode that this directory entry points to.
|
||||
* - 0x4
|
||||
- \_\_le16
|
||||
- rec\_len
|
||||
- __le16
|
||||
- rec_len
|
||||
- Length of this directory entry.
|
||||
* - 0x6
|
||||
- \_\_u8
|
||||
- name\_len
|
||||
- __u8
|
||||
- name_len
|
||||
- Length of the file name.
|
||||
* - 0x7
|
||||
- \_\_u8
|
||||
- file\_type
|
||||
- __u8
|
||||
- file_type
|
||||
- File type code, see ftype_ table below.
|
||||
* - 0x8
|
||||
- char
|
||||
- name[EXT4\_NAME\_LEN]
|
||||
- name[EXT4_NAME_LEN]
|
||||
- File name.
|
||||
|
||||
.. _ftype:
|
||||
@@ -137,19 +137,19 @@ entry uses this extension, it may be up to 271 bytes.
|
||||
- Name
|
||||
- Description
|
||||
* - 0x0
|
||||
- \_\_le32
|
||||
- __le32
|
||||
- hash
|
||||
- The hash of the directory name
|
||||
* - 0x4
|
||||
- \_\_le32
|
||||
- minor\_hash
|
||||
- __le32
|
||||
- minor_hash
|
||||
- The minor hash of the directory name
|
||||
|
||||
|
||||
In order to add checksums to these classic directory blocks, a phony
|
||||
``struct ext4_dir_entry`` is placed at the end of each leaf block to
|
||||
hold the checksum. The directory entry is 12 bytes long. The inode
|
||||
number and name\_len fields are set to zero to fool old software into
|
||||
number and name_len fields are set to zero to fool old software into
|
||||
ignoring an apparently empty directory entry, and the checksum is stored
|
||||
in the place where the name normally goes. The structure is
|
||||
``struct ext4_dir_entry_tail``:
|
||||
@@ -163,24 +163,24 @@ in the place where the name normally goes. The structure is
|
||||
- Name
|
||||
- Description
|
||||
* - 0x0
|
||||
- \_\_le32
|
||||
- det\_reserved\_zero1
|
||||
- __le32
|
||||
- det_reserved_zero1
|
||||
- Inode number, which must be zero.
|
||||
* - 0x4
|
||||
- \_\_le16
|
||||
- det\_rec\_len
|
||||
- __le16
|
||||
- det_rec_len
|
||||
- Length of this directory entry, which must be 12.
|
||||
* - 0x6
|
||||
- \_\_u8
|
||||
- det\_reserved\_zero2
|
||||
- __u8
|
||||
- det_reserved_zero2
|
||||
- Length of the file name, which must be zero.
|
||||
* - 0x7
|
||||
- \_\_u8
|
||||
- det\_reserved\_ft
|
||||
- __u8
|
||||
- det_reserved_ft
|
||||
- File type, which must be 0xDE.
|
||||
* - 0x8
|
||||
- \_\_le32
|
||||
- det\_checksum
|
||||
- __le32
|
||||
- det_checksum
|
||||
- Directory leaf block checksum.
|
||||
|
||||
The leaf directory block checksum is calculated against the FS UUID, the
|
||||
@@ -194,7 +194,7 @@ Hash Tree Directories
|
||||
A linear array of directory entries isn't great for performance, so a
|
||||
new feature was added to ext3 to provide a faster (but peculiar)
|
||||
balanced tree keyed off a hash of the directory entry name. If the
|
||||
EXT4\_INDEX\_FL (0x1000) flag is set in the inode, this directory uses a
|
||||
EXT4_INDEX_FL (0x1000) flag is set in the inode, this directory uses a
|
||||
hashed btree (htree) to organize and find directory entries. For
|
||||
backwards read-only compatibility with ext2, this tree is actually
|
||||
hidden inside the directory file, masquerading as “empty” directory data
|
||||
@@ -206,14 +206,14 @@ rest of the directory block is empty so that it moves on.
|
||||
The root of the tree always lives in the first data block of the
|
||||
directory. By ext2 custom, the '.' and '..' entries must appear at the
|
||||
beginning of this first block, so they are put here as two
|
||||
``struct ext4_dir_entry_2``\ s and not stored in the tree. The rest of
|
||||
``struct ext4_dir_entry_2`` s and not stored in the tree. The rest of
|
||||
the root node contains metadata about the tree and finally a hash->block
|
||||
map to find nodes that are lower in the htree. If
|
||||
``dx_root.info.indirect_levels`` is non-zero then the htree has two
|
||||
levels; the data block pointed to by the root node's map is an interior
|
||||
node, which is indexed by a minor hash. Interior nodes in this tree
|
||||
contains a zeroed out ``struct ext4_dir_entry_2`` followed by a
|
||||
minor\_hash->block map to find leafe nodes. Leaf nodes contain a linear
|
||||
minor_hash->block map to find leafe nodes. Leaf nodes contain a linear
|
||||
array of all ``struct ext4_dir_entry_2``; all of these entries
|
||||
(presumably) hash to the same value. If there is an overflow, the
|
||||
entries simply overflow into the next leaf node, and the
|
||||
@@ -245,83 +245,83 @@ of a data block:
|
||||
- Name
|
||||
- Description
|
||||
* - 0x0
|
||||
- \_\_le32
|
||||
- __le32
|
||||
- dot.inode
|
||||
- inode number of this directory.
|
||||
* - 0x4
|
||||
- \_\_le16
|
||||
- dot.rec\_len
|
||||
- __le16
|
||||
- dot.rec_len
|
||||
- Length of this record, 12.
|
||||
* - 0x6
|
||||
- u8
|
||||
- dot.name\_len
|
||||
- dot.name_len
|
||||
- Length of the name, 1.
|
||||
* - 0x7
|
||||
- u8
|
||||
- dot.file\_type
|
||||
- dot.file_type
|
||||
- File type of this entry, 0x2 (directory) (if the feature flag is set).
|
||||
* - 0x8
|
||||
- char
|
||||
- dot.name[4]
|
||||
- “.\\0\\0\\0”
|
||||
- “.\0\0\0”
|
||||
* - 0xC
|
||||
- \_\_le32
|
||||
- __le32
|
||||
- dotdot.inode
|
||||
- inode number of parent directory.
|
||||
* - 0x10
|
||||
- \_\_le16
|
||||
- dotdot.rec\_len
|
||||
- block\_size - 12. The record length is long enough to cover all htree
|
||||
- __le16
|
||||
- dotdot.rec_len
|
||||
- block_size - 12. The record length is long enough to cover all htree
|
||||
data.
|
||||
* - 0x12
|
||||
- u8
|
||||
- dotdot.name\_len
|
||||
- dotdot.name_len
|
||||
- Length of the name, 2.
|
||||
* - 0x13
|
||||
- u8
|
||||
- dotdot.file\_type
|
||||
- dotdot.file_type
|
||||
- File type of this entry, 0x2 (directory) (if the feature flag is set).
|
||||
* - 0x14
|
||||
- char
|
||||
- dotdot\_name[4]
|
||||
- “..\\0\\0”
|
||||
- dotdot_name[4]
|
||||
- “..\0\0”
|
||||
* - 0x18
|
||||
- \_\_le32
|
||||
- struct dx\_root\_info.reserved\_zero
|
||||
- __le32
|
||||
- struct dx_root_info.reserved_zero
|
||||
- Zero.
|
||||
* - 0x1C
|
||||
- u8
|
||||
- struct dx\_root\_info.hash\_version
|
||||
- struct dx_root_info.hash_version
|
||||
- Hash type, see dirhash_ table below.
|
||||
* - 0x1D
|
||||
- u8
|
||||
- struct dx\_root\_info.info\_length
|
||||
- struct dx_root_info.info_length
|
||||
- Length of the tree information, 0x8.
|
||||
* - 0x1E
|
||||
- u8
|
||||
- struct dx\_root\_info.indirect\_levels
|
||||
- Depth of the htree. Cannot be larger than 3 if the INCOMPAT\_LARGEDIR
|
||||
- struct dx_root_info.indirect_levels
|
||||
- Depth of the htree. Cannot be larger than 3 if the INCOMPAT_LARGEDIR
|
||||
feature is set; cannot be larger than 2 otherwise.
|
||||
* - 0x1F
|
||||
- u8
|
||||
- struct dx\_root\_info.unused\_flags
|
||||
- struct dx_root_info.unused_flags
|
||||
-
|
||||
* - 0x20
|
||||
- \_\_le16
|
||||
- __le16
|
||||
- limit
|
||||
- Maximum number of dx\_entries that can follow this header, plus 1 for
|
||||
- Maximum number of dx_entries that can follow this header, plus 1 for
|
||||
the header itself.
|
||||
* - 0x22
|
||||
- \_\_le16
|
||||
- __le16
|
||||
- count
|
||||
- Actual number of dx\_entries that follow this header, plus 1 for the
|
||||
- Actual number of dx_entries that follow this header, plus 1 for the
|
||||
header itself.
|
||||
* - 0x24
|
||||
- \_\_le32
|
||||
- __le32
|
||||
- block
|
||||
- The block number (within the directory file) that goes with hash=0.
|
||||
* - 0x28
|
||||
- struct dx\_entry
|
||||
- struct dx_entry
|
||||
- entries[0]
|
||||
- As many 8-byte ``struct dx_entry`` as fits in the rest of the data block.
|
||||
|
||||
@@ -362,38 +362,38 @@ also the full length of a data block:
|
||||
- Name
|
||||
- Description
|
||||
* - 0x0
|
||||
- \_\_le32
|
||||
- __le32
|
||||
- fake.inode
|
||||
- Zero, to make it look like this entry is not in use.
|
||||
* - 0x4
|
||||
- \_\_le16
|
||||
- fake.rec\_len
|
||||
- The size of the block, in order to hide all of the dx\_node data.
|
||||
- __le16
|
||||
- fake.rec_len
|
||||
- The size of the block, in order to hide all of the dx_node data.
|
||||
* - 0x6
|
||||
- u8
|
||||
- name\_len
|
||||
- name_len
|
||||
- Zero. There is no name for this “unused” directory entry.
|
||||
* - 0x7
|
||||
- u8
|
||||
- file\_type
|
||||
- file_type
|
||||
- Zero. There is no file type for this “unused” directory entry.
|
||||
* - 0x8
|
||||
- \_\_le16
|
||||
- __le16
|
||||
- limit
|
||||
- Maximum number of dx\_entries that can follow this header, plus 1 for
|
||||
- Maximum number of dx_entries that can follow this header, plus 1 for
|
||||
the header itself.
|
||||
* - 0xA
|
||||
- \_\_le16
|
||||
- __le16
|
||||
- count
|
||||
- Actual number of dx\_entries that follow this header, plus 1 for the
|
||||
- Actual number of dx_entries that follow this header, plus 1 for the
|
||||
header itself.
|
||||
* - 0xE
|
||||
- \_\_le32
|
||||
- __le32
|
||||
- block
|
||||
- The block number (within the directory file) that goes with the lowest
|
||||
hash value of this block. This value is stored in the parent block.
|
||||
* - 0x12
|
||||
- struct dx\_entry
|
||||
- struct dx_entry
|
||||
- entries[0]
|
||||
- As many 8-byte ``struct dx_entry`` as fits in the rest of the data block.
|
||||
|
||||
@@ -410,11 +410,11 @@ long:
|
||||
- Name
|
||||
- Description
|
||||
* - 0x0
|
||||
- \_\_le32
|
||||
- __le32
|
||||
- hash
|
||||
- Hash code.
|
||||
* - 0x4
|
||||
- \_\_le32
|
||||
- __le32
|
||||
- block
|
||||
- Block number (within the directory file, not filesystem blocks) of the
|
||||
next node in the htree.
|
||||
@@ -423,13 +423,13 @@ long:
|
||||
author.)
|
||||
|
||||
If metadata checksums are enabled, the last 8 bytes of the directory
|
||||
block (precisely the length of one dx\_entry) are used to store a
|
||||
block (precisely the length of one dx_entry) are used to store a
|
||||
``struct dx_tail``, which contains the checksum. The ``limit`` and
|
||||
``count`` entries in the dx\_root/dx\_node structures are adjusted as
|
||||
necessary to fit the dx\_tail into the block. If there is no space for
|
||||
the dx\_tail, the user is notified to run e2fsck -D to rebuild the
|
||||
``count`` entries in the dx_root/dx_node structures are adjusted as
|
||||
necessary to fit the dx_tail into the block. If there is no space for
|
||||
the dx_tail, the user is notified to run e2fsck -D to rebuild the
|
||||
directory index (which will ensure that there's space for the checksum.
|
||||
The dx\_tail structure is 8 bytes long and looks like this:
|
||||
The dx_tail structure is 8 bytes long and looks like this:
|
||||
|
||||
.. list-table::
|
||||
:widths: 8 8 24 40
|
||||
@@ -441,13 +441,13 @@ The dx\_tail structure is 8 bytes long and looks like this:
|
||||
- Description
|
||||
* - 0x0
|
||||
- u32
|
||||
- dt\_reserved
|
||||
- dt_reserved
|
||||
- Zero.
|
||||
* - 0x4
|
||||
- \_\_le32
|
||||
- dt\_checksum
|
||||
- __le32
|
||||
- dt_checksum
|
||||
- Checksum of the htree directory block.
|
||||
|
||||
The checksum is calculated against the FS UUID, the htree index header
|
||||
(dx\_root or dx\_node), all of the htree indices (dx\_entry) that are in
|
||||
use, and the tail block (dx\_tail).
|
||||
(dx_root or dx_node), all of the htree indices (dx_entry) that are in
|
||||
use, and the tail block (dx_tail).
|
||||
|
||||
@@ -5,14 +5,14 @@ Large Extended Attribute Values
|
||||
|
||||
To enable ext4 to store extended attribute values that do not fit in the
|
||||
inode or in the single extended attribute block attached to an inode,
|
||||
the EA\_INODE feature allows us to store the value in the data blocks of
|
||||
the EA_INODE feature allows us to store the value in the data blocks of
|
||||
a regular file inode. This “EA inode” is linked only from the extended
|
||||
attribute name index and must not appear in a directory entry. The
|
||||
inode's i\_atime field is used to store a checksum of the xattr value;
|
||||
and i\_ctime/i\_version store a 64-bit reference count, which enables
|
||||
inode's i_atime field is used to store a checksum of the xattr value;
|
||||
and i_ctime/i_version store a 64-bit reference count, which enables
|
||||
sharing of large xattr values between multiple owning inodes. For
|
||||
backward compatibility with older versions of this feature, the
|
||||
i\_mtime/i\_generation *may* store a back-reference to the inode number
|
||||
and i\_generation of the **one** owning inode (in cases where the EA
|
||||
i_mtime/i_generation *may* store a back-reference to the inode number
|
||||
and i_generation of the **one** owning inode (in cases where the EA
|
||||
inode is not referenced by multiple inodes) to verify that the EA inode
|
||||
is the correct one being accessed.
|
||||
|
||||
@@ -7,34 +7,34 @@ Each block group on the filesystem has one of these descriptors
|
||||
associated with it. As noted in the Layout section above, the group
|
||||
descriptors (if present) are the second item in the block group. The
|
||||
standard configuration is for each block group to contain a full copy of
|
||||
the block group descriptor table unless the sparse\_super feature flag
|
||||
the block group descriptor table unless the sparse_super feature flag
|
||||
is set.
|
||||
|
||||
Notice how the group descriptor records the location of both bitmaps and
|
||||
the inode table (i.e. they can float). This means that within a block
|
||||
group, the only data structures with fixed locations are the superblock
|
||||
and the group descriptor table. The flex\_bg mechanism uses this
|
||||
and the group descriptor table. The flex_bg mechanism uses this
|
||||
property to group several block groups into a flex group and lay out all
|
||||
of the groups' bitmaps and inode tables into one long run in the first
|
||||
group of the flex group.
|
||||
|
||||
If the meta\_bg feature flag is set, then several block groups are
|
||||
grouped together into a meta group. Note that in the meta\_bg case,
|
||||
If the meta_bg feature flag is set, then several block groups are
|
||||
grouped together into a meta group. Note that in the meta_bg case,
|
||||
however, the first and last two block groups within the larger meta
|
||||
group contain only group descriptors for the groups inside the meta
|
||||
group.
|
||||
|
||||
flex\_bg and meta\_bg do not appear to be mutually exclusive features.
|
||||
flex_bg and meta_bg do not appear to be mutually exclusive features.
|
||||
|
||||
In ext2, ext3, and ext4 (when the 64bit feature is not enabled), the
|
||||
block group descriptor was only 32 bytes long and therefore ends at
|
||||
bg\_checksum. On an ext4 filesystem with the 64bit feature enabled, the
|
||||
bg_checksum. On an ext4 filesystem with the 64bit feature enabled, the
|
||||
block group descriptor expands to at least the 64 bytes described below;
|
||||
the size is stored in the superblock.
|
||||
|
||||
If gdt\_csum is set and metadata\_csum is not set, the block group
|
||||
If gdt_csum is set and metadata_csum is not set, the block group
|
||||
checksum is the crc16 of the FS UUID, the group number, and the group
|
||||
descriptor structure. If metadata\_csum is set, then the block group
|
||||
descriptor structure. If metadata_csum is set, then the block group
|
||||
checksum is the lower 16 bits of the checksum of the FS UUID, the group
|
||||
number, and the group descriptor structure. Both block and inode bitmap
|
||||
checksums are calculated against the FS UUID, the group number, and the
|
||||
@@ -51,59 +51,59 @@ The block group descriptor is laid out in ``struct ext4_group_desc``.
|
||||
- Name
|
||||
- Description
|
||||
* - 0x0
|
||||
- \_\_le32
|
||||
- bg\_block\_bitmap\_lo
|
||||
- __le32
|
||||
- bg_block_bitmap_lo
|
||||
- Lower 32-bits of location of block bitmap.
|
||||
* - 0x4
|
||||
- \_\_le32
|
||||
- bg\_inode\_bitmap\_lo
|
||||
- __le32
|
||||
- bg_inode_bitmap_lo
|
||||
- Lower 32-bits of location of inode bitmap.
|
||||
* - 0x8
|
||||
- \_\_le32
|
||||
- bg\_inode\_table\_lo
|
||||
- __le32
|
||||
- bg_inode_table_lo
|
||||
- Lower 32-bits of location of inode table.
|
||||
* - 0xC
|
||||
- \_\_le16
|
||||
- bg\_free\_blocks\_count\_lo
|
||||
- __le16
|
||||
- bg_free_blocks_count_lo
|
||||
- Lower 16-bits of free block count.
|
||||
* - 0xE
|
||||
- \_\_le16
|
||||
- bg\_free\_inodes\_count\_lo
|
||||
- __le16
|
||||
- bg_free_inodes_count_lo
|
||||
- Lower 16-bits of free inode count.
|
||||
* - 0x10
|
||||
- \_\_le16
|
||||
- bg\_used\_dirs\_count\_lo
|
||||
- __le16
|
||||
- bg_used_dirs_count_lo
|
||||
- Lower 16-bits of directory count.
|
||||
* - 0x12
|
||||
- \_\_le16
|
||||
- bg\_flags
|
||||
- __le16
|
||||
- bg_flags
|
||||
- Block group flags. See the bgflags_ table below.
|
||||
* - 0x14
|
||||
- \_\_le32
|
||||
- bg\_exclude\_bitmap\_lo
|
||||
- __le32
|
||||
- bg_exclude_bitmap_lo
|
||||
- Lower 32-bits of location of snapshot exclusion bitmap.
|
||||
* - 0x18
|
||||
- \_\_le16
|
||||
- bg\_block\_bitmap\_csum\_lo
|
||||
- __le16
|
||||
- bg_block_bitmap_csum_lo
|
||||
- Lower 16-bits of the block bitmap checksum.
|
||||
* - 0x1A
|
||||
- \_\_le16
|
||||
- bg\_inode\_bitmap\_csum\_lo
|
||||
- __le16
|
||||
- bg_inode_bitmap_csum_lo
|
||||
- Lower 16-bits of the inode bitmap checksum.
|
||||
* - 0x1C
|
||||
- \_\_le16
|
||||
- bg\_itable\_unused\_lo
|
||||
- __le16
|
||||
- bg_itable_unused_lo
|
||||
- Lower 16-bits of unused inode count. If set, we needn't scan past the
|
||||
``(sb.s_inodes_per_group - gdt.bg_itable_unused)``\ th entry in the
|
||||
``(sb.s_inodes_per_group - gdt.bg_itable_unused)`` th entry in the
|
||||
inode table for this group.
|
||||
* - 0x1E
|
||||
- \_\_le16
|
||||
- bg\_checksum
|
||||
- Group descriptor checksum; crc16(sb\_uuid+group\_num+bg\_desc) if the
|
||||
RO\_COMPAT\_GDT\_CSUM feature is set, or
|
||||
crc32c(sb\_uuid+group\_num+bg\_desc) & 0xFFFF if the
|
||||
RO\_COMPAT\_METADATA\_CSUM feature is set. The bg\_checksum
|
||||
field in bg\_desc is skipped when calculating crc16 checksum,
|
||||
- __le16
|
||||
- bg_checksum
|
||||
- Group descriptor checksum; crc16(sb_uuid+group_num+bg_desc) if the
|
||||
RO_COMPAT_GDT_CSUM feature is set, or
|
||||
crc32c(sb_uuid+group_num+bg_desc) & 0xFFFF if the
|
||||
RO_COMPAT_METADATA_CSUM feature is set. The bg_checksum
|
||||
field in bg_desc is skipped when calculating crc16 checksum,
|
||||
and set to zero if crc32c checksum is used.
|
||||
* -
|
||||
-
|
||||
@@ -111,48 +111,48 @@ The block group descriptor is laid out in ``struct ext4_group_desc``.
|
||||
- These fields only exist if the 64bit feature is enabled and s_desc_size
|
||||
> 32.
|
||||
* - 0x20
|
||||
- \_\_le32
|
||||
- bg\_block\_bitmap\_hi
|
||||
- __le32
|
||||
- bg_block_bitmap_hi
|
||||
- Upper 32-bits of location of block bitmap.
|
||||
* - 0x24
|
||||
- \_\_le32
|
||||
- bg\_inode\_bitmap\_hi
|
||||
- __le32
|
||||
- bg_inode_bitmap_hi
|
||||
- Upper 32-bits of location of inodes bitmap.
|
||||
* - 0x28
|
||||
- \_\_le32
|
||||
- bg\_inode\_table\_hi
|
||||
- __le32
|
||||
- bg_inode_table_hi
|
||||
- Upper 32-bits of location of inodes table.
|
||||
* - 0x2C
|
||||
- \_\_le16
|
||||
- bg\_free\_blocks\_count\_hi
|
||||
- __le16
|
||||
- bg_free_blocks_count_hi
|
||||
- Upper 16-bits of free block count.
|
||||
* - 0x2E
|
||||
- \_\_le16
|
||||
- bg\_free\_inodes\_count\_hi
|
||||
- __le16
|
||||
- bg_free_inodes_count_hi
|
||||
- Upper 16-bits of free inode count.
|
||||
* - 0x30
|
||||
- \_\_le16
|
||||
- bg\_used\_dirs\_count\_hi
|
||||
- __le16
|
||||
- bg_used_dirs_count_hi
|
||||
- Upper 16-bits of directory count.
|
||||
* - 0x32
|
||||
- \_\_le16
|
||||
- bg\_itable\_unused\_hi
|
||||
- __le16
|
||||
- bg_itable_unused_hi
|
||||
- Upper 16-bits of unused inode count.
|
||||
* - 0x34
|
||||
- \_\_le32
|
||||
- bg\_exclude\_bitmap\_hi
|
||||
- __le32
|
||||
- bg_exclude_bitmap_hi
|
||||
- Upper 32-bits of location of snapshot exclusion bitmap.
|
||||
* - 0x38
|
||||
- \_\_le16
|
||||
- bg\_block\_bitmap\_csum\_hi
|
||||
- __le16
|
||||
- bg_block_bitmap_csum_hi
|
||||
- Upper 16-bits of the block bitmap checksum.
|
||||
* - 0x3A
|
||||
- \_\_le16
|
||||
- bg\_inode\_bitmap\_csum\_hi
|
||||
- __le16
|
||||
- bg_inode_bitmap_csum_hi
|
||||
- Upper 16-bits of the inode bitmap checksum.
|
||||
* - 0x3C
|
||||
- \_\_u32
|
||||
- bg\_reserved
|
||||
- __u32
|
||||
- bg_reserved
|
||||
- Padding to 64 bytes.
|
||||
|
||||
.. _bgflags:
|
||||
@@ -166,8 +166,8 @@ Block group flags can be any combination of the following:
|
||||
* - Value
|
||||
- Description
|
||||
* - 0x1
|
||||
- inode table and bitmap are not initialized (EXT4\_BG\_INODE\_UNINIT).
|
||||
- inode table and bitmap are not initialized (EXT4_BG_INODE_UNINIT).
|
||||
* - 0x2
|
||||
- block bitmap is not initialized (EXT4\_BG\_BLOCK\_UNINIT).
|
||||
- block bitmap is not initialized (EXT4_BG_BLOCK_UNINIT).
|
||||
* - 0x4
|
||||
- inode table is zeroed (EXT4\_BG\_INODE\_ZEROED).
|
||||
- inode table is zeroed (EXT4_BG_INODE_ZEROED).
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
The Contents of inode.i\_block
|
||||
The Contents of inode.i_block
|
||||
------------------------------
|
||||
|
||||
Depending on the type of file an inode describes, the 60 bytes of
|
||||
@@ -47,7 +47,7 @@ In ext4, the file to logical block map has been replaced with an extent
|
||||
tree. Under the old scheme, allocating a contiguous run of 1,000 blocks
|
||||
requires an indirect block to map all 1,000 entries; with extents, the
|
||||
mapping is reduced to a single ``struct ext4_extent`` with
|
||||
``ee_len = 1000``. If flex\_bg is enabled, it is possible to allocate
|
||||
``ee_len = 1000``. If flex_bg is enabled, it is possible to allocate
|
||||
very large files with a single extent, at a considerable reduction in
|
||||
metadata block use, and some improvement in disk efficiency. The inode
|
||||
must have the extents flag (0x80000) flag set for this feature to be in
|
||||
@@ -76,28 +76,28 @@ which is 12 bytes long:
|
||||
- Name
|
||||
- Description
|
||||
* - 0x0
|
||||
- \_\_le16
|
||||
- eh\_magic
|
||||
- __le16
|
||||
- eh_magic
|
||||
- Magic number, 0xF30A.
|
||||
* - 0x2
|
||||
- \_\_le16
|
||||
- eh\_entries
|
||||
- __le16
|
||||
- eh_entries
|
||||
- Number of valid entries following the header.
|
||||
* - 0x4
|
||||
- \_\_le16
|
||||
- eh\_max
|
||||
- __le16
|
||||
- eh_max
|
||||
- Maximum number of entries that could follow the header.
|
||||
* - 0x6
|
||||
- \_\_le16
|
||||
- eh\_depth
|
||||
- __le16
|
||||
- eh_depth
|
||||
- Depth of this extent node in the extent tree. 0 = this extent node
|
||||
points to data blocks; otherwise, this extent node points to other
|
||||
extent nodes. The extent tree can be at most 5 levels deep: a logical
|
||||
block number can be at most ``2^32``, and the smallest ``n`` that
|
||||
satisfies ``4*(((blocksize - 12)/12)^n) >= 2^32`` is 5.
|
||||
* - 0x8
|
||||
- \_\_le32
|
||||
- eh\_generation
|
||||
- __le32
|
||||
- eh_generation
|
||||
- Generation of the tree. (Used by Lustre, but not standard ext4).
|
||||
|
||||
Internal nodes of the extent tree, also known as index nodes, are
|
||||
@@ -112,22 +112,22 @@ recorded as ``struct ext4_extent_idx``, and are 12 bytes long:
|
||||
- Name
|
||||
- Description
|
||||
* - 0x0
|
||||
- \_\_le32
|
||||
- ei\_block
|
||||
- __le32
|
||||
- ei_block
|
||||
- This index node covers file blocks from 'block' onward.
|
||||
* - 0x4
|
||||
- \_\_le32
|
||||
- ei\_leaf\_lo
|
||||
- __le32
|
||||
- ei_leaf_lo
|
||||
- Lower 32-bits of the block number of the extent node that is the next
|
||||
level lower in the tree. The tree node pointed to can be either another
|
||||
internal node or a leaf node, described below.
|
||||
* - 0x8
|
||||
- \_\_le16
|
||||
- ei\_leaf\_hi
|
||||
- __le16
|
||||
- ei_leaf_hi
|
||||
- Upper 16-bits of the previous field.
|
||||
* - 0xA
|
||||
- \_\_u16
|
||||
- ei\_unused
|
||||
- __u16
|
||||
- ei_unused
|
||||
-
|
||||
|
||||
Leaf nodes of the extent tree are recorded as ``struct ext4_extent``,
|
||||
@@ -142,24 +142,24 @@ and are also 12 bytes long:
|
||||
- Name
|
||||
- Description
|
||||
* - 0x0
|
||||
- \_\_le32
|
||||
- ee\_block
|
||||
- __le32
|
||||
- ee_block
|
||||
- First file block number that this extent covers.
|
||||
* - 0x4
|
||||
- \_\_le16
|
||||
- ee\_len
|
||||
- __le16
|
||||
- ee_len
|
||||
- Number of blocks covered by extent. If the value of this field is <=
|
||||
32768, the extent is initialized. If the value of the field is > 32768,
|
||||
the extent is uninitialized and the actual extent length is ``ee_len`` -
|
||||
32768. Therefore, the maximum length of a initialized extent is 32768
|
||||
blocks, and the maximum length of an uninitialized extent is 32767.
|
||||
* - 0x6
|
||||
- \_\_le16
|
||||
- ee\_start\_hi
|
||||
- __le16
|
||||
- ee_start_hi
|
||||
- Upper 16-bits of the block number to which this extent points.
|
||||
* - 0x8
|
||||
- \_\_le32
|
||||
- ee\_start\_lo
|
||||
- __le32
|
||||
- ee_start_lo
|
||||
- Lower 32-bits of the block number to which this extent points.
|
||||
|
||||
Prior to the introduction of metadata checksums, the extent header +
|
||||
@@ -182,8 +182,8 @@ including) the checksum itself.
|
||||
- Name
|
||||
- Description
|
||||
* - 0x0
|
||||
- \_\_le32
|
||||
- eb\_checksum
|
||||
- __le32
|
||||
- eb_checksum
|
||||
- Checksum of the extent block, crc32c(uuid+inum+igeneration+extentblock)
|
||||
|
||||
Inline Data
|
||||
|
||||
@@ -11,12 +11,12 @@ file is smaller than 60 bytes, then the data are stored inline in
|
||||
attribute space, then it might be found as an extended attribute
|
||||
“system.data” within the inode body (“ibody EA”). This of course
|
||||
constrains the amount of extended attributes one can attach to an inode.
|
||||
If the data size increases beyond i\_block + ibody EA, a regular block
|
||||
If the data size increases beyond i_block + ibody EA, a regular block
|
||||
is allocated and the contents moved to that block.
|
||||
|
||||
Pending a change to compact the extended attribute key used to store
|
||||
inline data, one ought to be able to store 160 bytes of data in a
|
||||
256-byte inode (as of June 2015, when i\_extra\_isize is 28). Prior to
|
||||
256-byte inode (as of June 2015, when i_extra_isize is 28). Prior to
|
||||
that, the limit was 156 bytes due to inefficient use of inode space.
|
||||
|
||||
The inline data feature requires the presence of an extended attribute
|
||||
@@ -25,12 +25,12 @@ for “system.data”, even if the attribute value is zero length.
|
||||
Inline Directories
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The first four bytes of i\_block are the inode number of the parent
|
||||
The first four bytes of i_block are the inode number of the parent
|
||||
directory. Following that is a 56-byte space for an array of directory
|
||||
entries; see ``struct ext4_dir_entry``. If there is a “system.data”
|
||||
attribute in the inode body, the EA value is an array of
|
||||
``struct ext4_dir_entry`` as well. Note that for inline directories, the
|
||||
i\_block and EA space are treated as separate dirent blocks; directory
|
||||
i_block and EA space are treated as separate dirent blocks; directory
|
||||
entries cannot span the two.
|
||||
|
||||
Inline directory entries are not checksummed, as the inode checksum
|
||||
|
||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user