docs: convert docs to ReST and rename to *.rst

The conversion is actually:
  - add blank lines and indentation in order to identify paragraphs;
  - fix tables markups;
  - add some lists markups;
  - mark literal blocks;
  - adjust title markups.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
This commit is contained in:
Mauro Carvalho Chehab
2019-06-12 14:52:43 -03:00
committed by Jonathan Corbet
parent 8ea618899b
commit f0ba43774c
36 changed files with 1124 additions and 796 deletions
@@ -1,3 +1,4 @@
=============================
Guidance for writing policies
=============================
@@ -30,7 +31,7 @@ multiqueue (mq)
This policy is now an alias for smq (see below).
The following tunables are accepted, but have no effect:
The following tunables are accepted, but have no effect::
'sequential_threshold <#nr_sequential_ios>'
'random_threshold <#nr_random_ios>'
@@ -56,7 +57,9 @@ mq policy's hints to be dropped. Also, performance of the cache may
degrade slightly until smq recalculates the origin device's hotspots
that should be cached.
Memory usage:
Memory usage
^^^^^^^^^^^^
The mq policy used a lot of memory; 88 bytes per cache block on a 64
bit machine.
@@ -69,7 +72,9 @@ cache block).
All this means smq uses ~25bytes per cache block. Still a lot of
memory, but a substantial improvement nontheless.
Level balancing:
Level balancing
^^^^^^^^^^^^^^^
mq placed entries in different levels of the multiqueue structures
based on their hit count (~ln(hit count)). This meant the bottom
levels generally had the most entries, and the top ones had very
@@ -94,7 +99,9 @@ is used to decide which blocks to promote. If the hotspot queue is
performing badly then it starts moving entries more quickly between
levels. This lets it adapt to new IO patterns very quickly.
Performance:
Performance
^^^^^^^^^^^
Testing smq shows substantially better performance than mq.
cleaner
@@ -105,16 +112,19 @@ The cleaner writes back all dirty blocks in a cache to decommission it.
Examples
========
The syntax for a table is:
The syntax for a table is::
cache <metadata dev> <cache dev> <origin dev> <block size>
<#feature_args> [<feature arg>]*
<policy> <#policy_args> [<policy arg>]*
The syntax to send a message using the dmsetup command is:
The syntax to send a message using the dmsetup command is::
dmsetup message <mapped device> 0 sequential_threshold 1024
dmsetup message <mapped device> 0 random_threshold 8
Using dmsetup:
Using dmsetup::
dmsetup create blah --table "0 268435456 cache /dev/sdb /dev/sdc \
/dev/sdd 512 0 mq 4 sequential_threshold 1024 random_threshold 8"
creates a 128GB large mapped device named 'blah' with the
@@ -1,3 +1,7 @@
=====
Cache
=====
Introduction
============
@@ -24,10 +28,13 @@ scenarios (eg. a vm image server).
Glossary
========
Migration - Movement of the primary copy of a logical block from one
Migration
Movement of the primary copy of a logical block from one
device to the other.
Promotion - Migration from slow device to fast device.
Demotion - Migration from fast device to slow device.
Promotion
Migration from slow device to fast device.
Demotion
Migration from fast device to slow device.
The origin device always contains a copy of the logical block, which
may be out of date or kept in sync with the copy on the cache device
@@ -169,45 +176,53 @@ Target interface
Constructor
-----------
cache <metadata dev> <cache dev> <origin dev> <block size>
<#feature args> [<feature arg>]*
<policy> <#policy args> [policy args]*
::
metadata dev : fast device holding the persistent metadata
cache dev : fast device holding cached data blocks
origin dev : slow device holding original data blocks
block size : cache unit size in sectors
cache <metadata dev> <cache dev> <origin dev> <block size>
<#feature args> [<feature arg>]*
<policy> <#policy args> [policy args]*
#feature args : number of feature arguments passed
feature args : writethrough or passthrough (The default is writeback.)
================ =======================================================
metadata dev fast device holding the persistent metadata
cache dev fast device holding cached data blocks
origin dev slow device holding original data blocks
block size cache unit size in sectors
policy : the replacement policy to use
#policy args : an even number of arguments corresponding to
key/value pairs passed to the policy
policy args : key/value pairs passed to the policy
E.g. 'sequential_threshold 1024'
See cache-policies.txt for details.
#feature args number of feature arguments passed
feature args writethrough or passthrough (The default is writeback.)
policy the replacement policy to use
#policy args an even number of arguments corresponding to
key/value pairs passed to the policy
policy args key/value pairs passed to the policy
E.g. 'sequential_threshold 1024'
See cache-policies.txt for details.
================ =======================================================
Optional feature arguments are:
writethrough : write through caching that prohibits cache block
content from being different from origin block content.
Without this argument, the default behaviour is to write
back cache block contents later for performance reasons,
so they may differ from the corresponding origin blocks.
passthrough : a degraded mode useful for various cache coherency
situations (e.g., rolling back snapshots of
underlying storage). Reads and writes always go to
the origin. If a write goes to a cached origin
block, then the cache block is invalidated.
To enable passthrough mode the cache must be clean.
metadata2 : use version 2 of the metadata. This stores the dirty bits
in a separate btree, which improves speed of shutting
down the cache.
==================== ========================================================
writethrough write through caching that prohibits cache block
content from being different from origin block content.
Without this argument, the default behaviour is to write
back cache block contents later for performance reasons,
so they may differ from the corresponding origin blocks.
no_discard_passdown : disable passing down discards from the cache
to the origin's data device.
passthrough a degraded mode useful for various cache coherency
situations (e.g., rolling back snapshots of
underlying storage). Reads and writes always go to
the origin. If a write goes to a cached origin
block, then the cache block is invalidated.
To enable passthrough mode the cache must be clean.
metadata2 use version 2 of the metadata. This stores the dirty
bits in a separate btree, which improves speed of
shutting down the cache.
no_discard_passdown disable passing down discards from the cache
to the origin's data device.
==================== ========================================================
A policy called 'default' is always registered. This is an alias for
the policy we currently think is giving best all round performance.
@@ -218,54 +233,61 @@ the characteristics of a specific policy, always request it by name.
Status
------
<metadata block size> <#used metadata blocks>/<#total metadata blocks>
<cache block size> <#used cache blocks>/<#total cache blocks>
<#read hits> <#read misses> <#write hits> <#write misses>
<#demotions> <#promotions> <#dirty> <#features> <features>*
<#core args> <core args>* <policy name> <#policy args> <policy args>*
<cache metadata mode>
::
metadata block size : Fixed block size for each metadata block in
sectors
#used metadata blocks : Number of metadata blocks used
#total metadata blocks : Total number of metadata blocks
cache block size : Configurable block size for the cache device
in sectors
#used cache blocks : Number of blocks resident in the cache
#total cache blocks : Total number of cache blocks
#read hits : Number of times a READ bio has been mapped
to the cache
#read misses : Number of times a READ bio has been mapped
to the origin
#write hits : Number of times a WRITE bio has been mapped
to the cache
#write misses : Number of times a WRITE bio has been
mapped to the origin
#demotions : Number of times a block has been removed
from the cache
#promotions : Number of times a block has been moved to
the cache
#dirty : Number of blocks in the cache that differ
from the origin
#feature args : Number of feature args to follow
feature args : 'writethrough' (optional)
#core args : Number of core arguments (must be even)
core args : Key/value pairs for tuning the core
e.g. migration_threshold
policy name : Name of the policy
#policy args : Number of policy arguments to follow (must be even)
policy args : Key/value pairs e.g. sequential_threshold
cache metadata mode : ro if read-only, rw if read-write
In serious cases where even a read-only mode is deemed unsafe
no further I/O will be permitted and the status will just
contain the string 'Fail'. The userspace recovery tools
should then be used.
needs_check : 'needs_check' if set, '-' if not set
A metadata operation has failed, resulting in the needs_check
flag being set in the metadata's superblock. The metadata
device must be deactivated and checked/repaired before the
cache can be made fully operational again. '-' indicates
needs_check is not set.
<metadata block size> <#used metadata blocks>/<#total metadata blocks>
<cache block size> <#used cache blocks>/<#total cache blocks>
<#read hits> <#read misses> <#write hits> <#write misses>
<#demotions> <#promotions> <#dirty> <#features> <features>*
<#core args> <core args>* <policy name> <#policy args> <policy args>*
<cache metadata mode>
========================= =====================================================
metadata block size Fixed block size for each metadata block in
sectors
#used metadata blocks Number of metadata blocks used
#total metadata blocks Total number of metadata blocks
cache block size Configurable block size for the cache device
in sectors
#used cache blocks Number of blocks resident in the cache
#total cache blocks Total number of cache blocks
#read hits Number of times a READ bio has been mapped
to the cache
#read misses Number of times a READ bio has been mapped
to the origin
#write hits Number of times a WRITE bio has been mapped
to the cache
#write misses Number of times a WRITE bio has been
mapped to the origin
#demotions Number of times a block has been removed
from the cache
#promotions Number of times a block has been moved to
the cache
#dirty Number of blocks in the cache that differ
from the origin
#feature args Number of feature args to follow
feature args 'writethrough' (optional)
#core args Number of core arguments (must be even)
core args Key/value pairs for tuning the core
e.g. migration_threshold
policy name Name of the policy
#policy args Number of policy arguments to follow (must be even)
policy args Key/value pairs e.g. sequential_threshold
cache metadata mode ro if read-only, rw if read-write
In serious cases where even a read-only mode is
deemed unsafe no further I/O will be permitted and
the status will just contain the string 'Fail'.
The userspace recovery tools should then be used.
needs_check 'needs_check' if set, '-' if not set
A metadata operation has failed, resulting in the
needs_check flag being set in the metadata's
superblock. The metadata device must be
deactivated and checked/repaired before the
cache can be made fully operational again.
'-' indicates needs_check is not set.
========================= =====================================================
Messages
--------
@@ -274,11 +296,12 @@ Policies will have different tunables, specific to each one, so we
need a generic way of getting and setting these. Device-mapper
messages are used. (A sysfs interface would also be possible.)
The message format is:
The message format is::
<key> <value>
E.g.
E.g.::
dmsetup message my_cache 0 sequential_threshold 1024
@@ -290,11 +313,12 @@ of values from 5 to 9. Each cblock must be expressed as a decimal
value, in the future a variant message that takes cblock ranges
expressed in hexadecimal may be needed to better support efficient
invalidation of larger caches. The cache must be in passthrough mode
when invalidate_cblocks is used.
when invalidate_cblocks is used::
invalidate_cblocks [<cblock>|<cblock begin>-<cblock end>]*
E.g.
E.g.::
dmsetup message my_cache 0 invalidate_cblocks 2345 3456-4567 5678-6789
Examples
@@ -304,8 +328,10 @@ The test suite can be found here:
https://github.com/jthornber/device-mapper-test-suite
dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \
/dev/mapper/ssd /dev/mapper/origin 512 1 writeback default 0'
dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \
/dev/mapper/ssd /dev/mapper/origin 1024 1 writeback \
mq 4 sequential_threshold 1024 random_threshold 8'
::
dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \
/dev/mapper/ssd /dev/mapper/origin 512 1 writeback default 0'
dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \
/dev/mapper/ssd /dev/mapper/origin 1024 1 writeback \
mq 4 sequential_threshold 1024 random_threshold 8'
@@ -1,10 +1,12 @@
========
dm-delay
========
Device-Mapper's "delay" target delays reads and/or writes
and maps them to different devices.
Parameters:
Parameters::
<device> <offset> <delay> [<write_device> <write_offset> <write_delay>
[<flush_device> <flush_offset> <flush_delay>]]
@@ -14,15 +16,16 @@ Delays are specified in milliseconds.
Example scripts
===============
[[
#!/bin/sh
# Create device delaying rw operation for 500ms
echo "0 `blockdev --getsz $1` delay $1 0 500" | dmsetup create delayed
]]
[[
#!/bin/sh
# Create device delaying only write operation for 500ms and
# splitting reads and writes to different devices $1 $2
echo "0 `blockdev --getsz $1` delay $1 0 0 $2 0 500" | dmsetup create delayed
]]
::
#!/bin/sh
# Create device delaying rw operation for 500ms
echo "0 `blockdev --getsz $1` delay $1 0 500" | dmsetup create delayed
::
#!/bin/sh
# Create device delaying only write operation for 500ms and
# splitting reads and writes to different devices $1 $2
echo "0 `blockdev --getsz $1` delay $1 0 0 $2 0 500" | dmsetup create delayed
@@ -1,5 +1,6 @@
========
dm-crypt
=========
========
Device-Mapper's "crypt" target provides transparent encryption of block devices
using the kernel crypto API.
@@ -7,15 +8,20 @@ using the kernel crypto API.
For a more detailed description of supported parameters see:
https://gitlab.com/cryptsetup/cryptsetup/wikis/DMCrypt
Parameters: <cipher> <key> <iv_offset> <device path> \
Parameters::
<cipher> <key> <iv_offset> <device path> \
<offset> [<#opt_params> <opt_params>]
<cipher>
Encryption cipher, encryption mode and Initial Vector (IV) generator.
The cipher specifications format is:
The cipher specifications format is::
cipher[:keycount]-chainmode-ivmode[:ivopts]
Examples:
Examples::
aes-cbc-essiv:sha256
aes-xts-plain64
serpent-xts-plain64
@@ -25,12 +31,17 @@ Parameters: <cipher> <key> <iv_offset> <device path> \
as for the first format type.
This format is mainly used for specification of authenticated modes.
The crypto API cipher specifications format is:
The crypto API cipher specifications format is::
capi:cipher_api_spec-ivmode[:ivopts]
Examples:
Examples::
capi:cbc(aes)-essiv:sha256
capi:xts(aes)-plain64
Examples of authenticated modes:
Examples of authenticated modes::
capi:gcm(aes)-random
capi:authenc(hmac(sha256),xts(aes))-random
capi:rfc7539(chacha20,poly1305)-random
@@ -142,21 +153,21 @@ LUKS (Linux Unified Key Setup) is now the preferred way to set up disk
encryption with dm-crypt using the 'cryptsetup' utility, see
https://gitlab.com/cryptsetup/cryptsetup
[[
#!/bin/sh
# Create a crypt device using dmsetup
dmsetup create crypt1 --table "0 `blockdev --getsz $1` crypt aes-cbc-essiv:sha256 babebabebabebabebabebabebabebabe 0 $1 0"
]]
::
[[
#!/bin/sh
# Create a crypt device using dmsetup when encryption key is stored in keyring service
dmsetup create crypt2 --table "0 `blockdev --getsize $1` crypt aes-cbc-essiv:sha256 :32:logon:my_prefix:my_key 0 $1 0"
]]
#!/bin/sh
# Create a crypt device using dmsetup
dmsetup create crypt1 --table "0 `blockdev --getsz $1` crypt aes-cbc-essiv:sha256 babebabebabebabebabebabebabebabe 0 $1 0"
[[
#!/bin/sh
# Create a crypt device using cryptsetup and LUKS header with default cipher
cryptsetup luksFormat $1
cryptsetup luksOpen $1 crypt1
]]
::
#!/bin/sh
# Create a crypt device using dmsetup when encryption key is stored in keyring service
dmsetup create crypt2 --table "0 `blockdev --getsize $1` crypt aes-cbc-essiv:sha256 :32:logon:my_prefix:my_key 0 $1 0"
::
#!/bin/sh
# Create a crypt device using cryptsetup and LUKS header with default cipher
cryptsetup luksFormat $1
cryptsetup luksOpen $1 crypt1
@@ -1,3 +1,4 @@
=========
dm-flakey
=========
@@ -15,17 +16,26 @@ underlying devices.
Table parameters
----------------
::
<dev path> <offset> <up interval> <down interval> \
[<num_features> [<feature arguments>]]
Mandatory parameters:
<dev path>: Full pathname to the underlying block-device, or a
"major:minor" device-number.
<offset>: Starting sector within the device.
<up interval>: Number of seconds device is available.
<down interval>: Number of seconds device returns errors.
<dev path>:
Full pathname to the underlying block-device, or a
"major:minor" device-number.
<offset>:
Starting sector within the device.
<up interval>:
Number of seconds device is available.
<down interval>:
Number of seconds device returns errors.
Optional feature parameters:
If no feature parameters are present, during the periods of
unreliability, all I/O returns errors.
@@ -41,17 +51,24 @@ Optional feature parameters:
During <down interval>, replace <Nth_byte> of the data of
each matching bio with <value>.
<Nth_byte>: The offset of the byte to replace.
Counting starts at 1, to replace the first byte.
<direction>: Either 'r' to corrupt reads or 'w' to corrupt writes.
'w' is incompatible with drop_writes.
<value>: The value (from 0-255) to write.
<flags>: Perform the replacement only if bio->bi_opf has all the
selected flags set.
<Nth_byte>:
The offset of the byte to replace.
Counting starts at 1, to replace the first byte.
<direction>:
Either 'r' to corrupt reads or 'w' to corrupt writes.
'w' is incompatible with drop_writes.
<value>:
The value (from 0-255) to write.
<flags>:
Perform the replacement only if bio->bi_opf has all the
selected flags set.
Examples:
Replaces the 32nd byte of READ bios with the value 1::
corrupt_bio_byte 32 r 1 0
- replaces the 32nd byte of READ bios with the value 1
Replaces the 224th byte of REQ_META (=32) bios with the value 0::
corrupt_bio_byte 224 w 0 32
- replaces the 224th byte of REQ_META (=32) bios with the value 0
@@ -1,5 +1,6 @@
================================
Early creation of mapped devices
====================================
================================
It is possible to configure a device-mapper device to act as the root device for
your system in two ways.
@@ -12,15 +13,17 @@ The second is to create one or more device-mappers using the module parameter
The format is specified as a string of data separated by commas and optionally
semi-colons, where:
- a comma is used to separate fields like name, uuid, flags and table
(specifies one device)
- a semi-colon is used to separate devices.
So the format will look like this:
So the format will look like this::
dm-mod.create=<name>,<uuid>,<minor>,<flags>,<table>[,<table>+][;<name>,<uuid>,<minor>,<flags>,<table>[,<table>+]+]
Where,
Where::
<name> ::= The device name.
<uuid> ::= xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | ""
<minor> ::= The device minor number | ""
@@ -29,7 +32,7 @@ Where,
<target_type> ::= "verity" | "linear" | ... (see list below)
The dm line should be equivalent to the one used by the dmsetup tool with the
--concise argument.
`--concise` argument.
Target types
============
@@ -38,32 +41,34 @@ Not all target types are available as there are serious risks in allowing
activation of certain DM targets without first using userspace tools to check
the validity of associated metadata.
"cache": constrained, userspace should verify cache device
"crypt": allowed
"delay": allowed
"era": constrained, userspace should verify metadata device
"flakey": constrained, meant for test
"linear": allowed
"log-writes": constrained, userspace should verify metadata device
"mirror": constrained, userspace should verify main/mirror device
"raid": constrained, userspace should verify metadata device
"snapshot": constrained, userspace should verify src/dst device
"snapshot-origin": allowed
"snapshot-merge": constrained, userspace should verify src/dst device
"striped": allowed
"switch": constrained, userspace should verify dev path
"thin": constrained, requires dm target message from userspace
"thin-pool": constrained, requires dm target message from userspace
"verity": allowed
"writecache": constrained, userspace should verify cache device
"zero": constrained, not meant for rootfs
======================= =======================================================
`cache` constrained, userspace should verify cache device
`crypt` allowed
`delay` allowed
`era` constrained, userspace should verify metadata device
`flakey` constrained, meant for test
`linear` allowed
`log-writes` constrained, userspace should verify metadata device
`mirror` constrained, userspace should verify main/mirror device
`raid` constrained, userspace should verify metadata device
`snapshot` constrained, userspace should verify src/dst device
`snapshot-origin` allowed
`snapshot-merge` constrained, userspace should verify src/dst device
`striped` allowed
`switch` constrained, userspace should verify dev path
`thin` constrained, requires dm target message from userspace
`thin-pool` constrained, requires dm target message from userspace
`verity` allowed
`writecache` constrained, userspace should verify cache device
`zero` constrained, not meant for rootfs
======================= =======================================================
If the target is not listed above, it is constrained by default (not tested).
Examples
========
An example of booting to a linear array made up of user-mode linux block
devices:
devices::
dm-mod.create="lroot,,,rw, 0 4096 linear 98:16 0, 4096 4096 linear 98:32 0" root=/dev/dm-0
@@ -71,8 +76,8 @@ This will boot to a rw dm-linear target of 8192 sectors split across two block
devices identified by their major:minor numbers. After boot, udev will rename
this target to /dev/mapper/lroot (depending on the rules). No uuid was assigned.
An example of multiple device-mappers, with the dm-mod.create="..." contents is shown here
split on multiple lines for readability:
An example of multiple device-mappers, with the dm-mod.create="..." contents
is shown here split on multiple lines for readability::
dm-linear,,1,rw,
0 32768 linear 8:1 0,
@@ -84,30 +89,36 @@ split on multiple lines for readability:
Other examples (per target):
"crypt":
"crypt"::
dm-crypt,,8,ro,
0 1048576 crypt aes-xts-plain64
babebabebabebabebabebabebabebabebabebabebabebabebabebabebabebabe 0
/dev/sda 0 1 allow_discards
"delay":
"delay"::
dm-delay,,4,ro,0 409600 delay /dev/sda1 0 500
"linear":
"linear"::
dm-linear,,,rw,
0 32768 linear /dev/sda1 0,
32768 1024000 linear /dev/sda2 0,
1056768 204800 linear /dev/sda3 0,
1261568 512000 linear /dev/sda4 0
"snapshot-origin":
"snapshot-origin"::
dm-snap-orig,,4,ro,0 409600 snapshot-origin 8:2
"striped":
"striped"::
dm-striped,,4,ro,0 1638400 striped 4 4096
/dev/sda1 0 /dev/sda2 0 /dev/sda3 0 /dev/sda4 0
"verity":
"verity"::
dm-verity,,4,ro,
0 1638400 verity 1 8:1 8:2 4096 4096 204800 1 sha256
fb1a5a0f00deb908d8b53cb270858975e76cf64105d412ce764225d53b8f3cfd
@@ -1,3 +1,7 @@
============
dm-integrity
============
The dm-integrity target emulates a block device that has additional
per-sector tags that can be used for storing integrity information.
@@ -35,15 +39,16 @@ zeroes. If the superblock is neither valid nor zeroed, the dm-integrity
target can't be loaded.
To use the target for the first time:
1. overwrite the superblock with zeroes
2. load the dm-integrity target with one-sector size, the kernel driver
will format the device
will format the device
3. unload the dm-integrity target
4. read the "provided_data_sectors" value from the superblock
5. load the dm-integrity target with the the target size
"provided_data_sectors"
"provided_data_sectors"
6. if you want to use dm-integrity with dm-crypt, load the dm-crypt target
with the size "provided_data_sectors"
with the size "provided_data_sectors"
Target arguments:
@@ -51,17 +56,20 @@ Target arguments:
1. the underlying block device
2. the number of reserved sector at the beginning of the device - the
dm-integrity won't read of write these sectors
dm-integrity won't read of write these sectors
3. the size of the integrity tag (if "-" is used, the size is taken from
the internal-hash algorithm)
the internal-hash algorithm)
4. mode:
D - direct writes (without journal) - in this mode, journaling is
D - direct writes (without journal)
in this mode, journaling is
not used and data sectors and integrity tags are written
separately. In case of crash, it is possible that the data
and integrity tag doesn't match.
J - journaled writes - data and integrity tags are written to the
J - journaled writes
data and integrity tags are written to the
journal and atomicity is guaranteed. In case of crash,
either both data and tag or none of them are written. The
journaled mode degrades write throughput twice because the
@@ -178,9 +186,12 @@ and the reloaded target would be non-functional.
The layout of the formatted block device:
* reserved sectors (they are not used by this target, they can be used for
storing LUKS metadata or for other purpose), the size of the reserved
area is specified in the target arguments
* reserved sectors
(they are not used by this target, they can be used for
storing LUKS metadata or for other purpose), the size of the reserved
area is specified in the target arguments
* superblock (4kiB)
* magic string - identifies that the device was formatted
* version
@@ -192,40 +203,55 @@ The layout of the formatted block device:
metadata and padding). The user of this target should not send
bios that access data beyond the "provided data sectors" limit.
* flags
SB_FLAG_HAVE_JOURNAL_MAC - a flag is set if journal_mac is used
SB_FLAG_RECALCULATING - recalculating is in progress
SB_FLAG_DIRTY_BITMAP - journal area contains the bitmap of dirty
blocks
SB_FLAG_HAVE_JOURNAL_MAC
- a flag is set if journal_mac is used
SB_FLAG_RECALCULATING
- recalculating is in progress
SB_FLAG_DIRTY_BITMAP
- journal area contains the bitmap of dirty
blocks
* log2(sectors per block)
* a position where recalculating finished
* journal
The journal is divided into sections, each section contains:
* metadata area (4kiB), it contains journal entries
every journal entry contains:
- every journal entry contains:
* logical sector (specifies where the data and tag should
be written)
* last 8 bytes of data
* integrity tag (the size is specified in the superblock)
every metadata sector ends with
- every metadata sector ends with
* mac (8-bytes), all the macs in 8 metadata sectors form a
64-byte value. It is used to store hmac of sector
numbers in the journal section, to protect against a
possibility that the attacker tampers with sector
numbers in the journal.
* commit id
* data area (the size is variable; it depends on how many journal
entries fit into the metadata area)
every sector in the data area contains:
- every sector in the data area contains:
* data (504 bytes of data, the last 8 bytes are stored in
the journal entry)
* commit id
To test if the whole journal section was written correctly, every
512-byte sector of the journal ends with 8-byte commit id. If the
commit id matches on all sectors in a journal section, then it is
assumed that the section was written correctly. If the commit id
doesn't match, the section was written partially and it should not
be replayed.
* one or more runs of interleaved tags and data. Each run contains:
* one or more runs of interleaved tags and data.
Each run contains:
* tag area - it contains integrity tags. There is one tag for each
sector in the data area
* data area - it contains data sectors. The number of data sectors
@@ -1,3 +1,4 @@
=====
dm-io
=====
@@ -7,7 +8,7 @@ version.
The user must set up an io_region structure to describe the desired location
of the I/O. Each io_region indicates a block-device along with the starting
sector and size of the region.
sector and size of the region::
struct io_region {
struct block_device *bdev;
@@ -19,7 +20,7 @@ Dm-io can read from one io_region or write to one or more io_regions. Writes
to multiple regions are specified by an array of io_region structures.
The first I/O service type takes a list of memory pages as the data buffer for
the I/O, along with an offset into the first page.
the I/O, along with an offset into the first page::
struct page_list {
struct page_list *next;
@@ -35,7 +36,7 @@ the I/O, along with an offset into the first page.
The second I/O service type takes an array of bio vectors as the data buffer
for the I/O. This service can be handy if the caller has a pre-assembled bio,
but wants to direct different portions of the bio to different devices.
but wants to direct different portions of the bio to different devices::
int dm_io_sync_bvec(unsigned int num_regions, struct io_region *where,
int rw, struct bio_vec *bvec,
@@ -47,7 +48,7 @@ but wants to direct different portions of the bio to different devices.
The third I/O service type takes a pointer to a vmalloc'd memory buffer as the
data buffer for the I/O. This service can be handy if the caller needs to do
I/O to a large region but doesn't want to allocate a large number of individual
memory pages.
memory pages::
int dm_io_sync_vm(unsigned int num_regions, struct io_region *where, int rw,
void *data, unsigned long *error_bits);
@@ -55,11 +56,11 @@ memory pages.
void *data, io_notify_fn fn, void *context);
Callers of the asynchronous I/O services must include the name of a completion
callback routine and a pointer to some context data for the I/O.
callback routine and a pointer to some context data for the I/O::
typedef void (*io_notify_fn)(unsigned long error, void *context);
The "error" parameter in this callback, as well as the "*error" parameter in
The "error" parameter in this callback, as well as the `*error` parameter in
all of the synchronous versions, is a bitset (instead of a simple error value).
In the case of an write-I/O to multiple regions, this bitset allows dm-io to
indicate success or failure on each individual region.
@@ -72,4 +73,3 @@ always available in order to avoid unnecessary waiting while performing I/O.
When the user is finished using the dm-io services, they should call
dm_io_put() and specify the same number of pages that were given on the
dm_io_get() call.
@@ -1,3 +1,4 @@
=====================
Device-Mapper Logging
=====================
The device-mapper logging code is used by some of the device-mapper
@@ -16,11 +17,13 @@ dm_dirty_log_type in include/linux/dm-dirty-log.h). Various different
logging implementations are available and provide different
capabilities. The list includes:
============== ==============================================================
Type Files
==== =====
============== ==============================================================
disk drivers/md/dm-log.c
core drivers/md/dm-log.c
userspace drivers/md/dm-log-userspace* include/linux/dm-log-userspace.h
============== ==============================================================
The "disk" log type
-------------------
@@ -1,3 +1,4 @@
===============
dm-queue-length
===============
@@ -6,12 +7,18 @@ which selects a path with the least number of in-flight I/Os.
The path selector name is 'queue-length'.
Table parameters for each path: [<repeat_count>]
::
<repeat_count>: The number of I/Os to dispatch using the selected
path before switching to the next path.
If not given, internal default is used. To check
the default value, see the activated table.
Status for each path: <status> <fail-count> <in-flight>
::
<status>: 'A' if the path is active, 'F' if the path is failed.
<fail-count>: The number of path failures.
<in-flight>: The number of in-flight I/Os on the path.
@@ -29,11 +36,13 @@ Examples
========
In case that 2 paths (sda and sdb) are used with repeat_count == 128.
# echo "0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128" \
dmsetup create test
#
# dmsetup table
test: 0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128
#
# dmsetup status
test: 0 10 multipath 2 0 0 0 1 1 E 0 2 1 8:0 A 0 0 8:16 A 0 0
::
# echo "0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128" \
dmsetup create test
#
# dmsetup table
test: 0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128
#
# dmsetup status
test: 0 10 multipath 2 0 0 0 1 1 E 0 2 1 8:0 A 0 0 8:16 A 0 0
@@ -1,3 +1,4 @@
=======
dm-raid
=======
@@ -8,49 +9,66 @@ interface.
Mapping Table Interface
-----------------------
The target is named "raid" and it accepts the following parameters:
The target is named "raid" and it accepts the following parameters::
<raid_type> <#raid_params> <raid_params> \
<#raid_devs> <metadata_dev0> <dev0> [.. <metadata_devN> <devN>]
<raid_type>:
============= ===============================================================
raid0 RAID0 striping (no resilience)
raid1 RAID1 mirroring
raid4 RAID4 with dedicated last parity disk
raid5_n RAID5 with dedicated last parity disk supporting takeover
Same as raid4
-Transitory layout
- Transitory layout
raid5_la RAID5 left asymmetric
- rotating parity 0 with data continuation
raid5_ra RAID5 right asymmetric
- rotating parity N with data continuation
raid5_ls RAID5 left symmetric
- rotating parity 0 with data restart
raid5_rs RAID5 right symmetric
- rotating parity N with data restart
raid6_zr RAID6 zero restart
- rotating parity zero (left-to-right) with data restart
raid6_nr RAID6 N restart
- rotating parity N (right-to-left) with data restart
raid6_nc RAID6 N continue
- rotating parity N (right-to-left) with data continuation
raid6_n_6 RAID6 with dedicate parity disks
- parity and Q-syndrome on the last 2 disks;
layout for takeover from/to raid4/raid5_n
raid6_la_6 Same as "raid_la" plus dedicated last Q-syndrome disk
- layout for takeover from raid5_la from/to raid6
raid6_ra_6 Same as "raid5_ra" dedicated last Q-syndrome disk
- layout for takeover from raid5_ra from/to raid6
raid6_ls_6 Same as "raid5_ls" dedicated last Q-syndrome disk
- layout for takeover from raid5_ls from/to raid6
raid6_rs_6 Same as "raid5_rs" dedicated last Q-syndrome disk
- layout for takeover from raid5_rs from/to raid6
raid10 Various RAID10 inspired algorithms chosen by additional params
(see raid10_format and raid10_copies below)
- RAID10: Striped Mirrors (aka 'Striping on top of mirrors')
- RAID1E: Integrated Adjacent Stripe Mirroring
- RAID1E: Integrated Offset Stripe Mirroring
- and other similar RAID10 variants
- and other similar RAID10 variants
============= ===============================================================
Reference: Chapter 4 of
http://www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf
@@ -58,33 +76,41 @@ The target is named "raid" and it accepts the following parameters:
<#raid_params>: The number of parameters that follow.
<raid_params> consists of
Mandatory parameters:
<chunk_size>: Chunk size in sectors. This parameter is often known as
<chunk_size>:
Chunk size in sectors. This parameter is often known as
"stripe size". It is the only mandatory parameter and
is placed first.
followed by optional parameters (in any order):
[sync|nosync] Force or prevent RAID initialization.
[sync|nosync]
Force or prevent RAID initialization.
[rebuild <idx>] Rebuild drive number 'idx' (first drive is 0).
[rebuild <idx>]
Rebuild drive number 'idx' (first drive is 0).
[daemon_sleep <ms>]
Interval between runs of the bitmap daemon that
clear bits. A longer interval means less bitmap I/O but
resyncing after a failure is likely to take longer.
[min_recovery_rate <kB/sec/disk>] Throttle RAID initialization
[max_recovery_rate <kB/sec/disk>] Throttle RAID initialization
[write_mostly <idx>] Mark drive index 'idx' write-mostly.
[max_write_behind <sectors>] See '--write-behind=' (man mdadm)
[stripe_cache <sectors>] Stripe cache size (RAID 4/5/6 only)
[min_recovery_rate <kB/sec/disk>]
Throttle RAID initialization
[max_recovery_rate <kB/sec/disk>]
Throttle RAID initialization
[write_mostly <idx>]
Mark drive index 'idx' write-mostly.
[max_write_behind <sectors>]
See '--write-behind=' (man mdadm)
[stripe_cache <sectors>]
Stripe cache size (RAID 4/5/6 only)
[region_size <sectors>]
The region_size multiplied by the number of regions is the
logical size of the array. The bitmap records the device
synchronisation state for each region.
[raid10_copies <# copies>]
[raid10_format <near|far|offset>]
[raid10_copies <# copies>], [raid10_format <near|far|offset>]
These two options are used to alter the default layout of
a RAID10 configuration. The number of copies is can be
specified, but the default is 2. There are also three
@@ -93,13 +119,17 @@ The target is named "raid" and it accepts the following parameters:
respect to mirroring. If these options are left unspecified,
or 'raid10_copies 2' and/or 'raid10_format near' are given,
then the layouts for 2, 3 and 4 devices are:
======== ========== ==============
2 drives 3 drives 4 drives
-------- ---------- --------------
======== ========== ==============
A1 A1 A1 A1 A2 A1 A1 A2 A2
A2 A2 A2 A3 A3 A3 A3 A4 A4
A3 A3 A4 A4 A5 A5 A5 A6 A6
A4 A4 A5 A6 A6 A7 A7 A8 A8
.. .. .. .. .. .. .. .. ..
======== ========== ==============
The 2-device layout is equivalent 2-way RAID1. The 4-device
layout is what a traditional RAID10 would look like. The
3-device layout is what might be called a 'RAID1E - Integrated
@@ -107,8 +137,10 @@ The target is named "raid" and it accepts the following parameters:
If 'raid10_copies 2' and 'raid10_format far', then the layouts
for 2, 3 and 4 devices are:
======== ============ ===================
2 drives 3 drives 4 drives
-------- -------------- --------------------
======== ============ ===================
A1 A2 A1 A2 A3 A1 A2 A3 A4
A3 A4 A4 A5 A6 A5 A6 A7 A8
A5 A6 A7 A8 A9 A9 A10 A11 A12
@@ -117,11 +149,14 @@ The target is named "raid" and it accepts the following parameters:
A4 A3 A6 A4 A5 A6 A5 A8 A7
A6 A5 A9 A7 A8 A10 A9 A12 A11
.. .. .. .. .. .. .. .. ..
======== ============ ===================
If 'raid10_copies 2' and 'raid10_format offset', then the
layouts for 2, 3 and 4 devices are:
======== ========== ================
2 drives 3 drives 4 drives
-------- ------------ -----------------
======== ========== ================
A1 A2 A1 A2 A3 A1 A2 A3 A4
A2 A1 A3 A1 A2 A2 A1 A4 A3
A3 A4 A4 A5 A6 A5 A6 A7 A8
@@ -129,6 +164,8 @@ The target is named "raid" and it accepts the following parameters:
A5 A6 A7 A8 A9 A9 A10 A11 A12
A6 A5 A9 A7 A8 A10 A9 A12 A11
.. .. .. .. .. .. .. .. ..
======== ========== ================
Here we see layouts closely akin to 'RAID1E - Integrated
Offset Stripe Mirroring'.
@@ -190,22 +227,25 @@ The target is named "raid" and it accepts the following parameters:
Example Tables
--------------
# RAID4 - 4 data drives, 1 parity (no metadata devices)
# No metadata devices specified to hold superblock/bitmap info
# Chunk size of 1MiB
# (Lines separated for easy reading)
0 1960893648 raid \
raid4 1 2048 \
5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81
::
# RAID4 - 4 data drives, 1 parity (with metadata devices)
# Chunk size of 1MiB, force RAID initialization,
# min recovery rate at 20 kiB/sec/disk
# RAID4 - 4 data drives, 1 parity (no metadata devices)
# No metadata devices specified to hold superblock/bitmap info
# Chunk size of 1MiB
# (Lines separated for easy reading)
0 1960893648 raid \
raid4 4 2048 sync min_recovery_rate 20 \
5 8:17 8:18 8:33 8:34 8:49 8:50 8:65 8:66 8:81 8:82
0 1960893648 raid \
raid4 1 2048 \
5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81
# RAID4 - 4 data drives, 1 parity (with metadata devices)
# Chunk size of 1MiB, force RAID initialization,
# min recovery rate at 20 kiB/sec/disk
0 1960893648 raid \
raid4 4 2048 sync min_recovery_rate 20 \
5 8:17 8:18 8:33 8:34 8:49 8:50 8:65 8:66 8:81 8:82
Status Output
@@ -219,41 +259,58 @@ Arguments that can be repeated are ordered by value.
'dmsetup status' yields information on the state and health of the array.
The output is as follows (normally a single line, but expanded here for
clarity):
1: <s> <l> raid \
2: <raid_type> <#devices> <health_chars> \
3: <sync_ratio> <sync_action> <mismatch_cnt>
clarity)::
1: <s> <l> raid \
2: <raid_type> <#devices> <health_chars> \
3: <sync_ratio> <sync_action> <mismatch_cnt>
Line 1 is the standard output produced by device-mapper.
Line 2 & 3 are produced by the raid target and are best explained by example:
Line 2 & 3 are produced by the raid target and are best explained by example::
0 1960893648 raid raid4 5 AAAAA 2/490221568 init 0
Here we can see the RAID type is raid4, there are 5 devices - all of
which are 'A'live, and the array is 2/490221568 complete with its initial
recovery. Here is a fuller description of the individual fields:
=============== =========================================================
<raid_type> Same as the <raid_type> used to create the array.
<health_chars> One char for each device, indicating: 'A' = alive and
in-sync, 'a' = alive but not in-sync, 'D' = dead/failed.
<health_chars> One char for each device, indicating:
- 'A' = alive and in-sync
- 'a' = alive but not in-sync
- 'D' = dead/failed.
<sync_ratio> The ratio indicating how much of the array has undergone
the process described by 'sync_action'. If the
'sync_action' is "check" or "repair", then the process
of "resync" or "recover" can be considered complete.
<sync_action> One of the following possible states:
idle - No synchronization action is being performed.
frozen - The current action has been halted.
resync - Array is undergoing its initial synchronization
idle
- No synchronization action is being performed.
frozen
- The current action has been halted.
resync
- Array is undergoing its initial synchronization
or is resynchronizing after an unclean shutdown
(possibly aided by a bitmap).
recover - A device in the array is being rebuilt or
recover
- A device in the array is being rebuilt or
replaced.
check - A user-initiated full check of the array is
check
- A user-initiated full check of the array is
being performed. All blocks are read and
checked for consistency. The number of
discrepancies found are recorded in
<mismatch_cnt>. No changes are made to the
array by this action.
repair - The same as "check", but discrepancies are
repair
- The same as "check", but discrepancies are
corrected.
reshape - The array is undergoing a reshape.
reshape
- The array is undergoing a reshape.
<mismatch_cnt> The number of discrepancies found between mirror copies
in RAID1/10 or wrong parity values found in RAID4/5/6.
This value is valid only after a "check" of the array
@@ -261,10 +318,11 @@ recovery. Here is a fuller description of the individual fields:
<data_offset> The current data offset to the start of the user data on
each component device of a raid set (see the respective
raid parameter to support out-of-place reshaping).
<journal_char> 'A' - active write-through journal device.
'a' - active write-back journal device.
'D' - dead journal device.
'-' - no journal device.
<journal_char> - 'A' - active write-through journal device.
- 'a' - active write-back journal device.
- 'D' - dead journal device.
- '-' - no journal device.
=============== =========================================================
Message Interface
@@ -272,12 +330,15 @@ Message Interface
The dm-raid target will accept certain actions through the 'message' interface.
('man dmsetup' for more information on the message interface.) These actions
include:
"idle" - Halt the current sync action.
"frozen" - Freeze the current sync action.
"resync" - Initiate/continue a resync.
"recover"- Initiate/continue a recover process.
"check" - Initiate a check (i.e. a "scrub") of the array.
"repair" - Initiate a repair of the array.
========= ================================================
"idle" Halt the current sync action.
"frozen" Freeze the current sync action.
"resync" Initiate/continue a resync.
"recover" Initiate/continue a recover process.
"check" Initiate a check (i.e. a "scrub") of the array.
"repair" Initiate a repair of the array.
========= ================================================
Discard Support
@@ -307,48 +368,52 @@ increasingly whitelisted in the kernel and can thus be trusted.
For trusted devices, the following dm-raid module parameter can be set
to safely enable discard support for RAID 4/5/6:
'devices_handle_discards_safely'
Version History
---------------
1.0.0 Initial version. Support for RAID 4/5/6
1.1.0 Added support for RAID 1
1.2.0 Handle creation of arrays that contain failed devices.
1.3.0 Added support for RAID 10
1.3.1 Allow device replacement/rebuild for RAID 10
1.3.2 Fix/improve redundancy checking for RAID10
1.4.0 Non-functional change. Removes arg from mapping function.
1.4.1 RAID10 fix redundancy validation checks (commit 55ebbb5).
1.4.2 Add RAID10 "far" and "offset" algorithm support.
1.5.0 Add message interface to allow manipulation of the sync_action.
::
1.0.0 Initial version. Support for RAID 4/5/6
1.1.0 Added support for RAID 1
1.2.0 Handle creation of arrays that contain failed devices.
1.3.0 Added support for RAID 10
1.3.1 Allow device replacement/rebuild for RAID 10
1.3.2 Fix/improve redundancy checking for RAID10
1.4.0 Non-functional change. Removes arg from mapping function.
1.4.1 RAID10 fix redundancy validation checks (commit 55ebbb5).
1.4.2 Add RAID10 "far" and "offset" algorithm support.
1.5.0 Add message interface to allow manipulation of the sync_action.
New status (STATUSTYPE_INFO) fields: sync_action and mismatch_cnt.
1.5.1 Add ability to restore transiently failed devices on resume.
1.5.2 'mismatch_cnt' is zero unless [last_]sync_action is "check".
1.6.0 Add discard support (and devices_handle_discard_safely module param).
1.7.0 Add support for MD RAID0 mappings.
1.8.0 Explicitly check for compatible flags in the superblock metadata
1.5.1 Add ability to restore transiently failed devices on resume.
1.5.2 'mismatch_cnt' is zero unless [last_]sync_action is "check".
1.6.0 Add discard support (and devices_handle_discard_safely module param).
1.7.0 Add support for MD RAID0 mappings.
1.8.0 Explicitly check for compatible flags in the superblock metadata
and reject to start the raid set if any are set by a newer
target version, thus avoiding data corruption on a raid set
with a reshape in progress.
1.9.0 Add support for RAID level takeover/reshape/region size
1.9.0 Add support for RAID level takeover/reshape/region size
and set size reduction.
1.9.1 Fix activation of existing RAID 4/10 mapped devices
1.9.2 Don't emit '- -' on the status table line in case the constructor
1.9.1 Fix activation of existing RAID 4/10 mapped devices
1.9.2 Don't emit '- -' on the status table line in case the constructor
fails reading a superblock. Correctly emit 'maj:min1 maj:min2' and
'D' on the status line. If '- -' is passed into the constructor, emit
'- -' on the table line and '-' as the status line health character.
1.10.0 Add support for raid4/5/6 journal device
1.10.1 Fix data corruption on reshape request
1.11.0 Fix table line argument order
1.10.0 Add support for raid4/5/6 journal device
1.10.1 Fix data corruption on reshape request
1.11.0 Fix table line argument order
(wrong raid10_copies/raid10_format sequence)
1.11.1 Add raid4/5/6 journal write-back support via journal_mode option
1.12.1 Fix for MD deadlock between mddev_suspend() and md_write_start() available
1.13.0 Fix dev_health status at end of "recover" (was 'a', now 'A')
1.13.1 Fix deadlock caused by early md_stop_writes(). Also fix size an
1.11.1 Add raid4/5/6 journal write-back support via journal_mode option
1.12.1 Fix for MD deadlock between mddev_suspend() and md_write_start() available
1.13.0 Fix dev_health status at end of "recover" (was 'a', now 'A')
1.13.1 Fix deadlock caused by early md_stop_writes(). Also fix size an
state races.
1.13.2 Fix raid redundancy validation and avoid keeping raid set frozen
1.14.0 Fix reshape race on small devices. Fix stripe adding reshape
1.13.2 Fix raid redundancy validation and avoid keeping raid set frozen
1.14.0 Fix reshape race on small devices. Fix stripe adding reshape
deadlock/potential data corruption. Update superblock when
specific devices are requested via rebuild. Fix RAID leg
rebuild errors.
@@ -1,3 +1,4 @@
===============
dm-service-time
===============
@@ -12,25 +13,34 @@ in a path-group, and it can be specified as a table argument.
The path selector name is 'service-time'.
Table parameters for each path: [<repeat_count> [<relative_throughput>]]
<repeat_count>: The number of I/Os to dispatch using the selected
Table parameters for each path:
[<repeat_count> [<relative_throughput>]]
<repeat_count>:
The number of I/Os to dispatch using the selected
path before switching to the next path.
If not given, internal default is used. To check
the default value, see the activated table.
<relative_throughput>: The relative throughput value of the path
<relative_throughput>:
The relative throughput value of the path
among all paths in the path-group.
The valid range is 0-100.
If not given, minimum value '1' is used.
If '0' is given, the path isn't selected while
other paths having a positive value are available.
Status for each path: <status> <fail-count> <in-flight-size> \
<relative_throughput>
<status>: 'A' if the path is active, 'F' if the path is failed.
<fail-count>: The number of path failures.
<in-flight-size>: The size of in-flight I/Os on the path.
<relative_throughput>: The relative throughput value of the path
among all paths in the path-group.
Status for each path:
<status> <fail-count> <in-flight-size> <relative_throughput>
<status>:
'A' if the path is active, 'F' if the path is failed.
<fail-count>:
The number of path failures.
<in-flight-size>:
The size of in-flight I/Os on the path.
<relative_throughput>:
The relative throughput value of the path
among all paths in the path-group.
Algorithm
@@ -39,7 +49,7 @@ Algorithm
dm-service-time adds the I/O size to 'in-flight-size' when the I/O is
dispatched and subtracts when completed.
Basically, dm-service-time selects a path having minimum service time
which is calculated by:
which is calculated by::
('in-flight-size' + 'size-of-incoming-io') / 'relative_throughput'
@@ -67,25 +77,25 @@ Examples
========
In case that 2 paths (sda and sdb) are used with repeat_count == 128
and sda has an average throughput 1GB/s and sdb has 4GB/s,
'relative_throughput' value may be '1' for sda and '4' for sdb.
'relative_throughput' value may be '1' for sda and '4' for sdb::
# echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4" \
dmsetup create test
#
# dmsetup table
test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4
#
# dmsetup status
test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 1 8:16 A 0 0 4
# echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4" \
dmsetup create test
#
# dmsetup table
test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4
#
# dmsetup status
test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 1 8:16 A 0 0 4
Or '2' for sda and '8' for sdb would be also true.
Or '2' for sda and '8' for sdb would be also true::
# echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8" \
dmsetup create test
#
# dmsetup table
test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8
#
# dmsetup status
test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 2 8:16 A 0 0 8
# echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8" \
dmsetup create test
#
# dmsetup table
test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8
#
# dmsetup status
test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 2 8:16 A 0 0 8
+110
View File
@@ -0,0 +1,110 @@
====================
device-mapper uevent
====================
The device-mapper uevent code adds the capability to device-mapper to create
and send kobject uevents (uevents). Previously device-mapper events were only
available through the ioctl interface. The advantage of the uevents interface
is the event contains environment attributes providing increased context for
the event avoiding the need to query the state of the device-mapper device after
the event is received.
There are two functions currently for device-mapper events. The first function
listed creates the event and the second function sends the event(s)::
void dm_path_uevent(enum dm_uevent_type event_type, struct dm_target *ti,
const char *path, unsigned nr_valid_paths)
void dm_send_uevents(struct list_head *events, struct kobject *kobj)
The variables added to the uevent environment are:
Variable Name: DM_TARGET
------------------------
:Uevent Action(s): KOBJ_CHANGE
:Type: string
:Description:
:Value: Name of device-mapper target that generated the event.
Variable Name: DM_ACTION
------------------------
:Uevent Action(s): KOBJ_CHANGE
:Type: string
:Description:
:Value: Device-mapper specific action that caused the uevent action.
PATH_FAILED - A path has failed;
PATH_REINSTATED - A path has been reinstated.
Variable Name: DM_SEQNUM
------------------------
:Uevent Action(s): KOBJ_CHANGE
:Type: unsigned integer
:Description: A sequence number for this specific device-mapper device.
:Value: Valid unsigned integer range.
Variable Name: DM_PATH
----------------------
:Uevent Action(s): KOBJ_CHANGE
:Type: string
:Description: Major and minor number of the path device pertaining to this
event.
:Value: Path name in the form of "Major:Minor"
Variable Name: DM_NR_VALID_PATHS
--------------------------------
:Uevent Action(s): KOBJ_CHANGE
:Type: unsigned integer
:Description:
:Value: Valid unsigned integer range.
Variable Name: DM_NAME
----------------------
:Uevent Action(s): KOBJ_CHANGE
:Type: string
:Description: Name of the device-mapper device.
:Value: Name
Variable Name: DM_UUID
----------------------
:Uevent Action(s): KOBJ_CHANGE
:Type: string
:Description: UUID of the device-mapper device.
:Value: UUID. (Empty string if there isn't one.)
An example of the uevents generated as captured by udevmonitor is shown
below
1.) Path failure::
UEVENT[1192521009.711215] change@/block/dm-3
ACTION=change
DEVPATH=/block/dm-3
SUBSYSTEM=block
DM_TARGET=multipath
DM_ACTION=PATH_FAILED
DM_SEQNUM=1
DM_PATH=8:32
DM_NR_VALID_PATHS=0
DM_NAME=mpath2
DM_UUID=mpath-35333333000002328
MINOR=3
MAJOR=253
SEQNUM=1130
2.) Path reinstate::
UEVENT[1192521132.989927] change@/block/dm-3
ACTION=change
DEVPATH=/block/dm-3
SUBSYSTEM=block
DM_TARGET=multipath
DM_ACTION=PATH_REINSTATED
DM_SEQNUM=2
DM_PATH=8:32
DM_NR_VALID_PATHS=1
DM_NAME=mpath2
DM_UUID=mpath-35333333000002328
MINOR=3
MAJOR=253
SEQNUM=1131
-97
View File
@@ -1,97 +0,0 @@
The device-mapper uevent code adds the capability to device-mapper to create
and send kobject uevents (uevents). Previously device-mapper events were only
available through the ioctl interface. The advantage of the uevents interface
is the event contains environment attributes providing increased context for
the event avoiding the need to query the state of the device-mapper device after
the event is received.
There are two functions currently for device-mapper events. The first function
listed creates the event and the second function sends the event(s).
void dm_path_uevent(enum dm_uevent_type event_type, struct dm_target *ti,
const char *path, unsigned nr_valid_paths)
void dm_send_uevents(struct list_head *events, struct kobject *kobj)
The variables added to the uevent environment are:
Variable Name: DM_TARGET
Uevent Action(s): KOBJ_CHANGE
Type: string
Description:
Value: Name of device-mapper target that generated the event.
Variable Name: DM_ACTION
Uevent Action(s): KOBJ_CHANGE
Type: string
Description:
Value: Device-mapper specific action that caused the uevent action.
PATH_FAILED - A path has failed.
PATH_REINSTATED - A path has been reinstated.
Variable Name: DM_SEQNUM
Uevent Action(s): KOBJ_CHANGE
Type: unsigned integer
Description: A sequence number for this specific device-mapper device.
Value: Valid unsigned integer range.
Variable Name: DM_PATH
Uevent Action(s): KOBJ_CHANGE
Type: string
Description: Major and minor number of the path device pertaining to this
event.
Value: Path name in the form of "Major:Minor"
Variable Name: DM_NR_VALID_PATHS
Uevent Action(s): KOBJ_CHANGE
Type: unsigned integer
Description:
Value: Valid unsigned integer range.
Variable Name: DM_NAME
Uevent Action(s): KOBJ_CHANGE
Type: string
Description: Name of the device-mapper device.
Value: Name
Variable Name: DM_UUID
Uevent Action(s): KOBJ_CHANGE
Type: string
Description: UUID of the device-mapper device.
Value: UUID. (Empty string if there isn't one.)
An example of the uevents generated as captured by udevmonitor is shown
below.
1.) Path failure.
UEVENT[1192521009.711215] change@/block/dm-3
ACTION=change
DEVPATH=/block/dm-3
SUBSYSTEM=block
DM_TARGET=multipath
DM_ACTION=PATH_FAILED
DM_SEQNUM=1
DM_PATH=8:32
DM_NR_VALID_PATHS=0
DM_NAME=mpath2
DM_UUID=mpath-35333333000002328
MINOR=3
MAJOR=253
SEQNUM=1130
2.) Path reinstate.
UEVENT[1192521132.989927] change@/block/dm-3
ACTION=change
DEVPATH=/block/dm-3
SUBSYSTEM=block
DM_TARGET=multipath
DM_ACTION=PATH_REINSTATED
DM_SEQNUM=2
DM_PATH=8:32
DM_NR_VALID_PATHS=1
DM_NAME=mpath2
DM_UUID=mpath-35333333000002328
MINOR=3
MAJOR=253
SEQNUM=1131
@@ -1,3 +1,4 @@
========
dm-zoned
========
@@ -133,12 +134,13 @@ A zoned block device must first be formatted using the dmzadm tool. This
will analyze the device zone configuration, determine where to place the
metadata sets on the device and initialize the metadata sets.
Ex:
Ex::
dmzadm --format /dev/sdxx
dmzadm --format /dev/sdxx
For a formatted device, the target can be created normally with the
dmsetup utility. The only parameter that dm-zoned requires is the
underlying zoned block device name. Ex:
underlying zoned block device name. Ex::
echo "0 `blockdev --getsize ${dev}` zoned ${dev}" | dmsetup create dmz-`basename ${dev}`
echo "0 `blockdev --getsize ${dev}` zoned ${dev}" | \
dmsetup create dmz-`basename ${dev}`
@@ -1,3 +1,7 @@
======
dm-era
======
Introduction
============
@@ -14,12 +18,14 @@ coherency after rolling back a vendor snapshot.
Constructor
===========
era <metadata dev> <origin dev> <block size>
era <metadata dev> <origin dev> <block size>
metadata dev : fast device holding the persistent metadata
origin dev : device holding data blocks that may change
block size : block size of origin data device, granularity that is
tracked by the target
================ ======================================================
metadata dev fast device holding the persistent metadata
origin dev device holding data blocks that may change
block size block size of origin data device, granularity that is
tracked by the target
================ ======================================================
Messages
========
@@ -49,14 +55,16 @@ Status
<metadata block size> <#used metadata blocks>/<#total metadata blocks>
<current era> <held metadata root | '-'>
metadata block size : Fixed block size for each metadata block in
sectors
#used metadata blocks : Number of metadata blocks used
#total metadata blocks : Total number of metadata blocks
current era : The current era
held metadata root : The location, in blocks, of the metadata root
that has been 'held' for userspace read
access. '-' indicates there is no held root
========================= ==============================================
metadata block size Fixed block size for each metadata block in
sectors
#used metadata blocks Number of metadata blocks used
#total metadata blocks Total number of metadata blocks
current era The current era
held metadata root The location, in blocks, of the metadata root
that has been 'held' for userspace read
access. '-' indicates there is no held root
========================= ==============================================
Detailed use case
=================
@@ -88,7 +96,7 @@ Memory usage
The target uses a bitset to record writes in the current era. It also
has a spare bitset ready for switching over to a new era. Other than
that it uses a few 4k blocks for updating metadata.
that it uses a few 4k blocks for updating metadata::
(4 * nr_blocks) bytes + buffers
+44
View File
@@ -0,0 +1,44 @@
:orphan:
=============
Device Mapper
=============
.. toctree::
:maxdepth: 1
cache-policies
cache
delay
dm-crypt
dm-flakey
dm-init
dm-integrity
dm-io
dm-log
dm-queue-length
dm-raid
dm-service-time
dm-uevent
dm-zoned
era
kcopyd
linear
log-writes
persistent-data
snapshot
statistics
striped
switch
thin-provisioning
unstriped
verity
writecache
zero
.. only:: subproject and html
Indices
=======
* :ref:`genindex`
@@ -1,3 +1,4 @@
======
kcopyd
======
@@ -7,7 +8,7 @@ notification. It is used by dm-snapshot and dm-mirror.
Users of kcopyd must first create a client and indicate how many memory pages
to set aside for their copy jobs. This is done with a call to
kcopyd_client_create().
kcopyd_client_create()::
int kcopyd_client_create(unsigned int num_pages,
struct kcopyd_client **result);
@@ -16,7 +17,7 @@ To start a copy job, the user must set up io_region structures to describe
the source and destinations of the copy. Each io_region indicates a
block-device along with the starting sector and size of the region. The source
of the copy is given as one io_region structure, and the destinations of the
copy are given as an array of io_region structures.
copy are given as an array of io_region structures::
struct io_region {
struct block_device *bdev;
@@ -26,7 +27,7 @@ copy are given as an array of io_region structures.
To start the copy, the user calls kcopyd_copy(), passing in the client
pointer, pointers to the source and destination io_regions, the name of a
completion callback routine, and a pointer to some context data for the copy.
completion callback routine, and a pointer to some context data for the copy::
int kcopyd_copy(struct kcopyd_client *kc, struct io_region *from,
unsigned int num_dests, struct io_region *dests,
@@ -41,7 +42,6 @@ write error occurred during the copy.
When a user is done with all their copy jobs, they should call
kcopyd_client_destroy() to delete the kcopyd client, which will release the
associated memory pages.
associated memory pages::
void kcopyd_client_destroy(struct kcopyd_client *kc);
+63
View File
@@ -0,0 +1,63 @@
=========
dm-linear
=========
Device-Mapper's "linear" target maps a linear range of the Device-Mapper
device onto a linear range of another device. This is the basic building
block of logical volume managers.
Parameters: <dev path> <offset>
<dev path>:
Full pathname to the underlying block-device, or a
"major:minor" device-number.
<offset>:
Starting sector within the device.
Example scripts
===============
::
#!/bin/sh
# Create an identity mapping for a device
echo "0 `blockdev --getsz $1` linear $1 0" | dmsetup create identity
::
#!/bin/sh
# Join 2 devices together
size1=`blockdev --getsz $1`
size2=`blockdev --getsz $2`
echo "0 $size1 linear $1 0
$size1 $size2 linear $2 0" | dmsetup create joined
::
#!/usr/bin/perl -w
# Split a device into 4M chunks and then join them together in reverse order.
my $name = "reverse";
my $extent_size = 4 * 1024 * 2;
my $dev = $ARGV[0];
my $table = "";
my $count = 0;
if (!defined($dev)) {
die("Please specify a device.\n");
}
my $dev_size = `blockdev --getsz $dev`;
my $extents = int($dev_size / $extent_size) -
(($dev_size % $extent_size) ? 1 : 0);
while ($extents > 0) {
my $this_start = $count * $extent_size;
$extents--;
$count++;
my $this_offset = $extents * $extent_size;
$table .= "$this_start $extent_size linear $dev $this_offset\n";
}
`echo \"$table\" | dmsetup create $name`;
-61
View File
@@ -1,61 +0,0 @@
dm-linear
=========
Device-Mapper's "linear" target maps a linear range of the Device-Mapper
device onto a linear range of another device. This is the basic building
block of logical volume managers.
Parameters: <dev path> <offset>
<dev path>: Full pathname to the underlying block-device, or a
"major:minor" device-number.
<offset>: Starting sector within the device.
Example scripts
===============
[[
#!/bin/sh
# Create an identity mapping for a device
echo "0 `blockdev --getsz $1` linear $1 0" | dmsetup create identity
]]
[[
#!/bin/sh
# Join 2 devices together
size1=`blockdev --getsz $1`
size2=`blockdev --getsz $2`
echo "0 $size1 linear $1 0
$size1 $size2 linear $2 0" | dmsetup create joined
]]
[[
#!/usr/bin/perl -w
# Split a device into 4M chunks and then join them together in reverse order.
my $name = "reverse";
my $extent_size = 4 * 1024 * 2;
my $dev = $ARGV[0];
my $table = "";
my $count = 0;
if (!defined($dev)) {
die("Please specify a device.\n");
}
my $dev_size = `blockdev --getsz $dev`;
my $extents = int($dev_size / $extent_size) -
(($dev_size % $extent_size) ? 1 : 0);
while ($extents > 0) {
my $this_start = $count * $extent_size;
$extents--;
$count++;
my $this_offset = $extents * $extent_size;
$table .= "$this_start $extent_size linear $dev $this_offset\n";
}
`echo \"$table\" | dmsetup create $name`;
]]

Some files were not shown because too many files have changed in this diff Show More