'GPL-2.0-only' is used instead of 'GPL-2.0' because SPDX has
deprecated the latter identifier.
Suggested-by: John Wiele <jwiele@redhat.com>
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Add a flags argument to dm_bufio_client_create() and update all the
callers. This is in preparation for adding the DM_BUFIO_NO_SLEEP flag.
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Corrupted metadata could warrant returning error from sm_ll_lookup_bitmap().
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Abuse of BUG_ON() is never appropriate; it is best to propagate errors
and fail gracefully (rather than take the entire system down).
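A minimal user-space sketch of the pattern, not the kernel code: the
struct and function names below are hypothetical. An out-of-range index
is treated as a sign of corrupted input and reported with -EINVAL so the
caller can fail the operation instead of halting.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a bitmap of reference counts. */
struct demo_bitmap {
	uint32_t nr_entries;
	const uint8_t *counts;	/* one count per entry */
};

/*
 * Look up the count for entry 'index'.  Bad input is reported to the
 * caller with -EINVAL rather than asserted on; *result is only written
 * on success.
 */
int demo_bitmap_lookup(const struct demo_bitmap *bm, uint32_t index,
		       uint8_t *result)
{
	if (bm == NULL || bm->counts == NULL || index >= bm->nr_entries)
		return -EINVAL;

	*result = bm->counts[index];
	return 0;
}
```

Callers check the return value and unwind, which keeps a single bad
block from taking down unrelated work.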
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Should have been removed as part of commit f73e2e70ec ("dm btree
spine: remove paranoid node_check call in node_prep_for_write()")
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
remove_raw() in dm_btree_remove() may fail due to an IO read error
(e.g. reading the content of the origin block fails during shadowing),
leaving the value of shadow_spine::root uninitialized. The
uninitialized value is nevertheless assigned to new_root at the
end of dm_btree_remove().
For dm-thin, the value of pmd->details_root or pmd->root then becomes
an uninitialized value, so trying to read the details_info tree again
may cause an out-of-bounds memory access, as shown below:
general protection fault, probably for non-canonical address 0x3fdcb14c8d7520
CPU: 4 PID: 515 Comm: dmsetup Not tainted 5.13.0-rc6
Hardware name: QEMU Standard PC
RIP: 0010:metadata_ll_load_ie+0x14/0x30
Call Trace:
sm_metadata_count_is_more_than_one+0xb9/0xe0
dm_tm_shadow_block+0x52/0x1c0
shadow_step+0x59/0xf0
remove_raw+0xb2/0x170
dm_btree_remove+0xf4/0x1c0
dm_pool_delete_thin_device+0xc3/0x140
pool_message+0x218/0x2b0
target_message+0x251/0x290
ctl_ioctl+0x1c4/0x4d0
dm_ctl_ioctl+0xe/0x20
__x64_sys_ioctl+0x7b/0xb0
do_syscall_64+0x40/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fix it by only assigning new_root when removal succeeds.
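The shape of the fix, as a user-space sketch with hypothetical names
(demo_spine, demo_remove_raw and demo_btree_remove are illustrative
stand-ins, not the kernel functions): the output root is written only
when the removal step returns 0.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the spine that tracks the current root. */
struct demo_spine {
	uint64_t root;	/* only meaningful once a step has succeeded */
};

/* Hypothetical removal step; fails with -EIO when asked to. */
int demo_remove_raw(struct demo_spine *s, uint64_t root, int simulate_io_error)
{
	if (simulate_io_error)
		return -EIO;	/* s->root is left unset */

	s->root = root + 1;	/* pretend shadowing relocated the root */
	return 0;
}

/* On failure *new_root is not written, so the caller keeps its old root. */
int demo_btree_remove(uint64_t root, int simulate_io_error, uint64_t *new_root)
{
	struct demo_spine spine;	/* deliberately not initialised */
	int r;

	r = demo_remove_raw(&spine, root, simulate_io_error);
	if (!r)
		*new_root = spine.root;

	return r;
}
```

With this shape a caller that passes the address of its stored root
still holds the old, valid root after a failed removal.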
Signed-off-by: Hou Tao <houtao1@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The disk space map stores its index entries in a btree. These are
accessed very frequently, so having a few cached makes a big difference
to performance.
With this change, provisioning a new block takes roughly 20% less CPU.
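To illustrate the idea only, here is a small direct-mapped read cache in
user-space C. The cache size, the direct-mapped layout and all names are
assumptions made for the sketch; the slow path stands in for a btree
lookup, and write handling is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_CACHE_SIZE 8u	/* assumed size; must be a power of two */

struct demo_index_entry {
	uint64_t blocknr;
	uint32_t nr_free;
};

struct demo_cache_slot {
	bool valid;
	uint64_t index;
	struct demo_index_entry ie;
};

struct demo_cache {
	struct demo_cache_slot slots[DEMO_CACHE_SIZE];
	unsigned long backing_reads;	/* number of slow-path lookups */
};

/* Hypothetical slow path standing in for a btree lookup. */
void demo_backing_load(struct demo_cache *c, uint64_t index,
		       struct demo_index_entry *ie)
{
	c->backing_reads++;
	ie->blocknr = index * 2;
	ie->nr_free = 0;
}

/* Serve from the slot if it holds 'index', otherwise refill it. */
void demo_cache_load(struct demo_cache *c, uint64_t index,
		     struct demo_index_entry *ie)
{
	struct demo_cache_slot *slot = &c->slots[index & (DEMO_CACHE_SIZE - 1)];

	if (!slot->valid || slot->index != index) {
		demo_backing_load(c, index, &slot->ie);
		slot->index = index;
		slot->valid = true;
	}
	*ie = slot->ie;
}
```

Repeated lookups of the same index hit the slot and skip the slow path;
two indexes that map to the same slot evict each other.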
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
When we break sharing on btree nodes we typically need to increment
the reference count of every value held in the node. This can
cause a lot of repeated calls to the space maps. Fix this by changing
the interface to the space map inc/dec methods to take ranges of
adjacent blocks to be operated on.
For installations that use a lot of snapshots this will reduce the
CPU overhead of fundamental operations, such as provisioning a new block
or deleting a snapshot, by as much as 10 times.
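A user-space sketch of why ranges help, under the assumption that the
block numbers arrive sorted in ascending order. The callback type and
function names are hypothetical and are not the space map interface
itself: adjacent blocks are coalesced into half-open ranges
[begin, end) so the callback runs once per run rather than once per
block.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range callback: operate on blocks in [begin, end). */
typedef void (*demo_range_fn)(uint64_t begin, uint64_t end, void *ctx);

/*
 * Walk 'blocks' (assumed sorted ascending) and issue one callback per
 * run of adjacent block numbers.  Returns the number of callbacks made.
 */
size_t demo_for_each_run(const uint64_t *blocks, size_t n,
			 demo_range_fn fn, void *ctx)
{
	size_t i = 0, calls = 0;

	while (i < n) {
		uint64_t begin = blocks[i];
		uint64_t end = begin + 1;

		i++;
		while (i < n && blocks[i] == end) {
			end++;
			i++;
		}
		fn(begin, end, ctx);
		calls++;
	}

	return calls;
}

/* Example callback: accumulate how many blocks were covered. */
void demo_count_blocks(uint64_t begin, uint64_t end, void *ctx)
{
	*(uint64_t *)ctx += end - begin;
}
```

For the blocks 10, 11, 12, 20, 21, 30 this makes three calls instead of
six while still covering all six blocks.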
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The current commit code resets the place where the search for free blocks
will begin back to the start of the metadata device. There are a couple
of repercussions to this:
- The first allocation after the commit is likely to take longer than
normal as it searches for a free block in an area that is likely to
have very few free blocks (if any).
- Any free blocks it finds will have been recently freed. Reusing them
means we have fewer old copies of the metadata to aid recovery from
hardware error.
Fix these issues by leaving the cursor alone, only resetting when the
search hits the end of the metadata device.
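A user-space sketch of the cursor behaviour described above. The struct
layout and names are assumptions for illustration, and the free-block
test is a plain array rather than on-disk bitmaps: the search starts at
the cursor and only wraps to the start when it reaches the end of the
device.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct demo_space_map {
	bool *in_use;		/* in_use[b] is true when block b is allocated */
	uint64_t nr_blocks;
	uint64_t search_begin;	/* allocation cursor; not reset by a commit */
};

/*
 * Allocate a free block, searching [search_begin, nr_blocks) first and
 * wrapping to [0, search_begin) only if that range has nothing free.
 */
int demo_new_block(struct demo_space_map *sm, uint64_t *result)
{
	uint64_t b;

	for (b = sm->search_begin; b < sm->nr_blocks; b++)
		if (!sm->in_use[b])
			goto found;

	for (b = 0; b < sm->search_begin && b < sm->nr_blocks; b++)
		if (!sm->in_use[b])
			goto found;

	return -ENOSPC;

found:
	sm->in_use[b] = true;
	sm->search_begin = b + 1;
	*result = b;
	return 0;
}
```

Because the cursor keeps moving forward, blocks freed just behind it are
not reused until the search has been all the way round.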
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
This commit improves the residency of btrees built in the metadata for
dm-thin and dm-cache.
When inserting a new entry into a full btree node the current code
splits the node into two. This can result in very many half full nodes,
particularly if the insertions are occurring in an ascending order (as
happens in dm-thin with large writes).
With this commit, when we insert into a full node we first try to move
some entries to a neighbouring node that has space; failing that, we
split two neighbouring nodes into three.
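The decision order and the arithmetic of a two-into-three split can be
sketched in user-space C. Everything here is illustrative: the names are
hypothetical, "has space" is simplified to "not full", and the even
redistribution with the remainder going to the leftmost nodes is an
assumption, not necessarily how the btree code divides entries.

```c
#include <stdint.h>

enum demo_action {
	DEMO_INSERT,			/* node has room, just insert */
	DEMO_SHIFT_TO_NEIGHBOUR,	/* node full, neighbour has room */
	DEMO_SPLIT_TWO_INTO_THREE	/* node and neighbour both full */
};

/* Pick what to do before inserting one entry into a node. */
enum demo_action demo_choose(uint32_t nr_node, uint32_t nr_neighbour,
			     uint32_t max_entries)
{
	if (nr_node < max_entries)
		return DEMO_INSERT;
	if (nr_neighbour < max_entries)
		return DEMO_SHIFT_TO_NEIGHBOUR;
	return DEMO_SPLIT_TWO_INTO_THREE;
}

/*
 * Spread the entries of two neighbouring nodes evenly over three nodes.
 * Any remainder goes to the leftmost nodes.
 */
void demo_split_two_into_three(uint32_t nr_left, uint32_t nr_right,
			       uint32_t out[3])
{
	uint32_t total = nr_left + nr_right;
	uint32_t base = total / 3;
	uint32_t extra = total % 3;
	uint32_t i;

	for (i = 0; i < 3; i++)
		out[i] = base + (i < extra ? 1 : 0);
}
```

Two full nodes of 126 entries become three nodes of 84, i.e. each is two
thirds full, whereas splitting one full node in half leaves both halves
at 50%.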
Results are given below. 'Residency' is how full nodes are on average
as a percentage. Average instruction counts for the operations
are given to show the extra processing has little overhead.
                         +--------------------------+--------------------------+
                         |          Before          |          After           |
+------------+-----------+-----------+--------------+-----------+--------------+
| Test       | Phase     | Residency | Instructions | Residency | Instructions |
+------------+-----------+-----------+--------------+-----------+--------------+
| Ascending  | insert    |        50 |         1876 |        96 |         1930 |
|            | overwrite |        50 |         1789 |        96 |         1746 |
|            | lookup    |        50 |          778 |        96 |          778 |
| Descending | insert    |        50 |         3024 |        96 |         3181 |
|            | overwrite |        50 |         1789 |        96 |         1746 |
|            | lookup    |        50 |          778 |        96 |          778 |
| Random     | insert    |        68 |         3800 |        84 |         3736 |
|            | overwrite |        68 |         4254 |        84 |         3911 |
|            | lookup    |        68 |          779 |        84 |          779 |
| Runs       | insert    |        63 |         2546 |        82 |         2815 |
|            | overwrite |        63 |         2013 |        82 |         1986 |
|            | lookup    |        63 |          778 |        82 |          779 |
+------------+-----------+-----------+--------------+-----------+--------------+
Ascending - keys are inserted in ascending order.
Descending - keys are inserted in descending order.
Random - keys are inserted in random order.
Runs - keys are split into ascending runs of ~20 length. Then
the runs are shuffled.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com> # contains_key() fix
Signed-off-by: Mike Snitzer <snitzer@redhat.com>