The 8.3.12 commit drbd: Bugfix for the connection behavior fixes a
"wasted established connection", if a former connection attempt failed
during its early stages.
However it opened a window for a regression, if a connection attempt
fails during its last stages. The result was a terminated receiver
thread, that left behind the supposedly transient "C_UNCONNECTED" state.
Any later requests to change the connection state fail, as they wait for
the connection state to "stabilize".
Fix: short circuit and keep retrying to restablish a new connection,
if we don't reach C_WF_REPORT_PARAMS.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
If the disk has failed already, there is no point trying to change the
bitmap. drbd_set_out_of_sync() already had this safeguard,
time to add it to drbd_set_in_sync() as well.
This also prevents some warning messages, like
FIXME asender in bm_change_bits_to, bitmap locked for 'detach' by worker
if our disk fails during resync, while there are some resync acks queued up.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
recent commit
drbd: always write bitmap on detach
introduced a bitmap writeout during detach,
which obviously needs some meta data device to write to.
Unfortunately, that same error path may be taken if we fail to attach,
e.g. due to UUID mismatch, after we changed state to D_ATTACHING,
but before the lower level device pointer is even assigned.
We need to test for presence of mdev->ldev.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
If we detach due to local read-error (which sets a bit in the bitmap),
stay Primary, and then re-attach (which re-reads the bitmap from disk),
we potentially lost the "out-of-sync" (or, "bad block") information in
the bitmap.
Always (try to) write out the changed bitmap pages before going diskless.
That way, we don't lose the bit for the bad block,
the next resync will fetch it from the peer, and rewrite
it locally, which may result in block reallocation in some
lower layer (or the hardware), and thereby "heal" the bad blocks.
If the bitmap writeout errors out as well, we will (again: try to)
mark the "we need a full sync" bit in our super block,
if it was a READ error; writes are covered by the activity log already.
If that superblock does not make it to disk either, we are sorry.
Maybe we just lost an entire disk or controller (or iSCSI connection),
and there actually are no bad blocks at all, so we don't need to
re-fetch from the peer, there is no "auto-healing" necessary.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The intention of force-detach is to be able to deal with a completely
unresponsive lower level IO stack, which does not even deliver error
completions anymore, but no completion at all.
In all other cases, we must still wait for the meta data IO completion.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
This has not yet been observed, but conceivably, when using GFP_KERNEL
allocations from drbd_md_sync(), drbd_flush_after_epoch() or
receive_SyncParam(), we could trigger additional IO to our own device,
or an other device in a criss-cross setup, and end up in a local
deadlock, or potentially a distributed deadlock in a criss-cross setup
involving the peer blocked in a similar way waiting for us to make
progress.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The former comment arguing that GFP_KERNEL was good enough was wrong: it
did not take resize into account at all, and assumed the only path
leading here was the normal attach on a still secondary device, so no
deadlock would be possible.
Both resize on a Primary, or attach on a diskless Primary,
could potentially deadlock.
drbd_bm_resize() is called while IO to the respective device is
suspended, so we must use GFP_NOIO to avoid potential deadlock.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
"aborting" requests, or force-detaching the disk, is intended for
completely blocked/hung local backing devices which do no longer
complete requests at all, not even do error completions. In this
situation, usually a hard-reset and failover is the only way out.
By "aborting", basically faking a local error-completion,
we allow for a more graceful swichover by cleanly migrating services.
Still the affected node has to be rebooted "soon".
By completing these requests, we allow the upper layers to re-use
the associated data pages.
If later the local backing device "recovers", and now DMAs some data
from disk into the original request pages, in the best case it will
just put random data into unused pages; but typically it will corrupt
meanwhile completely unrelated data, causing all sorts of damage.
Which means delayed successful completion,
especially for READ requests,
is a reason to panic().
We assume that a delayed *error* completion is OK,
though we still will complain noisily about it.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
is_valid_transition() might return SS_NOTHING_TO_DO.
The condition function _req_st_cond() returned SS_NOTHING_TO_DO, which
caused the wait_event to abort too early. Therefore drbd_req_state()
did not consume the next CL_ST_CHG_SUCCESS or SS_CW_FAILED_BY_PEER
causing serve disruption of the state machine logic...
Detaching from a single volue was one way to trigger this bug.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
We use the RQ_POSTPONED flag to mark a request for several reasons.
It may be a conflicting request in a dual-primary setup,
where conflict detection and resolution on the peer decided that
this request needs to be re-submitted, it needs to re-enter
drbd_make_request() to fix the data divergence caused by these
conflicting, partially overlapping, quasi-simultaneous requests.
In this case we need to mark the corresponding area as out-of-sync,
before we call drbd_al_complete_io().
We also use the RQ_POSTPONED flag to just "push back" a request,
before even processing it, if IO is suspended for some reason.
In this case, as this request was neither submitted nor sent yet,
we must not touch the bitmap.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
A postponed request might has RQ_IN_ACT_LOG already set, but
is POSTPONED before it gets something in the RQ_LOCAL_MASK
set. Up to now this caused a left-over active extent.
Fix that by only testing for the RQ_IN_ACT_LOG bit in drbd_req_destroy()
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Without this, the meta-data gets updates after 5 seconds by the
md_sync_timer. Better to do it immeditaly after a state change.
If the asender detects a network failure, it may take a bit until
the worker processes the according after-conn-state-change work item.
The worker might be blocked in sending something, i.e. it
takes until it gets into its timeout. That is 6 seconds by
default which is longer than the 5 seconds of the md_sync_timer.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
* Postponed requests should not set or clear out-of-sync marks
* When a request gets postponed we need to drop its reference
mdev->local_cnt (put_ldev()).
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
With merging the commit
'drbd: Delay/reject other state changes while establishing a connection'
the condition check for clearing the flag was wrong.
Move the bit clearing to the __drbd_set_state() function
in order to have it already cleared for the other parts of
the function. I.e. clearing the susp_fen in the after_state_ch() function.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>