lsblk should be able to pick up stacking device driver relations
involving DRBD conveniently.
Even though upstream kernel since 2011 says
"DON'T USE THIS UNLESS YOU'RE ALREADY USING IT."
a new user has been added since (bcache),
which sets the precedences for us to use it as well.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
We cannot possibly support SECDISCARD, even if all backend devices would
support it: if our peer is currently unreachable, some instance of the
data may obviously still be recoverable.
We did not set discard_granularity at all. We don't really care (yet),
we only pass them on, so for now, set our granularity to one sector.
blkdev_stack_limits() takes care of the rest.
If we decide we cannot support discards,
not only clear the (not user visible) QUEUE_FLAG_DISCARD,
but set both (user visible) discard_granularity and max_discard_sectors
to zero, to avoid confusion with e.g. lsblk -D.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
In a multiple error scenario, we may end up with a "frozen" Primary,
that has no access to any data (no local disk, no replication link).
If we then resume-io, we try to generate a new data generation id,
which will fail if there is no longer a local disk.
Double check for available local data,
which prevents the NULL pointer deref.
If we are diskless, turn the resume-io in this situation
into the first stage of a "force down", by bumping the "effective" data
gen id, which will prevent later attach or connect to the former data
set without first being demoted (deconfigured).
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The intention is to reduce CPU utilization. Recent measurements
unveiled that the current performance bottleneck is CPU utilization
on the receiving node. The asender thread became CPU limited.
One of the main points is to eliminate the idr_for_each_entry() loop
from the sending acks code path.
One exception in that is sending back ping_acks. These stay
in the ack-receiver thread. Otherwise the logic becomes too
complicated for no added value.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
A D_FAILED disk transitions as quickly as possible to
D_DISKLESS. But in the "unresponsive local disk" case,
there remains a time window where a administrative detach command could
find the disk already failed, but some internal meta data IO against the
unresponsive local disk still pending.
In that case, drbd_md_get_buffer() will return NULL.
Don't unconditionally call drbd_md_put_buffer(), or it will cause
refcount imbalance, and prevent any further re-attach on this volume
(until it is deleted and re-created).
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
When detaching, we make sure no application IO is in-flight
by internally suspending IO, then trigger the state change,
wait for the result, and finally internally resume IO again.
Once we triggered the stat change to "Failed",
we expect it to change from Failed to Diskless.
(To avoid races, we actually wait for it to leave "Failed").
On an unresponsive local IO backend, this may not happen, ever.
Don't have a "hung" detach block IO "forever", but resume IO
before waiting for the state change to Diskless.
We may well be able to continue IO to and from a healthy peer.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The lc_destroy() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The status command originates the drbd9 code base. While for now we
keep the status information in /proc/drbd available, this commit
allows the user base to gracefully migrate their monitoring
infrastructure to the new status reporting interface.
In drbd9 no status information is exposed through /proc/drbd.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The events2 command originates from drbd-9 development. It features
more information but requires a incompatible change in output
format.
Therefore the previous events command continues to exist, the new
improved events2 command becomes available now.
This prepares the user-base for a later switch to the complete
drbd9 code base.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Instead of using a rwlock for synchronizing state changes across
resources, take the request locks of all resources for global state
changes. Use resources_mutex to serialize global state changes.
This means that taking the request lock of a resource is now enough to
prevent changes of that resource. (Previously, a read lock on the
global state lock was needed as well.)
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
In drbd-8.4 there is always a single connection per resource,
and there is always exactly one peer_device for a device.
peer_device can not be NULL here.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Some drivers use it now, others just set the limits field manually.
But in preparation for splitting this into a hard and soft limit,
ensure that they all call the proper function for setting the hw
limit for discards.
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Symptoms:
If DRBD was "cleanly shut down" (all in sync, both Secondary before
disconnect, identical data generation uuids), and then one side was
promoted *during* the next connection handshake, the role change
could confuse the handshake.
The Primary would get stuck in WFBitmapS, the Secondary would log
unexpected cstate (Connected) in receive_bitmap
and get stuck in WFBitmapT.
Fix:
The test in is_valid_soft_transition wrong. It works because
the not allowed actions (promote/attach) do not touch the
cstate. The previous condition failed to demand a cstate change
in one clause.
In order to avoid deadlocks give up the state_mutex while waiting
for the transient state to go away.
Conflicts:
drbd/drbd_state.c
drbd/drbd_state.h
drbd/drbd_wrappers.h
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Pull block driver changes from Jens Axboe:
"Nothing out of the ordinary here, this pull request contains:
- A big round of fixes for bcache from Kent Overstreet, Slava Pestov,
and Surbhi Palande. No new features, just a lot of fixes.
- The usual round of drbd updates from Andreas Gruenbacher, Lars
Ellenberg, and Philipp Reisner.
- virtio_blk was converted to blk-mq back in 3.13, but now Ming Lei
has taken it one step further and added support for actually using
more than one queue.
- Addition of an explicit SG_FLAG_Q_AT_HEAD for block/bsg, to
compliment the the default behavior of adding to the tail of the
queue. From Douglas Gilbert"
* 'for-3.17/drivers' of git://git.kernel.dk/linux-block: (86 commits)
bcache: Drop unneeded blk_sync_queue() calls
bcache: add mutex lock for bch_is_open
bcache: Correct printing of btree_gc_max_duration_ms
bcache: try to set b->parent properly
bcache: fix memory corruption in init error path
bcache: fix crash with incomplete cache set
bcache: Fix more early shutdown bugs
bcache: fix use-after-free in btree_gc_coalesce()
bcache: Fix an infinite loop in journal replay
bcache: fix crash in bcache_btree_node_alloc_fail tracepoint
bcache: bcache_write tracepoint was crashing
bcache: fix typo in bch_bkey_equal_header
bcache: Allocate bounce buffers with GFP_NOWAIT
bcache: Make sure to pass GFP_WAIT to mempool_alloc()
bcache: fix uninterruptible sleep in writeback thread
bcache: wait for buckets when allocating new btree root
bcache: fix crash on shutdown in passthrough mode
bcache: fix lockdep warnings on shutdown
bcache allocator: send discards with correct size
bcache: Fix to remove the rcu_sched stalls.
...
For diagnostic purposes, track intent, start time
and latest submit time of meta data IO.
Move separate members from struct drbd_device
into the embeded struct drbd_md_io.
s/md_io_(page|in_use)/md_io.\1/
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The recent fix to put_ldev() (correct ordering of access to local_cnt
and state.disk; memory barrier in __drbd_set_state) guarantees
that the cleanup happens exactly once.
However it does not yet guarantee that the cleanup happens from worker
context, the last put_ldev() may still happen from atomic context,
which must not happen: blkdev_put() may sleep.
Fix this by scheduling the cleanup to the worker instead,
using a couple more bits in device->flags and a new helper,
drbd_device_post_work().
Generalized the "resync progress" work to cover these new work bits.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
This probably does not have any real life impact,
but we should first persist any potentially new UUID
and other meta data flags, as well as our new role,
before we allow/disallow write access.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The conf_update mutex used to be held while clearing the
net_conf->discard_my_data flag inside drbd_set_role.
It was moved into drbd_adm_set_role with
drbd: allow parallel promote/demote actions
but then replaced at that location by the newly introduced adm_mutex with
drbd: Fix a potential deadlock in drbdsetup, introduce resource->adm_mutex
And I simply forgot to put it back in at the original location.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>