linux-apfs

mirror of https://github.com/linux-apfs/linux-apfs.git synced 2026-05-01 15:00:59 -07:00

Author	SHA1	Message	Date
Lars Ellenberg	1b228c98ce	drbd: fix regression: protocol A sometimes synchronous, C sometimes double-latency Regression introduced with 8.4.5 drbd: application writes may set-in-sync in protocol != C Overwriting the same block (LBA) while a former version is still "in-flight" to the peer (to be exact: we did not receive the P_BARRIER_ACK for its epoch yet) would wait for the full epoch of that former version to be acknowledged by the peer. In synchronous and quasi-synchronous protocols C and B, this may double the latency on overwrites. With protocol A, which is supposed to be asynchronous and only wait for local completion, it is even worse: it would make overwrites quasi-synchronous, they would be hit by the full RTT, which protocol A was specifically meant to avoid, and possibly the additional time it takes to drain the buffers first. Particularly bad for databases, or anything else that does frequent updates to the same blocks (various file system meta data). No impact if >= rtt passes between updates to the same block. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-06-13 21:43:04 -06:00
Mike Christie	28a8f0d317	block, drivers, fs: rename REQ_FLUSH to REQ_PREFLUSH To avoid confusion between REQ_OP_FLUSH, which is handled by request_fn drivers, and upper layers requesting the block layer perform a flush sequence along with possibly a WRITE, this patch renames REQ_FLUSH to REQ_PREFLUSH. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-06-07 13:41:38 -06:00
Lars Ellenberg	f85d9f2d02	drbd: fix "endless" transfer log walk in protocol A Don't remember a DRBD request as ack_pending, if it is not. In protocol A, we usually clear RQ_NET_PENDING at the same time we set RQ_NET_SENT, so when deciding to remember it as ack_pending, mod_rq_state needs to look at the current request state, not at the previous state before the current modification was applied. This should prevent advance_conn_req_ack_pending() from walking the full transfer log just to find NULL in protocol A, which would cause serious performance degradation with many "in-flight" requests, e.g. when working via DRBD-proxy, or with a huge bandwidth-delay product. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-25 09:22:03 -07:00
Philipp Reisner	668700b40a	drbd: Create a dedicated workqueue for sending acks on the control connection The intention is to reduce CPU utilization. Recent measurements unveiled that the current performance bottleneck is CPU utilization on the receiving node. The asender thread became CPU limited. One of the main points is to eliminate the idr_for_each_entry() loop from the sending acks code path. One exception in that is sending back ping_acks. These stay in the ack-receiver thread. Otherwise the logic becomes too complicated for no added value. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-25 09:22:01 -07:00
Lars Ellenberg	84d34f2f07	drbd: improve network timeout detection Don't blame the peer for being unresponsive, if we did not even ask the question yet. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-25 09:22:01 -07:00
Lars Ellenberg	05cbbb395f	drbd: Fix spurious disk-timeout (You should not use disk-timeout anyways, see the man page for why...) We add incoming requests to the tail of some ring list. On local completion, requests are removed from that list. The timer looks only at the head of that ring list, so is supposed to only see the oldest request. All protected by a spinlock. The request object is created with timestamps zeroed out. The timestamp was only filled in just before the actual submit. But to actually submit the request, we need to give up the spinlock. If you are unlucky, there is no older still pending request, the timer looks at a new request with timestamp still zero (before it even was submitted), and 0 + timeout is most likely older than "now". Better assign the timestamp right when we put the request object on said ring list. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-25 09:22:01 -07:00
Andreas Gruenbacher	2e9ffde6f0	drbd: De-inline drbd_should_do_remote() and drbd_should_send_out_of_sync() There is no need to have these two as inline functions. In addition, drbd_should_send_out_of_sync() is only used in a single place, anyway. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-25 09:22:00 -07:00
Jens Axboe	dece16353e	block: change ->make_request_fn() and users to return a queue cookie No functional changes in this patch, but it prepares us for returning a more useful cookie related to the IO that was queued up. Signed-off-by: Jens Axboe <axboe@fb.com> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com>	2015-11-07 10:40:46 -07:00
Kent Overstreet	8ae126660f	block: kill merge_bvec_fn() completely As generic_make_request() is now able to handle arbitrarily sized bios, it's no longer necessary for each individual block driver to define its own ->merge_bvec_fn() callback. Remove every invocation completely. Cc: Jens Axboe <axboe@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: drbd-user@lists.linbit.com Cc: Jiri Kosina <jkosina@suse.cz> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Cc: Alex Elder <elder@kernel.org> Cc: ceph-devel@vger.kernel.org Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: Neil Brown <neilb@suse.de> Cc: linux-raid@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: NeilBrown <neilb@suse.de> (for the 'md' bits) Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> [dpark: also remove ->merge_bvec_fn() in dm-thin as well as dm-era-target, and resolve merge conflicts] Signed-off-by: Dongsu Park <dpark@posteo.net> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-13 12:31:57 -06:00
Kent Overstreet	54efd50bfd	block: make generic_make_request handle arbitrarily sized bios The way the block layer is currently written, it goes to great lengths to avoid having to split bios; upper layer code (such as bio_add_page()) checks what the underlying device can handle and tries to always create bios that don't need to be split. But this approach becomes unwieldy and eventually breaks down with stacked devices and devices with dynamic limits, and it adds a lot of complexity. If the block layer could split bios as needed, we could eliminate a lot of complexity elsewhere - particularly in stacked drivers. Code that creates bios can then create whatever size bios are convenient, and more importantly stacked drivers don't have to deal with both their own bio size limitations and the limitations of the (potentially multiple) devices underneath them. In the future this will let us delete merge_bvec_fn and a bunch of other code. We do this by adding calls to blk_queue_split() to the various make_request functions that need it - a few can already handle arbitrary size bios. Note that we add the call _after_ any call to blk_queue_bounce(); this means that blk_queue_split() and blk_recalc_rq_segments() don't need to be concerned with bouncing affecting segment merging. Some make_request_fn() callbacks were simple enough to audit and verify they don't need blk_queue_split() calls. The skipped ones are: * nfhd_make_request (arch/m68k/emu/nfblock.c) * axon_ram_make_request (arch/powerpc/sysdev/axonram.c) * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c) * brd_make_request (ramdisk - drivers/block/brd.c) * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c) * loop_make_request * null_queue_bio * bcache's make_request fns Some others are almost certainly safe to remove now, but will be left for future patches. Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ming Lei <ming.lei@canonical.com> Cc: Neil Brown <neilb@suse.de> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: drbd-user@lists.linbit.com Cc: Jiri Kosina <jkosina@suse.cz> Cc: Geoff Levand <geoff@infradead.org> Cc: Jim Paris <jim@jtan.com> Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Oleg Drokin <oleg.drokin@intel.com> Cc: Andreas Dilger <andreas.dilger@intel.com> Acked-by: NeilBrown <neilb@suse.de> (for the 'md/md.c' bits) Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> [dpark: skip more mq-based drivers, resolve merge conflicts, etc.] Signed-off-by: Dongsu Park <dpark@posteo.net> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-13 12:31:33 -06:00
Christoph Hellwig	4246a0b63b	block: add a bi_error field to struct bio Currently we have two different ways to signal an I/O error on a BIO: (1) by clearing the BIO_UPTODATE flag (2) by returning a Linux errno value to the bi_end_io callback The first one has the drawback of only communicating a single possible error (-EIO), and the second one has the drawback of not beeing persistent when bios are queued up, and are not passed along from child to parent bio in the ever more popular chaining scenario. Having both mechanisms available has the additional drawback of utterly confusing driver authors and introducing bugs where various I/O submitters only deal with one of them, and the others have to add boilerplate code to deal with both kinds of error returns. So add a new bi_error field to store an errno value directly in struct bio and remove the existing mechanisms to clean all this up. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-07-29 08:55:15 -06:00
David Rientjes	23fe8f8b11	block, drbd: fix drbd_req_new() initialization mempool_alloc() does not support __GFP_ZERO since elements may come from memory that has already been released by mempool_free(). Remove __GFP_ZERO from mempool_alloc() in drbd_req_new() and properly initialize it to 0. Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Jens Axboe <axboe@fb.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-03-24 20:00:16 -06:00
Gu Zheng	244808543e	drbd: use generic io stats accounting functions to simplify io stat accounting Use generic io stats accounting help functions (generic_{start,end}_io_acct) to simplify io stat accounting. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-24 08:05:14 -07:00
Philipp Reisner	9581f97a68	drbd: Fix state change in case of connection timeout A connection timeout affects all volumes of a resource! Under the following conditions: A resource with multiple volumes AND ko-count >=1 AND a write request triggers the timeout (ko-count * timeout) DRBD's internal state gets confused. That in turn may lead to very miss leading follow up failures. E.g. "BUG: scheduling while atomic" CC: stable@kernel.org # v3.17 Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-10 09:27:41 -07:00
Lars Ellenberg	3b9d35d744	drbd: merge_bvec_fn: properly remap bvm->bi_bdev This was not noticed for many years. Affects operation if md raid is used a backing device for DRBD. CC: stable@kernel.org # v3.2+ Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-10 09:27:39 -07:00
Andreas Gruenbacher	8d4ba3f0fa	drbd: Avoid inconsistent locking warning request_timer_fn() takes resource->req_lock via the device and releases it via the connection. Avoid this as it is confusing static code checkers. Reported-by: "Dan Carpenter" <dan.carpenter@oracle.com> Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-09-11 08:41:29 -06:00
Lars Ellenberg	f5b90b6bf0	drbd: resync should only lock out specific ranges During resync, if we need to block some specific incoming write because of active resync requests to that same range, we potentially caused all new application writes (to "cold" activity log extents) to block until this one request has been processed. Improve the do_submit() logic to * grab all incoming requests to some "incoming" list * process this list - move aside requests that are blocked by resync - prepare activity log transactions, - commit transactions and submit corresponding requests - if there are remaining requests that only wait for activity log extents to become free, stop the fast path (mark activity log as "starving") - iterate until no more requests are waiting for the activity log, but all potentially remaining requests are only blocked by resync * only then grab new incoming requests That way, very busy IO on currently "hot" activity log extents cannot starve scattered IO to "cold" extents. And blocked-by-resync requests are processed once resync traffic on the affected region has ceased, without blocking anything else. The only blocking mode left is when we cannot start requests to "cold" extents because all currently "hot" extents are actually used. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2014-07-10 18:35:21 +02:00
Lars Ellenberg	ad3fee7900	drbd: improve throttling decisions of background resynchronisation Background resynchronisation does some "side-stepping", or throttles itself, if it detects application IO activity, and the current resync rate estimate is above the configured "cmin-rate". What was not detected: if there is no application IO, because it blocks on activity log transactions. Introduce a new atomic_t ap_actlog_cnt, tracking such blocked requests, and count non-zero as application IO activity. This counter is exposed at proc_details level 2 and above. Also make sure to release the currently locked resync extent if we side-step due to such voluntary throttling. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2014-07-10 18:35:13 +02:00
Lars Ellenberg	7753a4c17f	drbd: add caching oldest request pointers for replication stages A request that is to be shipped to the peer goes through a few stages: - queued - sent, waiting for ack - ack received, waiting for "barrier ack", which is re-order epoch being closed on the peer by acknowledging a "cache flush" equivalent on the lower level device. In the later two stages, depending on protocol, we may have already completed this request to the upper layers, so it won't be found anymore on device->pending_master_completion[] lists. Track the oldest request yet to be sent (req_next), the oldest not yet acknowledged (req_ack_pending) and the oldest "still waiting for something from the peer" (req_not_net_done), doing short list walks on the transfer log to find the next pending one whenever such a request makes progress. Now we have a fast way to look up the oldest requests, don't do a transfer log walk every time. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2014-07-10 18:35:12 +02:00
Lars Ellenberg	844a6ae735	drbd: add lists to find oldest pending requests Adding requests to per-device fifo lists as soon as possible after allocating them leaves a simple list_first_entry_or_null() to find the oldest request, regardless what it is still waiting for. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2014-07-10 18:35:12 +02:00
Lars Ellenberg	e5f891b223	drbd: gather detailed timing statistics for drbd_requests Record (in jiffies) how much time a request spends in which stages. Followup commits will use and present this additional timing information so we can better locate and tackle the root causes of latency spikes, or present the backlog for asynchronous replication. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2014-07-10 18:35:11 +02:00
Lars Ellenberg	0c066bc39e	drbd: short-circuit in maybe_pull_ahead If we already "pulled ahead", we can short-circuit, and avoid logging the same messages over and over again. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2014-07-10 18:35:02 +02:00
Lars Ellenberg	08d0dabf48	drbd: application writes may set-in-sync in protocol != C If "dirty" blocks are written to during resync, that brings them in-sync. By explicitly requesting write-acks during resync even in protocol != C, we now can actually respect this. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2014-07-10 18:35:02 +02:00
Lars Ellenberg	4dd726f029	drbd: get rid of drbd_queue_work_front The last user was al_write_transaction, if called with "delegate", and the last user to call it with "delegate = true" was the receiver thread, which has no need to delegate, but can call it himself. Finally drop the delegate parameter, drop the extra w_al_write_transaction callback, and drop drbd_queue_work_front. Do not (yet) change dequeue_work_item to dequeue_work_batch, though. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2014-07-10 18:34:56 +02:00
Lars Ellenberg	44a4d55184	drbd: refactor use of first_peer_device() Reduce the number of calls to first_peer_device(). Instead, call first_peer_device() just once to assign a local variable peer_device. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2014-07-10 15:22:22 +02:00

1 2 3 4 5 ...

209 Commits