linux

mirror of https://github.com/Dasharo/linux.git synced 2026-03-06 15:25:10 -08:00

Author	SHA1	Message	Date
Alexander Aring	a470cb2a06	dlm: slow down filling up processing queue If there is a burst of message the receive worker will filling up the processing queue but where are too slow to process dlm messages. This patch will slow down the receiver worker to keep the buffer on the socket layer to tell the sender to backoff. This is done by a threshold to get the next buffers from the socket after all messages were processed done by a flush_workqueue(). This however only occurs when we have a message burst when we e.g. create 1 million locks. If we put more and more new messages to process in the processqueue we will soon run out of memory. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2023-10-12 15:21:00 -05:00
Alexander Aring	6212e4528b	dlm: fix no ack after final message In case of an final DLM message we can't should not send an ack out after the final message. This patch moves the ack message before the messages will be transmitted. If it's the final message and the receiving node turns into DLM_CLOSED state another ack messages will being received and turning the receiving node into DLM_ESTABLISHED again. Fixes: `1696c75f18` ("fs: dlm: add send ack threshold and append acks to msgs") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2023-10-12 15:20:55 -05:00
Alexander Aring	e759eb3e27	dlm: be sure we reset all nodes at forced shutdown In case we running in a force shutdown in either midcomms or lowcomms implementation we will make sure we reset all per midcomms node information. Fixes: `63e711b081` ("fs: dlm: create midcomms nodes when configure") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2023-10-12 15:20:48 -05:00
Alexander Aring	2776635edc	dlm: fix remove member after close call The idea of commit `63e711b081` ("fs: dlm: create midcomms nodes when configure") is to set the midcomms node lifetime when a node joins or leaves the cluster. Currently we can hit the following warning: [10844.611495] ------------[ cut here ]------------ [10844.615913] WARNING: CPU: 4 PID: 84304 at fs/dlm/midcomms.c:1263 dlm_midcomms_remove_member+0x13f/0x180 [dlm] or running in a state where we hit a midcomms node usage count in a negative value: [ 260.830782] node 2 users dec count -1 The first warning happens when the a specific node does not exists and it was probably removed but dlm_midcomms_close() which is called when a node leaves the cluster. The second kernel log message is probably in a case when dlm_midcomms_addr() is called when a joined the cluster but due fencing a node leaved the cluster without getting removed from the lockspace. If the node joins the cluster and it was removed from the cluster due fencing the first call is to remove the node from lockspaces triggered by the user space. In both cases if the node wasn't found or the user count is zero, we should ignore any additional midcomms handling of dlm_midcomms_remove_member(). Fixes: `63e711b081` ("fs: dlm: create midcomms nodes when configure") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2023-10-12 15:20:37 -05:00
Alexander Aring	fe9b619e6e	dlm: fix creating multiple node structures This patch will lookup existing nodes instead of always creating them when dlm_midcomms_addr() is called. The idea is here to create midcomms nodes when user space getting informed that nodes joins the cluster. This is the case when dlm_midcomms_addr() is called, however it can be called multiple times by user space to add several address configurations to one node e.g. when using SCTP. Those multiple times need to be filtered out and we doing that by looking up if the node exists before. Due configfs entry it is safe that this function gets only called once at a time. Fixes: `63e711b081` ("fs: dlm: create midcomms nodes when configure") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2023-10-12 15:20:28 -05:00
Christophe JAILLET	bc15bec1f8	fs: dlm: Remove some useless memset() There is no need to clear the buffer used to build the file name. snprintf() already guarantees that it is NULL terminated and such a (useless) precaution was not done for the first string (i.e ls_debug_rsb_dentry) So, save a few LoC. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2023-10-12 15:20:17 -05:00
Christophe JAILLET	b859e01054	fs: dlm: Fix the size of a buffer in dlm_create_debug_file() 8 is not the maximum size of the suffix used when creating debugfs files. Let the compiler compute the correct size, and only give a hint about the longest possible string that is used. When building with W=1, this fixes the following warnings: fs/dlm/debug_fs.c: In function ‘dlm_create_debug_file’: fs/dlm/debug_fs.c:1020:58: error: ‘snprintf’ output may be truncated before the last format character [-Werror=format-truncation=] 1020 \| snprintf(name, DLM_LOCKSPACE_LEN + 8, "%s_waiters", ls->ls_name); \| ^ fs/dlm/debug_fs.c:1020:9: note: ‘snprintf’ output between 9 and 73 bytes into a destination of size 72 1020 \| snprintf(name, DLM_LOCKSPACE_LEN + 8, "%s_waiters", ls->ls_name); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ fs/dlm/debug_fs.c:1031:50: error: ‘_queued_asts’ directive output may be truncated writing 12 bytes into a region of size between 8 and 72 [-Werror=format-truncation=] 1031 \| snprintf(name, DLM_LOCKSPACE_LEN + 8, "%s_queued_asts", ls->ls_name); \| ^~~~~~~~~~~~ fs/dlm/debug_fs.c:1031:9: note: ‘snprintf’ output between 13 and 77 bytes into a destination of size 72 1031 \| snprintf(name, DLM_LOCKSPACE_LEN + 8, "%s_queued_asts", ls->ls_name); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fixes: `541adb0d4d` ("fs: dlm: debugfs for queued callbacks") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2023-10-12 15:19:37 -05:00
Christophe JAILLET	19b3102c0b	fs: dlm: Simplify buffer size computation in dlm_create_debug_file() Use sizeof(name) instead of the equivalent, but hard coded, DLM_LOCKSPACE_LEN + 8. This is less verbose and more future proof. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2023-10-12 15:19:29 -05:00
Alexander Aring	7c53e847ff	dlm: fix plock lookup when using multiple lockspaces All posix lock ops, for all lockspaces (gfs2 file systems) are sent to userspace (dlm_controld) through a single misc device. The dlm_controld daemon reads the ops from the misc device and sends them to other cluster nodes using separate, per-lockspace cluster api communication channels. The ops for a single lockspace are ordered at this level, so that the results are received in the same sequence that the requests were sent. When the results are sent back to the kernel via the misc device, they are again funneled through the single misc device for all lockspaces. When the dlm code in the kernel processes the results from the misc device, these results will be returned in the same sequence that the requests were sent, on a per-lockspace basis. A recent change in this request/reply matching code missed the "per-lockspace" check (fsid comparison) when matching request and reply, so replies could be incorrectly matched to requests from other lockspaces. Cc: stable@vger.kernel.org Reported-by: Barry Marson <bmarson@redhat.com> Fixes: `57e2c2f2d9` ("fs: dlm: fix mismatch of plock results from userspace") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2023-08-25 10:31:39 -05:00
Alexander Aring	a3d85fcf26	fs: dlm: don't use RCOM_NAMES for version detection Currently RCOM_STATUS and RCOM_NAMES inclusive their replies are being used to determine the DLM version. The RCOM_NAMES messages are triggered in DLM recovery when calling dlm_recover_directory() only. At this time the DLM version need to be determined. I ran some tests and did not expirenced some issues. When the DLM version detection was developed probably I run once in a case of RCOM_NAMES and the version was not detected yet. However it seems to be not necessary. For backwards compatibility we still need to accept RCOM_NAMES messages which are not protected regarding the DLM message reliability layer aka stateless message. This patch changes that RCOM_NAMES we are sending out after this patch are not stateless anymore. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2023-08-10 10:33:03 -05:00
Alexander Aring	63e711b081	fs: dlm: create midcomms nodes when configure This patch puts the life of a midcomms node the same as a lowcomms connection. The lowcomms connection lifetime was changed by commit `6f0b0b5d7a` ("fs: dlm: remove dlm_node_addrs lookup list"). In the future the midcomms node instances can be merged with lowcomms connection structure as the lifetime is the same and states can be controlled over values or flags. Before midcomms nodes were generated during version detection. This is not necessary anymore when the nodes are created when the cluster manager configures DLM via configfs. When a midcomms node is created over configfs it well set DLM_VERSION_NOT_SET as version. This indicates that the version of the midcomms node is still unknown and need to be probed via certain rcom messages. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2023-08-10 10:33:03 -05:00
Alexander Aring	1151935182	fs: dlm: constify receive buffer The dlm receive buffer should be never manipulated as DLM is the last instance of parsing layer. This patch constify the whole receive buffer so we are sure it never gets manipulated when it's being parsed. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2023-08-10 10:33:03 -05:00
Alexander Aring	b9d2f6ada0	fs: dlm: drop rxbuf manipulation in dlm_recover_master_copy Currently dlm_recover_master_copy() manipulates the receive buffer of an rcom lock message and modifies it on the fly so a later memcpy() to a new rcom message with the same message has those new values. This patch avoids manipulating the received rcom message by store the values for the new rcom message in paremter assigned with call by reference. Later when dlm_send_rcom_lock() constructs a new message and memcpy() the receive buffer those values will be set on the new constructed message. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2023-08-10 10:33:03 -05:00
Alexander Aring	561c67d8a1	fs: dlm: drop rxbuf manipulation in dlm_copy_master_names This patch removes the manipulation of the receive buffer in case of an error and be sure the buffer is null terminated before an error messagea is printed out. Instead of manipulate the receive buffer we tell inside the format string the maximum length the string buffer is being read. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2023-08-10 10:33:03 -05:00
Alexander Aring	c4f4e135c2	fs: dlm: get recovery sequence number as parameter This patch removes a read of the ls->ls_recover_seq uint64_t number in _create_rcom(). If the ls->ls_recover_seq is readed the ls_recover_lock need to held. However this number was always readed before when any rcom message is received and it's not necessary to read it again from a per lockspace variable to use it for the replying message. This patch will pass the sequence number as parameter so another read of ls->ls_recover_seq and holding the ls->ls_recover_lock is not required. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2023-08-10 10:33:03 -05:00
Alexander Aring	643f5cfa61	fs: dlm: cleanup lock order This patch cleanups the lock order to hold at first the close_lock and then held the nodes_srcu read lock. Probably it will never be a problem as nodes_srcu is only a read lock preventing the node pointer getting freed. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2023-08-10 10:33:03 -05:00
Alexander Aring	c84c47333a	fs: dlm: remove clear_members_cb This patch is just a small cleanup to directly call remove_remote_member() instead of going over clear_members_cb() which just calls remove_remote_member(). Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2023-08-10 10:33:03 -05:00
Alexander Aring	8c95006d55	fs: dlm: add plock dev tracepoints I currently debug nfs plock handling and introduce those two tracepoints for getting more information about what is happening there if the user space reads plock operations from kernel and writing the result back. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2023-08-10 10:33:03 -05:00
Alexander Aring	67b5da9a40	fs: dlm: check on plock ops when exit dlm To be sure we don't have any issues that there are leftover plock ops in either send_list or recv_list we simple check if either one of the list are empty when we exit the dlm subsystem. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2023-08-10 10:33:03 -05:00
Alexander Aring	541adb0d4d	fs: dlm: debugfs for queued callbacks It was useful to debug an issue with the callback queue to check if any callbacks in any lkb are for some reason not processed by the callback workqueue. The mentioned issue was fixed by commit `a034c1370d` ("fs: dlm: fix DLM_IFL_CB_PENDING gets overwritten"). If there are similar issue that looks like a ast callback was not processed, we can confirm now that it is not sitting to be processed by the callback workqueue anymore. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2023-08-10 10:33:03 -05:00
Alexander Aring	4b056db81c	fs: dlm: remove unused processed_nodes The variable processed_nodes is not being used by commit `1696c75f18` ("fs: dlm: add send ack threshold and append acks to msgs"). This patch removes the leftover of this commit. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2023-08-10 10:33:03 -05:00
Alexander Aring	e717f2e8e4	fs: dlm: add missing spin_unlock This patch fixes commit `dc52cd2eff` ("fs: dlm: fix F_CANCELLK to cancel pending request") that we don't unlock the ops_lock in a rate case when a waiter cannot be found. This case can only happen when cancellation of plock operation was successful but no kernel waiter was being found. Fixes: `dc52cd2eff` ("fs: dlm: fix F_CANCELLK to cancel pending request") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2023-08-10 10:33:03 -05:00
Alexander Aring	dc52cd2eff	fs: dlm: fix F_CANCELLK to cancel pending request This patch fixes the current handling of F_CANCELLK by not just doing a unlock as we need to try to cancel a lock at first. A unlock makes sense on a non-blocking lock request but if it's a blocking lock request we need to cancel the request until it's not granted yet. This patch is fixing this behaviour by first try to cancel a lock request and if it's failed it's unlocking the lock which seems to be granted. Note: currently the nfs locking handling was disabled by commit `40595cdc93` ("nfs: block notification on fs with its own ->lock"). However DLM was never being updated regarding to this change. Future patches will try to fix lockd lock requests for DLM. This patch is currently assuming the upstream DLM lockd handling is correct. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2023-07-20 17:25:04 -05:00
Alexander Aring	568f915655	fs: dlm: allow to F_SETLKW getting interrupted This patch implements dlm plock F_SETLKW interruption feature. If a blocking posix lock request got interrupted in user space by a signal a cancellation request for a non granted lock request to the user space lock manager will be send. The user lock manager answers either with zero or a negative errno code. A errno of -ENOENT signals that there is currently no blocking lock request waiting to being granted. In case of -ENOENT it was probably to late to request a cancellation and the pending lock got granted. In any error case we will wait until the lock is being granted as cancellation failed, this causes also that in case of an older user lock manager returning -EINVAL we will wait as cancellation is not supported which should be fine. If a user requires this feature the user should update dlm user space to support lock request cancellation. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2023-07-20 17:24:57 -05:00
Alexander Aring	99c58d6480	fs: dlm: remove twice newline This patch removes a newline which log_print() already adds, also removes wrapped string that causes a checkpatch warning. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2023-07-20 17:24:36 -05:00
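
The buffer-size fix described in commits b859e01054 and 19b3102c0b above can be sketched in plain userspace C. This is a minimal illustration, not the kernel code: `build_debug_name()` and `LONGEST_SUFFIX` are hypothetical names, and `DLM_LOCKSPACE_LEN` is assumed to be 64 because the quoted warning reports a 72-byte destination for `DLM_LOCKSPACE_LEN + 8`. The idea is to let the compiler size the buffer from the longest suffix and to pass `sizeof(name)` rather than a hard-coded length.

```c
#include <stdio.h>
#include <string.h>

/* Assumed value: the quoted warning shows DLM_LOCKSPACE_LEN + 8 == 72. */
#define DLM_LOCKSPACE_LEN 64

/* Longest suffix named in the quoted warnings; sizeof() on a string
 * literal counts the trailing NUL, so it also reserves the terminator. */
#define LONGEST_SUFFIX "_queued_asts"

/*
 * Hypothetical helper (not a kernel function): writes "<ls_name><suffix>"
 * into dst. Returns 1 if the whole name fit, 0 if snprintf() truncated it
 * or reported an error.
 */
int build_debug_name(char *dst, size_t dst_size,
                     const char *ls_name, const char *suffix)
{
    int n = snprintf(dst, dst_size, "%s%s", ls_name, suffix);

    /* snprintf() returns the length it would have written, excluding
     * the NUL, so n >= dst_size means the output was cut short. */
    return n >= 0 && (size_t)n < dst_size;
}
```

A caller would declare `char name[DLM_LOCKSPACE_LEN + sizeof(LONGEST_SUFFIX)];` and pass `sizeof(name)` as `dst_size`. With a 64-character lockspace name, the old `DLM_LOCKSPACE_LEN + 8` buffer truncates both `_waiters` and `_queued_asts`, while the compiler-sized buffer holds either.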

854 Commits