David Howells says:
====================
rxrpc: Fix retransmission timeout and ACK discard
Here are a couple of fixes and an extra tracepoint for AF_RXRPC:
(1) Calculate the RTO pretty much as TCP does, rather than making
something up, including an initial 4s timeout (which causes return
probes from the fileserver to fail if a packet goes missing), and add
backoff.
(2) Fix the discarding of out-of-order received ACKs. We mustn't let the
hard-ACK point regress, nor do we want to do unnecessary
retransmission because the soft-ACK list regresses. This is not
trivial, however, due to some loose wording in various old protocol
specs, the ACK field that should be used for this sometimes has the
wrong information in it.
(3) Add a tracepoint to log a discarded ACK.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
rxrpc currently uses a fixed 4s retransmission timeout until the RTT is
sufficiently sampled. This can cause problems with some fileservers with
calls to the cache manager in the afs filesystem being dropped from the
fileserver because a packet goes missing and the retransmission timeout is
greater than the call expiry timeout.
Fix this by:
(1) Copying the RTT/RTO calculation code from Linux's TCP implementation
and altering it to fit rxrpc.
(2) Altering the various users of the RTT to make use of the new SRTT
value.
(3) Replacing the use of rxrpc_resend_timeout to use the calculated RTO
value instead (which is needed in jiffies), along with a backoff.
Notes:
(1) rxrpc provides RTT samples by matching the serial numbers on outgoing
DATA packets that have the RXRPC_REQUEST_ACK set and PING ACK packets
against the reference serial number in incoming REQUESTED ACK and
PING-RESPONSE ACK packets.
(2) Each packet that is transmitted on an rxrpc connection gets a new
per-connection serial number, even for retransmissions, so an ACK can
be cross-referenced to a specific trigger packet. This allows RTT
information to be drawn from retransmitted DATA packets also.
(3) rxrpc maintains the RTT/RTO state on the rxrpc_peer record rather than
on an rxrpc_call because many RPC calls won't live long enough to
generate more than one sample.
(4) The calculated SRTT value is in units of 8ths of a microsecond rather
than nanoseconds.
The (S)RTT and RTO values are displayed in /proc/net/rxrpc/peers.
Fixes: 17926a7932 ([AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both"")
Signed-off-by: David Howells <dhowells@redhat.com>
Pull block fixes from Jens Axboe:
- a small series fixing a use-after-free of bdi name (Christoph,Yufen)
- NVMe fix for a regression with the smaller CQ update (Alexey)
- NVMe fix for a hang at namespace scanning error recovery (Sagi)
- fix race with blk-iocost iocg->abs_vdebt updates (Tejun)
* tag 'block-5.7-2020-05-09' of git://git.kernel.dk/linux-block:
nvme: fix possible hang when ns scanning fails during error recovery
nvme-pci: fix "slimmer CQ head update"
bdi: add a ->dev_name field to struct backing_dev_info
bdi: use bdi_dev_name() to get device name
bdi: move bdi_dev_name out of line
vboxsf: don't use the source name in the bdi name
iocost: protect iocg->abs_vdebt with iocg->waitq.lock
Pull tracing fixes from Steven Rostedt:
- Fix bootconfig causing kernels to fail with CONFIG_BLK_DEV_RAM
enabled
- Fix allocation leaks in bootconfig tool
- Fix a double initialization of a variable
- Fix API bootconfig usage from kprobe boot time events
- Reject NULL location for kprobes
- Fix crash caused by preempt delay module not cleaning up kthread
correctly
- Add vmalloc_sync_mappings() to prevent x86_64 page faults from
recursively faulting from tracing page faults
- Fix comment in gpu/trace kerneldoc header
- Fix documentation of how to create a trace event class
- Make the local tracing_snapshot_instance_cond() function static
* tag 'trace-v5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tools/bootconfig: Fix resource leak in apply_xbc()
tracing: Make tracing_snapshot_instance_cond() static
tracing: Fix doc mistakes in trace sample
gpu/trace: Minor comment updates for gpu_mem_total tracepoint
tracing: Add a vmalloc_sync_mappings() for safe measure
tracing: Wait for preempt irq delay thread to finish
tracing/kprobes: Reject new event if loc is NULL
tracing/boottime: Fix kprobe event API usage
tracing/kprobes: Fix a double initialization typo
bootconfig: Fix to remove bootconfig data from initrd while boot
This change updates the improper comment for the 'size' attribute in the
tracepoint definition. Most gfx drivers pre-fault in physical pages
instead of making virtual allocations. So we drop the 'Virtual' keyword
here and leave this to the implementations.
Link: http://lkml.kernel.org/r/20200428220825.169606-1-zzyiwei@google.com
Signed-off-by: Yiwei Zhang <zzyiwei@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:
Stable fixes:
- fix handling of backchannel binding in BIND_CONN_TO_SESSION
Bugfixes:
- Fix a credential use-after-free issue in pnfs_roc()
- Fix potential posix_acl refcnt leak in nfs3_set_acl
- defer slow parts of rpc_free_client() to a workqueue
- Fix an Oopsable race in __nfs_list_for_each_server()
- Fix trace point use-after-free race
- Regression: the RDMA client no longer responds to server disconnect
requests
- Fix return values of xdr_stream_encode_item_{present, absent}
- _pnfs_return_layout() must always wait for layoutreturn completion
Cleanups:
- Remove unreachable error conditions"
* tag 'nfs-for-5.7-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Fix a race in __nfs_list_for_each_server()
NFSv4.1: fix handling of backchannel binding in BIND_CONN_TO_SESSION
SUNRPC: defer slow parts of rpc_free_client() to a workqueue.
NFSv4: Remove unreachable error condition due to rpc_run_task()
SUNRPC: Remove unreachable error condition
xprtrdma: Fix use of xdr_stream_encode_item_{present, absent}
xprtrdma: Fix trace point use-after-free race
xprtrdma: Restore wake-up-all to rpcrdma_cm_event_handler()
nfs: Fix potential posix_acl refcnt leak in nfs3_set_acl
NFS/pnfs: Fix a credential use-after-free issue in pnfs_roc()
NFS/pnfs: Ensure that _pnfs_return_layout() waits for layoutreturn completion
Pull block fixes from Jens Axboe:
"A few fixes/changes that should go into this release:
- null_blk zoned fixes (Damien)
- blkdev_close() sync improvement (Douglas)
- Fix regression in blk-iocost that impacted (at least) systemtap
(Waiman)
- Comment fix, header removal (Zhiqiang, Jianpeng)"
* tag 'block-5.7-2020-04-24' of git://git.kernel.dk/linux-block:
null_blk: Cleanup zoned device initialization
null_blk: Fix zoned command handling
block: remove unused header
blk-iocost: Fix error on iocost_ioc_vrate_adj
bdev: Reduce time holding bd_mutex in sync in blkdev_close()
buffer: remove useless comment and WB_REASON_FREE_MORE_MEM, reason.
Pull nfsd fixes from Chuck Lever:
"The first set of 5.7-rc fixes for NFS server issues.
These were all unresolved at the time the 5.7 window opened, and
needed some additional time to ensure they were correctly addressed.
They are ready now.
At the moment I know of one more urgent issue regarding the NFS
server. A fix has been tested and is under review. I expect to send
one more pull request, containing this fix (which now consists of 3
patches).
Fixes:
- Address several use-after-free and memory leak bugs
- Prevent a backchannel livelock"
* tag 'nfsd-5.7-rc-1' of git://git.linux-nfs.org/projects/cel/cel-2.6:
svcrdma: Fix leak of svc_rdma_recv_ctxt objects
svcrdma: Fix trace point use-after-free race
SUNRPC: Fix backchannel RPC soft lockups
SUNRPC/cache: Fix unsafe traverse caused double-free in cache_purge
nfsd: memory corruption in nfsd4_lock()
Systemtap 4.2 is unable to correctly interpret the "u32 (*missed_ppm)[2]"
argument of the iocost_ioc_vrate_adj trace entry defined in
include/trace/events/iocost.h leading to the following error:
/tmp/stapAcz0G0/stap_c89c58b83cea1724e26395efa9ed4939_6321_aux_6.c:78:8:
error: expected ‘;’, ‘,’ or ‘)’ before ‘*’ token
, u32[]* __tracepoint_arg_missed_ppm
That argument type is indeed rather complex and hard to read. Looking
at block/blk-iocost.c. It is just a 2-entry u32 array. By simplifying
the argument to a simple "u32 *missed_ppm" and adjusting the trace
entry accordingly, the compilation error was gone.
Fixes: 7caa47151a ("blkcg: implement blk-iocost")
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
It's not safe to use resources pointed to by the @send_wr of
ib_post_send() _after_ that function returns. Those resources are
typically freed by the Send completion handler, which can run before
ib_post_send() returns.
Thus the trace points currently around ib_post_send() in the
client's RPC/RDMA transport are a hazard, even when they are
disabled. Rearrange them so that they touch the Work Request only
_before_ ib_post_send() is invoked.
Fixes: ab03eff58e ("xprtrdma: Add trace points in RPC Call transmit paths")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
free_more_memory func has been completely removed in commit bc48f001de
("buffer: eliminate the need to call free_more_memory() in __getblk_slow()")
So comment and `WB_REASON_FREE_MORE_MEM` reason about free_more_memory
are no longer needed.
Fixes: bc48f001de ("buffer: eliminate the need to call free_more_memory() in __getblk_slow()")
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Drop needless newlines from tracepoint format strings, they only add
empty lines to perf tracing output.
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Merge more updates from Andrew Morton:
- a lot more of MM, quite a bit more yet to come: (memcg, pagemap,
vmalloc, pagealloc, migration, thp, ksm, madvise, virtio,
userfaultfd, memory-hotplug, shmem, rmap, zswap, zsmalloc, cleanups)
- various other subsystems (procfs, misc, MAINTAINERS, bitops, lib,
checkpatch, epoll, binfmt, kallsyms, reiserfs, kmod, gcov, kconfig,
ubsan, fault-injection, ipc)
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (158 commits)
ipc/shm.c: make compat_ksys_shmctl() static
ipc/mqueue.c: fix a brace coding style issue
lib/Kconfig.debug: fix a typo "capabilitiy" -> "capability"
ubsan: include bug type in report header
kasan: unset panic_on_warn before calling panic()
ubsan: check panic_on_warn
drivers/misc/lkdtm/bugs.c: add arithmetic overflow and array bounds checks
ubsan: split "bounds" checker from other options
ubsan: add trap instrumentation option
init/Kconfig: clean up ANON_INODES and old IO schedulers options
kernel/gcov/fs.c: replace zero-length array with flexible-array member
gcov: gcc_3_4: replace zero-length array with flexible-array member
gcov: gcc_4_7: replace zero-length array with flexible-array member
kernel/kmod.c: fix a typo "assuems" -> "assumes"
reiserfs: clean up several indentation issues
kallsyms: unexport kallsyms_lookup_name() and kallsyms_on_each_symbol()
samples/hw_breakpoint: drop use of kallsyms_lookup_name()
samples/hw_breakpoint: drop HW_BREAKPOINT_R when reporting writes
fs/binfmt_elf.c: don't free interpreter's ELF pheaders on common path
fs/binfmt_elf.c: allocate less for static executable
...
Pull NFS client updates from Trond Myklebust:
"Highlights include:
Stable fixes:
- Fix a page leak in nfs_destroy_unlinked_subrequests()
- Fix use-after-free issues in nfs_pageio_add_request()
- Fix new mount code constant_table array definitions
- finish_automount() requires us to hold 2 refs to the mount record
Features:
- Improve the accuracy of telldir/seekdir by using 64-bit cookies
when possible.
- Allow one RDMA active connection and several zombie connections to
prevent blocking if the remote server is unresponsive.
- Limit the size of the NFS access cache by default
- Reduce the number of references to credentials that are taken by
NFS
- pNFS files and flexfiles drivers now support per-layout segment
COMMIT lists.
- Enable partial-file layout segments in the pNFS/flexfiles driver.
- Add support for CB_RECALL_ANY to the pNFS flexfiles layout type
- pNFS/flexfiles Report NFS4ERR_DELAY and NFS4ERR_GRACE errors from
the DS using the layouterror mechanism.
Bugfixes and cleanups:
- SUNRPC: Fix krb5p regressions
- Don't specify NFS version in "UDP not supported" error
- nfsroot: set tcp as the default transport protocol
- pnfs: Return valid stateids in nfs_layout_find_inode_by_stateid()
- alloc_nfs_open_context() must use the file cred when available
- Fix locking when dereferencing the delegation cred
- Fix memory leaks in O_DIRECT when nfs_get_lock_context() fails
- Various clean ups of the NFS O_DIRECT commit code
- Clean up RDMA connect/disconnect
- Replace zero-length arrays with C99-style flexible arrays"
* tag 'nfs-for-5.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (86 commits)
NFS: Clean up process of marking inode stale.
SUNRPC: Don't start a timer on an already queued rpc task
NFS/pnfs: Reference the layout cred in pnfs_prepare_layoutreturn()
NFS/pnfs: Fix dereference of layout cred in pnfs_layoutcommit_inode()
NFS: Beware when dereferencing the delegation cred
NFS: Add a module parameter to set nfs_mountpoint_expiry_timeout
NFS: finish_automount() requires us to hold 2 refs to the mount record
NFS: Fix a few constant_table array definitions
NFS: Try to join page groups before an O_DIRECT retransmission
NFS: Refactor nfs_lock_and_join_requests()
NFS: Reverse the submission order of requests in __nfs_pageio_add_request()
NFS: Clean up nfs_lock_and_join_requests()
NFS: Remove the redundant function nfs_pgio_has_mirroring()
NFS: Fix memory leaks in nfs_pageio_stop_mirroring()
NFS: Fix a request reference leak in nfs_direct_write_clear_reqs()
NFS: Fix use-after-free issues in nfs_pageio_add_request()
NFS: Fix races nfs_page_group_destroy() vs nfs_destroy_unlinked_subrequests()
NFS: Fix a page leak in nfs_destroy_unlinked_subrequests()
NFS: Remove unused FLUSH_SYNC support in nfs_initiate_pgio()
pNFS/flexfiles: Specify the layout segment range in LAYOUTGET
...
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've mainly focused on fixing bugs and addressing
issues in recently introduced compression support.
Enhancement:
- add zstd support, and set LZ4 by default
- add ioctl() to show # of compressed blocks
- show mount time in debugfs
- replace rwsem with spinlock
- avoid lock contention in DIO reads
Some major bug fixes wrt compression:
- compressed block count
- memory access and leak
- remove obsolete fields
- flag controls
Other bug fixes and clean ups:
- fix overflow when handling .flags in inode_info
- fix SPO issue during resize FS flow
- fix compression with fsverity enabled
- potential deadlock when writing compressed pages
- show missing mount options"
* tag 'f2fs-for-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (66 commits)
f2fs: keep inline_data when compression conversion
f2fs: fix to disable compression on directory
f2fs: add missing CONFIG_F2FS_FS_COMPRESSION
f2fs: switch discard_policy.timeout to bool type
f2fs: fix to verify tpage before releasing in f2fs_free_dic()
f2fs: show compression in statx
f2fs: clean up dic->tpages assignment
f2fs: compress: support zstd compress algorithm
f2fs: compress: add .{init,destroy}_decompress_ctx callback
f2fs: compress: fix to call missing destroy_compress_ctx()
f2fs: change default compression algorithm
f2fs: clean up {cic,dic}.ref handling
f2fs: fix to use f2fs_readpage_limit() in f2fs_read_multi_pages()
f2fs: xattr.h: Make stub helpers inline
f2fs: fix to avoid double unlock
f2fs: fix potential .flags overflow on 32bit architecture
f2fs: fix NULL pointer dereference in f2fs_verity_work()
f2fs: fix to clear PG_error if fsverity failed
f2fs: don't call fscrypt_get_encryption_info() explicitly in f2fs_tmpfile()
f2fs: don't trigger data flush in foreground operation
...
Pull tracing updates from Steven Rostedt:
"New tracing features:
- The ring buffer is no longer disabled when reading the trace file.
The trace_pipe file was made to be used for live tracing and
reading as it acted like the normal producer/consumer. As the trace
file would not consume the data, the easy way of handling it was to
just disable writes to the ring buffer.
This came to a surprise to the BPF folks who complained about lost
events due to reading. This is no longer an issue. If someone wants
to keep the old disabling there's a new option "pause-on-trace"
that can be set.
- New set_ftrace_notrace_pid file. PIDs in this file will not be
traced by the function tracer.
Similar to set_ftrace_pid, which makes the function tracer only
trace those tasks with PIDs in the file, the set_ftrace_notrace_pid
does the reverse.
- New set_event_notrace_pid file. PIDs in this file will cause events
not to be traced if triggered by a task with a matching PID.
Similar to the set_event_pid file but will not be traced. Note,
sched_waking and sched_switch events may still be traced if one of
the tasks referenced by those events contains a PID that is allowed
to be traced.
Tracing related features:
- New bootconfig option, that is attached to the initrd file.
If bootconfig is on the command line, then the initrd file is
searched looking for a bootconfig appended at the end.
- New GPU tracepoint infrastructure to help the gfx drivers to get
off debugfs (acked by Greg Kroah-Hartman)
And other minor updates and fixes"
* tag 'trace-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (27 commits)
tracing: Do not allocate buffer in trace_find_next_entry() in atomic
tracing: Add documentation on set_ftrace_notrace_pid and set_event_notrace_pid
selftests/ftrace: Add test to test new set_event_notrace_pid file
selftests/ftrace: Add test to test new set_ftrace_notrace_pid file
tracing: Create set_event_notrace_pid to not trace tasks
ftrace: Create set_ftrace_notrace_pid to not trace tasks
ftrace: Make function trace pid filtering a bit more exact
ftrace/kprobe: Show the maxactive number on kprobe_events
tracing: Have the document reflect that the trace file keeps tracing enabled
ring-buffer/tracing: Have iterator acknowledge dropped events
tracing: Do not disable tracing when reading the trace file
ring-buffer: Do not disable recording when there is an iterator
ring-buffer: Make resize disable per cpu buffer instead of total buffer
ring-buffer: Optimize rb_iter_head_event()
ring-buffer: Do not die if rb_iter_peek() fails more than thrice
ring-buffer: Have rb_iter_head_event() handle concurrent writer
ring-buffer: Add page_stamp to iterator for synchronization
ring-buffer: Rename ring_buffer_read() to read_buffer_iter_advance()
ring-buffer: Have ring_buffer_empty() not depend on tracing stopped
tracing: Save off entry when peeking at next entry
...
Pull nfsd updates from Chuck Lever:
- Fix EXCHANGE_ID response when NFSD runs in a container
- A battery of new static trace points
- Socket transports now use bio_vec to send Replies
- NFS/RDMA now supports filesystems with no .splice_read method
- Favor memcpy() over DMA mapping for small RPC/RDMA Replies
- Add pre-requisites for supporting multiple Write chunks
- Numerous minor fixes and clean-ups
[ Chuck is filling in for Bruce this time while he and his family settle
into a new house ]
* tag 'nfsd-5.7' of git://git.linux-nfs.org/projects/cel/cel-2.6: (39 commits)
svcrdma: Fix leak of transport addresses
SUNRPC: Fix a potential buffer overflow in 'svc_print_xprts()'
SUNRPC/cache: don't allow invalid entries to be flushed
nfsd: fsnotify on rmdir under nfsd/clients/
nfsd4: kill warnings on testing stateids with mismatched clientids
nfsd: remove read permission bit for ctl sysctl
NFSD: Fix NFS server build errors
sunrpc: Add tracing for cache events
SUNRPC/cache: Allow garbage collection of invalid cache entries
nfsd: export upcalls must not return ESTALE when mountd is down
nfsd: Add tracepoints for update of the expkey and export cache entries
nfsd: Add tracepoints for exp_find_key() and exp_get_by_name()
nfsd: Add tracing to nfsd_set_fh_dentry()
nfsd: Don't add locks to closed or closing open stateids
SUNRPC: Teach server to use xprt_sock_sendmsg for socket sends
SUNRPC: Refactor xs_sendpages()
svcrdma: Avoid DMA mapping small RPC Replies
svcrdma: Fix double sync of transport header buffer
svcrdma: Refactor chunk list encoders
SUNRPC: Add encoders for list item discriminators
...
Add zstd compress algorithm support, use "compress_algorithm=zstd"
mountoption to enable it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Pull SCSI updates from James Bottomley:
"This series has a huge amount of churn because it pulls in Mauro's doc
update changing all our txt files to rst ones.
Excluding that, we have the usual driver updates (qla2xxx, ufs, lpfc,
zfcp, ibmvfc, pm80xx, aacraid), a treewide update for scnprintf and
some other minor updates.
The major core change is Hannes moving functions out of the aacraid
driver and into the core"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (223 commits)
scsi: aic7xxx: aic97xx: Remove FreeBSD-specific code
scsi: ufs: Do not rely on prefetched data
scsi: dc395x: remove dc395x_bios_param
scsi: libiscsi: Fix error count for active session
scsi: hpsa: correct race condition in offload enabled
scsi: message: fusion: Replace zero-length array with flexible-array member
scsi: qedi: Add PCI shutdown handler support
scsi: qedi: Add MFW error recovery process
scsi: ufs: Enable block layer runtime PM for well-known logical units
scsi: ufs-qcom: Override devfreq parameters
scsi: ufshcd: Let vendor override devfreq parameters
scsi: ufshcd: Update the set frequency to devfreq
scsi: ufs: Resume ufs host before accessing ufs device
scsi: ufs-mediatek: customize the delay for enabling host
scsi: ufs: make HCE polling more compact to improve initialization latency
scsi: ufs: allow custom delay prior to host enabling
scsi: ufs-mediatek: use common delay function
scsi: ufs: introduce common and flexible delay function
scsi: ufs: use an enum for host capabilities
scsi: ufs: fix uninitialized tx_lanes in ufshcd_disable_tx_lcc()
...