Separate the head & commit indices from the tail index to avoid
cache-line contention (so called 'false-sharing') between concurrent
threads.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since neither echo_commit nor echo_tail can change for the duration
of __process_echoes loop, substitute index comparison for the
snapshot counter.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Byte-by-byte echo output is painfully slow, requiring a lock/unlock
cycle for every input byte.
Instead, perform the echo output in blocks of 256 characters, and
at least once per flip buffer receive. Enough space is reserved in
the echo buffer to guarantee a full block can be saved without
overrunning the echo output. Overrun is prevented by discarding
the oldest echoes until enough space exists in the echo buffer
to receive at least a full block of new echoes.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Adding data to echo_buf (via add_echo_byte()) is guaranteed to be
single-threaded, since all callers are from the n_tty_receive_buf()
path. Processing the echo_buf can be called from either the
n_tty_receive_buf() path or the n_tty_write() path; however, these
callers are already serialized by output_lock.
Publish cumulative echo_head changes to echo_commit; process echo_buf
from echo_tail to echo_commit; remove echo_lock.
On echo_buf overrun, claim output_lock to serialize changes to
echo_tail.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Prepare for lockless echo_buf handling; compute current byte count
of echo_buf from head and tail indices.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Instead of using a single index to track the current echo_buf position,
use a head index when adding to the buffer and a tail index when
consuming from the buffer. Allow these head and tail indices to wrap
at max representable value; perform modulo reduction via helper
functions when accessing the buffer.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
TTY_BUFFER_PAGE is only used within drivers/tty/tty_buffer.c;
relocate to that file scope.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Convert the tty_buffer_flush() exclusion mechanism to a
public interface - tty_buffer_lock/unlock_exclusive() - and use
the interface to safely write the paste selection to the line
discipline.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Atomic bit ops are no longer required to indicate a flip buffer
flush is pending, as the flush_mutex is sufficient barrier.
Remove the unnecessary port .iflags field and localize flip buffer
state to struct tty_bufhead.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Separate the head and tail ptrs to avoid cache-line contention
(so called 'false-sharing') between concurrent threads.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Now that dropping the buffer lock is not necessary (as result of
converting the spin lock to a mutex), the flip buffer flush no
longer needs to be handled by the buffer work.
Simply signal a flush is required; the buffer work will exit the
i/o loop, which allows tty_buffer_flush() to proceed.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The buffer work may race with parallel tty_buffer_flush. Use a
mutex to guarantee exclusive modify access to the head flip
buffer.
Remove the unneeded spin lock.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Driver-side flip buffer input is already single-threaded; 'publish'
the .next link as the last operation on the tail buffer so the
'consumer' sees the already-completed flip buffer.
The commit buffer index is already 'published' by driver-side functions.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Lockless flip buffers require atomically updating the bytes-in-use
watermark.
The pty driver also peeks at the watermark value to limit
memory consumption to a much lower value than the default; query
the watermark with new fn, tty_buffer_space_avail().
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use a 0-sized sentinel to avoid assigning the head ptr from
the driver side thread. This also eliminates testing head/tail
for NULL.
When the sentinel is first 'consumed' by the buffer work
(or by tty_buffer_flush()), it is detached from the list but not
freed nor added to the free list. Both buffer work and
tty_buffer_flush() continue to preserve at least 1 flip buffer
to which head & tail is pointed.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In preparation for lockless flip buffers, make the flip buffer
free list lockless.
NB: using llist is not the optimal solution, as the driver and
buffer work may contend over the llist head unnecessarily. However,
test measurements indicate this contention is low.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
tty_buffer_find() implements a simple free list lookaside cache.
Merge this functionality into tty_buffer_alloc() to reflect the
more traditional alloc/free symmetry.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since flip buffers are size-aligned to 256 bytes and all flip
buffers 512-bytes or larger are not added to the free list, the
free list only contains 256-byte flip buffers.
Remove the list search when allocating a new flip buffer.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The char_buf_ptr and flag_buf_ptr values are trivially derived from
the .data field offset; compute values as needed.
Fixes a long-standing type-mismatch with the char and flag ptrs.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>