commit 38ff3e6bb9 upstream.
This was just recently reported to me. When built as modules, the
dccp_probe module has a silent dependency on the dccp module. This
stems from the fact that the module_init routine of dccp_probe
registers a jprobe on the dccp_sendmsg symbol. Since the symbol is
only referenced as a text string (the .symbol_name field in the jprobe
struct) rather than the address of the symbol itself, depmod never
picks this dependency up, and so if you load the dccp_probe module
without the dccp module loaded, the register_jprobe call fails with an
-EINVAL, and the whole module load fails.
The fix is pretty easy, we can just wrap the register_jprobe call in a
try_then_request_module call, which forces the dependency to get
satisfied prior to the probe registration.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This provides safety against negative optlen at the type
level instead of depending upon (sometimes non-trivial)
checks against this sprinkled all over the the place, in
each and every implementation.
Based upon work done by Arjan van de Ven and feedback
from Linus Torvalds.
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove long removed "inet_protocol_base" declaration.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No code change, cosmetical changes only:
* whitespace cleanup via scripts/cleanfile,
* remove self-references to filename at top of files,
* fix coding style (extraneous brackets),
* fix documentation style (kernel-doc-nano-HOWTO).
Thanks are due to Ivo Augusto Calado who raised these issues by
submitting good-quality patches.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function block inet_connect_sock_af_ops contains no data
make it constant.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
String literals are constant, and usually, we can also tag the array
of pointers const too, moving it to the .rodata section.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
percpu counter dccp_orphan_count is init in dccp_init() by
percpu_counter_init() while dccp module is loaded, but the
destroy of it is missing while dccp module is unloaded. We
can get the kernel WARNING about this. Reproduct by the
following commands:
$ modprobe dccp
$ rmmod dccp
$ modprobe dccp
WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
Hardware name: VMware Virtual Platform
list_add corruption. next->prev should be prev (c080c0c4), but was (null). (next
=ca7188cc).
Modules linked in: dccp(+) nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc
Pid: 1956, comm: modprobe Not tainted 2.6.31-rc5 #55
Call Trace:
[<c042f8fa>] warn_slowpath_common+0x6a/0x81
[<c053a6cb>] ? __list_add+0x27/0x5c
[<c042f94f>] warn_slowpath_fmt+0x29/0x2c
[<c053a6cb>] __list_add+0x27/0x5c
[<c053c9b3>] __percpu_counter_init+0x4d/0x5d
[<ca9c90c7>] dccp_init+0x19/0x2ed [dccp]
[<c0401141>] do_one_initcall+0x4f/0x111
[<ca9c90ae>] ? dccp_init+0x0/0x2ed [dccp]
[<c06971b5>] ? notifier_call_chain+0x26/0x48
[<c0444943>] ? __blocking_notifier_call_chain+0x45/0x51
[<c04516f7>] sys_init_module+0xac/0x1bd
[<c04028e4>] sysenter_do_call+0x12/0x22
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The DCCP protocol tries to allocate some large hash tables during
initialisation using the largest size possible. This can be larger than
what the page allocator can provide so it prints a warning. However, the
caller is able to handle the situation so this patch suppresses the
warning.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adding memory barrier after the poll_wait function, paired with
receive callbacks. Adding fuctions sock_poll_wait and sk_has_sleeper
to wrap the memory barrier.
Without the memory barrier, following race can happen.
The race fires, when following code paths meet, and the tp->rcv_nxt
and __add_wait_queue updates stay in CPU caches.
CPU1 CPU2
sys_select receive packet
... ...
__add_wait_queue update tp->rcv_nxt
... ...
tp->rcv_nxt check sock_def_readable
... {
schedule ...
if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
wake_up_interruptible(sk->sk_sleep)
...
}
If there was no cache the code would work ok, since the wait_queue and
rcv_nxt are opposit to each other.
Meaning that once tp->rcv_nxt is updated by CPU2, the CPU1 either already
passed the tp->rcv_nxt check and sleeps, or will get the new value for
tp->rcv_nxt and will return with new data mask.
In both cases the process (CPU1) is being added to the wait queue, so the
waitqueue_active (CPU2) call cannot miss and will wake up CPU1.
The bad case is when the __add_wait_queue changes done by CPU1 stay in its
cache, and so does the tp->rcv_nxt update on CPU2 side. The CPU1 will then
endup calling schedule and sleep forever if there are no more data on the
socket.
Calls to poll_wait in following modules were ommited:
net/bluetooth/af_bluetooth.c
net/irda/af_irda.c
net/irda/irnet/irnet_ppp.c
net/mac80211/rc80211_pid_debugfs.c
net/phonet/socket.c
net/rds/af_rds.c
net/rfkill/core.c
net/sunrpc/cache.c
net/sunrpc/rpc_pipe.c
net/tipc/socket.c
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change all the code that deals directly with ICMPv6 type and code
values to use u8 instead of a signed int as that's the actual data
type.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define three accessors to get/set dst attached to a skb
struct dst_entry *skb_dst(const struct sk_buff *skb)
void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
void skb_dst_drop(struct sk_buff *skb)
This one should replace occurrences of :
dst_release(skb->dst)
skb->dst = NULL;
Delete skb->dst field
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define skb_rtable(const struct sk_buff *skb) accessor to get rtable from skb
Delete skb->rtable field
Setting rtable is not allowed, just set dst instead as rtable is an alias.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes a problem caused by the overlap of the connection-setup and
established-state phases of DCCP connections.
During connection setup, the client retransmits Confirm Feature-Negotiation
options until a response from the server signals that it can move from the
half-established PARTOPEN into the OPEN state, whereupon the connection is
fully established on both ends (RFC 4340, 8.1.5).
However, since the client may already send data while it is in the PARTOPEN
state, consequences arise for the Maximum Packet Size: the problem is that the
initial option overhead is much higher than for the subsequent established
phase, as it involves potentially many variable-length list-type options
(server-priority options, RFC 4340, 6.4).
Applying the standard MPS is insufficient here: especially with larger
payloads this can lead to annoying, counter-intuitive EMSGSIZE errors.
On the other hand, reducing the MPS available for the established phase by
the added initial overhead is highly wasteful and inefficient.
The solution chosen therefore is a two-phase strategy:
If the payload length of the DataAck in PARTOPEN is too large, an Ack is sent
to carry the options, and the feature-negotiation list is then flushed.
This means that the server gets two Acks for one Response. If both Acks get
lost, it is probably better to restart the connection anyway and devising yet
another special-case does not seem worth the extra complexity.
The result is a higher utilisation of the available packet space for the data
transmission phase (established state) of a connection.
The patch (over-)estimates the initial overhead to be 32*4 bytes -- commonly
seen values were around 90 bytes for initial feature-negotiation options.
It uses sizeof(u32) to mean "aligned units of 4 bytes".
For consistency, another use of 4-byte alignment is adapted.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch resolves a long-standing FIXME to dynamically update the Maximum
Packet Size depending on actual options usage.
It uses the flags set by the feature-negotiation infrastructure to compute
the required header option size.
Most options are fixed-size, a notable exception are Ack Vectors (required
currently only by CCID-2). These can have any length between 3 and 1020
bytes. As a result of testing, 16 bytes (2 bytes for type/length plus 14 Ack
Vector cells) have been found to be sufficient for loss-free situations.
There are currently no CCID-specific header options which may appear on data
packets, thus it is not necessary to define a corresponding CCID field as
suggested in the old comment.
Further changes:
----------------
Adjusted the type of 'cur_mps' to match the unsigned return type of the
function.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since all feature-negotiation processing now takes place in feat.c,
functions for producing verbose debugging output are concentrated
there.
New functions to print out values, entry records, and options are
provided, and also a macro is defined to not always have the function
name in the output line.
Thanks a lot to Wei Yongjun and Giuseppe Galeota for help and
discussion with an earlier revision of this patch.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch takes care of initialising and type-checking sysctls
related to feature negotiation. Type checking is important since some
of the sysctls now directly impact the feature-negotiation process.
The sysctls are initialised with the known default values for each
feature. For the type-checking the value constraints from RFC 4340
are used:
* Sequence Window uses the specified Wmin=32, the maximum is ulong (4 bytes),
tested and confirmed that it works up to 4294967295 - for Gbps speed;
* Ack Ratio is between 0 .. 0xffff (2-byte unsigned integer);
* CCIDs are between 0 .. 255;
* request_retries, retries1, retries2 also between 0..255 for good measure;
* tx_qlen is checked to be non-negative;
* sync_ratelimit remains as before.
Notes:
------
1. Die s@sysctl_dccp_feat@sysctl_dccp@g since the sysctls are now in feat.c.
2. As pointed out by Arnaldo, the pattern of type-checking repeats itself in
other places, sometimes with exactly the same kind of definitions (e.g.
"static int zero;"). It may be a good idea (kernel janitors?) to consolidate
type checking. For the sake of keeping the changeset small and in order not
to affect other subsystems, I have not strived to generalise here.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds full support for local/remote Sequence Window feature, from which the
* sequence-number-validity (W) and
* acknowledgment-number-validity (W') windows
derive as specified in RFC 4340, 7.5.3.
Specifically, the following is contained in this patch:
* integrated new socket fields into dccp_sk;
* updated the update_gsr/gss routines with regard to these fields;
* updated handler code: the Sequence Window feature is located at the TX side,
so the local feature is meant if the handler-rx flag is false;
* the initialisation of `rcv_wnd' in reqsk is removed, since
- rcv_wnd is not used by the code anywhere;
- sequence number checks are not done in the LISTEN state (cf. 7.5.3);
- dccp_check_req checks the Ack number validity more rigorously;
* the `struct dccp_minisock' became empty and is now removed.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
This initialises feature negotiation from two tables, which are in
turn are initialised from sysctls.
As a novel feature, specifics of the implementation (e.g. that short
seqnos and ECN are not yet available) are advertised for robustness.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thanks to Wei and Arnaldo for pointing out the correct
new reference for CCID-3.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Removed the __exit annotation of tfrc_lib_exit(), in order to suppress the following section mismatch messages:
WARNING: net/dccp/dccp.o(.text+0xd9): Section mismatch in reference from the function ccid_cleanup_builtins() to the function .exit.text:tfrc_lib_exit()
The function ccid_cleanup_builtins() references a function in an exit section.
Often the function tfrc_lib_exit() has valid usage outside the exit section
and the fix is to remove the __exit annotation of tfrc_lib_exit.
WARNING: net/dccp/dccp.o(.init.text+0x48): Section mismatch in reference from the function ccid_initialize_builtins() to the function .exit.text:tfrc_lib_exit()
The function __init ccid_initialize_builtins() references
a function __exit tfrc_lib_exit().
This is often seen when error handling in the init function
uses functionality in the exit path.
The fix is often to remove the __exit annotation of
tfrc_lib_exit() so it may be used outside an exit section.
Signed-off-by: Leonardo Potenza <lpotenza@inwind.it>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>