The initial session number when a link is created is based on a random
value, taken from struct tipc_net->random. It is then incremented for
each link reset to avoid mixing protocol messages from different link
sessions.
However, when a bearer is reset all its links are deleted, and will
later be re-created using the same random value as the first time.
This means that if the link never went down between creation and
deletion we will still sometimes have two subsequent sessions with
the same session number. In virtual environments with potentially
long transmission times this has turned out to be a real problem.
We now fix this by randomizing the session number each time a link
is created.
With a session number size of 16 bits this gives a risk of session
collision of 1/64k. To reduce this further, we also introduce a sanity
check on the very first STATE message arriving at a link. If this has
an acknowledge value differing from 0, which is logically impossible,
we ignore the message. The final risk for session collision is hence
reduced to 1/4G, which should be sufficient.
Signed-off-by: LUU Duc Canh <canh.d.luu@dektech.com.au>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We see the following scenario:
1) Link endpoint B on node 1 discovers that its peer endpoint is gone.
Since there is a second working link, failover procedure is started.
2) Link endpoint A on node 1 sends a FAILOVER message to peer endpoint
A on node 2. The node item 1->2 goes to state FAILINGOVER.
3) Linke endpoint A/2 receives the failover, and is supposed to take
down its parallell link endpoint B/2, while producing a FAILOVER
message to send back to A/1.
4) However, B/2 has already been deleted, so no FAILOVER message can
created.
5) Node 1->2 remains in state FAILINGOVER forever, refusing to receive
any messages that can bring B/1 up again. We are left with a non-
redundant link between node 1 and 2.
We fix this with letting endpoint A/2 build a dummy FAILOVER message
to send to back to A/1, so that the situation can be resolved.
Signed-off-by: LUU Duc Canh <canh.d.luu@dektech.com.au>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit referred to below introduced an update of the link
capabilities field that is not safe. Given the recently added
feature to remove idle node and link items after 5 minutes, there
is a small risk that the update will happen at the very moment the
targeted link is being removed. To avoid this we have to perform
the update inside the node item's write lock protection.
Fixes: 9012de5089 ("tipc: add sequence number check for link STATE messages")
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some virtual environments we observe a significant higher number of
packet reordering and delays than we have been used to traditionally.
This makes it necessary with stricter checks on incoming link protocol
messages' session number, which until now only has been validated for
RESET messages.
Since the other two message types, ACTIVATE and STATE messages also
carry this number, it is easy to extend the validation check to those
messages.
We also introduce a flag indicating if a link has a valid peer session
number or not. This eliminates the mixing of 32- and 16-bit arithmethics
we are currently using to achieve this.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some switch infrastructures produce huge amounts of packet duplicates.
This becomes a problem if those messages are STATE/NACK protocol
messages, causing unnecessary retransmissions of already accepted
packets.
We now introduce a unique sequence number per STATE protocol message
so that duplicates can be identified and ignored. This will also be
useful when tracing such cases, and to avert replay attacks when TIPC
is encrypted.
For compatibility reasons we have to introduce a new capability flag
TIPC_LINK_PROTO_SEQNO to handle this new feature.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function for checking if there is an node address conflict is
supposed to return a suggestion for a new address if it finds a
conflict, and zero otherwise. But in case the peer being checked
is previously unknown it does instead return a "suggestion" for
the checked address itself. This results in a DSC_TRIAL_FAIL_MSG
being sent unecessarily to the peer, and sometimes makes the trial
period starting over again.
Fixes: 25b0b9c4e8 ("tipc: handle collisions of 32-bit node address hash values")
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A peer node is considered down if there are no
active links (or) lost contact to the node. In current implementation,
a peer node instance is deleted either if
a) TIPC module is removed (or)
b) Application can use a netlink/iproute2 interface to delete a
specific down node.
Thus, a down node instance lives in the system forever, unless the
application explicitly removes it.
We fix this by deleting the nodes which are down for
a specified amount of time (5 minutes).
Existing node supervision timer is used to achieve this.
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In single-link usage, the function tipc_node_timeout() still iterates
over the whole link array to handle each link. Given that the maximum
number of bearers are 3, there are 2 redundant iterations with lock
grab/release. Since this function is executing very frequently it makes
sense to optimize it.
This commit adds conditional checking to exit from the loop if the
known number of configured links has already been accessed.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bpf syscall and selftests conflicts were trivial
overlapping changes.
The r8169 change involved moving the added mdelay from 'net' into a
different function.
A TLS close bug fix overlapped with the splitting of the TLS state
into separate TX and RX parts. I just expanded the tests in the bug
fix from "ctx->conf == X" into "ctx->tx_conf == X && ctx->rx_conf
== X".
Signed-off-by: David S. Miller <davem@davemloft.net>
After the introduction of a 128-bit node identity it may be difficult
for a user to correlate between this identity and the generated node
hash address.
We now try to make this easier by introducing a new ioctl() call for
fetching a node identity by using the hash value as key. This will
be particularly useful when we extend some of the commands in the
'tipc' tool, but we also expect regular user applications to need
this feature.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 36a50a989e ("tipc: fix infinite loop when dumping link monitor
summary") intended to fix a problem with user tool looping when max
number of bearers are enabled.
Unfortunately, the wrong version of the commit was posted, so the
problem was not solved at all.
This commit adds the missing part.
Fixes: 36a50a989e ("tipc: fix infinite loop when dumping link monitor summary")
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, we have option to configure MTU of UDP media. The configured
MTU takes effect on the links going up after that moment. I.e, a user
has to reset bearer to have new value applied across its links. This is
confusing and disturbing on a running cluster.
We now introduce the functionality to change the default UDP bearer MTU
in struct tipc_bearer. Additionally, the links are updated dynamically,
without any need for a reset, when bearer value is changed. We leverage
the existing per-link functionality and the design being symetrical to
the confguration of link tolerance.
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When configuring the number of used bearers to MAX_BEARER and issuing
command "tipc link monitor summary", the command enters infinite loop
in user space.
This issue happens because function tipc_nl_node_dump_monitor() returns
the wrong 'prev_bearer' value when all potential monitors have been
scanned.
The correct behavior is to always try to scan all monitors until either
the netlink message is full, in which case we return the bearer identity
of the affected monitor, or we continue through the whole bearer array
until we can return MAX_BEARERS. This solution also caters for the case
where there may be gaps in the bearer array.
Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the new RB tree structure for service ranges it becomes possible to
solve an old problem; - we can now allow overlapping service ranges in
the table.
When inserting a new service range to the tree, we use 'lower' as primary
key, and when necessary 'upper' as secondary key.
Since there may now be multiple service ranges matching an indicated
'lower' value, we must also add the 'upper' value to the functions
used for removing publications, so that the correct, corresponding
range item can be found.
These changes guarantee that a well-formed publication/withdrawal item
from a peer node never will be rejected, and make it possible to
eliminate the problematic backlog functionality we currently have for
handling such cases.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current design of the binding table has an unnecessary memory
consuming and complex data structure. It aggregates the service range
items into an array, which is expanded by a factor two every time it
becomes too small to hold a new item. Furthermore, the arrays never
shrink when the number of ranges diminishes.
We now replace this array with an RB tree that is holding the range
items as tree nodes, each range directly holding a list of bindings.
This, along with a few name changes, improves both readability and
volume of the code, as well as reducing memory consumption and hopefully
improving cache hit rate.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following sparse warning:
net/tipc/node.c:336:18: warning:
symbol 'tipc_node_create' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a 32-bit node address is generated from a 128-bit identifier,
there is a risk of collisions which must be discovered and handled.
We do this as follows:
- We don't apply the generated address immediately to the node, but do
instead initiate a 1 sec trial period to allow other cluster members
to discover and handle such collisions.
- During the trial period the node periodically sends out a new type
of message, DSC_TRIAL_MSG, using broadcast or emulated broadcast,
to all the other nodes in the cluster.
- When a node is receiving such a message, it must check that the
presented 32-bit identifier either is unused, or was used by the very
same peer in a previous session. In both cases it accepts the request
by not responding to it.
- If it finds that the same node has been up before using a different
address, it responds with a DSC_TRIAL_FAIL_MSG containing that
address.
- If it finds that the address has already been taken by some other
node, it generates a new, unused address and returns it to the
requester.
- During the trial period the requesting node must always be prepared
to accept a failure message, i.e., a message where a peer suggests a
different (or equal) address to the one tried. In those cases it
must apply the suggested value as trial address and restart the trial
period.
This algorithm ensures that in the vast majority of cases a node will
have the same address before and after a reboot. If a legacy user
configures the address explicitly, there will be no trial period and
messages, so this protocol addition is completely backwards compatible.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We add a 128-bit node identity, as an alternative to the currently used
32-bit node address.
For the sake of compatibility and to minimize message header changes
we retain the existing 32-bit address field. When not set explicitly by
the user, this field will be filled with a hash value generated from the
much longer node identity, and be used as a shorthand value for the
latter.
We permit either the address or the identity to be set by configuration,
but not both, so when the address value is set by a legacy user the
corresponding 128-bit node identity is generated based on the that value.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nominally, TIPC organizes network nodes into a three-level network
hierarchy consisting of the levels 'zone', 'cluster' and 'node'. This
hierarchy is reflected in the node address format, - it is sub-divided
into an 8-bit zone id, and 12 bit cluster id, and a 12-bit node id.
However, the 'zone' and 'cluster' levels have in reality never been
fully implemented,and never will be. The result of this has been
that the first 20 bits the node identity structure have been wasted,
and the usable node identity range within a cluster has been limited
to 12 bits. This is starting to become a problem.
In the following commits, we will need to be able to connect between
nodes which are using the whole 32-bit value space of the node address.
We therefore remove the restrictions on which values can be assigned
to node identity, -it is from now on only a 32-bit integer with no
assumed internal structure.
Isolation between clusters is now achieved only by setting different
values for the 'network id' field used during neighbor discovery, in
practice leading to the latter becoming the new cluster identity.
The rules for accepting discovery requests/responses from neighboring
nodes now become:
- If the user is using legacy address format on both peers, reception
of discovery messages is subject to the legacy lookup domain check
in addition to the cluster id check.
- Otherwise, the discovery request/response is always accepted, provided
both peers have the same network id.
This secures backwards compatibility for users who have been using zone
or cluster identities as cluster separators, instead of the intended
'network id'.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the default link tolerance set in struct tipc_bearer only
has effect on links going up after that moment. I.e., a user has to
reset all the node's links across that bearer to have the new value
applied. This is too limiting and disturbing on a running cluster to
be useful.
We now change this so that also already existing links are updated
dynamically, without any need for a reset, when the bearer value is
changed. We leverage the already existing per-link functionality
for this to achieve the wanted effect.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>