Commit Graph

173 Commits

Author SHA1 Message Date
Jon Paul Maloy dc8d1eb305 tipc: fix node reference count bug
Commit 5405ff6e15 ("tipc: convert node lock to rwlock")
introduced a bug to the node reference counter handling. When a
message is successfully sent in the function tipc_node_xmit(),
we return directly after releasing the node lock, instead of
continuing and decrementing the node reference counter as we
should do.

This commit fixes this bug.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 15:19:40 -05:00
Jon Paul Maloy 1a90632da8 tipc: eliminate remnants of hungarian notation
The number of variables with Hungarian notation (l_ptr, n_ptr etc.)
has been significantly reduced over the last couple of years.

We now root out the last traces of this practice.
There are no functional changes in this commit.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20 14:06:10 -05:00
Jon Paul Maloy 38206d5939 tipc: narrow down interface towards struct tipc_link
We move the definition of struct tipc_link from link.h to link.c in
order to minimize its exposure to the rest of the code.

When needed, we define new functions to make it possible for external
entities to access and set data in the link.

Apart from the above, there are no functional changes.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20 14:06:10 -05:00
Jon Paul Maloy 5be9c08671 tipc: narrow down exposure of struct tipc_node
In our effort to have less code and include dependencies between
entities such as node, link and bearer, we try to narrow down
the exposed interface towards the node as much as possible.

In this commit, we move the definition of struct tipc_node, along
with many of its associated function declarations, from node.h to
node.c. We also move some function definitions from link.c and
name_distr.c to node.c, since they access fields in struct tipc_node
that should not be externally visible. The moved functions are renamed
according to new location, and made static whenever possible.

There are no functional changes in this commit.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20 14:06:10 -05:00
Jon Paul Maloy 5405ff6e15 tipc: convert node lock to rwlock
According to the node FSM a node in state SELF_UP_PEER_UP cannot
change state inside a lock context, except when a TUNNEL_PROTOCOL
(SYNCH or FAILOVER) packet arrives. However, the node's individual
links may still change state.

Since each link now is protected by its own spinlock, we finally have
the conditions in place to convert the node spinlock to an rwlock_t.
If the node state and arriving packet type are rigth, we can let the
link directly receive the packet under protection of its own spinlock
and the node lock in read mode. In all other cases we use the node
lock in write mode. This enables full concurrent execution between
parallel links during steady-state traffic situations, i.e., 99+ %
of the time.

This commit implements this change.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20 14:06:10 -05:00
Jon Paul Maloy 2312bf61ae tipc: introduce per-link spinlock
As a preparation to allow parallel links to work more independently
from each other we introduce a per-link spinlock, to be stored in the
struct nodes's link entry area. Since the node lock still is a regular
spinlock there is no increase in parallellism at this stage.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20 14:06:10 -05:00
Jon Paul Maloy 1d7e1c2595 tipc: reduce code dependency between binding table and node layer
The file name_distr.c currently contains three functions,
named_cluster_distribute(), tipc_publ_subcscribe() and
tipc_publ_unsubscribe() that all directly access fields in
struct tipc_node. We want to eliminate such dependencies, so
we move those functions to the file node.c and rename them to
tipc_node_broadcast(), tipc_node_subscribe() and tipc_node_unsubscribe()
respectively.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20 14:06:10 -05:00
Jon Paul Maloy 5c10e97940 tipc: small cleanup of function tipc_node_check_state()
The function tipc_node_check_state() contains the core logics
for handling link synchronization and failover. For this reason,
it is important to keep it as comprehensible as possible.

In this commit, we make three small cleanups.

1) If the node is in state SELF_DOWN_PEER_LEAVING and the received
   packet confirms that the peer has lost contact, there will be no
   further action in this function. To make this clearer, we return
   from the function directly after the state change.

2) Since commit 0f8b8e28fb ("tipc: eliminate risk of stalled
   link synchronization") only the logically first TUNNEL_PROTO/SYNCH
   packet can alter the link state and set the synch point,
   independently of arrival order. Hence, there is not any longer any
   need to adjust the synch value in case such packets arrive in
   disorder. We remove this adjustment.

3) It is the intention that any message arriving on any of the links
   may trig a check for and possible termination of a node SYNCH state.
   A redundant and unnoticed check for tipc_link_is_synching() obviously
   beats this purpose, with the effect that only packets arriving on the
   synching link may currently end the synch state. We remove this check.
   This change will further shorten the synchronization period between
   parallel links.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20 14:06:10 -05:00
Wu Fengguang 742e038330 tipc: link_is_bc_sndlink() can be static
TO: "David S. Miller" <davem@davemloft.net>
CC: netdev@vger.kernel.org
CC: Jon Maloy <jon.maloy@ericsson.com>
CC: Ying Xue <ying.xue@windriver.com>
CC: tipc-discussion@lists.sourceforge.net
CC: linux-kernel@vger.kernel.org

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-25 06:31:52 -07:00
Jon Paul Maloy 2af5ae372a tipc: clean up unused code and structures
After the previous changes in this series, we can now remove some
unused code and structures, both in the broadcast, link aggregation
and link code.

There are no functional changes in this commit.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-24 06:56:47 -07:00
Jon Paul Maloy c49a0a8439 tipc: ensure binding table initial distribution is sent via first link
Correct synchronization of the broadcast link at first contact between
two nodes is dependent on the assumption that the binding table "bulk"
update passes via the same link as the initial broadcast syncronization
message, i.e., via the first link that is established.

This is not guaranteed in the current implementation. If two link
come up very close to each other in time, the "bulk" may quite well
pass via the second link, and hence void the guarantee of a correct
initial synchronization before the broadcast link is opened.

This commit makes two small changes to strengthen this guarantee.

1) We let the second established link occupy slot 1 of the
   "active_links" array, while the first link will retain slot 0.
   (This is in reality a cosmetic change, we could just as well keep
    the current, opposite order)

2) We let the name distributor always use link selector/slot 0 when
   it sends it binding table updates.

The extra traffic bias on the first link caused by this change should
be negligible, since binding table updates constitutes a very small
fraction of the total traffic.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-24 06:56:46 -07:00
Jon Paul Maloy c72fa872a2 tipc: eliminate link's reference to owner node
With the recent commit series, we have established a one-way dependency
between the link aggregation (struct tipc_node) instances and their
pertaining tipc_link instances. This has enabled quite significant code
and structure simplifications.

In this commit, we eliminate the field 'owner', which points to an
instance of struct tipc_node, from struct tipc_link, and replace it with
a pointer to struct net, which is the only external reference now needed
by a link instance.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-24 06:56:45 -07:00
Jon Paul Maloy b06b281e79 tipc: simplify bearer level broadcast
Until now, we have been keeping track of the exact set of broadcast
destinations though the help structure tipc_node_map. This leads us to
have to maintain a whole infrastructure for supporting this, including
a pseudo-bearer and a number of functions to manipulate both the bearers
and the node map correctly. Apart from the complexity, this approach is
also limiting, as struct tipc_node_map only can support cluster local
broadcast if we want to avoid it becoming excessively large. We want to
eliminate this limitation, in order to enable introduction of scoped
multicast in the future.

A closer analysis reveals that it is unnecessary maintaining this "full
set" overview; it is sufficient to keep a counter per bearer, indicating
how many nodes can be reached via this bearer at the moment. The protocol
is now robust enough to handle transitional discrepancies between the
nominal number of reachable destinations, as expected by the broadcast
protocol itself, and the number which is actually reachable at the
moment. The initial broadcast synchronization, in conjunction with the
retransmission mechanism, ensures that all packets will eventually be
acknowledged by the correct set of destinations.

This commit introduces these changes.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-24 06:56:39 -07:00
Jon Paul Maloy 5266698661 tipc: let broadcast packet reception use new link receive function
The code path for receiving broadcast packets is currently distinct
from the unicast path. This leads to unnecessary code and data
duplication, something that can be avoided with some effort.

We now introduce separate per-peer tipc_link instances for handling
broadcast packet reception. Each receive link keeps a pointer to the
common, single, broadcast link instance, and can hence handle release
and retransmission of send buffers as if they belonged to the own
instance.

Furthermore, we let each unicast link instance keep a reference to both
the pertaining broadcast receive link, and to the common send link.
This makes it possible for the unicast links to easily access data for
broadcast link synchronization, as well as for carrying acknowledges for
received broadcast packets.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-24 06:56:37 -07:00
Jon Paul Maloy fd556f209a tipc: introduce capability bit for broadcast synchronization
Until now, we have tried to support both the newer, dedicated broadcast
synchronization mechanism along with the older, less safe, RESET_MSG/
ACTIVATE_MSG based one. The latter method has turned out to be a hazard
in a highly dynamic cluster, so we find it safer to disable it completely
when we find that the former mechanism is supported by the peer node.

For this purpose, we now introduce a new capabability bit,
TIPC_BCAST_SYNCH, to inform any peer nodes that dedicated broadcast
syncronization is supported by the present node. The new bit is conveyed
between peers in the 'capabilities' field of neighbor discovery messages.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-24 06:56:35 -07:00
Jon Paul Maloy 0e05498e9e tipc: make link implementation independent from struct tipc_bearer
In reality, the link implementation is already independent from
struct tipc_bearer, in that it doesn't store any reference to it.
However, we still pass on a pointer to a bearer instance in the
function tipc_link_create(), just to have it extract some
initialization information from it.

I later commits, we need to create instances of tipc_link without
having any associated struct tipc_bearer. To facilitate this, we
want to extract the initialization data already in the creator
function in node.c, before calling tipc_link_create(), and pass
this info on as individual parameters in the call.

This commit introduces this change.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-24 06:56:30 -07:00
David S. Miller 26440c835f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/asix_common.c
	net/ipv4/inet_connection_sock.c
	net/switchdev/switchdev.c

In the inet_connection_sock.c case the request socket hashing scheme
is completely different in net-next.

The other two conflicts were overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-20 06:08:27 -07:00
Jon Paul Maloy c819930090 tipc: update node FSM when peer RESET message is received
The change made in the previous commit revealed a small flaw in the way
the node FSM is updated. When the function tipc_node_link_down() is
called for the last link to a node, we should check whether this was
caused by a local reset or by a received RESET message from the peer.
In the latter case, we can directly issue a PEER_LOST_CONTACT_EVT to
the node FSM, so that it is ready to re-establish contact. If this is
not done, the peer node will sometimes have to go through a second
establish cycle before the link becomes stable.

We fix this in this commit by conditionally issuing the mentioned
event in the function tipc_node_link_down(). We also move LINK_RESET
FSM even away from the link_reset() function and into the caller
function, partially because it is easier to follow the code when state
changes are gathered at a limited number of locations, partially
because there will be cases in future commits where we don't want the
link to go RESET mode when link_reset() is called.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15 23:55:23 -07:00
Jon Paul Maloy 282b3a0562 tipc: send out RESET immediately when link goes down
When a link is taken down because of a node local event, such as
disabling of a bearer or an interface, we currently leave it to the
peer node to discover the broken communication. The default time for
such failure discovery is 1.5-2 seconds.

If we instead allow the terminating link endpoint to send out a RESET
message at the moment it is reset, we can achieve the impression that
both endpoints are going down instantly. Since this is a very common
scenario, we find it worthwhile to make this small modification.

Apart from letting the link produce the said message, we also have to
ensure that the interface is able to transmit it before TIPC is
detached. We do this by performing the disabling of a bearer in three
steps:

1) Disable reception of TIPC packets from the interface in question.
2) Take down the links, while allowing them so send out a RESET message.
3) Disable transmission of TIPC packets on the interface.

Apart from this, we now have to react on the NETDEV_GOING_DOWN event,
instead of as currently the NEDEV_DOWN event, to ensure that such
transmission is possible during the teardown phase.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15 23:55:22 -07:00
Jon Paul Maloy 73f646cec3 tipc: delay ESTABLISH state event when link is established
Link establishing, just like link teardown, is a non-atomic action, in
the sense that discovering that conditions are right to establish a link,
and the actual adding of the link to one of the node's send slots is done
in two different lock contexts. The link FSM is designed to help bridging
the gap between the two contexts in a safe manner.

We have now discovered a weakness in the implementaton of this FSM.
Because we directly let the link go from state LINK_ESTABLISHING to
state LINK_ESTABLISHED already in the first lock context, we are unable
to distinguish between a fully established link, i.e., a link that has
been added to its slot, and a link that has not yet reached the second
lock context. It may hence happen that a manual intervention, e.g., when
disabling an interface, causes the function tipc_node_link_down() to try
removing the link from the node slots, decrementing its active link
counter etc, although the link was never added there in the first place.

We solve this by delaying the actual state change until we reach the
second lock context, inside the function tipc_node_link_up(). This
makes it possible for potentail callers of __tipc_node_link_down() to
know if they should proceed or not, and the problem is solved.

Unforunately, the situation described above also has a second problem.
Since there by necessity is a tipc_node_link_up() call pending once
the node lock has been released, we must defuse that call by setting
the link back from LINK_ESTABLISHING to LINK_RESET state. This forces
us to make a slight modification to the link FSM, which will now look
as follows.

 +------------------------------------+
 |RESET_EVT                           |
 |                                    |
 |                             +--------------+
 |           +-----------------|   SYNCHING   |-----------------+
 |           |FAILURE_EVT      +--------------+   PEER_RESET_EVT|
 |           |                  A            |                  |
 |           |                  |            |                  |
 |           |                  |            |                  |
 |           |                  |SYNCH_      |SYNCH_            |
 |           |                  |BEGIN_EVT   |END_EVT           |
 |           |                  |            |                  |
 |           V                  |            V                  V
 |    +-------------+          +--------------+          +------------+
 |    |  RESETTING  |<---------|  ESTABLISHED |--------->| PEER_RESET |
 |    +-------------+ FAILURE_ +--------------+ PEER_    +------------+
 |           |        EVT        |    A         RESET_EVT       |
 |           |                   |    |                         |
 |           |  +----------------+    |                         |
 |  RESET_EVT|  |RESET_EVT            |                         |
 |           |  |                     |                         |
 |           |  |                     |ESTABLISH_EVT            |
 |           |  |  +-------------+    |                         |
 |           |  |  | RESET_EVT   |    |                         |
 |           |  |  |             |    |                         |
 |           V  V  V             |    |                         |
 |    +-------------+          +--------------+        RESET_EVT|
 +--->|    RESET    |--------->| ESTABLISHING |<----------------+
      +-------------+ PEER_    +--------------+
       |           A  RESET_EVT       |
       |           |                  |
       |           |                  |
       |FAILOVER_  |FAILOVER_         |FAILOVER_
       |BEGIN_EVT  |END_EVT           |BEGIN_EVT
       |           |                  |
       V           |                  |
      +-------------+                 |
      | FAILINGOVER |<----------------+
      +-------------+

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15 23:55:21 -07:00
Jon Paul Maloy 0f8b8e28fb tipc: eliminate risk of stalled link synchronization
In commit 6e498158a8 ("tipc: move link synch and failover to link aggregation level")
we introduced a new mechanism for performing link failover and
synchronization. We have now detected a bug in this mechanism.

During link synchronization we use the arrival of any packet on
the tunnel link to trig a check for whether it has reached the
synchronization point or not. This has turned out to be too
permissive, since it may cause an arriving non-last SYNCH packet to
end the synch state, just to see the next SYNCH packet initiate a
new synch state with a new, higher synch point. This is not fatal,
but should be avoided, because it may significantly extend the
synchronization period, while at the same time we are not allowed
to send NACKs if packets are lost. In the worst case, a low-traffic
user may see its traffic stall until a LINK_PROTOCOL state message
trigs the link to leave synchronization state.

At the same time, LINK_PROTOCOL packets which happen to have a (non-
valid) sequence number lower than the tunnel link's rcv_nxt value will
be consistently dropped, and will never be able to resolve the situation
described above.

We fix this by exempting LINK_PROTOCOL packets from the sequence number
check, as they should be. We also reduce (but don't completely
eliminate) the risk of entering multiple synchronization states by only
allowing the (logically) first SYNCH packet to initiate a synchronization
state. This works independently of actual packet arrival order.

Fixes: commit 6e498158a8 ("tipc: move link synch and failover to link aggregation level")

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-14 06:06:40 -07:00
Jon Paul Maloy 2be80c2d87 tipc: fix stale link problem during synchronization
Recent changes to the link synchronization means that we can now just
drop packets arriving on the synchronizing link before the synch point
is reached. This has lead to significant simplifications to the
implementation, but also turns out to have a flip side that we need
to consider.

Under unlucky circumstances, the two endpoints may end up
repeatedly dropping each other's packets, while immediately
asking for retransmission of the same packets, just to drop
them once more. This pattern will eventually be broken when
the synch point is reached on the other link, but before that,
the endpoints may have arrived at the retransmission limit
(stale counter) that indicates that the link should be broken.
We see this happen at rare occasions.

The fix for this is to not ask for retransmissions when a link is in
state LINK_SYNCHING. The fact that the link has reached this state
means that it has already received the first SYNCH packet, and that it
knows the synch point. Hence, it doesn't need any more packets until the
other link has reached the synch point, whereafter it can go ahead and
ask for the missing packets.

However, because of the reduced traffic on the synching link that
follows this change, it may now take longer to discover that the
synch point has been reached. We compensate for this by letting all
packets, on any of the links, trig a check for synchronization
termination. This is possible because the packets themselves don't
contain any information that is needed for discovering this condition.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-23 16:14:45 -07:00
Jon Paul Maloy 5ae2f8e685 tipc: interrupt link synchronization when a link goes down
When we introduced the new link failover/synch mechanism
in commit 6e498158a8
("tipc: move link synch and failover to link aggregation level"),
we missed the case when the non-tunnel link goes down during the link
synchronization period. In this case the tunnel link will remain in
state LINK_SYNCHING, something leading to unpredictable behavior when
the failover procedure is initiated.

In this commit, we ensure that the node and remaining link goes
back to regular communication state (SELF_UP_PEER_UP/LINK_ESTABLISHED)
when one of the parallel links goes down. We also ensure that we don't
re-enter synch mode if subsequent SYNCH packets arrive on the remaining
link.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-23 16:14:45 -07:00
Jon Paul Maloy 17b2063077 tipc: eliminate risk of premature link setup during failover
When a link goes down, and there is still a working link towards its
destination node, a failover is initiated, and the failed link is not
allowed to re-establish until that procedure is finished. To ensure
this, the concerned link endpoints are set to state LINK_FAILINGOVER,
and the node endpoints to NODE_FAILINGOVER during the failover period.

However, if the link reset is due to a disabled bearer, the corres-
ponding link endpoint is deleted, and only the node endpoint knows
about the ongoing failover. Now, if the disabled bearer is re-enabled
during the failover period, the discovery mechanism may create a new
link endpoint that is ready to be established, despite that this is not
permitted. This situation may cause both the ongoing failover and any
subsequent link synchronization to fail.

In this commit, we ensure that a newly created link goes directly to
state LINK_FAILINGOVER if the corresponding node state is
NODE_FAILINGOVER. This eliminates the problem described above.

Furthermore, we tighten the criteria for which packets are allowed
to end a failover state in the function tipc_node_check_state().
By checking that the receiving link is up and running, instead of just
checking that it is not in failover mode, we eliminate the risk that
protocol packets from the re-created link may cause the failover to
be prematurely terminated.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-23 16:14:45 -07:00
Jon Paul Maloy 440d8963cd tipc: clean up link creation
We simplify the link creation function tipc_link_create() and the way
the link struct it is connected to the node struct. In particular, we
remove the duplicate initialization of some fields which are anyway set
in tipc_link_reset().

Tested-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30 17:25:15 -07:00