Commit Graph

83 Commits

Author SHA1 Message Date
Jon Paul Maloy
dd3f9e70f5 tipc: add packet sequence number at instant of transmission
Currently, the packet sequence number is updated and added to each
packet at the moment a packet is added to the link backlog queue.
This is wasteful, since it forces the code to traverse the send
packet list packet by packet when adding them to the backlog queue.
It would be better to just splice the whole packet list into the
backlog queue when that is the right action to do.

In this commit, we do this change. Also, since the sequence numbers
cannot now be assigned to the packets at the moment they are added
the backlog queue, we do instead calculate and add them at the moment
of transmission, when the backlog queue has to be traversed anyway.
We do this in the function tipc_link_push_packet().

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-14 12:24:46 -04:00
Jon Paul Maloy
f21e897ecc tipc: improve link congestion algorithm
The link congestion algorithm used until now implies two problems.

- It is too generous towards lower-level messages in situations of high
  load by giving "absolute" bandwidth guarantees to the different
  priority levels. LOW traffic is guaranteed 10%, MEDIUM is guaranted
  20%, HIGH is guaranteed 30%, and CRITICAL is guaranteed 40% of the
  available bandwidth. But, in the absence of higher level traffic, the
  ratio between two distinct levels becomes unreasonable. E.g. if there
  is only LOW and MEDIUM traffic on a system, the former is guaranteed
  1/3 of the bandwidth, and the latter 2/3. This again means that if
  there is e.g. one LOW user and 10 MEDIUM users, the  former will have
  33.3% of the bandwidth, and the others will have to compete for the
  remainder, i.e. each will end up with 6.7% of the capacity.

- Packets of type MSG_BUNDLER are created at SYSTEM importance level,
  but only after the packets bundled into it have passed the congestion
  test for their own respective levels. Since bundled packets don't
  result in incrementing the level counter for their own importance,
  only occasionally for the SYSTEM level counter, they do in practice
  obtain SYSTEM level importance. Hence, the current implementation
  provides a gap in the congestion algorithm that in the worst case
  may lead to a link reset.

We now refine the congestion algorithm as follows:

- A message is accepted to the link backlog only if its own level
  counter, and all superior level counters, permit it.

- The importance of a created bundle packet is set according to its
  contents. A bundle packet created from messges at levels LOW to
  CRITICAL is given importance level CRITICAL, while a bundle created
  from a SYSTEM level message is given importance SYSTEM. In the latter
  case only subsequent SYSTEM level messages are allowed to be bundled
  into it.

This solves the first problem described above, by making the bandwidth
guarantee relative to the total number of users at all levels; only
the upper limit for each level remains absolute. In the example
described above, the single LOW user would use 1/11th of the bandwidth,
the same as each of the ten MEDIUM users, but he still has the same
guarantee against starvation as the latter ones.

The fix also solves the second problem. If the CRITICAL level is filled
up by bundle packets of that level, no lower level packets will be
accepted any more.

Suggested-by: Gergely Kiss <gergely.kiss@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-14 12:24:46 -04:00
Jon Paul Maloy
e4bf4f7696 tipc: simplify packet sequence number handling
Although the sequence number in the TIPC protocol is 16 bits, we have
until now stored it internally as an unsigned 32 bits integer.
We got around this by always doing explicit modulo-65535 operations
whenever we need to access a sequence number.

We now make the incoming and outgoing sequence numbers to unsigned
16-bit integers, and remove the modulo operations where applicable.

We also move the arithmetic inline functions for 16 bit integers
to core.h, and the function buf_seqno() to msg.h, so they can easily
be accessed from anywhere in the code.

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-14 12:24:46 -04:00
Jon Paul Maloy
dff29b1a88 tipc: eliminate delayed link deletion at link failover
When a bearer is disabled manually, all its links have to be reset
and deleted. However, if there is a remaining, parallel link ready
to take over a deleted link's traffic, we currently delay the delete
of the removed link until the failover procedure is finished. This
is because the remaining link needs to access state from the reset
link, such as the last received packet number, and any partially
reassembled buffer, in order to perform a successful failover.

In this commit, we do instead move the state data over to the new
link, so that it can fulfill the procedure autonomously, without
accessing any data on the old link. This means that we can now
proceed and delete all pertaining links immediately when a bearer
is disabled. This saves us from some unnecessary complexity in such
situations.

We also choose to change the confusing definitions CHANGEOVER_PROTOCOL,
ORIGINAL_MSG and DUPLICATE_MSG to the more descriptive TUNNEL_PROTOCOL,
FAILOVER_MSG and SYNCH_MSG respectively.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:27:12 -04:00
Jon Paul Maloy
8b4ed8634f tipc: eliminate race condition at dual link establishment
Despite recent improvements, the establishment of dual parallel
links still has a small glitch where messages can bypass each
other. When the second link in a dual-link configuration is
established, part of the first link's traffic will be steered over
to the new link. Although we do have a mechanism to ensure that
packets sent before and after the establishment of the new link
arrive in sequence to the destination node, this is not enough.
The arriving messages will still be delivered upwards in different
threads, something entailing a risk of message disordering during
the transition phase.

To fix this, we introduce a synchronization mechanism between the
two parallel links, so that traffic arriving on the new link cannot
be added to its input queue until we are guaranteed that all
pre-establishment messages have been delivered on the old, parallel
link.

This problem seems to always have been around, but its occurrence is
so rare that it has not been noticed until recent intensive testing.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-25 14:05:56 -04:00
Jon Paul Maloy
3127a0200d tipc: clean up handling of link congestion
After the recent changes in message importance handling it becomes
possible to simplify handling of messages and sockets when we
encounter link congestion.

We merge the function tipc_link_cong() into link_schedule_user(),
and simplify the code of the latter. The code should now be
easier to follow, especially regarding return codes and handling
of the message that caused the situation.

In case the scheduling function is unable to pre-allocate a wakeup
message buffer, it now returns -ENOBUFS, which is a more correct
code than the previously used -EHOSTUNREACH.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-25 14:05:56 -04:00
Jon Paul Maloy
e3eea1eb47 tipc: clean up handling of message priorities
Messages transferred by TIPC are assigned an "importance priority", -an
integer value indicating how to treat the message when there is link or
destination socket congestion.

There is no separate header field for this value. Instead, the message
user values have been chosen in ascending order according to perceived
importance, so that the message user field can be used for this.

This is not a good solution. First, we have many more users than the
needed priority levels, so we end up with treating more priority
levels than necessary. Second, the user field cannot always
accurately reflect the priority of the message. E.g., a message
fragment packet should really have the priority of the enveloped
user data message, and not the priority of the MSG_FRAGMENTER user.
Until now, we have been working around this problem in different ways,
but it is now time to implement a consistent way of handling such
priorities, although still within the constraint that we cannot
allocate any more bits in the regular data message header for this.

In this commit, we define a new priority level, TIPC_SYSTEM_IMPORTANCE,
that will be the only one used apart from the four (lower) user data
levels. All non-data messages map down to this priority. Furthermore,
we take some free bits from the MSG_FRAGMENTER header and allocate
them to store the priority of the enveloped message. We then adjust
the functions msg_importance()/msg_set_importance() so that they
read/set the correct header fields depending on user type.

This small protocol change is fully compatible, because the code at
the receiving end of a link currently reads the importance level
only from user data messages, where there is no change.

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-14 14:38:32 -04:00
Jon Paul Maloy
05dcc5aa4d tipc: split link outqueue
struct tipc_link contains one single queue for outgoing packets,
where both transmitted and waiting packets are queued.

This infrastructure is hard to maintain, because we need
to keep a number of fields to keep track of which packets are
sent or unsent, and the number of packets in each category.

A lot of code becomes simpler if we split this queue into a transmission
queue, where sent/unacknowledged packets are kept, and a backlog queue,
where we keep the not yet sent packets.

In this commit we do this separation.

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-14 14:38:32 -04:00
Jon Paul Maloy
cf2157f88a tipc: move message validation function to msg.c
The function link_buf_validate() is in reality re-entrant and context
independent, and will in later commits be called from several locations.
Therefore, we move it to msg.c, make it outline and rename the it to
tipc_msg_validate().

We also redesign the function to make proper use of pskb_may_pull()

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-14 14:38:32 -04:00
Jon Paul Maloy
7764d6e83d tipc: add framework for node capabilities exchange
The TIPC protocol spec has defined a 13 bit capability bitmap in
the neighbor discovery header, as a means to maintain compatibility
between different code and protocol generations. Until now this field
has been unused.

We now introduce the basic framework for exchanging capabilities
between nodes at first contact. After exchange, a peer node's
capabilities are stored as a 16 bit bitmap in struct tipc_node.

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-14 14:38:32 -04:00
Erik Hugne
d0f91938be tipc: add ip/udp media type
The ip/udp bearer can be configured in a point-to-point
mode by specifying both local and remote ip/hostname,
or it can be enabled in multicast mode, where links are
established to all tipc nodes that have joined the same
multicast group. The multicast IP address is generated
based on the TIPC network ID, but can be overridden by
using another multicast address as remote ip.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-05 22:08:42 -05:00
Erik Hugne
91e2eb5684 tipc: rename media/msg related definitions
The TIPC_MEDIA_ADDR_SIZE and TIPC_MEDIA_ADDR_OFFSET names
are misleading, as they actually define the size and offset of
the whole media info field and not the address part. This patch
does not have any functional changes.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27 18:18:48 -05:00
Jon Paul Maloy
cb1b728096 tipc: eliminate race condition at multicast reception
In a previous commit in this series we resolved a race problem during
unicast message reception.

Here, we resolve the same problem at multicast reception. We apply the
same technique: an input queue serializing the delivery of arriving
buffers. The main difference is that here we do it in two steps.
First, the broadcast link feeds arriving buffers into the tail of an
arrival queue, which head is consumed at the socket level, and where
destination lookup is performed. Second, if the lookup is successful,
the resulting buffer clones are fed into a second queue, the input
queue. This queue is consumed at reception in the socket just like
in the unicast case. Both queues are protected by the same lock, -the
one of the input queue.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05 16:00:03 -08:00
Jon Paul Maloy
c637c10355 tipc: resolve race problem at unicast message reception
TIPC handles message cardinality and sequencing at the link layer,
before passing messages upwards to the destination sockets. During the
upcall from link to socket no locks are held. It is therefore possible,
and we see it happen occasionally, that messages arriving in different
threads and delivered in sequence still bypass each other before they
reach the destination socket. This must not happen, since it violates
the sequentiality guarantee.

We solve this by adding a new input buffer queue to the link structure.
Arriving messages are added safely to the tail of that queue by the
link, while the head of the queue is consumed, also safely, by the
receiving socket. Sequentiality is secured per socket by only allowing
buffers to be dequeued inside the socket lock. Since there may be multiple
simultaneous readers of the queue, we use a 'filter' parameter to reduce
the risk that they peek the same buffer from the queue, hence also
reducing the risk of contention on the receiving socket locks.

This solves the sequentiality problem, and seems to cause no measurable
performance degradation.

A nice side effect of this change is that lock handling in the functions
tipc_rcv() and tipc_bcast_rcv() now becomes uniform, something that
will enable future simplifications of those functions.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05 16:00:02 -08:00
Jon Paul Maloy
e3a77561e7 tipc: split up function tipc_msg_eval()
The function tipc_msg_eval() is in reality doing two related, but
different tasks. First it tries to find a new destination for named
messages, in case there was no first lookup, or if the first lookup
failed. Second, it does what its name suggests, evaluating the validity
of the message and its destination, and returning an appropriate error
code depending on the result.

This is confusing, and in this commit we choose to break it up into two
functions. A new function, tipc_msg_lookup_dest(), first attempts to find
a new destination, if the message is of the right type. If this lookup
fails, or if the message should not be subject to a second lookup, the
already existing tipc_msg_reverse() is called. This function performs
prepares the message for rejection, if applicable.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05 16:00:02 -08:00
Jon Paul Maloy
c5898636c4 tipc: reduce usage of context info in socket and link
The most common usage of namespace information is when we fetch the
own node addess from the net structure. This leads to a lot of
passing around of a parameter of type 'struct net *' between
functions just to make them able to obtain this address.

However, in many cases this is unnecessary. The own node address
is readily available as a member of both struct tipc_sock and
tipc_link, and can be fetched from there instead.
The fact that the vast majority of functions in socket.c and link.c
anyway are maintaining a pointer to their respective base structures
makes this option even more compelling.

In this commit, we introduce the inline functions tsk_own_node()
and link_own_node() to make it easy for functions to fetch the node
address from those structs instead of having to pass along and
dereference the namespace struct.

In particular, we make calls to the msg_xx() functions in msg.{h,c}
context independent by directly passing them the own node address
as parameter when needed. Those functions should be regarded as
leaves in the code dependency tree, and it is hence desirable to
keep them namspace unaware.

Apart from a potential positive effect on cache behavior, these
changes make it easier to introduce the changes that will follow
later in this series.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05 16:00:01 -08:00
Ying Xue
3474753954 tipc: make tipc node address support net namespace
If net namespace is supported in tipc, each namespace will be treated
as a separate tipc node. Therefore, every namespace must own its
private tipc node address. This means the "tipc_own_addr" global
variable of node address must be moved to tipc_net structure to
satisfy the requirement. It's turned out that users also can assign
node address for every namespace.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:33 -05:00
Ying Xue
4ac1c8d0ee tipc: name tipc name table support net namespace
TIPC name table is used to store the mapping relationship between
TIPC service name and socket port ID. When tipc supports namespace,
it allows users to publish service names only owned by a certain
namespace. Therefore, every namespace must have its private name
table to prevent service names published to one namespace from being
contaminated by other service names in another namespace. Therefore,
The name table global variable (ie, nametbl) and its lock must be
moved to tipc_net structure, and a parameter of namespace must be
added for necessary functions so that they can obtain name table
variable defined in tipc_net structure.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:33 -05:00
Ying Xue
1da465683a tipc: make tipc broadcast link support net namespace
TIPC broadcast link is statically established and its relevant states
are maintained with the global variables: "bcbearer", "bclink" and
"bcl". Allowing different namespace to own different broadcast link
instances, these variables must be moved to tipc_net structure and
broadcast link instances would be allocated and initialized when
namespace is created.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:33 -05:00
Ying Xue
859fc7c0ce tipc: cleanup core.c and core.h files
Only the works of initializing and shutting down tipc module are done
in core.h and core.c files, so all stuffs which are not closely
associated with the two tasks should be moved to appropriate places.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:31 -05:00
Ying Xue
2f55c43788 tipc: remove unnecessary wrapper functions of kernel timer APIs
Not only some wrapper function like k_term_timer() is empty, but also
some others including k_start_timer() and k_cancel_timer() don't return
back any value to its caller, what's more, there is no any component
in the kernel world to do such thing. Therefore, these timer interfaces
defined in tipc module should be purged.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:31 -05:00
Ying Xue
a6ca109443 tipc: use generic SKB list APIs to manage TIPC outgoing packet chains
Use standard SKB list APIs associated with struct sk_buff_head to
manage socket outgoing packet chain and name table outgoing packet
chain, having relevant code simpler and more readable.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-26 12:30:17 -05:00
Ying Xue
58dc55f256 tipc: use generic SKB list APIs to manage link transmission queue
Use standard SKB list APIs associated with struct sk_buff_head to
manage link transmission queue, having relevant code more clean.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-26 12:30:17 -05:00
Ying Xue
58311d1690 tipc: eliminate two pseudo message types of BUNDLE_OPEN and BUNDLE_CLOSED
The pseudo message types of BUNDLE_CLOSED as well as BUNDLE_OPEN are
used to flag whether or not more messages can be bundled into a data
packet in the outgoing transmission queue. Obviously, no more messages
can be appended after the packet has been sent and is waiting to be
acknowledged and deleted. These message types do in reality represent
a send-side local implementation flag, and are not defined as part of
the protocol. It is therefore safe to move it to to where it belongs,
that is, the control area (TIPC_SKB_CB) of the buffer.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-26 12:30:17 -05:00
Al Viro
45dcc687f7 tipc_msg_build(): pass msghdr instead of its ->msg_iov
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-24 05:16:41 -05:00