Saeed Mahameed says:
====================
Introduce mlx5 ethernet timestamping
This patch series introduces the support for ConnectX-4 timestamping
and the PTP kernel interface.
Changes from V2:
net/mlx5_core: Introduce access function to read internal_timer
- Remove one line function
- Change function name
net/mlx5e: Add HW timestamping (TS) support:
- Data path performance optimization (caching tstamp struct in rq,sq)
- Change read/write_lock_irqsave to read/write_lock
- Move ioctl functions to en_clock file
- Changed overflow start algorithm according to comments from Richard
- Move timestamp init/cleanup to open/close ndos.
In details:
1st patch prevents the driver from modifying skb->data and SKB CB in
device xmit function.
2nd patch adds the needed low level helpers for:
- Fetching the hardware clock (hardware internal timer)
- Parsing CQEs timestamps
- Device frequency capability
3rd patch adds new en_clock.c file that handles all needed timestamping
operations:
- Internal clock structure initialization and other helper functions
- Added the needed ioctl for setting/getting the current timestamping
configuration.
- used this configuration in RX/TX data path to fill the SKB with
the timestamp.
4th patch Introduces PTP (PHC) support.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a PHC support to the mlx5_en driver. Use reader/writer spinlocks to
protect the timecounter since every packet received needs to call
timecounter_cycle2time() when timestamping is enabled. This can become
a performance bottleneck with RSS and multiple receive queues if normal
spinlocks are used.
The driver has been tested with both Documentation/ptp/testptp and the
linuxptp project (http://linuxptp.sourceforge.net/) on a Mellanox
ConnectX-4 card.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for enable/disable HW timestamping for incoming and/or
outgoing packets. To enable/disable HW timestamping appropriate
ioctl should be used. Currently HWTSTAMP_FILTER_ALL/NONE and
HWTSAMP_TX_ON/OFF only are supported. Make all relevant changes in
RX/TX flows to consider TS request and plant HW timestamps into
relevant structures.
Add internal clock for converting hardware timestamp to nanoseconds. In
addition, add a service task to catch internal clock overflow, to make
sure timestamping is accurate.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A preparation step which adds support for reading the hardware
internal timer and the hardware timestamping from the CQE.
In addition, advertize device_frequency_khz HCA capability.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the SKB is cloned, or has an elevated users count, someone else
can be looking at it at the same time.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long says:
====================
sctp: use transport hashtable to replace association's with rhashtable
for telecom center, the usual case is that a server is connected by thousands
of clients. but if the server with only one enpoint(udp style) use the same
sport and dport to communicate with every clients, and every assoc in server
will be hashed in the same chain of global assoc hashtable due to currently we
choose dport and sport as the hash key.
when a packet is received, sctp_rcv try to find the assoc with sport and dport,
since that chain is too long to find it fast, it make the performance turn to
very low, some test data is as follow:
in server:
$./ss [start a udp style server there]
in client:
$./cc [start 2500 sockets to connect server with same port and different ip,
and use one of them to send data to server]
===== test on net-next
-- perf top
server:
55.73% [kernel] [k] sctp_assoc_is_match
6.80% [kernel] [k] sctp_assoc_lookup_paddr
4.81% [kernel] [k] sctp_v4_cmp_addr
3.12% [kernel] [k] _raw_spin_unlock_irqrestore
1.94% [kernel] [k] sctp_cmp_addr_exact
client:
46.01% [kernel] [k] sctp_endpoint_lookup_assoc
5.55% libc-2.17.so [.] __libc_calloc
5.39% libc-2.17.so [.] _int_free
3.92% libc-2.17.so [.] _int_malloc
3.23% [kernel] [k] __memset
-- spent time
time is 487s, send pkt is 10000000
we need to change the way to calculate the hash key, to use lport +
rport + paddr as the hash key can avoid this issue.
besides, this patchset will use transport hashtable to replace
association hashtable to lookup with rhashtable api. get transport
first then get association by t->asoc. and also it will make tcp
style work better.
===== test with this patchset:
-- perf top
server:
15.98% [kernel] [k] _raw_spin_unlock_irqrestore
9.92% [kernel] [k] __pv_queued_spin_lock_slowpath
7.22% [kernel] [k] copy_user_generic_string
2.38% libpthread-2.17.so [.] __recvmsg_nocancel
1.88% [kernel] [k] sctp_recvmsg
client:
11.90% [kernel] [k] sctp_hash_cmp
8.52% [kernel] [k] rht_deferred_worker
4.94% [kernel] [k] __pv_queued_spin_lock_slowpath
3.95% [kernel] [k] sctp_bind_addr_match
2.49% [kernel] [k] __memset
-- spent time
time is 22s, send pkt is 10000000
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
sctp_endpoint_lookup_assoc is called in the protection of sock lock
there is no need to call local_bh_disable in this function. so remove
them.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
transport hashtable will replace the association hashtable,
so association hashtable is not used in sctp any more, so
drop the codes about that.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Traversal the transport rhashtable, get the association only once through
the condition assoc->peer.primary_path != transport.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
apply lookup apis to two functions, for __sctp_endpoint_lookup_assoc
and __sctp_lookup_association, it's invoked in the protection of sock
lock, it will be safe, but sctp_lookup_association need to call
rcu_read_lock() and to detect the t->dead to protect it.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tranport hashtbale will replace the association hashtable to do the
lookup for transport, and then get association by t->assoc, rhashtable
apis will be used because of it's resizable, scalable and using rcu.
lport + rport + paddr will be the base hashkey to locate the chain,
with net to protect one netns from another, then plus the laddr to
compare to get the target.
this patch will provider the lookup functions:
- sctp_epaddr_lookup_transport
- sctp_addrs_lookup_transport
hash/unhash functions:
- sctp_hash_transport
- sctp_unhash_transport
init/destroy functions:
- sctp_transport_hashtable_init
- sctp_transport_hashtable_destroy
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Craig Gallek says:
====================
Faster SO_REUSEPORT
This series contains two optimizations for the SO_REUSEPORT feature:
Faster lookup when selecting a socket for an incoming packet and
the ability to select the socket from the group using a BPF program.
This series only includes the UDP path. I plan to submit a follow-up
including the TCP path if the implementation in this series is
acceptable.
Changes in v4:
- pskb_may_pull is unnecessary with pskb_pull (per Alexei Starovoitov)
Changes in v3:
- skb_pull_inline -> pskb_pull (per Alexei Starovoitov)
- reuseport_attach* -> sk_reuseport_attach* and simple return statement
syntax change (per Daniel Borkmann)
Changes in v2:
- Fix ARM build; remove unnecessary include.
- Handle case where protocol header is not in linear section (per
Alexei Starovoitov).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This program will build classic and extended BPF programs and
validate the socket selection logic when used with
SO_ATTACH_REUSEPORT_CBPF and SO_ATTACH_REUSEPORT_EBPF.
It also validates the re-programing flow and several edge cases.
Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Expose socket options for setting a classic or extended BPF program
for use when selecting sockets in an SO_REUSEPORT group. These options
can be used on the first socket to belong to a group before bind or
on any socket in the group after bind.
This change includes refactoring of the existing sk_filter code to
allow reuse of the existing BPF filter validation checks.
Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Include a struct sock_reuseport instance when a UDP socket binds to
a specific address for the first time with the reuseport flag set.
When selecting a socket for an incoming UDP packet, use the information
available in sock_reuseport if present.
This required adding an additional field to the UDP source address
equality function to differentiate between exact and wildcard matches.
The original use case allowed wildcard matches when checking for
existing port uses during bind. The new use case of adding a socket
to a reuseport group requires exact address matching.
Performance test (using a machine with 2 CPU sockets and a total of
48 cores): Create reuseport groups of varying size. Use one socket
from this group per user thread (pinning each thread to a different
core) calling recvmmsg in a tight loop. Record number of messages
received per second while saturating a 10G link.
10 sockets: 18% increase (~2.8M -> 3.3M pkts/s)
20 sockets: 14% increase (~2.9M -> 3.3M pkts/s)
40 sockets: 13% increase (~3.0M -> 3.4M pkts/s)
This work is based off a similar implementation written by
Ying Cai <ycai@google.com> for implementing policy-based reuseport
selection.
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct sock_reuseport is an optional shared structure referenced by each
socket belonging to a reuseport group. When a socket is bound to an
address/port not yet in use and the reuseport flag has been set, the
structure will be allocated and attached to the newly bound socket.
When subsequent calls to bind are made for the same address/port, the
shared structure will be updated to include the new socket and the
newly bound socket will reference the group structure.
Usually, when an incoming packet was destined for a reuseport group,
all sockets in the same group needed to be considered before a
dispatching decision was made. With this structure, an appropriate
socket can be found after looking up just one socket in the group.
This shared structure will also allow for more complicated decisions to
be made when selecting a socket (eg a BPF filter).
This work is based off a similar implementation written by
Ying Cai <ycai@google.com> for implementing policy-based reuseport
selection.
Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko says:
====================
mlxsw: couple of fixes
Couple of fixes from Ido.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Bridge port attributes are offloaded to hardware when invoked with SELF
flag set, but it really makes no sense to reflect them when port is not
bridged.
Allow a user to change these attribute only when port is bridged and
initialize them correctly when joining or leaving a bridge.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set the bridge status of physical ports in the appropriate functions, to
be consistent with LAG join/leave and vPorts joining/leaving bridge.
Also, remove the error messages in these two functions, as we already
emit errors in both the single functions they call.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is possible for us to fail when joining or leaving a bridge, so let
the user know about that by returning NOTIFY_BAD, as already done for
LAG join/leave and 802.1D bridges.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We set PVID to 1 in mlxsw_sp_port_vlan_init(), so we can remove this
statement.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The cphy_ops structures are never modified, so declare them as const.
Done with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
ARM allmodconfig fails because of the addition of the FMAN driver:
drivers/built-in.o: In function `dtsec_restart_autoneg':
binder.c:(.text+0x173328): undefined reference to `mdiobus_read'
binder.c:(.text+0x173348): undefined reference to `mdiobus_write'
drivers/built-in.o: In function `dtsec_config':
binder.c:(.text+0x173d24): undefined reference to `of_phy_find_device'
drivers/built-in.o: In function `init_phy':
binder.c:(.text+0x1763b0): undefined reference to `of_phy_connect'
drivers/built-in.o: In function `stop':
binder.c:(.text+0x176014): undefined reference to `phy_stop'
drivers/built-in.o: In function `start':
binder.c:(.text+0x176078): undefined reference to `phy_start'
The reason is that the driver uses PHYLIB, but that is a loadable
module here, and fman itself is built-in.
This patch makes it possible to configure fman as a module as well
so we don't change the status of PHYLIB in an allmodconfig kernel,
and it adds a 'select PHYLIB' statement to ensure that phylib is
always built-in when fman is.
The driver uses "builtin_platform_driver(fman_driver);", which means
it cannot be unloaded, but it's still possible to have it as a loadable
module that gets loaded once and never removed.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 5adae51a64 ("fsl/fman: Add FMan MURAM support")
Signed-off-by: David S. Miller <davem@davemloft.net>
Moving the caller of iptunnel_xmit_stats causes a build error in
randconfig builds that disable CONFIG_INET:
In file included from ../net/xfrm/xfrm_input.c:17:0:
../include/net/ip6_tunnel.h: In function 'ip6tunnel_xmit':
../include/net/ip6_tunnel.h:93:2: error: implicit declaration of function 'iptunnel_xmit_stats' [-Werror=implicit-function-declaration]
iptunnel_xmit_stats(dev, pkt_len);
The reason is that the iptunnel_xmit_stats definition is hidden
inside #ifdef CONFIG_INET but the caller is not. We can change
one or the other to fix it, and this patch adds a second #ifdef
around ip6tunnel_xmit() to avoid seeing the invalid call.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 039f50629b ("ip_tunnel: Move stats update to iptunnel_xmit()")
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Samuel Ortiz says:
====================
NFC 4.5 pull request
This is the first NFC pull request for 4.5 and it brings:
- A new driver for the STMicroelectronics ST95HF NFC chipset.
The ST95HF is an NFC digital transceiver with an embedded analog
front-end and as such relies on the Linux NFC digital
implementation. This is the 3rd user of the NFC digital stack.
- ACPI support for the ST st-nci and st21nfca drivers.
- A small improvement for the nfcsim driver, as we can now tune
the Rx delay through sysfs.
- A bunch of minor cleanups and small fixes from Christophe Ricard,
for a few drivers and the NFC core code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>