Commit Graph

7777 Commits

Author SHA1 Message Date
Xin Long
df2c71ffdf sctp: add SCTP_ASCONF_SUPPORTED sockopt
SCTP_ASCONF_SUPPORTED sockopt is used to set enpoint's asconf
flag. With this feature, each endpoint will have its own flag
for its future asoc's asconf_capable, instead of netns asconf
flag.

Note that when both ep's asconf_enable and auth_enable are
enabled, SCTP_CID_ASCONF and SCTP_CID_ASCONF_ACK should be
added into auth_chunk_list.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19 18:27:28 -07:00
Heiner Kallweit
99b60d56a3 net: phy: add EEE-related constants
Add EEE-related constants. This includes the new MMD EEE registers for
NBase-T / 802.3bz.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19 13:04:45 -07:00
David S. Miller
446bf64b61 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Merge conflict of mlx5 resolved using instructions in merge
commit 9566e650bf.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19 11:54:03 -07:00
Linus Torvalds
06821504fd Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

  1) Fix jmp to 1st instruction in x64 JIT, from Alexei Starovoitov.

  2) Severl kTLS fixes in mlx5 driver, from Tariq Toukan.

  3) Fix severe performance regression due to lack of SKB coalescing of
     fragments during local delivery, from Guillaume Nault.

  4) Error path memory leak in sch_taprio, from Ivan Khoronzhuk.

  5) Fix batched events in skbedit packet action, from Roman Mashak.

  6) Propagate VLAN TX offload to hw_enc_features in bond and team
     drivers, from Yue Haibing.

  7) RXRPC local endpoint refcounting fix and read after free in
     rxrpc_queue_local(), from David Howells.

  8) Fix endian bug in ibmveth multicast list handling, from Thomas
     Falcon.

  9) Oops, make nlmsg_parse() wrap around the correct function,
     __nlmsg_parse not __nla_parse(). Fix from David Ahern.

 10) Memleak in sctp_scend_reset_streams(), fro Zheng Bin.

 11) Fix memory leak in cxgb4, from Wenwen Wang.

 12) Yet another race in AF_PACKET, from Eric Dumazet.

 13) Fix false detection of retransmit failures in tipc, from Tuong
     Lien.

 14) Use after free in ravb_tstamp_skb, from Tho Vu.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (101 commits)
  ravb: Fix use-after-free ravb_tstamp_skb
  netfilter: nf_tables: map basechain priority to hardware priority
  net: sched: use major priority number as hardware priority
  wimax/i2400m: fix a memory leak bug
  net: cavium: fix driver name
  ibmvnic: Unmap DMA address of TX descriptor buffers after use
  bnxt_en: Fix to include flow direction in L2 key
  bnxt_en: Use correct src_fid to determine direction of the flow
  bnxt_en: Suppress HWRM errors for HWRM_NVM_GET_VARIABLE command
  bnxt_en: Fix handling FRAG_ERR when NVM_INSTALL_UPDATE cmd fails
  bnxt_en: Improve RX doorbell sequence.
  bnxt_en: Fix VNIC clearing logic for 57500 chips.
  net: kalmia: fix memory leaks
  cx82310_eth: fix a memory leak bug
  bnx2x: Fix VF's VLAN reconfiguration in reload.
  Bluetooth: Add debug setting for changing minimum encryption key size
  tipc: fix false detection of retransmit failures
  lan78xx: Fix memory leaks
  MAINTAINERS: r8169: Update path to the driver
  MAINTAINERS: PHY LIBRARY: Update files in the record
  ...
2019-08-19 10:00:01 -07:00
Ido Schimmel
0f420b6c52 devlink: Add packet trap infrastructure
Add the basic packet trap infrastructure that allows device drivers to
register their supported packet traps and trap groups with devlink.

Each driver is expected to provide basic information about each
supported trap, such as name and ID, but also the supported metadata
types that will accompany each packet trapped via the trap. The
currently supported metadata type is just the input port, but more will
be added in the future. For example, output port and traffic class.

Trap groups allow users to set the action of all member traps. In
addition, users can retrieve per-group statistics in case per-trap
statistics are too narrow. In the future, the trap group object can be
extended with more attributes, such as policer settings which will limit
the amount of traffic generated by member traps towards the CPU.

Beside registering their packet traps with devlink, drivers are also
expected to report trapped packets to devlink along with relevant
metadata. devlink will maintain packets and bytes statistics for each
packet trap and will potentially report the trapped packet with its
metadata to user space via drop monitor netlink channel.

The interface towards the drivers is simple and allows devlink to set
the action of the trap. Currently, only two actions are supported:
'trap' and 'drop'. When set to 'trap', the device is expected to provide
the sole copy of the packet to the driver which will pass it to devlink.
When set to 'drop', the device is expected to drop the packet and not
send a copy to the driver. In the future, more actions can be added,
such as 'mirror'.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17 12:40:08 -07:00
Ido Schimmel
8e94c3bc92 drop_monitor: Allow user to start monitoring hardware drops
Drop monitor has start and stop commands, but so far these were only
used to start and stop monitoring of software drops.

Now that drop monitor can also monitor hardware drops, we should allow
the user to control these as well.

Do that by adding SW and HW flags to these commands. If no flag is
specified, then only start / stop monitoring software drops. This is
done in order to maintain backward-compatibility with existing user
space applications.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17 12:40:08 -07:00
Ido Schimmel
d40e1deb93 drop_monitor: Add support for summary alert mode for hardware drops
In summary alert mode a notification is sent with a list of recent drop
reasons and a count of how many packets were dropped due to this reason.

To avoid expensive operations in the context in which packets are
dropped, each CPU holds an array whose number of entries is the maximum
number of drop reasons that can be encoded in the netlink notification.
Each entry stores the drop reason and a count. When a packet is dropped
the array is traversed and a new entry is created or the count of an
existing entry is incremented.

Later, in process context, the array is replaced with a newly allocated
copy and the old array is encoded in a netlink notification. To avoid
breaking user space, the notification includes the ancillary header,
which is 'struct net_dm_alert_msg' with number of entries set to '0'.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17 12:40:08 -07:00
Ido Schimmel
5e58109b1e drop_monitor: Add support for packet alert mode for hardware drops
In a similar fashion to software drops, extend drop monitor to send
netlink events when packets are dropped by the underlying hardware.

The main difference is that instead of encoding the program counter (PC)
from which kfree_skb() was called in the netlink message, we encode the
hardware trap name. The two are mostly equivalent since they should both
help the user understand why the packet was dropped.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17 12:40:08 -07:00
David S. Miller
8714652fcd Merge tag 'linux-can-next-for-5.4-20190814' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:

====================
pull-request: can-next 2019-08-14

this is a pull request for net-next/master consisting of 41 patches.

The first two patches are for the kvaser_pciefd driver: Christer Beskow
removes unnecessary code in the kvaser_pciefd_pwm_stop() function,
YueHaibing removes the unused including of <linux/version.h>.

In the next patch YueHaibing also removes the unused including of
<linux/version.h> in the f81601 driver.

In the ti_hecc driver the next 6 patches are by me and fix checkpatch
warnings. YueHaibing's patch removes an unused variable in the
ti_hecc_mailbox_read() function.

The next 6 patches all target the xilinx_can driver. Anssi Hannula's
patch fixes a chip start failure with an invalid bus. The patch by
Venkatesh Yadav Abbarapu skips an error message in case of a deferred
probe. The 3 patches by Appana Durga Kedareswara rao fix the RX and TX
path for CAN-FD frames. Srinivas Neeli's patch fixes the bit timing
calculations for CAN-FD.

The next 12 patches are by me and several checkpatch warnings in the
af_can, raw and bcm components.

Thomas Gleixner provides a patch for the bcm, which switches the timer
to HRTIMER_MODE_SOFT and removes the hrtimer_tasklet.

Then 6 more patches by me for the gw component, which fix checkpatch
warnings, followed by 2 patches by Oliver Hartkopp to add CAN-FD
support.

The vcan driver gets 3 patches by me, fixing checkpatch warnings.

And finally a patch by Andre Hartmann to fix typos in CAN's netlink
header.
====================
2019-08-15 12:43:22 -07:00
Jeremy Sowden
707816c8b0 netfilter: remove deprecation warnings from uapi headers.
There are two netfilter userspace headers which contain deprecation
warnings.  While these headers are not used within the kernel, they are
compiled stand-alone for header-testing.

Pablo informs me that userspace iptables still refer to these headers,
and the intention was to use xt_LOG.h instead and remove these, but
userspace was never updated.

Remove the warnings.

Fixes: 2a475c409f ("kbuild: remove all netfilter headers from header-test blacklist.")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-08-14 23:36:27 +02:00
Linus Torvalds
a8dba0531b Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Doug Ledford:
 "Fairly small pull request for -rc3. I'm out of town the rest of this
  week, so I made sure to clean out as much as possible from patchworks
  in enough time for 0-day to chew through it (Yay! for 0-day being back
  online! :-)). Jason might send through any emergency stuff that could
  pop up, otherwise I'm back next week.

  The only real thing of note is the siw ABI change. Since we just
  merged siw *this* release, there are no prior kernel releases to
  maintain kernel ABI with. I told Bernard that if there is anything
  else about the siw ABI he thinks he might want to change before it
  goes set in stone, he should get it in ASAP. The siw module was around
  for several years outside the kernel tree, and it had to be revamped
  considerably for inclusion upstream, so we are making no attempts to
  be backward compatible with the out of tree version. Once 5.3 is
  actually released, we will have our baseline ABI to maintain.

  Summary:

   - Fix a memory registration release flow issue that was causing a
     WARN_ON (mlx5)

   - If the counters for a port aren't allocated, then we can't do
     operations on the non-existent counters (core)

   - Check the right variable for error code result (mlx5)

   - Fix a use after free issue (mlx5)

   - Fix an off by one memory leak (siw)

   - Actually return an error code on error (core)

   - Allow siw to be built on 32bit arches (siw, ABI change, but OK
     since siw was just merged this merge window and there is no prior
     released kernel to maintain compatibility with and we also updated
     the rdma-core user space package to match)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/siw: Change CQ flags from 64->32 bits
  RDMA/core: Fix error code in stat_get_doit_qp()
  RDMA/siw: Fix a memory leak in siw_init_cpulist()
  IB/mlx5: Fix use-after-free error while accessing ev_file pointer
  IB/mlx5: Check the correct variable in error handling code
  RDMA/counter: Prevent QP counter binding if counters unsupported
  IB/mlx5: Fix implicit MR release flow
2019-08-14 11:10:38 -07:00
Jakub Kicinski
c162610c7d Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for net-next:

1) Rename mss field to mss_option field in synproxy, from Fernando Mancera.

2) Use SYSCTL_{ZERO,ONE} definitions in conntrack, from Matteo Croce.

3) More strict validation of IPVS sysctl values, from Junwei Hu.

4) Remove unnecessary spaces after on the right hand side of assignments,
   from yangxingwu.

5) Add offload support for bitwise operation.

6) Extend the nft_offload_reg structure to store immediate date.

7) Collapse several ip_set header files into ip_set.h, from
   Jeremy Sowden.

8) Make netfilter headers compile with CONFIG_KERNEL_HEADER_TEST=y,
   from Jeremy Sowden.

9) Fix several sparse warnings due to missing prototypes, from
   Valdis Kletnieks.

10) Use static lock initialiser to ensure connlabel spinlock is
    initialized on boot time to fix sched/act_ct.c, patch
    from Florian Westphal.
====================

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-08-13 18:22:57 -07:00
Jakub Kicinski
708852dcac Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
The following pull-request contains BPF updates for your *net-next* tree.

There is a small merge conflict in libbpf (Cc Andrii so he's in the loop
as well):

        for (i = 1; i <= btf__get_nr_types(btf); i++) {
                t = (struct btf_type *)btf__type_by_id(btf, i);

                if (!has_datasec && btf_is_var(t)) {
                        /* replace VAR with INT */
                        t->info = BTF_INFO_ENC(BTF_KIND_INT, 0, 0);
  <<<<<<< HEAD
                        /*
                         * using size = 1 is the safest choice, 4 will be too
                         * big and cause kernel BTF validation failure if
                         * original variable took less than 4 bytes
                         */
                        t->size = 1;
                        *(int *)(t+1) = BTF_INT_ENC(0, 0, 8);
                } else if (!has_datasec && kind == BTF_KIND_DATASEC) {
  =======
                        t->size = sizeof(int);
                        *(int *)(t + 1) = BTF_INT_ENC(0, 0, 32);
                } else if (!has_datasec && btf_is_datasec(t)) {
  >>>>>>> 72ef80b5ee
                        /* replace DATASEC with STRUCT */

Conflict is between the two commits 1d4126c4e1 ("libbpf: sanitize VAR to
conservative 1-byte INT") and b03bc6853c ("libbpf: convert libbpf code to
use new btf helpers"), so we need to pick the sanitation fixup as well as
use the new btf_is_datasec() helper and the whitespace cleanup. Looks like
the following:

  [...]
                if (!has_datasec && btf_is_var(t)) {
                        /* replace VAR with INT */
                        t->info = BTF_INFO_ENC(BTF_KIND_INT, 0, 0);
                        /*
                         * using size = 1 is the safest choice, 4 will be too
                         * big and cause kernel BTF validation failure if
                         * original variable took less than 4 bytes
                         */
                        t->size = 1;
                        *(int *)(t + 1) = BTF_INT_ENC(0, 0, 8);
                } else if (!has_datasec && btf_is_datasec(t)) {
                        /* replace DATASEC with STRUCT */
  [...]

The main changes are:

1) Addition of core parts of compile once - run everywhere (co-re) effort,
   that is, relocation of fields offsets in libbpf as well as exposure of
   kernel's own BTF via sysfs and loading through libbpf, from Andrii.

   More info on co-re: http://vger.kernel.org/bpfconf2019.html#session-2
   and http://vger.kernel.org/lpc-bpf2018.html#session-2

2) Enable passing input flags to the BPF flow dissector to customize parsing
   and allowing it to stop early similar to the C based one, from Stanislav.

3) Add a BPF helper function that allows generating SYN cookies from XDP and
   tc BPF, from Petar.

4) Add devmap hash-based map type for more flexibility in device lookup for
   redirects, from Toke.

5) Improvements to XDP forwarding sample code now utilizing recently enabled
   devmap lookups, from Jesper.

6) Add support for reporting the effective cgroup progs in bpftool, from Jakub
   and Takshak.

7) Fix reading kernel config from bpftool via /proc/config.gz, from Peter.

8) Fix AF_XDP umem pages mapping for 32 bit architectures, from Ivan.

9) Follow-up to add two more BPF loop tests for the selftest suite, from Alexei.

10) Add perf event output helper also for other skb-based program types, from Allan.

11) Fix a co-re related compilation error in selftests, from Yonghong.
====================

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-08-13 16:24:57 -07:00
Bernard Metzler
2c8ccb37b0 RDMA/siw: Change CQ flags from 64->32 bits
This patch changes the driver/user shared (mmapped) CQ notification
flags field from unsigned 64-bits size to unsigned 32-bits size. This
enables building siw on 32-bit architectures.

This patch changes the siw-abi, but as siw was only just merged in
this merge window cycle, there are no released kernels with the prior
abi.  We are making no attempt to be binary compatible with siw user
space libraries prior to the merge of siw into the upstream kernel,
only moving forward with upstream kernels and upstream rdma-core
provided siw libraries are we guaranteeing compatibility.

Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20190809151816.13018-1-bmt@zurich.ibm.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-13 12:22:06 -04:00
Andre Hartmann
3ca3c4aad2 can: netlink: fix documentation typos
This patch fixes some documentation typos in struct can_bittiming_const.

Signed-off-by: Andre Hartmann <aha_1980@gmx.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2019-08-13 17:32:21 +02:00
Oliver Hartkopp
456a8a646b can: gw: add support for CAN FD frames
Introduce CAN FD support which needs an extension of the netlink API to
pass CAN FD type content to the kernel which has a different size to
Classic CAN. Additionally the struct canfd_frame has a new 'flags' element
that can now be modified with can-gw.

The new CGW_FLAGS_CAN_FD option flag defines whether the routing job
handles Classic CAN or CAN FD frames. This setting is very strict at
reception time and enables the new possibilities, e.g. CGW_FDMOD_* and
modifying the flags element of struct canfd_frame, only when
CGW_FLAGS_CAN_FD is set.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2019-08-13 17:32:21 +02:00
Oliver Hartkopp
e9dc7c6050 can: gw: use struct canfd_frame as internal data structure
To prepare the CAN FD support this patch implements the first adaptions in
data structures for CAN FD without changing the current functionality.

Additionally some code at the end of this patch is moved or indented to
simplify the review of the next implementation step.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2019-08-13 17:32:21 +02:00
Jeremy Sowden
a1b2f04ea5 netfilter: add missing includes to a number of header-files.
A number of netfilter header-files used declarations and definitions
from other headers without including them.  Added include directives to
make those declarations and definitions available.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-08-13 12:14:39 +02:00
Ido Schimmel
e9feb58020 drop_monitor: Expose tail drop counter
Previous patch made the length of the per-CPU skb drop list
configurable. Expose a counter that shows how many packets could not be
enqueued to this list.

This allows users determine the desired queue length.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-11 10:53:30 -07:00
Ido Schimmel
30328d46af drop_monitor: Make drop queue length configurable
In packet alert mode, each CPU holds a list of dropped skbs that need to
be processed in process context and sent to user space. To avoid
exhausting the system's memory the maximum length of this queue is
currently set to 1000.

Allow users to tune the length of this queue according to their needs.
The configured length is reported to user space when drop monitor
configuration is queried.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-11 10:53:30 -07:00
Ido Schimmel
444be061d0 drop_monitor: Add a command to query current configuration
Users should be able to query the current configuration of drop monitor
before they start using it. Add a command to query the existing
configuration which currently consists of alert mode and packet
truncation length.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-11 10:53:30 -07:00
Ido Schimmel
57986617a7 drop_monitor: Allow truncation of dropped packets
When sending dropped packets to user space it is not always necessary to
copy the entire packet as usually only the headers are of interest.

Allow user to specify the truncation length and add the original length
of the packet as additional metadata to the netlink message.

By default no truncation is performed.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-11 10:53:30 -07:00
Ido Schimmel
ca30707dee drop_monitor: Add packet alert mode
So far drop monitor supported only one alert mode in which a summary of
locations in which packets were recently dropped was sent to user space.

This alert mode is sufficient in order to understand that packets were
dropped, but lacks information to perform a more detailed analysis.

Add a new alert mode in which the dropped packet itself is passed to
user space along with metadata: The drop location (as program counter
and resolved symbol), ingress netdevice and drop timestamp. More
metadata can be added in the future.

To avoid performing expensive operations in the context in which
kfree_skb() is invoked (can be hard IRQ), the dropped skb is cloned and
queued on per-CPU skb drop list. Then, in process context the netlink
message is allocated, prepared and finally sent to user space.

The per-CPU skb drop list is limited to 1000 skbs to prevent exhausting
the system's memory. Subsequent patches will make this limit
configurable and also add a counter that indicates how many skbs were
tail dropped.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-11 10:53:30 -07:00
Ido Schimmel
28315f7999 drop_monitor: Add alert mode operations
The next patch is going to add another alert mode in which the dropped
packet is notified to user space, instead of only a summary of recent
drops.

Abstract the differences between the modes by adding alert mode
operations. The operations are selected based on the currently
configured mode and associated with the probes and the work item just
before tracing starts.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-11 10:53:30 -07:00
Daniel Borkmann
cd48bdda4f sock: make cookie generation global instead of per netns
Generating and retrieving socket cookies are a useful feature that is
exposed to BPF for various program types through bpf_get_socket_cookie()
helper.

The fact that the cookie counter is per netns is quite a limitation
for BPF in practice in particular for programs in host namespace that
use socket cookies as part of a map lookup key since they will be
causing socket cookie collisions e.g. when attached to BPF cgroup hooks
or cls_bpf on tc egress in host namespace handling container traffic
from veth or ipvlan devices with peer in different netns. Change the
counter to be global instead.

Socket cookie consumers must assume the value as opqaue in any case.
Not every socket must have a cookie generated and knowledge of the
counter value itself does not provide much value either way hence
conversion to global is fine.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Martynas Pumputis <m@lambda.lt>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-09 13:14:46 -07:00