Commit Graph

448 Commits

Author SHA1 Message Date
Eli Cohen
50174a7f2c IB/core: Add interfaces to control VF attributes
Following the practice exercised for network devices which allow the PF
net device to configure attributes of its virtual functions, we
introduce the following functions to be used by IPoIB which is the
network driver implementation for IB devices.

ib_set_vf_link_state - set the policy for a VF link. More below.
ib_get_vf_config - read configuration information of a VF
ib_get_vf_stats - read VF statistics
ib_set_vf_guid - set the node or port GUID of a VF

Also add a flag to the device cap flags indicating that this IB device
is based on a virtual function.

A VF shares the physical port with the PF and other VFs. When setting
the link state we have three options:

1. Auto - in this mode, the virtual port follows the state of the
   physical port and becomes active only if the physical port's state is
   active. In all other cases it remains in a Down state.
2. Down - sets the state of the virtual port to Down
3. Up - causes the virtual port to transition into Initialize state if
   it was not already in this state. A virtualization aware subnet manager
   can then bring the state of the port into the Active state.
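
The three link-state policies above can be sketched as a small user-space model. All names here (`vf_link_policy`, `port_state`, `vf_port_state`, `sm_activated`) are hypothetical and are not the identifiers used by the patch; the `sm_activated` flag merely stands in for a virtualization-aware subnet manager having moved the port to Active.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not the kernel's own definitions. */
enum vf_link_policy { VF_LINK_AUTO, VF_LINK_DOWN, VF_LINK_UP };
enum port_state { PORT_DOWN, PORT_INIT, PORT_ACTIVE };

/*
 * Resolve the state of a virtual port under a given policy.
 * phys:         state of the shared physical port
 * sm_activated: assumed flag, true once a subnet manager activated the port
 */
enum port_state vf_port_state(enum vf_link_policy policy,
                              enum port_state phys,
                              bool sm_activated)
{
    switch (policy) {
    case VF_LINK_AUTO:
        /* Follow the physical port: Active only if it is Active. */
        return phys == PORT_ACTIVE ? PORT_ACTIVE : PORT_DOWN;
    case VF_LINK_DOWN:
        return PORT_DOWN;
    case VF_LINK_UP:
        /* Enter Initialize; only the subnet manager moves it to Active. */
        return sm_activated ? PORT_ACTIVE : PORT_INIT;
    }
    return PORT_DOWN;
}
```

In this model, Auto never reports Initialize: any physical state other than Active maps to Down, matching option 1.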

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 17:13:14 -04:00
Eli Cohen
a0c1b2a350 IB/core: Support accessing SA in virtualized environment
Per the ongoing standardisation process, when virtual HCAs are present
in a network, traffic is routed based on a destination GID. In order to
access the SA we use the well known SA GID.

We also add a GRH required boolean field to the port attributes which is
used to report to the verbs consumer whether this port is connected to a
virtual network. We use this field to determine whether we need to create
an address vector with GRH to access the subnet administrator. We clear
the port attributes struct before calling the hardware driver to make
sure the default remains that GRH is not required.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 16:34:06 -04:00
Eli Cohen
fad61ad4e7 IB/core: Add subnet prefix to port info
The subnet prefix is a part of the port_info MAD returned and should be
available at the ib_port_attr struct. We define it here and provide a
default implementation in case the hardware driver does not provide one.
The subnet prefix is required when creating the address vector to access
the SA in networks where GRH must be used.
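
As a rough illustration of why the subnet prefix matters, a destination GID for the SA can be formed by placing the subnet prefix in the upper 8 bytes and a well-known interface ID in the lower 8 bytes. This is a user-space sketch: the `gid` struct, the helper names, and the `SA_WELL_KNOWN_GUID` value are assumptions made for the example and should be checked against the kernel headers and the specification.

```c
#include <stdint.h>

/* Assumed value of the well-known SA interface ID; verify before relying on it. */
#define SA_WELL_KNOWN_GUID 0x0200000000000002ULL

/* Hypothetical 128-bit GID holder in network byte order. */
struct gid {
    uint8_t raw[16];
};

/* Store a 64-bit value in big-endian (network) byte order. */
void put_be64(uint8_t *dst, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        dst[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/* Build the SA GID: subnet prefix in bytes 0-7, well-known GUID in bytes 8-15. */
struct gid sa_gid(uint64_t subnet_prefix)
{
    struct gid g;

    put_be64(&g.raw[0], subnet_prefix);
    put_be64(&g.raw[8], SA_WELL_KNOWN_GUID);
    return g;
}
```

With the default link-local prefix `0xfe80000000000000`, the resulting GID starts with bytes `fe 80` and ends with byte `02`.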

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 16:34:06 -04:00
Leon Romanovsky
fb532d6a79 IB/{core, ulp} Support above 32 possible device capability flags
The old device_cap_flags bitmask was a u32 in which every bit was
already defined, leaving no room for new capabilities. To overcome
this, convert the device_cap_flags variable to u64.
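
The limitation is easy to see in a small stand-alone sketch. The flag name `CAP_FLAG_BIT32` and both helpers are hypothetical; they only show that a capability bit above bit 31 is lost when stored in 32 bits and survives in 64 bits.

```c
#include <stdint.h>

/* Hypothetical capability flag using the first bit beyond a 32-bit mask. */
#define CAP_FLAG_BIT32 (1ULL << 32)

/* Old-style storage: converting to 32 bits drops any bit above bit 31. */
int visible_in_u32(uint64_t flag)
{
    uint32_t caps = (uint32_t)flag;

    return caps != 0;
}

/* New-style storage: a 64-bit mask keeps the flag intact. */
int visible_in_u64(uint64_t flag)
{
    uint64_t caps = flag;

    return (caps & flag) != 0;
}
```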

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 16:32:59 -04:00
Linus Torvalds
1200b6809d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Support more Realtek wireless chips, from Jes Sorenson.

   2) New BPF types for per-cpu hash and array maps, from Alexei
      Starovoitov.

   3) Make several TCP sysctls per-namespace, from Nikolay Borisov.

   4) Allow the use of SO_REUSEPORT in order to do per-thread processing
      of incoming TCP/UDP connections.  The muxing can be done using a
      BPF program which hashes the incoming packet.  From Craig Gallek.

   5) Add a multiplexer for TCP streams, to provide a message-based
      interface.  BPF programs can be used to determine the message
      boundaries.  From Tom Herbert.

   6) Add 802.1AE MACSEC support, from Sabrina Dubroca.

   7) Avoid factorial complexity when taking down an inetdev interface
      with lots of configured addresses.  We were doing things like
      traversing the entire address list for each address removed, and
      flushing the entire netfilter conntrack table for every address as
      well.

   8) Add and use SKB bulk free infrastructure, from Jesper Brouer.

   9) Allow offloading u32 classifiers to hardware, and implement for
      ixgbe, from John Fastabend.

  10) Allow configuring IRQ coalescing parameters on a per-queue basis,
      from Kan Liang.

  11) Extend ethtool so that larger link mode masks can be supported.
      From David Decotigny.

  12) Introduce devlink, which can be used to configure port link types
      (ethernet vs Infiniband, etc.), port splitting, and switch device
      level attributes as a whole.  From Jiri Pirko.

  13) Hardware offload support for flower classifiers, from Amir Vadai.

  14) Add "Local Checksum Offload".  Basically, for a tunneled packet
      the checksum of the outer header is 'constant' (because with the
      checksum field filled into the inner protocol header, the payload
      of the outer frame checksums to 'zero'), and we can take advantage
      of that in various ways.  From Edward Cree"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1548 commits)
  bonding: fix bond_get_stats()
  net: bcmgenet: fix dma api length mismatch
  net/mlx4_core: Fix backward compatibility on VFs
  phy: mdio-thunder: Fix some Kconfig typos
  lan78xx: add ndo_get_stats64
  lan78xx: handle statistics counter rollover
  RDS: TCP: Remove unused constant
  RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket
  net: smc911x: convert pxa dma to dmaengine
  team: remove duplicate set of flag IFF_MULTICAST
  bonding: remove duplicate set of flag IFF_MULTICAST
  net: fix a comment typo
  ethernet: micrel: fix some error codes
  ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it
  bpf, dst: add and use dst_tclassid helper
  bpf: make skb->tc_classid also readable
  net: mvneta: bm: clarify dependencies
  cls_bpf: reset class and reuse major in da
  ldmvsw: Checkpatch sunvnet.c and sunvnet_common.c
  ldmvsw: Add ldmvsw.c driver code
  ...
2016-03-19 10:05:34 -07:00
Mike Marciniszyn
ef086c0d5d IB/hfi1: Report pid in qp_stats to aid debug
Tracking user/QP ownership is needed to debug issues with
user ULPs like OpenMPI.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:19 -04:00
Doug Ledford
082eaa5083 Merge branches 'nes', 'cxgb4' and 'iwpm' into k.o/for-4.6 2016-03-16 13:57:43 -04:00
Faisal Latif
b493d91d33 iwcm: common code for port mapper
Move port mapper related code from drivers into common code.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Tatyana E. Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-16 13:47:52 -04:00
Doug Ledford
d2ad9cc759 Merge branches 'mlx4', 'mlx5' and 'ocrdma' into k.o/for-4.6 2016-03-16 13:38:28 -04:00
Doug Ledford
76b0640279 Merge branches 'ib_core', 'ib_ipoib', 'srpt', 'drain-cq-v4' and 'net/9p' into k.o/for-4.6 2016-03-14 17:42:57 -04:00
Mike Marciniszyn
14553ca110 staging/rdma/hfi1: Adaptive PIO for short messages
The change requires a new pio_busy field in the iowait structure to
track the number of outstanding pios.  The new counter, together
with the sdma counter, serves as the basis for a packet-by-packet decision
as to which egress mechanism to use.  Packets given to different egress
mechanisms are not ordered with respect to each other, so this scheme is
needed to preserve the order.

The iowait drain/wait mechanisms are extended for a pio case.  An
additional qp wait flag is added for the PIO drain wait case.

Currently the only pio wait is for buffers, so the no_bufs_available()
routine name is changed to pio_wait() and a third argument is passed
with one of the two pio wait flags to generalize the routine.  A module
parameter is added to hold a configurable threshold. For now, the
module parameter is zero.

A heuristic routine is added to return the func pointer of the proper
egress routine to use.

The heuristic is as follows:
- SMI always uses pio
- GSI,UD qps <= threshold use pio
- UD qps > threshold use sdma
  o No coordination with sdma is required because order is not required
    and this qp pio count is not maintained for UD
- RC/UC ONLY packets <= threshold choose as follows:
  o If sdmas pending, use SDMA
  o Otherwise use pio and enable the pio tracking count at
    the time the pio buffer is allocated
- RC/UC ONLY packets > threshold use SDMA
  o If pios are pending, pio_wait() is called with the new wait flag
    to delay until the pios drain

The threshold is potentially reduced by the QP's mtu.
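
The heuristic above might be modeled roughly as follows. This is a user-space sketch, not the driver's routine: the type and function names are hypothetical, the packet is assumed to be an ONLY packet for the RC/UC cases, GSI packets above the threshold are assumed to follow the UD rule, and "reduced by the QP's mtu" is assumed to mean capping the threshold at the MTU.

```c
#include <stddef.h>

/* Hypothetical names; the driver returns a function pointer instead. */
enum qp_type { QP_SMI, QP_GSI, QP_UD, QP_RC, QP_UC };
enum egress { EGRESS_PIO, EGRESS_SDMA, EGRESS_SDMA_AFTER_PIO_DRAIN };

struct qp_model {
    enum qp_type type;
    size_t mtu;
    unsigned int sdma_busy; /* outstanding sdma requests */
    unsigned int pio_busy;  /* outstanding pios */
};

/* Pick an egress mechanism for one packet of length len. */
enum egress pick_egress(const struct qp_model *qp, size_t len, size_t threshold)
{
    /* Assumption: the MTU caps the threshold. */
    if (threshold > qp->mtu)
        threshold = qp->mtu;

    switch (qp->type) {
    case QP_SMI:
        return EGRESS_PIO;
    case QP_GSI: /* assumption: GSI above threshold behaves like UD */
    case QP_UD:
        /* No ordering requirement, so no coordination is needed. */
        return len <= threshold ? EGRESS_PIO : EGRESS_SDMA;
    case QP_RC:
    case QP_UC:
        if (len <= threshold)
            /* Stay behind pending sdmas to keep packets ordered. */
            return qp->sdma_busy ? EGRESS_SDMA : EGRESS_PIO;
        /* Large packet: wait for pending pios before using sdma. */
        return qp->pio_busy ? EGRESS_SDMA_AFTER_PIO_DRAIN : EGRESS_SDMA;
    }
    return EGRESS_SDMA;
}
```

With the threshold at zero, as the module parameter currently is, every non-empty RC/UC and UD packet in this model goes to sdma, while SMI still uses pio.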

The sc_buffer_alloc() has two additional args (a callback, a void *)
which are exploited by the RC/UC cases to pass a new complete routine
and a qp *.

When the shadow ring completes the credit associated with a packet,
the new complete routine is called.  The verbs_pio_complete() will then
decrement the busy count and trigger any drain waiters in qp destroy
or reset.

Reviewed-by: Jubin John <jubin.john@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10 20:38:14 -05:00
Mike Marciniszyn
d2421a82f6 IB/rdmavt: close send engine struct holes
pahole noted the wasted 4 bytes after s_lock and r_lock.

Move s_flags and r_psn to fill the holes.
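
The kind of hole pahole reports can be reproduced with a small stand-alone sketch. The structs below are hypothetical stand-ins, with a 4-byte `lock` field playing the role of s_lock or r_lock; the exact sizes depend on the ABI, but on common 64-bit targets the reordered layout is smaller.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a 4-byte member followed by an 8-byte-aligned one
 * typically leaves 4 bytes of padding on 64-bit ABIs. */
struct with_holes {
    uint32_t lock;   /* stand-in for a 4-byte lock */
    uint64_t a;
    uint32_t flags;
    uint64_t b;
};

/* Same members, with the 4-byte field moved to sit right after the lock. */
struct reordered {
    uint32_t lock;
    uint32_t flags;  /* fills what used to be padding */
    uint64_t a;
    uint64_t b;
};

/* Bytes saved by the reordering on the current ABI (may be zero). */
size_t bytes_saved(void)
{
    return sizeof(struct with_holes) - sizeof(struct reordered);
}
```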

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10 20:38:11 -05:00
Dennis Dalessandro
79a225be38 IB/rdmavt: Remove unnecessary exported functions
Remove exported functions which are no longer required as the
functionality has moved into rdmavt. This also requires re-ordering some
of the functions since their prototype no longer appears in a header
file. Rather than add forward declarations it is just cleaner to
re-order some of the functions.

Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10 20:38:10 -05:00
Dennis Dalessandro
0765b01b8e IB/rdmavt: Remove signal_supported and comments
Initially it was intended that rdmavt would support some signaling
between the underlying driver and itself. However this turned out to be
unnecessary for qib and hfi1. If we need to add something like this
later to support another driver we should do it then. As of now this is
essentially dead code, so remove it.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10 20:38:10 -05:00
Dennis Dalessandro
ce73fe25a6 IB/rdmavt: Remove RVT_FLAGs
While hfi1 and qib were still supporting bits and pieces of core verbs
components there needed to be a way to convey if rdmavt should handle
allocation and initialization of resources like the queue pair table. Now
that all of this is moved into rdmavt there is no need for these flags.
They are no longer used in the drivers.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10 20:38:10 -05:00
Dennis Dalessandro
4eadd8ff21 IB/qib,rdmavt: Move smi_ah to qib
Rdmavt adopted an smi_ah from qib which is not needed by hfi1. Move this
back to qib and get it out of the common library.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10 20:38:09 -05:00
Dennis Dalessandro
1348d706f2 IB/rdmavt: Add per verb driver callback checking
For each verb validate that all requirements for driver callbacks are met.
If a function is called without checking for a valid pointer, it is a
required function. Also document what each callback function does.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10 20:38:09 -05:00
Dennis Dalessandro
90793f7179 IB/rdmavt: Clean up comments and add more documentation
Add, remove, and otherwise clean up existing comments that are leftover
from the initial code postings of rdmavt. Many of the comments were added
to provide an idea on the direction we were thinking of going. Now that the
design is solidified make a pass over and clean everything up. Also add
details where lacking.

Ensure all non static functions have nano comments.

Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10 20:38:08 -05:00
Mike Marciniszyn
46a80d62e6 IB/qib, staging/rdma/hfi1: add s_hlock for use in post send
This patch adds an additional lock to reduce contention on the s_lock.

This lock is used in post_send() so that the post_send is not
serialized with the send engine and other send related processing.

To do this the s_next_psn is now maintained on post_send() while
post_send() related fields are moved to a new cache line.  There is
an s_avail maintained for the post_send() to mitigate trading cache
lines with the send engine.  The lock is released/acquired around
releasing the just built packet to the egress mechanism.

Reviewed-by: Jubin John <jubin.john@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10 20:38:07 -05:00
Vennila Megavannan
bfee5e32e7 IB/rdmavt, staging/rdma/hfi1: use qps to dynamically scale timeout value
A busy_jiffies variable is maintained and updated when rc qps are
created and deleted. busy_jiffies is a scaled value of the number
of rc qps in the device. busy_jiffies is incremented every rc qp
scaling interval. busy_jiffies is added to the rc timeout
in add_retry_timer and mod_retry_timer. The rc qp scaling interval
is selected based on extensive performance evaluation of targeted
workloads.
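
One reading of the scaling described above is that busy_jiffies equals the number of rc qps divided by the scaling interval, recomputed on every create and delete. The sketch below models that reading in user space; the struct, the function names, and the interval value of 5 are assumptions for illustration, not values taken from the patch.

```c
/* Hypothetical interval; the patch chose its value from performance runs. */
#define RC_QP_SCALING_INTERVAL 5

/* Minimal stand-in for the per-device state discussed above. */
struct dev_model {
    unsigned long n_rc_qps;
    unsigned long busy_jiffies;
};

/* Recompute the scaled value whenever the rc qp count changes. */
void update_busy_jiffies(struct dev_model *d)
{
    d->busy_jiffies = d->n_rc_qps / RC_QP_SCALING_INTERVAL;
}

void rc_qp_created(struct dev_model *d)
{
    d->n_rc_qps++;
    update_busy_jiffies(d);
}

void rc_qp_destroyed(struct dev_model *d)
{
    if (d->n_rc_qps > 0)
        d->n_rc_qps--;
    update_busy_jiffies(d);
}

/* Timeout used when arming the retry timer: base plus the busy scaling. */
unsigned long retry_timeout(const struct dev_model *d, unsigned long base)
{
    return base + d->busy_jiffies;
}
```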

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Vennila Megavannan <vennila.megavannan@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10 20:38:07 -05:00
Mike Marciniszyn
066fad2894 IB/rdmavt: remove unused qp field
The field is a vestige from ipath.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10 20:38:05 -05:00
Easwar Hariharan
e1bf0d5ecd staging/rdma/hfi1, IB/core: Fix LinkDownReason define for consistency
LinkDownReason LocalMediaNotInstalled lacked an underscore
and was inconsistent with other defines in the same family.
This patch fixes this.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10 20:37:59 -05:00
Harish Chegondi
61a650c14d IB/rdmavt: Add support for query_port, modify_port and get_port_immutable
rvt_query_port calls into the driver through the callback function
query_port_state to populate the rest of the ib_port_attr elements.
rvt_modify_port calls into the driver, if needed, through the callback
function shut_down_port().

Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10 20:37:32 -05:00
Dennis Dalessandro
1f024992ef IB/rdmavt: Add query gid support.
Add query gid support. Rdmavt still relies on the driver to maintain
the gid table. Rdmavt simply calls into the driver to retrieve the guid
for a particular port.

Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10 20:37:32 -05:00
Dennis Dalessandro
f1badc7163 IB/rdmavt: Clean up distinction between port number and index
IB core uses 1-based indexing for ports. All of our data structures
use 0-based indexing. Add an inline function that we can use whenever we
need to validate a legal value and try to convert a port number to a
port index at the entrance into rdmavt.

Try to follow the policy that when we are talking about a port from IB
core point of view we refer to it as a port number. When port is an
index into our arrays refer to it as a port index.
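
A conversion helper of the kind described above could look like this minimal sketch. The name `port_num_to_idx` and its signature are hypothetical, and returning `-EINVAL` for an out-of-range port is an assumption about the error convention.

```c
#include <errno.h>

/*
 * Convert a 1-based IB core port number into a 0-based array index.
 * Returns -EINVAL when port_num is outside 1..num_ports.
 */
int port_num_to_idx(unsigned int num_ports, unsigned int port_num)
{
    if (port_num < 1 || port_num > num_ports)
        return -EINVAL;
    return (int)(port_num - 1);
}
```

Doing the validation and conversion once at the entry point means the rest of the code can treat the value purely as an index.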

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10 20:37:31 -05:00