Commit Graph

6313 Commits

Author SHA1 Message Date
Duan Jiong
c92a59eca8 ipv6: handle Redirect ICMP Message with no Redirected Header option
rfc 4861 says the Redirected Header option is optional, so
the kernel should not drop the Redirect Message that has no
Redirected Header option. In this patch, the function
ip6_redirect_no_header() is introduced to deal with that
condition.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
2013-08-22 20:08:21 -07:00
Jesper Dangaard Brouer
8a8e3d84b1 net_sched: restore "linklayer atm" handling
commit 56b765b79 ("htb: improved accuracy at high rates")
broke the "linklayer atm" handling.

 tc class add ... htb rate X ceil Y linklayer atm

The linklayer setting is implemented by modifying the rate table
which is send to the kernel.  No direct parameter were
transferred to the kernel indicating the linklayer setting.

The commit 56b765b79 ("htb: improved accuracy at high rates")
removed the use of the rate table system.

To keep compatible with older iproute2 utils, this patch detects
the linklayer by parsing the rate table.  It also supports future
versions of iproute2 to send this linklayer parameter to the
kernel directly. This is done by using the __reserved field in
struct tc_ratespec, to convey the choosen linklayer option, but
only using the lower 4 bits of this field.

Linklayer detection is limited to speeds below 100Mbit/s, because
at high rates the rtab is gets too inaccurate, so bad that
several fields contain the same values, this resembling the ATM
detect.  Fields even start to contain "0" time to send, e.g. at
1000Mbit/s sending a 96 bytes packet cost "0", thus the rtab have
been more broken than we first realized.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-15 01:43:08 -07:00
Pravin B Shelar
4221f40513 ip_tunnel: Do not use inner ip-header-id for tunnel ip-header-id.
Using inner-id for tunnel id is not safe in some rare cases.
E.g. packets coming from multiple sources entering same tunnel
can have same id. Therefore on tunnel packet receive we
could have packets from two different stream but with same
source and dst IP with same ip-id which could confuse ip packet
reassembly.

Following patch reverts optimization from commit
490ab08127 (IP_GRE: Fix IP-Identification.)

CC: Jarno Rajahalme <jrajahalme@nicira.com>
CC: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-13 16:52:50 -07:00
Eliezer Tamir
288a937637 net: rename busy poll MIB counter
Rename mib counter from "low latency" to "busy poll"

v1 also moved the counter to the ip MIB (suggested by Shawn Bohrer)
Eric Dumazet suggested that the current location is better.

So v2 just renames the counter to fit the new naming convention.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-09 11:39:08 -07:00
Eliezer Tamir
bf37d2b3fd busy_poll: cleanup do-nothing placeholders
When renaming ll_poll to busy poll, I introduced a typo
in the name of the do-nothing placeholder for sk_busy_loop
and called it sk_busy_poll.
This broke compile when busy poll was not configured.
Cong Wang submitted a patch to fixed that.
This patch removes the now redundant, misspelled placeholder.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-04 19:41:27 -07:00
Cong Wang
e0d1095ae3 net: rename CONFIG_NET_LL_RX_POLL to CONFIG_NET_RX_BUSY_POLL
Eliezer renames several *ll_poll to *busy_poll, but forgets
CONFIG_NET_LL_RX_POLL, so in case of confusion, rename it too.

Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-01 15:11:17 -07:00
Cong Wang
dfcefb0be1 net: fix a compile error when CONFIG_NET_LL_RX_POLL is not set
When CONFIG_NET_LL_RX_POLL is not set, I got:

net/socket.c: In function ‘sock_poll’:
net/socket.c:1165:4: error: implicit declaration of function ‘sk_busy_loop’ [-Werror=implicit-function-declaration]

Fix this by adding a nop when !CONFIG_NET_LL_RX_POLL.

Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-01 15:10:58 -07:00
Michal Kubeček
2ac3ac8f86 ipv6: prevent fib6_run_gc() contention
On a high-traffic router with many processors and many IPv6 dst
entries, soft lockup in fib6_run_gc() can occur when number of
entries reaches gc_thresh.

This happens because fib6_run_gc() uses fib6_gc_lock to allow
only one thread to run the garbage collector but ip6_dst_gc()
doesn't update net->ipv6.ip6_rt_last_gc until fib6_run_gc()
returns. On a system with many entries, this can take some time
so that in the meantime, other threads pass the tests in
ip6_dst_gc() (ip6_rt_last_gc is still not updated) and wait for
the lock. They then have to run the garbage collector one after
another which blocks them for quite long.

Resolve this by replacing special value ~0UL of expire parameter
to fib6_run_gc() by explicit "force" parameter to choose between
spin_lock_bh() and spin_trylock_bh() and call fib6_run_gc() with
force=false if gc_thresh is reached but not max_size.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-01 14:16:20 -07:00
John W. Linville
22e02a0272 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2013-08-01 14:30:59 -04:00
Joe Perches
d9d10a3096 ndisc: Add missing inline to ndisc_addr_option_pad
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31 15:18:17 -07:00
Samuel Ortiz
9ea7187c53 NFC: netlink: Rename CMD_FW_UPLOAD to CMD_FW_DOWNLOAD
Loading a firmware into a target is typically called firmware
download, not firmware upload. So we rename the netlink API to
NFC_CMD_FW_DOWNLOAD in order to avoid any terminology confusion from
userspace.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-07-31 01:19:43 +02:00
Linus Torvalds
be9c6d9169 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Just a bunch of small fixes and tidy ups:

   1) Finish the "busy_poll" renames, from Eliezer Tamir.

   2) Fix RCU stalls in IFB driver, from Ding Tianhong.

   3) Linearize buffers properly in tun/macvtap zerocopy code.

   4) Don't crash on rmmod in vxlan, from Pravin B Shelar.

   5) Spinlock used before init in alx driver, from Maarten Lankhorst.

   6) A sparse warning fix in bnx2x broke TSO checksums, fix from Dmitry
      Kravkov.

   7) Dummy and ifb driver load failure paths can oops, fixes from Tan
      Xiaojun and Ding Tianhong.

   8) Correct MTU calculations in IP tunnels, from Alexander Duyck.

   9) Account all TCP retransmits in SNMP stats properly, from Yuchung
      Cheng.

  10) atl1e and via-rhine do not handle DMA mapping failures properly,
      from Neil Horman.

  11) Various equal-cost multipath route fixes in ipv6 from Hannes
      Frederic Sowa"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (36 commits)
  ipv6: only static routes qualify for equal cost multipathing
  via-rhine: fix dma mapping errors
  atl1e: fix dma mapping warnings
  tcp: account all retransmit failures
  usb/net/r815x: fix cast to restricted __le32
  usb/net/r8152: fix integer overflow in expression
  net: access page->private by using page_private
  net: strict_strtoul is obsolete, use kstrtoul instead
  drivers/net/ieee802154: don't use devm_pinctrl_get_select_default() in probe
  drivers/net/ethernet/cadence: don't use devm_pinctrl_get_select_default() in probe
  drivers/net/can/c_can: don't use devm_pinctrl_get_select_default() in probe
  net/usb: add relative mii functions for r815x
  net/tipc: use %*phC to dump small buffers in hex form
  qlcnic: Adding Maintainers.
  gre: Fix MTU sizing check for gretap tunnels
  pkt_sched: sch_qfq: remove forward declaration of qfq_update_agg_ts
  pkt_sched: sch_qfq: improve efficiency of make_eligible
  gso: Update tunnel segmentation to support Tx checksum offload
  inet: fix spacing in assignment
  ifb: fix oops when loading the ifb failed
  ...
2013-07-13 17:42:22 -07:00
Linus Torvalds
19d2f8e0fb Merge tag 'for-linus-3.11-merge-window-part-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
Pull second round of 9p patches from Eric Van Hensbergen:
 "Several of these patches were rebased in order to correct style
  issues.  Only stylistic changes were made versus the patches which
  were in linux-next for two weeks.  The rebases have been in linux-next
  for 3 days and have passed my regressions.

  The bulk of these are RDMA fixes and improvements.  There's also some
  additions on the extended attributes front to support some additional
  namespaces and a new option for TCP to force allocation of mount
  requests from a priviledged port"

* tag 'for-linus-3.11-merge-window-part-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  fs/9p: Remove the unused variable "err" in v9fs_vfs_getattr()
  9P: Add cancelled() to the transport functions.
  9P/RDMA: count posted buffers without a pending request
  9P/RDMA: Improve error handling in rdma_request
  9P/RDMA: Do not free req->rc in error handling in rdma_request()
  9P/RDMA: Use a semaphore to protect the RQ
  9P/RDMA: Protect against duplicate replies
  9P/RDMA: increase P9_RDMA_MAXSIZE to 1MB
  9pnet: refactor struct p9_fcall alloc code
  9P/RDMA: rdma_request() needs not allocate req->rc
  9P: Fix fcall allocation for rdma
  fs/9p: xattr: add trusted and security namespaces
  net/9p: add privport option to 9p tcp transport
2013-07-11 10:21:23 -07:00
Eliezer Tamir
64b0dc517e net: rename busy poll socket op and globals
Rename LL_SO to BUSY_POLL_SO
Rename sysctl_net_ll_{read,poll} to sysctl_busy_{read,poll}
Fix up users of these variables.
Fix documentation for sysctl.

a patch for the socket.7  man page will follow separately,
because of limitations of my mail setup.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-10 17:08:27 -07:00
Eliezer Tamir
8b80cda536 net: rename ll methods to busy-poll
Rename ndo_ll_poll to ndo_busy_poll.
Rename sk_mark_ll to sk_mark_napi_id.
Rename skb_mark_ll to skb_mark_napi_id.
Correct all useres of these functions.
Update comments and defines  in include/net/busy_poll.h

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-10 17:08:27 -07:00
Eliezer Tamir
076bb0c82a net: rename include/net/ll_poll.h to include/net/busy_poll.h
Rename the file and correct all the places where it is included.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-10 17:08:27 -07:00
Eliezer Tamir
76b1e9b981 net/fs: change busy poll time accounting
Suggested by Linus:
Changed time accounting for busy-poll:
- Make it microsecond based.
- Use unsigned longs.
- Revert back to use time_after instead of time_in_range.
Reorder poll/select busy loop conditions:
- Clear busy_flag after one time we can't busy-poll.
- Only init busy_end if we actually are going to busy-poll.
Added one more missing need_resched() test.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-09 12:33:04 -07:00
Eliezer Tamir
cbf55001b2 net: rename low latency sockets functions to busy poll
Rename functions in include/net/ll_poll.h to busy wait.
Clarify documentation about expected power use increase.
Rename POLL_LL to POLL_BUSY_LOOP.
Add need_resched() testing to poll/select busy loops.

Note, that in select and poll can_busy_poll is dynamic and is
updated continuously to reflect the existence of supported
sockets with valid queue information.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-08 19:25:45 -07:00
Simon Derr
80b45261a0 9P: Add cancelled() to the transport functions.
RDMA needs to post a buffer for each incoming reply.
Hence it needs to keep count of these and needs to be
aware of whether a flushed request has received a reply
or not.

This patch adds the cancelled() callback to the transport modules.
It is called when RFLUSH has been received and that the corresponding
request will never receive a reply.

Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2013-07-07 22:18:18 -05:00
Jim Garlick
2f28c8b31d net/9p: add privport option to 9p tcp transport
If the privport option is specified, the tcp transport binds local
address to a reserved port before connecting to the 9p server.

In some cases when 9P AUTH cannot be implemented, this is better than
nothing.

Signed-off-by: Jim Garlick <garlick@llnl.gov>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2013-07-07 21:59:54 -05:00
Eric Dumazet
9eb5bf838d net: sock: fix TCP_SKB_MIN_TRUESIZE
commit eea86af6b1 ("net: sock: adapt SOCK_MIN_RCVBUF and
SOCK_MIN_SNDBUF") forgot the sk_buff alignment taken into account
in __alloc_skb() : skb->truesize = SKB_TRUESIZE(size);

While above commit fixed the sender issue, the receiver is still
dropping the second packet (on loopback device), because the receiver
socket can not really hold two skbs :
First packet truesize already is above sk_rcvbuf, so even TCP coalescing
cannot help.

On a typical 64bit build, each tcp skb truesize is 2304, instead of 2272

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-03 16:52:10 -07:00
Daniel Borkmann
c50cd35788 net: gre: move GSO functions to gre_offload
Similarly to TCP/UDP offloading, move all related GRE functions to
gre_offload.c to make things more explicit and similar to the rest
of the code.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-03 14:37:39 -07:00
Eliezer Tamir
419076f59f net: lls fix build with allnoconfig
correct placeholder declarations to prevent build breakage when
!CONFIG_NET_LL_RX_POLL

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-02 23:50:47 -07:00
Eliezer Tamir
1bc2774d86 net: convert lls to use time_in_range()
Time in range will fail safely if we move to a different cpu with an
extremely large clock skew.
Add time_in_range64() and convert lls to use it.

changelog:
v2
- fixed double call to sched_clock in can_poll_ll
- fixed checkpatchisms

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-02 15:53:53 -07:00
Hannes Frederic Sowa
8822b64a0f ipv6: call udp_push_pending_frames when uncorking a socket with AF_INET pending data
We accidentally call down to ip6_push_pending_frames when uncorking
pending AF_INET data on a ipv6 socket. This results in the following
splat (from Dave Jones):

skbuff: skb_under_panic: text:ffffffff816765f6 len:48 put:40 head:ffff88013deb6df0 data:ffff88013deb6dec tail:0x2c end:0xc0 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:126!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: dccp_ipv4 dccp 8021q garp bridge stp dlci mpoa snd_seq_dummy sctp fuse hidp tun bnep nfnetlink scsi_transport_iscsi rfcomm can_raw can_bcm af_802154 appletalk caif_socket can caif ipt_ULOG x25 rose af_key pppoe pppox ipx phonet irda llc2 ppp_generic slhc p8023 psnap p8022 llc crc_ccitt atm bluetooth
+netrom ax25 nfc rfkill rds af_rxrpc coretemp hwmon kvm_intel kvm crc32c_intel snd_hda_codec_realtek ghash_clmulni_intel microcode pcspkr snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep usb_debug snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c
CPU: 2 PID: 8095 Comm: trinity-child2 Not tainted 3.10.0-rc7+ #37
task: ffff8801f52c2520 ti: ffff8801e6430000 task.ti: ffff8801e6430000
RIP: 0010:[<ffffffff816e759c>]  [<ffffffff816e759c>] skb_panic+0x63/0x65
RSP: 0018:ffff8801e6431de8  EFLAGS: 00010282
RAX: 0000000000000086 RBX: ffff8802353d3cc0 RCX: 0000000000000006
RDX: 0000000000003b90 RSI: ffff8801f52c2ca0 RDI: ffff8801f52c2520
RBP: ffff8801e6431e08 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022ea0c800
R13: ffff88022ea0cdf8 R14: ffff8802353ecb40 R15: ffffffff81cc7800
FS:  00007f5720a10740(0000) GS:ffff880244c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000005862000 CR3: 000000022843c000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
 ffff88013deb6dec 000000000000002c 00000000000000c0 ffffffff81a3f6e4
 ffff8801e6431e18 ffffffff8159a9aa ffff8801e6431e90 ffffffff816765f6
 ffffffff810b756b 0000000700000002 ffff8801e6431e40 0000fea9292aa8c0
Call Trace:
 [<ffffffff8159a9aa>] skb_push+0x3a/0x40
 [<ffffffff816765f6>] ip6_push_pending_frames+0x1f6/0x4d0
 [<ffffffff810b756b>] ? mark_held_locks+0xbb/0x140
 [<ffffffff81694919>] udp_v6_push_pending_frames+0x2b9/0x3d0
 [<ffffffff81694660>] ? udplite_getfrag+0x20/0x20
 [<ffffffff8162092a>] udp_lib_setsockopt+0x1aa/0x1f0
 [<ffffffff811cc5e7>] ? fget_light+0x387/0x4f0
 [<ffffffff816958a4>] udpv6_setsockopt+0x34/0x40
 [<ffffffff815949f4>] sock_common_setsockopt+0x14/0x20
 [<ffffffff81593c31>] SyS_setsockopt+0x71/0xd0
 [<ffffffff816f5d54>] tracesys+0xdd/0xe2
Code: 00 00 48 89 44 24 10 8b 87 d8 00 00 00 48 89 44 24 08 48 8b 87 e8 00 00 00 48 c7 c7 c0 04 aa 81 48 89 04 24 31 c0 e8 e1 7e ff ff <0f> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55
RIP  [<ffffffff816e759c>] skb_panic+0x63/0x65
 RSP <ffff8801e6431de8>

This patch adds a check if the pending data is of address family AF_INET
and directly calls udp_push_ending_frames from udp_v6_push_pending_frames
if that is the case.

This bug was found by Dave Jones with trinity.

(Also move the initialization of fl6 below the AF_INET check, even if
not strictly necessary.)

Cc: Dave Jones <davej@redhat.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-02 12:44:18 -07:00