Commit Graph

4764 Commits

Author SHA1 Message Date
Neil Horman
2b73bc65e2 netprio_cgroup: fix wrong memory access when NETPRIO_CGROUP=m
When the netprio_cgroup module is not loaded, net_prio_subsys_id
is -1, and so sock_update_prioidx() accesses cgroup_subsys array
with negative index subsys[-1].

Make the code resembles cls_cgroup code, which is bug free.

Origionally-authored-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-10 15:08:57 -05:00
David S. Miller
16bda13d90 net: Make qdisc_skb_cb upper size bound explicit.
Just like skb->cb[], so that qdisc_skb_cb can be encapsulated inside
of other data structures.

This is intended to be used by IPoIB so that it can remember
addressing information stored at hard_header_ops->create() time that
it can fetch when the packet gets to the transmit routine.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-09 13:50:34 -05:00
Julian Anastasov
e6b45241c5 ipv4: reset flowi parameters on route connect
Eric Dumazet found that commit 813b3b5db8
(ipv4: Use caller's on-stack flowi as-is in output
route lookups.) that comes in 3.0 added a regression.
The problem appears to be that resulting flowi4_oif is
used incorrectly as input parameter to some routing lookups.
The result is that when connecting to local port without
listener if the IP address that is used is not on a loopback
interface we incorrectly assign RTN_UNICAST to the output
route because no route is matched by oif=lo. The RST packet
can not be sent immediately by tcp_v4_send_reset because
it expects RTN_LOCAL.

	So, change ip_route_connect and ip_route_newports to
update the flowi4 fields that are input parameters because
we do not want unnecessary binding to oif.

	To make it clear what are the input parameters that
can be modified during lookup and to show which fields of
floiw4 are reused add a new function to update the flowi4
structure: flowi4_update_output.

Thanks to Yurij M. Plotnikov for providing a bug report including a
program to reproduce the problem.

Thanks to Eric Dumazet for tracking the problem down to
tcp_v4_send_reset and providing initial fix.

Reported-by: Yurij M. Plotnikov <Yurij.Plotnikov@oktetlabs.ru>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-04 19:29:48 -05:00
Arun Sharma
efcdbf24fd net: Disambiguate kernel message
Some of our machines were reporting:

TCP: too many of orphaned sockets

even when the number of orphaned sockets was well below the
limit.

We print a different message depending on whether we're out
of TCP memory or there are too many orphaned sockets.

Also move the check out of line and cleanup the messages
that were printed.

Signed-off-by: Arun Sharma <asharma@fb.com>
Suggested-by: Mohan Srinivasan <mohan@fb.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: David Miller <davem@davemloft.net>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-01 14:41:50 -05:00
Glauber Costa
4acb41903b net/tcp: Fix tcp memory limits initialization when !CONFIG_SYSCTL
sysctl_tcp_mem() initialization was moved to sysctl_tcp_ipv4.c
in commit 3dc43e3e4d, since it
became a per-ns value.

That code, however, will never run when CONFIG_SYSCTL is
disabled, leading to bogus values on those fields - causing hung
TCP sockets.

This patch fixes it by keeping an initialization code in
tcp_init(). It will be overwritten by the first net namespace
init if CONFIG_SYSCTL is compiled in, and do the right thing if
it is compiled out.

It is also named properly as tcp_init_mem(), to properly signal
its non-sysctl side effect on TCP limits.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Glauber Costa <glommer@parallels.com>
Cc: David S. Miller <davem@davemloft.net>
Link: http://lkml.kernel.org/r/4F22D05A.8030604@parallels.com
[ renamed the function, tidied up the changelog a bit ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-30 12:41:06 -05:00
Eric W. Biederman
5ee4433efe netns: Fail conspicously if someone uses net_generic at an inappropriate time.
By definition net_generic should never be called when it can return
NULL.  Fail conspicously with a BUG_ON to make it clear when people mess
up that a NULL return should never happen.

Recently there was a bug in the CAIF subsystem where it was registered
with register_pernet_device instead of register_pernet_subsys.  It was
erroneously concluded that net_generic could validly return NULL and
that net_assign_generic was buggy (when it was just inefficient).
Hopefully this BUG_ON will prevent people to coming to similar erroneous
conclusions in the futrue.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-27 21:06:02 -05:00
Glauber Costa
9018e93948 net: explicitly add jump_label.h header to sock.h
Commit 36a1211970 removed linux/module.h
include statement from one of the headers that end up in net/sock.h.
It was providing us with static_branch() definition implicitly, so
after its removal the build got broken.

To fix this, and avoid having this happening in the future,
let me do the right thing and include linux/jump_label.h
explicitly in sock.h.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
CC: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-26 17:13:26 -05:00
Linus Torvalds
701b259f44 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Davem says:

1) Fix JIT code generation on x86-64 for divide by zero, from Eric Dumazet.

2) tg3 header length computation correction from Eric Dumazet.

3) More build and reference counting fixes for socket memory cgroup
   code from Glauber Costa.

4) module.h snuck back into a core header after all the hard work we
   did to remove that, from Paul Gortmaker and Jesper Dangaard Brouer.

5) Fix PHY naming regression and add some new PCI IDs in stmmac, from
   Alessandro Rubini.

6) Netlink message generation fix in new team driver, should only advertise
   the entries that changed during events, from Jiri Pirko.

7) SRIOV VF registration and unregistration fixes, and also add a
   missing PCI ID, from Roopa Prabhu.

8) Fix infinite loop in tx queue flush code of brcmsmac, from Stanislaw Gruszka.

9) ftgmac100/ftmac100 build fix, missing interrupt.h include.

10) Memory leak fix in net/hyperv do_set_mutlicast() handling, from Wei Yongjun.

11) Off by one fix in netem packet scheduler, from Vijay Subramanian.

12) TCP loss detection fix from Yuchung Cheng.

13) TCP reset packet MD5 calculation uses wrong address, fix from Shawn Lu.

14) skge carrier assertion and DMA mapping fixes from Stephen Hemminger.

15) Congestion recovery undo performed at the wrong spot in BIC and CUBIC
    congestion control modules, fix from Neal Cardwell.

16) Ethtool ETHTOOL_GSSET_INFO is unnecessarily restrictive, from Michał Mirosław.

17) Fix triggerable race in ipv6 sysctl handling, from Francesco Ruggeri.

18) Statistics bug fixes in mlx4 from Eugenia Emantayev.

19) rds locking bug fix during info dumps, from your's truly.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (67 commits)
  rds: Make rds_sock_lock BH rather than IRQ safe.
  netprio_cgroup.h: dont include module.h from other includes
  net: flow_dissector.c missing include linux/export.h
  team: send only changed options/ports via netlink
  net/hyperv: fix possible memory leak in do_set_multicast()
  drivers/net: dsa/mv88e6xxx.c files need linux/module.h
  stmmac: added PCI identifiers
  llc: Fix race condition in llc_ui_recvmsg
  stmmac: fix phy naming inconsistency
  dsa: Add reporting of silicon revision for Marvell 88E6123/88E6161/88E6165 switches.
  tg3: fix ipv6 header length computation
  skge: add byte queue limit support
  mv643xx_eth: Add Rx Discard and Rx Overrun statistics
  bnx2x: fix compilation error with SOE in fw_dump
  bnx2x: handle CHIP_REVISION during init_one
  bnx2x: allow user to change ring size in ISCSI SD mode
  bnx2x: fix Big-Endianess in ethtool -t
  bnx2x: fixed ethtool statistics for MF modes
  bnx2x: credit-leakage fixup on vlan_mac_del_all
  macvlan: fix a possible use after free
  ...
2012-01-24 15:51:40 -08:00
Paul Gortmaker
36a1211970 netprio_cgroup.h: dont include module.h from other includes
A considerable effort was invested in wiping out module.h
from being present in all the other standard includes.  This
one leaked back in, but once again isn't strictly necessary,
so remove it.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-24 16:42:26 -05:00
Randy Dunlap
2f6c76aa5f kernel-doc: fix new warnings in cfg80211.h
Fix new kernel-doc warnings:

Warning(include/net/cfg80211.h:1165): No description found for parameter 'channel_type'
Warning(include/net/cfg80211.h:2090): No description found for parameter 'probe_resp_offload'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc:	Johannes Berg <johannes@sipsolutions.net>
Cc:	linux-wireless@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-23 08:44:54 -08:00
David S. Miller
b1cc16b8e6 bluetooth: hci: Fix type of "enable_hs" to bool.
Fixes:

net/bluetooth/hci_core.c: In function ‘__check_enable_hs’:
net/bluetooth/hci_core.c:2587:1: warning: return from incompatible pointer type [enabled by default]

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-22 15:08:46 -05:00
Glauber Costa
0e90b31f4b net: introduce res_counter_charge_nofail() for socket allocations
There is a case in __sk_mem_schedule(), where an allocation
is beyond the maximum, but yet we are allowed to proceed.
It happens under the following condition:

	sk->sk_wmem_queued + size >= sk->sk_sndbuf

The network code won't revert the allocation in this case,
meaning that at some point later it'll try to do it. Since
this is never communicated to the underlying res_counter
code, there is an inbalance in res_counter uncharge operation.

I see two ways of fixing this:

1) storing the information about those allocations somewhere
   in memcg, and then deducting from that first, before
   we start draining the res_counter,
2) providing a slightly different allocation function for
   the res_counter, that matches the original behavior of
   the network code more closely.

I decided to go for #2 here, believing it to be more elegant,
since #1 would require us to do basically that, but in a more
obscure way.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
CC: Tejun Heo <tj@kernel.org>
CC: Li Zefan <lizf@cn.fujitsu.com>
CC: Laurent Chavey <chavey@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-22 15:08:46 -05:00
Glauber Costa
376be5ff8a net: fix socket memcg build with !CONFIG_NET
There is still a build bug with the sock memcg code, that triggers
with !CONFIG_NET, that survived my series of randconfig builds.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
CC: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-22 15:08:45 -05:00
Randy Dunlap
1a3bc369ba kernel-doc: fix new warning in net/sock.h
Fix new kernel-doc warning:

Warning(include/net/sock.h:372): No description found for parameter 'sk_cgrp_prioidx'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-22 15:08:45 -05:00
Eric Dumazet
747465ef7a net: fix some sparse errors
make C=2 CF="-D__CHECK_ENDIAN__" M=net

And fix flowi4_init_output() prototype for sport

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-17 10:31:12 -05:00
Linus Torvalds
7c17d86a85 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (69 commits)
  pptp: Accept packet with seq zero
  RDS: Remove some unused iWARP code
  net: fsl: fec: handle 10Mbps speed in RMII mode
  drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c: add missing iounmap
  drivers/net/ethernet/tundra/tsi108_eth.c: add missing iounmap
  ksz884x: fix mtu for VLAN
  net_sched: sfq: add optional RED on top of SFQ
  dp83640: Fix NOHZ local_softirq_pending 08 warning
  gianfar: Fix invalid TX frames returned on error queue when time stamping
  gianfar: Fix missing sock reference when processing TX time stamps
  phylib: introduce mdiobus_alloc_size()
  net: decrement memcg jump label when limit, not usage, is changed
  net: reintroduce missing rcu_assign_pointer() calls
  inet_diag: Rename inet_diag_req_compat into inet_diag_req
  inet_diag: Rename inet_diag_req into inet_diag_req_v2
  bond_alb: don't disable softirq under bond_alb_xmit
  mac80211: fix rx->key NULL pointer dereference in promiscuous mode
  nl80211: fix old station flags compatibility
  mdio-octeon: use an unique MDIO bus name.
  mdio-gpio: use an unique MDIO bus name.
  ...
2012-01-12 20:30:02 -08:00
Eric Dumazet
ddecf0f4db net_sched: sfq: add optional RED on top of SFQ
Adds an optional Random Early Detection on each SFQ flow queue.

Traditional SFQ limits count of packets, while RED permits to also
control number of bytes per flow, and adds ECN capability as well.

1) We dont handle the idle time management in this RED implementation,
since each 'new flow' begins with a null qavg. We really want to address
backlogged flows.

2) if headdrop is selected, we try to ecn mark first packet instead of
currently enqueued packet. This gives faster feedback for tcp flows
compared to traditional RED [ marking the last packet in queue ]

Example of use :

tc qdisc add dev $DEV parent 1:1 handle 10: est 1sec 4sec sfq \
	limit 3000 headdrop flows 512 divisor 16384 \
	redflowlimit 100000 min 8000 max 60000 probability 0.20 ecn

qdisc sfq 10: parent 1:1 limit 3000p quantum 1514b depth 127 headdrop
flows 512/16384 divisor 16384
 ewma 6 min 8000b max 60000b probability 0.2 ecn
 prob_mark 0 prob_mark_head 4876 prob_drop 6131
 forced_mark 0 forced_mark_head 0 forced_drop 0
 Sent 1175211782 bytes 777537 pkt (dropped 6131, overlimits 11007
requeues 0)
 rate 99483Kbit 8219pps backlog 689392b 456p requeues 0

In this test, with 64 netperf TCP_STREAM sessions, 50% using ECN enabled
flows, we can see number of packets CE marked is smaller than number of
drops (for non ECN flows)

If same test is run, without RED, we can check backlog is much bigger.

qdisc sfq 10: parent 1:1 limit 3000p quantum 1514b depth 127 headdrop
flows 512/16384 divisor 16384
 Sent 1148683617 bytes 795006 pkt (dropped 0, overlimits 0 requeues 0)
 rate 98429Kbit 8521pps backlog 1221290b 841p requeues 0

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
CC: Dave Taht <dave.taht@gmail.com>
Tested-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-12 20:05:28 -08:00
David S. Miller
9ee6045f09 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless 2012-01-12 12:10:00 -08:00
Linus Torvalds
609eac1c15 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  fs/9p: iattr_valid flags are kernel internal flags map them to 9p values.
  fs/9p: We should not allocate a new inode when creating hardlines.
  fs/9p: v9fs_stat2inode should update suid/sgid bits.
  9p: Reduce object size with CONFIG_NET_9P_DEBUG
  fs/9p: check schedule_timeout_interruptible return value

Fix up trivial conflicts in fs/9p/{vfs_inode.c,vfs_inode_dotl.c} due to
debug messages having changed to use p9_debug() on one hand, and the
changes for umode_t on the other.
2012-01-10 15:09:01 -08:00
John W. Linville
874c60bad9 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/padovan/bluetooth-next 2012-01-10 15:44:17 -05:00
Linus Torvalds
6b3da11b3c Merge branch 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu: Remove irqsafe_cpu_xxx variants

Fix up conflict in arch/x86/include/asm/percpu.h due to clash with
cebef5beed ("x86: Fix and improve percpu_cmpxchg{8,16}b_double()")
which edited the (now removed) irqsafe_cpu_cmpxchg*_double code.
2012-01-09 13:08:28 -08:00
Stephen Rothwell
475f1b5264 net: sk_update_clone is only used in net/core/sock.c
so move it there.  Fixes build errors when CONFIG_INET is not defined:

In file included from include/linux/tcp.h:211:0,
                 from include/linux/ipv6.h:221,
                 from include/net/ipv6.h:16,
                 from include/linux/sunrpc/clnt.h:26,
                 from include/linux/nfs_fs.h:50,
                 from init/do_mounts.c:20:
include/net/sock.h: In function 'sk_update_clone':
include/net/sock.h:1109:3: error: implicit declaration of function 'sock_update_memcg' [-Werror=implicit-function-declaration]

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-08 23:44:26 -08:00
Linus Torvalds
98793265b4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits)
  Kconfig: acpi: Fix typo in comment.
  misc latin1 to utf8 conversions
  devres: Fix a typo in devm_kfree comment
  btrfs: free-space-cache.c: remove extra semicolon.
  fat: Spelling s/obsolate/obsolete/g
  SCSI, pmcraid: Fix spelling error in a pmcraid_err() call
  tools/power turbostat: update fields in manpage
  mac80211: drop spelling fix
  types.h: fix comment spelling for 'architectures'
  typo fixes: aera -> area, exntension -> extension
  devices.txt: Fix typo of 'VMware'.
  sis900: Fix enum typo 'sis900_rx_bufer_status'
  decompress_bunzip2: remove invalid vi modeline
  treewide: Fix comment and string typo 'bufer'
  hyper-v: Update MAINTAINERS
  treewide: Fix typos in various parts of the kernel, and fix some comments.
  clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR
  gpio: Kconfig: drop unknown symbol 'CS5535_GPIO'
  leds: Kconfig: Fix typo 'D2NET_V2'
  sound: Kconfig: drop unknown symbol ARCH_CLPS7500
  ...

Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new
kconfig additions, close to removed commented-out old ones)
2012-01-08 13:21:22 -08:00
Glauber Costa
f3f511e1ce net: fix sock_clone reference mismatch with tcp memcontrol
Sockets can also be created through sock_clone. Because it copies
all data in the sock structure, it also copies the memcg-related pointer,
and all should be fine. However, since we now use reference counts in
socket creation, we are left with some sockets that have no reference
counts. It matters when we destroy them, since it leads to a mismatch.

Signed-off-by: Glauber Costa <glommer@parallels.com>
CC: David S. Miller <davem@davemloft.net>
CC: Greg Thelen <gthelen@google.com>
CC: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com>
CC: Laurent Chavey <chavey@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-07 10:16:34 -08:00
Eric Dumazet
eeca6688d6 net_sched: red: split red_parms into parms and vars
This patch splits the red_parms structure into two components.

One holding the RED 'constant' parameters, and one containing the
variables.

This permits a size reduction of GRED qdisc, and is a preliminary step
to add an optional RED unit to SFQ.

SFQRED will have a single red_parms structure shared by all flows, and a
private red_vars per flow.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Dave Taht <dave.taht@gmail.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-05 14:01:21 -05:00