Commit Graph

323573 Commits

Author SHA1 Message Date
Jozsef Kadlecsik b9fed74818 netfilter: ipset: Check and reject crazy /0 input parameters
The bitmap:ip and bitmap:ip,mac types did not reject such a crazy range
at creation, and using such a set results in a kernel crash.
The hash types just silently ignored such parameters.

Reject invalid /0 input parameters explicitly.
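
A minimal userspace sketch of this kind of check, assuming a plain IPv4 prefix length as input. The helper names check_ipv4_cidr() and ipv4_range_size() are hypothetical and are not taken from the ipset sources. The second helper only illustrates why a /0 prefix is awkward for a 32-bit size computation; it is not claimed to be the exact cause of the crash.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper, not the ipset code: validate an IPv4 prefix length. */
static int check_ipv4_cidr(unsigned int cidr)
{
	/* /0 would span the whole address space; above /32 is meaningless. */
	if (cidr == 0 || cidr > 32)
		return -EINVAL;
	return 0;
}

/* Number of addresses covered by a prefix that passed check_ipv4_cidr(). */
static uint32_t ipv4_range_size(unsigned int cidr)
{
	assert(check_ipv4_cidr(cidr) == 0);
	/*
	 * The shift count is 0..31 here. With cidr == 0 it would be 32,
	 * which is undefined behaviour for a 32-bit type.
	 */
	return (uint32_t)1 << (32 - cidr);
}
```

Rejecting the value up front, rather than silently ignoring it, means that callers never reach the size computation with a prefix length it cannot represent.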

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2012-09-21 21:51:34 +02:00
Jozsef Kadlecsik 6e27c9b4ee netfilter: ipset: Fix sparse warnings "incorrect type in assignment"
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2012-09-21 21:51:22 +02:00
Jan Engelhardt 2cbc78a29e netfilter: combine ipt_REDIRECT and ip6t_REDIRECT
Combine more modules since the actual code is so small anyway that the
kmod metadata and the module in its loaded state totally outweigh the
combined actual code size.

IP_NF_TARGET_REDIRECT becomes a compat option; IP6_NF_TARGET_REDIRECT
is completely eliminated since it has not seen a release yet.

Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-21 12:12:05 +02:00
Jan Engelhardt b3d54b3e40 netfilter: combine ipt_NETMAP and ip6t_NETMAP
Combine more modules since the actual code is so small anyway that the
kmod metadata and the module in its loaded state totally outweigh the
combined actual code size.

IP_NF_TARGET_NETMAP becomes a compat option; IP6_NF_TARGET_NETMAP
is completely eliminated since it has not seen a release yet.

Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-21 12:11:08 +02:00
Ulrich Weber 136251d02f netfilter: nf_nat: remove obsolete rcu_read_unlock call
hlist walk in find_appropriate_src() is not protected anymore by rcu_read_lock(),
so rcu_read_unlock() is unnecessary if in_range() matches.

This bug was added in (c7232c9 netfilter: add protocol independent NAT core).

Signed-off-by: Ulrich Weber <ulrich.weber@sophos.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-21 12:09:25 +02:00
Patrick McHardy b0cdb1d9a9 netfilter: nf_nat: fix oops when unloading protocol modules
When unloading a protocol module nf_ct_iterate_cleanup() is used to
remove all conntracks using the protocol from the bysource hash and
clean their NAT sections. Since the conntrack isn't actually killed,
the NAT callback is invoked twice, once for each direction, which
causes an oops when trying to delete it from the bysource hash for
the second time.

The same oops can also happen when removing both an L3 and L4 protocol
since the cleanup function doesn't check whether the conntrack has
already been cleaned up.

Pid: 4052, comm: modprobe Not tainted 3.6.0-rc3-test-nat-unload-fix+ #32 Red Hat KVM
RIP: 0010:[<ffffffffa002c303>]  [<ffffffffa002c303>] nf_nat_proto_clean+0x73/0xd0 [nf_nat]
RSP: 0018:ffff88007808fe18  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8800728550c0 RCX: ffff8800756288b0
RDX: dead000000200200 RSI: ffff88007808fe88 RDI: ffffffffa002f208
RBP: ffff88007808fe28 R08: ffff88007808e000 R09: 0000000000000000
R10: dead000000200200 R11: dead000000100100 R12: ffffffff81c6dc00
R13: ffff8800787582b8 R14: ffff880078758278 R15: ffff88007808fe88
FS:  00007f515985d700(0000) GS:ffff88007cd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f515986a000 CR3: 000000007867a000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process modprobe (pid: 4052, threadinfo ffff88007808e000, task ffff8800756288b0)
Stack:
 ffff88007808fe68 ffffffffa002c290 ffff88007808fe78 ffffffff815614e3
 ffffffff00000000 00000aeb00000246 ffff88007808fe68 ffffffff81c6dc00
 ffff88007808fe88 ffffffffa00358a0 0000000000000000 000000000040f5b0
Call Trace:
 [<ffffffffa002c290>] ? nf_nat_net_exit+0x50/0x50 [nf_nat]
 [<ffffffff815614e3>] nf_ct_iterate_cleanup+0xc3/0x170
 [<ffffffffa002c55a>] nf_nat_l3proto_unregister+0x8a/0x100 [nf_nat]
 [<ffffffff812a0303>] ? compat_prepare_timeout+0x13/0xb0
 [<ffffffffa0035848>] nf_nat_l3proto_ipv4_exit+0x10/0x23 [nf_nat_ipv4]
 ...

To fix this,

- check whether the conntrack has already been cleaned up in
  nf_nat_proto_clean

- change nf_ct_iterate_cleanup() to only invoke the callback function
  once for each conntrack (IP_CT_DIR_ORIGINAL).

The second change doesn't affect other callers since when conntracks are
actually killed, both directions are removed from the hash immediately
and the callback is already only invoked once. If it is not killed, the
second callback invocation will always return the same decision not to
kill it.
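
The first part of the fix (skip entries that were already cleaned up) can be sketched in userspace roughly as follows. This is an illustration only: struct entry, entry_add() and entry_clean() are hypothetical stand-ins, not the conntrack or bysource hash code, and the hashed flag is an assumed way of recording that cleanup already happened.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an object linked into a hash chain. */
struct entry {
	struct entry *next;	/* next entry in the chain */
	struct entry **pprev;	/* address of the pointer that points to us */
	bool hashed;		/* assumed flag: still linked into the chain */
};

/* Link an entry at the head of a chain. */
static void entry_add(struct entry **head, struct entry *e)
{
	assert(head != NULL && e != NULL);
	e->next = *head;
	if (*head)
		(*head)->pprev = &e->next;
	e->pprev = head;
	*head = e;
	e->hashed = true;
}

/* Unlink an entry; returns 1 if it was unlinked, 0 if already cleaned. */
static int entry_clean(struct entry *e)
{
	assert(e != NULL);
	if (!e->hashed)
		return 0;	/* second invocation: nothing left to do */
	*e->pprev = e->next;
	if (e->next)
		e->next->pprev = e->pprev;
	e->next = NULL;
	e->pprev = NULL;
	e->hashed = false;
	return 1;
}
```

With the early return in place, invoking the cleanup callback once per direction, or once per protocol module, touches the chain only the first time.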

Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-21 11:35:18 +02:00
Pablo Neira Ayuso b0041d1b8e netfilter: fix IPv6 NAT dependencies in Kconfig
* NF_NAT_IPV6 requires IP6_NF_IPTABLES

* IP6_NF_TARGET_MASQUERADE, IP6_NF_TARGET_NETMAP, IP6_NF_TARGET_REDIRECT
  and IP6_NF_TARGET_NPT require NF_NAT_IPV6.

This change just mirrors what IPv4 does in Kconfig, for consistency.

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-21 11:33:19 +02:00
David S. Miller b216a4d86f Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:

====================
This series contains updates to igb and ixgbevf.

v2: updated patch description in 04 patch (ixgbevf: scheduling while
    atomic in reset hw path)
 ...
Akeem G. Abodunrin (1):
  igb: Support to enable EEE on all eee_supported devices

Alexander Duyck (2):
  igb: Remove artificial restriction on RQDPC stat reading
  ixgbevf: Add support for VF API negotiation

John Fastabend (1):
  ixgbevf: scheduling while atomic in reset hw path
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-20 22:33:42 -04:00
Joe Perches 127a479442 net1080: Neaten netdev_dbg use
Remove unnecessary temporary variable and #ifdef DEBUG block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-20 22:05:22 -04:00
Greg Kroah-Hartman 49ae25b03c USB: remove dbg() usage in USB networking drivers
The dbg() USB macro is so old, it predates me.  The USB networking drivers are
the last hold-out using this macro, and we want to get rid of it, so replace
the usage of it with the proper netdev_dbg() or dev_dbg() (depending on the
context) calls.

In some places we end up using a local variable for the debug call, so also
convert the other existing dev_* calls to use it as well, to save tiny amounts
of code space.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-20 17:53:14 -04:00
Alan Cox 4308fc58dc tcp: Document use of undefined variable.
Both tcp_timewait_state_process and tcp_check_req use the same basic
construct of

	struct tcp_options_received tmp_opt;
	tmp_opt.saw_tstamp = 0;

then call

	tcp_parse_options

However, if they are fed a frame containing a TCP_SACK then the code
behaviour is undefined because opt_rx->sack_ok is undefined data.

This ought to be documented if it is intentional.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-20 17:29:36 -04:00
Christoph Paasch bb68b64724 ipv4: Don't add TCP-code in inet_sock_destruct
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Acked-by: H.K. Jerry Chu <hkchu@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-20 17:12:27 -04:00
Or Gerlitz 9baa0b0364 IB/ipoib: Add rtnl_link_ops support
Add rtnl_link_ops to IPoIB, with the first usage being child device
create/delete through them. Child devices are now either legacy ones,
created/deleted through the ipoib sysfs entries, or RTNL ones.

Adding support for RTNL children involved refactoring ipoib_vlan_add,
which is now used by both the sysfs and the link_ops code.

Also added an ndo_uninit entry to support calling unregister_netdevice_queue
from the rtnl dellink entry. This required removal of calls to
ipoib_dev_cleanup from the driver in flows which use unregister_netdevice,
since the networking core will invoke ipoib_uninit which does exactly that.

Signed-off-by: Erez Shitrit <erezsh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-20 16:49:17 -04:00
David S. Miller b85c715c2e Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next
Ben Hutchings says:

====================
1. Extension to PPS/PTP to allow for PHC devices where pulses are
   subject to a variable but measurable delay.
2. PPS/PTP/PHC support for Solarflare boards with a timestamping
   peripheral.
3. MTD support for updating the timestamping peripheral on those boards.
4. Fix for potential over-length requests to firmware.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-20 16:39:59 -04:00
John Fastabend 012dc19a45 ixgbevf: scheduling while atomic in reset hw path
In ixgbevf_reset_hw_vf(), msleep() is called while holding mbx_lock,
resulting in a "scheduling while atomic" bug with the trace below.

This patch uses mdelay instead.

BUG: scheduling while atomic: ip/6539/0x00000002
2 locks held by ip/6539:
 #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff81419cc3>] rtnl_lock+0x17/0x19
 #1:  (&(&adapter->mbx_lock)->rlock){+.+...}, at: [<ffffffffa0030855>] ixgbevf_reset+0x30/0xc1 [ixgbevf]
Modules linked in: ixgbevf ixgbe mdio libfc scsi_transport_fc 8021q scsi_tgt garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 uinput igb coretemp hwmon crc32c_intel ioatdma i2c_i801 shpchp microcode lpc_ich mfd_core i2c_core joydev dca pcspkr serio_raw pata_acpi ata_generic usb_storage pata_jmicron
Pid: 6539, comm: ip Not tainted 3.6.0-rc3jk-net-next+ #104
Call Trace:
 [<ffffffff81072202>] __schedule_bug+0x6a/0x79
 [<ffffffff814bc7e0>] __schedule+0xa2/0x684
 [<ffffffff8108f85f>] ? trace_hardirqs_off+0xd/0xf
 [<ffffffff814bd0c0>] schedule+0x64/0x66
 [<ffffffff814bb5e2>] schedule_timeout+0xa6/0xca
 [<ffffffff810536b9>] ? lock_timer_base+0x52/0x52
 [<ffffffff812629e0>] ? __udelay+0x15/0x17
 [<ffffffff814bb624>] schedule_timeout_uninterruptible+0x1e/0x20
 [<ffffffff810541c0>] msleep+0x1b/0x22
 [<ffffffffa002e723>] ixgbevf_reset_hw_vf+0x90/0xe5 [ixgbevf]
 [<ffffffffa0030860>] ixgbevf_reset+0x3b/0xc1 [ixgbevf]
 [<ffffffffa0032fba>] ixgbevf_open+0x43/0x43e [ixgbevf]
 [<ffffffff81409610>] ? dev_set_rx_mode+0x2e/0x33
 [<ffffffff8140b0f1>] __dev_open+0xa0/0xe5
 [<ffffffff814097ed>] __dev_change_flags+0xbe/0x142
 [<ffffffff8140b01c>] dev_change_flags+0x21/0x56
 [<ffffffff8141a843>] do_setlink+0x2e2/0x7f4
 [<ffffffff81016e36>] ? native_sched_clock+0x37/0x39
 [<ffffffff8141b0ac>] rtnl_newlink+0x277/0x4bb
 [<ffffffff8141aee9>] ? rtnl_newlink+0xb4/0x4bb
 [<ffffffff812217d1>] ? selinux_capable+0x32/0x3a
 [<ffffffff8104fb17>] ? ns_capable+0x4f/0x67
 [<ffffffff81419cc3>] ? rtnl_lock+0x17/0x19
 [<ffffffff81419f28>] rtnetlink_rcv_msg+0x236/0x253
 [<ffffffff81419cf2>] ? rtnetlink_rcv+0x2d/0x2d
 [<ffffffff8142fd42>] netlink_rcv_skb+0x43/0x94
 [<ffffffff81419ceb>] rtnetlink_rcv+0x26/0x2d
 [<ffffffff8142faf1>] netlink_unicast+0xee/0x174
 [<ffffffff81430327>] netlink_sendmsg+0x26a/0x288
 [<ffffffff813fb04f>] ? rcu_read_unlock+0x56/0x67
 [<ffffffff813f5e6d>] __sock_sendmsg_nosec+0x58/0x61
 [<ffffffff813f81b7>] __sock_sendmsg+0x3d/0x48
 [<ffffffff813f8339>] sock_sendmsg+0x6e/0x87
 [<ffffffff81107c9f>] ? might_fault+0xa5/0xac
 [<ffffffff81402a72>] ? copy_from_user+0x2a/0x2c
 [<ffffffff81402e62>] ? verify_iovec+0x54/0xaa
 [<ffffffff813f9834>] __sys_sendmsg+0x206/0x288
 [<ffffffff810694fa>] ? up_read+0x23/0x3d
 [<ffffffff811307e5>] ? fcheck_files+0xac/0xea
 [<ffffffff8113095e>] ? fget_light+0x3a/0xb9
 [<ffffffff813f9a2e>] sys_sendmsg+0x42/0x60
 [<ffffffff814c5ba9>] system_call_fastpath+0x16/0x1b

CC: Eric Dumazet <edumazet@google.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-By: Robert Garrett <robertx.e.garrett@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-09-20 02:57:00 -07:00
Alexander Duyck 3118678518 ixgbevf: Add support for VF API negotiation
This change makes it so that the VF can support the PF/VF API negotiation
protocol.  Specifically in this case we are adding support for API 1.0
which will mean that the VF is capable of cleaning up buffers that span
multiple descriptors without triggering an error.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-09-20 02:47:19 -07:00
Akeem G. Abodunrin e5461112d9 igb: Support to enable EEE on all eee_supported devices
The current implementation enables EEE only on the i350 device. This patch
enables EEE on all eee_supported devices. It also configures the LPI clock
to keep running before EEE is enabled on i210 and i211 devices.

Signed-off-by: Akeem G. Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-09-20 02:47:12 -07:00
Alexander Duyck ae1c07a6b7 igb: Remove artificial restriction on RQDPC stat reading
For some reason the reading of the RQDPC register was being artificially
limited to 4K.  Instead of limiting the value we should read the value and
add the full amount.  Otherwise this can lead to a misleading number of
dropped packets when the actual value is in fact much higher.
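
A small sketch of the difference, assuming the "4K" limit behaves like a 12-bit mask on the value read from the register. The macro RQDPC_MASK_4K and both helper names are hypothetical, and the register value is passed in as a plain integer rather than read from hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 12-bit mask modelling the "4K" limit described above. */
#define RQDPC_MASK_4K 0x0FFFu

static_assert(RQDPC_MASK_4K == 4095u, "mask should cover 0..4095");

/* Old behaviour (as modelled here): only the low 12 bits are counted. */
static uint64_t add_drops_masked(uint64_t total, uint32_t reg)
{
	return total + (reg & RQDPC_MASK_4K);
}

/* New behaviour: the full register value is added to the running total. */
static uint64_t add_drops_full(uint64_t total, uint32_t reg)
{
	return total + reg;
}
```

Under this model, a register value of 5000 contributes only 904 to the masked total (5000 - 4096), while the unmasked version adds all 5000 drops.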

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-09-20 02:30:32 -07:00
Michal Schmidt aee77e4acc r8169: use unlimited DMA burst for TX
The r8169 driver currently limits the DMA burst for TX to 1024 bytes. I have
a box where this prevents the interface from using the gigabit line to its full
potential. This patch solves the problem by setting TX_DMA_BURST to unlimited.

The box has an ASRock B75M motherboard with on-board RTL8168evl/8111evl
(XID 0c900880). TSO is enabled.

I used netperf (TCP_STREAM test) to measure the dependency of TX throughput
on MTU. I did it for three different values of TX_DMA_BURST ('5'=512, '6'=1024,
'7'=unlimited). This chart shows the results:
http://michich.fedorapeople.org/r8169/r8169-effects-of-TX_DMA_BURST.png

Interesting points:
 - With the current DMA burst limit (1024):
   - at the default MTU=1500 I get only 842 Mbit/s.
   - starting from a small MTU, the throughput rises monotonically with
     increasing MTU only up to a peak at MTU=1076 (908 Mbit/s). Then there's
     a sudden drop to 762 Mbit/s, from which the throughput rises monotonically
     again with further MTU increases.
 - With a smaller DMA burst limit (512):
   - there's a similar peak at MTU=1076 and another one at MTU=564.
 - With unlimited DMA burst:
   - at the default MTU=1500 I get a nice 940 Mbit/s.
   - the throughput rises monotonically with increasing MTU with no strange
     peaks.

Notice that the peaks occur at MTU sizes that are multiples of the DMA burst
limit plus 52. Why 52? Because:
  20 (IP header) + 20 (TCP header) + 12 (TCP options) = 52
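
The arithmetic above can be checked with a short sketch. The function names tcp_overhead() and peak_mtu() are made up for illustration; the header sizes are the ones given in the text.

```c
#include <assert.h>

/* Header sizes from the text: IP header, TCP header, TCP options. */
enum { IP_HDR_BYTES = 20, TCP_HDR_BYTES = 20, TCP_OPT_BYTES = 12 };

/* Per-packet overhead that the MTU includes on top of the payload. */
static unsigned int tcp_overhead(void)
{
	return IP_HDR_BYTES + TCP_HDR_BYTES + TCP_OPT_BYTES;
}

/* MTU at which the payload is exactly k DMA bursts of the given size. */
static unsigned int peak_mtu(unsigned int burst, unsigned int k)
{
	assert(burst > 0 && k > 0);
	return k * burst + tcp_overhead();
}
```

For a 1024-byte burst, peak_mtu(1024, 1) gives 1076. For a 512-byte burst, peak_mtu(512, 1) and peak_mtu(512, 2) give 564 and 1076, matching the peaks listed above.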

The Realtek-provided r8168 driver (v8.032.00) uses unlimited TX DMA burst too,
except for CFG_METHOD_1 where the TX DMA burst is set to 512 bytes.
CFG_METHOD_1 appears to be the oldest MAC version of "RTL8168B/8111B",
i.e. RTL_GIGA_MAC_VER_11 in r8169. Not sure if this MAC version really needs
the smaller burst limit, or if any other versions have similar requirements.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-19 19:19:07 -04:00
Amerigo Wang 6b102865e7 ipv6: unify fragment thresh handling code
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-19 17:23:28 -04:00
Amerigo Wang d4915c087f ipv6: make ip6_frag_nqueues() and ip6_frag_mem() static inline
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-19 17:23:28 -04:00
Amerigo Wang b836c99fd6 ipv6: unify conntrack reassembly expire code with standard one
Two years ago, Shan Wei tried to fix this:
http://patchwork.ozlabs.org/patch/43905/

The problem is that RFC2460 says an ICMP Time Exceeded -- Fragment
Reassembly Time Exceeded message should be sent to the source of
that fragment if the defragmentation times out.

"
   If insufficient fragments are received to complete reassembly of a
   packet within 60 seconds of the reception of the first-arriving
   fragment of that packet, reassembly of that packet must be
   abandoned and all the fragments that have been received for that
   packet must be discarded.  If the first fragment (i.e., the one
   with a Fragment Offset of zero) has been received, an ICMP Time
   Exceeded -- Fragment Reassembly Time Exceeded message should be
   sent to the source of that fragment.
"

As Herbert suggested, we could actually use the standard IPv6
reassembly code which follows RFC2460.
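
The timeout rule quoted above from RFC2460 can be sketched as a small decision function. This is an illustration only, not the kernel reassembly code: struct reasm_queue, enum reasm_action and reasm_check() are hypothetical names, time is a plain seconds counter, and treating an elapsed time of exactly 60 seconds as expired is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define REASM_TIMEOUT_SECS 60u	/* limit from the quoted RFC text */

enum reasm_action {
	REASM_KEEP_WAITING,		/* timeout not reached yet */
	REASM_DISCARD,			/* drop all fragments silently */
	REASM_DISCARD_AND_SEND_ICMP	/* drop and report Time Exceeded */
};

/* Hypothetical state for one incomplete packet being reassembled. */
struct reasm_queue {
	unsigned int first_arrival_secs;	/* arrival of first-arriving fragment */
	bool have_offset_zero;			/* fragment with offset 0 was received */
};

static enum reasm_action reasm_check(const struct reasm_queue *q,
				     unsigned int now_secs)
{
	assert(q != NULL);
	if (now_secs - q->first_arrival_secs < REASM_TIMEOUT_SECS)
		return REASM_KEEP_WAITING;
	/* ICMP is only sent when the first fragment (offset zero) is known. */
	return q->have_offset_zero ? REASM_DISCARD_AND_SEND_ICMP
				   : REASM_DISCARD;
}
```

The ICMP message is tied to the offset-zero fragment, so a queue that never saw that fragment is simply discarded on timeout.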

With this patch applied, I can see ICMP Time Exceeded sent
from the receiver when the sender sends out only 3 of the 4
fragments of an IPv6 UDP packet.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-19 17:23:28 -04:00
Amerigo Wang c038a767cd ipv6: add a new namespace for nf_conntrack_reasm
As pointed out by Michal, it is necessary to add a new
namespace for the nf_conntrack_reasm code; this prepares
for the second patch.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-19 17:23:28 -04:00
Amerigo Wang 8c4c49df5c netpoll: call ->ndo_select_queue() in tx path
In the netpoll tx path, we miss the chance to call ->ndo_select_queue(),
which could cause problems when bonding is involved.

This patch makes dev_pick_tx() extern (and renames it to netdev_pick_tx())
to let netpoll call it in netpoll_send_skb_on_dev().

Reported-by: Sylvain Munaut <s.munaut@whatever-company.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Tested-by: Sylvain Munaut <s.munaut@whatever-company.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-19 17:19:09 -04:00
stephen hemminger 6b6e27255f netdev: make address const in device address management
The internal functions for adding/deleting addresses don't change
their argument.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-19 16:35:22 -04:00