Commit Graph

5688 Commits

Author SHA1 Message Date
Linus Torvalds 5fc92de3c7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking updates from David Miller:
 "Here is a pile of bug fixes that accumulated while I was in Europe"

 1) In fixing kernel leaks to userspace during copying of socket
    addresses, we broke a case that used to work, namely the user
    providing a buffer larger than the in-kernel generic socket address
    structure.  This broke Ruby amongst other things.  Fix from Dan
    Carpenter.

 2) Fix regression added by byte queue limit support in 8139cp driver,
    from Yang Yingliang.

 3) The addition of MSG_SENDPAGE_NOTLAST buggered up a few sendpage
    implementations, they should just treat it the same as MSG_MORE.
    Fix from Richard Weinberger and Shawn Landden.

 4) Handle icmpv4 errors received on ipv6 SIT tunnels correctly, from
    Oussama Ghorbel.  In particular we should send an ICMPv6 unreachable
    in such situations.

 5) Fix some regressions in the recent genetlink fixes, in particular
    get the pmcraid driver to use the new safer interfaces correctly.
    From Johannes Berg.

 6) macvtap was converted to use a per-cpu set of statistics, but some
    code was still bumping tx_dropped elsewhere.  From Jason Wang.

 7) Fix build failure of xen-netback due to missing include on some
    architectures, from Andy Whitecroft.

 8) macvtap double counts received packets in statistics, fix from Vlad
    Yasevich.

 9) Fix various cases of using *_STATS_BH() when *_STATS() is more
    appropriate.  From Eric Dumazet and Hannes Frederic Sowa.

10) Pktgen ipsec mode doesn't update the ipv4 header length and checksum
    properly after encapsulation.  Fix from Fan Du.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (61 commits)
  net/mlx4_en: Remove selftest TX queues empty condition
  {pktgen, xfrm} Update IPv4 header total len and checksum after tranformation
  virtio_net: make all RX paths handle erors consistently
  virtio_net: fix error handling for mergeable buffers
  virtio_net: Fixed a trivial typo (fitler --> filter)
  netem: fix gemodel loss generator
  netem: fix loss 4 state model
  netem: missing break in ge loss generator
  net/hsr: Support iproute print_opt ('ip -details ...')
  net/hsr: Very small fix of comment style.
  MAINTAINERS: Added net/hsr/ maintainer
  ipv6: fix possible seqlock deadlock in ip6_finish_output2
  ixgbe: Make ixgbe_identify_qsfp_module_generic static
  ixgbe: turn NETIF_F_HW_L2FW_DOFFLOAD off by default
  ixgbe: ixgbe_fwd_ring_down needs to be static
  e1000: fix possible reset_task running after adapter down
  e1000: fix lockdep warning in e1000_reset_task
  e1000: prevent oops when adapter is being closed and reset simultaneously
  igb: Fixed Wake On LAN support
  inet: fix possible seqlock deadlocks
  ...
2013-12-02 10:09:07 -08:00
Eric Dumazet f1d8cba61c inet: fix possible seqlock deadlocks
In commit c9e9042994 ("ipv4: fix possible seqlock deadlock") I left
another places where IP_INC_STATS_BH() were improperly used.

udp_sendmsg(), ping_v4_sendmsg() and tcp_v4_connect() are called from
process context, not from softirq context.

This was detected by lockdep seqlock support.

Reported-by: jongman heo <jongman.heo@samsung.com>
Fixes: 584bdf8cbd ("[IPV4]: Fix "ipOutNoRoutes" counter error for TCP and UDP")
Fixes: c319b4d76b ("net: ipv4: add IPPROTO_ICMP socket kind")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-29 16:37:36 -05:00
Shawn Landden d3f7d56a7a net: update consumers of MSG_MORE to recognize MSG_SENDPAGE_NOTLAST
Commit 35f9c09fe (tcp: tcp_sendpages() should call tcp_push() once)
added an internal flag MSG_SENDPAGE_NOTLAST, similar to
MSG_MORE.

algif_hash, algif_skcipher, and udp used MSG_MORE from tcp_sendpages()
and need to see the new flag as identical to MSG_MORE.

This fixes sendfile() on AF_ALG.

v3: also fix udp

Cc: Tom Herbert <therbert@google.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: <stable@vger.kernel.org> # 3.4.x + 3.2.x
Reported-and-tested-by: Shawn Landden <shawnlandden@gmail.com>
Original-patch: Richard Weinberger <richard@nod.at>
Signed-off-by: Shawn Landden <shawn@churchofgit.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-29 16:32:54 -05:00
Baker Zhang 39af0c409e net: remove outdated comment for ipv4 and ipv6 protocol handler
since f9242b6b28
inet: Sanitize inet{,6} protocol demux.

there are not pretended hash tables for ipv4 or
ipv6 protocol handler.

Signed-off-by: Baker Zhang <Baker.kernel@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-28 18:47:51 -05:00
Hannes Frederic Sowa 85fbaa7503 inet: fix addr_len/msg->msg_namelen assignment in recv_error and rxpmtu functions
Commit bceaa90240 ("inet: prevent leakage
of uninitialized memory to user in recv syscalls") conditionally updated
addr_len if the msg_name is written to. The recv_error and rxpmtu
functions relied on the recvmsg functions to set up addr_len before.

As this does not happen any more we have to pass addr_len to those
functions as well and set it to the size of the corresponding sockaddr
length.

This broke traceroute and such.

Fixes: bceaa90240 ("inet: prevent leakage of uninitialized memory to user in recv syscalls")
Reported-by: Brad Spengler <spender@grsecurity.net>
Reported-by: Tom Labanowski
Cc: mpb <mpb.mail@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-23 14:46:23 -08:00
Gao feng fb10f802b0 tcp_memcg: remove useless var old_lim
nobody needs it. remove.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-23 14:46:21 -08:00
Herbert Xu b8ee93ba80 gro: Clean up tcpX_gro_receive checksum verification
This patch simplifies the checksum verification in tcpX_gro_receive
by reusing the CHECKSUM_COMPLETE code for CHECKSUM_NONE.  All it
does for CHECKSUM_NONE is compute the partial checksum and then
treat it as if it came from the hardware (CHECKSUM_COMPLETE).

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-23 14:46:19 -08:00
Herbert Xu cc5c00bbb4 gro: Only verify TCP checksums for candidates
In some cases we may receive IP packets that are longer than
their stated lengths.  Such packets are never merged in GRO.
However, we may end up computing their checksums incorrectly
and end up allowing packets with a bogus checksum enter our
stack with the checksum status set as verified.

Since such packets are rare and not performance-critical, this
patch simply skips the checksum verification for them.

Reported-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>

Thanks,
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-23 14:46:19 -08:00
Linus Torvalds d2c2ad54c4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix memory leaks and other issues in mwifiex driver, from Amitkumar
    Karwar.

 2) skb_segment() can choke on packets using frag lists, fix from
    Herbert Xu with help from Eric Dumazet and others.

 3) IPv4 output cached route instantiation properly handles races
    involving two threads trying to install the same route, but we
    forgot to propagate this logic to input routes as well.  Fix from
    Alexei Starovoitov.

 4) Put protections in place to make sure that recvmsg() paths never
    accidently copy uninitialized memory back into userspace and also
    make sure that we never try to use more that sockaddr_storage for
    building the on-kernel-stack copy of a sockaddr.  Fixes from Hannes
    Frederic Sowa.

 5) R8152 driver transmit flow bug fixes from Hayes Wang.

 6) Fix some minor fallouts from genetlink changes, from Johannes Berg
    and Michael Opdenacker.

 7) AF_PACKET sendmsg path can race with netdevice unregister notifier,
    fix by using RCU to make sure the network device doesn't go away
    from under us.  Fix from Daniel Borkmann.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
  gso: handle new frag_list of frags GRO packets
  genetlink: fix genl_set_err() group ID
  genetlink: fix genlmsg_multicast() bug
  packet: fix use after free race in send path when dev is released
  xen-netback: stop the VIF thread before unbinding IRQs
  wimax: remove dead code
  net/phy: Add the autocross feature for forced links on VSC82x4
  net/phy: Add VSC8662 support
  net/phy: Add VSC8574 support
  net/phy: Add VSC8234 support
  net: add BUG_ON if kernel advertises msg_namelen > sizeof(struct sockaddr_storage)
  net: rework recvmsg handler msg_name and msg_namelen logic
  bridge: flush br's address entry in fdb when remove the
  net: core: Always propagate flag changes to interfaces
  ipv4: fix race in concurrent ip_route_input_slow()
  r8152: fix incorrect type in assignment
  r8152: support stopping/waking tx queue
  r8152: modify the tx flow
  r8152: fix tx/rx memory overflow
  netfilter: ebt_ip6: fix source and destination matching
  ...
2013-11-22 09:57:35 -08:00
David S. Miller cd2cc01b67 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
netfilter fixes for net

The following patchset contains fixes for your net tree, they are:

* Remove extra quote from connlimit configuration in Kconfig, from
  Randy Dunlap.

* Fix missing mss option in syn packets sent to the backend in our
  new synproxy target, from Martin Topholm.

* Use window scale announced by client when sending the forged
  syn to the backend, from Martin Topholm.

* Fix IPv6 address comparison in ebtables, from Luís Fernando
  Cornachioni Estrozi.

* Fix wrong endianess in sequence adjustment which breaks helpers
  in NAT configurations, from Phil Oester.

* Fix the error path handling of nft_compat, from me.

* Make sure the global conntrack counter is decremented after the
  object has been released, also from me.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-21 12:44:15 -05:00
Linus Torvalds e6d69a60b7 Merge branch 'next' of git://git.infradead.org/users/vkoul/slave-dma
Pull slave-dmaengine changes from Vinod Koul:
 "This brings for slave dmaengine:

   - Change dma notification flag to DMA_COMPLETE from DMA_SUCCESS as
     dmaengine can only transfer and not verify validaty of dma
     transfers

   - Bunch of fixes across drivers:

      - cppi41 driver fixes from Daniel

      - 8 channel freescale dma engine support and updated bindings from
        Hongbo

      - msx-dma fixes and cleanup by Markus

   - DMAengine updates from Dan:

      - Bartlomiej and Dan finalized a rework of the dma address unmap
        implementation.

      - In the course of testing 1/ a collection of enhancements to
        dmatest fell out.  Notably basic performance statistics, and
        fixed / enhanced test control through new module parameters
        'run', 'wait', 'noverify', and 'verbose'.  Thanks to Andriy and
        Linus [Walleij] for their review.

      - Testing the raid related corner cases of 1/ triggered bugs in
        the recently added 16-source operation support in the ioatdma
        driver.

      - Some minor fixes / cleanups to mv_xor and ioatdma"

* 'next' of git://git.infradead.org/users/vkoul/slave-dma: (99 commits)
  dma: mv_xor: Fix mis-usage of mmio 'base' and 'high_base' registers
  dma: mv_xor: Remove unneeded NULL address check
  ioat: fix ioat3_irq_reinit
  ioat: kill msix_single_vector support
  raid6test: add new corner case for ioatdma driver
  ioatdma: clean up sed pool kmem_cache
  ioatdma: fix selection of 16 vs 8 source path
  ioatdma: fix sed pool selection
  ioatdma: Fix bug in selftest after removal of DMA_MEMSET.
  dmatest: verbose mode
  dmatest: convert to dmaengine_unmap_data
  dmatest: add a 'wait' parameter
  dmatest: add basic performance metrics
  dmatest: add support for skipping verification and random data setup
  dmatest: use pseudo random numbers
  dmatest: support xor-only, or pq-only channels in tests
  dmatest: restore ability to start test at module load and init
  dmatest: cleanup redundant "dmatest: " prefixes
  dmatest: replace stored results mechanism, with uniform messages
  Revert "dmatest: append verify result to results"
  ...
2013-11-20 13:20:24 -08:00
Alexei Starovoitov dcdfdf56b4 ipv4: fix race in concurrent ip_route_input_slow()
CPUs can ask for local route via ip_route_input_noref() concurrently.
if nh_rth_input is not cached yet, CPUs will proceed to allocate
equivalent DSTs on 'lo' and then will try to cache them in nh_rth_input
via rt_cache_route()
Most of the time they succeed, but on occasion the following two lines:
	orig = *p;
	prev = cmpxchg(p, orig, rt);
in rt_cache_route() do race and one of the cpus fails to complete cmpxchg.
But ip_route_input_slow() doesn't check the return code of rt_cache_route(),
so dst is leaking. dst_destroy() is never called and 'lo' device
refcnt doesn't go to zero, which can be seen in the logs as:
	unregister_netdevice: waiting for lo to become free. Usage count = 1
Adding mdelay() between above two lines makes it easily reproducible.
Fix it similar to nh_pcpu_rth_output case.

Fixes: d2d68ba9fe ("ipv4: Cache input routes in fib_info nexthops.")
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-20 15:28:44 -05:00
Linus Torvalds 1ee2dcc224 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Mostly these are fixes for fallout due to merge window changes, as
  well as cures for problems that have been with us for a much longer
  period of time"

 1) Johannes Berg noticed two major deficiencies in our genetlink
    registration.  Some genetlink protocols we passing in constant
    counts for their ops array rather than something like
    ARRAY_SIZE(ops) or similar.  Also, some genetlink protocols were
    using fixed IDs for their multicast groups.

    We have to retain these fixed IDs to keep existing userland tools
    working, but reserve them so that other multicast groups used by
    other protocols can not possibly conflict.

    In dealing with these two problems, we actually now use less state
    management for genetlink operations and multicast groups.

 2) When configuring interface hardware timestamping, fix several
    drivers that simply do not validate that the hwtstamp_config value
    is one the driver actually supports.  From Ben Hutchings.

 3) Invalid memory references in mwifiex driver, from Amitkumar Karwar.

 4) In dev_forward_skb(), set the skb->protocol in the right order
    relative to skb_scrub_packet().  From Alexei Starovoitov.

 5) Bridge erroneously fails to use the proper wrapper functions to make
    calls to netdev_ops->ndo_vlan_rx_{add,kill}_vid.  Fix from Toshiaki
    Makita.

 6) When detaching a bridge port, make sure to flush all VLAN IDs to
    prevent them from leaking, also from Toshiaki Makita.

 7) Put in a compromise for TCP Small Queues so that deep queued devices
    that delay TX reclaim non-trivially don't have such a performance
    decrease.  One particularly problematic area is 802.11 AMPDU in
    wireless.  From Eric Dumazet.

 8) Fix crashes in tcp_fastopen_cache_get(), we can see NULL socket dsts
    here.  Fix from Eric Dumzaet, reported by Dave Jones.

 9) Fix use after free in ipv6 SIT driver, from Willem de Bruijn.

10) When computing mergeable buffer sizes, virtio-net fails to take the
    virtio-net header into account.  From Michael Dalton.

11) Fix seqlock deadlock in ip4_datagram_connect() wrt.  statistic
    bumping, this one has been with us for a while.  From Eric Dumazet.

12) Fix NULL deref in the new TIPC fragmentation handling, from Erik
    Hugne.

13) 6lowpan bit used for traffic classification was wrong, from Jukka
    Rissanen.

14) macvlan has the same issue as normal vlans did wrt.  propagating LRO
    disabling down to the real device, fix it the same way.  From Michal
    Kubecek.

15) CPSW driver needs to soft reset all slaves during suspend, from
    Daniel Mack.

16) Fix small frame pacing in FQ packet scheduler, from Eric Dumazet.

17) The xen-netfront RX buffer refill timer isn't properly scheduled on
    partial RX allocation success, from Ma JieYue.

18) When ipv6 ping protocol support was added, the AF_INET6 protocol
    initialization cleanup path on failure was borked a little.  Fix
    from Vlad Yasevich.

19) If a socket disconnects during a read/recvmsg/recvfrom/etc that
    blocks we can do the wrong thing with the msg_name we write back to
    userspace.  From Hannes Frederic Sowa.  There is another fix in the
    works from Hannes which will prevent future problems of this nature.

20) Fix route leak in VTI tunnel transmit, from Fan Du.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
  genetlink: make multicast groups const, prevent abuse
  genetlink: pass family to functions using groups
  genetlink: add and use genl_set_err()
  genetlink: remove family pointer from genl_multicast_group
  genetlink: remove genl_unregister_mc_group()
  hsr: don't call genl_unregister_mc_group()
  quota/genetlink: use proper genetlink multicast APIs
  drop_monitor/genetlink: use proper genetlink multicast APIs
  genetlink: only pass array to genl_register_family_with_ops()
  tcp: don't update snd_nxt, when a socket is switched from repair mode
  atm: idt77252: fix dev refcnt leak
  xfrm: Release dst if this dst is improper for vti tunnel
  netlink: fix documentation typo in netlink_set_err()
  be2net: Delete secondary unicast MAC addresses during be_close
  be2net: Fix unconditional enabling of Rx interface options
  net, virtio_net: replace the magic value
  ping: prevent NULL pointer dereference on write to msg_name
  bnx2x: Prevent "timeout waiting for state X"
  bnx2x: prevent CFC attention
  bnx2x: Prevent panic during DMAE timeout
  ...
2013-11-19 15:50:47 -08:00
Johannes Berg c53ed74236 genetlink: only pass array to genl_register_family_with_ops()
As suggested by David Miller, make genl_register_family_with_ops()
a macro and pass only the array, evaluating ARRAY_SIZE() in the
macro, this is a little safer.

The openvswitch has some indirection, assing ops/n_ops directly in
that code. This might ultimately just assign the pointers in the
family initializations, saving the struct genl_family_and_ops and
code (once mcast groups are handled differently.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19 16:39:05 -05:00
Andrey Vagin dbde497966 tcp: don't update snd_nxt, when a socket is switched from repair mode
snd_nxt must be updated synchronously with sk_send_head.  Otherwise
tp->packets_out may be updated incorrectly, what may bring a kernel panic.

Here is a kernel panic from my host.
[  103.043194] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[  103.044025] IP: [<ffffffff815aaaaf>] tcp_rearm_rto+0xcf/0x150
...
[  146.301158] Call Trace:
[  146.301158]  [<ffffffff815ab7f0>] tcp_ack+0xcc0/0x12c0

Before this panic a tcp socket was restored. This socket had sent and
unsent data in the write queue. Sent data was restored in repair mode,
then the socket was switched from reapair mode and unsent data was
restored. After that the socket was switched back into repair mode.

In that moment we had a socket where write queue looks like this:
snd_una    snd_nxt   write_seq
   |_________|________|
             |
	  sk_send_head

After a second switching from repair mode the state of socket was
changed:

snd_una          snd_nxt, write_seq
   |_________ ________|
             |
	  sk_send_head

This state is inconsistent, because snd_nxt and sk_send_head are not
synchronized.

Bellow you can find a call trace, how packets_out can be incremented
twice for one skb, if snd_nxt and sk_send_head are not synchronized.
In this case packets_out will be always positive, even when
sk_write_queue is empty.

tcp_write_wakeup
	skb = tcp_send_head(sk);
	tcp_fragment
		if (!before(tp->snd_nxt, TCP_SKB_CB(buff)->end_seq))
			tcp_adjust_pcount(sk, skb, diff);
	tcp_event_new_data_sent
		tp->packets_out += tcp_skb_pcount(skb);

I think update of snd_nxt isn't required, when a socket is switched from
repair mode.  Because it's initialized in tcp_connect_init. Then when a
write queue is restored, snd_nxt is incremented in tcp_event_new_data_sent,
so it's always is in consistent state.

I have checked, that the bug is not reproduced with this patch and
all tests about restoring tcp connections work fine.

Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19 16:14:20 -05:00
fan.du 236c9f8486 xfrm: Release dst if this dst is improper for vti tunnel
After searching rt by the vti tunnel dst/src parameter,
if this rt has neither attached to any transformation
nor the transformation is not tunnel oriented, this rt
should be released back to ip layer.

otherwise causing dst memory leakage.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19 15:50:57 -05:00
Hannes Frederic Sowa cf970c002d ping: prevent NULL pointer dereference on write to msg_name
A plain read() on a socket does set msg->msg_name to NULL. So check for
NULL pointer first.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-18 16:02:03 -05:00
Hannes Frederic Sowa bceaa90240 inet: prevent leakage of uninitialized memory to user in recv syscalls
Only update *addr_len when we actually fill in sockaddr, otherwise we
can return uninitialized memory from the stack to the caller in the
recvfrom, recvmmsg and recvmsg syscalls. Drop the the (addr_len == NULL)
checks because we only get called with a valid addr_len pointer either
from sock_common_recvmsg or inet_recvmsg.

If a blocking read waits on a socket which is concurrently shut down we
now return zero and set msg_msgnamelen to 0.

Reported-by: mpb <mpb.mail@gmail.com>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-18 15:12:03 -05:00
Martin Topholm a6441b7a39 netfilter: synproxy: send mss option to backend
When the synproxy_parse_options is called on the client ack the mss
option will not be present. Consequently mss wont be included in the
backend syn packet, which falls back to 536 bytes mss.

Therefore XT_SYNPROXY_OPT_MSS is explicitly flagged when recovering mss
value from cookie.

Signed-off-by: Martin Topholm <mph@one.com>
Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-11-18 12:53:36 +01:00
Tetsuo Handa 652586df95 seq_file: remove "%n" usage from seq_file users
All seq_printf() users are using "%n" for calculating padding size,
convert them to use seq_setwidth() / seq_pad() pair.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Joe Perches <joe@perches.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-15 09:32:20 +09:00
Eric Dumazet c9e9042994 ipv4: fix possible seqlock deadlock
ip4_datagram_connect() being called from process context,
it should use IP_INC_STATS() instead of IP_INC_STATS_BH()
otherwise we can deadlock on 32bit arches, or get corruptions of
SNMP counters.

Fixes: 584bdf8cbd ("[IPV4]: Fix "ipOutNoRoutes" counter error for TCP and UDP")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 17:31:14 -05:00
Johannes Berg 4534de8305 genetlink: make all genl_ops users const
Now that genl_ops are no longer modified in place when
registering, they can be made const. This patch was done
mostly with spatch:

@@
identifier ops;
@@
+const
 struct genl_ops ops[] = {
 ...
 };

(except the struct thing in net/openvswitch/datapath.c)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 17:10:41 -05:00
Eric Dumazet dccf76ca6b net-tcp: fix panic in tcp_fastopen_cache_set()
We had some reports of crashes using TCP fastopen, and Dave Jones
gave a nice stack trace pointing to the error.

Issue is that tcp_get_metrics() should not be called with a NULL dst

Fixes: 1fe4c481ba ("net-tcp: Fast Open client - cookie cache")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dave Jones <davej@redhat.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Tested-by: Dave Jones <davej@fedoraproject.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 16:33:18 -05:00
Eric Dumazet 98e09386c0 tcp: tsq: restore minimal amount of queueing
After commit c9eeec26e3 ("tcp: TSQ can use a dynamic limit"), several
users reported throughput regressions, notably on mvneta and wifi
adapters.

802.11 AMPDU requires a fair amount of queueing to be effective.

This patch partially reverts the change done in tcp_write_xmit()
so that the minimal amount is sysctl_tcp_limit_output_bytes.

It also remove the use of this sysctl while building skb stored
in write queue, as TSO autosizing does the right thing anyway.

Users with well behaving NICS and correct qdisc (like sch_fq),
can then lower the default sysctl_tcp_limit_output_bytes value from
128KB to 8KB.

This new usage of sysctl_tcp_limit_output_bytes permits each driver
authors to check how their driver performs when/if the value is set
to a minimum of 4KB.

Normally, line rate for a single TCP flow should be possible,
but some drivers rely on timers to perform TX completion and
too long TX completion delays prevent reaching full throughput.

Fixes: c9eeec26e3 ("tcp: TSQ can use a dynamic limit")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Sujith Manoharan <sujith@msujith.org>
Reported-by: Arnaud Ebalard <arno@natisbad.org>
Tested-by: Sujith Manoharan <sujith@msujith.org>
Cc: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 16:25:14 -05:00
Alexei Starovoitov 81b9eab5eb core/dev: do not ignore dmac in dev_forward_skb()
commit 06a23fe31c
("core/dev: set pkt_type after eth_type_trans() in dev_forward_skb()")
and refactoring 64261f230a
("dev: move skb_scrub_packet() after eth_type_trans()")

are forcing pkt_type to be PACKET_HOST when skb traverses veth.

which means that ip forwarding will kick in inside netns
even if skb->eth->h_dest != dev->dev_addr

Fix order of eth_type_trans() and skb_scrub_packet() in dev_forward_skb()
and in ip_tunnel_rcv()

Fixes: 06a23fe31c ("core/dev: set pkt_type after eth_type_trans() in dev_forward_skb()")
CC: Isaku Yamahata <yamahatanetdev@gmail.com>
CC: Maciej Zenczykowski <zenczykowski@gmail.com>
CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 02:39:53 -05:00