1372 Commits

Author SHA1 Message Date
Linus Torvalds
d2aa455037 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (55 commits)
  netxen: fix tx ring accounting
  netxen: fix detection of cut-thru firmware mode
  forcedeth: fix dma api mismatches
  atm: sk_wmem_alloc initial value is one
  net: correct off-by-one write allocations reports
  via-velocity : fix no link detection on boot
  Net / e100: Fix suspend of devices that cannot be power managed
  TI DaVinci EMAC : Fix rmmod error
  net: group address list and its count
  ipv4: Fix fib_trie rebalancing, part 2
  pkt_sched: Update drops stats in act_police
  sky2: version 1.23
  sky2: add GRO support
  sky2: skb recycling
  sky2: reduce default transmit ring
  sky2: receive counter update
  sky2: fix shutdown synchronization
  sky2: PCI irq issues
  sky2: more receive shutdown
  sky2: turn off pause during shutdown
  ...

Manually fix trivial conflict in net/core/skbuff.c due to kmemcheck
2009-06-18 14:07:15 -07:00
Jiri Pirko
31278e7147 net: group address list and its count
This patch is inspired by patch recently posted by Johannes Berg. Basically what
my patch does is to group list and a count of addresses into newly introduced
structure netdev_hw_addr_list. This brings us two benefits:
1) struct net_device becomes a bit nicer.
2) in the future there will be a possibility to operate with lists independently
   on netdevices (with exporting right functions).
I wanted to introduce this patch before I'll post a multicast lists conversion.
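
A minimal userspace sketch of the grouping idea described above, assuming a simple singly linked list. The names hw_addr, hw_addr_list, hw_addr_list_init() and hw_addr_list_add() are hypothetical and are not the kernel's definitions; the point is only that the list head and its count live in one structure and are updated in one place.

```c
#include <stddef.h>

/* Hypothetical model of one hardware address entry (not the kernel type). */
struct hw_addr {
    struct hw_addr *next;
    unsigned char addr[32];
};

/* The list head and its element count travel together in one structure. */
struct hw_addr_list {
    struct hw_addr *head;
    int count;
};

static void hw_addr_list_init(struct hw_addr_list *l)
{
    l->head = NULL;
    l->count = 0;
}

/* Push an entry at the front and keep the count in sync in a single place. */
static void hw_addr_list_add(struct hw_addr_list *l, struct hw_addr *ha)
{
    ha->next = l->head;
    l->head = ha;
    l->count++;
}
```

Because callers only ever touch the list through such helpers, the same structure could in principle be reused outside a net device, which is the second benefit the message mentions.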

Signed-off-by: Jiri Pirko <jpirko@redhat.com>

 drivers/net/bnx2.c              |    4 +-
 drivers/net/e1000/e1000_main.c  |    4 +-
 drivers/net/ixgbe/ixgbe_main.c  |    6 +-
 drivers/net/mv643xx_eth.c       |    2 +-
 drivers/net/niu.c               |    4 +-
 drivers/net/virtio_net.c        |   10 ++--
 drivers/s390/net/qeth_l2_main.c |    2 +-
 include/linux/netdevice.h       |   17 +++--
 net/core/dev.c                  |  130 ++++++++++++++++++--------------------
 9 files changed, 89 insertions(+), 90 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-18 00:29:08 -07:00
Stephen Hemminger
603a8bbe62 skbuff: don't corrupt mac_header on skb expansion
The skb mac_header field is sometimes NULL (or ~0u) as a sentinel
value. The places where skb is expanded add an offset which would
change this flag into an invalid pointer (or offset).
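
The problem can be sketched in userspace as follows, assuming the offset-based representation with ~0U as the "unset" sentinel. struct pkt, MAC_HEADER_UNSET and pkt_adjust_headers() are illustrative names, not the kernel's code.

```c
#include <stdint.h>

#define MAC_HEADER_UNSET (~0U)  /* sentinel: no MAC header recorded */

/* Hypothetical stand-in for the header offset fields of an skb. */
struct pkt {
    uint32_t network_header;
    uint32_t mac_header;
};

/* Shift header offsets after `off` bytes of headroom were added. */
static void pkt_adjust_headers(struct pkt *p, uint32_t off)
{
    p->network_header += off;
    /* Adding to the sentinel would turn it into a bogus offset, so skip it. */
    if (p->mac_header != MAC_HEADER_UNSET)
        p->mac_header += off;
}
```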

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-17 18:46:41 -07:00
Stephen Hemminger
19633e129c skbuff: skb_mac_header_was_set is always true on >32 bit
Looking at the crash in log_martians(), one suspect is that the check for
mac header being set is not correct.  The value of mac_header defaults to
0 on allocation, therefore skb_mac_header_was_set will always be true on
platforms using NET_SKBUFF_USES_OFFSET.
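
A small model of why the check misfires, assuming the check compares the offset against ~0U. struct buf and the helper names are hypothetical; setting the sentinel at allocation time is shown as one plausible fix, not necessarily the exact change made here.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical buffer that stores the MAC header as an offset. */
struct buf {
    uint32_t mac_header;
};

/* The check under suspicion: anything other than ~0U counts as "set". */
static int mac_header_was_set(const struct buf *b)
{
    return b->mac_header != ~0U;
}

/* Zero-fill on allocation leaves mac_header == 0, which looks "set". */
static void buf_init_zeroed(struct buf *b)
{
    memset(b, 0, sizeof(*b));
}

/* Assumed fix: store the sentinel explicitly at allocation time. */
static void buf_init_with_sentinel(struct buf *b)
{
    memset(b, 0, sizeof(*b));
    b->mac_header = ~0U;
}
```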

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-17 18:46:03 -07:00
Linus Torvalds
b3fec0fe35 Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck
* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck: (39 commits)
  signal: fix __send_signal() false positive kmemcheck warning
  fs: fix do_mount_root() false positive kmemcheck warning
  fs: introduce __getname_gfp()
  trace: annotate bitfields in struct ring_buffer_event
  net: annotate struct sock bitfield
  c2port: annotate bitfield for kmemcheck
  net: annotate inet_timewait_sock bitfields
  ieee1394/csr1212: fix false positive kmemcheck report
  ieee1394: annotate bitfield
  net: annotate bitfields in struct inet_sock
  net: use kmemcheck bitfields API for skbuff
  kmemcheck: introduce bitfield API
  kmemcheck: add opcode self-testing at boot
  x86: unify pte_hidden
  x86: make _PAGE_HIDDEN conditional
  kmemcheck: make kconfig accessible for other architectures
  kmemcheck: enable in the x86 Kconfig
  kmemcheck: add hooks for the page allocator
  kmemcheck: add hooks for page- and sg-dma-mappings
  kmemcheck: don't track page tables
  ...
2009-06-16 13:09:51 -07:00
Vegard Nossum
a98b65a3ad net: annotate struct sock bitfield
2009/2/24 Ingo Molnar <mingo@elte.hu>:
> ok, this is the last warning i have from today's overnight -tip
> testruns - a 32-bit system warning in sock_init_data():
>
> [    2.610389] NET: Registered protocol family 16
> [    2.616138] initcall netlink_proto_init+0x0/0x170 returned 0 after 7812 usecs
> [    2.620010] WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (f642c184)
> [    2.624002] 010000000200000000000000604990c000000000000000000000000000000000
> [    2.634076]  i i i i i i u u i i i i i i i i i i i i i i i i i i i i i i i i
> [    2.641038]          ^
> [    2.643376]
> [    2.644004] Pid: 1, comm: swapper Not tainted (2.6.29-rc6-tip-01751-g4d1c22c-dirty #885)
> [    2.648003] EIP: 0060:[<c07141a1>] EFLAGS: 00010282 CPU: 0
> [    2.652008] EIP is at sock_init_data+0xa1/0x190
> [    2.656003] EAX: 0001a800 EBX: f6836c00 ECX: 00463000 EDX: c0e46fe0
> [    2.660003] ESI: f642c180 EDI: c0b83088 EBP: f6863ed8 ESP: c0c412ec
> [    2.664003]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> [    2.668003] CR0: 8005003b CR2: f682c400 CR3: 00b91000 CR4: 000006f0
> [    2.672003] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [    2.676003] DR6: ffff4ff0 DR7: 00000400
> [    2.680002]  [<c07423e5>] __netlink_create+0x35/0xa0
> [    2.684002]  [<c07443cc>] netlink_kernel_create+0x4c/0x140
> [    2.688002]  [<c072755e>] rtnetlink_net_init+0x1e/0x40
> [    2.696002]  [<c071b601>] register_pernet_operations+0x11/0x30
> [    2.700002]  [<c071b72c>] register_pernet_subsys+0x1c/0x30
> [    2.704002]  [<c0bf3c8c>] rtnetlink_init+0x4c/0x100
> [    2.708002]  [<c0bf4669>] netlink_proto_init+0x159/0x170
> [    2.712002]  [<c0101124>] do_one_initcall+0x24/0x150
> [    2.716002]  [<c0bbf3c7>] do_initcalls+0x27/0x40
> [    2.723201]  [<c0bbf3fc>] do_basic_setup+0x1c/0x20
> [    2.728002]  [<c0bbfb8a>] kernel_init+0x5a/0xa0
> [    2.732002]  [<c0103e47>] kernel_thread_helper+0x7/0x10
> [    2.736002]  [<ffffffff>] 0xffffffff

We fix this false positive by annotating the bitfield in struct
sock.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
2009-06-15 15:49:36 +02:00
Vegard Nossum
fe55f6d5c0 net: use kmemcheck bitfields API for skbuff
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
2009-06-15 15:49:25 +02:00
David S. Miller
9cbc1cb8cd Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:
	Documentation/feature-removal-schedule.txt
	drivers/scsi/fcoe/fcoe.c
	net/core/drop_monitor.c
	net/core/net-traces.c
2009-06-15 03:02:23 -07:00
Michał Mirosław
da6782927d bridge: Simplify interface for ATM LANE
This patch changes FDB entry check for ATM LANE bridge integration.
There's no point in holding a FDB entry around SKB building.

The br_fdb_get()/br_fdb_put() pair is replaced by a single br_fdb_test_addr()
hook that checks whether the address has an FDB entry pointing to a port
other than the one the request arrived on.

FDB entry refcounting is removed as it's not used anywhere else.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-11 21:03:21 -07:00
John Dykstra
746e6ad23c [PATCH] net core: Some interface flags not returned by SIOCGIFFLAGS
Commit b00055aacd "[NET] core: add
RFC2863 operstate" defined new interface flag values.  Its
documentation specified that these flags could be accessed from user
space via SIOCGIFFLAGS.  However, this does not work because the new
flags do not fit in that ioctl's argument width.
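
The width problem can be illustrated with a short sketch, assuming the ioctl returns the flags through a 16-bit field. FLAG_UP, FLAG_HIGH and flags_for_ioctl() are made-up names for illustration; FLAG_HIGH stands in for any flag defined above bit 15.

```c
#include <stdint.h>

#define FLAG_UP   (1U << 0)   /* illustrative low flag, fits in 16 bits */
#define FLAG_HIGH (1U << 16)  /* illustrative flag above bit 15 */

/* Model of handing device flags back through a 16-bit field. */
static uint16_t flags_for_ioctl(unsigned int dev_flags)
{
    /* Explicit cast documents that the upper bits are dropped. */
    return (uint16_t)dev_flags;
}
```

With both flags set, only FLAG_UP survives the conversion, which is why user space never sees the newer flags through this path.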

Change the documentation to match the code's behavior.  Also change
the source to explicitly show the truncation.  This _should_ have no
effect on executable code, and did not with gcc 4.2.4 generating x86
code.

A new ioctl could be defined to return all interface flags to user
space.  However, since this has been broken for three years with no
one complaining, there doesn't seem much need.  They are still
accessible via netlink.

Reported-by:  "Fredrik Arnerup" <fredrik.arnerup@edgeware.tv>
Signed-off-by: John Dykstra <john.dykstra1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-11 20:57:21 -07:00
Timo Teras
5ef12d98a1 neigh: fix state transition INCOMPLETE->FAILED via Netlink request
The current code errors out the INCOMPLETE neigh entry skb queue only from
the timer if maximum probes have been attempted and there has been no reply.
This also causes the transition to FAILED state.

However, the neigh entry can also be updated via Netlink to signal that the
address is unavailable.  Currently, neigh_update() just stops the timers and
leaves the pending skbs unreleased. As a result, the cleanup code in the
timer callback is never called, which also prevents proper garbage collection.

This fixes neigh_update() to process the pending skb queue immediately if an
INCOMPLETE -> FAILED state transition occurs due to a Netlink request.

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-11 04:16:28 -07:00
Eric Dumazet
2b85a34e91 net: No more expensive sock_hold()/sock_put() on each tx
One of the problems with sock memory accounting is that it uses
a pair of sock_hold()/sock_put() for each transmitted packet.

This slows down bidirectional flows because the receive path
also needs to take a refcount on socket and might use a different
cpu than transmit path or transmit completion path. So these
two atomic operations also trigger cache line bounces.

We can see this in tx or tx/rx workloads (media gateways for example),
where sock_wfree() can be among the top five functions in profiles.

We use this sock_hold()/sock_put() so that sock freeing
is delayed until all tx packets are completed.

As we also update sk_wmem_alloc, we could offset sk_wmem_alloc
by one unit at init time, until sk_free() is called.
Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc)
to drop the initial offset and atomically check whether any packets
are in flight.

skb_set_owner_w() doesn't call sock_hold() anymore.

sock_wfree() doesn't call sock_put() anymore, but checks whether sk_wmem_alloc
reached 0 to perform the final freeing.
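
The biased-counter scheme described above can be sketched with C11 atomics. This is a userspace model under stated assumptions, not the kernel implementation: struct msock and the msock_* helpers are hypothetical names, and the final free is reduced to setting a flag.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical socket model: wmem_alloc is biased by 1 while the owner holds it. */
struct msock {
    atomic_int wmem_alloc;
    bool freed;
};

static void msock_init(struct msock *s)
{
    atomic_init(&s->wmem_alloc, 1);  /* initial offset of one unit */
    s->freed = false;
}

static void msock_final_free(struct msock *s)
{
    s->freed = true;  /* stand-in for the real teardown */
}

/* Charging a tx packet needs no separate reference count. */
static void msock_charge(struct msock *s, int truesize)
{
    atomic_fetch_add(&s->wmem_alloc, truesize);
}

/* Tx completion: uncharge, and free if this brought the counter to 0. */
static void msock_wfree(struct msock *s, int truesize)
{
    if (atomic_fetch_sub(&s->wmem_alloc, truesize) == truesize)
        msock_final_free(s);
}

/* Owner release: drop the initial offset, and free if nothing is in flight. */
static void msock_free(struct msock *s)
{
    if (atomic_fetch_sub(&s->wmem_alloc, 1) == 1)
        msock_final_free(s);
}
```

Whichever of the owner release and the last tx completion happens second brings the counter to zero and performs the final free, so a wrong truesize in either direction breaks the accounting.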

The drawback is that an skb->truesize error could lead to unfreeable sockets or,
even worse, to prematurely calling __sk_free() on a live socket.

Nice speedups on SMP. tbench, for example, goes from 2691 MB/s to 2711 MB/s
on my 8-CPU dev machine, even though tbench was not really hitting the sk_refcnt
contention point. 5% speedup on a UDP transmit workload (depending
on the number of flows), with lower TX completion CPU usage.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-11 02:55:43 -07:00
Linus Torvalds
8623661180 Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (244 commits)
  Revert "x86, bts: reenable ptrace branch trace support"
  tracing: do not translate event helper macros in print format
  ftrace/documentation: fix typo in function grapher name
  tracing/events: convert block trace points to TRACE_EVENT(), fix !CONFIG_BLOCK
  tracing: add protection around module events unload
  tracing: add trace_seq_vprint interface
  tracing: fix the block trace points print size
  tracing/events: convert block trace points to TRACE_EVENT()
  ring-buffer: fix ret in rb_add_time_stamp
  ring-buffer: pass in lockdep class key for reader_lock
  tracing: add annotation to what type of stack trace is recorded
  tracing: fix multiple use of __print_flags and __print_symbolic
  tracing/events: fix output format of user stack
  tracing/events: fix output format of kernel stack
  tracing/trace_stack: fix the number of entries in the header
  ring-buffer: discard timestamps that are at the start of the buffer
  ring-buffer: try to discard unneeded timestamps
  ring-buffer: fix bug in ring_buffer_discard_commit
  ftrace: do not profile functions when disabled
  tracing: make trace pipe recognize latency format flag
  ...
2009-06-10 19:53:40 -07:00
Johannes Berg
8f77f3849c mac80211: do not pass PS frames out of mac80211 again
In order to handle powersave frames properly we had needed
to pass these out to the device queues again, and introduce
the skb->requeue bit. This, however, also has unnecessary
overhead by needing to 'clean up' already tried frames, and
this clean-up code is also buggy when software encryption
is used.

Instead of sending the frames via the master netdev queue
again, simply put them into the pending queue. This also
fixes a problem where frames for that particular station
could be reordered when some were still on the software
queues and older ones are re-injected into the software
queue after them.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-06-10 13:28:37 -04:00
Sergey Lapin
fcb94e4224 Add constants for the ieee 802.15.4 stack
IEEE 802.15.4 stack requires several constants to be defined/adjusted.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: Sergey Lapin <slapin@ossfans.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-09 05:25:30 -07:00
Eric Dumazet
0c27922e49 net: dev_addr_init() fix
commit f001fde5ea
(net: introduce a list of device addresses dev_addr_list (v6))
added a regression that Vegard Nossum found in his testing.

With kmemcheck's help, Vegard found that some uninitialized memory
was read and reported to user space, potentially leaking kernel data.
(The thread can be found at http://lkml.org/lkml/2009/5/30/177 )

dev_addr_init() incorrectly uses sizeof() operator. We were
initializing one byte instead of MAX_ADDR_LEN bytes.
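
This class of mistake is easy to reproduce in userspace. The sketch below assumes the buffer is an array of unsigned char and that MAX_ADDR_LEN is 32; the helper names are hypothetical, and the buggy line is a reconstruction of the pattern the message describes, not a quote of the original code.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ADDR_LEN 32  /* assumed value for this sketch */

static size_t count_zero_bytes(const unsigned char *p, size_t n)
{
    size_t zeros = 0;
    for (size_t i = 0; i < n; i++)
        if (p[i] == 0)
            zeros++;
    return zeros;
}

/* sizeof(*addr) is the size of one element, so only 1 byte is cleared. */
static size_t init_buggy(void)
{
    unsigned char addr[MAX_ADDR_LEN];
    memset(addr, 0xAA, sizeof(addr));   /* simulate stale memory */
    memset(addr, 0, sizeof(*addr));     /* bug: clears a single byte */
    return count_zero_bytes(addr, sizeof(addr));
}

/* sizeof(addr) is the size of the whole array, so all bytes are cleared. */
static size_t init_fixed(void)
{
    unsigned char addr[MAX_ADDR_LEN];
    memset(addr, 0xAA, sizeof(addr));
    memset(addr, 0, sizeof(addr));
    return count_zero_bytes(addr, sizeof(addr));
}
```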

Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-09 05:11:42 -07:00
David S. Miller
285e42802b net/core/user_dma.c: Use frag list abstraction interfaces.
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-09 00:19:10 -07:00
David S. Miller
fbb398a832 net/core/skbuff.c: Use frag list abstraction interfaces.
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-09 00:18:59 -07:00
David S. Miller
4cf704fbea net/core/dev.c: Use frag list abstraction interfaces.
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-09 00:18:51 -07:00
David S. Miller
5b1a002ade datagram: Use frag list abstraction interfaces.
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-09 00:18:15 -07:00
Figo.zhang
1d0ebfe7c9 net pkgen.c:fix no need for check
vfree() does its own 'NULL' check, so no need for check before
calling it.
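
The same cleanup applies to any deallocator that accepts NULL. As a userspace analogy, standard free() is defined to do nothing for a null pointer, so a guard around it is redundant. struct flow_table and flow_table_release() are hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical owner of a heap buffer that may never have been allocated. */
struct flow_table {
    int *flows;
};

static void flow_table_release(struct flow_table *t)
{
    /* No "if (t->flows)" needed: free(NULL) is a no-op. */
    free(t->flows);
    t->flows = NULL;
}
```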

Signed-off-by: Figo.zhang <figo1802@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-08 00:40:35 -07:00
Sridhar Samudrala
2faef52b72 net: Fix memcpy_toiovecend() to use the right offset
Increment the iovec base by the offset passed in for the initial
copy_to_user() in memcpy_toiovecend().
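
A userspace sketch of an offset-aware scatter copy, assuming plain memory instead of user pointers. struct vec and copy_to_vec_at() are hypothetical; the line to watch is the memcpy() destination, which has to be the segment base plus the remaining offset for the first segment written.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an iovec entry. */
struct vec {
    unsigned char *base;
    size_t len;
};

/*
 * Copy len bytes from src into the vector array, starting `offset` bytes
 * into the logical concatenation of all segments. Returns bytes copied.
 */
static size_t copy_to_vec_at(struct vec *v, size_t nvec,
                             const unsigned char *src,
                             size_t offset, size_t len)
{
    size_t copied = 0;

    for (size_t i = 0; i < nvec && copied < len; i++) {
        if (offset >= v[i].len) {
            offset -= v[i].len;  /* skip segments wholly before the offset */
            continue;
        }
        size_t n = v[i].len - offset;
        if (n > len - copied)
            n = len - copied;
        /* Destination is base + offset, not base alone. */
        memcpy(v[i].base + offset, src + copied, n);
        copied += n;
        offset = 0;  /* later segments are written from their start */
    }
    return copied;
}
```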

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-08 00:25:39 -07:00
Sridhar Samudrala
d2d27bfd11 net: Fix skb_copy_datagram_from_iovec() to pass the right offset
I am working on enabling UFO between KVM guests using virtio-net, and I have
some patches that I got working with 2.6.30-rc8. When I wanted to try them
with net-next-2.6, I noticed that virtio-net is not working with that tree.

After some debugging, the cause turned out to be several bugs in the recent
patches that fix aio with the tun driver, specifically the following two commits:

http://git.kernel.org/?p=linux/kernel/git/davem/net-next-2.6.git;a=commitdiff;h=0a1ec07a67bd8b0033dace237249654d015efa21
http://git.kernel.org/?p=linux/kernel/git/davem/net-next-2.6.git;a=commitdiff;h=6f26c9a7555e5bcca3560919db9b852015077dae

Fix the call to memcpy_fromiovecend() in skb_copy_datagram_from_iovec()
to pass the right iovec offset.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-08 00:25:38 -07:00
Eric Dumazet
042a53a9e4 net: skb_shared_info optimization
skb_dma_unmap() is quite expensive for small packets,
because we use two different cache lines from skb_shared_info.

One to access nr_frags, one to access dma_maps[0].

Instead of dma_maps being an array of MAX_SKB_FRAGS + 1 elements,
keep the head mapping in its own new dma_head field, close to nr_frags,
to reduce cache line misses.
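
The layout argument can be checked with offsetof(). The two structures below are simplified, hypothetical models rather than the real skb_shared_info; they assume 64-byte cache lines, a structure that starts on a cache line boundary, and 18 fragments.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_FRAGS  18  /* assumed fragment count for this sketch */
#define CACHE_LINE 64  /* assumed cache line size in bytes */

struct frag {
    void *page;
    uint32_t off;
    uint32_t size;
};

/* Before: the head mapping is dma_maps[0], far away from nr_frags. */
struct shinfo_old {
    unsigned short nr_frags;
    struct frag frags[MAX_FRAGS];
    uint64_t dma_maps[MAX_FRAGS + 1];
};

/* After: the head mapping gets its own field right next to nr_frags. */
struct shinfo_new {
    unsigned short nr_frags;
    uint64_t dma_head;
    struct frag frags[MAX_FRAGS];
    uint64_t dma_maps[MAX_FRAGS];
};

/* Index of the cache line a field starts in, relative to the struct start. */
#define LINE_OF(type, field) (offsetof(type, field) / CACHE_LINE)
```

In the second layout the unmap path for a packet without fragments can read nr_frags and dma_head from the same line.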

Tested on my dev machine (bnx2 & tg3 adapters), nice speedup!

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-08 00:21:48 -07:00
Eric Dumazet
eae3f29cc7 net: num_dma_maps is not used
Get rid of num_dma_maps in struct skb_shared_info, as it seems unused.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-08 00:20:23 -07:00