Commit Graph

983 Commits

Author SHA1 Message Date
Ingo Molnar
1a781a777b Merge branch 'generic-ipi' into generic-ipi-for-linus
Conflicts:

	arch/powerpc/Kconfig
	arch/s390/kernel/time.c
	arch/x86/kernel/apic_32.c
	arch/x86/kernel/cpu/perfctr-watchdog.c
	arch/x86/kernel/i8259_64.c
	arch/x86/kernel/ldt.c
	arch/x86/kernel/nmi_64.c
	arch/x86/kernel/smpboot.c
	arch/x86/xen/smp.c
	include/asm-x86/hw_irq_32.h
	include/asm-x86/hw_irq_64.h
	include/asm-x86/mach-default/irq_vectors.h
	include/asm-x86/mach-voyager/irq_vectors.h
	include/asm-x86/smp.h
	kernel/Makefile

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-15 21:55:59 +02:00
Linus Torvalds
666484f025 Merge branch 'core/softirq' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core/softirq' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  softirq: remove irqs_disabled warning from local_bh_enable
  softirq: remove initialization of static per-cpu variable
  Remove argument from open_softirq which is always NULL
2008-07-14 15:28:42 -07:00
Patrick McHardy
2fe195cfe3 net: fib_rules: fix error code for unsupported families
The errno code returned must be negative.

Fixes "RTNETLINK answers: Unknown error 18446744073709551519".

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-01 19:59:37 -07:00
Wang Chen
93b3cff991 netdevice: Fix wrong string handle in kernel command line parsing
v1->v2: Use strlcpy() to ensure s[i].name be null-termination.

1. In netdev_boot_setup_add(), a long name will leak.
   ex. : dev=21,0x1234,0x1234,0x2345,eth123456789verylongname.........
2. In netdev_boot_setup_check(), mismatch will happen if s[i].name
   is a substring of dev->name.
   ex. : dev=...eth1 dev=...eth11

[ With feedback from Ben Hutchings. ]

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-01 19:57:19 -07:00
Wang Chen
8fde8a0769 net: Tyop of sk_filter() comment
Parameter "needlock" no long exists.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-01 19:55:40 -07:00
Wang Chen
5dbaec5dc6 netdevice: Fix typo of dev_unicast_add() comment
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-27 19:35:16 -07:00
Octavian Purdila
db43a282d3 tcp: fix for splice receive when used with software LRO
If an skb has nr_frags set to zero but its frag_list is not empty (as
it can happen if software LRO is enabled), and a previous
tcp_read_sock has consumed the linear part of the skb, then
__skb_splice_bits:

(a) incorrectly reports an error and

(b) forgets to update the offset to account for the linear part

Any of the two problems will cause the subsequent __skb_splice_bits
call (the one that handles the frag_list skbs) to either skip data,
or, if the unadjusted offset is greater then the size of the next skb
in the frag_list, make tcp_splice_read loop forever.

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-27 17:27:21 -07:00
Jens Axboe
8691e5a8f6 smp_call_function: get rid of the unused nonatomic/retry argument
It's never used and the comments refer to nonatomic and retry
interchangably. So get rid of it.

Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-06-26 11:24:35 +02:00
Ingo Molnar
a60b33cf59 Merge branch 'linus' into core/softirq 2008-06-23 10:52:59 +02:00
Eric W. Biederman
b9f75f45a6 netns: Don't receive new packets in a dead network namespace.
Alexey Dobriyan <adobriyan@gmail.com> writes:
> Subject: ICMP sockets destruction vs ICMP packets oops

> After icmp_sk_exit() nuked ICMP sockets, we get an interrupt.
> icmp_reply() wants ICMP socket.
>
> Steps to reproduce:
>
> 	launch shell in new netns
> 	move real NIC to netns
> 	setup routing
> 	ping -i 0
> 	exit from shell
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> IP: [<ffffffff803fce17>] icmp_sk+0x17/0x30
> PGD 17f3cd067 PUD 17f3ce067 PMD 0 
> Oops: 0000 [1] PREEMPT SMP DEBUG_PAGEALLOC
> CPU 0 
> Modules linked in: usblp usbcore
> Pid: 0, comm: swapper Not tainted 2.6.26-rc6-netns-ct #4
> RIP: 0010:[<ffffffff803fce17>]  [<ffffffff803fce17>] icmp_sk+0x17/0x30
> RSP: 0018:ffffffff8057fc30  EFLAGS: 00010286
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff81017c7db900
> RDX: 0000000000000034 RSI: ffff81017c7db900 RDI: ffff81017dc41800
> RBP: ffffffff8057fc40 R08: 0000000000000001 R09: 000000000000a815
> R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff8057fd28
> R13: ffffffff8057fd00 R14: ffff81017c7db938 R15: ffff81017dc41800
> FS:  0000000000000000(0000) GS:ffffffff80525000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 0000000000000000 CR3: 000000017fcda000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process swapper (pid: 0, threadinfo ffffffff8053a000, task ffffffff804fa4a0)
> Stack:  0000000000000000 ffff81017c7db900 ffffffff8057fcf0 ffffffff803fcfe4
>  ffffffff804faa38 0000000000000246 0000000000005a40 0000000000000246
>  000000000001ffff ffff81017dd68dc0 0000000000005a40 0000000055342436
> Call Trace:
>  <IRQ>  [<ffffffff803fcfe4>] icmp_reply+0x44/0x1e0
>  [<ffffffff803d3a0a>] ? ip_route_input+0x23a/0x1360
>  [<ffffffff803fd645>] icmp_echo+0x65/0x70
>  [<ffffffff803fd300>] icmp_rcv+0x180/0x1b0
>  [<ffffffff803d6d84>] ip_local_deliver+0xf4/0x1f0
>  [<ffffffff803d71bb>] ip_rcv+0x33b/0x650
>  [<ffffffff803bb16a>] netif_receive_skb+0x27a/0x340
>  [<ffffffff803be57d>] process_backlog+0x9d/0x100
>  [<ffffffff803bdd4d>] net_rx_action+0x18d/0x250
>  [<ffffffff80237be5>] __do_softirq+0x75/0x100
>  [<ffffffff8020c97c>] call_softirq+0x1c/0x30
>  [<ffffffff8020f085>] do_softirq+0x65/0xa0
>  [<ffffffff80237af7>] irq_exit+0x97/0xa0
>  [<ffffffff8020f198>] do_IRQ+0xa8/0x130
>  [<ffffffff80212ee0>] ? mwait_idle+0x0/0x60
>  [<ffffffff8020bc46>] ret_from_intr+0x0/0xf
>  <EOI>  [<ffffffff80212f2c>] ? mwait_idle+0x4c/0x60
>  [<ffffffff80212f23>] ? mwait_idle+0x43/0x60
>  [<ffffffff8020a217>] ? cpu_idle+0x57/0xa0
>  [<ffffffff8040f380>] ? rest_init+0x70/0x80
> Code: 10 5b 41 5c 41 5d 41 5e c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53
> 48 83 ec 08 48 8b 9f 78 01 00 00 e8 2b c7 f1 ff 89 c0 <48> 8b 04 c3 48 83 c4 08
> 5b c9 c3 66 66 66 66 66 2e 0f 1f 84 00
> RIP  [<ffffffff803fce17>] icmp_sk+0x17/0x30
>  RSP <ffffffff8057fc30>
> CR2: 0000000000000000
> ---[ end trace ea161157b76b33e8 ]---
> Kernel panic - not syncing: Aiee, killing interrupt handler!

Receiving packets while we are cleaning up a network namespace is a
racy proposition. It is possible when the packet arrives that we have
removed some but not all of the state we need to fully process it.  We
have the choice of either playing wack-a-mole with the cleanup routines
or simply dropping packets when we don't have a network namespace to
handle them.

Since the check looks inexpensive in netif_receive_skb let's just
drop the incoming packets.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-20 22:16:51 -07:00
Ben Hutchings
6de329e26c net: Fix test for VLAN TX checksum offload capability
Selected device feature bits can be propagated to VLAN devices, so we
can make use of TX checksum offload and TSO on VLAN-tagged packets.
However, if the physical device does not do VLAN tag insertion or
generic checksum offload then the test for TX checksum offload in
dev_queue_xmit() will see a protocol of htons(ETH_P_8021Q) and yield
false.

This splits the checksum offload test into two functions:

- can_checksum_protocol() tests a given protocol against a feature bitmask

- dev_can_checksum() first tests the skb protocol against the device
  features; if that fails and the protocol is htons(ETH_P_8021Q) then
  it tests the encapsulated protocol against the effective device
  features for VLANs

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-16 17:02:28 -07:00
Ingo Molnar
9583f3d9c0 Merge branch 'linus' into core/softirq 2008-06-16 11:24:17 +02:00
Octavian Purdila
293ad60401 tcp: Fix for race due to temporary drop of the socket lock in skb_splice_bits.
skb_splice_bits temporary drops the socket lock while iterating over
the socket queue in order to break a reverse locking condition which
happens with sendfile. This, however, opens a window of opportunity
for tcp_collapse() to aggregate skbs and thus potentially free the
current skb used in skb_splice_bits and tcp_read_sock.

This patch fixes the problem by (re-)getting the same "logical skb"
after the lock has been temporary dropped.

Based on idea and initial patch from Evgeniy Polyakov.

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Acked-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 15:45:58 -07:00
Thomas Graf
bc3ed28caa netlink: Improve returned error codes
Make nlmsg_trim(), nlmsg_cancel(), genlmsg_cancel(), and
nla_nest_cancel() void functions.

Return -EMSGSIZE instead of -1 if the provided message buffer is not
big enough.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-03 16:36:54 -07:00
Brice Goglin
7557af2515 net_dma: remove duplicate assignment in dma_skb_copy_datagram_iovec
No need to compute copy twice in the frags loop in
dma_skb_copy_datagram_iovec().

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-03 16:07:45 -07:00
Stephen Hemminger
b9f5f52cca net: neighbour table ABI problem
The neighbor table time of last use information is returned in the
incorrect unit. Kernel to user space ABI's need to use USER_HZ (or
milliseconds), otherwise the application has to try and discover the
real system HZ value which is problematic.  Linux has standardized on
keeping USER_HZ consistent (100hz) even when kernel is running
internally at some other value.

This change is small, but it breaks the ABI for older version of
iproute2 utilities.  But these utilities are already broken since they
are looking at the psched_hz values which are completely different. So
let's just go ahead and fix both kernel and user space. Older
utilities will just print wrong values.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-03 16:03:15 -07:00
Carlos R. Mafra
962cf36c5b Remove argument from open_softirq which is always NULL
As git-grep shows, open_softirq() is always called with the last argument
being NULL

block/blk-core.c:       open_softirq(BLOCK_SOFTIRQ, blk_done_softirq, NULL);
kernel/hrtimer.c:       open_softirq(HRTIMER_SOFTIRQ, run_hrtimer_softirq, NULL);
kernel/rcuclassic.c:    open_softirq(RCU_SOFTIRQ, rcu_process_callbacks, NULL);
kernel/rcupreempt.c:    open_softirq(RCU_SOFTIRQ, rcu_process_callbacks, NULL);
kernel/sched.c: open_softirq(SCHED_SOFTIRQ, run_rebalance_domains, NULL);
kernel/softirq.c:       open_softirq(TASKLET_SOFTIRQ, tasklet_action, NULL);
kernel/softirq.c:       open_softirq(HI_SOFTIRQ, tasklet_hi_action, NULL);
kernel/timer.c: open_softirq(TIMER_SOFTIRQ, run_timer_softirq, NULL);
net/core/dev.c: open_softirq(NET_TX_SOFTIRQ, net_tx_action, NULL);
net/core/dev.c: open_softirq(NET_RX_SOFTIRQ, net_rx_action, NULL);

This observation has already been made by Matthew Wilcox in June 2002
(http://www.cs.helsinki.fi/linux/linux-kernel/2002-25/0687.html)

"I notice that none of the current softirq routines use the data element
passed to them."

and the situation hasn't changed since them. So it appears we can safely
remove that extra argument to save 128 (54) bytes of kernel data (text).

Signed-off-by: Carlos R. Mafra <crmafra@ift.unesp.br>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 07:43:15 +02:00
Denis V. Lunev
d3ede327e8 pktgen: make sure that pktgen_thread_worker has been executed
The following courruption can happen during pktgen stop:
list_del corruption. prev->next should be ffff81007e8a5e70, but was 6b6b6b6b6b6b6b6b
kernel BUG at lib/list_debug.c:67!
      :pktgen:pktgen_thread_worker+0x374/0x10b0
      ? autoremove_wake_function+0x0/0x40
      ? _spin_unlock_irqrestore+0x42/0x80
      ? :pktgen:pktgen_thread_worker+0x0/0x10b0
      kthread+0x4d/0x80
      child_rip+0xa/0x12
      ? restore_args+0x0/0x30
      ? kthread+0x0/0x80
      ? child_rip+0x0/0x12
RIP  list_del+0x48/0x70

The problem is that pktgen_thread_worker can not be executed if kthread_stop
has been called too early. Insert a completion on the normal initialization
path to make sure that pktgen_thread_worker will gain the control for sure.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-20 15:12:44 -07:00
David Woodhouse
0e91796eb4 net: Fix call to ->change_rx_flags(dev, IFF_MULTICAST) in dev_change_flags()
Am I just being particularly dim today, or can the call to
dev->change_rx_flags(dev, IFF_MULTICAST) in dev_change_flags() never
happen?

We've just set dev->flags = flags & IFF_MULTICAST, effectively. So the
condition '(dev->flags ^ flags) & IFF_MULTICAST' is _never_ going to be
true.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-20 14:36:14 -07:00
Stephen Hemminger
dcc997738e net: handle errors from device_rename
device_rename can fail with -EEXIST or -ENOMEM, so handle any
problems.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-14 22:33:38 -07:00
Rami Rosen
9ee6b7f155 net: Fix typo in net/core/sock.c.
In sock_queue_rcv_skb()  (net/core/sock.c) it should be:
"Cast sk->rcvbuf ..." instead of: "Cast skb->rcvbuf ..."

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-14 03:50:03 -07:00
Johannes Berg
f5184d267c net: Allow netdevices to specify needed head/tailroom
This patch adds needed_headroom/needed_tailroom members to struct
net_device and updates many places that allocate sbks to use them. Not
all of them can be converted though, and I'm sure I missed some (I
mostly grepped for LL_RESERVED_SPACE)

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-12 20:48:31 -07:00
Ben Hutchings
e46b66bc42 net: Added ASSERT_RTNL() to dev_open() and dev_close().
dev_open() and dev_close() must be called holding the RTNL, since they
call device functions and netdevice notifiers that are promised the RTNL.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-08 02:53:17 -07:00
Pavel Emelyanov
aca51397d0 netns: Fix arbitrary net_device-s corruptions on net_ns stop.
When a net namespace is destroyed, some devices (those, not killed
on ns stop explicitly) are moved back to init_net.

The problem, is that this net_ns change has one point of failure -
the __dev_alloc_name() may be called if a name collision occurs (and
this is easy to trigger). This allocator performs a likely-to-fail
GFP_ATOMIC allocation to find a suitable number. Other possible 
conditions that may cause error (for device being ns local or not
registered) are always false in this case.

So, when this call fails, the device is unregistered. But this is
*not* the right thing to do, since after this the device may be
released (and kfree-ed) improperly. E. g. bridges require more
actions (sysfs update, timer disarming, etc.), some other devices 
want to remove their private areas from lists, etc.

I. e. arbitrary use-after-free cases may occur.

The proposed fix is the following: since the only reason for the
dev_change_net_namespace to fail is the name generation, we may
give it a unique fall-back name w/o %d-s in it - the dev<ifindex>
one, since ifindexes are still unique.

So make this change, raise the failure-case printk loglevel to 
EMERG and replace the unregister_netdevice call with BUG().

[ Use snprintf() -DaveM ]

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-08 01:24:25 -07:00
Johannes Berg
c800578510 net: Fix useless comment reference loop.
include/linux/skbuff.h says:
        /* These elements must be at the end, see alloc_skb() for details.  */

net/core/skbuff.c says:
	* See comment in sk_buff definition, just before the 'tail' member

This patch contains my guess as to the actual reason rather than a
dead comment reference loop.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-03 20:56:42 -07:00