Commit Graph

15701 Commits

Author SHA1 Message Date
Pavel Emelyanov
e314dbdc1c [NET]: Virtual ethernet device driver.
Veth stands for Virtual ETHernet. It is a simple tunnel driver
that works at the link layer and looks like a pair of ethernet
devices interconnected with each other.

Mainly it allows to communicate between network namespaces but
it can be used as is as well.

The newlink callback is organized that way to make it easy to
create the peer device in the separate namespace when we have
them in kernel.

This implementation uses another interface - the RTM_NRELINK
message introduced by Patric.

Bug fixes from Daniel Lezcano.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:47:46 -07:00
Pavel Emelianov
e71992889e [RTNETLINK]: Introduce generic rtnl_create_link().
This routine gets the parsed rtnl attributes and creates a new
link with generic info (IFLA_LINKINFO policy). Its intention
is to help the drivers, that need to create several links at
once (like VETH).

This is nothing but a copy-paste-ed part of rtnl_newlink() function
that is responsible for creation of new device.

Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:47:45 -07:00
Stephen Hemminger
bea3348eef [NET]: Make NAPI polling independent of struct net_device objects.
Several devices have multiple independant RX queues per net
device, and some have a single interrupt doorbell for several
queues.

In either case, it's easier to support layouts like that if the
structure representing the poll is independant from the net
device itself.

The signature of the ->poll() call back goes from:

	int foo_poll(struct net_device *dev, int *budget)

to

	int foo_poll(struct napi_struct *napi, int budget)

The caller is returned the number of RX packets processed (or
the number of "NAPI credits" consumed if you want to get
abstract).  The callee no longer messes around bumping
dev->quota, *budget, etc. because that is all handled in the
caller upon return.

The napi_struct is to be embedded in the device driver private data
structures.

Furthermore, it is the driver's responsibility to disable all NAPI
instances in it's ->stop() device close handler.  Since the
napi_struct is privatized into the driver's private data structures,
only the driver knows how to get at all of the napi_struct instances
it may have per-device.

With lots of help and suggestions from Rusty Russell, Roland Dreier,
Michael Chan, Jeff Garzik, and Jamal Hadi Salim.

Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra,
Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan.

[ Ported to current tree and all drivers converted.  Integrated
  Stephen's follow-on kerneldoc additions, and restored poll_list
  handling to the old style to fix mutual exclusion issues.  -DaveM ]

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:47:45 -07:00
Andy Green
dfe6e81dea [MAC80211]: Add get_unaligned to ieee80211_get_radiotap_len
ieee80211_get_radiotap_len() tries to dereference radiotap length without
taking care that it is completely unaligned and get_unaligned()
is required.

Signed-off-by: Andy Green <andy@warmcat.com>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:47:40 -07:00
Daniel Drake
d9430a3288 [MAC80211]: implement ERP info change notifications
zd1211rw and bcm43xx are interested in being notified when ERP IE conditions
change, so that they can reprogram a register which affects how control frames
are transmitted.

This patch adds an interface similar to the one that can be found in softmac.

Signed-off-by: Daniel Drake <dsd@gentoo.org>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:47:39 -07:00
Daniel Drake
7e9ed18874 [MAC80211]: improved short preamble handling
Similarly to CTS protection, whether short preambles are used for 802.11b
transmissions should be a per-subif setting, not device global.

For STAs, this patch makes short preamble handling automatic based on the ERP
IE. For APs, hostapd still uses the prism ioctls, but the write ioctl has been
restricted to AP-only subifs.

ieee80211_txrx_data.short_preamble (an unused field) was removed.

Unfortunately, some API changes were required for the following functions:
 - ieee80211_generic_frame_duration
 - ieee80211_rts_duration
 - ieee80211_ctstoself_duration
 - ieee80211_rts_get
 - ieee80211_ctstoself_get
Affected drivers were updated accordingly.

Signed-off-by: Daniel Drake <dsd@gentoo.org>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:47:38 -07:00
Ivo van Doorn
d5d08def92 [MAC80211]: Add LONG_RETRY flag to ieee80211_tx_control
mac80211 informs the driver what the short and long retry values are through
set_retry_limit(), but when packets are being transmitted it did not inform the
driver which of the 2 retry limits should actually be used.
Instead it sends the actual value, but for drivers that can only set the retry limit
and the register and in the descriptor need to indicate which of the limits should
be used this is not really useful.

This patch will add a IEEE80211_TXCTL_LONG_RETRY_LIMIT flag to the
ieee80211_tx_control structure. By default the short retry limit should be
used but if the flag is set the long retry should be used.

This does not prevent the driver to ignore the request for "no retry" packets,
but at least those will be send out with the short retry limit. But there is no
perfect cure for this problem.. :(

Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:47:38 -07:00
Michael Wu
be8755e180 [MAC80211]: improve locking of sta_info related structures
The sta_info code has some awkward locking which prevents some driver
callbacks from being allowed to sleep. This patch makes the locking more
focused so code that calls driver callbacks are allowed to sleep. It also
converts sta_lock to a rwlock.

Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:47:37 -07:00
Johannes Berg
571ecf676d [MAC80211]: split RX handlers into own file
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:47:29 -07:00
Linus Torvalds
e46dc1dab9 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
  [IPv6]: Fix ICMPv6 redirect handling with target multicast address
  [PKT_SCHED] cls_u32: error code isn't been propogated properly
  [ROSE]: Fix rose.ko oops on unload
  [TCP]: Fix fastpath_cnt_hint when GSO skb is partially ACKed
2007-10-08 12:59:10 -07:00
Peter Zijlstra
a200ee182a mm: set_page_dirty_balance() vs ->page_mkwrite()
All the current page_mkwrite() implementations also set the page dirty. Which
results in the set_page_dirty_balance() call to _not_ call balance, because the
page is already found dirty.

This allows us to dirty a _lot_ of pages without ever hitting
balance_dirty_pages().  Not good (tm).

Force a balance call if ->page_mkwrite() was successful.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-08 12:58:14 -07:00
Alexey Dobriyan
891e6a9312 [ROSE]: Fix rose.ko oops on unload
Commit a3d384029a aka
"[AX.25]: Fix unchecked rose_add_loopback_neigh uses"
transformed rose_loopback_neigh var into statically allocated one.
However, on unload it will be kfree's which can't work.

Steps to reproduce:

	modprobe rose
	rmmod rose

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008
 printing eip:
c014c664
*pde = 00000000
Oops: 0000 [#1]
PREEMPT DEBUG_PAGEALLOC
Modules linked in: rose ax25 fan ufs loop usbhid rtc snd_intel8x0 snd_ac97_codec ehci_hcd ac97_bus uhci_hcd thermal usbcore button processor evdev sr_mod cdrom
CPU:    0
EIP:    0060:[<c014c664>]    Not tainted VLI
EFLAGS: 00210086   (2.6.23-rc9 #3)
EIP is at kfree+0x48/0xa1
eax: 00000556   ebx: c1734aa0   ecx: f6a5e000   edx: f7082000
esi: 00000000   edi: f9a55d20   ebp: 00200287   esp: f6a5ef28
ds: 007b   es: 007b   fs: 0000  gs: 0033  ss: 0068
Process rmmod (pid: 1823, ti=f6a5e000 task=f7082000 task.ti=f6a5e000)
Stack: f9a55d20 f9a5200c 00000000 00000000 00000000 f6a5e000 f9a5200c f9a55a00 
       00000000 bf818cf0 f9a51f3f f9a55a00 00000000 c0132c60 65736f72 00000000 
       f69f9630 f69f9528 c014244a f6a4e900 00200246 f7082000 c01025e6 00000000 
Call Trace:
 [<f9a5200c>] rose_rt_free+0x1d/0x49 [rose]
 [<f9a5200c>] rose_rt_free+0x1d/0x49 [rose]
 [<f9a51f3f>] rose_exit+0x4c/0xd5 [rose]
 [<c0132c60>] sys_delete_module+0x15e/0x186
 [<c014244a>] remove_vma+0x40/0x45
 [<c01025e6>] sysenter_past_esp+0x8f/0x99
 [<c012bacf>] trace_hardirqs_on+0x118/0x13b
 [<c01025b6>] sysenter_past_esp+0x5f/0x99
 =======================
Code: 05 03 1d 80 db 5b c0 8b 03 25 00 40 02 00 3d 00 40 02 00 75 03 8b 5b 0c 8b 73 10 8b 44 24 18 89 44 24 04 9c 5d fa e8 77 df fd ff <8b> 56 08 89 f8 e8 84 f4 fd ff e8 bd 32 06 00 3b 5c 86 60 75 0f 
EIP: [<c014c664>] kfree+0x48/0xa1 SS:ESP 0068:f6a5ef28

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-07 23:44:17 -07:00
Linus Torvalds
0c2043abef Don't do load-average calculations at even 5-second intervals
It turns out that there are a few other five-second timers in the
kernel, and if the timers get in sync, the load-average can get
artificially inflated by events that just happen to coincide.

So just offset the load average calculation it by a timer tick.

Noticed by Anders Boström, for whom the coincidence started triggering
on one of his machines with the JBD jiffies rounding code (JBD is one of
the subsystems that also end up using a 5-second timer by default).

Tested-by: Anders Boström <anders@bostrom.dyndns.org>
Cc: Chuck Ebbert <cebbert@redhat.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-07 16:23:13 -07:00
Serge Belyshev
4ecbca8554 Remove unnecessary cast in prefetch()
It is ok to call prefetch() function with NULL argument, as specifically
commented in include/linux/prefetch.h.  But in standard C, it is invalid
to dereference NULL pointer (see C99 standard 6.5.3.2 paragraph 4 and
note #84).

prefetch() has a memory reference for its argument.

Newer gcc versions (4.3 and above) will use that to conclude that "x"
argument is non-null and thus wreaking havok everywhere prefetch() was
inlined.

Fixed by removing cast and changing asm constraint.

[ It seems in theory gcc 4.2 could miscompile this too; although no
  cases known.  In 2.6.24 we should probably switch to
  __builtin_prefetch() instead, but this is a simpler fix for now.
				-- AK ]

Signed-off-by: Serge Belyshev <belyshev@depni.sinp.msu.ru>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-05 08:04:35 -07:00
Linus Torvalds
c7659e2c13 Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
  [MIPS] Terminally fix local_{dec,sub}_if_positive
  [MIPS] Type proof reimplementation of cmpxchg.
  [MIPS] pg-r4k.c: Fix a typo in an R4600 v2 erratum workaround
2007-10-03 15:43:17 -07:00
Michael Hennerich
cda6a20b68 Blackfin arch: fix PORT_J BUG for BF537/6 EMAC driver reported by Kalle Pokki <kalle.pokki@iki.fi>
Cc: Kalle Pokki <kalle.pokki@iki.fi>
Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
2007-10-04 00:36:18 +08:00
Michael Hennerich
c58c2140f0 Blackfin arch: gpio pinmux and resource allocation API required by BF537 on chip ethernet mac driver
Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
2007-10-04 00:35:05 +08:00
Ralf Baechle
9ea0f043fe [MIPS] Terminally fix local_{dec,sub}_if_positive
They contain 64-bit instructions so wouldn't work on 32-bit kernels or
32-bit hardware.  Since there are no users, blow them away.  They
probably were only ever created because there are atomic_sub_if_positive
and atomic_dec_if_positive which exist only for sake of semaphores.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2007-10-03 14:30:52 +01:00
Ralf Baechle
fef74705ea [MIPS] Type proof reimplementation of cmpxchg.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2007-10-03 14:30:52 +01:00
Linus Torvalds
a3470171d6 Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
  [MIPS] vmlinux.lds.S: Handle note sections
  [MIPS] Fix value of O_TRUNC
2007-10-01 20:15:45 -07:00
Ralf Baechle
ca074a3392 [MIPS] Fix value of O_TRUNC
A "cleanup" almost two years ago deleted the old definition from
<asm/fcntl.h>, so asm-generic/fcntl.h defaulted it to the the same
value as FASYNC ...   which happened to be the wrong thing.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2007-10-01 14:17:50 +01:00
Nick Piggin
4827bbb06e i386: remove bogus comment about memory barrier
The comment being removed by this patch is incorrect and misleading.

In the following situation:

	1. load  ...
	2. store 1 -> X
	3. wmb
	4. rmb
	5. load  a <- Y
	6. store ...

4 will only ensure ordering of 1 with 5.
3 will only ensure ordering of 2 with 6.

Further, a CPU with strictly in-order stores will still only provide that
2 and 6 are ordered (effectively, it is the same as a weakly ordered CPU
with wmb after every store).

In all cases, 5 may still be executed before 2 is visible to other CPUs!

The additional piece of the puzzle that mb() provides is the store/load
ordering, which fundamentally cannot be achieved with any combination of
rmb()s and wmb()s.

This can be an unexpected result if one expected any sort of global ordering
guarantee to barriers (eg. that the barriers themselves are sequentially
consistent with other types of barriers).  However sfence or lfence barriers
need only provide an ordering partial ordering of memory operations -- Consider
that wmb may be implemented as nothing more than inserting a special barrier
entry in the store queue, or, in the case of x86, it can be a noop as the store
queue is in order. And an rmb may be implemented as a directive to prevent
subsequent loads only so long as their are no previous outstanding loads (while
there could be stores still in store queues).

I can actually see the occasional load/store being reordered around lfence on
my core2. That doesn't prove my above assertions, but it does show the comment
is wrong (unless my program is -- can send it out by request).

So:
   mb() and smp_mb() always have and always will require a full mfence
   or lock prefixed instruction on x86.  And we should remove this comment.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Paul McKenney <paulmck@us.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-29 09:13:59 -07:00
Linus Torvalds
05e31754d1 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
  [TCP]: Fix MD5 signature handling on big-endian.
  [NET]: Zero length write() on socket should not simply return 0.
2007-09-28 15:44:44 -07:00
David S. Miller
f8ab18d2d9 [TCP]: Fix MD5 signature handling on big-endian.
Based upon a report and initial patch by Peter Lieven.

tcp4_md5sig_key and tcp6_md5sig_key need to start with
the exact same members as tcp_md5sig_key.  Because they
are both cast to that type by tcp_v{4,6}_md5_do_lookup().

Unfortunately tcp{4,6}_md5sig_key use a u16 for the key
length instead of a u8, which is what tcp_md5sig_key
uses.  This just so happens to work by accident on
little-endian, but on big-endian it doesn't.

Instead of casting, just place tcp_md5sig_key as the first member of
the address-family specific structures, adjust the access sites, and
kill off the ugly casts.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-28 15:18:35 -07:00
Ralf Baechle
7d809ba3f9 [MIPS] Fix CONFIG_BUILD_ELF64 kernels with symbols in CKSEG0.
The __pa() for those did assume that all symbols have XKPHYS values and
the math fails for any other address range.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2007-09-27 23:19:16 +01:00