Commit Graph

146 Commits

Author SHA1 Message Date
Nick Piggin
a3a065e3f1 fs: no games with DCACHE_UNHASHED
Filesystems outside the regular namespace do not have to clear DCACHE_UNHASHED
in order to have a working /proc/$pid/fd/XXX. Nothing in proc prevents the
fd link from being used if its dentry is not in the hash.

Also, it does not get put into the dcache hash if DCACHE_UNHASHED is clear;
that depends on the filesystem calling d_add or d_rehash.

So delete the misleading comments and needless code.

Acked-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-17 10:51:40 -05:00
Al Viro
2c48b9c455 switch alloc_file() to passing struct path
... and have the caller grab both mnt and dentry; kill
leak in infiniband, while we are at it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-16 12:16:42 -05:00
Al Viro
cc3808f8c3 switch sock_alloc_file() to alloc_file()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-16 12:16:42 -05:00
Al Viro
7cbe66b6b5 merge sock_alloc_fd/sock_attach_fd into a new helper
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-16 12:16:41 -05:00
Al Viro
198de4d7ac reorder alloc_fd/attach_fd in socketpair()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-16 12:16:41 -05:00
Jean-Mickael Guerin
d7256d0eb4 net: compat_mmsghdr must be used in sys_recvmmsg
Both to traverse the entries and to set the msg_len field.

Commiter note: folded two patches and avoided one branch repeating the
compat test.

Signed-off-by: Jean-Mickael Guerin <jean-mickael.guerin@6wind.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-02 01:23:23 -08:00
Arnd Bergmann
805003a41c net/atm: move all compat_ioctl handling to atm/ioctl.c
We have two implementations of the compat_ioctl handling for ATM, the
one that we have had for ages in fs/compat_ioctl.c and the one added to
net/atm/ioctl.c by David Woodhouse. Unfortunately, both versions are
incomplete, and in practice we use a very confusing combination of the
two.

For ioctl numbers that have the same identifier on 32 and 64 bit systems,
we go directly through the compat_ioctl socket operation, for those that

differ, we do a conversion in fs/compat_ioctl.c.

This patch moves both variants into the vcc_compat_ioctl() function,
while preserving the current behaviour. It also kills off the COMPATIBLE_IOCTL
definitions that we never use here.
Doing it this way is clearly not a good solution, but I hope it is a
step into the right direction, so that someone is able to clean up this
mess for real.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-11 19:22:23 -08:00
Arnd Bergmann
a2116ed223 net/compat: fix dev_ifsioc emulation corner cases
Handling for SIOCSHWTSTAMP is broken on architectures
with a split user/kernel address space like s390,
because it passes a real user pointer while using
set_fs(KERNEL_DS).
A similar problem might arise the next time somebody
adds code to dev_ifsioc.

Split up dev_ifsioc into three separate functions for
SIOCSHWTSTAMP, SIOC*IFMAP and all other numbers so
we can get rid of set_fs in all potentially affected
cases.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Patrick Ohly <patrick.ohly@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-11 19:22:22 -08:00
Arnd Bergmann
7a50a240c4 net/compat_ioctl: support SIOCWANDEV
This adds compat_ioctl support for SIOCWANDEV, which has
always been missing.

The definition of struct compat_ifreq was missing an
ifru_settings fields that is needed to support SIOCWANDEV,
so add that and clean up the whitespace damage in the
struct definition.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08 20:57:03 -08:00
Arnd Bergmann
fab2532ba5 net, compat_ioctl: fix SIOCGMII ioctls
SIOCGMIIPHY and SIOCGMIIREG return data through ifreq,
so it needs to be converted on the way out as well.

SIOCGIFPFLAGS is unused, but has the same problem in theory.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08 20:56:21 -08:00
Arnd Bergmann
9177efd399 net, compat_ioctl: handle more ioctls correctly
The MII ioctls and SIOCSIFNAME need to go through ifsioc conversion,
which they never did so far. Some others are not implemented in the
native path, so we can just return -EINVAL directly.

Add IFSLAVE ioctls to the EINVAL list and move it to the end to
optimize the code path for the common case.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-06 23:11:15 -08:00
Arnd Bergmann
6b96018b28 compat: move sockios handling to net/socket.c
This removes the original socket compat_ioctl code
from fs/compat_ioctl.c and converts the code from the copy
in net/socket.c into a single function. We add a few cycles
of runtime to compat_sock_ioctl() with the long switch()
statement, but gain some cycles in return by simplifying
the call chain to get there.

Due to better inlining, save 1.5kb of object size in the
process, and enable further savings:

before:
   text    data     bss     dec     hex filename
  13540   18008    2080   33628    835c obj/fs/compat_ioctl.o
  14565     636      40   15241    3b89 obj/net/socket.o

after:
   text    data     bss     dec     hex filename
   8916   15176    2080   26172    663c obj/fs/compat_ioctl.o
  20725     636      40   21401    5399 obj/net/socket.o

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-06 23:10:54 -08:00
Arnd Bergmann
7a229387d3 net: copy socket ioctl code to net/socket.h
This makes an identical copy of the socket compat_ioctl code
from fs/compat_ioctl.c to net/socket.c, as a preparation
for moving the functionality in a way that can be easily
reviewed.

The code is hidden inside of #if 0 and gets activated in the
patch that will make it work.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-06 23:00:29 -08:00
Eric Paris
3f378b6844 net: pass kern to net_proto_family create function
The generic __sock_create function has a kern argument which allows the
security system to make decisions based on if a socket is being created by
the kernel or by userspace.  This patch passes that flag to the
net_proto_family specific create function, so it can do the same thing.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-05 22:18:14 -08:00
Arnaldo Carvalho de Melo
a2e2725541 net: Introduce recvmmsg socket syscall
Meaning receive multiple messages, reducing the number of syscalls and
net stack entry/exit operations.

Next patches will introduce mechanisms where protocols that want to
optimize this operation will provide an unlocked_recvmsg operation.

This takes into account comments made by:

. Paul Moore: sock_recvmsg is called only for the first datagram,
  sock_recvmsg_nosec is used for the rest.

. Caitlin Bestler: recvmmsg now has a struct timespec timeout, that
  works in the same fashion as the ppoll one.

  If the underlying protocol returns a datagram with MSG_OOB set, this
  will make recvmmsg return right away with as many datagrams (+ the OOB
  one) it has received so far.

. Rémi Denis-Courmont & Steven Whitehouse: If we receive N < vlen
  datagrams and then recvmsg returns an error, recvmmsg will return
  the successfully received datagrams, store the error and return it
  in the next call.

This paves the way for a subsequent optimization, sk_prot->unlocked_recvmsg,
where we will be able to acquire the lock only at batch start and end, not at
every underlying recvmsg call.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-12 23:40:10 -07:00
Neil Horman
3b885787ea net: Generalize socket rx gap / receive queue overflow cmsg
Create a new socket level option to report number of queue overflows

Recently I augmented the AF_PACKET protocol to report the number of frames lost
on the socket receive queue between any two enqueued frames.  This value was
exported via a SOL_PACKET level cmsg.  AFter I completed that work it was
requested that this feature be generalized so that any datagram oriented socket
could make use of this option.  As such I've created this patch, It creates a
new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a
SOL_SOCKET level cmsg that reports the nubmer of times the sk_receive_queue
overflowed between any two given frames.  It also augments the AF_PACKET
protocol to take advantage of this new feature (as it previously did not touch
sk->sk_drops, which this patch uses to record the overflow count).  Tested
successfully by me.

Notes:

1) Unlike my previous patch, this patch simply records the sk_drops value, which
is not a number of drops between packets, but rather a total number of drops.
Deltas must be computed in user space.

2) While this patch currently works with datagram oriented protocols, it will
also be accepted by non-datagram oriented protocols. I'm not sure if thats
agreeable to everyone, but my argument in favor of doing so is that, for those
protocols which aren't applicable to this option, sk_drops will always be zero,
and reporting no drops on a receive queue that isn't used for those
non-participating protocols seems reasonable to me.  This also saves us having
to code in a per-protocol opt in mechanism.

3) This applies cleanly to net-next assuming that commit
977750076d (my af packet cmsg patch) is reverted

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-12 13:26:31 -07:00
David S. Miller
8aa0f64ac3 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-10-09 14:40:09 -07:00
Johannes Berg
3d23e349d8 wext: refactor
Refactor wext to
 * split out iwpriv handling
 * split out iwspy handling
 * split out procfs support
 * allow cfg80211 to have wireless extensions compat code
   w/o CONFIG_WIRELESS_EXT

After this, drivers need to
 - select WIRELESS_EXT	- for wext support
 - select WEXT_PRIV	- for iwpriv support
 - select WEXT_SPY	- for iwspy support

except cfg80211 -- which gets new hooks in wext-core.c
and can then get wext handlers without CONFIG_WIRELESS_EXT.

Wireless extensions procfs support is auto-selected
based on PROC_FS and anything that requires the wext core
(i.e. WIRELESS_EXT or CFG80211_WEXT).

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-07 16:39:43 -04:00
Eric Dumazet
bcdce7195e net: speedup sk_wake_async()
An incoming datagram must bring into cpu cache *lot* of cache lines,
in particular : (other parts omitted (hash chains, ip route cache...))

On 32bit arches :

offsetof(struct sock, sk_rcvbuf)       =0x30    (read)
offsetof(struct sock, sk_lock)         =0x34   (rw)

offsetof(struct sock, sk_sleep)        =0x50 (read)
offsetof(struct sock, sk_rmem_alloc)   =0x64   (rw)
offsetof(struct sock, sk_receive_queue)=0x74   (rw)

offsetof(struct sock, sk_forward_alloc)=0x98   (rw)

offsetof(struct sock, sk_callback_lock)=0xcc    (rw)
offsetof(struct sock, sk_drops)        =0xd8 (read if we add dropcount support, rw if frame dropped)
offsetof(struct sock, sk_filter)       =0xf8    (read)

offsetof(struct sock, sk_socket)       =0x138 (read)

offsetof(struct sock, sk_data_ready)   =0x15c   (read)


We can avoid sk->sk_socket and socket->fasync_list referencing on sockets
with no fasync() structures. (socket->fasync_list ptr is probably already in cache
because it shares a cache line with socket->wait, ie location pointed by sk->sk_sleep)

This avoids one cache line load per incoming packet for common cases (no fasync())

We can leave (or even move in a future patch) sk->sk_socket in a cold location

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-06 17:28:29 -07:00
David S. Miller
b7058842c9 net: Make setsockopt() optlen be unsigned.
This provides safety against negative optlen at the type
level instead of depending upon (sometimes non-trivial)
checks against this sprinkled all over the the place, in
each and every implementation.

Based upon work done by Arjan van de Ven and feedback
from Linus Torvalds.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-30 16:12:20 -07:00
Arjan van de Ven
47379052b5 net: Add explicit bound checks in net/socket.c
The sys_socketcall() function has a very clever system for the copy
size of its arguments. Unfortunately, gcc cannot deal with this in
terms of proving that the copy_from_user() is then always in bounds.
This is the last (well 9th of this series, but last in the kernel) such
case around.

With this patch, we can turn on code to make having the boundary provably
right for the whole kernel, and detect introduction of new security
accidents of this type early on.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-28 12:57:44 -07:00
Nick Black
1fd7317d02 Move magic numbers into magic.h
Move various magic-number definitions into magic.h.

Signed-off-by: Nick Black <dank@qemfd.net>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23 07:39:28 -07:00
Alexey Dobriyan
b87221de6a const: mark remaining super_operations const
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22 07:17:24 -07:00
Eric Dumazet
29a020d35f [PATCH] net: kmemcheck annotation in struct socket
struct socket has a 16 bit hole that triggers kmemcheck warnings.

As suggested by Ingo, use kmemcheck annotations

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-15 02:39:20 -07:00
Linus Torvalds
e694958388 Make sock_sendpage() use kernel_sendpage()
kernel_sendpage() does the proper default case handling for when the
socket doesn't have a native sendpage implementation.

Now, arguably this might be something that we could instead solve by
just specifying that all protocols should do it themselves at the
protocol level, but we really only care about the common protocols.
Does anybody really care about sendpage on something like Appletalk? Not
likely.

Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Julien TINNES <julien@cr0.org>
Acked-by: Tavis Ormandy <taviso@sdf.lonestar.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-13 10:57:26 -07:00