Currently all network protocols need to call dev_ioctl as the default
fallback in their ioctl implementations. This patch adds a fallback
to dev_ioctl to sock_ioctl if the protocol returned -ENOIOCTLCMD.
This way all the procotol ioctl handlers can be simplified and we don't
need to export dev_ioctl.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
To help in reducing the number of include dependencies, several files were
touched as they were getting needed headers indirectly for stuff they use.
Thanks also to Alan Menegotto for pointing out that net/dccp/proto.c had
linux/dccp.h include twice.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I noticed that some of 'struct proto_ops' used in the kernel may share
a cache line used by locks or other heavily modified data. (default
linker alignement is 32 bytes, and L1_CACHE_LINE is 64 or 128 at
least)
This patch makes sure a 'struct proto_ops' can be declared as const,
so that all cpus can share all parts of it without false sharing.
This is not mandatory : a driver can still use a read/write structure
if it needs to (and eventually a __read_mostly)
I made a global stubstitute to change all existing occurences to make
them const.
This should reduce the possibility of false sharing on SMP, and
speedup some socket system calls.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace cube root algorithim with a faster version using Newton-Raphson.
Surprisingly, doing the scaled div64_64 is faster than a true 64 bit
division on 64 bit CPU's.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Revised version of patch to pre-compute values for TCP cubic.
* d32,d64 replaced with descriptive names
* cube_factor replaces
srtt[scaled by count] / HZ * ((1 << (10+2*BICTCP_HZ)) / bic_scale)
* beta_scale replaces
8*(BICTCP_BETA_SCALE+beta)/3/(BICTCP_BETA_SCALE-beta);
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As DCCP needs to be called in the same spots.
Now we have a member in inet_sock (is_icsk), set at sock creation time from
struct inet_protosw->flags (if INET_PROTOSW_ICSK is set, like for TCP and
DCCP) to see if a struct sock instance is a inet_connection_sock for places
like the ones in ip_sockglue.c (v4 and v6) where we previously were looking if
sk_type was SOCK_STREAM, that is insufficient because we now use the same code
for DCCP, that has sk_type SOCK_DCCP.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Renaming it to inet_hash_connect, making it possible to ditch
dccp_v4_hash_connect and share the same code with TCP instead.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So that we can share several timewait sockets related functions and
make the timewait mini sockets infrastructure closer to the request
mini sockets one.
Next changesets will take advantage of this, moving more code out of
TCP and DCCP v4 and v6 to common infrastructure.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Out of tcp6_timewait_sock, that now is just an aggregation of
inet_timewait_sock and inet6_timewait_sock, using tw_ipv6_offset in struct
inet_timewait_sock, that is common to the IPv6 transport protocols that use
timewait sockets, like DCCP and TCP.
tw_ipv6_offset plays the struct inet_sock pinfo6 role, i.e. for the generic
code to find the IPv6 area in a timewait sock.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes dead code. I don't see the reason to keep this cruft
around, besides cluttering the nice and functionally working code.
Signed-off-by: Roberto Nibali <ratz@drugphish.ch>
Signed-off-by: Horms <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since udp_checksum_init always returns 0 there is no point in
having it return a value.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a packet is obtained from skb_recv_datagram with MSG_PEEK enabled
it is left on the socket receive queue. This means that when we detect
a checksum error we have to be careful when trying to free the packet
as someone could have dequeued it in the time being.
Currently this delicate logic is duplicated three times between UDPv4,
UDPv6 and RAWv6. This patch moves them into a one place and simplifies
the code somewhat.
This is based on a suggestion by Eric Dumazet.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Renaming it to inet_csk_addr2sockaddr.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
And move it to struct inet_connection_sock. DCCP will use it in the
upcoming changesets.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
And inet6_rsk_offset in inet_request_sock, for the same reasons as
inet_sock's pinfo6 member.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Another spin of Herbert Xu's "safer ip reassembly" patch
for 2.6.16.
(The original patch is here:
http://marc.theaimsgroup.com/?l=linux-netdev&m=112281936522415&w=2
and my only contribution is to have tested it.)
This patch (optionally) does additional checks before accepting IP
fragments, which can greatly reduce the possibility of reassembling
fragments which originated from different IP datagrams.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Arthur Kepner <akepner@sgi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Part of a performance problem with ip_tables is that memory allocation
is not NUMA aware, but 'only' SMP aware (ie each CPU normally touch
separate cache lines)
Even with small iptables rules, the cost of this misplacement can be
high on common workloads. Instead of using one vmalloc() area
(located in the node of the iptables process), we now allocate an area
for each possible CPU, using vmalloc_node() so that memory should be
allocated in the CPU's node if possible.
Port to arp_tables and ip6_tables by Harald Welte.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As noticed by Phil Oester, the GRE NAT protocol helper is initialized
before the NAT core, which makes registration fail.
Change the linking order to make NAT be initialized first.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>