IPV6, unlike IPV4, doesn't have a routing cache.
Routing table entries, as well as clones made in response
to route lookup requests, all live in the same table. And
all of these things are together collected in the destination
cache table for ipv6.
This means that routing table entries count against the garbage
collection limits, even though such entries cannot ever be reclaimed
and are added explicitly by the administrator (rather than being
created in response to lookups).
Therefore it makes no sense to count ipv6 routing table entries
against the GC limits.
Add a DST_NOCOUNT destination cache entry flag, and skip the counting
if it is set. Use this flag bit in ipv6 when adding routing table
entries.
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
AFS: Use i_generation not i_version for the vnode uniquifier
AFS: Set s_id in the superblock to the volume name
vfs: Fix data corruption after failed write in __block_write_begin()
afs: afs_fill_page reads too much, or wrong data
VFS: Fix vfsmount overput on simultaneous automount
fix wrong iput on d_inode introduced by e6bc45d65d
Delay struct net freeing while there's a sysfs instance refering to it
afs: fix sget() races, close leak on umount
ubifs: fix sget races
ubifs: split allocation of ubifs_info into a separate function
fix leak in proc_set_super()
* new refcount in struct net, controlling actual freeing of the memory
* new method in kobj_ns_type_operations (->drop_ns())
* ->current_ns() semantics change - it's supposed to be followed by
corresponding ->drop_ns(). For struct net in case of CONFIG_NET_NS it bumps
the new refcount; net_drop_ns() decrements it and calls net_free() if the
last reference has been dropped. Method renamed to ->grab_current_ns().
* old net_free() callers call net_drop_ns() instead.
* sysfs_exit_ns() is gone, along with a large part of callchain
leading to it; now that the references stored in ->ns[...] stay valid we
do not need to hunt them down and replace them with NULL. That fixes
problems in sysfs_lookup() and sysfs_readdir(), along with getting rid
of sb->s_instances abuse.
Note that struct net *shutdown* logics has not changed - net_cleanup()
is called exactly when it used to be called. The only thing postponed by
having a sysfs instance refering to that struct net is actual freeing of
memory occupied by struct net.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
There is a dev_put(ndev) missing on an error path. This was
introduced in 0c1ad04aec "netpoll: prevent netpoll setup on slave
devices".
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Testing of VLAN_FLAG_REORDER_HDR does not belong in vlan_untag
but rather in vlan_do_receive. Otherwise the vlan header
will not be properly put on the packet in the case of
vlan header accelleration.
As we remove the check from vlan_check_reorder_header
rename it vlan_reorder_header to keep the naming clean.
Fix up the skb->pkt_type early so we don't look at the packet
after adding the vlan tag, which guarantees we don't goof
and look at the wrong field.
Use a simple if statement instead of a complicated switch
statement to decided that we need to increment rx_stats
for a multicast packet.
Hopefully at somepoint we will just declare the case where
VLAN_FLAG_REORDER_HDR is cleared as unsupported and remove
the code. Until then this keeps it working correctly.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Acked-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 8d8fc29d02
(netpoll: disable netpoll when enslave a device), we automatically
disable netpoll when the underlying device is being enslaved,
we also need to prevent people from setuping netpoll on
devices that are already enslaved.
Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Frank Blaschka reported :
<quote>
During heavy network load we turn off/on cpus.
Sometimes this causes a stall on the network device.
Digging into the dump I found out following:
napi is scheduled but does not run. From the I/O buffers
and the napi state I see napi/rx_softirq processing has stopped
because the budget was reached. napi stays in the
softnet_data poll_list and the rx_softirq was raised again.
I assume at this time the cpu offline comes in,
the rx softirq is raised/moved to another cpu but napi stays in the
poll_list of the softnet_data of the now offline cpu.
Reviewing dev_cpu_callback (net/core/dev.c) I did not find the
poll_list is transfered to the new cpu.
</quote>
This patch is a straightforward implementation of Frank suggestion :
Transfert poll_list and trigger NET_RX_SOFTIRQ on new cpu.
Reported-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This interface uses a temporary buffer, but for no real reason.
And now can generate warnings like:
net/sched/sch_generic.c: In function dev_watchdog
net/sched/sch_generic.c:254:10: warning: unused variable drivername
Just return driver->name directly or "".
Reported-by: Connor Hansen <cmdkhh@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
BTW, looking through the code related to struct net lifetime rules has
caught something else:
struct net *get_net_ns_by_fd(int fd)
{
...
file = proc_ns_fget(fd);
if (!file)
goto out;
ei = PROC_I(file->f_dentry->d_inode);
while in proc_ns_fget() we have two return ERR_PTR(...) and not a single
path that would return NULL. The other caller of proc_ns_fget() treats
ERR_PTR() correctly...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (40 commits)
tg3: Fix tg3_skb_error_unmap()
net: tracepoint of net_dev_xmit sees freed skb and causes panic
drivers/net/can/flexcan.c: add missing clk_put
net: dm9000: Get the chip in a known good state before enabling interrupts
drivers/net/davinci_emac.c: add missing clk_put
af-packet: Add flag to distinguish VID 0 from no-vlan.
caif: Fix race when conditionally taking rtnl lock
usbnet/cdc_ncm: add missing .reset_resume hook
vlan: fix typo in vlan_dev_hard_start_xmit()
net/ipv4: Check for mistakenly passed in non-IPv4 address
iwl4965: correctly validate temperature value
bluetooth l2cap: fix locking in l2cap_global_chan_by_psm
ath9k: fix two more bugs in tx power
cfg80211: don't drop p2p probe responses
Revert "net: fix section mismatches"
drivers/net/usb/catc.c: Fix potential deadlock in catc_ctrl_run()
sctp: stop pending timers and purge queues when peer restart asoc
drivers/net: ks8842 Fix crash on received packet when in PIO mode.
ip_options_compile: properly handle unaligned pointer
iwlagn: fix incorrect PCI subsystem id for 6150 devices
...
Because there is a possibility that skb is kfree_skb()ed and zero cleared
after ndo_start_xmit, we should not see the contents of skb like skb->len and
skb->dev->name after ndo_start_xmit. But trace_net_dev_xmit does that
and causes panic by NULL pointer dereference.
This patch fixes trace_net_dev_xmit not to see the contents of skb directly.
If you want to reproduce this panic,
1. Get tracepoint of net_dev_xmit on
2. Create 2 guests on KVM
2. Make 2 guests use virtio_net
4. Execute netperf from one to another for a long time as a network burden
5. host will panic(It takes about 30 minutes)
Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
net: Kill ratelimit.h dependency in linux/net.h
net: Add linux/sysctl.h includes where needed.
net: Kill ether_table[] declaration.
inetpeer: fix race in unused_list manipulations
atm: expose ATM device index in sysfs
IPVS: bug in ip_vs_ftp, same list heaad used in all netns.
bug.h: Move ratelimit warn interfaces to ratelimit.h
bonding: cleanup module option descriptions
net:8021q:vlan.c Fix pr_info to just give the vlan fullname and version.
net: davinci_emac: fix dev_err use at probe
can: convert to %pK for kptr_restrict support
net: fix ETHTOOL_SFEATURES compatibility with old ethtool_ops.set_flags
netfilter: Fix several warnings in compat_mtw_from_user().
netfilter: ipset: fix ip_set_flush return code
netfilter: ipset: remove unused variable from type_pf_tdel()
netfilter: ipset: Use proper timeout value to jiffies conversion
Ingo Molnar noticed that we have this unnecessary ratelimit.h
dependency in linux/net.h, which hid compilation problems from
people doing builds only with CONFIG_NET enabled.
Move this stuff out to a seperate net/net_ratelimit.h file and
include that in the only two places where this thing is needed.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
As reported by Ingo Molnar, we still have configuration combinations
where use of the WARN_RATELIMIT interfaces break the build because
dependencies don't get met.
Instead of going down the long road of trying to make it so that
ratelimit.h can get included by kernel.h or asm-generic/bug.h,
just move the interface into ratelimit.h and make users have
to include that.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Current code squashes flags to bool - this makes set_flags fail whenever
some ETH_FLAG_* equivalent features are set. Fix this.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/linux-2.6-nsfd:
net: fix get_net_ns_by_fd for !CONFIG_NET_NS
ns proc: Return -ENOENT for a nonexistent /proc/self/ns/ entry.
ns: Declare sys_setns in syscalls.h
net: Allow setting the network namespace by fd
ns proc: Add support for the ipc namespace
ns proc: Add support for the uts namespace
ns proc: Add support for the network namespace.
ns: Introduce the setns syscall
ns: proc files for namespace naming policy.
Commit e67f88dd12 (dont hold rtnl mutex during netlink dump callbacks)
missed fact that rtnl_fill_ifinfo() must be called with rtnl held.
Because of possible deadlocks between two mutexes (cb_mutex and rtnl),
its not easy to solve this problem, so revert this part of the patch.
It also forgot one rcu_read_unlock() in FIB dump_rules()
Add one ASSERT_RTNL() in rtnl_fill_ifinfo() to remind us the rule.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After merging the final tree, today's linux-next build (powerpc
ppc44x_defconfig) failed like this:
net/built-in.o: In function `get_net_ns_by_fd':
(.text+0x11976): undefined reference to `netns_operations'
net/built-in.o: In function `get_net_ns_by_fd':
(.text+0x1197a): undefined reference to `netns_operations'
netns_operations is only available if CONFIG_NET_NS is set ...
Caused by commit f063052947 ("net: Allow setting the network namespace
by fd").
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
dst_default_metrics is readonly, we dont want to kfree() it later.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
synchronize_rcu() is very slow in various situations (HZ=100,
CONFIG_NO_HZ=y, CONFIG_PREEMPT=n)
Extract from my (mostly idle) 8 core machine :
synchronize_rcu() in 99985 us
synchronize_rcu() in 79982 us
synchronize_rcu() in 87612 us
synchronize_rcu() in 79827 us
synchronize_rcu() in 109860 us
synchronize_rcu() in 98039 us
synchronize_rcu() in 89841 us
synchronize_rcu() in 79842 us
synchronize_rcu() in 80151 us
synchronize_rcu() in 119833 us
synchronize_rcu() in 99858 us
synchronize_rcu() in 73999 us
synchronize_rcu() in 79855 us
synchronize_rcu() in 79853 us
When we hold RTNL mutex, we would like to spend some cpu cycles but not
block too long other processes waiting for this mutex.
We also want to setup/dismantle network features as fast as possible at
boot/shutdown time.
This patch makes synchronize_net() call the expedited version if RTNL is
locked.
synchronize_rcu_expedited() typical delay is about 20 us on my machine.
synchronize_rcu_expedited() in 18 us
synchronize_rcu_expedited() in 18 us
synchronize_rcu_expedited() in 18 us
synchronize_rcu_expedited() in 18 us
synchronize_rcu_expedited() in 20 us
synchronize_rcu_expedited() in 16 us
synchronize_rcu_expedited() in 20 us
synchronize_rcu_expedited() in 18 us
synchronize_rcu_expedited() in 18 us
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Ben Greear <greearb@candelatech.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A mis-configured filter can spam the logs with lots of stack traces.
Rate-limit the warnings and add printout of the bogus filter information.
Original-patch-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This also shrinks the object size a little.
text data bss dec hex filename
28834 186 8 29028 7164 net/core/pktgen.o
28816 186 8 29010 7152 net/core/pktgen.o.AFTER
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: "David Miller" <davem@davemloft.net>,
Signed-off-by: David S. Miller <davem@davemloft.net>
In the old days, we used to access dev->master in __netif_receive_skb()
in a rcu_read_lock section.
So one synchronize_net() call was needed in netdev_set_master() to make
sure another cpu could not use old master while/after we release it.
We now use netdev_rx_handler infrastructure and added one
synchronize_net() call in bond_release()/bond_release_all()
Remove the obsolete synchronize_net() from netdev_set_master() and add
one in bridge del_nbp() after its netdev_rx_handler_unregister() call.
This makes enslave -d a bit faster.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jiri Pirko <jpirko@redhat.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>