The problem: proc_net files remember which network namespace the are
against but do not remember hold a reference count (as that would pin
the network namespace). So we currently have a small window where
the reference count on a network namespace may be incremented when opening
a /proc file when it has already gone to zero.
To fix this introduce maybe_get_net and get_proc_net.
maybe_get_net increments the network namespace reference count only if it is
greater then zero, ensuring we don't increment a reference count after it
has gone to zero.
get_proc_net handles all of the magic to go from a proc inode to the network
namespace instance and call maybe_get_net on it.
PROC_NET the old accessor is removed so that we don't get confused and use
the wrong helper function.
Then I fix up the callers to use get_proc_net and handle the case case
where get_proc_net returns NULL. In that case I return -ENXIO because
effectively the network namespace has already gone away so the files
we are trying to access don't exist anymore.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the appropriate EXPORT_SYMBOLS for proc_net_create,
proc_net_fops_create and proc_net_remove to fix errors when
compiling allmodconfig
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes most of the generic device layer network
namespace safe. This patch makes dev_base_head a
network namespace variable, and then it picks up
a few associated variables. The functions:
dev_getbyhwaddr
dev_getfirsthwbytype
dev_get_by_flags
dev_get_by_name
__dev_get_by_name
dev_get_by_index
__dev_get_by_index
dev_ioctl
dev_ethtool
dev_load
wireless_process_ioctl
were modified to take a network namespace argument, and
deal with it.
vlan_ioctl_set and brioctl_set were modified so their
hooks will receive a network namespace argument.
So basically anthing in the core of the network stack that was
affected to by the change of dev_base was modified to handle
multiple network namespaces. The rest of the network stack was
simply modified to explicitly use &init_net the initial network
namespace. This can be fixed when those components of the network
stack are modified to handle multiple network namespaces.
For now the ifindex generator is left global.
Fundametally ifindex numbers are per namespace, or else
we will have corner case problems with migration when
we get that far.
At the same time there are assumptions in the network stack
that the ifindex of a network device won't change. Making
the ifindex number global seems a good compromise until
the network stack can cope with ifindex changes when
you change namespaces, and the like.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each netlink socket will live in exactly one network namespace,
this includes the controlling kernel sockets.
This patch updates all of the existing netlink protocols
to only support the initial network namespace. Request
by clients in other namespaces will get -ECONREFUSED.
As they would if the kernel did not have the support for
that netlink protocol compiled in.
As each netlink protocol is updated to be multiple network
namespace safe it can register multiple kernel sockets
to acquire a presence in the rest of the network namespaces.
The implementation in af_netlink is a simple filter implementation
at hash table insertion and hash table look up time.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes /proc/net per network namespace. It modifies the global
variables proc_net and proc_net_stat to be per network namespace.
The proc_net file helpers are modified to take a network namespace argument,
and all of their callers are fixed to pass &init_net for that argument.
This ensures that all of the /proc/net files are only visible and
usable in the initial network namespace until the code behind them
has been updated to be handle multiple network namespaces.
Making /proc/net per namespace is necessary as at least some files
in /proc/net depend upon the set of network devices which is per
network namespace, and even more files in /proc/net have contents
that are relevant to a single network namespace.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current implementation of dev_ifname makes maintenance difficult
because updates to the implementation of the ioctl have to made in two
places. So this patch updates dev_ifname32 to do a classic 32/64
structure conversion and call sys_ioctl like the rest of the
compat calls do.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The recent fix for a circular lock dependency unfortunately introduced a
potential memory leak in the event where the call to nlmsvc_lookup_host
fails for some reason.
Thanks to Roel Kluin for spotting this.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When IOCB_FLAG_RESFD flag is set and iocb->aio_resfd is incorrect,
statement 'goto out_put_req' is executed. At label 'out_put_req',
aio_put_req(..) is called, which requires 'req->ki_filp' set.
Signed-off-by: Yan Zheng<yanzheng@21cn.com>
Cc: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/cooloney/blackfin-2.6:
Blackfin arch: fix PORT_J BUG for BF537/6 EMAC driver reported by Kalle Pokki <kalle.pokki@iki.fi>
Blackfin arch: gpio pinmux and resource allocation API required by BF537 on chip ethernet mac driver
Blackfin arch: add some missing syscall
binfmt_flat: checkpatch fixing minimum support for the blackfin relocations
Binfmt_flat: Add minimum support for the Blackfin relocations
The fs was not unlocking the local alloc inode mutex in the code path in
which it failed to find a window of free bits in the global bitmap.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Nick Piggin points out that splice isn't being good about the mmap
semaphore: while two readers can nest inside each others, it does leave
a possible deadlock if a writer (ie a new mmap()) comes in during that
nesting.
Original "just move the locking" patch by Nick, replaced by one by me
based on an optimistic pagefault_disable(). And then Jens tested and
updated that patch.
Reported-by: Nick Piggin <npiggin@suse.de>
Tested-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit b394e43e99.
Lachlan McIlroy says:
It tried to fix an issue where log replay is replaying an inode cluster
initialisation transaction that should not be replayed because the inode
cluster on disk is more up to date. Since we don't log file sizes (we
rely on inode flushing to get them to disk) then we can't just replay
all the transations in the log and expect the inode to be completely
restored. We lose file size updates. Unfortunately this fix is causing
more (serious) problems than it is fixing.
SGI-PV: 969656
SGI-Modid: xfs-linux-melb:xfs-kern:29804a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
It doesn't look as if the NFS file name limit is being initialised correctly
in the struct nfs_server. Make sure that we limit whatever is being set in
nfs_probe_fsinfo() and nfs_init_server().
Also ensure that readdirplus and nfs4_path_walk respect our file name
limits.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The problem is that the garbage collector for the 'host' structures
nlm_gc_hosts(), holds nlm_host_mutex while calling down to
nlmsvc_mark_resources, which, eventually takes the file->f_mutex.
We cannot therefore call nlmsvc_lookup_host() from within
nlmsvc_create_block, since the caller will already hold file->f_mutex, so
the attempt to grab nlm_host_mutex may deadlock.
Fix the problem by calling nlmsvc_lookup_host() outside the file->f_mutex.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6:
[PATCH] WE : Add missing auth compat-ioctl
[PATCH] softmac: Fix inability to associate with WEP networks
Different types of ufs hold state in different places, to hide complexity
of this, there is ufs_get_fs_state, it returns state according to
"UFS_SB(sb)->s_flags", but during mount ufs_get_fs_state is called, before
setting s_flags, this cause message for ufs types like sun ufs: "fs need
fsck", and remount in readonly state.
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add minimum support for the Blackfin relocations, since we don't have
enough space in each reloc. The idea is to store a value with one
relocation so that subsequent ones can access it.
Actually, this patch is required for Blackfin. Currently if BINFMT_FLAT is
enabled, git-tree kernel will fail to compile.
Signed-off-by: Bernd Schmidt <bernd.schmidt@analog.com>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Cc: David McCullough <davidm@snapgear.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Cc: Miles Bader <miles.bader@necel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
ocfs2: Pack vote message and response structures
ocfs2: Don't double set write parameters
ocfs2: Fix pos/len passed to ocfs2_write_cluster
ocfs2: Allow smaller allocations during large writes
The ocfs2_vote_msg and ocfs2_response_msg structs needed to be
packed to ensure similar sizeofs in 32-bit and 64-bit arches. Without this,
we had inadvertantly broken 32/64 bit cross mounts.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
The target page offsets were being incorrectly set a second time in
ocfs2_prepare_page_for_write(), which was causing problems on a 16k page
size kernel. Additionally, ocfs2_write_failure() was incorrectly using those
parameters instead of the parameters for the individual page being cleaned
up.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>