Pull NFS client updates for Linux 3.4 from Trond Myklebust:
"New features include:
- Add NFS client support for containers.
This should enable most of the necessary functionality, including
lockd support, and support for rpc.statd, NFSv4 idmapper and
RPCSEC_GSS upcalls into the correct network namespace from which
the mount system call was issued.
- NFSv4 idmapper scalability improvements
Base the idmapper cache on the keyring interface to allow
concurrent access to idmapper entries. Start the process of
migrating users from the single-threaded daemon-based approach to
the multi-threaded request-key based approach.
- NFSv4.1 implementation id.
Allows the NFSv4.1 client and server to mutually identify each
other for logging and debugging purposes.
- Support the 'vers=4.1' mount option for mounting NFSv4.1 instead of
having to use the more counterintuitive 'vers=4,minorversion=1'.
- SUNRPC tracepoints.
Start the process of adding tracepoints in order to improve
debugging of the RPC layer.
- pNFS object layout support for autologin.
Important bugfixes include:
- Fix a bug in rpc_wake_up/rpc_wake_up_status that caused them to
fail to wake up all tasks when applied to priority waitqueues.
- Ensure that we handle read delegations correctly, when we try to
truncate a file.
- A number of fixes for NFSv4 state manager loops (mostly to do with
delegation recovery)."
* tag 'nfs-for-3.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (224 commits)
NFS: fix sb->s_id in nfs debug prints
xprtrdma: Remove assumption that each segment is <= PAGE_SIZE
xprtrdma: The transport should not bug-check when a dup reply is received
pnfs-obj: autologin: Add support for protocol autologin
NFS: Remove nfs4_setup_sequence from generic rename code
NFS: Remove nfs4_setup_sequence from generic unlink code
NFS: Remove nfs4_setup_sequence from generic read code
NFS: Remove nfs4_setup_sequence from generic write code
NFS: Fix more NFS debug related build warnings
SUNRPC/LOCKD: Fix build warnings when CONFIG_SUNRPC_DEBUG is undefined
nfs: non void functions must return a value
SUNRPC: Kill compiler warning when RPC_DEBUG is unset
SUNRPC/NFS: Add Kbuild dependencies for NFS_DEBUG/RPC_DEBUG
NFS: Use cond_resched_lock() to reduce latencies in the commit scans
NFSv4: It is not safe to dereference lsp->ls_state in release_lockowner
NFS: ncommit count is being double decremented
SUNRPC: We must not use list_for_each_entry_safe() in rpc_wake_up()
Try using machine credentials for RENEW calls
NFSv4.1: Fix a few issues in filelayout_commit_pagelist
NFSv4.1: Clean ups and bugfixes for the pNFS read/writeback/commit code
...
The test for "if (cred->request_key_auth->flags & KEY_FLAG_REVOKED) {"
should actually testing that the (1 << KEY_FLAG_REVOKED) bit is set.
The current code actually checks for KEY_FLAG_DEAD.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
The keyctl_set_timeout function isn't exported to other parts of the
kernel, but I want to use it for the NFS idmapper. I already have the
key, but I wanted a generic way to set the timeout.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* git://git.samba.org/sfrench/cifs-2.6:
CIFS: Rename *UCS* functions to *UTF16*
[CIFS] ACL and FSCACHE support no longer EXPERIMENTAL
[CIFS] Fix build break with multiuser patch when LANMAN disabled
cifs: warn about impending deprecation of legacy MultiuserMount code
cifs: fetch credentials out of keyring for non-krb5 auth multiuser mounts
cifs: sanitize username handling
keys: add a "logon" key type
cifs: lower default wsize when unix extensions are not used
cifs: better instrumentation for coalesce_t2
cifs: integer overflow in parse_dacl()
cifs: Fix sparse warning when calling cifs_strtoUCS
CIFS: Add descriptions to the brlock cache functions
Replace the rcu_assign_pointer() calls with rcu_assign_keypointer().
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
The kernel contains some special internal keyrings, for instance the DNS
resolver keyring :
2a93faf1 I----- 1 perm 1f030000 0 0 keyring .dns_resolver: empty
It would occasionally be useful to allow the contents of such keyrings to be
flushed by root (cache invalidation).
Allow a flag to be set on a keyring to mark that someone possessing the
sysadmin capability can clear the keyring, even without normal write access to
the keyring.
Set this flag on the special keyrings created by the DNS resolver, the NFS
identity mapper and the CIFS identity mapper.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
For CIFS, we want to be able to store NTLM credentials (aka username
and password) in the keyring. We do not, however want to allow users
to fetch those keys back out of the keyring since that would be a
security risk.
Unfortunately, due to the nuances of key permission bits, it's not
possible to do this. We need to grant search permissions so the kernel
can find these keys, but that also implies permissions to read the
payload.
Resolve this by adding a new key_type. This key type is essentially
the same as key_type_user, but does not define a .read op. This
prevents the payload from ever being visible from userspace. This
key type also vets the description to ensure that it's "qualified"
by checking to ensure that it has a ':' in it that is preceded by
other characters.
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Enabling CONFIG_PROVE_RCU and CONFIG_SPARSE_RCU_POINTER resulted in
"suspicious rcu_dereference_check() usage!" and "incompatible types
in comparison expression (different address spaces)" messages.
Access the masterkey directly when holding the rwsem.
Changelog v1:
- Use either rcu_read_lock()/rcu_derefence_key()/rcu_read_unlock()
or remove the unnecessary rcu_derefence() - David Howells
Reported-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Define rcu_assign_keypointer(), which uses the key payload.rcudata instead
of payload.data, to resolve the CONFIG_SPARSE_RCU_POINTER message:
"incompatible types in comparison expression (different address spaces)"
Replace the rcu_assign_pointer() calls in encrypted/trusted keys with
rcu_assign_keypointer().
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Add missing smp_rmb() primitives to the keyring search code.
When keyring payloads are appended to without replacement (thus using up spare
slots in the key pointer array), an smp_wmb() is issued between the pointer
assignment and the increment of the key count (nkeys).
There should be corresponding read barriers between the read of nkeys and
dereferences of keys[n] when n is dependent on the value of nkeys.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Give keys their own lockdep class to differentiate them from each other in case
a key of one type has to refer to a key of another type.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Encrypted keys are encrypted/decrypted using either a trusted or
user-defined key type, which is referred to as the 'master' key.
The master key may be of type trusted iff the trusted key is
builtin or both the trusted key and encrypted keys are built as
modules. This patch resolves the build dependency problem.
- Use "masterkey-$(CONFIG_TRUSTED_KEYS)-$(CONFIG_ENCRYPTED_KEYS)" construct
to encapsulate the above logic. (Suggested by Dimtry Kasatkin.)
- Fixing the encrypted-keys Makefile, results in a module name change
from encrypted.ko to encrypted-keys.ko.
- Add module dependency for request_trusted_key() definition
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
The basic idea behind cross memory attach is to allow MPI programs doing
intra-node communication to do a single copy of the message rather than a
double copy of the message via shared memory.
The following patch attempts to achieve this by allowing a destination
process, given an address and size from a source process, to copy memory
directly from the source process into its own address space via a system
call. There is also a symmetrical ability to copy from the current
process's address space into a destination process's address space.
- Use of /proc/pid/mem has been considered, but there are issues with
using it:
- Does not allow for specifying iovecs for both src and dest, assuming
preadv or pwritev was implemented either the area read from or
written to would need to be contiguous.
- Currently mem_read allows only processes who are currently
ptrace'ing the target and are still able to ptrace the target to read
from the target. This check could possibly be moved to the open call,
but its not clear exactly what race this restriction is stopping
(reason appears to have been lost)
- Having to send the fd of /proc/self/mem via SCM_RIGHTS on unix
domain socket is a bit ugly from a userspace point of view,
especially when you may have hundreds if not (eventually) thousands
of processes that all need to do this with each other
- Doesn't allow for some future use of the interface we would like to
consider adding in the future (see below)
- Interestingly reading from /proc/pid/mem currently actually
involves two copies! (But this could be fixed pretty easily)
As mentioned previously use of vmsplice instead was considered, but has
problems. Since you need the reader and writer working co-operatively if
the pipe is not drained then you block. Which requires some wrapping to
do non blocking on the send side or polling on the receive. In all to all
communication it requires ordering otherwise you can deadlock. And in the
example of many MPI tasks writing to one MPI task vmsplice serialises the
copying.
There are some cases of MPI collectives where even a single copy interface
does not get us the performance gain we could. For example in an
MPI_Reduce rather than copy the data from the source we would like to
instead use it directly in a mathops (say the reduce is doing a sum) as
this would save us doing a copy. We don't need to keep a copy of the data
from the source. I haven't implemented this, but I think this interface
could in the future do all this through the use of the flags - eg could
specify the math operation and type and the kernel rather than just
copying the data would apply the specified operation between the source
and destination and store it in the destination.
Although we don't have a "second user" of the interface (though I've had
some nibbles from people who may be interested in using it for intra
process messaging which is not MPI). This interface is something which
hardware vendors are already doing for their custom drivers to implement
fast local communication. And so in addition to this being useful for
OpenMPI it would mean the driver maintainers don't have to fix things up
when the mm changes.
There was some discussion about how much faster a true zero copy would
go. Here's a link back to the email with some testing I did on that:
http://marc.info/?l=linux-mm&m=130105930902915&w=2
There is a basic man page for the proposed interface here:
http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt
This has been implemented for x86 and powerpc, other architecture should
mainly (I think) just need to add syscall numbers for the process_vm_readv
and process_vm_writev. There are 32 bit compatibility versions for
64-bit kernels.
For arch maintainers there are some simple tests to be able to quickly
verify that the syscalls are working correctly here:
http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgz
Signed-off-by: Chris Yeoh <yeohc@au1.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: <linux-man@vger.kernel.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For each hex2bin call in encrypted keys, check that the ascii hex string
is valid. On failure, return -EINVAL.
Changelog v1:
- hex2bin now returns an int
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
For each hex2bin call in trusted keys, check that the ascii hex string is
valid. On failure, return -EINVAL.
Changelog v1:
- hex2bin now returns an int
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Fixes this build error:
security/keys/encrypted-keys/masterkey_trusted.c: In function 'request_trusted_key':
security/keys/encrypted-keys/masterkey_trusted.c:35:2: error: implicit declaration of function 'IS_ERR'
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Encrypted keys are decrypted/encrypted using either a trusted-key or,
for those systems without a TPM, a user-defined key. This patch
removes the trusted-keys and TCG_TPM dependencies.
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
unregister_key_type() has code to mark a key as dead and make it unavailable in
one loop and then destroy all those unavailable key payloads in the next loop.
However, the loop to mark keys dead renders the key undetectable to the second
loop by changing the key type pointer also.
Fix this by the following means:
(1) The key code has two garbage collectors: one deletes unreferenced keys and
the other alters keyrings to delete links to old dead, revoked and expired
keys. They can end up holding each other up as both want to scan the key
serial tree under spinlock. Combine these into a single routine.
(2) Move the dead key marking, dead link removal and dead key removal into the
garbage collector as a three phase process running over the three cycles
of the normal garbage collection procedure. This is tracked by the
KEY_GC_REAPING_DEAD_1, _2 and _3 state flags.
unregister_key_type() then just unlinks the key type from the list, wakes
up the garbage collector and waits for the third phase to complete.
(3) Downgrade the key types sem in unregister_key_type() once it has deleted
the key type from the list so that it doesn't block the keyctl() syscall.
(4) Dead keys that cannot be simply removed in the third phase have their
payloads destroyed with the key's semaphore write-locked to prevent
interference by the keyctl() syscall. There should be no in-kernel users
of dead keys of that type by the point of unregistration, though keyctl()
may be holding a reference.
(5) Only perform timer recalculation in the GC if the timer actually expired.
If it didn't, we'll get another cycle when it goes off - and if the key
that actually triggered it has been removed, it's not a problem.
(6) Only garbage collect link if the timer expired or if we're doing dead key
clean up phase 2.
(7) As only key_garbage_collector() is permitted to use rb_erase() on the key
serial tree, it doesn't need to revalidate its cursor after dropping the
spinlock as the node the cursor points to must still exist in the tree.
(8) Drop the spinlock in the GC if there is contention on it or if we need to
reschedule. After dealing with that, get the spinlock again and resume
scanning.
This has been tested in the following ways:
(1) Run the keyutils testsuite against it.
(2) Using the AF_RXRPC and RxKAD modules to test keytype removal:
Load the rxrpc_s key type:
# insmod /tmp/af-rxrpc.ko
# insmod /tmp/rxkad.ko
Create a key (http://people.redhat.com/~dhowells/rxrpc/listen.c):
# /tmp/listen &
[1] 8173
Find the key:
# grep rxrpc_s /proc/keys
091086e1 I--Q-- 1 perm 39390000 0 0 rxrpc_s 52:2
Link it to a session keyring, preferably one with a higher serial number:
# keyctl link 0x20e36251 @s
Kill the process (the key should remain as it's linked to another place):
# fg
/tmp/listen
^C
Remove the key type:
rmmod rxkad
rmmod af-rxrpc
This can be made a more effective test by altering the following part of
the patch:
if (unlikely(gc_state & KEY_GC_REAPING_DEAD_2)) {
/* Make sure everyone revalidates their keys if we marked a
* bunch as being dead and make sure all keyring ex-payloads
* are destroyed.
*/
kdebug("dead sync");
synchronize_rcu();
To call synchronize_rcu() in GC phase 1 instead. That causes that the
keyring's old payload content to hang around longer until it's RCU
destroyed - which usually happens after GC phase 3 is complete. This
allows the destroy_dead_key branch to be tested.
Reported-by: Benjamin Coddington <bcodding@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>