If the server decides to close the RPC socket, we currently don't actually
respond until either another RPC call is scheduled, or until xprt_autoclose()
gets called by the socket expiry timer (which may be up to 5 minutes
later).
This patch ensures that xprt_autoclose() is called much sooner if the
server closes the socket.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
inode->i_mode contains a lot more than just the mode bits. Make sure that
we mask away this extra stuff in SETATTR calls to the server.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: Every ULP that uses the in-kernel RPC client, except the NLM
client, sets cl_chatty. There's no reason why NLM shouldn't set it, so
just get rid of cl_chatty and always be verbose.
Test-plan:
Compile with CONFIG_NFS enabled.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
At some point, transport endpoint addresses will no longer be IPv4. To hide
the structure of the rpc_xprt's address field from ULPs and port mappers,
add an API for setting the port number during an RPC bind operation.
Test-plan:
Destructive testing (unplugging the network temporarily). Connectathon
with UDP and TCP. NFSv2/3 and NFSv4 mounting should be carefully checked.
Probably need to rig a server where certain services aren't running, or
that returns an error for some typical operation.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We'd like to hide fields in rpc_xprt and rpc_clnt from upper layer protocols.
Start by creating an API to force RPC rebind, replacing logic that simply
sets cl_port to zero.
Test-plan:
Destructive testing (unplugging the network temporarily). Connectathon
with UDP and TCP. NFSv2/3 and NFSv4 mounting should be carefully checked.
Probably need to rig a server where certain services aren't running, or
that returns an error for some typical operation.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add RPC client transport switch support for replacing buffer management
on a per-transport basis.
In the current IPv4 socket transport implementation, RPC buffers are
allocated as needed for each RPC message that is sent. Some transport
implementations may choose to use pre-allocated buffers for encoding,
sending, receiving, and unmarshalling RPC messages, however. For
transports capable of direct data placement, the buffers can be carved
out of a pre-registered area of memory rather than from a slab cache.
Test-plan:
Millions of fsx operations. Performance characterization with "sio" and
"iozone". Use oprofile and other tools to look for significant regression
in CPU utilization.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Thanks to Ed Keizer for bug and root cause. He says: "... we could only mount
the top-level Solaris share. We could not mount deeper into the tree.
Investigation showed that Solaris allows UNIX authenticated FSINFO only on the
top level of the share. This is a problem because we share/export our home
directories one level higher than we mount them. I.e. we share the partition
and not the individual home directories. This prevented access to home
directories."
We still may need to try auth_sys for the case where the client doesn't have
appropriate credentials.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The procedure that decodes statd sm_notify call seems to be skipping a
few arguments. How did this ever work?
>From folks at Polyserve.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If the server receives an NLM cancel call and finds no waiting lock to
cancel, then chances are the lock has already been applied, and the client
just hadn't yet processed the NLM granted callback before it sent the
cancel.
The Open Group text, for example, perimts a server to return either success
(LCK_GRANTED) or failure (LCK_DENIED) in this case. But returning an error
seems more helpful; the client may be able to use it to recognize that a
race has occurred and to recover from the race.
So, modify the relevant functions to return an error in this case.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently when lockd gets an NLM_CANCEL request, it also does an unlock for
the same range. This is incorrect.
The Open Group documentation says that "This procedure cancels an
*outstanding* blocked lock request." (Emphasis mine.)
Also, consider a client that holds a lock on the first byte of a file, and
requests a lock on the entire file. If the client cancels that request
(perhaps because the requesting process is signalled), the server shouldn't
apply perform an unlock on the entire file, since that will also remove the
previous lock that the client was already granted.
Or consider a lock request that actually *downgraded* an exclusive lock to
a shared lock.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Slightly simpler logic here makes it more trivial to verify that the up's
and down's are balanced here. Break out an assignment from a conditional
while we're at it.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Upon return of a write delegation, the server will almost always bump the
change attribute. Ensure that we pick up that change so that we don't
invalidate our data cache unnecessarily.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
According to RFC3530 we're supposed to cache the change attribute
at the time the client receives a write delegation.
If the inode is clean, a CB_GETATTR callback by the server to the
client is supposed to return the cached change attribute.
If, OTOH, the inode is dirty, the client should bump the cached
change attribute by 1.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The SuS states that a call to write() will cause mtime to be updated on
the file. In order to satisfy that requirement, we need to flush out
any cached writes in nfs_getattr().
Speed things up slightly by not committing the writes.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Most NFS server implementations allow up to 64KB reads and writes on the
wire. The Solaris NFS server allows up to a megabyte, for instance.
Now the Linux NFS client supports transfer sizes up to 1MB, too. This will
help reduce protocol and context switch overhead on read/write intensive NFS
workloads, and support larger atomic read and write operations on servers
that support them.
Test-plan:
Connectathon and iozone on mount point with wsize=rsize>32768 over TCP.
Tests with NFS over UDP to verify the maximum RPC payload size cap.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
To help NFS users and server developers, make the "inode number mismatch"
message display more useful information.
Test-plan:
None.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
nfs_statfs() generates a log message when GETATTR returns an error. This
is usually a useless message. Make it a dprintk.
Test plan:
None
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>