Commit Graph

687 Commits

Author SHA1 Message Date
Yu Zhiguo 3c8e03166a NFSv4: do exact check about attribute specified
Server should return NFS4ERR_ATTRNOTSUPP if an attribute specified is
not supported in current environment.
Operations CREATE, NVERIFY, OPEN, SETATTR and VERIFY should do this check.

This bug is found when do newpynfs tests. The names of the tests that failed
are following:
  CR12 NVF7a NVF7b NVF7c NVF7d NVF7f NVF7r NVF7s
  OPEN15 VF7a VF7b VF7c VF7d VF7f VF7r VF7s

Add function do_check_fattr() to do exact check:
1, Check attribute specified is supported by the NFSv4 server or not.
2, Check FATTR4_WORD0_ACL & FATTR4_WORD0_FS_LOCATIONS are supported
   in current environment or not.
3, Check attribute specified is writable or not.

step 1 and 3 are done in function nfsd4_decode_fattr() but removed
to this function now.

Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-06-01 18:01:54 -04:00
Greg Banks 1dbd0d53f3 knfsd: remove unreported filehandle stats counters
The file nfsfh.c contains two static variables nfsd_nr_verified and
nfsd_nr_put.  These are counters which are incremented as a side
effect of the fh_verify() fh_compose() and fh_put() operations,
i.e. at least twice per NFS call for any non-trivial workload.
Needless to say this makes the cacheline that contains them (and any
other innocent victims) a very hot contention point indeed under high
call-rate workloads on multiprocessor NFS server.  It also turns out
that these counters are not used anywhere.  They're not reported to
userspace, they're not used in logic, they're not even exported from
the object file (let alone the module).  All they do is waste CPU time.

So this patch removes them.

Tests on a 16 CPU Altix A4700 with 2 10gige Myricom cards, configured
separately (no bonding).  Workload is 640 client threads doing directory
traverals with random small reads, from server RAM.

Before
======

Kernel profile:

  %   cumulative   self              self     total
 time   samples   samples    calls   1/call   1/call  name
  6.05   2716.00  2716.00    30406     0.09     1.02  svc_process
  4.44   4706.00  1990.00     1975     1.01     1.01  spin_unlock_irqrestore
  3.72   6376.00  1670.00     1666     1.00     1.00  svc_export_put
  3.41   7907.00  1531.00     1786     0.86     1.02  nfsd_ofcache_lookup
  3.25   9363.00  1456.00    10965     0.13     1.01  nfsd_dispatch
  3.10  10752.00  1389.00     1376     1.01     1.01  nfsd_cache_lookup
  2.57  11907.00  1155.00     4517     0.26     1.03  svc_tcp_recvfrom
  ...
  2.21  15352.00  1003.00     1081     0.93     1.00  nfsd_choose_ofc  <----
  ^^^^

Here the function nfsd_choose_ofc() reads a global variable
which by accident happened to be located in the same cacheline as
nfsd_nr_verified.

Call rate:

nullarbor:~ # pmdumptext nfs3.server.calls
...
Thu Dec 13 00:15:27     184780.663
Thu Dec 13 00:15:28     184885.881
Thu Dec 13 00:15:29     184449.215
Thu Dec 13 00:15:30     184971.058
Thu Dec 13 00:15:31     185036.052
Thu Dec 13 00:15:32     185250.475
Thu Dec 13 00:15:33     184481.319
Thu Dec 13 00:15:34     185225.737
Thu Dec 13 00:15:35     185408.018
Thu Dec 13 00:15:36     185335.764

After
=====

kernel profile:

  %   cumulative   self              self     total
 time   samples   samples    calls   1/call   1/call  name
  6.33   2813.00  2813.00    29979     0.09     1.01  svc_process
  4.66   4883.00  2070.00     2065     1.00     1.00  spin_unlock_irqrestore
  4.06   6687.00  1804.00     2182     0.83     1.00  nfsd_ofcache_lookup
  3.20   8110.00  1423.00    10932     0.13     1.00  nfsd_dispatch
  3.03   9456.00  1346.00     1343     1.00     1.00  nfsd_cache_lookup
  2.62  10622.00  1166.00     4645     0.25     1.01  svc_tcp_recvfrom
[...]
  0.10  42586.00    44.00       74     0.59     1.00  nfsd_choose_ofc  <--- HA!!
  ^^^^

Call rate:

nullarbor:~ # pmdumptext nfs3.server.calls
...
Thu Dec 13 01:45:28     194677.118
Thu Dec 13 01:45:29     193932.692
Thu Dec 13 01:45:30     194294.364
Thu Dec 13 01:45:31     194971.276
Thu Dec 13 01:45:32     194111.207
Thu Dec 13 01:45:33     194999.635
Thu Dec 13 01:45:34     195312.594
Thu Dec 13 01:45:35     195707.293
Thu Dec 13 01:45:36     194610.353
Thu Dec 13 01:45:37     195913.662
Thu Dec 13 01:45:38     194808.675

i.e. about a 5.3% improvement in call rate.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Reviewed-by: David Chinner <dgc@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-05-27 14:14:03 -04:00
Greg Banks cf0a586cf4 knfsd: fix reply cache memory corruption
Fix a regression in the reply cache introduced when the code was
converted to use proper Linux lists.  When a new entry needs to be
inserted, the case where all the entries are currently being used
by threads is not correctly detected.  This can result in memory
corruption and a crash.  In the current code this is an extremely
unlikely corner case; it would require the machine to have 1024
nfsd threads and all of them to be busy at the same time.  However,
upcoming reply cache changes make this more likely; a crash due to
this problem was actually observed in field.

Signed-off-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-05-27 14:14:02 -04:00
Greg Banks fca4217c5b knfsd: reply cache cleanups
Make REQHASH() an inline function.  Rename hash_list to cache_hash.
Fix an obsolete comment.

Signed-off-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-05-27 14:14:02 -04:00
Wang Chen 02cb2858db nfsd: nfs4_stat_init cleanup
Save some loop time.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-05-06 16:22:41 -04:00
Randy Dunlap 9064caae8f nfsd: use C99 struct initializers
Eliminate 56 sparse warnings like this one:

fs/nfsd/nfs4xdr.c:1331:15: warning: obsolete array initializer, use C99 syntax

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-05-03 15:09:12 -04:00
J. Bruce Fields 63e4863fab nfsd4: make recall callback an asynchronous rpc
As with the probe, this removes the need for another kthread.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-05-03 15:08:56 -04:00
J. Bruce Fields 3aea09dc91 nfsd4: track recall retries in nfs4_delegation
Move this out of a local variable into the nfs4_delegation object in
preparation for making this an async rpc call (at which point we'll need
any state like this in a common object that's preserved across function
calls).

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-05-01 20:11:12 -04:00
J. Bruce Fields 6707bd3d42 nfsd4: remove unused dl_trunc
There's no point in keeping this field around--it's always zero.

(Background: the protocol allows you to tell the client that the file is
about to be truncated, as an optimization to save the client from
writing back dirty pages that will just be discarded.  We don't
implement this hint.  If we do some day, adding this field back in will
be the least of the work involved.)

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-05-01 19:57:46 -04:00
J. Bruce Fields b53d40c507 nfsd4: eliminate struct nfs4_cb_recall
The nfs4_cb_recall struct is used only in nfs4_delegation, so its
pointer to the containing delegation is unnecessary--we could just use
container_of().

But there's no real reason to have this a separate struct at all--just
move these fields to nfs4_delegation.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-05-01 19:50:00 -04:00
J. Bruce Fields c237dc0303 nfsd4: rename callback struct to cb_conn
I want to use the name for a struct that actually does represent a
single callback.

(Actually, I've never been sure it helps to a separate struct for the
callback information.  Some day maybe those fields could just be dumped
into struct nfs4_client.  I don't know.)

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-05-01 17:31:44 -04:00
J. Bruce Fields e300a63ce4 nfsd4: replace callback thread by asynchronous rpc
We don't really need a synchronous rpc, and moving to an asynchronous
rpc allows us to do without this extra kthread.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-29 17:10:53 -04:00
J. Bruce Fields 3cef9ab266 nfsd4: lookup up callback cred only once
Lookup the callback cred once and then use it for all subsequent
callbacks.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-29 16:45:03 -04:00
J. Bruce Fields ecdd03b791 nfsd4: create rpc callback client from server thread
The code is a little simpler, and it should be easier to avoid races, if
we just do all rpc client creation/destruction from nfsd or laundromat
threads and do only the rpc calls themselves asynchronously.  The rpc
creation doesn't involve any significant waiting (it doesn't call the
client, for example), so there's no reason not to do this.

Also don't bother destroying the client on failure of the rpc null
probe.  We may want to retry the probe later anyway.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-29 16:44:53 -04:00
J. Bruce Fields e1cab5a589 nfsd4: set cb_client inside setup_callback_client
This is just a minor code simplification.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-29 16:44:47 -04:00
J. Bruce Fields 595947acaa nfsd4: set shorter timeout
We tried to do something overly complicated with the callback rpc
timeouts here.  And they're wrong--the result is that by the time a
single callback times out, it's already too late to tell the client
(using the cb_path_down return to RENEW) that the callback is down.

Use a much shorter, simpler timeout.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-29 16:44:40 -04:00
J. Bruce Fields f64f79ea5f nfsd4: setclientid_confirm callback-change fixes
This setclientid_confirm case should allow the client to change
callbacks, but it currently has a dummy implementation that just turns
off callbacks completely.  That dummy implementation isn't completely
correct either, though:

	- There's no need to remove any client recovery directory in
	  this case.
	- New clientid confirm verifiers should be generated (and
	  returned) in setclientid; there's no need to generate a new
	  one here.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-29 16:44:34 -04:00
J. Bruce Fields b8fd47aefa nfsd: quiet compile warning
Stephen Rothwell said:
"Today's linux-next build (powerpc ppc64_defconfig) produced this new
warning:

fs/nfsd/nfs4state.c: In function 'EXPIRED_STATEID':
fs/nfsd/nfs4state.c:2757: warning: comparison of distinct pointer types lacks a cast

Caused by commit 78155ed75f ("nfsd4:
distinguish expired from stale stateids")."

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
2009-04-29 11:36:17 -04:00
J. Bruce Fields c654b8a9cb nfsd: support ext4 i_version
ext4 supports a real NFSv4 change attribute, which is bumped whenever
the ctime would be updated, including times when two updates arrive
within a jiffy of each other.  (Note that although ext4 has space for
nanosecond-precision ctime, the real resolution is lower: it actually
uses jiffies as the time-source.)  This ensures clients will invalidate
their caches when they need to.

There is some fear that keeping the i_version up-to-date could have
performance drawbacks, so for now it's turned on only by a mount option.
We hope to do something better eventually.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Theodore Tso <tytso@mit.edu>
2009-04-29 11:35:49 -04:00
J. Bruce Fields 3352d2c2d0 nfsd4: delete obsolete xdr comments
We don't need comments to tell us these macros are ugly.  And we're long
past trying to share any of this code with the BSD's.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-29 11:35:49 -04:00
J. Bruce Fields bc749ca4c4 nfsd: eliminate ENCODE_HEAD macro
This macro doesn't serve any useful purpose.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-29 11:35:49 -04:00
Chuck Lever e06b64050e NFSD: Stricter buffer size checking in fs/nfsd/nfsctl.c
Clean up: For consistency, handle output buffer size checking in a
other nfsctl functions the same way it's done for write_versions().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-28 13:54:30 -04:00
Chuck Lever 261758b5c3 NFSD: Stricter buffer size checking in write_versions()
While it's not likely today that there are enough NFS versions to
overflow the output buffer in write_versions(), we should be more
careful about detecting the end of the buffer.

The number of NFS versions will only increase as NFSv4 minor versions
are added.

Note that this API doesn't behave the same as portlist.  Here we
attempt to display as many versions as will fit in the buffer, and do
not provide any indication that an overflow would have occurred.  I
don't have any good rationale for that.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-28 13:54:30 -04:00
Chuck Lever 3d72ab8fdd NFSD: Stricter buffer size checking in write_recoverydir()
While it's not likely a pathname will be longer than
SIMPLE_TRANSACTION_SIZE, we should be more careful about just
plopping it into the output buffer without bounds checking.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-28 13:54:30 -04:00
Chuck Lever 8435d34dbb SUNRPC: pass buffer size to svc_sock_names()
Adjust the synopsis of svc_sock_names() to pass in the size of the
output buffer.  Add a documenting comment.

This is a cosmetic change for now.  A subsequent patch will make sure
the buffer length is passed to one_sock_name(), where the length will
actually be useful.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-28 13:54:28 -04:00