Commit Graph

1761 Commits

Author SHA1 Message Date
Linus Torvalds 22b4e63ebe Merge tag 'nfs-for-3.6-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:

 - Final (hopefully) fix for the range checking code in NFSv4 getacl.
   This should fix the Oopses being seen when the acl size is close to
   PAGE_SIZE.
 - Fix a regression with the legacy binary mount code
 - Fix a regression in the readdir cookieverf initialisation
 - Fix an RPC over UDP regression
 - Ensure that we report all errors in the NFSv4 open code
 - Ensure that fsync() reports all relevant synchronisation errors.

* tag 'nfs-for-3.6-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: fsync() must exit with an error if page writeback failed
  SUNRPC: Fix a UDP transport regression
  NFS: return error from decode_getfh in decode open
  NFSv4: Fix buffer overflow checking in __nfs4_get_acl_uncached
  NFSv4: Fix range checking in __nfs4_get_acl_uncached and __nfs4_proc_set_acl
  NFS: Fix a problem with the legacy binary mount code
  NFS: Fix the initialisation of the readdir 'cookieverf' array
2012-09-13 09:04:13 +08:00
Trond Myklebust f39c1bfb5a SUNRPC: Fix a UDP transport regression
Commit 43cedbf0e8 (SUNRPC: Ensure that
we grab the XPRT_LOCK before calling xprt_alloc_slot) is causing
hangs in the case of NFS over UDP mounts.

Since neither the UDP or the RDMA transport mechanism use dynamic slot
allocation, we can skip grabbing the socket lock for those transports.
Add a new rpc_xprt_op to allow switching between the TCP and UDP/RDMA
case.

Note that the NFSv4.1 back channel assigns the slot directly
through rpc_run_bc_task, so we can ignore that case.

Reported-by: Dick Streefland <dick.streefland@altium.nl>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 3.1]
2012-09-07 11:43:49 -04:00
J. Bruce Fields d10f27a750 svcrpc: fix svc_xprt_enqueue/svc_recv busy-looping
The rpc server tries to ensure that there will be room to send a reply
before it receives a request.

It does this by tracking, in xpt_reserved, an upper bound on the total
size of the replies that is has already committed to for the socket.

Currently it is adding in the estimate for a new reply *before* it
checks whether there is space available.  If it finds that there is not
space, it then subtracts the estimate back out.

This may lead the subsequent svc_xprt_enqueue to decide that there is
space after all.

The results is a svc_recv() that will repeatedly return -EAGAIN, causing
server threads to loop without doing any actual work.

Cc: stable@vger.kernel.org
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Tested-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-20 18:39:19 -04:00
J. Bruce Fields f06f00a24d svcrpc: sends on closed socket should stop immediately
svc_tcp_sendto sets XPT_CLOSE if we fail to transmit the entire reply.
However, the XPT_CLOSE won't be acted on immediately.  Meanwhile other
threads could send further replies before the socket is really shut
down.  This can manifest as data corruption: for example, if a truncated
read reply is followed by another rpc reply, that second reply will look
to the client like further read data.

Symptoms were data corruption preceded by svc_tcp_sendto logging
something like

	kernel: rpc-srv/tcp: nfsd: sent only 963696 when sending 1048708 bytes - shutting down socket

Cc: stable@vger.kernel.org
Reported-by: Malahal Naineni <malahal@us.ibm.com>
Tested-by: Malahal Naineni <malahal@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-20 18:38:59 -04:00
J. Bruce Fields be1e44441a svcrpc: fix BUG() in svc_tcp_clear_pages
Examination of svc_tcp_clear_pages shows that it assumes sk_tcplen is
consistent with sk_pages[] (in particular, sk_pages[n] can't be NULL if
sk_tcplen would lead us to expect n pages of data).

svc_tcp_restore_pages zeroes out sk_pages[] while leaving sk_tcplen.
This is OK, since both functions are serialized by XPT_BUSY.  However,
that means the inconsistency must be repaired before dropping XPT_BUSY.

Therefore we should be ensuring that svc_tcp_save_pages repairs the
problem before exiting svc_tcp_recv_record on error.

Symptoms were a BUG() in svc_tcp_clear_pages.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-20 18:38:44 -04:00
Linus Torvalds ac694dbdbc Merge branch 'akpm' (Andrew's patch-bomb)
Merge Andrew's second set of patches:
 - MM
 - a few random fixes
 - a couple of RTC leftovers

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (120 commits)
  rtc/rtc-88pm80x: remove unneed devm_kfree
  rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails
  mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables
  tmpfs: distribute interleave better across nodes
  mm: remove redundant initialization
  mm: warn if pg_data_t isn't initialized with zero
  mips: zero out pg_data_t when it's allocated
  memcg: gix memory accounting scalability in shrink_page_list
  mm/sparse: remove index_init_lock
  mm/sparse: more checks on mem_section number
  mm/sparse: optimize sparse_index_alloc
  memcg: add mem_cgroup_from_css() helper
  memcg: further prevent OOM with too many dirty pages
  memcg: prevent OOM with too many dirty pages
  mm: mmu_notifier: fix freed page still mapped in secondary MMU
  mm: memcg: only check anon swapin page charges for swap cache
  mm: memcg: only check swap cache pages for repeated charging
  mm: memcg: split swapin charge function into private and public part
  mm: memcg: remove needless !mm fixup to init_mm when charging
  mm: memcg: remove unneeded shmem charge type
  ...
2012-07-31 19:25:39 -07:00
Linus Torvalds 6dbb35b0a7 Merge tag 'nfs-for-3.6-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull second wave of NFS client updates from Trond Myklebust:

 - Patches from Bryan to allow splitting of the NFSv2/v3/v4 code into
   separate modules.

 - Fix Oopses in the NFSv4 idmapper

 - Fix a deadlock whereby rpciod tries to allocate a new socket and ends
   up recursing into the NFS code due to memory reclaim.

 - Increase the number of permitted callback connections.

* tag 'nfs-for-3.6-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  nfs: explicitly reject LOCK_MAND flock() requests
  nfs: increase number of permitted callback connections.
  SUNRPC: return negative value in case rpcbind client creation error
  NFS: Convert v4 into a module
  NFS: Convert v3 into a module
  NFS: Convert v2 into a module
  NFS: Keep module parameters in the generic NFS client
  NFS: Split out remaining NFS v4 inode functions
  NFS: Pass super operations and xattr handlers in the nfs_subversion
  NFS: Only initialize the ACL client in the v3 case
  NFS: Create a try_mount rpc op
  NFS: Remove the NFS v4 xdev mount function
  NFS: Add version registering framework
  NFS: Fix a number of bugs in the idmapper
  nfs: skip commit in releasepage if we're freeing memory for fs-related reasons
  sunrpc: clarify comments on rpc_make_runnable
  pnfsblock: bail out partial page IO
2012-07-31 18:45:44 -07:00
Mel Gorman a564b8f039 nfs: enable swap on NFS
Implement the new swapfile a_ops for NFS and hook up ->direct_IO.  This
will set the NFS socket to SOCK_MEMALLOC and run socket reconnect under
PF_MEMALLOC as well as reset SOCK_MEMALLOC before engaging the protocol
->connect() method.

PF_MEMALLOC should allow the allocation of struct socket and related
objects and the early (re)setting of SOCK_MEMALLOC should allow us to
receive the packets required for the TCP connection buildup.

[jlayton@redhat.com: Restore PF_MEMALLOC task flags in all cases]
[dfeng@redhat.com: Fix handling of multiple swap files]
[a.p.zijlstra@chello.nl: Original patch]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Neil Brown <neilb@suse.de>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:48 -07:00
Linus Torvalds 08843b79fb Merge branch 'nfsd-next' of git://linux-nfs.org/~bfields/linux
Pull nfsd changes from J. Bruce Fields:
 "This has been an unusually quiet cycle--mostly bugfixes and cleanup.
  The one large piece is Stanislav's work to containerize the server's
  grace period--but that in itself is just one more step in a
  not-yet-complete project to allow fully containerized nfs service.

  There are a number of outstanding delegation, container, v4 state, and
  gss patches that aren't quite ready yet; 3.7 may be wilder."

* 'nfsd-next' of git://linux-nfs.org/~bfields/linux: (35 commits)
  NFSd: make boot_time variable per network namespace
  NFSd: make grace end flag per network namespace
  Lockd: move grace period management from lockd() to per-net functions
  LockD: pass actual network namespace to grace period management functions
  LockD: manage grace list per network namespace
  SUNRPC: service request network namespace helper introduced
  NFSd: make nfsd4_manager allocated per network namespace context.
  LockD: make lockd manager allocated per network namespace
  LockD: manage grace period per network namespace
  Lockd: add more debug to host shutdown functions
  Lockd: host complaining function introduced
  LockD: manage used host count per networks namespace
  LockD: manage garbage collection timeout per networks namespace
  LockD: make garbage collector network namespace aware.
  LockD: mark host per network namespace on garbage collect
  nfsd4: fix missing fault_inject.h include
  locks: move lease-specific code out of locks_delete_lock
  locks: prevent side-effects of locks_release_private before file_lock is initialized
  NFSd: set nfsd_serv to NULL after service destruction
  NFSd: introduce nfsd_destroy() helper
  ...
2012-07-31 14:42:28 -07:00
Linus Torvalds 1fad1e9a74 Merge tag 'nfs-for-3.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
 "Features include:
   - More preparatory patches for modularising NFSv2/v3/v4.  Split out
     the various NFSv2/v3/v4-specific code into separate files
   - More preparation for the NFSv4 migration code
   - Ensure that OPEN(O_CREATE) observes the pNFS mds threshold
     parameters
   - pNFS fast failover when the data servers are down
   - Various cleanups and debugging patches"

* tag 'nfs-for-3.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (67 commits)
  nfs: fix fl_type tests in NFSv4 code
  NFS: fix pnfs regression with directio writes
  NFS: fix pnfs regression with directio reads
  sunrpc: clnt: Add missing braces
  nfs: fix stub return type warnings
  NFS: exit_nfs_v4() shouldn't be an __exit function
  SUNRPC: Add a missing spin_unlock to gss_mech_list_pseudoflavors
  NFS: Split out NFS v4 client functions
  NFS: Split out the NFS v4 filesystem types
  NFS: Create a single nfs_clone_super() function
  NFS: Split out NFS v4 server creating code
  NFS: Initialize the NFS v4 client from init_nfs_v4()
  NFS: Move the v4 getroot code to nfs4getroot.c
  NFS: Split out NFS v4 file operations
  NFS: Initialize v4 sysctls from nfs_init_v4()
  NFS: Create an init_nfs_v4() function
  NFS: Split out NFS v4 inode operations
  NFS: Split out NFS v3 inode operations
  NFS: Split out NFS v2 inode operations
  NFS: Clean up nfs4_proc_setclientid() and friends
  ...
2012-07-30 19:16:57 -07:00
Stanislav Kinsbursky caea33da89 SUNRPC: return negative value in case rpcbind client creation error
Without this patch kernel will panic on LockD start, because lockd_up() checks
lockd_up_net() result for negative value.
From my pow it's better to return negative value from rpcbind routines instead
of replacing all such checks like in lockd_up().

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 3.0]
2012-07-30 20:39:05 -04:00
Jeff Layton 5cf02d09b5 nfs: skip commit in releasepage if we're freeing memory for fs-related reasons
We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:

    PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
     #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
     #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
     #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
     #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
     #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
     #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
     #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
     #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
     #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
     #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
    #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
    #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
    #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
    #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
    #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
    #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
    #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
    #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
    #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
    #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
    #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
    #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
    #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
    #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
    #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
    #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca

rpciod is trying to allocate memory for a new socket to talk to the
server. The VM ends up calling ->releasepage to get more memory, and it
tries to do a blocking commit. That commit can't succeed however without
a connected socket, so we deadlock.

Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a commit call. Also, set PF_FSTRANS
unconditionally in rpc_async_schedule since that function can also do
allocations sometimes.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-07-30 18:55:59 -04:00
Jeff Layton 506026c3ec sunrpc: clarify comments on rpc_make_runnable
rpc_make_runnable is not generally called with the queue lock held, unless
it's waking up a task that has been sitting on a waitqueue. This is safe
when the task has not entered the FSM yet, but the comments don't really
spell this out.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30 18:55:08 -04:00
Joe Perches cac5d07e3c sunrpc: clnt: Add missing braces
Add a missing set of braces that commit 4e0038b6b2
("SUNRPC: Move clnt->cl_server into struct rpc_xprt")
forgot.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 3.4]
2012-07-30 17:58:08 -04:00
Eric Dumazet ddbe503203 ipv6: add ipv6_addr_hash() helper
Introduce ipv6_addr_hash() helper doing a XOR on all bits
of an IPv6 address, with an optimized x86_64 version.

Use it in flow dissector, as suggested by Andrew McGregor,
to reduce hash collision probabilities in fq_codel (and other
users of flow dissector)

Use it in ip6_tunnel.c and use more bit shuffling, as suggested
by David Laight, as existing hash was ignoring most of them.

Use it in sunrpc and use more bit shuffling, using hash_32().

Use it in net/ipv6/addrconf.c, using hash_32() as well.

As a cleanup, use it in net/ipv4/tcp_metrics.c

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrew McGregor <andrewmcgr@gmail.com>
Cc: Dave Taht <dave.taht@gmail.com>
Cc: Tom Herbert <therbert@google.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-18 11:28:46 -07:00
Trond Myklebust 013448c59b SUNRPC: Add a missing spin_unlock to gss_mech_list_pseudoflavors
The patch "SUNRPC: Add rpcauth_list_flavors()" introduces a new error
path in gss_mech_list_pseudoflavors, but fails to release the spin lock.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-17 17:02:57 -04:00
Chuck Lever 6a1a1e34dc SUNRPC: Add rpcauth_list_flavors()
The gss_mech_list_pseudoflavors() function provides a list of
currently registered GSS pseudoflavors.  This list does not include
any non-GSS flavors that have been registered with the RPC client.
nfs4_find_root_sec() currently adds these extra flavors by hand.

Instead, nfs4_find_root_sec() should be looking at the set of flavors
that have been explicitly registered via rpcauth_register().  And,
other areas of code will soon need the same kind of list that
contains all flavors the kernel currently knows about (see below).

Rather than cloning the open-coded logic in nfs4_find_root_sec() to
those new places, introduce a generic RPC function that generates a
full list of registered auth flavors and pseudoflavors.

A new rpc_authops method is added that lists a flavor's
pseudoflavors, if it has any.  I encountered an interesting module
loader loop when I tried to get the RPC client to invoke
gss_mech_list_pseudoflavors() by name.

This patch is a pre-requisite for server trunking discovery, and a
pre-requisite for fixing up the in-kernel mount client to do better
automatic security flavor selection.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-16 15:12:15 -04:00
NeilBrown 200724a707 SUNRPC/cache: fix reporting of expired cache entries in 'content' file.
Entries that are in a sunrpc cache but are not valid should be reported
with a leading '#' so they look like a comment.
Commit  d202cce896 (sunrpc: never return expired entries in sunrpc_cache_lookup)
broke this for expired entries.

This particularly applies to entries that have been replaced by newer entries.
sunrpc_cache_update sets the expiry of the replaced entry to '0', but it
remains in the cache until the next 'cache_clean'.
The result is that if you

  echo 0 2000000000 1 0 > /proc/net/rpc/auth.unix.gid/channel

several times, then

  cat /proc/net/rpc/auth.unix.gid/content

It will display multiple entries for the one uid, which is at least confusing:

  #uid cnt: gids...
  0 1: 0
  0 1: 0
  0 1: 0

With this patch, expired entries are marked as comments so you get

  #uid cnt: gids...
  0 1: 0
  # 0 1: 0
  # 0 1: 0

These expired entries will never be seen by cache_check() as they are always
*after* a non-expired entry with the same key - so the extra check is only
needed in c_show()

Signed-off-by:  NeilBrown <neilb@suse.de>

--
It's not a big problem, but it had me confused for a while, so it could
well confuse others.
Thanks,
NeilBrown
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-12 11:20:28 -04:00
Ben Hutchings 2c53040f01 net: Fix (nearly-)kernel-doc comments for various functions
Fix incorrect start markers, wrapped summary lines, missing section
breaks, incorrect separators, and some name mismatches.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10 23:13:45 -07:00
David S. Miller 60d354ebeb sunrpc: Don't do a dst_confirm() on an input routes.
xs_udp_data_ready() is operating on received packets, and tries to
do a dst_confirm() on the dst attached to the SKB.

This isn't right, dst confirmation is for output routes, not input
rights.  It's for resetting the timers on the nexthop neighbour entry
for the route, indicating that we've got good evidence that we've
successfully reached it.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-05 01:02:43 -07:00
David S. Miller b26d344c6b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/caif/caif_hsi.c
	drivers/net/usb/qmi_wwan.c

The qmi_wwan merge was trivial.

The caif_hsi.c, on the other hand, was not.  It's a conflict between
1c385f1fdf ("caif-hsi: Replace platform
device with ops structure.") in the net-next tree and commit
39abbaef19 ("caif-hsi: Postpone init of
HIS until open()") in the net tree.

I did my best with that one and will ask Sjur to check it out.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-28 17:37:00 -07:00
Trond Myklebust 140150dbb1 SUNRPC: Remove unused function xdr_encode_pages
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-28 17:20:49 -04:00
Trond Myklebust f8bb7f0854 SUNRPC: Clean up xdr_enter_page
Use the xdr_align_pages() helper

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-28 17:20:46 -04:00
Trond Myklebust 3994ee6fbf SUNRPC: Clean up xdr_read_pages
Move the page alignment code into a separate helper.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-28 17:20:46 -04:00
Trond Myklebust bd00f84bc5 SUNRPC: Simplify the end-of-buffer calculation in xdr_read_pages
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-28 17:20:46 -04:00