Commit Graph

66 Commits

Author SHA1 Message Date
Trond Myklebust
5bb89b4702 NFSv4.1/pnfs: Separate out metadata and data consistency for pNFS
The LAYOUTCOMMIT operation means different things to different layout types.
For blocks and objects, it is both a data and metadata consistency operation.
For files and flexfiles, it is only a metadata consistency operation.

This patch separates out the 2 cases, allowing the files/flexfiles layout
drivers to optimise away the data consistency calls to layoutcommit.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-03-27 12:39:38 -04:00
Trond Myklebust
84a80f62f7 NFSv4.1: Convert pNFS deviceid to use kfree_rcu()
Use of synchronize_rcu() when unmounting and potentially freeing a lot
of deviceids is problematic. There really is no reason why we can't just
use kfree_rcu() here.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-03-27 12:32:24 -04:00
Peng Tao
48d635f14a nfs: add nfs_pgio_current_mirror helper
Let it return current nfs_pgio_mirror in use depending on pg_mirror_count.
For read, we always use pg_mirrors[0], so this effectively gives us freedom
to use pg_mirror_idx to track the actual mirror to read from through out the
IO stack.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
2015-02-03 11:06:48 -08:00
Weston Andros Adamson
a7d42ddb30 nfs: add mirroring support to pgio layer
This patch adds mirrored write support to the pgio layer. The default
is to use one mirror, but pgio callers may define callbacks to change
this to any value up to the (arbitrarily selected) limit of 16.

The basic idea is to break out members of nfs_pageio_descriptor that cannot
be shared between mirrored DSes and put them in a new structure.

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
2015-02-03 11:06:45 -08:00
Weston Andros Adamson
180bb5ec06 pnfs: release lseg in pnfs_generic_pg_cleanup
This is needed to support mirrored writes - the first write can't just
trash the lseg, we need to keep it around until all mirrors have
written.

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
2015-02-03 11:06:44 -08:00
Linus Torvalds
848a552893 Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd
Pull email address change from Boaz Harrosh.

* 'for-linus' of git://git.open-osd.org/linux-open-osd:
  Boaz Harrosh - fix email in Documentation
  Boaz Harrosh - Fix broken email address
  MAINTAINERS: Change Boaz Harrosh's email
2014-10-21 12:53:45 -07:00
Boaz Harrosh
aa281ac631 Boaz Harrosh - Fix broken email address
I no longer have access to the Panasas email.
So change to an email that can always reach me.

Signed-off-by: Boaz Harrosh <ooo@electrozaur.com>
2014-10-19 20:22:32 +03:00
Christoph Hellwig
fd41b4748b pnfs/objlayout: fix endianess annotation in objio_alloc_deviceid_node
The kbuild test robot complained about a new sparse warning in
objio_alloc_deviceid_node, but it turns out that this was just a moved
reference to an existing variable.  Fix it to have the right big endian
annotated type.

Note that there are some other endianess issues in this file that I didn't
bother to sort out as they involve global headers.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-12 13:20:43 -04:00
Christoph Hellwig
661373b13d pnfs: factor GETDEVICEINFO implementations
Add support to the common pNFS core to issue GETDEVICEINFO calls on
a device ID cache miss.  The code is taken from the well debugged
file layout implementation and calls out to the layoutdriver through
a new alloc_deviceid_node method.  The calling conventions for
nfs4_find_get_deviceid are changed so that all information needed to
send a GETDEVICEINFO request is passed to the common code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10 12:47:03 -07:00
Weston Andros Adamson
c65e6254ca nfs: remove unused writeverf code
Remove duplicate writeverf structure from merge of nfs_pgio_header and
nfs_pgio_data and remove writeverf related flags and logic to handle
more than one RPC per nfs_pgio_header.

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-24 18:47:00 -04:00
Weston Andros Adamson
d45f60c678 nfs: merge nfs_pgio_data into _header
struct nfs_pgio_data only exists as a member of nfs_pgio_header, but is
passed around everywhere, because there used to be multiple _data structs
per _header. Many of these functions then use the _data to find a pointer
to the _header.  This patch cleans this up by merging the nfs_pgio_data
structure into nfs_pgio_header and passing nfs_pgio_header around instead.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-24 18:47:00 -04:00
Weston Andros Adamson
823b0c9d98 nfs: rename members of nfs_pgio_data
Rename "verf" to "writeverf" and "pages" to "page_array" to prepare for
merge of nfs_pgio_data and nfs_pgio_header.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-24 18:46:59 -04:00
Trond Myklebust
0aa61e78a0 pNFS: Handle allocation errors correctly in objlayout_alloc_layout_hdr()
Return the NULL pointer when the allocation fails.

Cc: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29 20:17:17 -04:00
Weston Andros Adamson
0f9c429eca nfs: chain calls to pg_test
Now that pg_test can change the size of the request (by returning a non-zero
size smaller than the request), pg_test functions that call other
pg_test functions must return the minimum of the result - or 0 if any fail.

Also clean up the logic of some pg_test functions so that all checks are
for contitions where coalescing is not possible.

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29 11:11:47 -04:00
Weston Andros Adamson
b4fdac1a51 nfs: modify pg_test interface to return size_t
This is a step toward allowing pg_test to inform the the
coalescing code to reduce the size of requests so they may fit in
whatever scheme the pg_test callback wants to define.

For now, just return the size of the request if there is space, or 0
if there is not.  This shouldn't change any behavior as it acts
the same as when the pg_test functions returned bool.

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29 11:11:43 -04:00
Anna Schumaker
9c7e1b3d50 NFS: Create a common read and write data struct
At this point, the only difference between nfs_read_data and
nfs_write_data is the write verifier.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-28 18:12:47 -04:00
Andy Adamson
52fcac988a NFSv4.1 use pnfs_device maxcount for the objectlayout gdia_maxcount
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-28 15:34:45 -04:00
Trond Myklebust
cd5875fefe NFSv4.1: Use layout credentials for get_deviceinfo calls
This is not strictly needed, since get_deviceinfo is not allowed to
return NFS4ERR_ACCESS or NFS4ERR_WRONG_CRED, but lets do it anyway
for consistency with other pNFS operations.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-06 16:24:37 -04:00
Masanari Iida
a895d57da0 treewide: Fix typo in printks
Correct spelling typos in printk and comments.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-04-12 15:21:36 +02:00
fanchaoting
5a12cca697 umount oops when remove blocklayoutdriver first
now pnfs client uses block layout, maybe we can remove
blocklayoutdriver first. if we umount later,
it can cause oops in unset_pnfs_layoutdriver.
because nfss->pnfs_curr_ld->clear_layoutdriver is invalid.

reproduce it:
 modprobe  blocklayoutdriver
 mount -t nfs4 -o minorversion=1 pnfsip:/ /mnt/
 rmmod blocklayoutdriver
 umount /mnt

then you can see following

CPU 0
Pid: 17023, comm: umount.nfs4 Tainted: GF          O 3.7.0-rc6-pnfs #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
RIP: 0010:[<ffffffffa04cfe6d>]  [<ffffffffa04cfe6d>] unset_pnfs_layoutdriver+0x1d/0x70 [nfsv4]
RSP: 0018:ffff8800022d9e48  EFLAGS: 00010286
RAX: ffffffffa04a1b00 RBX: ffff88000b013800 RCX: 0000000000000001
RDX: ffffffff81ae8ee0 RSI: ffff880001ee94b8 RDI: ffff88000b013800
RBP: ffff8800022d9e58 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880001ee9400
R13: ffff8800105978c0 R14: 00007fff25846c08 R15: 0000000001bba550
FS:  00007f45ae7f0700(0000) GS:ffff880012c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffa04a1b38 CR3: 0000000002c0c000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process umount.nfs4 (pid: 17023, threadinfo ffff8800022d8000, task ffff880006e48aa0)
Stack:
ffff8800105978c0 ffff88000b013800 ffff8800022d9e78 ffffffffa04cd0ce
ffff8800022d9e78 ffff88000b013800 ffff8800022d9ea8 ffffffffa04755a7
ffff8800022d9ea8 ffff880002f96400 ffff88000b013800 ffff880002f96400
Call Trace:
[<ffffffffa04cd0ce>] nfs4_destroy_server+0x1e/0x30 [nfsv4]
[<ffffffffa04755a7>] nfs_free_server+0xb7/0x150 [nfs]
[<ffffffffa047d4d5>] nfs_kill_super+0x35/0x40 [nfs]
[<ffffffff81178d35>] deactivate_locked_super+0x45/0x70
[<ffffffff8117986a>] deactivate_super+0x4a/0x70
[<ffffffff81193ee2>] mntput_no_expire+0xd2/0x130
[<ffffffff81194d62>] sys_umount+0x72/0xe0
[<ffffffff8154af59>] system_call_fastpath+0x16/0x1b
Code: 06 e1 b8 ea ff ff ff eb 9e 0f 1f 44 00 00 55 48 89 e5 53 48 83 ec 08 66 66 66 66 90 48 8b 87 80 03 00 00 48 89 fb 48 85 c0 74 29 <48> 8b 40 38 48 85 c0 74 02 ff d0 48 8b 03 3e ff 48 04 0f 94 c2
RIP  [<ffffffffa04cfe6d>] unset_pnfs_layoutdriver+0x1d/0x70 [nfsv4]
RSP <ffff8800022d9e48>
CR2: ffffffffa04a1b38
---[ end trace 29f75aaedda058bf ]---

Signed-off-by: fanchaoting<fanchaoting@cn.fujitsu.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2013-02-17 15:40:15 -05:00
Trond Myklebust
eba24e1fe5 NFSv4.1: Remove unused function last_byte_offset
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:38 -05:00
Trond Myklebust
2e928e4878 NFSv4.1: Declare osd_pri_2_pnfs_err(), objio_init_read/write to be static
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-16 12:38:00 -04:00
Peng Tao
6296556f0b NFS41: send real write size in layoutget
For buffer write, block layout client scan inode mapping to find
next hole and use offset-to-hole as layoutget length. Object
layout client uses offset-to-isize as layoutget length.

For direct write, both block layout and object layout use dreq->bytes_left.

Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-08 19:32:22 -04:00
Boaz Harrosh
7de6e28417 pnfs-obj: Better IO pattern in case of unaligned offset
Depending on layout and ARCH, ORE has some limits on max IO sizes
which is communicated on (what else) ore_layout->max_io_length,
which is always stripe aligned.
This was considered as the pg_test boundary for splitting and starting
a new IO.

But in the case of a long IO where the start offset is not aligned
what would happen is that both end of IO[N] and start of IO[N+1]
would be unaligned, causing each IO boundary parity unit to be
calculated and written twice.

So what we do in this patch is split the very start of an unaligned
IO, up to a stripe boundary, and then next IO's can continue fully
aligned til the end.

We might be sacrificing the case where the full unaligned IO would
fit within a single max_io_length, but the sacrifice is well worth
the elimination of double calculation and parity units IO.
Actually the sacrificing is marginal and is almost unmeasurable.

TODO:
	If we know the total expected linear segment that will
	be received, at pg_init, we could use that information
	in many places:
	1. blocks-layout get_layout write segment size
	2. Better mds-threshold
	3. In above situation for a better clean split

	I will do this in future submission.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-08-02 17:42:51 -04:00
Boaz Harrosh
c999ff6802 pnfs-obj: Fix __r4w_get_page when offset is beyond i_size
It is very common for the end of the file to be unaligned on
stripe size. But since we know it's beyond file's end then
the XOR should be preformed with all zeros.

Old code used to just read zeros out of the OSD devices, which is a great
waist. But what scares me more about this situation is that, we now have
pages attached to the file's mapping that are beyond i_size. I don't
like the kind of bugs this calls for.

Fix both birds, by returning a global zero_page, if offset is beyond
i_size.

TODO:
	Change the API to ->__r4w_get_page() so a NULL can be
	returned without being considered as error, since XOR API
	treats NULL entries as zero_pages.

[Bug since 3.2. Should apply the same way to all Kernels since]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2012-07-20 11:50:31 +03:00