Currently gfs2 issues barrier unconditionally. There are various reasons
to disable them, be that just for testing or for stupid devices flushing
large battert backed caches. Add a nobarrier option that matches xfs and
btrfs for this. Also add a symmetric barrier option to turn it back on
at remount time.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
It's not necessary to do any 64bit division for the statfs sync code, so
remove it.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
GFS2 now has three new mount options, statfs_quantum, quota_quantum and
statfs_percent. statfs_quantum and quota_quantum simply allow you to
set the tunables of the same name. Setting setting statfs_quantum to 0
will also turn on the statfs_slow tunable. statfs_percent accepts an
integer between 0 and 100. Numbers between 1 and 100 will cause GFS2 to
do any early sync when the local number of blocks free changes by at
least statfs_percent from the totoal number of blocks free. Setting
statfs_percent to 0 disables this.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
These two functions are altered so that gfs2_quota_sync may
in future be called directly from the VFS. The GFS2 superblock
changes to a VFS super block and there is an addition of an int
argument which is currently ignored.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
We have a long term plan to use the "-o meta" flag to GFS2 mounts to
access the alternate root which is used to store metadata for a GFS2
filesystem. This will allow us to eventually remove support for the
gfs2meta filesystem type (which is in any case just a "front end" to
the gfs2 filesystem type with the meta/master root).
Currently the "-o meta" option is only taken into account on the
initial mount of the filesystem. Subsequent mounts of the same
filesystem (i.e. on the same device) result in basically the same
as bind mounting the root of the original mount.
This patch changes that by using what is more or less a copy
of get_sb_bdev() and extending it so that it will take into
account the alternate root in all cases. The main difference
is that we have to parse the mount options a bit earlier. We can
then use them to select the appropriate root towards the end of
the function.
In addition this also fixes a bug where it was possible (but certainly
not desirable) to set different ro/rw options for the meta root
when mounted via the gfs2meta fs compared with the original mount.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Alexander Viro <aviro@redhat.com>
There is a potential race in the inode deallocation code if two
nodes try to deallocate the same inode at the same time. Most of
the issue is solved by the iopen locking. There is still a small
window which is not covered by the iopen lock. This patches fixes
that and also makes the deallocation code more robust in the face of
any errors in the rgrp bitmaps, or erroneous iopen callbacks from
other nodes.
This does introduce one extra disk read, but that is generally not
an issue since its the same block that must be written to later
in the deallocation process. The total disk accesses therefore stay
the same,
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The inum structure used throughout GFS2 has two fields. One
no_addr is the disk block number of the inode in question and
is used everywhere as the inode number. The other, no_formal_ino,
is used only as the generation number for NFS.
Historically the no_formal_ino field was set using a complicated
system of one global and one per-node file containing inode numbers
in order to ensure that each no_formal_ino was unique. Also this
code made no provision for what would happen when eventually the
(64 bit) numbers ran out. Now I know that is pretty unlikely to
happen given the large space of numbers, but it is possible
nevertheless.
The only guarantee required for no_formal_ino is that, for any
single inode, the same number doesn't get reused too quickly.
We already have a generation number which is kept in the inode
and initialised from a counter in the resource group (almost
no overhead, since we have to touch the resource group anyway
in order to allocate an inode in the first place). Aside from
ensuring that we never use the value 0 in the no_formal_ino
field, we can use that counter directly.
As a result of that change, we lose about 200 lines of code and
also gain about 10 creates/sec on the postmark benchmark (on
my test machine).
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Use the more conventional name for the extended attribute
support code. Update all the places which care.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch adds "-o errors=panic" and "-o errors=withdraw" to the
gfs2 mount options. The "errors=withdraw" option is today's
current behaviour, meaning to withdraw from the file system if a
non-serious gfs2 error occurs. The new "errors=panic" option
tells gfs2 to force a kernel panic if a non-serious gfs2 file
system error occurs. This may be useful, for example, where
fabric-level fencing is used that has no way to reboot (such as
fence_scsi).
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
We already have an offline uevent (used when a withdraw occurs)
but no online uevent. This adds an online uevent so that userspace
will be able to detect a successful mount by means other than
not receiving a remove event after the add & recovery (change)
uevents.
It has also been added to the remount path as well - we can't use
a change uevent there as older GFS2 userspace acts on change uevents
according to the state that it thinks the fs is in, so we can't
easily add any new ones.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
When a file is deleted from a gfs2 filesystem on one node, a dcache
entry for it may still exist on other nodes in the cluster. If this
happens, gfs2 will be unable to free this file on disk. Because of this,
it's possible to have a gfs2 filesystem with no files on it and no free
space. With this patch, when a node receives a callback notifying it
that the file is being deleted on another node, it schedules a new
workqueue thread to remove the file's dcache entry.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
GFS2 wasn't syncing its statfs info on grows. This causes a problem
when you grow the filesystem on multiple nodes. GFS2 would calculate
the new space based on the resource groups (which are always current),
and then assume that the filesystem had grown the from the existing
statfs size. If you grew the filesystem on two different nodes in a
short time, the second node wouldn't see the statfs size change from the
first node, and would assume that it was grown by a larger amount than
it was. When all these changes were synced out, the total fileystem
size would be incorrect (the first grow would be counted twice).
This patch syncs makes GFS2 read in the statfs changes from disk before
a grow, and write them out after the grow, while the master statfs inode
is locked.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Move BKL into ->put_super from the only caller. A couple of
filesystems had trivial enough ->put_super (only kfree and NULLing of
s_fs_info + stuff in there) to not get any locking: coda, cramfs, efs,
hugetlbfs, omfs, qnx4, shmem, all others got the full treatment. Most
of them probably don't need it, but I'd rather sort that out individually.
Preferably after all the other BKL pushdowns in that area.
[AV: original used to move lock_super() down as well; these changes are
removed since we don't do lock_super() at all in generic_shutdown_super()
now]
[AV: fuse, btrfs and xfs are known to need no damn BKL, exempt]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
mount.c only contained a single function, so is not really
worth retaining on its own. All of the super related code
is now either in super.c or ops_fstype.c
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This is the big patch that I've been working on for some time
now. There are many reasons for wanting to make this change
such as:
o Reducing overhead by eliminating duplicated fields between structures
o Simplifcation of the code (reduces the code size by a fair bit)
o The locking interface is now the DLM interface itself as proposed
some time ago.
o Fewer lookups of glocks when processing replies from the DLM
o Fewer memory allocations/deallocations for each glock
o Scope to do further optimisations in the future (but this patch is
more than big enough for now!)
Please note that (a) this patch relates to the lock_dlm module and
not the DLM itself, that is still a separate module; and (b) that
we retain the ability to build GFS2 as a standalone single node
filesystem with out requiring the DLM.
This patch needs a lot of testing, hence my keeping it I restarted
my -git tree after the last merge window. That way, this has the maximum
exposure before its merged. This is (modulo a few minor bug fixes) the
same patch that I've been posting on and off the the last three months
and its passed a number of different tests so far.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This reverts commit 78802499912f1ba31ce83a94c55b5a980f250a43.
The original patch is causing problems in relation to order of
operations at umount in relation to jdata files. I need to fix
this a different way.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
There was a use-after-free with the GFS2 super block during
umount. This patch moves almost all of the umount code from
->put_super into ->kill_sb, the only bit that cannot be moved
being the glock hash clearing which has to remain as ->put_super
due to umount ordering requirements. As a result its now obvious
that the kfree is the final operation, whereas before it was
hidden in ->put_super.
Also gfs2_jindex_free is then only referenced from a single file
so thats moved and marked static too.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The functions which are being moved can all be marked
static in their new locations, since they only have
a single caller each. Their new locations are more
logical than before and some of the functions are
small enough that the compiler might well inline them.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
gfs2_lock_fs_check_clean() should not be calling gfs2_jindex_hold()
since it doesn't work like rindex hold, despite the comment. That
allows gfs2_jindex_hold() to be moved into ops_fstype.c where it
can be made static.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch moves the final field so that we can get rid
of struct gfs2_rgrpd_host, as promised some time ago. Also
by rearranging the fields slightly, we are able to reduce
the size of the gfs2_rgrpd structure at the same time.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch moved the i_size field from the gfs2_dinode_host and
following the ext3 convention renames it i_disksize.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>