* get rid of fake struct file/struct dentry in __blkdev_get()
* merge __blkdev_get() and do_open()
* get rid of flags argument of blkdev_get()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add some thin wrappers around ocfs2_insert_extent() for each of the 3
different btree types, ocfs2_inode_insert_extent(),
ocfs2_xattr_value_insert_extent() and ocfs2_xattr_tree_insert_extent(). The
last is for the xattr index btree, which will be used in a followup patch.
All the old callers in file.c etc will call ocfs2_dinode_insert_extent(),
while the other two handle the xattr issue. And the init of extent tree are
handled by these functions.
When storing xattr value which is too large, we will allocate some clusters
for it and here ocfs2_extent_list and ocfs2_extent_rec will also be used. In
order to re-use the b-tree operation code, a new parameter named "private"
is added into ocfs2_extent_tree and it is used to indicate the root of
ocfs2_exent_list. The reason is that we can't deduce the root from the
buffer_head now. It may be in an inode, an ocfs2_xattr_block or even worse,
in any place in an ocfs2_xattr_bucket.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
ocfs2/cluster/netdebug.c: fix warning
fs/ocfs2/cluster/netdebug.c:154: warning: format '%lu' expects
type 'long unsigned int', but argument 17 has type 'suseconds_t'
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Commit 0f475b2abe (ocfs2/net: Silence build
warnings) made sense as far as it fixed compile warnings, but it was not
required that it made the functions global.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
The configfs operations ->make_item() and ->make_group() currently
return a new item/group. A return of NULL signifies an error. Because
of this, -ENOMEM is the only return code bubbled up the stack.
Multiple folks have requested the ability to return specific error codes
when these operations fail. This patch adds that ability by changing the
->make_item/group() ops to return ERR_PTR() values. These errors are
bubbled up appropriately. NULL returns are changed to -ENOMEM for
compatibility.
Also updated are the in-kernel users of configfs.
This is a rework of reverted commit 11c3b79218.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
The configfs operations ->make_item() and ->make_group() currently
return a new item/group. A return of NULL signifies an error. Because
of this, -ENOMEM is the only return code bubbled up the stack.
Multiple folks have requested the ability to return specific error codes
when these operations fail. This patch adds that ability by changing the
->make_item/group() ops to return an int.
Also updated are the in-kernel users of configfs.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
suseconds_t is type long on most arches except sparc64 where it is type int.
This patch silences the following warnings that are generated when building
on it.
netdebug.c: In function 'nst_seq_show':
netdebug.c:152: warning: format '%lu' expects type 'long unsigned int', but argument 13 has type 'suseconds_t'
netdebug.c:152: warning: format '%lu' expects type 'long unsigned int', but argument 15 has type 'suseconds_t'
netdebug.c:152: warning: format '%lu' expects type 'long unsigned int', but argument 17 has type 'suseconds_t'
netdebug.c: In function 'sc_seq_show':
netdebug.c:332: warning: format '%lu' expects type 'long unsigned int', but argument 19 has type 'suseconds_t'
netdebug.c:332: warning: format '%lu' expects type 'long unsigned int', but argument 21 has type 'suseconds_t'
netdebug.c:332: warning: format '%lu' expects type 'long unsigned int', but argument 23 has type 'suseconds_t'
netdebug.c:332: warning: format '%lu' expects type 'long unsigned int', but argument 25 has type 'suseconds_t'
netdebug.c:332: warning: format '%lu' expects type 'long unsigned int', but argument 27 has type 'suseconds_t'
netdebug.c:332: warning: format '%lu' expects type 'long unsigned int', but argument 29 has type 'suseconds_t'
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
ocfs2 needs to call out to the hb_ctl program at unmount for all cluster
stacks. The first step is to move the hb_ctl_path sysctl out of the
o2cb code and into the generic stack glue.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
This patch silences the build warnings concerning o2net_init_nst()
and friends when building without CONFIG_DEBUG_FS enabled.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
This patch silences the build warnings concerning o2net_debugfs_init()
and friends when building without CONFIG_DEBUG_FS enabled.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Commit 52f7c21b61 was intended to move
/sys/o2cb to /sys/fs/o2cb, providing /sys/o2cb as a symlink for
backwards compatibility. However, the merge apparently added the
symlink but failed to move the directory, resulting in a duplicate
filename error. It's a one-line change that was missing.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
This patch exposes o2net information via debugfs. The information includes
the list of sockets (sock_containers) as well as the list of outstanding
messages (send_tracking). Useful for o2dlm debugging.
(This patch is derived from an earlier one written by Zach Brown that
exposed the same information via /proc.)
[Mark: checkpatch fixes]
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
We keep seeing bug reports related to NULL pointer derefs in
o2net_set_nn_state(). When I originally wrote up the configurable timeout
patch, I had tried to plan for multiple clusters. This was silly.
The timeout routines all use o2nm_single_cluster so there's no point in
passing an argument at all. This patch removes the arguments and kills those
bugs dead.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
/sys/fs is where we really want file system specific sysfs objects.
Ocfs2-tools has been updated to look in /sys/fs/o2cb. We can maintain
backwards compatibility with old ocfs2-tools by using a sysfs symlink. After
some time (2 years), the symlink can be safely removed. This patch also adds
documentation to make it easier for people to figure out what /sys/fs/o2cb
is used for.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Currently, o2net connects to a node on hb_up and disconnects on
hb_down and net timeout.
It disconnects on net timeout is ok, but it should attempt to
reconnect back. This is because sometimes nodes get overloaded
enough that the network connection breaks but the disk hb does not.
And if we get into that situation, we either fence (unnecessarily)
or wait for its disk hb to die (and sometimes hang in the process).
So in this updated scheme, when the network disconnects, we keep
attempting to reconnect till we succeed or we get a disk hb down
event.
If the other node is really dead, then we will eventually get a
node down event. If not, we should be able to connect again and
continue.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
In some situations, ocfs2_set_nn_state might get called with sc = NULL and
valid = 0. If sc = NULL, we can't dereference it to get the o2nm_node
member. Instead, do what o2net_initialize_handshake does and use NULL when
calling o2net_reconnect_delay and o2net_idle_timeout.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
This patchset moves le*_add_cpu and be*_add_cpu functions from OCFS2 to core
header (1st), converts ext3 filesystem to this API (2nd) and replaces XFS
different named functions with new ones (3rd).
There are many places where these functions will be useful. Just look at:
grep -r 'cpu_to_[ble12346]*([ble12346]*_to_cpu.*[-+]' linux-src/ Patch for
ext3 is an example how conversions will probably look like.
This patch:
- move inline functions which add native byte order variable to
little/big endian variable to core header
* le16_add_cpu(__le16 *var, u16 val)
* le32_add_cpu(__le32 *var, u32 val)
* le64_add_cpu(__le64 *var, u64 val)
* be32_add_cpu(__be32 *var, u32 val)
- add for completeness:
* be16_add_cpu(__be16 *var, u16 val)
* be64_add_cpu(__be64 *var, u64 val)
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Acked-by: Mark Fasheh <mark.fasheh@oracle.com>
Cc: David Chinner <dgc@sgi.com>
Cc: Timothy Shimmin <tes@sgi.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, when ocfs2 nodes connect via TCP, they advertise their
compatibility level. If the versions do not match, two nodes cannot speak
to each other and they disconnect. As a result, this provides no forward or
backwards compatibility.
This patch implements a simple protocol negotiation at the dlm level by
introducing a major/minor version number scheme for entities that
communicate. Specifically, o2dlm has a major/minor version for interaction
with o2dlm on other nodes, and ocfs2 itself has a major/minor version for
interacting with the filesystem on other nodes.
This will allow rolling upgrades of ocfs2 clusters when changes to the
locking or network protocols can be done in a backwards compatible manner.
In those cases, only the minor number is changed and the negotatied protocol
minor is returned from dlm join. In the far less likely event that a
required protocol change makes backwards compatibility impossible, we simply
bump the major number.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
The userspace ABI of ocfs2's internal cluster stack (o2cb) was broken by
commit c60b717879 "kset: convert ocfs2 to
use kset_create". Specifically, the '/sys/o2cb' kset was moved to
'/sys/fs/o2cb'. This breaks all ocfs2 tools and renders the
filesystem unmountable.
This fix moves '/sys/o2cb' back where it belongs.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Bump the printed version to 1.5.0. This helps us quickly identify which
version of Ocfs2 a bug filer is running.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Lots of people are having trouble with the default timeouts, which are too
low. These new values are derived from an informal survey taken on
ocfs2-users, as well as data from bug reports. This should reduce the amount
of cluster disconnects and subsequent fencing seen during normal workloads.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
The meta lock now covers both meta data and data, so this just removes the
now-redundant data lock.
Combining locks saves us a round of lock mastery per inode and one less lock
to ping between nodes during read/write.
We don't lose much - since meta locks were always held before a data lock
(and at the same level) ordered writeout mode (the default) ensured that
flushing for the meta data lock also pushed out data anyways.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
The node maps that are set/unset by these votes are no longer relevant, thus
we can remove the mount and umount votes. Since those are the last two
remaining votes, we can also remove the entire vote infrastructure.
The vote thread has been renamed to the downconvert thread, and the small
amount of functionality related to managing it has been moved into
fs/ocfs2/dlmglue.c. All references to votes have been removed or updated.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>