The range check for the b-tree level parameter in
nilfs_btree_root_broken() is wrong; it accepts the case of
"level == NILFS_BTREE_LEVEL_MAX" even though the level is limited to
values in the range of 0 to (NILFS_BTREE_LEVEL_MAX - 1).
Since the level parameter is read from the storage device and used to
index the nilfs_btree_path array, whose element count is
NILFS_BTREE_LEVEL_MAX, a level equal to this boundary value on disk can
cause a memory overrun during btree operations.
This fixes the broken sanity check and adds a comment to clarify that
the upper bound NILFS_BTREE_LEVEL_MAX is exclusive.
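The off-by-one can be illustrated with a minimal sketch. The struct, the helper name, and the constant values below are assumptions for illustration (the value 14 for NILFS_BTREE_LEVEL_MAX is assumed, not taken from this text); the point is that an exclusive upper bound needs `>=`, not `>`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define NILFS_BTREE_LEVEL_NODE_MIN 1
#define NILFS_BTREE_LEVEL_MAX      14	/* exclusive upper bound */

/* Hypothetical, simplified view of a b-tree root read from disk. */
struct btree_root_sketch {
	uint8_t level;
	uint16_t nchildren;
};

/* Returns true if the root read from disk looks corrupted. */
static bool btree_root_broken_sketch(const struct btree_root_sketch *root,
				     uint16_t nchildren_max)
{
	/*
	 * The level indexes an array of NILFS_BTREE_LEVEL_MAX path
	 * elements, so the last valid level is NILFS_BTREE_LEVEL_MAX - 1.
	 * Comparing with '>' instead of '>=' lets the boundary through.
	 */
	return root->level < NILFS_BTREE_LEVEL_NODE_MIN ||
	       root->level >= NILFS_BTREE_LEVEL_MAX ||
	       root->nchildren > nchildren_max;
}
```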
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds code to check the sizes of on-disk data of metadata files,
such as the inode size, segment usage size, DAT entry size, and
checkpoint size. Although these sizes are read from disk, the current
implementation doesn't check them.
If these sizes are invalid on disk, they can cause out-of-range access
to metadata or memory access overruns on metadata block buffers due to
overflow in various calculations.
Both the lower and upper limits of the metadata sizes are verified to
prevent these issues.
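A minimal sketch of a two-sided size check, assuming the upper bound is the block size (an entry larger than a block would leave zero entries per block) and the lower bound is the smallest structure the code understands. The helper name and its parameters are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper: validate an entry size read from disk. */
static int check_entry_size_sketch(size_t entry_size, size_t min_size,
				   size_t block_size)
{
	/* Too small: code would access fields beyond the end of an entry. */
	if (entry_size < min_size)
		return -EINVAL;
	/* Too large: fewer than one entry per block breaks index arithmetic. */
	if (entry_size > block_size)
		return -EINVAL;
	return 0;
}
```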
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Andreas Rohner <andreas.rohner@gmx.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds the NILFS_IOCTL_SET_SUINFO ioctl, with which the segment usage
entries in the SUFILE can be updated from userspace.
This is useful because it allows the userspace GC to modify and update
segment usage entries for specific segments, which enables it to avoid
unnecessary write operations.
If a segment needs to be cleaned but contains little or no reclaimable
space, the cleaning operation degrades to a useless move operation. In
the end, the only things that change are the location of the data and a
timestamp in the segment usage information.
With this ioctl, the GC can skip the cleaning and update the segment
usage entries directly instead.
This is effectively a shortcut for cleaning the segment. The segment
summary information must still be read, but writing the live blocks can
be skipped when it is not worthwhile.
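To make the idea concrete, here is a hedged sketch of applying per-segment updates, where each update names a segment and a bitmask of the fields to change. The structure layouts, flag names, and helper are hypothetical simplifications, not the NILFS_IOCTL_SET_SUINFO ABI.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified segment usage entry. */
struct suinfo_sketch {
	uint64_t lastmod;	/* timestamp of last modification */
	uint32_t nblocks;	/* number of live blocks */
	uint32_t flags;		/* segment state flags */
};

/* Hypothetical bits selecting which fields an update touches. */
#define SUI_UPDATE_LASTMOD (1u << 0)
#define SUI_UPDATE_NBLOCKS (1u << 1)
#define SUI_UPDATE_FLAGS   (1u << 2)
#define SUI_UPDATE_MASK \
	(SUI_UPDATE_LASTMOD | SUI_UPDATE_NBLOCKS | SUI_UPDATE_FLAGS)

struct suinfo_update_sketch {
	uint64_t segnum;		/* segment to update */
	uint32_t update_flags;		/* SUI_UPDATE_* mask */
	struct suinfo_sketch sui;	/* new values */
};

/* Apply updates to an in-memory table of usage entries. */
static int apply_suinfo_updates(struct suinfo_sketch *table, size_t nsegs,
				const struct suinfo_update_sketch *ups,
				size_t nups)
{
	/* Validate everything first so a bad request changes nothing. */
	for (size_t i = 0; i < nups; i++)
		if (ups[i].segnum >= nsegs ||
		    (ups[i].update_flags & ~SUI_UPDATE_MASK))
			return -EINVAL;

	for (size_t i = 0; i < nups; i++) {
		struct suinfo_sketch *su = &table[ups[i].segnum];
		uint32_t f = ups[i].update_flags;

		/* Only the selected fields are overwritten. */
		if (f & SUI_UPDATE_LASTMOD)
			su->lastmod = ups[i].sui.lastmod;
		if (f & SUI_UPDATE_NBLOCKS)
			su->nblocks = ups[i].sui.nblocks;
		if (f & SUI_UPDATE_FLAGS)
			su->flags = ups[i].sui.flags;
	}
	return 0;
}
```

Validating the whole batch before writing anything keeps a partially bad request from leaving the table half-updated.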
[konishi.ryusuke@lab.ntt.co.jp: add description of NILFS_IOCTL_SET_SUINFO ioctl]
Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If a header file is making use of BUG, BUG_ON, BUILD_BUG_ON, or any
other BUG variant in a static inline (i.e. not in a #define), then
that header really should be including <linux/bug.h> and not just
expecting it to be implicitly present.
We can make this change risk-free, since if the files using these
headers didn't have exposure to linux/bug.h already, they would have
been causing compile failures/warnings.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
This adds a new ioctl command which limits the range of segments to be
allocated. This is intended to gather data within a range of the
partition before shrinking the filesystem, or to control the location of
new logs for other purposes.
If a range is specified by the ioctl, the nilfs segment allocator tries
to allocate new segments from that range unless no free segments are
available there.
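The allocation policy described above can be sketched as a search over the requested range followed by a fallback search over the whole device. The free-segment bitmap, the function name, and the -1 return convention are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical allocator: prefer a free segment in [lo, hi], otherwise
 * fall back to any free segment. Returns its number, or -1 if none.
 */
static int64_t alloc_segment_sketch(const bool *in_use, uint64_t nsegs,
				    uint64_t lo, uint64_t hi)
{
	if (nsegs == 0)
		return -1;
	if (hi >= nsegs)
		hi = nsegs - 1;

	/* First pass: only the requested range. */
	for (uint64_t s = lo; s <= hi; s++)
		if (!in_use[s])
			return (int64_t)s;

	/* Second pass: the rest of the device. */
	for (uint64_t s = 0; s < nsegs; s++) {
		if (s >= lo && s <= hi)
			continue;	/* already checked above */
		if (!in_use[s])
			return (int64_t)s;
	}
	return -1;
}
```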
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
The size of the super root structure depends on the inode size, so the
NILFS_SR_BYTES macro should be a function of the inode size. This
fixes the issue.
Although a different size value will be written for a possible future
filesystem with extended inodes, this fortunately does not break disk
format compatibility.
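A sketch of computing the super root size from the inode size, assuming a fixed header followed by three embedded metadata-file inodes (DAT, cpfile, sufile). The struct layout and the SR_* macro names are simplified stand-ins, not the nilfs2_fs.h definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified super root header. */
struct super_root_sketch {
	uint32_t sum;
	uint16_t bytes;
	uint16_t flags;
	uint64_t nongc_ctime;
	/* Followed by three inodes of inode_size bytes each (assumed). */
	unsigned char mdt_inodes[];
};

/* Offset of the i-th embedded inode depends on the inode size. */
#define SR_MDT_OFFSET(inode_size, i) \
	(offsetof(struct super_root_sketch, mdt_inodes) + \
	 (size_t)(inode_size) * (i))
#define SR_DAT_OFFSET(inode_size)    SR_MDT_OFFSET(inode_size, 0)
#define SR_CPFILE_OFFSET(inode_size) SR_MDT_OFFSET(inode_size, 1)
#define SR_SUFILE_OFFSET(inode_size) SR_MDT_OFFSET(inode_size, 2)
/* Total size: header plus all three inodes. */
#define SR_BYTES(inode_size)         SR_MDT_OFFSET(inode_size, 3)
```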
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
This moves the NILFS_SUPER_MAGIC macro from linux/nilfs2_fs.h to
linux/magic.h, in the same manner as the other filesystem magic numbers
defined in that file.
Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
This records the number of used blocks per checkpoint in each
checkpoint entry of the cpfile. Although userland tools can get the
block count via the nilfs_get_cpinfo ioctl, the count was not updated
by the nilfs2 kernel code. This fixes the issue and lets userland tools
calculate the amount of space used per checkpoint.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Jiro SEKIBA <jir@unicus.jp>
This is a similar change to those in the ext2/ext3 codebases (commits
40a063f669 and a4ae309486, respectively).
The addition of 64k block capability in the rec_len_from_disk and
rec_len_to_disk functions added a bit of math overhead, which needlessly
slows down file create workloads when the architecture cannot even
support 64k blocks. This change skips that overhead on such
architectures.
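The corner-cutting can be sketched with conditional compilation: the 64k special case is only compiled in when blocks that large are possible. SKETCH_MAX_BLOCK_SIZE is a hypothetical stand-in for the kernel's page-size test, and byte-order conversion is omitted.

```c
#include <stdint.h>

/* Hypothetical build-time knob standing in for the page-size test. */
#ifndef SKETCH_MAX_BLOCK_SIZE
#define SKETCH_MAX_BLOCK_SIZE 4096
#endif

#define SKETCH_MAX_REC_LEN ((1u << 16) - 1)

/* dlen is assumed to be in CPU byte order already. */
static inline unsigned rec_len_from_disk_sketch(uint16_t dlen)
{
	unsigned len = dlen;

#if SKETCH_MAX_BLOCK_SIZE >= 65536
	/* Only builds that can see 64KB blocks pay for this comparison. */
	if (len == SKETCH_MAX_REC_LEN)
		return 1u << 16;
#endif
	return len;
}
```

Building with `-DSKETCH_MAX_BLOCK_SIZE=65536` compiles the special case back in.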
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
This replaces uses of nilfs's own inode flags (e.g. NILFS_SECRM_FL,
NILFS_UNRM_FL, NILFS_COMPR_FL, and so forth) with the common inode
flags, and removes the nilfs-specific flag declarations.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
This allows other projects to carry copies of the header file related
to the ABI and disk format (i.e. "nilfs2_fs.h") without those projects
or distributors having to worry about effects on each project's overall
license terms. It is also needed to switch the license of the nilfs
library to the LGPL.
Jiro SEKIBA pointed out these license issues (Message-ID:
<87tylo7msw.wl%jir@sekiba.com>) and suggested switching the license of
the library and nilfs2_fs.h to the GNU Lesser General Public License. We
adopt his suggestion to avoid the license issues.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Jiro SEKIBA <jir@unicus.jp>
Cc: linux-nilfs <linux-nilfs@vger.kernel.org>
This flag is a fake flag used to distinguish the type of super block
instance, and it became obsolete with the unification of sb.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
In-memory inode structures of nilfs have a member "i_cno" which stores
a checkpoint number related to the inode. For gc-inodes, this field
indicates the version of the data that each gc-inode caches for GC. The
log writer temporarily uses "i_cno" to pass the latest checkpoint
number.
This stops the latter use and lets only gc-inodes use it.
The purpose of this patch is to allow a subsequent change to use
"i_cno" for inode lookup.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Compatibility of nilfs partitions is now managed with three feature
sets. This changes the old compatibility check based on the revision
number so that it can accept future revisions.
Note that we can drop support for experimental versions of nilfs that
don't know about the feature sets by incrementing NILFS_CURRENT_REV. We
don't have to do it soon, but it remains an option whenever the need
arises.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
This inserts a sanity check that refuses to mount a filesystem with an
unsupported block size.
Previously, the nilfs kernel code checked only the limitations of the
device, even though mkfs.nilfs2 limits the range of block sizes; there
was no check that prevents rec_len overflow with larger block sizes.
With this change, block sizes larger than 64KB or smaller than 1KB
will be rejected explicitly by the kernel.
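A hedged sketch of the mount-time check, assuming the superblock stores the block size as a shift count relative to 1KB, as in ext2-derived formats. The helper name and the 0-means-unsupported convention are illustrative.

```c
#include <stdint.h>

#define SKETCH_MIN_BLOCK_SIZE 1024u	/* 1KB, from the text */
#define SKETCH_MAX_BLOCK_SIZE 65536u	/* 64KB, from the text */

/* Compute the block size from its log; returns 0 if unsupported. */
static uint32_t block_size_from_log(uint32_t log_block_size)
{
	uint32_t bs;

	/* Bound the shift count first so the shift itself cannot overflow. */
	if (log_block_size > 16)
		return 0;
	bs = SKETCH_MIN_BLOCK_SIZE << log_block_size;

	/* Explicit range check: larger blocks would overflow rec_len. */
	if (bs < SKETCH_MIN_BLOCK_SIZE || bs > SKETCH_MAX_BLOCK_SIZE)
		return 0;
	return bs;
}
```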
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
With a 64KB block size, a directory entry can be 64KB in size, which
does not fit into the 16 bits we have for the entry length. So this
patch stores 0xffff instead and converts the value when it is read from
or written to disk.
Nilfs derives its directory implementation from the ext2 filesystem,
and this draws upon the corresponding change in ext2.
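A minimal sketch of the encode/decode pair, assuming record lengths are always padded to a multiple of 4 so that 0xffff never occurs as a genuine length. The function names are hypothetical and little-endian conversion is omitted.

```c
#include <stdint.h>

#define DIRENT_MAX_REC_LEN 0xffffu	/* on-disk stand-in for 65536 */

/* Encode an in-memory record length (at most 65536) for disk. */
static uint16_t dirent_rec_len_to_disk(uint32_t len)
{
	/* 65536 needs 17 bits, so it is stored as 0xffff instead. */
	if (len == (1u << 16))
		return (uint16_t)DIRENT_MAX_REC_LEN;
	return (uint16_t)len;
}

/* Decode an on-disk length; 0xffff can only mean a 64KB record. */
static uint32_t dirent_rec_len_from_disk(uint16_t dlen)
{
	if (dlen == DIRENT_MAX_REC_LEN)
		return 1u << 16;
	return dlen;
}
```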
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
This adds three new fields to the nilfs_super_block structure, the
compatible feature set, the read-only-compatible feature set, and the
incompatible feature set, in order to prepare for future disk format
modifications.
The role of these fields conforms to those of ext3 and other
filesystems. The most important is the incompatible feature set; it
is used to refuse to mount a filesystem that sets an incompatible
feature the kernel doesn't know about.
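The ext3-style semantics can be sketched as follows: unknown incompatible bits block any mount, unknown read-only-compatible bits block only read-write mounts, and unknown compatible bits are ignored. The mask values, struct, and function are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical "supported" masks; a real kernel defines its own. */
#define SKETCH_FEATURE_COMPAT_RO_SUPP 0x1ULL
#define SKETCH_FEATURE_INCOMPAT_SUPP  0x0ULL

struct sb_features_sketch {
	uint64_t compat;
	uint64_t compat_ro;
	uint64_t incompat;
};

static int check_features_sketch(const struct sb_features_sketch *f,
				 bool read_only)
{
	/* Unknown incompatible features: the kernel must not mount at all. */
	if (f->incompat & ~SKETCH_FEATURE_INCOMPAT_SUPP)
		return -EINVAL;
	/* Unknown read-only-compatible features: read-only mounts only. */
	if (!read_only && (f->compat_ro & ~SKETCH_FEATURE_COMPAT_RO_SUPP))
		return -EINVAL;
	/* Unknown compatible features are safe to ignore. */
	return 0;
}
```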
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
This inserts comments indicating the hexadecimal offset of each field
in the declaration of the nilfs_super_block structure, so that people
can find field offsets without counting from the head.
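The benefit of offset comments can be reinforced with compile-time checks. The field list below is an assumed fragment for illustration, not a copy of nilfs_super_block; `_Static_assert` (C11) fails the build if a comment and the actual layout drift apart.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative fragment; field list assumed, not copied from the header. */
struct super_block_head_sketch {
/*00*/	uint32_t s_rev_level;
	uint16_t s_minor_rev_level;
	uint16_t s_magic;
/*08*/	uint16_t s_bytes;
	uint16_t s_flags;
	uint32_t s_crc_seed;
/*10*/	uint32_t s_sum;
};

/* Compile-time checks keep the offset comments honest. */
_Static_assert(offsetof(struct super_block_head_sketch, s_bytes) == 0x08,
	       "s_bytes offset comment is stale");
_Static_assert(offsetof(struct super_block_head_sketch, s_sum) == 0x10,
	       "s_sum offset comment is stale");
```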
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
The current s_volume_name has 16 bytes, which is too small for a modern
filesystem. s_last_mounted resides just after s_volume_name and has 64
bytes. s_last_mounted historically came from ext2, but is not used in
nilfs2 at all.
Deleting the s_last_mounted member and merging its space into
s_volume_name enlarges s_volume_name to 80 bytes for the volume label.
When userland tools see the old header for a new disk, they will just
ignore the additional bytes stored in s_last_mounted. Conversely, since
the old disk format has only a 16-byte label, using the new header with
an old disk causes no problem.
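A sketch of why the merge is layout-neutral, using hypothetical neighboring fields `s_before` and `s_after` around the label area; the compile-time assertions confirm that nothing after the label moves.

```c
#include <stddef.h>
#include <stdint.h>

/* Old layout: 16-byte label followed by the unused 64-byte field. */
struct sb_label_old_sketch {
	uint32_t s_before;		/* hypothetical preceding field */
	char s_volume_name[16];
	char s_last_mounted[64];
	uint32_t s_after;		/* hypothetical following field */
};

/* New layout: the two areas merged into one 80-byte label. */
struct sb_label_new_sketch {
	uint32_t s_before;
	char s_volume_name[80];		/* 16 + 64 bytes */
	uint32_t s_after;
};

/* The merge must not move anything that follows the label. */
_Static_assert(offsetof(struct sb_label_old_sketch, s_after) ==
	       offsetof(struct sb_label_new_sketch, s_after),
	       "fields after the label moved");
_Static_assert(sizeof(struct sb_label_old_sketch) ==
	       sizeof(struct sb_label_new_sketch),
	       "structure size changed");
```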
Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
This adds a field to record the latest checkpoint number in the
nilfs_segment_summary structure. This will help to recover the latest
checkpoint number from logs on disk. This field is intended for
critical cases in which the super blocks have lost the pointer to the
latest log.
Although this changes the disk format, both backward and forward
compatibility are preserved by a size field prepared in the segment
summary header.
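A hedged sketch of how a size field keeps the format extensible: a reader only trusts ss_cno if the header size recorded on disk covers it, and an older reader that locates the payload by that size never looks at the new field. The struct is a simplified stand-in for nilfs_segment_summary, and using 0 to mean "not recorded" is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified summary header (byte order handling omitted). */
struct seg_summary_sketch {
	uint32_t ss_magic;
	uint16_t ss_bytes;	/* size of this header as written */
	uint16_t ss_flags;
	uint64_t ss_seq;
	uint64_t ss_cno;	/* newly added: latest checkpoint number */
};

/* Returns the checkpoint number, or 0 if the header predates ss_cno. */
static uint64_t summary_cno_sketch(const unsigned char *buf, size_t buflen)
{
	const size_t bytes_off = offsetof(struct seg_summary_sketch, ss_bytes);
	const size_t cno_off = offsetof(struct seg_summary_sketch, ss_cno);
	const size_t cno_end = cno_off + sizeof(uint64_t);
	uint16_t bytes;
	uint64_t cno;

	if (buflen < bytes_off + sizeof(bytes))
		return 0;
	memcpy(&bytes, buf + bytes_off, sizeof(bytes));

	/* An older writer's header ends before ss_cno: field absent. */
	if (bytes < cno_end || buflen < cno_end)
		return 0;
	memcpy(&cno, buf + cno_off, sizeof(cno));
	return cno;
}
```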
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
This kills the following checkpatch warnings:
WARNING: please, no space before tabs
+^I__le32^Is_first_ino; ^I^I/* First non-reserved inode */$
WARNING: please, no space before tabs
+^I__le16 s_inode_size; ^I^I/* Size of an inode */$
WARNING: please, no space before tabs
+^Ichar^Is_volume_name[16]; ^I/* volume name */$
WARNING: please, no space before tabs
+^Ichar^Is_last_mounted[64]; ^I/* directory where last mounted */$
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
This adds a function to send discard requests for a given array of
segment numbers, and calls the function when garbage collection
succeeds.
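A minimal sketch of turning a sorted array of segment numbers into discard ranges, merging consecutive segments into one request each. It assumes every segment has the same length and starts at `segnum * blocks_per_segment`; the struct and function names are hypothetical, and the caller would pass each range to the block layer's discard interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block range to be discarded. */
struct block_range_sketch {
	uint64_t start;	/* first block */
	uint64_t len;	/* number of blocks */
};

/*
 * Fill 'out' (room for n entries) with merged ranges for the sorted
 * segment numbers in 'segnums'; returns the number of ranges.
 */
static size_t segments_to_discard_ranges(const uint64_t *segnums, size_t n,
					  uint64_t blocks_per_segment,
					  struct block_range_sketch *out)
{
	size_t nr = 0, i = 0;

	while (i < n) {
		size_t j = i + 1;

		/* Extend the run while segment numbers stay consecutive. */
		while (j < n && segnums[j] == segnums[j - 1] + 1)
			j++;
		out[nr].start = segnums[i] * blocks_per_segment;
		out[nr].len = (uint64_t)(j - i) * blocks_per_segment;
		nr++;
		i = j;
	}
	return nr;
}
```

Merging runs keeps the number of discard requests proportional to the number of gaps rather than the number of segments.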
Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>