This adds Makefile and Kconfig for f2fs, and updates Makefile and Kconfig files
in the fs directory.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This moves all of the f2fs debugging files into debugfs. The files are
located in /sys/kernel/debug/f2fs/
Note, I think we are generating all of the same information in each of
the files for every unique f2fs filesystem in the machine. This copies
the functionality that was present in the proc files, but this should be
fixed up in the future.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[jaegeuk.kim@samsung.com: merged 3 debugfs entries into a *status* entry]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This adds roll-forward routines to recover fsynced data.
- F2FS uses basically roll-back model with checkpointing.
- In order to implement fsync(), there are two approaches as follows.
1. A roll-back model with checkpointing at every fsync()
: This is a naive method, but suffers from very low performance.
2. A roll-forward model
: F2FS adopts this model where all the fsynced data should be recovered, which
were written after checkpointing was done. In order to figure out the data,
F2FS keeps a "fsync" mark in direct node blocks. In addition, F2FS remains
the location of next node block in each direct node block for reconstructing
the chain of node blocks during the recovery.
- In order to enhance the performance, F2FS keeps a "dentry" mark also in direct
node blocks. If this is set during the recovery, F2FS replays adding a dentry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This implements xattr and acl functionalities.
- F2FS uses a node page to contain use extended attributes.
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This adds address space operations for data.
- F2FS supports readpages(), writepages(), and direct_IO().
- Because of out-of-place writes, f2fs_direct_IO() does not write data in place.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This adds memory operations and file/file_inode operations.
- F2FS supports fallocate(), mmap(), fsync(), and basic ioctl().
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This adds specific functions to manage NAT pages, a cache for NAT entries, free
nids, direct/indirect node blocks for indexing data, and address space for node
pages.
- The key information of an NAT entry consists of a node id and a block address.
- An NAT page is composed of block addresses covered by a certain range of NAT
entries, which is maintained by the address space of meta_inode.
- A radix tree structure is used to cache NAT entries. The index for the tree
is a node id.
- When there is no free nid, F2FS should scan NAT entries to find new one. In
order to avoid scanning frequently, F2FS manages a list containing a number of
free nids in memory. Only when free nids in the list are exhausted, scanning
process, build_free_nids(), is triggered.
- F2FS has direct and indirect node blocks for indexing data. This patch adds
fuctions related to the node block management such as getting, allocating, and
truncating node blocks to index data.
- In order to cache node blocks in memory, F2FS has a node_inode with an address
space for node pages. This patch also adds the address space operations for
node_inode.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This adds functions required by the checkpoint operations.
Basically, f2fs adopts a roll-back model with checkpoint blocks written in the
CP area. The checkpoint procedure includes as follows.
- write_checkpoint()
1. block_operations() freezes VFS calls.
2. submit cached bios.
3. flush_nat_entries() writes NAT pages updated by dirty NAT entries.
4. flush_sit_entries() writes SIT pages updated by dirty SIT entries.
5. do_checkpoint() writes,
- checkpoint block (#0)
- orphan inode blocks
- summary blocks made by active logs
- checkpoint block (copy of #0)
6. unblock_opeations()
In order to provide an address space for meta pages, f2fs_sb_info has a special
inode, namely meta_inode. This patch also adds the address space operations for
meta_inode.
Signed-off-by: Chul Lee <chur.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This adds the implementation of superblock operations for f2fs, which includes
- init_f2fs_fs/exit_f2fs_fs
- f2fs_mount
- super_operations of f2fs
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This adds the following major in-memory structures in f2fs.
- f2fs_sb_info:
contains f2fs-specific information, two special inode pointers for node and
meta address spaces, and orphan inode management.
- f2fs_inode_info:
contains vfs_inode and other fs-specific information.
- f2fs_nm_info:
contains node manager information such as NAT entry cache, free nid list,
and NAT page management.
- f2fs_node_info:
represents a node as node id, inode number, block address, and its version.
- f2fs_sm_info:
contains segment manager information such as SIT entry cache, free segment
map, current active logs, dirty segment management, and segment utilization.
The specific structures are sit_info, free_segmap_info, dirty_seglist_info,
curseg_info.
In addition, add F2FS_SUPER_MAGIC in magic.h.
Signed-off-by: Chul Lee <chur.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This adds a document describing the mount options, proc entries, usage, and
design of Flash-Friendly File System, namely F2FS.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
The matrix-keymap module is currently lacking a proper module license,
add one so we don't have this module tainting the entire kernel. This
issue has been present since commit 1932811f42 ("Input: matrix-keymap
- uninline and prepare for device tree support")
Signed-off-by: Florian Fainelli <florian@openwrt.org>
CC: stable@vger.kernel.org # v3.5+
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking fixes from David Miller:
1) Netlink socket dumping had several missing verifications and checks.
In particular, address comparisons in the request byte code
interpreter could access past the end of the address in the
inet_request_sock.
Also, address family and address prefix lengths were not validated
properly at all.
This means arbitrary applications can read past the end of certain
kernel data structures.
Fixes from Neal Cardwell.
2) ip_check_defrag() operates in contexts where we're in the process
of, or about to, input the packet into the real protocols
(specifically macvlan and AF_PACKET snooping).
Unfortunately, it does a pskb_may_pull() which can modify the
backing packet data which is not legal if the SKB is shared. It
very much can be shared in this context.
Deal with the possibility that the SKB is segmented by using
skb_copy_bits().
Fix from Johannes Berg based upon a report by Eric Leblond.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
ipv4: ip_check_defrag must not modify skb before unsharing
inet_diag: validate port comparison byte code to prevent unsafe reads
inet_diag: avoid unsafe and nonsensical prefix matches in inet_diag_bc_run()
inet_diag: validate byte code to prevent oops in inet_diag_bc_run()
inet_diag: fix oops for IPv4 AF_INET6 TCP SYN-RECV state
This reverts commits a50915394f and
d7c3b937bd.
This is a revert of a revert of a revert. In addition, it reverts the
even older i915 change to stop using the __GFP_NO_KSWAPD flag due to the
original commits in linux-next.
It turns out that the original patch really was bogus, and that the
original revert was the correct thing to do after all. We thought we
had fixed the problem, and then reverted the revert, but the problem
really is fundamental: waking up kswapd simply isn't the right thing to
do, and direct reclaim sometimes simply _is_ the right thing to do.
When certain allocations fail, we simply should try some direct reclaim,
and if that fails, fail the allocation. That's the right thing to do
for THP allocations, which can easily fail, and the GPU allocations want
to do that too.
So starting kswapd is sometimes simply wrong, and removing the flag that
said "don't start kswapd" was a mistake. Let's hope we never revisit
this mistake again - and certainly not this many times ;)
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ip_check_defrag() might be called from af_packet within the
RX path where shared SKBs are used, so it must not modify
the input SKB before it has unshared it for defragmentation.
Use skb_copy_bits() to get the IP header and only pull in
everything later.
The same is true for the other caller in macvlan as it is
called from dev->rx_handler which can also get a shared SKB.
Reported-by: Eric Leblond <eric@regit.org>
Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 782fd30406.
We are going to reinstate the __GFP_NO_KSWAPD flag that has been
removed, the removal reverted, and then removed again. Making this
commit a pointless fixup for a problem that was caused by the removal of
__GFP_NO_KSWAPD flag.
The thing is, we really don't want to wake up kswapd for THP allocations
(because they fail quite commonly under any kind of memory pressure,
including when there is tons of memory free), and these patches were
just trying to fix up the underlying bug: the original removal of
__GFP_NO_KSWAPD in commit c654345924 ("mm: remove __GFP_NO_KSWAPD")
was simply bogus.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add logic to verify that a port comparison byte code operation
actually has the second inet_diag_bc_op from which we read the port
for such operations.
Previously the code blindly referenced op[1] without first checking
whether a second inet_diag_bc_op struct could fit there. So a
malicious user could make the kernel read 4 bytes beyond the end of
the bytecode array by claiming to have a whole port comparison byte
code (2 inet_diag_bc_op structs) when in fact the bytecode was not
long enough to hold both.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>