After build_free_nids() searches free nid candidates from nat pages and
current journal blocks, it checks all the candidates if they are allocated
so that the nat cache has its nid with an allocated block address.
In this procedure, previously we used
list_for_each_entry_safe(fnid, next_fnid, &nm_i->free_nid_list, list).
But, this is not covered by free_nid_list_lock, resulting in null pointer bug.
This patch moves this checking routine inside add_free_nid() in order not to use
the spin_lock.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
When nm_i->fcnt > 2 * MAX_FREE_NIDS, stop scanning other NAT entries.
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
[Jaegeuk Kim: fix handling the return value of add_free_nid()]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This patch does two cleanups:
1. remove unused variable "fcnt" in build_free_nids().
2. make scan_nat_page() as void type and remove useless variable "fcnt".
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Directly drop the free_nid cache when nm_i->fcnt > 2 * MAX_FREE_NIDS
Since there is NOT nmi->free_nid_list_lock spinlock protection between
a sequential calling of alloc_nid() and alloc_nid_failed(), some other
threads may already add new free_nid to the free_nid_list during this
period.
We need to make sure nmi->fcnt is never > 2 * MAX_FREE_NIDS.
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
[Jaegeuk Kim: fit the coding style]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
When testing f2fs on an SSD, I found some 128 page IOs followed by 1 page IO
were issued by f2fs_write_node_pages.
This means that there were some mishandling flows which degrades performance.
Previous f2fs_write_node_pages determines the number of pages to be written,
nr_to_write, as follows.
1. The bio_get_nr_vecs returns 129 pages.
2. The bio_alloc makes a room for 128 pages.
3. The initial 128 pages go into one bio.
4. The existing bio is submitted, and a new bio is prepared for the last 1 page.
5. Finally, sync_node_pages submits the last 1 page bio.
The problem is from the use of bio_get_nr_vecs, so this patch replace it
with max_hw_blocks using queue_max_sectors.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
try_to_free_nats() is usually called with parameter nr_shrink as
"nm_i->nat_cnt - NM_WOUT_THRESHOLD"
by flush_nat_entries() during checkpointing process.
However, this is inconsistent with the actual threshold check as
"if (nm_i->nat_cnt < 2 * NM_WOUT_THRESHOLD)"
, which will ignore the free_nats requests when
NM_WOUT_THRESHOLD < nm_i->nat_cnt < 2 * NM_WOUT_THRESHOLD
So fix the threshold check condition.
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
We call lock_page when we need to update a page after readpage.
Between grab and lock page, the page can be truncated by other thread.
So, we should check the page after lock_page whether it was truncated or not.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
In order to avoid build_free_nid lock contention, let's change the order of
function calls as follows.
At first, check whether there is enough free nids.
- If available, just get a free nid with spin_lock without any overhead.
- Otherwise, conduct build_free_nids.
: scan nat pages, journal nat entries, and nat cache entries.
We should consider carefullly not to serve free nids intermediately made by
build_free_nids.
We can get stable free nids only after build_free_nids is done.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
It is more obvious that add_free_nid checks whether the free nid is zero or not.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
add tracepoints for tracing the truncate operations
like truncate node/data blocks, f2fs_truncate etc.
Tracepoints are added at entry and exit of operation
to trace the success & failure of operation.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: combine and modify the tracepoint structures]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
In the previous version, f2fs uses global locks according to the usage types,
such as directory operations, block allocation, block write, and so on.
Reference the following lock types in f2fs.h.
enum lock_type {
RENAME, /* for renaming operations */
DENTRY_OPS, /* for directory operations */
DATA_WRITE, /* for data write */
DATA_NEW, /* for data allocation */
DATA_TRUNC, /* for data truncate */
NODE_NEW, /* for node allocation */
NODE_TRUNC, /* for node truncate */
NODE_WRITE, /* for node write */
NR_LOCK_TYPE,
};
In that case, we lose the performance under the multi-threading environment,
since every types of operations must be conducted one at a time.
In order to address the problem, let's share the locks globally with a mutex
array regardless of any types.
So, let users grab a mutex and perform their jobs in parallel as much as
possbile.
For this, I propose a new global lock scheme as follows.
0. Data structure
- f2fs_sb_info -> mutex_lock[NR_GLOBAL_LOCKS]
- f2fs_sb_info -> node_write
1. mutex_lock_op(sbi)
- try to get an avaiable lock from the array.
- returns the index of the gottern lock variable.
2. mutex_unlock_op(sbi, index of the lock)
- unlock the given index of the lock.
3. mutex_lock_all(sbi)
- grab all the locks in the array before the checkpoint.
4. mutex_unlock_all(sbi)
- release all the locks in the array after checkpoint.
5. block_operations()
- call mutex_lock_all()
- sync_dirty_dir_inodes()
- grab node_write
- sync_node_pages()
Note that,
the pairs of mutex_lock_op()/mutex_unlock_op() and
mutex_lock_all()/mutex_unlock_all() should be used together.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This patch reduces redundant spin_lock operations in alloc_nid_failed().
The alloc_nid_failed() does not need to delete entry and add one again
by triggering spin_lock and spin_unlock redundantly.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
In order to do GC more reliably, I'd like to lock the vicitm summary page
until its GC is completed, and also prevent any checkpoint process.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
In get_node_page, we do not need to call lock_page all the time.
If the node page is cached as uptodate,
1. grab_cache_page locks the page,
2. read_node_page unlocks the page, and
3. lock_page is called for further process.
Let's avoid this.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
In the checkpoint flow, the f2fs investigates the total nat cache entries.
Previously, if an entry has NULL_ADDR, f2fs drops the entry and adds the
obsolete nid to the free nid list.
However, this free nid will be reused sooner, resulting in its nat entry miss.
In order to avoid this, we don't need to drop the nat cache entry at this moment.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
The build_free_nid should not add free nids over nm_i->max_nid.
But, there was a hole that invalid free nid was added by the following scenario.
Let's suppose nm_i->max_nid = 150 and the last NAT page has 100 ~ 200 nids.
build_free_nids
- get_current_nat_page loads the last NAT page
- scan_nat_page can add 100 ~ 200 nids
-> Bug here!
So, when scanning an NAT page, we should check each candidate whether it is
over max_nid or not.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
If the return value of releasepage is equal to zero, the page cannot be reclaimed.
Instead, we should return 1 in order to reclaim clean pages.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
When we build new free nids, let's scan the just next NAT page instead of
skipping a couple of previously scanned pages in order to reuse free nids in
there.
Otherwise, we can use too much wide range of nids even though several nids were
deallocated, and also their node pages can be cached in the node_inode's address
space.
This means that we can retain lots of clean pages in the main memory, which
induces mm's reclaiming overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Currently, f2fs doesn't reclaim any node pages.
However, if we found that a node page was truncated by checking its block
address with zero during f2fs_write_node_page, we should not skip that node
page and return zero to reclaim it.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This patch reduces redundant locking and unlocking pages during read operations.
In f2fs_readpage, let's use wait_on_page_locked() instead of lock_page.
And then, when we need to modify any data finally, let's lock the page so that
we can avoid lock contention.
[readpage rule]
- The f2fs_readpage returns unlocked page, or released page too in error cases.
- Its caller should handle read error, -EIO, after locking the page, which
indicates read completion.
- Its caller should check PageUptodate after grab_cache_page.
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
In all the breaking conditions in get_node_path, 'n' is used to
track index in offset[] array, but while breaking out also, in all
paths n++ is done.
So, remove the ++ from breaking paths. Also, avoid
reset of 'level=0' in first case.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
We can remove the call to find_get_page to get a page from the cache
and check for up-to-date, instead we can make use of grab_cache_page
part itself to fetch the page from the cache.
So, removing the call and moving the PageUptodate at proper place, also
taken care of moving the lock_page condition in the page_hit part.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>