Imho, "atomic_t call_count" is ugly and should die. It buys nothing and
in fact it can grow more than necessary, expand doesn't check if it was
already incremented by another task.
Kill it, and introduce "static int core_name_size" updated by
expand_corename(). This is obviously racy too but harmless, and
core_name_size never grows for no reason.
We do not bother to to calculate the "right" new size, we simply do
kmalloc(size_we_need) and use ksize() to rely on kmalloc_index's decision.
Finally change format_corename() to use expand_corename(), krealloc(NULL)
is fine.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Colin Walters <walters@verbum.org>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Cc: Lucas De Marchi <lucas.de.marchi@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The cp_inodes_count and cp_blocks_count are represented as __le64 type in
on-disk structure (struct nilfs_checkpoint). But analogous fields in
in-core structure (struct nilfs_root) are represented by atomic_t type.
This patch replaces atomic_t on atomic64_t type in representation of
inodes_count and blocks_count fields in struct nilfs_root.
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Acked-by: Joern Engel <joern@logfs.org>
Cc: Clemens Eisserer <linuxhippy@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, NILFS2 returns 0 as free inodes count (f_ffree) and current
used inodes count as total file nodes in file system (f_files):
df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/loop0 2 2 0 100% /mnt/nilfs2
This patch implements real calculation of free inodes count. First of
all, it is calculated total file nodes in file system as
(desc_blocks_count * groups_per_desc_block * entries_per_group). Then, it
is calculated free inodes count as difference the total file nodes and
used inodes count. As a result, we have such output for NILFS2:
df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/loop0 4194304 2114701 2079603 51% /mnt/nilfs2
Reported-by: Clemens Eisserer <linuxhippy@gmail.com>
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Joern Engel <joern@logfs.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Calling kthread_run with a single name parameter causes it to be handled
as a format string. Many callers are passing potentially dynamic string
content, so use "%s" in those cases to avoid any potential accidents.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix various weird constructions of strncpy(dst, src, strlen(src)).
Length limits should be about the space available in the destination,
not repurposed as a method to either always include or always exclude a
trailing NULL byte. Either the NULL should always be copied (using
strlcpy), or it should not be copied (using something like memcpy).
Readable code should not depend on the weird behavior of strncpy when it
hits the length limit. Better to avoid the anti-pattern entirely.
[akpm@linux-foundation.org: revert getdelays.c part due to missing bsd/string.h]
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [staging]
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [acpi]
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ursula Braun <ursula.braun@de.ibm.com>
Cc: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that the LRU to add a page to is decided at LRU-add time, remove the
misleading lru parameter from __pagevec_lru_add. A consequence of this
is that the pagevec_lru_add_file, pagevec_lru_add_anon and similar
helpers are misleading as the caller no longer has direct control over
what LRU the page is added to. Unused helpers are removed by this patch
and existing users of pagevec_lru_add_file() are converted to use
lru_cache_add_file() directly and use the per-cpu pagevecs instead of
creating their own pagevec.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Alexey Lyahkov <alexey.lyashkov@gmail.com>
Cc: Andrew Perepechko <anserper@ya.ru>
Cc: Robin Dong <sanbai@taobao.com>
Cc: Theodore Tso <tytso@mit.edu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Bernd Schubert <bernd.schubert@fastmail.fm>
Cc: David Howells <dhowells@redhat.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch introduces mmap_vmcore().
Don't permit writable nor executable mapping even with mprotect()
because this mmap() is aimed at reading crash dump memory. Non-writable
mapping is also requirement of remap_pfn_range() when mapping linear
pages on non-consecutive physical pages; see is_cow_mapping().
Set VM_MIXEDMAP flag to remap memory by remap_pfn_range and by
remap_vmalloc_range_pertial at the same time for a single vma.
do_munmap() can correctly clean partially remapped vma with two
functions in abnormal case. See zap_pte_range(), vm_normal_page() and
their comments for details.
On x86-32 PAE kernels, mmap() supports at most 16TB memory only. This
limitation comes from the fact that the third argument of
remap_pfn_range(), pfn, is of 32-bit length on x86-32: unsigned long.
[akpm@linux-foundation.org: use min(), switch to conventional error-unwinding approach]
Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Cc: Lisa Mitchell <lisa.mitchell@hp.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Tested-by: Maxim Uvarov <muvarov@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The previous patches newly added holes before each chunk of memory and
the holes need to be count in vmcore file size. There are two ways to
count file size in such a way:
1) suppose m is a poitner to the last vmcore object in vmcore_list.
Then file size is (m->offset + m->size), or
2) calculate sum of size of buffers for ELF header, program headers,
ELF note segments and objects in vmcore_list.
Although 1) is more direct and simpler than 2), 2) seems better in that
it reflects internal object structure of /proc/vmcore. Thus, this patch
changes get_vmcore_size_elf{64, 32} so that it calculates size in the
way of 2).
As a result, both get_vmcore_size_elf{64, 32} have the same definition.
Merge them as get_vmcore_size.
Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Cc: Lisa Mitchell <lisa.mitchell@hp.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now ELF note segment has been copied in the buffer on vmalloc memory.
To allow user process to remap the ELF note segment buffer with
remap_vmalloc_page, the corresponding VM area object has to have
VM_USERMAP flag set.
[akpm@linux-foundation.org: use the conventional comment layout]
Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Cc: Lisa Mitchell <lisa.mitchell@hp.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The reasons why we don't allocate ELF note segment in the 1st kernel
(old memory) on page boundary is to keep backward compatibility for old
kernels, and that if doing so, we waste not a little memory due to
round-up operation to fit the memory to page boundary since most of the
buffers are in per-cpu area.
ELF notes are per-cpu, so total size of ELF note segments depends on
number of CPUs. The current maximum number of CPUs on x86_64 is 5192,
and there's already system with 4192 CPUs in SGI, where total size
amounts to 1MB. This can be larger in the near future or possibly even
now on another architecture that has larger size of note per a single
cpu. Thus, to avoid the case where memory allocation for large block
fails, we allocate vmcore objects on vmalloc memory.
This patch adds elfnotes_buf and elfnotes_sz variables to keep pointer
to the ELF note segment buffer and its size. There's no longer the
vmcore object that corresponds to the ELF note segment in vmcore_list.
Accordingly, read_vmcore() has new case for ELF note segment and
set_vmcore_list_offsets_elf{64,32}() and other helper functions starts
calculating offset from sum of size of ELF headers and size of ELF note
segment.
[akpm@linux-foundation.org: use min(), fix error-path vzalloc() leaks]
Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Cc: Lisa Mitchell <lisa.mitchell@hp.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Treat memory chunks referenced by PT_LOAD program header entries in
page-size boundary in vmcore_list. Formally, for each range [start,
end], we set up the corresponding vmcore object in vmcore_list to
[rounddown(start, PAGE_SIZE), roundup(end, PAGE_SIZE)].
This change affects layout of /proc/vmcore. The gaps generated by the
rearrangement are newly made visible to applications as holes.
Concretely, they are two ranges [rounddown(start, PAGE_SIZE), start] and
[end, roundup(end, PAGE_SIZE)].
Suppose variable m points at a vmcore object in vmcore_list, and
variable phdr points at the program header of PT_LOAD type the variable
m corresponds to. Then, pictorially:
m->offset +---------------+
| hole |
phdr->p_offset = +---------------+
m->offset + (paddr - start) | |\
| kernel memory | phdr->p_memsz
| |/
+---------------+
| hole |
m->offset + m->size +---------------+
where m->offset and m->offset + m->size are always page-size aligned.
Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Cc: Lisa Mitchell <lisa.mitchell@hp.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Allocate ELF headers on page-size boundary using __get_free_pages()
instead of kmalloc().
Later patch will merge PT_NOTE entries into a single unique one and
decrease the buffer size actually used. Keep original buffer size in
variable elfcorebuf_sz_orig to kfree the buffer later and actually used
buffer size with rounded up to page-size boundary in variable
elfcorebuf_sz separately.
The size of part of the ELF buffer exported from /proc/vmcore is
elfcorebuf_sz.
The merged, removed PT_NOTE entries, i.e. the range [elfcorebuf_sz,
elfcorebuf_sz_orig], is filled with 0.
Use size of the ELF headers as an initial offset value in
set_vmcore_list_offsets_elf{64,32} and
process_ptload_program_headers_elf{64,32} in order to indicate that the
offset includes the holes towards the page boundary.
As a result, both set_vmcore_list_offsets_elf{64,32} have the same
definition. Merge them as set_vmcore_list_offsets.
[akpm@linux-foundation.org: add free_elfcorebuf(), cleanups]
Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Cc: Lisa Mitchell <lisa.mitchell@hp.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
VM page reclaim uses dirty and writeback page states to determine if
flushers are cleaning pages too slowly and that page reclaim should
stall waiting on flushers to catch up. Page state in NFS is a bit more
complex and a clean page can be unreclaimable due to being unstable
which is effectively "dirty" from the perspective of the VM from reclaim
context. Similarly, if the inode is currently being committed then it's
similar to being under writeback.
This patch adds a is_dirty_writeback() handled for NFS that checks if a
pages backing inode is being committed and should be accounted as
writeback and if a page has private state indicating that it is
effectively dirty.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Zlatko Calusic <zcalusic@bitsync.net>
Cc: dormando <dormando@rydia.net>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>