The patch aims to fix a performance issue for the syscall
personality(PER_LINUX32).
On IA-64 box, the syscall personality (PER_LINUX32) has poor performance
because it failed to find the Linux/x86 execution domain. Then it tried
to load the kernel module however it failed always and it used the default
execution domain PER_LINUX instead. Requesting kernel modules is very
expensive. It caused the performance issue. (see the function
lookup_exec_domain in kernel/exec_domain.c).
To resolve the issue, execution domain Linux/x86 is always registered in
initialization time for IA-64 architecture.
Signed-off-by: Xiaolan Huang <xiaolan.huang@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
If "max_purges" from PAL is 0, it actually means 1.
However it was not handled later when a hot-added cpu pass the
max_purges from PAL. This makes systems easy to go BUG_ON().
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
All architectures use an effectively identical definition of online_page(), so
just make it common code. x86-64, ia64, powerpc and sh are actually
identical; x86-32 is slightly different.
x86-32's differences arise because it puts its hotplug pages in the highmem
zone. We can handle this in the generic code by inspecting the page to see if
its in highmem, and update the totalhigh_pages count appropriately. This
leaves init_32.c:free_new_highpage with a single caller, so I folded it into
add_one_highpage_init.
I also removed an incorrect comment referring to the NUMA case; any NUMA
details have already been dealt with by the time online_page() is called.
[akpm@linux-foundation.org: fix indenting]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamez.hiroyu@jp.fujitsu.com>
Tested-by: KAMEZAWA Hiroyuki <kamez.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Christoph Lameter <clameter@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is a NUMA memory configuration issue in 2.6.24:
A 2-node machine of ours has got the following memory layout:
Node 0: 0 - 2 Gbytes
Node 0: 4 - 8 Gbytes
Node 1: 8 - 16 Gbytes
Node 0: 16 - 18 Gbytes
"efi_memmap_init()" merges the three last ranges into one.
"register_active_ranges()" is called as follows:
efi_memmap_walk(register_active_ranges, NULL);
i.e. once for the 4 - 18 Gbytes range. It picks up the node
number from the start address, and registers all the memory for
the node #0.
"register_active_ranges()" should be called as follows to
make sure there is no merged address range at its entry:
efi_memmap_walk(filter_memory, register_active_ranges);
"filter_memory()" is similar to "filter_rsvd_memory()",
but the reserved memory ranges are not filtered out.
Signed-off-by: Zoltan Menyhart <Zoltan.Menyhart@bull.net>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Untangle the chaos of page size determination in this function by
simply using PAGE_SIZE << compound_order().
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
show_mem() has no need to print the amount of free swap space manually because
show_free_areas() does this already and is called by the former.
The two outputs only differ in text formatting:
printk("Free swap = %lukB\n", ...);
printk("Free swap: %6ldkB\n", ...);
Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This attached patch significantly shrinks boot memory allocation on ia64.
It does this by not allocating per_cpu areas for cpus that can never
exist.
In the case where acpi does not have any numa node description of the
cpus, I defaulted to assigning the first 32 round-robin on the known
nodes.. For the !CONFIG_ACPI I used for_each_possible_cpu().
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
A simple fix. The existing pernodesize reservation is not taking into
account a second array of pg_data_t structures. This is normally not
important because the PAGE_ALIGN macro reserves adequate space.
I made the compute_pernodesize steps in the same order as the fill_pernode
steps to make the correlation more clear.
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The patch defines kernel parameter "nptcg=". The parameter overrides max number
of concurrent global TLB purges which is reported from either PAL_VM_SUMMARY or
SAL PALO.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
According to SDM2.2, Itanium supports multiple outstanding ptc.g instructions.
But current kernel function ia64_global_tlb_purge() uses a spinlock to serialize
ptc.g instructions issued by multiple processors. This serialization might have
scalability issue on a big SMP machine where many processors could purge TLB
in parallel.
The patch fixes this problem by issuing multiple ptc.g instructions in
ia64_global_tlb_purge(). It also adds support for the "PALO" table to get
a platform view of the max number of outstanding ptc.g instructions (which
may be different from the processor view found from PAL_VM_SUMMARY).
PALO specification can be found at: http://www.dig64.org/home/DIG64_PALO_R1_0.pdf
spinaphore implementation by Matthew Wilcox.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Dynamic TR resource should be managed in the uniform way.
Add two interfaces for kernel:
ia64_itr_entry: Allocate a (pair of) TR for caller.
ia64_ptr_entry: Purge a (pair of ) TR by caller.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
ia64 named their handler kprobes_fault_handler while all other
arches used kprobe_fault_handler. Change the function definition
and header declaration.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
__FUNCTION__ is gcc-specific, use __func__
Long lines have been kept where they exist, some small spacing changes
have been done.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patchset adds a flags variable to reserve_bootmem() and uses the
BOOTMEM_EXCLUSIVE flag in crashkernel reservation code to detect collisions
between crashkernel area and already used memory.
This patch:
Change the reserve_bootmem() function to accept a new flag BOOTMEM_EXCLUSIVE.
If that flag is set, the function returns with -EBUSY if the memory already
has been reserved in the past. This is to avoid conflicts.
Because that code runs before SMP initialisation, there's no race condition
inside reserve_bootmem_core().
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix powerpc build]
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: <linux-arch@vger.kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This requires making die() and die_if_kernel() return a value, and their
callers to honor this (and be prepared that it returns).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Improve performance of memory allocations on ia64 by avoiding a global TLB
purge to purge a single page from the file cache. This happens whenever we
evict a page from the buffer cache to make room for some other allocation.
Test case: Run 'find /usr -type f | xargs cat > /dev/null' in the
background to fill the buffer cache, then run something that uses memory,
e.g. 'gmake -j50 install'. Instrumentation showed that the number of
global TLB purges went from a few millions down to about 170 over a 12
hours run of the above.
The performance impact is particularly noticeable under virtualization,
because a virtual TLB is generally both larger and slower to purge than
a physical one.
Signed-off-by: Christophe de Dinechin <ddd@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There is a section mismatch when building CONFIG_FLATMEM=y kernels
that also have CONFIG_HOTPLUG_CPU=y
WARNING: vmlinux.o(.text+0x5a902): Section mismatch: reference to \
.init.text:__alloc_bootmem (between 'per_cpu_init' and 'count_pages')
The issue occurs because per_cpu_init() in mm/contig.c is
marked __cpuinit (which is #define'd to nothing on a hot
plug cpu configuration) call __alloc_bootmem() (which is
an __init function). The usage is actually safe because
the __alloc_bootmem() is inside an "if (first_time)" test
so that the call is only made while it is still legal to
do so.
But the warning is irritating. Move the allocation to
find_memory().
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch fixes the following section mismatches:
<-- snip -->
...
WARNING: vmlinux.o(.text+0x5b5c2): Section mismatch: reference to .init.text:memmap_init_zone (between 'memmap_init' and 'virtual_memmap_init')
WARNING: vmlinux.o(.text+0x5b842): Section mismatch: reference to .init.text:memmap_init_zone (between 'virtual_memmap_init' and 'ia64_mmu_init')
...
<-- snip -->
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
is_init() is an ambiguous name for the pid==1 check. Split it into
is_global_init() and is_container_init().
A cgroup init has it's tsk->pid == 1.
A global init also has it's tsk->pid == 1 and it's active pid namespace
is the init_pid_ns. But rather than check the active pid namespace,
compare the task structure with 'init_pid_ns.child_reaper', which is
initialized during boot to the /sbin/init process and never changes.
Changelog:
2.6.22-rc4-mm2-pidns1:
- Use 'init_pid_ns.child_reaper' to determine if a given task is the
global init (/sbin/init) process. This would improve performance
and remove dependence on the task_pid().
2.6.21-mm2-pidns2:
- [Sukadev Bhattiprolu] Changed is_container_init() calls in {powerpc,
ppc,avr32}/traps.c for the _exception() call to is_global_init().
This way, we kill only the cgroup if the cgroup's init has a
bug rather than force a kernel panic.
[akpm@linux-foundation.org: fix comment]
[sukadev@us.ibm.com: Use is_global_init() in arch/m32r/mm/fault.c]
[bunk@stusta.de: kernel/pid.c: remove unused exports]
[sukadev@us.ibm.com: Fix capability.c to work with threaded init]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Acked-by: Pavel Emelianov <xemul@openvz.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Herbert Poetzel <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch uses vm_get_page_prot() to setup vma->vm_page_prot.
Though inside vm_get_page_prot() the protection flags is AND with
(VM_READ|VM_WRITE|VM_EXEC|VM_SHARED), it does not hurt correct code.
Signed-off-by: Coly Li <coyli@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch set frees the restriction that makedumpfile users should install a
vmlinux file (including the debugging information) into each system.
makedumpfile command is the dump filtering feature for kdump. It creates a
small dumpfile by filtering unnecessary pages for the analysis. To
distinguish unnecessary pages, it needs a vmlinux file including the debugging
information. These days, the debugging package becomes a huge file, and it is
hard to install it into each system.
To solve the problem, kdump developers discussed it at lkml and kexec-ml. As
the result, we reached the conclusion that necessary information for dump
filtering (called "vmcoreinfo") should be embedded into the first kernel file
and it should be accessed through /proc/vmcore during the second kernel.
(http://www.uwsg.iu.edu/hypermail/linux/kernel/0707.0/1806.html)
Dan Aloni created the patch set for the above implementation.
(http://www.uwsg.iu.edu/hypermail/linux/kernel/0707.1/1053.html)
And I updated it for multi architectures and memory models.
(http://lists.infradead.org/pipermail/kexec/2007-August/000479.html)
Signed-off-by: Dan Aloni <da-x@monatomic.org>
Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>