Currently cpu and unit are always identity mapped. To allow more
efficient large page support on NUMA and lazy allocation for possible
but offline cpus, cpu -> unit mapping needs to be non-linear and/or
sparse. This can be easily implemented by adding a cpu -> unit
mapping array and using it whenever looking up the matching unit for a
cpu.
The only unusual conversion is in pcpu_chunk_addr_search(). The passed
in address is unit0 based and unit0 might not be in use, so it needs to
be converted to the address of an in-use unit. This is easily done by
adding the unit offset for the current processor.
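As a rough illustration only (not the code from this patch), the lookup
can be sketched in plain C as below. The names unit_map, unit_size,
unit_offset(), chunk_addr() and to_in_use_addr(), and the map contents,
are hypothetical stand-ins chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 4	/* hypothetical number of possible cpus */

/* Hypothetical layout: every unit occupies unit_size bytes in a chunk. */
static const size_t unit_size = 4096;

/* cpu -> unit mapping array; identity mapping would be { 0, 1, 2, 3 }. */
static const int unit_map[NR_CPUS_SKETCH] = { 2, 0, 3, 1 };

/* Byte offset of the unit serving @cpu, relative to the chunk base. */
static size_t unit_offset(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	return (size_t)unit_map[cpu] * unit_size;
}

/* Address of @cpu's copy of an object located @off bytes into a unit. */
static char *chunk_addr(char *base, int cpu, size_t off)
{
	return base + unit_offset(cpu) + off;
}

/*
 * Convert a unit0-based address into an address inside the unit of
 * @cpu, which is known to be in use, before searching for its chunk.
 */
static char *to_in_use_addr(char *unit0_addr, int cpu)
{
	return unit0_addr + unit_offset(cpu);
}
```

With the example map above, cpu 0 is served by unit 2, so its data
starts two unit sizes into the chunk rather than at the chunk base.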
[ Impact: allows non-linear/sparse cpu -> unit mapping, no visible change yet ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
percpu core doesn't need to track all the allocated pages. It needs to
know whether certain pages are populated and a way to reverse map
address to page when freeing. This patch drops pcpu_chunk->page[] and
uses a populated bitmap and vmalloc_to_page() lookup instead. Using
vmalloc_to_page() exclusively is also possible but complicates first
chunk handling, inflates cache footprint and prevents non-standard
memory allocation for percpu memory.
pcpu_chunk->page[] was used to track each page's allocation and
allowed asymmetric population which happens during failure path;
however, with a single bitmap for all units, this is no longer possible.
Bite the bullet and rewrite (de)populate functions so that things are
done in clearly separated steps such that asymmetric population
doesn't happen. This makes the (de)population process much more
modular and will also ease implementing non-standard memory usage in
the future (e.g. large pages).
This makes @get_page_fn parameter to pcpu_setup_first_chunk()
unnecessary. The parameter is dropped and all first chunk helpers are
updated accordingly. Please note that despite the volume most changes
to first chunk helpers are symbol renames for variables which don't
need to be referenced outside of the helper anymore.
This change reduces memory usage and cache footprint of pcpu_chunk.
Now only #unit_pages bits are necessary per chunk.
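A minimal sketch of this bookkeeping, assuming one bit per page index
shared by all units, could look as follows. struct chunk_sketch,
UNIT_PAGES, page_populated() and mark_range() are hypothetical names
for illustration and are not the helpers added by this patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define UNIT_PAGES	16	/* hypothetical number of pages per unit */
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS	((UNIT_PAGES + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One bit per page index; the same bit covers that page in every unit. */
struct chunk_sketch {
	unsigned long populated[BITMAP_WORDS];
};

/* Is page index @page populated (in all units at once)? */
static bool page_populated(const struct chunk_sketch *c, unsigned int page)
{
	assert(page < UNIT_PAGES);
	return (c->populated[page / BITS_PER_WORD] >>
		(page % BITS_PER_WORD)) & 1UL;
}

/* Mark page indices [start, end) as populated or depopulated. */
static void mark_range(struct chunk_sketch *c, unsigned int start,
		       unsigned int end, bool populated)
{
	assert(start <= end && end <= UNIT_PAGES);
	for (unsigned int page = start; page < end; page++) {
		unsigned long bit = 1UL << (page % BITS_PER_WORD);

		if (populated)
			c->populated[page / BITS_PER_WORD] |= bit;
		else
			c->populated[page / BITS_PER_WORD] &= ~bit;
	}
}
```

Because a bit describes the same page index in every unit, the bit may
only be set once all units have that page, which is why asymmetric
population can no longer be represented.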
[ Impact: reduced memory usage and cache footprint for bookkeeping ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Now that all first chunk allocator helpers allocate and map the first
chunk themselves, there's no need to have optional default alloc/map
in pcpu_setup_first_chunk(). Drop @populate_pte_fn and only leave
@dyn_size optional and make all other params mandatory.
This makes it much easier to follow what pcpu_setup_first_chunk() is
doing and what actual differences tweaking each parameter results in.
[ Impact: drop unused code path ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Generalize and move x86 setup_pcpu_lpage() into
pcpu_lpage_first_chunk(). setup_pcpu_lpage() now is a simple wrapper
around the generalized version. Other than taking size parameters and
using arch supplied callbacks to allocate/free/map memory,
pcpu_lpage_first_chunk() is identical to the original implementation.
This simplifies arch code and will help converting more archs to
dynamic percpu allocator.
While at it, factor out pcpu_calc_fc_sizes() which is common to
pcpu_embed_first_chunk() and pcpu_lpage_first_chunk().
[ Impact: code reorganization and generalization ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Generalize and move x86 setup_pcpu_4k() into pcpu_4k_first_chunk().
setup_pcpu_4k() now is a simple wrapper around the generalized
version. Other than taking size parameters and using arch supplied
callbacks to allocate/free memory, pcpu_4k_first_chunk() is identical
to the original implementation.
This simplifies arch code and will help converting more archs to
dynamic percpu allocator.
While at it, s/pcpu_populate_pte_fn_t/pcpu_fc_populate_pte_fn_t/ for
consistency.
[ Impact: code reorganization and generalization ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
The only extra feature @unit_size provides is making dead space at the
end of the first chunk which doesn't have any valid usecase. Drop the
parameter. This will increase consistency with the generalized 4k
allocator.
James Bottomley spotted a missing conversion for the default
setup_per_cpu_areas() which caused build breakage on all archs which
use it.
[ Impact: drop unused code path ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Ingo Molnar <mingo@elte.hu>
Pull linus#master to merge PER_CPU_DEF_ATTRIBUTES and alpha build fix
changes. As alpha in percpu tree uses 'weak' attribute instead of
inline assembly, there's no need for __used attribute.
Conflicts:
arch/alpha/include/asm/percpu.h
arch/mn10300/kernel/vmlinux.lds.S
include/linux/percpu-defs.h
* git://git.infradead.org/iommu-2.6: (38 commits)
intel-iommu: Don't keep freeing page zero in dma_pte_free_pagetable()
intel-iommu: Introduce first_pte_in_page() to simplify PTE-setting loops
intel-iommu: Use cmpxchg64_local() for setting PTEs
intel-iommu: Warn about unmatched unmap requests
intel-iommu: Kill superfluous mapping_lock
intel-iommu: Ensure that PTE writes are 64-bit atomic, even on i386
intel-iommu: Make iommu=pt work on i386 too
intel-iommu: Performance improvement for dma_pte_free_pagetable()
intel-iommu: Don't free too much in dma_pte_free_pagetable()
intel-iommu: dump mappings but don't die on pte already set
intel-iommu: Combine domain_pfn_mapping() and domain_sg_mapping()
intel-iommu: Introduce domain_sg_mapping() to speed up intel_map_sg()
intel-iommu: Simplify __intel_alloc_iova()
intel-iommu: Performance improvement for domain_pfn_mapping()
intel-iommu: Performance improvement for dma_pte_clear_range()
intel-iommu: Clean up iommu_domain_identity_map()
intel-iommu: Remove last use of PHYSICAL_PAGE_MASK, for reserving PCI BARs
intel-iommu: Make iommu_flush_iotlb_psi() take pfn as argument
intel-iommu: Change aligned_size() to aligned_nrpages()
intel-iommu: Clean up intel_map_sg(), remove domain_page_mapping()
...
Fix hang with HIGHMEM_64G and 32bit resource. According to hpa and
Linus, use (resource_size_t)-1 to fend off big ranges.
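The idea can be sketched roughly as below, assuming a 32-bit resource
type and 64-bit range bounds. This is an illustration, not the actual
fix: res_size_sketch_t, MAX_RES_SKETCH and clamp_range() are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stands in for a 32-bit resource_size_t (an assumption of this sketch). */
typedef uint32_t res_size_sketch_t;

/* Largest representable value, obtained by casting -1 to the type. */
#define MAX_RES_SKETCH	((res_size_sketch_t)-1)

/*
 * Fit a 64-bit [start, end] range into the narrower resource type.
 * Returns false if the range starts beyond what the type can hold;
 * otherwise clamps @end and stores the result in @rs and @re.
 */
static bool clamp_range(uint64_t start, uint64_t end,
			res_size_sketch_t *rs, res_size_sketch_t *re)
{
	if (start > MAX_RES_SKETCH)
		return false;
	if (end > MAX_RES_SKETCH)
		end = MAX_RES_SKETCH;
	*rs = (res_size_sketch_t)start;
	*re = (res_size_sketch_t)end;
	return true;
}
```

Without such a check, a bound above 4G would be silently truncated when
stored in the 32-bit type and could describe a bogus range.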
Analyzed by hpa
Reported-and-tested-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
These macros had two bugs:
- the type of the mask was not correctly expanded to the full size of
the argument being expanded, resulting in possible loss of high bits
when mixing types.
- the alignment argument was evaluated twice, despite the macro looking
like a fancy function (but it really does need to be a macro, since
it works on arbitrary integer types)
Noticed by Peter Anvin, and with a fix that is a modification of his
suggestion (bug noticed by Yinghai Lu).
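Both bugs can be reproduced in userspace with the sketch below. The
macro names are hypothetical, the "old" form is only an assumed shape
of a macro with these two problems, and the fixed form relies on the
GNU __typeof__ extension; it is not claimed to match the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed buggy shape: the mask is computed in the type of (a), and
 * (a) appears twice in the expansion, so it is evaluated twice.
 */
#define ROUND_UP_OLD(x, a)	(((x) + (a) - 1) & ~((a) - 1))

/* Mask widened to the type of (x) before it is used or inverted. */
#define ROUND_MASK(x, a)	((__typeof__(x))((a) - 1))

/* (a) is expanded exactly once in each of these. */
#define ROUND_UP_FIXED(x, a)	((((x) - 1) | ROUND_MASK(x, a)) + 1)
#define ROUND_DOWN_FIXED(x, a)	((x) & ~ROUND_MASK(x, a))

/* Helper with a side effect, used to count evaluations of (a). */
static unsigned int align_calls;

static unsigned int counted_align(void)
{
	align_calls++;
	return 0x1000u;
}
```

For a 64-bit x of 0x100000001 and a 32-bit alignment of 0x1000, the old
form inverts the mask as a 32-bit value and drops bit 32 of the result,
while the fixed form keeps it.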
Cc: Peter Anvin <hpa@zytor.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
sh: LCDC dcache flush for deferred io
sh: Fix compiler error and include the definition of IS_ERR_VALUE
sh: re-add LCDC fbdev support to the Migo-R defconfig
sh: fix se7724 ceu names
sh: ms7724se: Enable sh_eth in defconfig.
arch/sh/boards/mach-se/7206/io.c: Remove unnecessary semicolons
sh: ms7724se: Add sh_eth support
nommu: provide follow_pfn().
sh: Kill off unused DEBUG_BOOTMEM symbol.
perf_counter tools: add cpu_relax()/rmb() definitions for sh.
sh64: Hook up page fault events for software perf counters.
sh: Hook up page fault events for software perf counters.
sh: make set_perf_counter_pending() static inline.
clocksource: sh_tmu: Make undefined TCOR behaviour less undefined.
When arch/sh/include/asm/syscall_32.h is included from a file that
doesn't also include linux/err.h the following error is produced,
In file included from /home/matt/src/kernels/sh-2.6/arch/sh/include/asm/syscall.h:5,
from kernel/trace/trace_syscalls.c:3:
/home/matt/src/kernels/sh-2.6/arch/sh/include/asm/syscall_32.h: In function 'syscall_get_error':
/home/matt/src/kernels/sh-2.6/arch/sh/include/asm/syscall_32.h:28: error: implicit declaration of function 'IS_ERR_VALUE'
make[2]: *** [kernel/trace/trace_syscalls.o] Error 1
make[1]: *** [kernel/trace] Error 2
make: *** [kernel] Error 2
Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
We can run a 32-bit kernel on boxes with an IOMMU, so we need
pci_unmap_addr() etc. to work -- without it, drivers will leak mappings.
To be honest, this whole thing looks like it's more pain than it's
worth; I'm half inclined to remove the no-op #else case altogether.
But this is the minimal fix, which just does the right thing if
CONFIG_DMAR is set.
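To illustrate why the no-op variant leaks mappings, here is a userspace
sketch of the two macro flavours. All names (SKETCH_NEED_UNMAP_STATE,
DECLARE_UNMAP_ADDR, unmap_addr, unmap_addr_set, struct rx_slot_sketch)
are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t dma_addr_sketch_t;

/* Stands in for the config condition; set to 0 to see the no-op case. */
#define SKETCH_NEED_UNMAP_STATE 1

#if SKETCH_NEED_UNMAP_STATE
/* Real storage: the handle is kept so it can be unmapped later. */
#define DECLARE_UNMAP_ADDR(name)	dma_addr_sketch_t name;
#define unmap_addr(ptr, name)		((ptr)->name)
#define unmap_addr_set(ptr, name, val)	(((ptr)->name) = (val))
#else
/* No-op flavour: nothing is stored and reads always yield 0. */
#define DECLARE_UNMAP_ADDR(name)
#define unmap_addr(ptr, name)		(0)
#define unmap_addr_set(ptr, name, val)	do { } while (0)
#endif

/* Hypothetical driver bookkeeping for one receive buffer. */
struct rx_slot_sketch {
	void *buf;
	DECLARE_UNMAP_ADDR(mapping)
};

/* Store a mapping handle and read it back, as a driver would on unmap. */
static dma_addr_sketch_t remember_and_recall(struct rx_slot_sketch *slot,
					     dma_addr_sketch_t handle)
{
	unmap_addr_set(slot, mapping, handle);
	return unmap_addr(slot, mapping);
}
```

With the no-op flavour the driver reads back 0 instead of the handle it
stored, so the real mapping is never handed to the unmap call.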
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@kernel.org [ for 2.6.30 ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use "ceu0" and "ceu1" as CEU names instead of "ceu".
This fixes "memchunk" kernel command line selection
on the solution engine 7724 board.
With this patch applied, use "memchunk.ceu0=1m" or
"memchunk.ceu1=1m" on the kernel command line to override
the physical memory size to one meg for CEU0 or CEU1.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (47 commits)
perf report: Add --symbols parameter
perf report: Add --comms parameter
perf report: Add --dsos parameter
perf_counter tools: Adjust only prelinked symbol's addresses
perf_counter: Provide a way to enable counters on exec
perf_counter tools: Reduce perf stat measurement overhead/skew
perf stat: Use percentages for scaling output
perf_counter, x86: Update x86_pmu after WARN()
perf stat: Micro-optimize the code: memcpy is only required if no event is selected and !null_run
perf stat: Improve output
perf stat: Fix multi-run stats
perf stat: Add -n/--null option to run without counters
perf_counter tools: Remove dead code
perf_counter: Complete counter swap
perf report: Print sorted callchains per histogram entries
perf_counter tools: Prepare a small callchain framework
perf record: Fix unhandled io return value
perf_counter tools: Add alias for 'l1d' and 'l1i'
perf-report: Add bare minimum PERF_EVENT_READ parsing
perf-report: Add modes for inherited stats and no-samples
...
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
Add Fenghua Yu as temporary co-maintainer for ia64
[IA64] address compiler warnings perfmon.c/salinfo.c
[IA64] Remove unnecessary semicolons
[IA64] sprintf should not be used with same source & destination address
Nathan reported that
| commit 73d60b7f74
| Author: Yinghai Lu <yinghai@kernel.org>
| Date: Tue Jun 16 15:33:00 2009 -0700
|
| page-allocator: clear N_HIGH_MEMORY map before we set it again
|
| SRAT tables may contains nodes of very small size. The arch code may
| decide to not activate such a node. However, currently the early boot
| code sets N_HIGH_MEMORY for such nodes. These nodes therefore seem to be
| active although these nodes have no present pages.
|
| For 64bit N_HIGH_MEMORY == N_NORMAL_MEMORY, so that works for 64 bit too
unintentionally and incorrectly clears the cpuset.mems cgroup attribute on
an i386 kvm guest, meaning that cpuset.mems can not be used.
Fix this by clearing node_states[N_NORMAL_MEMORY] on 64bit only. The
state also needs to be saved and restored in find_zone_movable_pfn.
Reported-by: Nathan Lynch <ntl@pobox.com>
Tested-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
alpha percpu access requires custom SHIFT_PERCPU_PTR() definition for
modules to work around addressing range limitation. This is done via
generating inline assembly using C preprocessing which forces the
assembler to generate external reference. This happens behind the
compiler's back and makes the compiler think that static percpu variables
in modules are unused.
This used to be worked around by using the __used attribute for percpu
variables, which prevents the compiler from omitting the variable; however,
the recent declaration/definition attribute unification change broke this, as
__used can't be used for declarations. Also, in the process, the
PER_CPU_ATTRIBUTES definition in alpha percpu.h got broken.
This patch adds PER_CPU_DEF_ATTRIBUTES, which is only used for definitions,
and makes alpha use it to add __used for percpu variables in modules. This
also fixes the PER_CPU_ATTRIBUTES double definition bug.
Signed-off-by: Tejun Heo <tj@kernel.org>
Tested-by: maximilian attems <max@stro.at>
Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
perfmon.c has a dubious cast directly from "int" to "void *". Add
an intermediate cast to "long" to keep gcc happy.
salinfo.c uses "down_trylock()" in a highly creative way (explained
in the comments in the file) ... but it does kick out this warning:
arch/ia64/kernel/salinfo.c:195: warning: ignoring return value of 'down_trylock'
which people occasionally try to "fix" in ways that do not work. Use some
casts to keep gcc quiet.
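The intermediate cast for the perfmon.c case can be sketched as below.
It assumes, as the kernel does, that "long" is as wide as a pointer;
int_to_cookie() and cookie_to_int() are hypothetical helper names.

```c
#include <assert.h>

/*
 * Widen the int to long first so the conversion to a pointer happens
 * between types of the same size (assumes sizeof(long) == sizeof(void *)).
 */
static void *int_to_cookie(int v)
{
	return (void *)(long)v;
}

/* Reverse direction: pointer -> long -> int. */
static int cookie_to_int(void *p)
{
	return (int)(long)p;
}
```

A direct (void *)v on a 64-bit build makes gcc warn about a cast to
pointer from an integer of different size; going through long avoids
that without changing the value that round-trips.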
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>