And other message logging neatening.
Other miscellanea:
o coalesce formats
o realign arguments
o standardize a couple of macros
o use __func__ instead of embedding the function name
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Pull percpu consistent-ops changes from Tejun Heo:
"Way back, before the current percpu allocator was implemented, static
and dynamic percpu memory areas were allocated and handled separately
and had their own accessors. The distinction has been gone for many
years now; however, the now duplicate two sets of accessors remained
with the pointer based ones - this_cpu_*() - evolving various other
operations over time. During the process, we also accumulated other
inconsistent operations.
This pull request contains Christoph's patches to clean up the
duplicate accessor situation. __get_cpu_var() uses are replaced with
with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr().
Unfortunately, the former sometimes is tricky thanks to C being a bit
messy with the distinction between lvalues and pointers, which led to
a rather ugly solution for cpumask_var_t involving the introduction of
this_cpu_cpumask_var_ptr().
This converts most of the uses but not all. Christoph will follow up
with the remaining conversions in this merge window and hopefully
remove the obsolete accessors"
* 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits)
irqchip: Properly fetch the per cpu offset
percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix
ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write.
percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t
Revert "powerpc: Replace __get_cpu_var uses"
percpu: Remove __this_cpu_ptr
clocksource: Replace __this_cpu_ptr with raw_cpu_ptr
sparc: Replace __get_cpu_var uses
avr32: Replace __get_cpu_var with __this_cpu_write
blackfin: Replace __get_cpu_var uses
tile: Use this_cpu_ptr() for hardware counters
tile: Replace __get_cpu_var uses
powerpc: Replace __get_cpu_var uses
alpha: Replace __get_cpu_var
ia64: Replace __get_cpu_var uses
s390: cio driver &__get_cpu_var replacements
s390: Replace __get_cpu_var uses
mips: Replace __get_cpu_var uses
MIPS: Replace __get_cpu_var uses in FPU emulator.
arm: Replace __this_cpu_ptr with raw_cpu_ptr
...
__get_cpu_var() is used for multiple purposes in the kernel source. One of
them is address calculation via the form &__get_cpu_var(x). This calculates
the address for the instance of the percpu variable of the current processor
based on an offset.
Other use cases are for storing and retrieving data from the current
processors percpu area. __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.
__get_cpu_var() is defined as :
#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
__get_cpu_var() always only does an address determination. However, store
and retrieve operations could use a segment prefix (or global register on
other platforms) to avoid the address calculation.
this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.
This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset. Thereby address calculations are avoided and less registers
are used when code is generated.
At the end of the patch set all uses of __get_cpu_var have been removed so
the macro is removed too.
The patch set includes passes over all arches as well. Once these operations
are used throughout then specialized macros can be defined in non -x86
arches as well in order to optimize per cpu access by f.e. using a global
register that may be set to the per cpu base.
Transformations done to __get_cpu_var()
1. Determine the address of the percpu instance of the current processor.
DEFINE_PER_CPU(int, y);
int *x = &__get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(&y);
2. Same as #1 but this time an array structure is involved.
DEFINE_PER_CPU(int, y[20]);
int *x = __get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(y);
3. Retrieve the content of the current processors instance of a per cpu
variable.
DEFINE_PER_CPU(int, y);
int x = __get_cpu_var(y)
Converts to
int x = __this_cpu_read(y);
4. Retrieve the content of a percpu struct
DEFINE_PER_CPU(struct mystruct, y);
struct mystruct x = __get_cpu_var(y);
Converts to
memcpy(&x, this_cpu_ptr(&y), sizeof(x));
5. Assignment to a per cpu variable
DEFINE_PER_CPU(int, y)
__get_cpu_var(y) = x;
Converts to
__this_cpu_write(y, x);
6. Increment/Decrement etc of a per cpu variable
DEFINE_PER_CPU(int, y);
__get_cpu_var(y)++
Converts to
__this_cpu_inc(y)
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Pull arch/tile changes from Chris Metcalf:
"These mostly just address smaller issues reported to me"
* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
arch: tile: kernel: unaligned.c: Cleaning up uninitialized variables
drivers/tty/hvc/hvc_tile.c: use PTR_ERR_OR_ZERO
replace strict_strto* call with kstrto*
tile: Update comments for generic idle conversion
tile: cleanup the comment in init_pgprot
tile: use BOOTMEM_DEFAULT instead of magic number 0 for reserve_bootmem flags
Currently hugepage migration is available for all archs which support
pmd-level hugepage, but testing is done only for x86_64 and there're
bugs for other archs. So to avoid breaking such archs, this patch
limits the availability strictly to x86_64 until developers of other
archs get interested in enabling this feature.
Simply disabling hugepage migration on non-x86_64 archs is not enough to
fix the reported problem where sys_move_pages() hits the BUG_ON() in
follow_page(FOLL_GET), so let's fix this by checking if hugepage
migration is supported in vma_migratable().
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Miller <davem@davemloft.net>
Cc: <stable@vger.kernel.org> [3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
remove obsolete calls to strict_strto* and replace them
with kstrto* calls accordingly.
Signed-off-by: Daniel Walter <dwalter@google.com>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
In tile vmlinux, the rodata area start after the _sdata.
The rodata area is included between [_sdata, __end_rodata),
and is handled at an earlier point.
The page walk starts at __end_rodata, not _sdata.
Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
A config option to allow a variant vmap() using huge pages that was never
upstreamed had some bits of code related to it scattered around the tile
architecture; the config option was removed downstream and this commit
cleans up the scattered evidence of it from the upstream as well.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Unlike global OOM handling, memory cgroup code will invoke the OOM killer
in any OOM situation because it has no way of telling faults occuring in
kernel context - which could be handled more gracefully - from
user-triggered faults.
Pass a flag that identifies faults originating in user space from the
architecture-specific fault handlers to generic code so that memcg OOM
handling can be improved.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The memcg code can trap tasks in the context of the failing allocation
until an OOM situation is resolved. They can hold all kinds of locks
(fs, mm) at this point, which makes it prone to deadlocking.
This series converts memcg OOM handling into a two step process that is
started in the charge context, but any waiting is done after the fault
stack is fully unwound.
Patches 1-4 prepare architecture handlers to support the new memcg
requirements, but in doing so they also remove old cruft and unify
out-of-memory behavior across architectures.
Patch 5 disables the memcg OOM handling for syscalls, readahead, kernel
faults, because they can gracefully unwind the stack with -ENOMEM. OOM
handling is restricted to user triggered faults that have no other
option.
Patch 6 reworks memcg's hierarchical OOM locking to make it a little
more obvious wth is going on in there: reduce locked regions, rename
locking functions, reorder and document.
Patch 7 implements the two-part OOM handling such that tasks are never
trapped with the full charge stack in an OOM situation.
This patch:
Back before smart OOM killing, when faulting tasks were killed directly on
allocation failures, the arch-specific fault handlers needed special
protection for the init process.
Now that all fault handlers call into the generic OOM killer (see commit
609838cfed: "mm: invoke oom-killer from remaining unconverted page
fault handlers"), which already provides init protection, the
arch-specific leftovers can be removed.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc bits]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently hugepage migration works well only for pmd-based hugepages
(mainly due to lack of testing,) so we had better not enable migration of
other levels of hugepages until we are ready for it.
Some users of hugepage migration (mbind, move_pages, and migrate_pages) do
page table walk and check pud/pmd_huge() there, so they are safe. But the
other users (softoffline and memory hotremove) don't do this, so without
this patch they can try to migrate unexpected types of hugepages.
To prevent this, we introduce hugepage_migration_support() as an
architecture dependent check of whether hugepage are implemented on a pmd
basis or not. And on some architecture multiple sizes of hugepages are
available, so hugepage_migration_support() also checks hugepage size.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This was really only useful for TILE64 when we mapped the
kernel data with small pages. Now we use a huge page and we
really don't want to map different parts of the kernel
data in different ways.
We retain the __write_once name in case we want to bring
it back to life at some point in the future.
Note that this change uncovered a latent bug where the
"smp_topology" variable happened to always be aligned mod 8
so we could store two "int" values at once, but when we
eliminated __write_once it ended up only aligned mod 4.
Fix with an explicit annotation.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
This chip is no longer being actively developed for (it was superceded
by the TILEPro64 in 2008), and in any case the existing compiler and
toolchain in the community do not support it. It's unlikely that the
kernel works with TILE64 at this point as the configuration has not been
tested in years. The support is also awkward as it requires maintaining
a significant number of ifdefs. So, just remove it altogether.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
We use virt_to_pte(NULL, va) a lot, which isn't very obvious.
I added virt_to_kpte(va) as a more obvious wrapper function,
that also validates the va as being a kernel adddress.
And, I fixed the semantics of virt_to_pte() so that we handle
the pud and pmd the same way, and we now document the fact that
we handle the final pte level differently.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
The existing code relied on the hardware definition (<arch/chip.h>)
to specify how much VA and PA space was available. It's convenient
to allow customizing this for some configurations, so provide symbols
MAX_PA_WIDTH and MAX_VA_WIDTH in <asm/page.h> that can be modified
if desired.
Additionally, move away from the MEM_XX_INTRPT nomenclature to
define the start of various regions within the VA space. In fact
the cleaner symbol is, for example, MEM_SV_START, to indicate the
start of the area used for supervisor code; the actual address of the
interrupt vectors is not as important, and can be changed if desired.
As part of this change, convert from "intrpt1" nomenclature (which
built in the old privilege-level 1 model) to a simple "intrpt".
Also strip out some tilepro-specific code supporting modifying the
PL the kernel could run at, since we don't actually support using
different PLs in tilepro, only tilegx.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Technically, user privilege is anything less than kernel
privilege. We modify the existing user_mode() macro to have
this semantic (and use it in a couple of places it wasn't being
used before), and add an IS_KERNEL_EX1() macro to the assembly
code as well.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
This tile-specific API had a minor bug, in that if a super huge (>4GB)
page mapped a particular address range, we wouldn't handle it correctly.
As part of fixing that bug, I also cleaned up some of the pud and pmd
accessors to make them more consistent.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Nothing in the codebase was using them, and as written they took
"unsigned long" as the physical address rather than "phys_addr_t",
which is wrong on tilepro anyway. Rather than fixing stale APIs,
just remove them; if there's ever demand for them on this platform,
we can put them back.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
With this change, tile Linux now supports address-space layout
randomization for shared objects, stack, heap and vdso.
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Tony Lu <zlu@tilera.com>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
This change adds infrastructure (CONFIG_TILE_HVGLUE_TRACE) that
provides C code wrappers for the calls the kernel makes to the Tilera
hypervisor. This allows standard kernel infrastructure like FTRACE to
be able to instrument hypervisor calls.
To allow direct calls to the true API, we export their names with a
leading underscore as well. This is important for the few contexts
where we need to make hypervisor calls without touching the stack.
As part of this change, we also switch from creating the symbols
with linker magic to creating them with assembler magic. This lets
us provide a symbol type and generally make them appear more as symbols
and less as just random values in the Elf namespace.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>