memmove may be called from module code copy_pages(btrfs), and it may
call memcpy, which may call back to C code, so it needs to use
_GLOBAL_TOC to set up r2 correctly.
This fixes following error when I tried to boot an le guest:
Vector: 300 (Data Access) at [c000000073f97210]
pc: c000000000015004: enable_kernel_altivec+0x24/0x80
lr: c000000000058fbc: enter_vmx_copy+0x3c/0x60
sp: c000000073f97490
msr: 8000000002009033
dar: d000000001d50170
dsisr: 40000000
current = 0xc0000000734c0000
paca = 0xc00000000fff0000 softe: 0 irq_happened: 0x01
pid = 815, comm = mktemp
enter ? for help
[c000000073f974f0] c000000000058fbc enter_vmx_copy+0x3c/0x60
[c000000073f97510] c000000000057d34 memcpy_power7+0x274/0x840
[c000000073f97610] d000000001c3179c copy_pages+0xfc/0x110 [btrfs]
[c000000073f97660] d000000001c3c248 memcpy_extent_buffer+0xe8/0x160 [btrfs]
[c000000073f97700] d000000001be4be8 setup_items_for_insert+0x208/0x4a0 [btrfs]
[c000000073f97820] d000000001be50b4 btrfs_insert_empty_items+0xf4/0x140 [btrfs]
[c000000073f97890] d000000001bfed30 insert_with_overflow+0x70/0x180 [btrfs]
[c000000073f97900] d000000001bff174 btrfs_insert_dir_item+0x114/0x2f0 [btrfs]
[c000000073f979a0] d000000001c1f92c btrfs_add_link+0x10c/0x370 [btrfs]
[c000000073f97a40] d000000001c20e94 btrfs_create+0x204/0x270 [btrfs]
[c000000073f97b00] c00000000026d438 vfs_create+0x178/0x210
[c000000073f97b50] c000000000270a70 do_last+0x9f0/0xe90
[c000000073f97c20] c000000000271010 path_openat+0x100/0x810
[c000000073f97ce0] c000000000272ea8 do_filp_open+0x58/0xd0
[c000000073f97dc0] c00000000025ade8 do_sys_open+0x1b8/0x300
[c000000073f97e30] c00000000000a008 syscall_exit+0x0/0x7c
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This fixes some bugs in emulate_step(). First, the setting of the carry
bit for the arithmetic right-shift instructions was not correct on 64-bit
machines because we were masking with a mask of type int rather than
unsigned long. Secondly, the sld (shift left doubleword) instruction was
using the wrong instruction field for the register containing the shift
count.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit cd64d1697c ("powerpc: mtmsrd not defined") added a check for
CONFIG_PPC_CPU were a check for CONFIG_PPC_FPU was clearly intended.
Fixes: cd64d1697c ("powerpc: mtmsrd not defined")
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
__clear_user and copy_page load from the TOC and are also exported
to modules. This means we have to use _GLOBAL_TOC() so that we
create the global entry point that sets up the TOC.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This series adds support for building the powerpc 64-bit
LE kernel using the new ABI v2. We already supported
running ABI v2 userspace programs but this adds support
for building the kernel itself using the new ABI.
Unaligned stores take alignment exceptions on POWER7 running in little-endian.
This is a dumb little-endian base memcpy that prevents unaligned stores.
Once booted the feature fixup code switches over to the VMX copy loops
(which are already endian safe).
The question is what we do before that switch over. The base 64bit
memcpy takes alignment exceptions on POWER7 so we can't use it as is.
Fixing the causes of alignment exception would slow it down, because
we'd need to ensure all loads and stores are aligned either through
rotate tricks or bytewise loads and stores. Either would be bad for
all other 64bit platforms.
[ I simplified the loop a bit - Anton ]
Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
If an assembly function that calls back into c code is exported to
modules, we need to ensure r2 is setup correctly. There are only
two places crazy enough to do it (two of which are my fault).
Signed-off-by: Anton Blanchard <anton@samba.org>
Some of the assembler files in lib/ make use of the fact that in the
ELFv1 ABI, the caller guarantees to provide stack space to save the
parameter registers r3 ... r10. This guarantee is no longer present
in ELFv2 for functions that have no variable argument list and no
more than 8 arguments.
Change the affected routines to temporarily store registers in the
red zone and/or the top of their own stack frame (in the space
provided to save r31 .. r29, which is actually not used in these
routines).
In opal_query_takeover, simply always allocate a stack frame;
the routine is not performance critical.
Signed-off-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
binutils is smart enough to know that a branch to a function
descriptor is actually a branch to the functions text address.
Alan tells me that binutils has been doing this for 9 years.
Signed-off-by: Anton Blanchard <anton@samba.org>
Turn Anton's memcpy / copy_tofrom_user test into something that can
live in tools/testing/selftests.
It requires one turd in arch/powerpc/lib/memcpy_64.S, but it's pretty
harmless IMHO.
We are sailing very close to the wind with the feature macros. We define
them to nothing, which currently means we get a few extra nops and
include the unaligned calls.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
GCC 4.8 now generates out-of-line vr save/restore functions when
optimizing for size. They are needed for the raid6 altivec support.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Merge a pile of fixes that went into the "merge" branch (3.13-rc's) such
as Anton Little Endian fixes.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The powerpc 64-bit __copy_tofrom_user() function uses shifts to handle
unaligned invocations. However, these shifts were designed for
big-endian systems: On little-endian systems, they must shift in the
opposite direction.
This commit relies on the C preprocessor to insert the correct shifts
into the assembly code.
[ This is a rare but nasty LE issue. Most of the time we use the POWER7
optimised __copy_tofrom_user_power7 loop, but when it hits an exception
we fall back to the base __copy_tofrom_user loop. - Anton ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
So that it can be used by other codes. No function change.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add a VMX optimised xor, used primarily for RAID5. On a POWER7 blade
this is a decent win:
32regs : 17932.800 MB/sec
altivec : 19724.800 MB/sec
The bigger gain is when the same test is run in SMT4 mode, as it
would if there was a lot of work going on:
8regs : 8377.600 MB/sec
altivec : 15801.600 MB/sec
I tested this against an array created without the patch, and also
verified it worked as expected on a little endian kernel.
[ Fix !CONFIG_ALTIVEC build -- BenH ]
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch addresses unaligned single precision floating point loads
and stores in the single-step code. The old implementation
improperly treated an 8 byte structure as an array of two 4 byte
words, which is a classic little endian bug.
Signed-off-by: Tom Musta <tmusta@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch modifies the unaligned access routines of the sstep.c
module so that it properly reverses the bytes of storage operands
in the little endian kernel kernel. This is implemented by
breaking an unaligned little endian access into a combination of
single byte accesses plus an overal byte reversal operation.
Signed-off-by: Tom Musta <tmusta@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We need to fix some endian issues in our memcpy code. For now
just enable the generic memcpy routine for little endian builds.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We need to fix some endian issues in our checksum code. For now
just enable the generic checksum routines for little endian builds.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The csum_partial_copy_generic() function saves the PowerPC non-volatile
r14, r15, and r16 registers for the main checksum-and-copy loop.
Unfortunately, it fails to restore them upon error exit from this loop,
which results in silent corruption of these registers in the presumably
rare event of an access exception within that loop.
This commit therefore restores these register on error exit from the loop.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The csum_partial_copy_generic() uses register r7 to adjust the remaining
bytes to process. Unfortunately, r7 also holds a parameter, namely the
address of the flag to set in case of access exceptions while reading
the source buffer. Lacking a quantum implementation of PowerPC, this
commit instead uses register r9 to do the adjusting, leaving r7's
pointer uncorrupted.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We've been keeping that field in thread_struct for a while, it contains
the "limit" of the current stack pointer and is meant to be used for
detecting stack overflows.
It has a few problems however:
- First, it was never actually *used* on 64-bit. Set and updated but
not actually exploited
- When switching stack to/from irq and softirq stacks, it's update
is racy unless we hard disable interrupts, which is costly. This
is fine on 32-bit as we don't soft-disable there but not on 64-bit.
Thus rather than fixing 2 in order to implement 1 in some hypothetical
future, let's remove the code completely from 64-bit. In order to avoid
a clutter of ifdef's, we remove the updates from C code completely
during interrupt stack switching, and instead maintain it from the
asm helper that is used to do the stack switching in the first place.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The stmw instruction was incorrectly decoded as an update form instruction
and thus the RA register was being clobbered.
Also, the utility routine to write memory to unaligned addresses breaks the
operation into smaller aligned accesses but was incorrectly incrementing
the address by only one; it needs to increment the address by the size of
the smaller aligned chunk.
Signed-off-by: Tom Musta <tmusta@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>