linux-apfs

mirror of https://github.com/linux-apfs/linux-apfs.git synced 2026-05-01 15:00:59 -07:00

Author	SHA1	Message	Date
Oliver O'Halloran	6670783606	powerpc/sstep: Fix emulation fall-through There is a switch fallthough in instr_analyze() which can cause an invalid instruction to be emulated as a different, valid, instruction. The rld* (opcode 30) case extracts a sub-opcode from bits 3:1 of the instruction word. However, the only valid values of this field are 001 and 000. These cases are correctly handled, but the others are not which causes execution to fall through into case 31. Breaking out of the switch causes the instruction to be marked as unknown and allows the caller to deal with the invalid instruction in a manner consistent with other invalid instructions. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:08 +10:00
Lennart Sorensen	dd21731022	powerpc/sstep: Fix sstep.c compile on powerpcspe Commit `be96f63375` ("powerpc: Split out instruction analysis part of emulate_step()") introduced ldarx and stdcx into the instructions in sstep.c, which are not accepted by the assembler on powerpcspe, but does seem to be accepted by the normal powerpc assembler even in 32 bit mode. Wrap these two instructions in a __powerpc64__ check like it is everywhere else in the file. Fixes: `be96f63375` ("powerpc: Split out instruction analysis part of emulate_step()") Signed-off-by: Len Sorensen <lsorense@csclub.uwaterloo.ca> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:07 +10:00
Daniel Axtens	8fe088850f	powerpc: rework sparse for lib/xor_vmx.c Sparse doesn't seem to be passing -maltivec around properly, leading to lots of errors: .../include/altivec.h:34:2: error: Use the "-maltivec" flag to enable PowerPC AltiVec support arch/powerpc/lib/xor_vmx.c:27:16: error: Expected ; at end of declaration arch/powerpc/lib/xor_vmx.c:27:16: error: got signed arch/powerpc/lib/xor_vmx.c:60:9: error: No right hand side of '*'-expression arch/powerpc/lib/xor_vmx.c:60:9: error: Expected ; at end of statement arch/powerpc/lib/xor_vmx.c:60:9: error: got v1_in ... arch/powerpc/lib/xor_vmx.c:87:9: error: too many errors Only include the altivec.h header for non-__CHECKER__ builds. For builds with __CHECKER__, make up some stubs instead, as suggested by Balbir. (The vector size of 16 is arbitrary.) Suggested-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Tested-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-04-27 09:33:37 +10:00
Michael Ellerman	b4c6afdc3a	powerpc: Make generic_memcpy() private to copy_32.S generic_memcpy() is only called from copy_32.S, so there's no reason for it to be global. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-04-11 20:30:41 +10:00
Michael Ellerman	a1b5344620	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next Freescale updates from Scott: "Highlights include 8xx optimizations, 32-bit checksum optimizations, 86xx consolidation, e5500/e6500 cpu hotplug, more fman and other dt bits, and minor fixes/cleanup."	2016-03-14 20:05:14 +11:00
Christophe Leroy	7e393220b6	powerpc: optimise csum_partial() call when len is constant csum_partial is often called for small fixed length packets for which it is suboptimal to use the generic csum_partial() function. For instance, in my configuration, I got: * One place calling it with constant len 4 * Seven places calling it with constant len 8 * Three places calling it with constant len 14 * One place calling it with constant len 20 * One place calling it with constant len 24 * One place calling it with constant len 32 This patch renames csum_partial() to __csum_partial() and implements csum_partial() as a wrapper inline function which * uses csum_add() for small 16bits multiple constant length * uses ip_fast_csum() for other 32bits multiple constant * uses __csum_partial() in all other cases Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>	2016-03-09 10:44:18 -06:00
Torsten Duwe	9a7841ae8d	powerpc/ftrace: Use $(CC_FLAGS_FTRACE) when disabling ftrace Rather than open-coding -pg whereever we want to disable ftrace, use the existing $(CC_FLAGS_FTRACE) variable. This has the advantage that it will work in future when we use a different set of flags to enable ftrace. Signed-off-by: Torsten Duwe <duwe@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-03-07 14:53:55 +11:00
Christophe Leroy	f867d556dd	powerpc32: optimise csum_partial() loop On the 8xx, load latency is 2 cycles and taking branches also takes 2 cycles. So let's unroll the loop. This patch improves csum_partial() speed by around 10% on both: * 8xx (single issue processor with parallel execution) * 83xx (superscalar 6xx processor with dual instruction fetch and parallel execution) Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>	2016-03-04 23:03:45 -06:00
Christophe Leroy	48821a34b1	powerpc32: optimise a few instructions in csum_partial() r5 does contain the value to be updated, so lets use r5 all way long for that. It makes the code more readable. To avoid confusion, it is better to use adde instead of addc The first addition is useless. Its only purpose is to clear carry. As r4 is a signed int that is always positive, this can be done by using srawi instead of srwi Let's also remove the comment about bdnz having no overhead as it is not correct on all powerpc, at least on MPC8xx In the last part, in our situation, the remaining quantity of bytes to be proceeded is between 0 and 3. Therefore, we can base that part on the value of bit 31 and bit 30 of r4 instead of anding r4 with 3 then proceding on comparisons and substractions. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>	2016-03-04 23:00:52 -06:00
Christophe Leroy	7aef413656	powerpc32: rewrite csum_partial_copy_generic() based on copy_tofrom_user() csum_partial_copy_generic() does the same as copy_tofrom_user and also calculates the checksum during the copy. Unlike copy_tofrom_user(), the existing version of csum_partial_copy_generic() doesn't take benefit of the cache. This patch is a rewrite of csum_partial_copy_generic() based on copy_tofrom_user(). The previous version of csum_partial_copy_generic() was handling errors. Now we have the checksum wrapper functions to handle the error case like in powerpc64 so we can make the error case simple: just return -EFAULT. copy_tofrom_user() only has r12 available => we use it for the checksum r7 and r8 which contains pointers to error feedback are used, so we stack them. On a TCP benchmark using socklib on the loopback interface on which checksum offload and scatter/gather have been deactivated, we get about 20% performance increase. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>	2016-03-04 22:53:27 -06:00
Christophe Leroy	37e08cad8f	powerpc: inline ip_fast_csum() In several architectures, ip_fast_csum() is inlined There are functions like ip_send_check() which do nothing much more than calling ip_fast_csum(). Inlining ip_fast_csum() allows the compiler to optimise better Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [scottwood: whitespace and cast fixes] Signed-off-by: Scott Wood <oss@buserror.net>	2016-03-04 21:49:49 -06:00
Christophe Leroy	03bc8b0fc8	powerpc32: checksum_wrappers_64 becomes checksum_wrappers The powerpc64 checksum wrapper functions adds csum_and_copy_to_user() which otherwise is implemented in include/net/checksum.h by using csum_partial() then copy_to_user() Those two wrapper fonctions are also applicable to powerpc32 as it is based on the use of csum_partial_copy_generic() which also exists on powerpc32 This patch renames arch/powerpc/lib/checksum_wrappers_64.c to arch/powerpc/lib/checksum_wrappers.c and makes it non-conditional to CONFIG_WORD_SIZE Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>	2016-03-04 21:47:47 -06:00
Christophe Leroy	e0f82bdf2d	powerpc: unexport csum_tcpudp_magic csum_tcpudp_magic is now an inline function, so there is nothing to export Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>	2016-03-04 21:47:22 -06:00
Anton Blanchard	dc4fbba11e	powerpc: Create disable_kernel_{fp,altivec,vsx,spe}() The enable_kernel_*() functions leave the relevant MSR bits enabled until we exit the kernel sometime later. Create disable versions that wrap the kernel use of FP, Altivec VSX or SPE. While we don't want to disable it normally for performance reasons (MSR writes are slow), it will be used for a debug boot option that does this and catches bad uses in other areas of the kernel. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-12-01 13:52:25 +11:00
LEROY Christophe	400c47d81c	powerpc32: memset: only use dcbz once cache is enabled memset() uses instruction dcbz to speed up clearing by not wasting time loading cache line with data that will be overwritten. Some platform like mpc52xx do no have cache active at startup and can therefore not use memset(). Allthough no part of the code explicitly uses memset(), GCC may make calls to it. This patch modifies memset() such that at startup, memset() unconditionally skip the optimised bloc that uses dcbz instruction. Once the initial MMU is set up, in machine_init() we patch memset() by replacing this inconditional jump by a NOP Tested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-09-17 10:36:53 +10:00
LEROY Christophe	1cd03890ea	powerpc32: memcpy: only use dcbz once cache is enabled memcpy() uses instruction dcbz to speed up copy by not wasting time loading cache line with data that will be overwritten. Some platform like mpc52xx do no have cache active at startup and can therefore not use memcpy(). Allthough no part of the code explicitly uses memcpy(), GCC makes calls to it. This patch modifies memcpy() such that at startup, memcpy() unconditionally jumps to generic_memcpy() which doesn't use the dcbz instruction. Once the initial MMU is set up, in machine_init() we patch memcpy() by replacing this inconditional jump by a NOP Reported-by: Michal Sojka <sojkam1@fel.cvut.cz> Tested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-09-17 10:36:44 +10:00
LEROY Christophe	295ffb4189	powerpc/32: Few optimisations in memcpy This patch adds a few optimisations in memcpy functions by using lbzu/stbu instead of lxb/stb and by re-ordering insn inside a loop to reduce latency due to loading Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <scottwood@freescale.com>	2015-08-07 22:59:29 -05:00
LEROY Christophe	0b05e2d671	powerpc/32: cacheable_memcpy becomes memcpy cacheable_memcpy uses dcbz instruction and is more efficient than memcpy when the destination is in RAM. If the destination is in an io area, memcpy_toio() is normally used, not memcpy This patch renames memcpy as generic_memcpy, and renames cacheable_memcpy as memcpy On MPC885, we get approximatly 7% increase of the transfer rate on an FTP reception Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <scottwood@freescale.com>	2015-08-07 22:59:27 -05:00
LEROY Christophe	c152f149ce	powerpc/32: Merge the new memset() with the old one cacheable_memzero() which has become the new memset() and the old memset() are quite similar, so just merge them. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <scottwood@freescale.com>	2015-08-07 22:59:24 -05:00
LEROY Christophe	5b2a32e806	powerpc/32: memset(0): use cacheable_memzero cacheable_memzero uses dcbz instruction and is more efficient than memset(0) when the destination is in RAM This patch renames memset as generic_memset, and defines memset as a prolog to cacheable_memzero. This prolog checks if the byte to set is 0. If not, it falls back to generic_memcpy() cacheable_memzero disappears as it is not referenced anywhere anymore Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <scottwood@freescale.com>	2015-08-07 22:59:21 -05:00
LEROY Christophe	df087e450d	Partially revert "powerpc: Remove duplicate cacheable_memcpy/memzero functions" This partially reverts commit 'powerpc: Remove duplicate cacheable_memcpy/memzero functions ("b05ae4ee602b7dc90771408ccf0972e1b3801a35")' Functions cacheable_memcpy/memzero are more efficient than memcpy/memset as they use the dcbz instruction which avoids refill of the cacheline with the data that we will overwrite. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <scottwood@freescale.com>	2015-08-07 22:59:21 -05:00
LEROY Christophe	92c985f1d7	powerpc: put csum_tcpudp_magic inline csum_tcpudp_magic() is only a few instructions, and does modify really few registers. So it is not worth having it as a separate function and suffer function branching and saving of volatile registers. This patch makes it inline by use of the already existing csum_tcpudp_nofold() function. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <scottwood@freescale.com>	2015-08-07 22:59:19 -05:00
Linus Torvalds	08d183e3c1	Merge tag 'powerpc-4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux Pull powerpc updates from Michael Ellerman: - disable the 32-bit vdso when building LE, so we can build with a 64-bit only toolchain. - EEH fixes from Gavin & Richard. - enable the sys_kcmp syscall from Laurent. - sysfs control for fastsleep workaround from Shreyas. - expose OPAL events as an irq chip by Alistair. - MSI ops moved to pci_controller_ops by Daniel. - fix for kernel to userspace backtraces for perf from Anton. - merge pseries and pseries_le defconfigs from Cyril. - CXL in-kernel API from Mikey. - OPAL prd driver from Jeremy. - fix for DSCR handling & tests from Anshuman. - Powernv flash mtd driver from Cyril. - dynamic DMA Window support on powernv from Alexey. - LLVM clang fixes & workarounds from Anton. - reworked version of the patch to abort syscalls when transactional. - fix the swap encoding to support 4TB, from Aneesh. - various fixes as usual. - Freescale updates from Scott: Highlights include more 8xx optimizations, an e6500 hugetlb optimization, QMan device tree nodes, t1024/t1023 support, and various fixes and cleanup. * tag 'powerpc-4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (180 commits) cxl: Fix typo in debug print cxl: Add CXL_KERNEL_API config option powerpc/powernv: Fix wrong IOMMU table in pnv_ioda_setup_bus_dma() powerpc/mm: Change the swap encoding in pte. powerpc/mm: PTE_RPN_MAX is not used, remove the same powerpc/tm: Abort syscalls in active transactions powerpc/iommu/ioda2: Enable compile with IOV=on and IOMMU_API=off powerpc/include: Add opal-prd to installed uapi headers powerpc/powernv: fix construction of opal PRD messages powerpc/powernv: Increase opal-irqchip initcall priority powerpc: Make doorbell check preemption safe powerpc/powernv: pnv_init_idle_states() should only run on powernv macintosh/nvram: Remove as unused powerpc: Don't use gcc specific options on clang powerpc: Don't use -mno-strict-align on clang powerpc: Only use -mtraceback=no, -mno-string and -msoft-float if toolchain supports it powerpc: Only use -mabi=altivec if toolchain supports it powerpc: Fix duplicate const clang warning in user access code vfio: powerpc/spapr: Support Dynamic DMA windows vfio: powerpc/spapr: Register memory and define IOMMU v2 ...	2015-06-24 08:46:32 -07:00
Anton Blanchard	1fb3f5a7ca	powerpc: Only use -mabi=altivec if toolchain supports it The -mabi=altivec option is not recognised on LLVM, so use call cc-option to check for support. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 17:33:05 +10:00
David Hildenbrand	5f76eea88d	sched/preempt, powerpc: Disable preemption in enable_kernel_altivec() explicitly enable_kernel_altivec() has to be called with disabled preemption. Let's make this explicit, to prepare for pagefault_disable() not touching preemption anymore. Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: airlied@linux.ie Cc: akpm@linux-foundation.org Cc: bigeasy@linutronix.de Cc: borntraeger@de.ibm.com Cc: daniel.vetter@intel.com Cc: heiko.carstens@de.ibm.com Cc: herbert@gondor.apana.org.au Cc: hocko@suse.cz Cc: hughd@google.com Cc: mst@redhat.com Cc: paulus@samba.org Cc: ralf@linux-mips.org Cc: schwidefsky@de.ibm.com Cc: yang.shi@windriver.com Link: http://lkml.kernel.org/r/1431359540-32227-14-git-send-email-dahi@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 08:39:17 +02:00

1 2 3 4 5 ...

242 Commits