Commit Graph

48 Commits

Author SHA1 Message Date
Linus Torvalds
ec0d7f18ab Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull fpu state cleanups from Ingo Molnar:
 "This tree streamlines further aspects of FPU handling by eliminating
  the prepare_to_copy() complication and moving that logic to
  arch_dup_task_struct().

  It also fixes the FPU dumps in threaded core dumps, removes and old
  (and now invalid) assumption plus micro-optimizes the exit path by
  avoiding an FPU save for dead tasks."

Fixed up trivial add-add conflict in arch/sh/kernel/process.c that came
in because we now do the FPU handling in arch_dup_task_struct() rather
than the legacy (and now gone) prepare_to_copy().

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, fpu: drop the fpu state during thread exit
  x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state()
  coredump: ensure the fpu state is flushed for proper multi-threaded core dump
  fork: move the real prepare_to_copy() users to arch_dup_task_struct()
2012-05-23 10:59:07 -07:00
Suresh Siddha
55ccf3fe3f fork: move the real prepare_to_copy() users to arch_dup_task_struct()
Historical prepare_to_copy() is mostly a no-op, duplicated for majority of
the architectures and the rest following the x86 model of flushing the extended
register state like fpu there.

Remove it and use the arch_dup_task_struct() instead.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1336692811-30576-1-git-send-email-suresh.b.siddha@intel.com
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-05-16 15:16:26 -07:00
Filippo Arcidiacono
5d920bb929 sh: initial stack protector support.
This implements basic -fstack-protector support, based on the early ARM
version in c743f38013. The SMP case is
limited to the initial canary value, while the UP case handles per-task
granularity (limited to 32-bit sh until a new enough sh64 compiler
manifests itself).

Signed-off-by: Filippo Arcidiacono <filippo.arcidiacono@st.com>
Reviewed-by: Carmelo Amoroso <carmelo.amoroso@st.com>
Signed-off-by: Stuart Menefy <stuart.menefy@st.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-04-19 15:45:57 +09:00
Paul Mundt
f03c4866d3 sh: fix up fallout from system.h disintegration.
Quite a bit of fallout all over the place, nothing terribly exciting.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-03-30 19:29:57 +09:00
David Howells
e839ca5287 Disintegrate asm/system.h for SH
Disintegrate asm/system.h for SH.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-sh@vger.kernel.org
2012-03-28 18:30:03 +01:00
Joe Perches
ff2d8b19a3 treewide: convert uses of ATTRIB_NORETURN to __noreturn
Use the more commonly used __noreturn instead of ATTRIB_NORETURN.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12 20:13:03 -08:00
Mathias Krause
201fbceb25 sh, exec: remove redundant set_fs(USER_DS)
The address limit is already set in flush_old_exec() so those calls to
set_fs(USER_DS) are redundant.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2011-06-14 15:15:58 +09:00
Paul Mundt
0f0ebd980e sh: arch/sh/kernel/process_32.c needs linux/prefetch.h.
Trivial build fix for certain configurations that don't grab
linux/prefetch.h via alternate means (specifically SH-2 and SH-3 parts).

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2011-05-24 17:25:23 +09:00
David Howells
d7627467b7 Make do_execve() take a const filename pointer
Make do_execve() take a const filename pointer so that kernel_execve() compiles
correctly on ARM:

arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type

This also requires the argv and envp arguments to be consted twice, once for
the pointer array and once for the strings the array points to.  This is
because do_execve() passes a pointer to the filename (now const) to
copy_strings_kernel().  A simpler alternative would be to cast the filename
pointer in do_execve() when it's passed to copy_strings_kernel().

do_execve() may not change any of the strings it is passed as part of the argv
or envp lists as they are some of them in .rodata, so marking these strings as
const should be fine.

Further kernel_execve() and sys_execve() need to be changed to match.

This has been test built on x86_64, frv, arm and mips.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-17 18:07:43 -07:00
Tejun Heo
5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Paul Mundt
9d56dd3b08 sh: Mass ctrl_in/outX to __raw_read/writeX conversion.
The old ctrl in/out routines are non-portable and unsuitable for
cross-platform use. While drivers/sh has already been sanitized, there
is still quite a lot of code that is not. This converts the arch/sh/ bits
over, which permits us to flag the routines as deprecated whilst still
building with -Werror for the architecture code, and to ensure that
future users are not added.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-01-26 12:58:40 +09:00
Paul Mundt
fbb82b0365 sh: machine_ops based reboot support.
This provides a machine_ops-based reboot interface loosely cloned from
x86, and converts the native sh32 and sh64 cases over to it.

Necessary both for tying in SMP support and also enabling platforms like
SDK7786 to add support for their microcontroller-based power managers.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-01-20 16:42:52 +09:00
Paul Mundt
644755e786 Merge branches 'sh/xstate', 'sh/hw-breakpoints' and 'sh/stable-updates' 2010-01-13 13:02:55 +09:00
Paul Mundt
0ea820cf9b sh: Move over to dynamically allocated FPU context.
This follows the x86 xstate changes and implements a task_xstate slab
cache that is dynamically sized to match one of hard FP/soft FP/FPU-less.

This also tidies up and consolidates some of the SH-2A/SH-4 FPU
fragmentation. Now fpu state restorers are commonly defined, with the
init_fpu()/fpu_init() mess reworked to follow the x86 convention.
The fpu_init() register initialization has been replaced by xstate setup
followed by writing out to hardware via the standard restore path.

As init_fpu() now performs a slab allocation a secondary lighterweight
restorer is also introduced for the context switch.

In the future the DSP state will be rolled in here, too.

More work remains for math emulation and the SH-5 FPU, which presently
uses its own special (UP-only) interfaces.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-01-13 12:51:40 +09:00
Paul Mundt
70e068eef9 sh: Move start_thread() out of line.
start_thread() will become a bit heavier with the xstate freeing to be
added in, so move it out-of-line in preparation.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-01-12 18:52:00 +09:00
Paul Mundt
7025bec912 sh: Kill off dead UBC headers.
Nothing is using these now, so kill them all off.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-01-05 19:16:35 +09:00
Paul Mundt
6424db52e2 Merge branch 'master' into sh/hw-breakpoints
Conflict between FPU thread flag migration and debug
thread flag addition.

Conflicts:
	arch/sh/include/asm/thread_info.h
	arch/sh/include/asm/ubc.h
	arch/sh/kernel/process_32.c
2009-12-08 15:47:12 +09:00
Paul Mundt
09a0729477 sh: hw-breakpoints: Add preliminary support for SH-4A UBC.
This adds preliminary support for the SH-4A UBC to the hw-breakpoints API.
Presently only a single channel is implemented, and the ptrace interface
still needs to be converted. This is the first step to cleaning up the
long-standing UBC mess, making the UBC more generally accessible, and
finally making it SMP safe.

An additional abstraction will be layered on top of this as with the perf
events code to permit the various CPU families to wire up support for
their own specific UBCs, as many variations exist.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2009-12-08 15:02:27 +09:00
Paul Mundt
6ba653830c sh: Fix up the FPU emulation build.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2009-11-25 12:07:31 +09:00
Stuart Menefy
d3ea9fa0a5 sh: Minor optimisations to FPU handling
A number of small optimisations to FPU handling, in particular:

 - move the task USEDFPU flag from the thread_info flags field (which
   is accessed asynchronously to the thread) to a new status field,
   which is only accessed by the thread itself. This allows locking to
   be removed in most cases, or can be reduced to a preempt_lock().
   This mimics the i386 behaviour.

 - move the modification of regs->sr and thread_info->status flags out
   of save_fpu() to __unlazy_fpu(). This gives the compiler a better
   chance to optimise things, as well as making save_fpu() symmetrical
   with restore_fpu() and init_fpu().

 - implement prepare_to_copy(), so that when creating a thread, we can
   unlazy the FPU prior to copying the thread data structures.

Also make sure that the FPU is disabled while in the kernel, in
particular while booting, and for newly created kernel threads,

In a very artificial benchmark, the execution time for 2500000
context switches was reduced from 50 to 45 seconds.

Signed-off-by: Stuart Menefy <stuart.menefy@st.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2009-11-24 17:45:38 +09:00
Paul Mundt
49fb2cd257 Merge branch 'master' into sh/st-integration 2009-11-24 16:32:11 +09:00
Giuseppe CAVALLARO
a0458b07c1 sh: add sleazy FPU optimization
sh port of the sLeAZY-fpu feature currently implemented for some architectures
such us i386.

Right now the SH kernel has a 100% lazy fpu behaviour.
This is of course great for applications that have very sporadic or no FPU use.
However for very frequent FPU users...  you take an extra trap every context
switch.
The patch below adds a simple heuristic to this code: after 5 consecutive
context switches of FPU use, the lazy behavior is disabled and the context
gets restored every context switch.
After 256 switches, this is reset and the 100% lazy behavior is returned.

Tests with LMbench showed no regression.
I saw a little improvement due to the prefetching (~2%).

The tests below also show that, with this sLeazy patch, indeed,
the number of FPU exceptions is reduced.
To test this. I hacked the lat_ctx LMBench to use the FPU a little more.

   sLeasy implementation
   ===========================================
   switch_to calls            |  79326
   sleasy   calls             |  42577
   do_fpu_state_restore  calls|  59232
   restore_fpu   calls        |  59032

   Exceptions:  0x800 (FPU disabled  ): 16604

   100% Leazy (default implementation)
   ===========================================
   switch_to  calls            |  79690
   do_fpu_state_restore calls  |  53299
   restore_fpu  calls          |   53101

   Exceptions: 0x800 (FPU disabled  ):  53273

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Stuart Menefy <stuart.menefy@st.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2009-11-24 16:23:38 +09:00
Paul Mundt
4c978ca319 sh: Clean up more superfluous symbol exports.
Many of these symbols went away completely, or we just never cared about
them in the first place. Trim the exports down to the essential set.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2009-10-27 11:51:19 +09:00
Jon Frosdick
b46373e0d4 sh: Use internal watchdog timer to perform reset
This patches will trigger a reboot using the watchdog
timer instead of double fault.  Unlike the previous
method, this one actually works in 32 bit mode.

Reset should also be cleaner.

Signed-off-by: Jon Frosdick <jon.frosdick@st.com>
Signed-off-by: Carl Shaw <carl.shaw@st.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2009-08-24 16:20:44 +09:00
Matt Fleming
7816fecd03 sh: Mark __switch_to() as __notrace_funcgraph
Annotate __switch_to() so that the function graph tracer does not try to
trace it. Use __notrace_funcgraph, as opposed to notrace, so that other
tracers can continue to trace __switch_to().

The reason that we don't want to trace __switch_to() with the function
graph tracer is because of how the return address stack in task_struct
is implemented. When we enter __switch_to we store the real return
address on prev's ret_stack. When we return from __switch_to() we've
patched the return address on the kernel stack to be
return_to_handler. Calling return_to_handler we do,

       -> ftrace_return_to_handler()
       	  -> ftrace_pop_return_ftrace()

Which tries to pop the real return address from current->ret_stack. The
problem being that we stored the return address on prev->ret_stack, but
current now points to next, and next->ret_stack doesn't contain the
correct return address (and is possibly even empty).

Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2009-07-11 10:08:06 +09:00