The code was wrong in several aspects. The locking order was
inconsistent, the device aquire code did not reset a variable
after a wakeup and the wakeup handling was not working for
applications where multiple chips are sharing a single
hardware controller.
When a hardware controller is available the locking is now
reduced to the hardware controller lock and the waitqueue is
moved to the hardware controller structure in order to avoid
a wake_up_all().
The problem was pointed out by Ben Dooks, who also found the
missing variable reset as main cause for his deadlock problem.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
1. Establish a simple API for process freezing defined in linux/include/sched.h:
frozen(process) Check for frozen process
freezing(process) Check if a process is being frozen
freeze(process) Tell a process to freeze (go to refrigerator)
thaw_process(process) Restart process
frozen_process(process) Process is frozen now
2. Remove all references to PF_FREEZE and PF_FROZEN from all
kernel sources except sched.h
3. Fix numerous locations where try_to_freeze is manually done by a driver
4. Remove the argument that is no longer necessary from two function calls.
5. Some whitespace cleanup
6. Clear potential race in refrigerator (provides an open window of PF_FREEZE
cleared before setting PF_FROZEN, recalc_sigpending does not check
PF_FROZEN).
This patch does not address the problem of freeze_processes() violating the rule
that a task may only modify its own flags by setting PF_FREEZE. This is not clean
in an SMP environment. freeze(process) is therefore not SMP safe!
Signed-off-by: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch contains the following cleanups:
- make needlessly global code static
- remove the following unused global functions:
- blkdev_scsi_issue_flush_fn
- __blk_attempt_remerge
- remove the following unused EXPORT_SYMBOL's:
- blk_phys_contig_segment
- blk_hw_contig_segment
- blkdev_scsi_issue_flush_fn
- __blk_attempt_remerge
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch contains the following possible cleanups:
- make the needlessly global function __nvram_set_checksum static
- #if 0 the unused global function nvram_set_checksum
- remove the EXPORT_SYMBOL's for both functions
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch makes use of ALIGN() to remove duplicate round-up code.
Signed-off-by: Nick Wilson <njw@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
o Following patch provides purely cosmetic changes and corrects CodingStyle
guide lines related certain issues like below in kexec related files
o braces for one line "if" statements, "for" loops,
o more than 80 column wide lines,
o No space after "while", "for" and "switch" key words
o Changes:
o take-2: Removed the extra tab before "case" key words.
o take-3: Put operator at the end of line and space before "*/"
Signed-off-by: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Makes kexec_crashdump() take a pt_regs * as an argument. This allows to
get exact register state at the point of the crash. If we come from direct
panic assertion NULL will be passed and the current registers saved before
crashdump.
This hooks into two places:
die(): check the conditions under which we will panic when calling
do_exit and go there directly with the pt_regs that caused the fatal
fault.
die_nmi(): If we receive an NMI lockup while in the kernel use the
pt_regs and go directly to crash_kexec(). We're probably nested up badly
at this point so this might be the only chance to escape with proper
information.
Signed-off-by: Alexander Nyberg <alexn@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
From: "Vivek Goyal" <vgoyal@in.ibm.com>
o Support for /proc/vmcore interface. This interface exports elf core image
either in ELF32 or ELF64 format, depending on the format in which elf headers
have been stored by crashed kernel.
o Added support for CONFIG_VMCORE config option.
o Removed the dependency on /proc/kcore.
From: "Eric W. Biederman" <ebiederm@xmission.com>
This patch has been refactored to more closely match the prevailing style in
the affected files. And to clearly indicate the dependency between
/proc/kcore and proc/vmcore.c
From: Hariprasad Nellitheertha <hari@in.ibm.com>
This patch contains the code that provides an ELF format interface to the
previous kernel's memory post kexec reboot.
Signed off by Hariprasad Nellitheertha <hari@in.ibm.com>
Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds support for retrieving the address of elf core header if one
is passed in command line.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch provides the interfaces necessary to read the dump contents,
treating it as a high memory device.
Signed off by Hariprasad Nellitheertha <hari@in.ibm.com>
Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch retrieves the max_pfn being used by previous kernel and stores it
in a safe location (saved_max_pfn) before it is overwritten due to user
defined memory map. This pfn is used to make sure that user does not try to
read the physical memory beyond saved_max_pfn.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add kexec support for s390 architecture.
From: Milton Miller <miltonm@bga.com>
- Fix passing of first argument to relocate_kernel assembly.
- Fix Kconfig description.
- Remove wrong comment and comments that describe obvious things.
- Allow only KEXEC_TYPE_DEFAULT as image type -> dump not supported.
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch introduces the architecture independent implementation the
sys_kexec_load, the compat_sys_kexec_load system calls.
Kexec on panic support has been integrated into the core patch and is
relatively clean.
In addition the hopefully architecture independent option
crashkernel=size@location has been docuemented. It's purpose is to reserve
space for the panic kernel to live, and where no DMA transfer will ever be
setup to access.
Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Alexander Nyberg <alexn@telia.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds a new preemption model: 'Voluntary Kernel Preemption'. The
3 models can be selected from a new menu:
(X) No Forced Preemption (Server)
( ) Voluntary Kernel Preemption (Desktop)
( ) Preemptible Kernel (Low-Latency Desktop)
we still default to the stock (Server) preemption model.
Voluntary preemption works by adding a cond_resched()
(reschedule-if-needed) call to every might_sleep() check. It is lighter
than CONFIG_PREEMPT - at the cost of not having as tight latencies. It
represents a different latency/complexity/overhead tradeoff.
It has no runtime impact at all if disabled. Here are size stats that show
how the various preemption models impact the kernel's size:
text data bss dec hex filename
3618774 547184 179896 4345854 424ffe vmlinux.stock
3626406 547184 179896 4353486 426dce vmlinux.voluntary +0.2%
3748414 548640 179896 4476950 445016 vmlinux.preempt +3.5%
voluntary-preempt is +0.2% of .text, preempt is +3.5%.
This feature has been tested for many months by lots of people (and it's
also included in the RHEL4 distribution and earlier variants were in Fedora
as well), and it's intended for users and distributions who dont want to
use full-blown CONFIG_PREEMPT for one reason or another.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following patches add dynamic sched domains functionality that was
extensively discussed on lkml and lse-tech. I would like to see this added to
-mm
o The main advantage with this feature is that it ensures that the scheduler
load balacing code only balances against the cpus that are in the sched
domain as defined by an exclusive cpuset and not all of the cpus in the
system. This removes any overhead due to load balancing code trying to
pull tasks outside of the cpu exclusive cpuset only to be prevented by
the tasks' cpus_allowed mask.
o cpu exclusive cpusets are useful for servers running orthogonal
workloads such as RT applications requiring low latency and HPC
applications that are throughput sensitive
o It provides a new API partition_sched_domains in sched.c
that makes dynamic sched domains possible.
o cpu_exclusive cpusets sets are now associated with a sched domain.
Which means that the users can dynamically modify the sched domains
through the cpuset file system interface
o ia64 sched domain code has been updated to support this feature as well
o Currently, this does not support hotplug. (However some of my tests
indicate hotplug+preempt is currently broken)
o I have tested it extensively on x86.
o This should have very minimal impact on performance as none of
the fast paths are affected
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Acked-by: Paul Jackson <pj@sgi.com>
Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
Acked-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Consolidate balance-on-exec with balance-on-fork. This is made easy by the
sched-domains RCU patches.
As well as the general goodness of code reduction, this allows the runqueues
to be unlocked during balance-on-fork.
schedstats is a problem. Maybe just have balance-on-event instead of
distinguishing fork and exec?
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Instead of requiring architecture code to interact with the scheduler's
locking implementation, provide a couple of defines that can be used by the
architecture to request runqueue unlocked context switches, and ask for
interrupts to be enabled over the context switch.
Also replaces the "switch_lock" used by these architectures with an oncpu
flag (note, not a potentially slow bitflag). This eliminates one bus
locked memory operation when context switching, and simplifies the
task_running function.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Reimplement the balance on exec balancing to be sched-domains aware. Use this
to also do balance on fork balancing. Make x86_64 do balance on fork over the
NUMA domain.
The problem that the non sched domains aware blancing became apparent on dual
core, multi socket opterons. What we want is for the new tasks to be sent to
a different socket, but more often than not, we would first load up our
sibling core, or fill two cores of a single remote socket before selecting a
new one.
This gives large improvements to STREAM on such systems.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the very aggressive idle stuff that has recently gone into 2.6 - it is
going against the direction we are trying to go. Hopefully we can regain
performance through other methods.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>