The 32 bit unsigned substraction (next - jiffies) in stop_hz_timer can
overflow if jiffies gets advanced between next_timer_interrupt and the read
under the xtime lock. The cast to a u64 then results in a large value
which causes the cpu to wait too long. Fix this by casting next and
jiffies independently to u64 before subtracting them.
(Spotted by Zachary Amsden <zach@vmware.com>)
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Under certain timing conditions, a race during boot occurs where timer
ticks are being processed on remote CPUs. The remote timer ticks can
increment jiffies, and if this happens during a window when a timeout is
very close to expiring but a local tick has not yet been delivered, you can
end up with
1) No softirq pending
2) A local timer wheel which is not synced to jiffies
3) No high resolution timer active
4) A local timer which is supposed to fire before the current jiffies value.
In this circumstance, the comparison in next_timer_interrupt overflows,
because the base of the comparison for high resolution timers is jiffies,
but for the softirq timer wheel, it is relative the the current base of the
wheel (jiffies_base).
Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Problem:
If we put a probe onto a callq instruction and the probe is executed,
kernel panic of Bad RIP value occurs.
Root cause:
If resume_execution() found 0xff at first byte of p->ainsn.insn, it must
check the _second_ byte. But current resume_execution check _first_ byte
again.
I changed it checks second byte of p->ainsn.insn.
Kprobes on i386 don't have this problem, because the implementation is a
little bit different from x86_64.
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Satoshi Oshima <soshima@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
o Kdump second kernel boot fails after a system crash if second kernel
is UP and acpi=off and if crash occurred on a non-boot cpu.
o Issue here is that MP tables report boot cpu lapic id as 0 but second
kernel is booting on a different processor and MP table data is stale
in this context. Hence apic_id_registered() check fails in setup_local_APIC()
when called from APIC_init_uniprocessor().
o Problem is not seen if ACPI is enabled as in that case
boot_cpu_physical_apicid is read from the LAPIC.
o Problem is not seen with SMP kernels as well because in this case also
boot_cpu_physical_apicid is read from LAPIC. (smp_boot_cpus()).
o The problem is fixed by reading boot_cpu_physical_apicid from LAPIC
if it is a UP kernel and CRASH_DUMP is enabled.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix some outstanding issues with the pxa2xx_spi driver when running on a
PXA270:
- Wrong timeout calculation in the setup function due to different
peripheral clock rates in the PXAxxx family.
- Bad handling of SSSR_TFS interrupts in interrupt_transfer function.
- Added locking to interface between the pump_messages workqueue and the
pump_transfers tasklet.
Much thanks to Juergen Beisert for the extensive testing on the PXA270.
Signed-off-by: Stephen Street <stephen@streetfiresound.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- remove the following global function that is both unused and
unimplemented:
- register_firmware()
- make the following needlessly global function static:
- firmware_class_uevent()
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This driver supports the SPI controller on the MPC83xx SoC devices from
Freescale. Note, this driver supports only the simple shift register SPI
controller and not the descriptor based CPM or QUICCEngine SPI controller.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Because several developers asked me about referenced but missing
spi_add_master(), I think that this patch should be applied ... it
corrects comments so they refer to spi_register_master() instead.
Signed-off-by: dmitry pervushin <dpervushin@ru.mvista.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This fixes coverity bug id #1237. After the while loop, it is possible for
i == ISDN_LMSNLEN. If this happens the terminating '\0' is written after
the end of the array.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Cc: Karsten Keil <kkeil@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It's too easy to incorrectly call cpuset_zone_allowed() in an atomic
context without __GFP_HARDWALL set, and when done, it is not noticed until
a tight memory situation forces allocations to be tried outside the current
cpuset.
Add a 'might_sleep_if()' check, to catch this earlier on, instead of
waiting for a similar check in the mutex_lock() code, which is only rarely
invoked.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Update the kernel/cpuset.c:cpuset_zone_allowed() comment.
The rule for when mm/page_alloc.c should call cpuset_zone_allowed()
was intended to be:
Don't call cpuset_zone_allowed() if you can't sleep, unless you
pass in the __GFP_HARDWALL flag set in gfp_flag, which disables
the code that might scan up ancestor cpusets and sleep.
The explanation of this rule in the comment above cpuset_zone_allowed() was
stale, as a result of a restructuring of some __alloc_pages() code in
November 2005.
Rewrite that comment ...
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix a couple of infrequently encountered 'sleeping function called from
invalid context' in the cpuset hooks in __alloc_pages. Could sleep while
interrupts disabled.
The routine cpuset_zone_allowed() is called by code in mm/page_alloc.c
__alloc_pages() to determine if a zone is allowed in the current tasks
cpuset. This routine can sleep, for certain GFP_KERNEL allocations, if the
zone is on a memory node not allowed in the current cpuset, but might be
allowed in a parent cpuset.
But we can't sleep in __alloc_pages() if in interrupt, nor if called for a
GFP_ATOMIC request (__GFP_WAIT not set in gfp_flags).
The rule was intended to be:
Don't call cpuset_zone_allowed() if you can't sleep, unless you
pass in the __GFP_HARDWALL flag set in gfp_flag, which disables
the code that might scan up ancestor cpusets and sleep.
This rule was being violated in a couple of places, due to a bogus change
made (by myself, pj) to __alloc_pages() as part of the November 2005 effort
to cleanup its logic, and also due to a later fix to constrain which swap
daemons were awoken.
The bogus change can be seen at:
http://linux.derkeiler.com/Mailing-Lists/Kernel/2005-11/4691.html
[PATCH 01/05] mm fix __alloc_pages cpuset ALLOC_* flags
This was first noticed on a tight memory system, in code that was disabling
interrupts and doing allocation requests with __GFP_WAIT not set, which
resulted in __might_sleep() writing complaints to the log "Debug: sleeping
function called ...", when the code in cpuset_zone_allowed() tried to take
the callback_sem cpuset semaphore.
We haven't seen a system hang on this 'might_sleep' yet, but we are at
decent risk of seeing it fairly soon, especially since the additional
cpuset_zone_allowed() check was added, conditioning wakeup_kswapd(), in
March 2006.
Special thanks to Dave Chinner, for figuring this out, and a tip of the hat
to Nick Piggin who warned me of this back in Nov 2005, before I was ready
to listen.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The OSC set and query functions do not allocate enough space for return
values, and set the output buffer length to a false, too large value. This
causes the acpi-ca code to assume that the output buffer is larger than it
actually is, and overwrite memory when copying acpi return buffers into
this caller provided buffer. In some cases this can cause kernel oops if
the memory that is overwritten is a pointer. This patch will change these
calls to use a dynamically allocated output buffer, thus allowing the
acpi-ca code to decide how much space is needed.
Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Cc: "Brown, Len" <len.brown@intel.com>
Cc: "Yu, Luming" <luming.yu@intel.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
While doing some inotify stress testing, I hit the following race. In
inotify_release(), it's possible for a watch to be removed from the lists
in between dropping dev->mutex and taking inode->inotify_mutex. The
reference we hold prevents the watch from being freed, but not from being
removed.
Checking the dev's idr mapping will prevent a double list_del of the
same watch.
Signed-off-by: Amy Griffis <amy.griffis@hp.com>
Acked-by: John McCutchan <john@johnmccutchan.com>
Cc: Robert Love <rml@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A bad calculation/loop in __section_nr() could result in incorrect section
information being put into sysfs memory entries. This primarily impacts
memory add operations as the sysfs information is used while onlining new
memory.
Fix suggested by Dave Hansen.
Note that the bug may not be obvious from the patch. It actually occurs in
the function's return statement:
return (root_nr * SECTIONS_PER_ROOT) + (ms - root);
In the existing code, root_nr has already been multiplied by
SECTIONS_PER_ROOT.
Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Bernd Schmidt points out that binfmt_flat is now leaving the exec file open
while the application runs. This offsets all the application's fd numbers.
We should have closed the file within exec(), not at exit()-time.
But there doesn't seem to be a lot of point in doing all this just to avoid
going over RLIMIT_NOFILE by one fd for a few microseconds. So take the EMFILE
checking out again. This will cause binfmt_flat to again fail LTP's
exec-should-return-EMFILE-when-fdtable-is-full test. That test appears to be
wrong anyway - Open Group specs say nothing about exec() returning EMFILE.
Cc: Bernd Schmidt <bernd.schmidt@analog.com>
Cc: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make a read of a HID device block until data is available. Without it, the
read goes into a busy-wait loop until data is available.
Cc: Greg KH <greg@kroah.com>
Acked-by: Vojtech Pavlik <vojtech@suse.cz>
Cc: Dmitry Torokhov <dtor_core@ameritech.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>