This is an updated version of Eric Biederman's is_init() patch
(http://lkml.org/lkml/2006/2/6/280). It applies cleanly to 2.6.18-rc3 and
replaces a few more instances of ->pid == 1 with is_init().
Further, is_init() checks the pid and thus removes the dependency on Eric's
other patches for now.
Eric's original description:
There are a lot of places in the kernel where we test for init
because we give it special properties. Most significantly init
must not die. This results in code all over the kernel testing
->pid == 1.
Introduce is_init to capture this case.
With multiple pid spaces, in all of the affected cases we are
looking for only the first process on the system, not some other
process that has pid == 1.
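A minimal userspace sketch of the idea, assuming a hypothetical struct task
in place of the kernel's task_struct and a hypothetical may_kill() call site.
It only illustrates how open-coded ->pid == 1 tests collapse into one helper;
it is not the patch itself:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct task_struct. */
struct task {
	int pid;
};

/* True only for init. In this version of the patch the test is on the pid. */
static inline bool is_init(const struct task *tsk)
{
	return tsk->pid == 1;
}

/* Hypothetical call site: init must not die, so it may never be killed. */
static bool may_kill(const struct task *tsk)
{
	return !is_init(tsk);
}
```

Keeping the test behind one helper means that a later change to how init is
identified only has to touch is_init(), not every caller.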
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: <lxc-devel@lists.sourceforge.net>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert i386 apm.c from kernel_thread(), whose export is deprecated, to
kthread API.
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Need to enable/disable all the counters instead of just counter 0.
This affects all cpus with family=6, including i386/core. Usual symptom:
only counter 0 provides samples. Other counters don't produce samples.
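A rough sketch of the fix, using a hypothetical in-memory model instead of
real MSR accesses. The struct, the function names and the counter count are
assumptions for illustration; bit 22 is the enable bit of the event-select
register in Intel's documentation:

```c
#include <stdint.h>

#define NUM_COUNTERS   2          /* assumed counter count, for illustration */
#define EVNTSEL_ENABLE (1u << 22) /* enable bit in the event-select register */

/* Hypothetical model of the per-counter event-select registers. */
struct pmu_model {
	uint32_t evntsel[NUM_COUNTERS];
	int in_use[NUM_COUNTERS];
};

/* Set the enable bit on every configured counter, not only counter 0. */
static void pmu_start(struct pmu_model *pmu)
{
	int i;

	for (i = 0; i < NUM_COUNTERS; i++)
		if (pmu->in_use[i])
			pmu->evntsel[i] |= EVNTSEL_ENABLE;
}

/* Clear the enable bit on every configured counter. */
static void pmu_stop(struct pmu_model *pmu)
{
	int i;

	for (i = 0; i < NUM_COUNTERS; i++)
		if (pmu->in_use[i])
			pmu->evntsel[i] &= ~EVNTSEL_ENABLE;
}
```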
Signed-off-by: Arun Sharma <arun.sharma@google.com>
Cc: Philippe Elie <phil.el@wanadoo.fr>
Cc: John Levon <levon@movementarian.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The functions efi_call_phys_prelog and efi_call_phys_epilog in
arch/i386/kernel/efi.c wrap the spinlock efi_rt_lock: efi_call_phys_prelog
returns with the lock held, and efi_call_phys_epilog releases the lock
without acquiring it. Add lock annotations to these two functions so that
sparse can check callers for lock pairing, and so that sparse will not
complain about these functions since they intentionally use locks in this
manner.
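As an illustration of the annotation style, here is a sketch with a toy lock.
The demo_* names are hypothetical; the macro definitions follow the usual
kernel pattern of expanding to a sparse context attribute only when
__CHECKER__ is defined, so a normal compiler sees nothing:

```c
/* sparse defines __CHECKER__; a regular compiler gets empty macros. */
#ifdef __CHECKER__
# define __acquires(x) __attribute__((context(x, 0, 1)))
# define __releases(x) __attribute__((context(x, 1, 0)))
#else
# define __acquires(x)
# define __releases(x)
#endif

/* Hypothetical toy lock standing in for the efi_rt_lock spinlock. */
static int demo_lock;

/* Returns with the lock held, which the annotation tells sparse. */
static void demo_prelog(void) __acquires(demo_lock)
{
	demo_lock = 1;
}

/* Releases a lock it did not take itself, which is also annotated. */
static void demo_epilog(void) __releases(demo_lock)
{
	demo_lock = 0;
}
```

With the annotations in place, sparse can flag a caller that invokes the
prelog function without a matching epilog call.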
Signed-off-by: Josh Triplett <josh@freedesktop.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make PROT_WRITE imply PROT_READ for a number of architectures which don't
support write only in hardware.
While looking at this, I noticed that some architectures which do not
support write only mappings already take the exact same approach. For
example, in arch/alpha/mm/fault.c:
"
	if (cause < 0) {
		if (!(vma->vm_flags & VM_EXEC))
			goto bad_area;
	} else if (!cause) {
		/* Allow reads even for write-only mappings */
		if (!(vma->vm_flags & (VM_READ | VM_WRITE)))
			goto bad_area;
	} else {
		if (!(vma->vm_flags & VM_WRITE))
			goto bad_area;
	}
"
Thus, this patch brings other architectures which do not support write only
mappings in-line and consistent with the rest. I've verified the patch on
ia64, x86_64 and x86.
Additional discussion:
Several architectures, including x86, cannot support write-only mappings.
The pte for x86 reserves a single bit for protection and its two states are
read only or read/write. Thus, write only is not supported in h/w.
Currently, if I 'mmap' a page write-only, the first read attempt on that page
creates a page fault and will SEGV. That check is enforced in
arch/<arch>/mm/fault.c. However, if I first write that page it will fault in
and the pte will be set to read/write. Thus, any subsequent reads to the page
will succeed. It is this inconsistency in behavior that this patch is
attempting to address. Furthermore, if the page is swapped out and then
brought back, the first read will also cause a SEGV. Thus, any arbitrary read
on a page can potentially result in a SEGV.
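The inconsistency described above can be modeled in a few lines. This is only
a sketch under simplifying assumptions: struct page_model and both helpers are
hypothetical, and a present pte is treated as always readable, as on x86:

```c
#include <stdbool.h>

#define VM_READ  0x1UL
#define VM_WRITE 0x2UL

/* Hypothetical model of one page in a mapping. */
struct page_model {
	unsigned long vm_flags; /* protections requested via mmap() */
	bool pte_present;       /* set once the page has been faulted in */
};

/* Old behavior: a read is only checked when it faults on an absent pte. */
static bool read_ok_old(const struct page_model *p)
{
	if (p->pte_present)
		return true; /* hardware cannot express write-only */
	return (p->vm_flags & VM_READ) != 0;
}

/* New behavior: PROT_WRITE implies PROT_READ, regardless of pte state. */
static bool read_ok_new(const struct page_model *p)
{
	return (p->vm_flags & (VM_READ | VM_WRITE)) != 0;
}
```

In the old model the result of a read on a write-only mapping depends on
whether the page happens to be present; in the new model it does not.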
According to the SUSv3 spec, "if the application requests only PROT_WRITE, the
implementation may also allow read access." Also, as mentioned, some
architectures, such as alpha (shown above), already take the approach that I
am suggesting.
The counter-argument to this, raised by Arjan, is that the kernel is
enforcing the write-only mapping the best it can given the h/w limitations.
This is true; however, Alan Cox and I would argue that the inconsistency in
behavior, that is, applications sometimes working and sometimes failing, is
highly undesirable. If you read through the thread, I think people came to
an agreement on the last patch I posted, as nobody has objected to it...
Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Andi Kleen <ak@muc.de>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Ian Molton <spyro@f2s.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert the i386 summit subarch apicid_to_node to use node information
provided by the SRAT. It was discussed a little on LKML a few weeks ago
and was seen as an acceptable fix. The current way of obtaining the nodeid
	static inline int apicid_to_node(int logical_apicid)
	{
		return logical_apicid >> 5;
	}
is just not correct for all summit systems/BIOSes. Assuming the apicid
matches the Linux node number requires a leap of faith that the BIOS mapped
out the apicids in a set way. Modern summit HW (IBM x460) does not lay out
its BIOS in that manner for various reasons and is unable to boot i386 NUMA.
The best way to get the correct apicid to node information is from the SRAT
table during boot. It lays out what apicid belongs to what node. I use
this information to create a table for use at run time.
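A table-driven lookup along these lines might look as follows. This is a
sketch only: the table name, its size, the -1 "unknown" marker and the
record_srat_entry() helper are assumptions for illustration, not the code in
the patch:

```c
#define MAX_APICID 256 /* assumed table size */

/* Hypothetical table filled from SRAT entries at boot; -1 means unknown. */
static int apicid_2_node[MAX_APICID];

static void apicid_table_init(void)
{
	int i;

	for (i = 0; i < MAX_APICID; i++)
		apicid_2_node[i] = -1;
}

/* Called once per SRAT processor entry while parsing the table at boot. */
static void record_srat_entry(int apicid, int node)
{
	if (apicid >= 0 && apicid < MAX_APICID)
		apicid_2_node[apicid] = node;
}

/* Run-time lookup replaces the fixed "apicid >> 5" assumption. */
static inline int apicid_to_node(int logical_apicid)
{
	if (logical_apicid < 0 || logical_apicid >= MAX_APICID)
		return -1;
	return apicid_2_node[logical_apicid];
}
```

For example, if the firmware reports apicid 0x40 on node 1, the table returns
1, whereas the shift-based version would have returned 2.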
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Avoid possible deadlock on a BUG() inside down_write(mmap_sem). The deadlock
can only occur if something has gone horridly wrong, because a fault here
shouldn't happen.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The IA32 manual says that if a microcode update's size is 0, then the size
is the default size (2048 bytes). But to me this doesn't suggest that every
microcode update's size should be at least 2048 bytes. We actually had a
microcode update whose size is 1024 bytes. The patch just removes the
check.
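The size rule can be sketched like this. The struct and function names are
hypothetical; the 48-byte header and 2000-byte default data size are the
values that add up to the 2048-byte default mentioned above:

```c
#include <stdint.h>

#define MC_HEADER_SIZE          48
#define DEFAULT_UCODE_DATASIZE  2000
#define DEFAULT_UCODE_TOTALSIZE (DEFAULT_UCODE_DATASIZE + MC_HEADER_SIZE)

/* Hypothetical subset of the microcode update header. */
struct ucode_header_model {
	uint32_t datasize;
	uint32_t totalsize;
};

/* A zero size field selects the default; any other value is used as is. */
static uint32_t ucode_total_size(const struct ucode_header_model *h)
{
	return h->totalsize ? h->totalsize : DEFAULT_UCODE_TOTALSIZE;
}
```

Note that nothing here requires a non-zero size to be at least the default,
which is why a 1024-byte update is acceptable.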
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Tigran Aivazian <tigran@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add sysfs support. Currently each CPU has three microcode-related
attributes. One is 'version', which shows the current ucode version of the
CPU. Tools can use this attribute to do validation or show CPU ucode status.
Another is 'reload', which allows manually reloading ucode. The third is
'processor_flags', which exports processor flags, so we can write tools to
check whether a CPU has the latest ucode. Also add suspend/resume and CPU
hotplug support.
[akpm@osdl.org: cleanups, build fix]
[bunk@stusta.de: Kconfig fixes]
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Tigran Aivazian <tigran@veritas.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use request_firmware to pull ucode from userspace, so we don't need the
application 'microcode_ctl' to assist. We name each ucode file according
to the CPU's info as intel-ucode/family-model-stepping. In this way we can
split the ucode file into small ones. This has a lot of advantages, such as
selectively updating and validating microcode for specific models, better
management of microcode files, easier tool writing for administrators, and
so on. With the changes, we should put all intel-ucode/xx-xx-xx microcode
files into the firmware dir (I had a tool to split the previous big data file
into small ones, and later we will release a new-style data file). The init
script should be changed to just load the driver without unloading it.
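A sketch of how such a per-CPU file name could be built. The helper name is
hypothetical, and the two-digit lowercase hex formatting is an assumption
based on the xx-xx-xx pattern above rather than something this description
spells out:

```c
#include <stdio.h>

/*
 * Build "intel-ucode/family-model-stepping" into buf.
 * Returns the snprintf() result, i.e. the length that was needed.
 */
static int ucode_fw_name(char *buf, size_t len, unsigned int family,
			 unsigned int model, unsigned int stepping)
{
	return snprintf(buf, len, "intel-ucode/%02x-%02x-%02x",
			family, model, stepping);
}
```

The resulting string is what a driver would hand to request_firmware(), so
that userspace only needs to supply the one small file for the running CPU.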
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Tigran Aivazian <tigran@veritas.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Clean up microcode update driver and make it more readable.
[akpm@osdl.org: cleanups]
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Tigran Aivazian <tigran@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (225 commits)
[PATCH] Don't set calgary iommu as default y
[PATCH] i386/x86-64: New Intel feature flags
[PATCH] x86: Add a cumulative thermal throttle event counter.
[PATCH] i386: Make the jiffies compares use the 64bit safe macros.
[PATCH] x86: Refactor thermal throttle processing
[PATCH] Add 64bit jiffies compares (for use with get_jiffies_64)
[PATCH] Fix unwinder warning in traps.c
[PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1
[PATCH] x86: Move direct PCI scanning functions out of line
[PATCH] i386/x86-64: Make all early PCI scans dependent on CONFIG_PCI
[PATCH] Don't leak NT bit into next task
[PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder
[PATCH] Fix some broken white space in ia32_signal.c
[PATCH] Initialize argument registers for 32bit signal handlers.
[PATCH] Remove all traces of signal number conversion
[PATCH] Don't synchronize time reading on single core AMD systems
[PATCH] Remove outdated comment in x86-64 mmconfig code
[PATCH] Use string instructions for Core2 copy/clear
[PATCH] x86: - restore i8259A eoi status on resume
[PATCH] i386: Split multi-line printk in oops output.
...
Detect the situations in which the time after a resume from disk would be
earlier than the time before the suspend and prevent them from happening on
i386.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: John Stultz <johnstul@us.ibm.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The name of the pagedir_nosave variable does not make sense any more, so it
seems reasonable to change it to something more meaningful.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The functions prepare_set and post_set in kernel/cpu/mtrr/generic.c wrap
the spinlock set_atomicity_lock: prepare_set returns with the lock held,
and post_set releases the lock without acquiring it. Add lock annotations
to these two functions so that sparse can check callers for lock pairing,
and so that sparse will not complain about these functions since they
intentionally use locks in this manner.
Signed-off-by: Josh Triplett <josh@freedesktop.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove all references to xtime in i386 and replace them with
get/set_timeofday(). Requires some ugly and uncertain changes to APM, but
has been lightly tested to work.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Voyager fiddles with current->signal.tty without locking. It turns out
that the code in question has already cleared current->signal.tty correctly
because daemonize() does the right thing already.
The signal handling also appears to be incorrect as it does an unprotected
sigfillset that also appears unnecessary. As I don't have a bowtie and am
therefore not a qualified voyager maintainer I leave that to James.
Signed-off-by: Alan Cox <alan@redhat.com>
Acked-by: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If we're going to implement smp_call_function_single() on three architectures
with the same prototype then it should have a declaration in a
non-arch-specific header file.
Move it into <linux/smp.h>.
Cc: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Continuing the series of small patches necessary for the perfmon subsystem,
here is a patch that adds support for the smp_call_function_single()
function for i386. It exists for almost all other architectures but i386.
The perfmon subsystem needs it in one case to free some state on a
designated remote CPU.
Signed-off-by: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The current VMSPLIT Kconfig option is disabled whenever highmem is on.
This is a bit screwy because the people who need to change VMSPLIT the most
tend to be the ones with highmem and constrained lowmem.
So, remove the highmem dependency. But, re-include the dependency for the
"full 1GB of lowmem" option. You can't have the full 1GB of lowmem and
highmem because of the need for the vmalloc(), kmap(), etc... areas.
I thought there would be at least a bit of tweaking to do to
get it to work, but everything seems OK.
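The trade-off between the split and lowmem can be sketched as simple
arithmetic. This is an approximation under stated assumptions: the helper is
hypothetical, the 128MB vmalloc reserve is the usual i386 default, and the
kmap/fixmap areas and the kernel's own memory use are ignored:

```c
#include <stdint.h>

#define VMALLOC_RESERVE_MB 128u /* assumed default vmalloc reserve */

/* Upper bound on lowmem, in MB, for a given 32-bit PAGE_OFFSET. */
static uint32_t max_lowmem_mb(uint32_t page_offset)
{
	/* Kernel address space is everything from PAGE_OFFSET up to 4GB. */
	uint32_t kernel_space_mb = (uint32_t)(0u - page_offset) >> 20;

	return kernel_space_mb - VMALLOC_RESERVE_MB;
}
```

With the default PAGE_OFFSET of 0xC0000000 this gives the familiar 896MB
ceiling, and with 0x40000000 it gives 2944MB, so whatever the vmalloc and
kmap areas need always comes out of the kernel's share of the address space.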
Boot tested on a 4GB x86 machine, and a 12GB 3-node NUMA-Q:
elm3b82:~# cat /proc/meminfo
MemTotal: 3695412 kB
MemFree: 3659540 kB
...
LowTotal: 2909008 kB
LowFree: 2892324 kB
...
elm3b82:~# zgrep PAE /proc/config.gz
CONFIG_X86_PAE=y
larry:~# cat /proc/meminfo
MemTotal: 11845900 kB
MemFree: 11786748 kB
...
LowTotal: 2855180 kB
LowFree: 2830092 kB
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>