Use thread info flags to track use of debug registers and IO bitmaps.
- add TIF_DEBUG to track when debug registers are active
- add TIF_IO_BITMAP to track when I/O bitmap is used
- modify __switch_to() to use the new TIF flags
Performance tested on Pentium II, ten runs of LMbench context switch
benchmark (smaller is better:)
before after
avg 3.65 3.39
min 3.55 3.33
Signed-off-by: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use "+m" rather than a combination of "=m" and "m" for improved clarity
and consistency.
This also fixes some inlines that incorrectly didn't tell the compiler
that they read the old value at all, potentially causing the compiler to
generate bogus code. It appear that all of those potential bugs were
hidden by the use of extra "volatile" specifiers on the data structures
in question, though.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use the lock validator framework to prove spinlock and rwlock locking
correctness.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use the lock validator framework to prove rwsem locking correctness.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Clean up the x86 irqflags.h file:
- macro => inline function transformation
- simplifications
- style fixes
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hide the magic in alternative.h and provide some dummy inline functions
for the UP case (gcc should manage to optimize away these calls). No
changes in module.c.
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch implements an API whereby an application can determine the
label of its peer's Unix datagram sockets via the auxiliary data mechanism of
recvmsg.
Patch purpose:
This patch enables a security-aware application to retrieve the
security context of the peer of a Unix datagram socket. The application
can then use this security context to determine the security context for
processing on behalf of the peer who sent the packet.
Patch design and implementation:
The design and implementation is very similar to the UDP case for INET
sockets. Basically we build upon the existing Unix domain socket API for
retrieving user credentials. Linux offers the API for obtaining user
credentials via ancillary messages (i.e., out of band/control messages
that are bundled together with a normal message). To retrieve the security
context, the application first indicates to the kernel such desire by
setting the SO_PASSSEC option via getsockopt. Then the application
retrieves the security context using the auxiliary data mechanism.
An example server application for Unix datagram socket should look like this:
toggle = 1;
toggle_len = sizeof(toggle);
setsockopt(sockfd, SOL_SOCKET, SO_PASSSEC, &toggle, &toggle_len);
recvmsg(sockfd, &msg_hdr, 0);
if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) &&
cmsg_hdr->cmsg_level == SOL_SOCKET &&
cmsg_hdr->cmsg_type == SCM_SECURITY) {
memcpy(&scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext));
}
}
sock_setsockopt is enhanced with a new socket option SOCK_PASSSEC to allow
a server socket to receive security context of the peer.
Testing:
We have tested the patch by setting up Unix datagram client and server
applications. We verified that the server can retrieve the security context
using the auxiliary data mechanism of recvmsg.
Signed-off-by: Catherine Zhang <cxzhang@watson.ibm.com>
Acked-by: Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add ->retrigger() irq op to consolidate hw_irq_resend() implementations.
(Most architectures had it defined to NOP anyway.)
NOTE: ia64 needs testing. i386 and x86_64 tested.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
drivers/built-in.o: In function `sgivwfb_set_par':
sgivwfb.c:(.text+0x88583): undefined reference to `sgivwfb_mem_phys'
sgivwfb.c:(.text+0x88596): undefined reference to `sgivwfb_mem_phys'
sgivwfb.c:(.text+0x885a8): undefined reference to `sgivwfb_mem_phys'
drivers/built-in.o: In function `sgivwfb_check_var':
sgivwfb.c:(.text+0x88ad0): undefined reference to `sgivwfb_mem_size'
drivers/built-in.o: In function `sgivwfb_mmap':
sgivwfb.c:(.text+0x88c75): undefined reference to `sgivwfb_mem_size'
sgivwfb.c:(.text+0x88c7f): undefined reference to `sgivwfb_mem_phys'
drivers/built-in.o: In function `sgivwfb_probe':
sgivwfb.c:(.init.text+0x4060): undefined reference to `sgivwfb_mem_size'
sgivwfb.c:(.init.text+0x4065): undefined reference to `sgivwfb_mem_phys'
sgivwfb.c:(.init.text+0x4076): undefined reference to `sgivwfb_mem_phys'
sgivwfb.c:(.init.text+0x409c): undefined reference to `sgivwfb_mem_size'
sgivwfb.c:(.init.text+0x410e): undefined reference to `sgivwfb_mem_size'
sgivwfb.c:(.init.text+0x4113): undefined reference to `sgivwfb_mem_phys'
sgivwfb.c:(.init.text+0x4162): undefined reference to `sgivwfb_mem_size'
sgivwfb.c:(.init.text+0x4168): undefined reference to `sgivwfb_mem_phys'
make: *** [.tmp_vmlinux1] Error 1
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sysfs entries 'sched_mc_power_savings' and 'sched_smt_power_savings' in
/sys/devices/system/cpu/ control the MC/SMT power savings policy for the
scheduler.
Based on the values (1-enable, 0-disable) for these controls, sched groups
cpu power will be determined for different domains. When power savings
policy is enabled and under light load conditions, scheduler will minimize
the physical packages/cpu cores carrying the load and thus conserving
power(with a perf impact based on the workload characteristics... see OLS
2005 CMP kernel scheduler paper for more details..)
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Con Kolivas <kernel@kolivas.org>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move the i386 VDSO down into a vma and thus randomize it.
Besides the security implications, this feature also helps debuggers, which
can COW a vma-backed VDSO just like a normal DSO and can thus do
single-stepping and other debugging features.
It's good for hypervisors (Xen, VMWare) too, which typically live in the same
high-mapped address space as the VDSO, hence whenever the VDSO is used, they
get lots of guest pagefaults and have to fix such guest accesses up - which
slows things down instead of speeding things up (the primary purpose of the
VDSO).
There's a new CONFIG_COMPAT_VDSO (default=y) option, which provides support
for older glibcs that still rely on a prelinked high-mapped VDSO. Newer
distributions (using glibc 2.3.3 or later) can turn this option off. Turning
it off is also recommended for security reasons: attackers cannot use the
predictable high-mapped VDSO page as syscall trampoline anymore.
There is a new vdso=[0|1] boot option as well, and a runtime
/proc/sys/vm/vdso_enabled sysctl switch, that allows the VDSO to be turned
on/off.
(This version of the VDSO-randomization patch also has working ELF
coredumping, the previous patch crashed in the coredumping code.)
This code is a combined work of the exec-shield VDSO randomization
code and Gerd Hoffmann's hypervisor-centric VDSO patch. Rusty Russell
started this patch and i completed it.
[akpm@osdl.org: cleanups]
[akpm@osdl.org: compile fix]
[akpm@osdl.org: compile fix 2]
[akpm@osdl.org: compile fix 3]
[akpm@osdl.org: revernt MAXMEM change]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Cc: Gerd Hoffmann <kraxel@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Using C code for current_thread_info() lets the compiler optimize it.
With gcc 4.0.2, kernel is smaller:
text data bss dec hex filename
3645212 555556 312024 4512792 44dc18 2.6.17-rc6-nb-post/vmlinux
3647276 555556 312024 4514856 44e428 2.6.17-rc6-nb/vmlinux
-------
-2064
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move the phys_core_id and cpu_core_id to cpuinfo_x86 structure. Similar
patch for x86_64 is already accepted by Andi earlier this week.
[akpm@osdl.org: fix warning]
Signed-off-by: Rohit Seth <rohitseth@google.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When new node becomes enable by hot-add, new sysfs file must be created for
new node. So, if new node is enabled by add_memory(), register_one_node() is
called to create it. In addition, I386's arch_register_node() and a part of
register_nodes() of powerpc are consolidated to register_one_node() as a
generic_code().
This is tested by Tiger4(IPF) with node hot-plug emulation.
Signed-off-by: Keiichiro Tokunaga <tokuanga.keiich@jp.fujitsu.com>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
typo fixes
Clean up 'inline is not at beginning' warnings for usb storage
Storage class should be first
i386: Trivial typo fixes
ixj: make ixj_set_tone_off() static
spelling fixes
fix paniced->panicked typos
Spelling fixes for Documentation/atomic_ops.txt
move acknowledgment for Mark Adler to CREDITS
remove the bouncing email address of David Campbell
Intel now has support for Architectural Performance Monitoring Counters
( Refer to IA-32 Intel Architecture Software Developer's Manual
http://www.intel.com/design/pentium4/manuals/253669.htm ). This
feature is present starting from Intel Core Duo and Intel Core Solo processors.
What this means is, the performance monitoring counters and some performance
monitoring events are now defined in an architectural way (using cpuid).
And there will be no need to check for family/model etc for these architectural
events.
Below is the patch to use this performance counters in nmi watchdog driver.
Patch handles both i386 and x86-64 kernels.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On some i386/x86_64 systems, sending an NMI IPI as a broadcast will
reset the system. This seems to be a BIOS bug which affects machines
where one or more cpus are not under OS control. It occurs on HT
systems with a version of the OS that is not compiled without HT
support. It also occurs when a system is booted with max_cpus=n where
2 <= n < cpus known to the BIOS. The fix is to always send NMI IPI as
a mask instead of as a broadcast.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
x86_64 and i386 behave inconsistently when sending an IPI on vector 2
(NMI_VECTOR). Make both behave the same, so IPI 2 is sent as NMI.
The crash code was abusing send_IPI_allbutself() by passing a code
instead of a vector, it only worked because crash knew about the
internal code of send_IPI_allbutself(). Change crash to use NMI_VECTOR
instead, and remove the comment about how crash was abusing the function.
This patch is a pre-requisite for fixing the problem where sending an
IPI as NMI would reboot some Dell Xeon systems. I cannot fix that
problem while crash continus to abuse send_IPI_allbutself().
It also removes the inconsistency between i386 and x86_64 for
NMI_VECTOR. That will simplify all the RAS code that needs to bring
all the cpus to a clean stop, even when one or more cpus are spinning
disabled.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When a process changes CPUs while doing the non atomic cpu_local_*
operations it might operate on the local_t of a different CPUs.
Fix that by disabling preemption.
Pointed out by Christopher Lameter
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
During some profiling I noticed that default_idle causes a lot of
memory traffic. I think that is caused by the atomic operations
to clear/set the polling flag in thread_info. There is actually
no reason to make this atomic - only the idle thread does it
to itself, other CPUs only read it. So I moved it into ti->status.
Converted i386/x86-64/ia64 for now because that was the easiest
way to fix ACPI which also manipulates these flags in its idle
function.
Cc: Nick Piggin <npiggin@novell.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Len Brown <len.brown@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>