In ia64 kernel, the O_LARGEFILE flag is forced when opening a file. This
is problematic for execution of 32 bit processes, which are not largefile
aware, either by SW emulation or by HW execution.
For such processes, the problem is two-fold:
1) When trying to open a file that is larger than 4G
the operation should fail, but it's not
2) Writing to offset larger than 4G should fail, but
it's not
The proposed patch takes advantage of the way 32 bit processes are
identified in ia64 systems. Such processes have PER_LINUX32 for their
personality. With the patch, the ia64 kernel will not enforce the
O_LARGEFILE flag if the current process has PER_LINUX32 set. The behavior
for all other architectures remains unchanged.
Signed-off-by: Yoav Zach <yoav.zach@intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The current Kprobes when patching the original instruction with the break
instruction tries to retain the original qualifying predicate(qp), however
for cmp.crel.ctype where ctype == unc, which is a special instruction
always needs to be executed irrespective of qp. Hence, if the instruction
we are patching is of this type, then we should not copy the original qp to
the break instruction, this is because we always want the break fault to
happen so that we can emulate the instruction.
This patch is based on the feedback given by David Mosberger
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A cleanup of the ia64 kprobes implementation such that all of the bundle
manipulation logic is concentrated in arch_prepare_kprobe().
With the current design for kprobes, the arch specific code only has a
chance to return failure inside the arch_prepare_kprobe() function.
This patch moves all of the work that was happening in arch_copy_kprobe()
and most of the work that was happening in arch_arm_kprobe() into
arch_prepare_kprobe(). By doing this we can add further robustness checks
in arch_arm_kprobe() and refuse to insert kprobes that will cause problems.
Signed-off-by: Rusty Lynch <Rusty.lynch@intel.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As many of you know that kprobes exist in the main line kernel for various
architecture including i386, x86_64, ppc64 and sparc64. Attached patches
following this mail are a port of Kprobes and Jprobes for IA64.
I have tesed this patches for kprobes and Jprobes and this seems to work fine.
I have tested this patch by inserting kprobes on various slots and various
templates including various types of branch instructions.
I have also tested this patch using the tool
http://marc.theaimsgroup.com/?l=linux-kernel&m=111657358022586&w=2 and the
kprobes for IA64 works great.
Here is list of TODO things and pathes for the same will appear soon.
1) Support kprobes on "mov r1=ip" type of instruction
2) Support Kprobes and Jprobes to exist on the same address
3) Support Return probes
3) Architecture independent cleanup of kprobes
This patch adds the kdebug die notification mechanism needed by Kprobes.
For break instruction on Branch type slot, imm21 is ignored and value
zero is placed in IIM register, hence we need to handle kprobes
for switch case zero.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Rusty Lynch <Rusty.lynch@intel.com>
From: Rusty Lynch <rusty.lynch@intel.com>
At the point in traps.c where we recieve a break with a zero value, we can
not say if the break was a result of a kprobe or some other debug facility.
This simple patch changes the informational string to a more correct "break
0" value, and applies to the 2.6.12-rc2-mm2 tree with all the kprobes
patches that were just recently included for the next mm cut.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The preempt_count member of struct thread_info is currently either defined
as int, unsigned int or __s32 depending on arch. This patch makes the type
of preempt_count an int on all archs.
Having preempt_count be an unsigned type prevents the catching of
preempt_count < 0 bugs, and using int on some archs and __s32 on others is
not exactely "neat" - much nicer when it's just int all over.
A previous version of this patch was already ACK'ed by Robert Love, and the
only change in this version of the patch compared to the one he ACK'ed is
that this one also makes sure the preempt_count member is consistently
commented.
Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It allows a selectable timer interrupt frequency of 100, 250 and 1000 HZ.
Reducing the timer frequency may have important performance benefits on
large systems.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
pfn_to_nid is undefined. We haven't had this interface on ia64. The
sys_mbind patches need it.
Oh, the paddr_to_nid call could fail when DISCONTIG+NUMA is configured
because there isn't any ACPI SRAT NUMA information.
Signed-off-by: Bob Picco <bob.picco@hp.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch contains the bits to make the XPC code use the uncached
allocator rather than calling into the mspec driver. It also includes the
mspec.h header which is required to build the XPC modules.
Signed-off-by: Jes Sorensen <jes@wildopensource.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch contains the ia64 uncached page allocator and the generic
allocator (genalloc). The uncached allocator was formerly part of the SN2
mspec driver but there are several other users of it so it has been split
off from the driver.
The generic allocator can be used by device driver to manage special memory
etc. The generic allocator is based on the allocator from the sym53c8xx_2
driver.
Various users on ia64 needs uncached memory. The SGI SN architecture requires
it for inter-partition communication between partitions within a large NUMA
cluster. The specific user for this is the XPC code. Another application is
large MPI style applications which use it for synchronization, on SN this can
be done using special 'fetchop' operations but it also benefits non SN
hardware which may use regular uncached memory for this purpose. Performance
of doing this through uncached vs cached memory is pretty substantial. This
is handled by the mspec driver which I will push out in a seperate patch.
Rather than creating a specific allocator for just uncached memory I came up
with genalloc which is a generic purpose allocator that can be used by device
drivers and other subsystems as they please. For instance to handle onboard
device memory. It was derived from the sym53c7xx_2 driver's allocator which
is also an example of a potential user (I am refraining from modifying sym2
right now as it seems to have been under fairly heavy development recently).
On ia64 memory has various properties within a granule, ie. it isn't safe to
access memory as uncached within the same granule as currently has memory
accessed in cached mode. The regular system therefore doesn't utilize memory
in the lower granules which is mixed in with device PAL code etc. The
uncached driver walks the EFI memmap and pulls out the spill uncached pages
and sticks them into the uncached pool. Only after these chunks have been
utilized, will it start converting regular cached memory into uncached memory.
Hence the reason for the EFI related code additions.
Signed-off-by: Jes Sorensen <jes@wildopensource.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A lot of the code in arch/*/mm/hugetlbpage.c is quite similar. This patch
attempts to consolidate a lot of the code across the arch's, putting the
combined version in mm/hugetlb.c. There are a couple of uglyish hacks in
order to covert all the hugepage archs, but the result is a very large
reduction in the total amount of code. It also means things like hugepage
lazy allocation could be implemented in one place, instead of six.
Tested, at least a little, on ppc64, i386 and x86_64.
Notes:
- this patch changes the meaning of set_huge_pte() to be more
analagous to set_pte()
- does SH4 need s special huge_ptep_get_and_clear()??
Acked-by: William Lee Irwin <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is the core of the (much simplified) early reclaim. The goal of this
patch is to reclaim some easily-freed pages from a zone before falling back
onto another zone.
One of the major uses of this is NUMA machines. With the default allocator
behavior the allocator would look for memory in another zone, which might be
off-node, before trying to reclaim from the current zone.
This adds a zone tuneable to enable early zone reclaim. It is selected on a
per-zone basis and is turned on/off via syscall.
Adding some extra throttling on the reclaim was also required (patch
4/4). Without the machine would grind to a crawl when doing a "make -j"
kernel build. Even with this patch the System Time is higher on
average, but it seems tolerable. Here are some numbers for kernbench
runs on a 2-node, 4cpu, 8Gig RAM Altix in the "make -j" run:
wall user sys %cpu ctx sw. sleeps
---- ---- --- ---- ------ ------
No patch 1009 1384 847 258 298170 504402
w/patch, no reclaim 880 1376 667 288 254064 396745
w/patch & reclaim 1079 1385 926 252 291625 548873
These numbers are the average of 2 runs of 3 "make -j" runs done right
after system boot. Run-to-run variability for "make -j" is huge, so
these numbers aren't terribly useful except to seee that with reclaim
the benchmark still finishes in a reasonable amount of time.
I also looked at the NUMA hit/miss stats for the "make -j" runs and the
reclaim doesn't make any difference when the machine is thrashing away.
Doing a "make -j8" on a single node that is filled with page cache pages
takes 700 seconds with reclaim turned on and 735 seconds without reclaim
(due to remote memory accesses).
The simple zone_reclaim syscall program is at
http://www.bork.org/~mort/sgi/zone_reclaim.c
Signed-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch implements a number of smp_processor_id() cleanup ideas that
Arjan van de Ven and I came up with.
The previous __smp_processor_id/_smp_processor_id/smp_processor_id API
spaghetti was hard to follow both on the implementational and on the
usage side.
Some of the complexity arose from picking wrong names, some of the
complexity comes from the fact that not all architectures defined
__smp_processor_id.
In the new code, there are two externally visible symbols:
- smp_processor_id(): debug variant.
- raw_smp_processor_id(): nondebug variant. Replaces all existing
uses of _smp_processor_id() and __smp_processor_id(). Defined
by every SMP architecture in include/asm-*/smp.h.
There is one new internal symbol, dependent on DEBUG_PREEMPT:
- debug_smp_processor_id(): internal debug variant, mapped to
smp_processor_id().
Also, i moved debug_smp_processor_id() from lib/kernel_lock.c into a new
lib/smp_processor_id.c file. All related comments got updated and/or
clarified.
I have build/boot tested the following 8 .config combinations on x86:
{SMP,UP} x {PREEMPT,!PREEMPT} x {DEBUG_PREEMPT,!DEBUG_PREEMPT}
I have also build/boot tested x64 on UP/PREEMPT/DEBUG_PREEMPT. (Other
architectures are untested, but should work just fine.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There've been reports of problems with CONFIG_PREEMPT=y and the high
floating point partition. This is caused by the possibility of preemption
and rescheduling on a different processor while saving or restioirng the
high partition.
The only places where the FPU state is touched are in ptrace, in
switch_to(), and where handling a floating-point exception. In switch_to()
preemption is off. So it's only in trap.c and ptrace.c that we need to
prevent preemption.
Here is a patch that adds commentary to make the conditions clear, and adds
appropriate preempt_{en,dis}able() calls to make it so. In trap.c I use
preempt_enable_no_resched(), as we're about to return to user space where
the preemption flag will be checked anyway.
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When Linux is running on the Xen virtual machine monitor, physical
addresses are virtualised and cannot be directly referenced by the AGP
GART. This patch fixes the GART driver for Xen by adding a layer of
abstraction between physical addresses and 'GART addresses'.
Architecture-specific functions are also defined for allocating and freeing
the GATT. Xen requires this to ensure that table really is contiguous from
the point of view of the GART.
These extra interface functions are defined as 'no-ops' for all existing
architectures that use the GART driver.
Signed-off-by: Keir Fraser <keir@xensource.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dave Jones <davej@redhat.com>
Patch from Charles Spirakis
Some linux customers want to optimize their applications on the latest
hardware but are not yet willing to upgrade to the latest kernel. This
patch provides a way to plug in an alternate, basic, and GPL'ed PMU
subsystem to help with their monitoring needs or for specialty work. It
can also be used in case of serious unexpected bugs in perfmon. Mutual
exclusion between the two subsystems is guaranteed, hence no conflict
can arise from both subsystem being present.
Acked-by: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The 2.6 kernel has CPE error thresholding.
This patch lets SAL know of this error handling feature.
The changes are SN specific.
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
Change the SAL call for POD mode to be reentrant.
This change is SN specific.
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
These days <linux/ioctl32.h> handles everything, no need for an asm
header on just two architectures.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>