This change introduces new flags for the hv_install_context()
API that passes a page table pointer to the hypervisor. Clients
can explicitly request 4K, 16K, or 64K small pages when they
install a new context. In practice, the page size is fixed at
kernel compile time and the same size is always requested every
time a new page table is installed.
The <hv/hypervisor.h> header changes so that it provides more abstract
macros for managing "page" things like PFNs and page tables. For
example there is now a HV_DEFAULT_PAGE_SIZE_SMALL instead of the old
HV_PAGE_SIZE_SMALL. The various PFN routines have been eliminated and
only PA- or PTFN-based ones remain (since PTFNs are always expressed
in fixed 2KB "page" size). The page-table management macros are
renamed with a leading underscore and take page-size arguments with
the presumption that clients will use those macros in some single
place to provide the "real" macros they will use themselves.
I happened to notice the old hv_set_caching() API was totally broken
(it assumed 4KB pages) so I changed it so it would nominally work
correctly with other page sizes.
Tag modules with the page size so you can't load a module built with
a conflicting page size. (And add a test for SMP while we're at it.)
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Use direct load/store for the get_user/put_user.
Previously, we would call out to a helper routine that would do the
appropriate thing and then return, handling the possible exception
internally. Now we inline the load or store, along with a "we succeeded"
indication in a register; if the load or store faults, we write a
"we failed" indication into the same register and then return to the
following instruction. This is more efficient and gives us more compact
code, as well as being more in line with what other architectures do.
The special futex assembly source file for TILE-Gx also disappears in
this change; we just use the same inlining idiom there as well, putting
the appropriate atomic operations directly into futex_atomic_op_inuser()
(and thus into the FUTEX_WAIT function).
The underlying atomic copy_from_user, copy_to_user functions were
renamed using the (cryptic) x86 convention as copy_from_user_ll and
copy_to_user_ll.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
The toolchain supports big-endian mode now, so add support for building
the kernel to run big-endian as well.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
There were some correctness issues with this code that are now fixed
with this change. The change is likely less performant than it could
be, but it should no longer be vulnerable to any races with memory
operations on the memory network while invalidating a range of memory.
This code is run infrequently so performance isn't critical, but
correctness definitely is.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Pragmatically it couldn't be wrong to cast pointers to long to compare
them (since all kernel addresses are in the top half of VA space),
but it's more correct to cast to unsigned long.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
We were carefully computing a value to use for the number of loops
to spin for, and then ignoring it.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Add a comment explaining why this is important, and add a CFLAGS_REMOVE
clause to the Makefile to make sure it happens.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
The empty_zero_page[] export is required for ZERO_PAGE() module references.
The #includes are due to changes in implicit inclusion, and should of
course have been in the sources from the beginning.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
An earlier Tilera compiler generated calls to an "__ll_mul"
function for long long multiplication. Our libgcc supported that
as an alias for the normal __muldi3 routine, so we made it available
to kernel modules as well. However, for a while now the compiler
has internally been generating only the standard __muldi3 symbol,
and the version we are giving back to the community does not have
the __ll_mul alias, so we are removing it from the kernel too.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
The 32-bit TILEPro support uses some #defines in <asm/atomic_32.h>
for atomic support routines in assembly. To make this more explicit,
I've turned those includes into includes of <asm/atomic_32.h>, which
should hopefully make it clear that they shouldn't be bombed into
<linux/atomic.h> in any cleanups.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
This support was partially present in the existing code (look for
"__tilegx__" ifdefs) but with this change you can build a working
kernel using the TILE-Gx toolchain and ARCH=tilegx.
Most of these files are new, generally adding a foo_64.c file
where previously there was just a foo_32.c file.
The ARCH=tilegx directive redirects to arch/tile, not arch/tilegx,
using the existing SRCARCH mechanism in the top-level Makefile.
Changes to existing files:
- <asm/bitops.h> and <asm/bitops_32.h> changed to factor the
include of <asm-generic/bitops/non-atomic.h> in the common header.
- <asm/compat.h> and arch/tile/kernel/compat.c changed to remove
the "const" markers I had put on compat_sys_execve() when trying
to match some recent similar changes to the non-compat execve.
It turns out the compat version wasn't "upgraded" to use const.
- <asm/opcode-tile_64.h> and <asm/opcode_constants_64.h> were
previously included accidentally, with the 32-bit contents. Now
they have the proper 64-bit contents.
Finally, I had to hack the existing hacky drivers/input/input-compat.h
to add yet another "#ifdef" for INPUT_COMPAT_TEST (same as x86_64).
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> [drivers/input]
Otherwise, it's possible to end up with the prefetcher pulling
data into cache that the code believes has been flushed.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
This semantic was already true for atomic operations within the kernel,
and this change makes it true for the fast atomic syscalls (__NR_cmpxchg
and __NR_atomic_update) as well. Previously, user-space had to use
the fast atomic syscalls exclusively to update memory, since raw stores
could lose a race with the atomic update code even when the atomic update
hadn't actually modified the value.
With this change, we no longer write back the value to memory if it
hasn't changed. This allows certain types of idioms in user space to
work as expected, e.g. "atomic exchange" to acquire a spinlock, followed
by a raw store of zero to release the lock.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Commit 8d7718aa08 changed "int"
to "u32" in the prototypes but not the definition.
I missed this when I saw the patch go by on LKML.
We cast "u32 *" to "int *" since we are tying into the underlying
atomics framework, and atomic_t uses int as its value type.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Reviewed-by: Michel Lespinasse <walken@google.com>
The first issue fixed in this patch is that pending rwlock write locks
could lock out new readers; this could cause a deadlock if a read lock was
held on cpu 1, a write lock was then attempted on cpu 2 and was pending,
and cpu 1 was interrupted and attempted to re-acquire a read lock.
The write lock code was modified to not lock out new readers.
The second issue fixed is that there was a narrow race window where a tns
instruction had been issued (setting the lock value to "1") and the store
instruction to reset the lock value correctly had not yet been issued.
In this case, if an interrupt occurred and the same cpu then tried to
manipulate the lock, it would find the lock value set to "1" and spin
forever, assuming some other cpu was partway through updating it. The fix
is to enforce an interrupt critical section around the tns/store pair.
In addition, this change now arranges to always validate that after
a readlock we have not wrapped around the count of readers, which
is only eight bits.
Since these changes make the rwlock "fast path" code heavier weight,
I decided to move all the rwlock code all out of line, leaving only the
conventional spinlock code with fastpath inlines. Since the read_lock
and read_trylock implementations ended up very similar, I just expressed
read_lock in terms of read_trylock.
As part of this change I also eliminate support for the now-obsolete
tns_atomic mode.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
The Tilera architecture traditionally supports 64KB page sizes
to improve TLB utilization and improve performance when the
hardware is being used primarily to run a single application.
For more generic server scenarios, it can be beneficial to run
with 4KB page sizes, so this commit allows that to be specified
(by modifying the arch/tile/include/hv/pagesize.h header).
As part of this change, we also re-worked the PTE management
slightly so that PTE writes all go through a __set_pte() function
where we can do some additional validation. The set_pte_order()
function was eliminated since the "order" argument wasn't being used.
One bug uncovered was in the PCI DMA code, which wasn't properly
flushing the specified range. This was benign with 64KB pages,
but with 4KB pages we were getting some larger flushes wrong.
The per-cpu memory reservation code also needed updating to
conform with the newer percpu stuff; before it always chose 64KB,
and that was always correct, but with 4KB granularity we now have
to pay closer attention and reserve the amount of memory that will
be requested when the percpu code starts allocating.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
This is a grab bag of changes with no actual change to generated code.
This includes whitespace and comment typos, plus a couple of stale
comments being removed.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
It now takes an additional argument so it can be used to
flush-and-invalidate pages that are cached using hash-for-home
as well those that are cached with coherence point on a single cpu.
This allows it to be used more widely for changing the coherence
point of arbitrary pages when necessary.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
This avoids having to maintain an additional separate assembly
file, and of course the inline is slightly more efficient as well.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
The current implementations of __ndelay and __udelay call a hypervisor
service to delay, but the hypervisor service isn't actually implemented
very well, and the consensus is that Linux should handle figuring this
out natively and not use a hypervisor service.
By converting nanoseconds to cycles, and then spinning until the
cycle counter reaches the desired cycle, we get several benefits:
first, we are sensitive to the actual clock speed; second, we use
less power by issuing a slow SPR read once every six cycles while
we delay; and third, we properly handle the case of an interrupt by
exiting at the target time rather than after some number of cycles.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
The convention changed to, e.g., ".data..page_aligned". This commit
fixes the places in the tile architecture that were still using the
old convention. One tile-specific section (.init.page) was dropped
in favor of just using an "aligned" attribute.
Sam Ravnborg <sam@ravnborg.org> pointed out __PAGE_ALIGNED_BSS, etc.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
This change fixes a bug that memchr() will read the first word
of the source even if the length is zero. Ironically, the code
was originally written with a test to avoid exactly this problem,
but to make the code conform to Linux coding standards with all
declarations preceding all statements, the first load from memory
was moved up above that test as the initial value for a variable.
The change just moves all the variable declarations to the top
of the file, with no initializers, so that the test can also be
at the top of the file.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>