The GuestCtl1 CP0 register can contain the GuestID used for root TLB
operations, which affects TLB matching. The other TLB registers are
already dumped out to the log on a machine check exception due to
multiple matching TLB entries, so also dump the value of the GuestCtl1
register if GuestIDs are supported.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13232/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The GuestID for root TLB operations (GuestCtl1.RID) is modified by TLB
reads, so needs preserving by dump_tlb() like the ASID field of EntryHi.
Also dump the GuestID of each entry if it exists alongside the ASID, as
it forms an important part of the TLB entry when VZ guests are used.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13230/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
ARCH_USE_BUILTIN_BSWAP will use __builtin_bswap16(), __builtin_bswap32()
and __builtin_bswap64() where available. This allows better instruction
scheduling. On pre-R2 processors it will result in 32 bit and 64 bit
swapping being performed in a call to a __bswapsi2() rsp. __bswapdi2()
functions, so we add these, too.
For a 4.2 kernel with GCC 4.9 this yields the following kernel sizes:
text data bss dec hex filename
3996071 155804 88992 4240867 40b5e3 vmlinux ip22 baseline
3985687 159900 88992 4234579 409d53 vmlinux ip22 + bswap patch
6913157 378552 251024 7542733 7317cd vmlinux ip27 baseline
6878581 378552 251024 7508157 7290bd vmlinux ip27 + bswap patch
5773777 268752 187424 6229953 5f0fc1 vmlinux malta baseline
5773401 268752 187424 6229577 5f0e49 vmlinux malta + bswap patch
Presumably the code size improvments yield better cache hit rate thus
better performance compensating for the extra function call but this
will still need to be benchmarked.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The generic field definitions (i.e. present before MIPS32/MIPS64) in
mipsregs.h are conventionally not prefixed with MIPS_, so rename the
recently added MIPS_ENTRYLO_* definitions for the G, V, D, and C fields
to ENTRYLO_*. Also rearrange to put the EntryLo and EntryHi definitions
in the right place in the file.
Fixes: 8ab6abcb6a ("MIPS: mipsregs.h: Add EntryLo bit definitions")
Reported-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10725/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The TLB registers are dumped in a couble of places:
- sysrq_tlbdump_single() - when dumping TLB state.
- do_mcheck() - in response to a machine check error.
The main TLB registers also differ between r3k and r4k, but r4k appears
to be assumed.
Refactor this code into a dump_tlb_regs() function, implemented for both
r3k and r4k, and used by both of the above functions.
Fixes: d1e9a4f547 ("MIPS: Add SysRq operation to dump TLBs on all CPUs")
Suggested-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10721/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Move the initialisation of the CP0.Wired register implemented by Toshiba
TX3922 and TX3927 processors from `tx39_cache_init' to `tlb_init' where
it belongs, correcting code structure and making sure initialisation
does not rely on `tx39_cache_init' being called before `tlb_init' to
work correctly.
Make `r3k_have_wired_reg' static as it's no longer externally referred
to; remove a stale declaration too.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10195/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
XPA extends the physical addresses on MIPS32, including the EntryLo
registers. Update dump_tlb() to concatenate the PFNX field from the high
end of the EntryLo registers (as read by mfhc0).
The width of physical and virtual addresses are also separated to show
only 8 nibbles of virtual but 11 nibbles of physical with XPA.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Steven J. Hill <Steven.Hill@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10077/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The RI/XI bits when present are above the PFN field in the EntryLo
registers, at bits 63,62 when read with dmfc0, and bits 31,30 when read
with mfc0. This makes them appear as part of the physical address, since
the other bits are masked with PAGE_MASK, for example:
Index: 253 pgmask=16kb va=77b18000 asid=75
[pa=1000744000 c=5 d=1 v=1 g=0] [pa=100134c000 c=5 d=1 v=1 g=0]
The physical addresses have bit 36 set, which corresponds to bit 30 of
EntryLo1, the XI bit.
Explicitly mask off the RI and XI bits from the printed physical
address, and print the RI and XI bits separately if they exist, giving
output more like this:
Index: 226 pgmask=16kb va=77be0000 asid=79
[ri=0 xi=1 pa=01288000 c=5 d=1 v=1 g=0] [ri=0 xi=0 pa=010e4000 c=5 d=0 v=1 g=0]
Cc: linux-mips@linux-mips.org
Cc: James Hogan <james.hogan@imgtec.com>
Cc: David Daney <ddaney@caviumnetworks.com>
Patchwork: https://patchwork.linux-mips.org/patch/10080/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The EHINV bit in EntryHi allows a TLB entry to be properly marked
invalid so that EntryHi doesn't have to be set to a unique value to
avoid machine check exceptions due to multiple matching entries.
Unfortunately dump_tlb() doesn't take this into account so it will print
all the uninteresting invalid TLB entries if the current ASID happens to
be 00. Therefore add a condition to skip entries which are marked
invalid with the EHINV bit.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10076/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The TLB only matches the ASID when the global bit isn't set, so
dump_tlb() shouldn't really be skipping global entries just because the
ASID doesn't match. Fix the condition to read the TLB entry's global bit
from EntryLo0. Note that after a TLB read the global bits in both
EntryLo registers reflect the same global bit in the TLB entry.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10079/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Refactor the TLB matching code in dump_tlb() slightly so that the
conditions which can cause a TLB entry to be skipped can be more easily
extended. This should prevent the match condition getting unwieldy once
it is updated to take further conditions into account.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10081/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Use the new tlb read hazard macros from <asm/hazards.h> rather than the
local BARRIER() macro which uses 7 ops regardless of the kernel
configuration.
We use mtc0_tlbr_hazard for the hazard between mtc0 to the index
register and the tlbr, and tlb_read_hazard for the hazard between the
tlbr and the mfc0 of the TLB registers written by tlbr.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10074/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Correct a regression introduced with 8453eebd [MIPS: Fix strnlen_user()
return value in case of overlong strings.] causing assembler warnings
and broken code generated in __strnlen_kernel_nocheck_asm:
arch/mips/lib/strnlen_user.S: Assembler messages:
arch/mips/lib/strnlen_user.S:64: Warning: Macro instruction expanded into multiple instructions in a branch delay slot
with the CPU_DADDI_WORKAROUNDS option set, resulting in the function
looping indefinitely upon mounting NFS root.
Use conditional assembly to avoid a microMIPS code size regression.
Using $at unconditionally would cause such a regression as there are no
16-bit instruction encodings available for ALU operations using this
register. Using $v1 unconditionally would produce short microMIPS
encodings, but would prevent this register from being used across calls
to this function.
The extra LI operation introduced is free, replacing a NOP originally
scheduled into the delay slot of the branch that follows.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10205/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Computing sum introduces true data dependency. This patch removes some
true data depdendencies, hence increases instruction level parallelism.
This patch brings up to 50% csum performance gain on Loongson 3a.
One example about how this patch works is in CSUM_BIGCHUNK1:
// ** original ** vs ** patch applied **
ADDC(sum, t0) ADDC(t0, t1)
ADDC(sum, t1) ADDC(t2, t3)
ADDC(sum, t2) ADDC(sum, t0)
ADDC(sum, t3) ADDC(sum, t2)
In the original implementation, each ADDC(sum, ...) depends on the sum
value updated by previous ADDC(as source operand).
With this patch applied, the first two ADDC operations are independent,
hence can be executed simultaneously if possible.
Another example is in the "copy and sum calculating chunk":
// ** original ** vs ** patch applied **
STORE(t0, UNIT(0) ... STORE(t0, UNIT(0) ...
ADDC(sum, t0) ADDC(t0, t1)
STORE(t1, UNIT(1) ... STORE(t1, UNIT(1) ...
ADDC(sum, t1) ADDC(sum, t0)
STORE(t2, UNIT(2) ... STORE(t2, UNIT(2) ...
ADDC(sum, t2) ADDC(t2, t3)
STORE(t3, UNIT(3) ... STORE(t3, UNIT(3) ...
ADDC(sum, t3) ADDC(sum, t2)
With this patch applied, ADDC and the **next next** ADDC are independent.
Signed-off-by: chenj <chenj@lemote.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9608/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
MIPS R6 dropped the unaligned load and store instructions so
we need to re-write this part of the code for R6 to store
one byte at a time.
Signed-off-by: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
MIPS R6 does not support the unaligned load and store instructions
so we add a special MIPS R6 case to copy one byte at a time if we
need to read/write to unaligned memory addresses.
Signed-off-by: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>