retrowin32/doc/performance.md
2023-01-16 19:02:49 -08:00

Performance notes

Dumping assembly

$ cargo install cargo-show-asm
$ cargo asm --wasm -p win32 mov_r32

Profiling on Mac

$ brew install cargo-instruments
$ cargo instruments --release -t time -p retrowin32 -- exe/zip/zip.exe 200

Registers struct

Registers are a fixed set of named slots, e.g. eax, ebx. It's natural to represent them as a struct like:

struct Registers {
    eax: u32,
    ebx: u32,
    ...
}

But most instructions refer to registers indirectly, as an integer. So to look up a register you might write code like:

enum Reg { EAX, EBX, ... }
fn get_reg(regs: &Registers, reg: Reg) -> u32 {
    match reg {
        Reg::EAX => regs.eax,
        ...
    }
}

Unfortunately it seems that, even if the values of the Reg enum are integers that cleanly map to "the nth u32 in the registers struct", LLVM compiles the above get_reg function as a switch (jump table) rather than as simple index arithmetic. The behavior seems the same between C++ and Rust, so it appears to be an LLVM thing. It generates more efficient code when regs is a global, but it's still not ideal.

If you instead do something that has the same layout in memory but is more clearly integer-indexed:

struct Registers {
    r32: [u32; 8],
}

then the code generated is ideal. But then accessing those registers in Rust code ends up pretty miserable relative to the named registers.
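
As a rough illustration of that trade-off, here is a minimal sketch of the array-backed variant. The explicit enum discriminants and the x86 encoding order (eax, ecx, edx, ebx, esp, ebp, esi, edi) are assumptions made for the example, not necessarily what retrowin32 uses:

```rust
// Hypothetical sketch: the register order follows the x86 encoding order,
// which is an assumption for illustration.
#[derive(Clone, Copy)]
enum Reg {
    EAX = 0,
    ECX,
    EDX,
    EBX,
    ESP,
    EBP,
    ESI,
    EDI,
}

#[derive(Default)]
struct Registers {
    r32: [u32; 8],
}

// Lookup is a plain array index computed from the enum discriminant.
fn get_reg(regs: &Registers, reg: Reg) -> u32 {
    regs.r32[reg as usize]
}
```

The cost shows up at call sites that name a specific register: what would be `regs.eax = 1` with named fields becomes `regs.r32[Reg::EAX as usize] = 1`.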

So instead we just use the first struct, marked #[repr(C)] so that its fields are laid out in declaration order, and do some casting to index it like the array, which gets the efficient codegen of the latter.
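
A minimal sketch of what such casting could look like follows. The method names get32 and set32, the full field list, and the register order are hypothetical choices for this example; the real code may differ:

```rust
// Hypothetical sketch: named fields for ergonomic access, plus an
// array view for indexed access. Field order is assumed to match the
// x86 register encoding order.
#[repr(C)]
#[derive(Default)]
struct Registers {
    eax: u32,
    ecx: u32,
    edx: u32,
    ebx: u32,
    esp: u32,
    ebp: u32,
    esi: u32,
    edi: u32,
}

#[derive(Clone, Copy)]
enum Reg {
    EAX = 0,
    ECX,
    EDX,
    EBX,
    ESP,
    EBP,
    ESI,
    EDI,
}

impl Registers {
    fn get32(&self, reg: Reg) -> u32 {
        // SAFETY: with #[repr(C)], the eight u32 fields are contiguous,
        // in declaration order, with no padding, so the struct has the
        // same size and alignment as [u32; 8]. Reg values are 0..=7.
        let regs = unsafe { &*(self as *const Registers as *const [u32; 8]) };
        regs[reg as usize]
    }

    fn set32(&mut self, reg: Reg, value: u32) {
        // SAFETY: same layout argument as in get32.
        let regs = unsafe { &mut *(self as *mut Registers as *mut [u32; 8]) };
        regs[reg as usize] = value;
    }
}
```

With this shape, instruction handlers that decode a register number can call `regs.get32(reg)`, while code that names a specific register can still write `regs.eax` directly.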