
I started by searching for interesting strings:

```
[Forge] > Enter Name:
[Forge] > Enter Key:
[*] Consult the Norns...
[VM] Stack Overflow!
The ribbon tightens! %s has forged Gleipnir!
```

The `[VM] Stack Overflow!` string led me to the VM dispatcher at `0x140001850`.

### Program Flow

Tracing from `main` (`0x140001010`):

1. Display banner/story
2. Read username into buffer
3. Read serial key into buffer
4. Call validation function `0x140001e10`
5. Display success or failure message

---

## The Yggdrasil Virtual Machine

### VM Structure (0x140001850)

The VM uses a simple architecture:
- **9 registers**: R0-R8 (64-bit each)
- **Stack**: 1024 entries
- **Instruction pointer** and **stack pointer**

### Opcode Table

| Opcode | Mnemonic | Format | Description |
|--------|----------|--------|-------------|
| 0x00 | HALT | `00` | Stop execution |
| 0x01 | LOAD | `01 reg imm64` | Load 64-bit immediate into register |
| 0x02 | MOV | `02 dst src` | Copy register to register |
| 0x03 | ADD | `03 dst src` | dst += src |
| 0x04 | SUB | `04 dst src` | dst -= src |
| 0x05 | XOR | `05 dst src` | dst ^= src |
| 0x06 | MUL | `06 dst src` | dst *= src |
| 0x07 | PUSH | `07 reg` | Push register to stack |
| 0x08 | POP | `08 reg` | Pop stack to register |
| 0x09 | JMP | `09 addr64` | Unconditional jump |
| 0x0A | JZ | `0A addr64` | Jump if R0 == 0 |
| 0x0B | NOP | `0B` | No operation |
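
To sanity-check the table, here is a minimal interpreter sketch in C. It is not the binary's dispatcher: the names (`vm_t`, `vm_init`, `vm_run`) are mine, and the operand encoding is an assumption (one byte per register index, 8-byte little-endian `imm64`/`addr64`, jump targets as byte offsets into the bytecode). Arithmetic wraps modulo 2^64, matching the 64-bit registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VM_NUM_REGS   9
#define VM_STACK_SIZE 1024

typedef struct {
    uint64_t reg[VM_NUM_REGS];
    uint64_t stack[VM_STACK_SIZE];
    size_t sp;  /* number of entries currently on the stack */
    size_t ip;  /* byte offset of the next instruction */
} vm_t;

void vm_init(vm_t *vm) {
    assert(vm != NULL);
    memset(vm, 0, sizeof *vm);
}

/* Assumed operand encoding: 8 bytes, little-endian. */
static uint64_t read_u64(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Operand bytes that follow each opcode, per the Format column. */
static size_t operand_len(uint8_t op) {
    if (op == 0x01) return 9;                /* reg + imm64 */
    if (op >= 0x02 && op <= 0x06) return 2;  /* dst + src */
    if (op == 0x07 || op == 0x08) return 1;  /* reg */
    if (op == 0x09 || op == 0x0A) return 8;  /* addr64 */
    return 0;                                /* HALT, NOP, unknown */
}

/* Returns 0 on HALT and -1 on any fault. */
int vm_run(vm_t *vm, const uint8_t *code, size_t len) {
    assert(vm != NULL && code != NULL);
    while (vm->ip < len) {
        uint8_t op = code[vm->ip++];
        size_t n = operand_len(op);
        if (len - vm->ip < n) return -1;     /* truncated instruction */
        const uint8_t *arg = code + vm->ip;
        vm->ip += n;
        /* Every operand form except addr64 starts with a register index. */
        if (n != 0 && n != 8 && arg[0] >= VM_NUM_REGS) return -1;

        switch (op) {
        case 0x00: return 0;
        case 0x01: vm->reg[arg[0]] = read_u64(arg + 1); break;
        case 0x02: case 0x03: case 0x04: case 0x05: case 0x06: {
            if (arg[1] >= VM_NUM_REGS) return -1;
            uint64_t src = vm->reg[arg[1]];
            uint64_t *dst = &vm->reg[arg[0]];
            if      (op == 0x02) *dst  = src;
            else if (op == 0x03) *dst += src;
            else if (op == 0x04) *dst -= src;
            else if (op == 0x05) *dst ^= src;
            else                 *dst *= src;
            break;
        }
        case 0x07:
            if (vm->sp >= VM_STACK_SIZE) return -1;  /* overflow case */
            vm->stack[vm->sp++] = vm->reg[arg[0]];
            break;
        case 0x08:
            if (vm->sp == 0) return -1;              /* underflow case */
            vm->reg[arg[0]] = vm->stack[--vm->sp];
            break;
        case 0x09: vm->ip = (size_t)read_u64(arg); break;
        case 0x0A: if (vm->reg[0] == 0) vm->ip = (size_t)read_u64(arg); break;
        case 0x0B: break;
        default: return -1;
        }
    }
    return -1;  /* ran off the end without HALT */
}
```

Feeding it `LOAD R1, 5; LOAD R2, 7; MOV R0, R1; ADD R0, R2; HALT` under the assumed encoding leaves 12 in R0, which is enough to replay generated bytecode outside the target.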

---

## Serial Validation (0x140001e10)

The validation function:

1. **Parses serial** as 4 hex values separated by non-alphanumeric characters
   - Example: `AAAA-BBBB-CCCC-DDDD`
   - Each part can be up to 16 hex digits (64-bit)

2. **Initializes VM context** via `0x140001b90`
   - All registers (R0-R8) are zeroed first
   - The four serial parts are then loaded into R5-R8 (S1 in R5 through S4 in R8)

3. **Generates bytecode** via `0x140001bf0` based on username

4. **Executes VM** via `0x140001850`

5. **Checks result**: Success if `R0 == 0x13371337CAFEBABE`
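
The parsing step can be sketched in C as follows. This is my reconstruction of the behaviour described in step 1, not decompiled code: `parse_serial` is a hypothetical name, and how the real parser treats edge cases (trailing characters, non-hex letters, over-long fields) is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Parses four hex fields split by non-alphanumeric separators.
 * Returns 0 on success, -1 on malformed input (assumed behaviour). */
int parse_serial(const char *text, uint64_t parts[4]) {
    assert(text != NULL && parts != NULL);
    const char *p = text;
    for (int i = 0; i < 4; i++) {
        /* Skip separators: any run of non-alphanumeric bytes. */
        while (*p && !isalnum((unsigned char)*p))
            p++;

        /* Accumulate the hex digits of this field. */
        uint64_t value = 0;
        size_t digits = 0;
        while (isxdigit((unsigned char)p[digits])) {
            int c = tolower((unsigned char)p[digits]);
            value = (value << 4) | (uint64_t)(c <= '9' ? c - '0' : c - 'a' + 10);
            digits++;
        }

        if (digits == 0 || digits > 16)
            return -1;  /* empty field, or more than 64 bits */
        if (isalnum((unsigned char)p[digits]))
            return -1;  /* a non-hex letter such as 'G' inside the field */

        parts[i] = value;
        p += digits;
    }
    return 0;  /* anything after the fourth field is ignored here */
}
```

With this sketch, `AAAA-BBBB-CCCC-DDDD` parses to `{0xAAAA, 0xBBBB, 0xCCCC, 0xDDDD}`, and any other separator (`:`, space, `_`) works the same way.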

---

## Bytecode Generator (0x140001bf0)

### PRNG Seeding

The username is hashed using **FNV-1a**:

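```c
#include <stdint.h>

uint32_t fnv1a_hash(const char *username) {
    uint32_t hash = 0x811c9dc5;  // FNV-1a 32-bit offset basis
    while (*username) {
        // XOR the byte in first, then multiply (the FNV-1a order)
        hash = (*username ^ hash) * 0x1000193;  // 32-bit FNV prime
        username++;
    }
    return hash;
}
```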

### LCG PRNG

The hash seeds a 32-bit Linear Congruential Generator that uses the multiplier and increment of the ANSI C sample `rand()` (1103515245 and 12345):

```c
uint32_t lcg_next(uint32_t state) {
    return state * 0x41c64e6d + 0x3039;
}
```

### Coefficient Extraction

Four coefficients (C1-C4) are generated, each from two LCG steps:

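```c
state1 = lcg_next(state);    // first step supplies bits 0-14
state2 = lcg_next(state1);   // second step supplies bits 16-30
coefficient = ((state1 >> 16) & 0x7fff) | (state2 & 0x7fff0000);
state = state2;              // the next draw continues from here
```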

This yields a value with 30 significant bits: bits 0-14 come from `state1`, bits 16-30 come from `state2`, and bits 15 and 31 are always 0.

Four additional PRNG values (p1-p4) are extracted:
```c
state = lcg_next(state);
p_value = (state >> 16) & 0x7fff;
```
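
Putting the seeding and extraction steps together gives the following sketch. The names `forge_params_t` and `derive_params` are hypothetical, and the draw order (all four coefficients first, then the four offsets) is my assumption; verify it against the generator before relying on the output. The hash here casts each byte to unsigned, which matches the version above for ASCII usernames.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t c[4];  /* coefficients C1..C4 */
    uint32_t p[4];  /* offsets p1..p4 */
} forge_params_t;

uint32_t fnv1a_hash(const char *s) {
    uint32_t hash = 0x811c9dc5u;
    for (; *s; s++)
        hash = (hash ^ (uint8_t)*s) * 0x1000193u;
    return hash;
}

uint32_t lcg_next(uint32_t state) {
    return state * 0x41c64e6du + 0x3039u;
}

void derive_params(const char *username, forge_params_t *out) {
    assert(username != NULL && out != NULL);
    uint32_t state = fnv1a_hash(username);

    /* Assumed order: C1..C4 are drawn first, two LCG steps each. */
    for (int i = 0; i < 4; i++) {
        uint32_t s1 = lcg_next(state);
        uint32_t s2 = lcg_next(s1);
        out->c[i] = ((s1 >> 16) & 0x7fffu) | (s2 & 0x7fff0000u);
        state = s2;
    }

    /* Then p1..p4, one LCG step each. */
    for (int i = 0; i < 4; i++) {
        state = lcg_next(state);
        out->p[i] = (state >> 16) & 0x7fffu;
    }
}
```

Because everything is derived from the username alone, two runs with the same name always produce the same eight values, which is what makes a keygen possible.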

### Generated Bytecode Structure

The bytecode performs these operations:

```asm
LOAD R1, C1          ; Load coefficient 1
LOAD R2, C2          ; Load coefficient 2
LOAD R3, C3          ; Load coefficient 3
LOAD R4, C4          ; Load coefficient 4

XOR  R1, R5          ; R1 = C1 ^ S1 (serial part 1)
LOAD R0, p1          ; Load PRNG offset
ADD  R1, R0          ; R1 = (C1 ^ S1) + p1

ADD  R2, R6          ; R2 = C2 + S2
LOAD R0, p2
XOR  R2, R0          ; R2 = (C2 + S2) ^ p2

SUB  R3, R7          ; R3 = C3 - S3
LOAD R0, p3
ADD  R3, R0          ; R3 = (C3 - S3) + p3

XOR  R4, R8          ; R4 = C4 ^ S4
LOAD R0, p4
XOR  R4, R0          ; R4 = (C4 ^ S4) ^ p4

MOV  R0, R1          ; Start accumulation
ADD  R0, R2          ; R0 = R1 + R2
ADD  R0, R3          ; R0 = R0 + R3
ADD  R0, R4          ; R0 = R0 + R4
HALT
```

---

## The Equation

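From the bytecode analysis, the final value of R0 is given by the equation below. All operations are on 64-bit registers, so additions and subtractions wrap modulo 2^64:

```
R0 = ((C1 ^ S1) + p1) + ((C2 + S2) ^ p2) + ((C3 - S3) + p3) + ((C4 ^ S4) ^ p4)
```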

Where:
- **C1-C4**: PRNG-derived coefficients (from username)
- **S1-S4**: Serial parts (user input)
- **p1-p4**: PRNG-derived offsets
- **Target**: `R0 == 0x13371337CAFEBABE`

### Solving for S4

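The equation has four unknowns and only one constraint, so S1, S2, and S3 can be chosen freely and S4 solved for. S4 is the convenient one to isolate because it only passes through XOR, which is its own inverse: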

```
TARGET = partial + ((C4 ^ S4) ^ p4)

where partial = ((C1 ^ S1) + p1) + ((C2 + S2) ^ p2) + ((C3 - S3) + p3)

Solving:
  (C4 ^ S4) ^ p4 = TARGET - partial
  C4 ^ S4 = (TARGET - partial) ^ p4
  S4 = C4 ^ ((TARGET - partial) ^ p4)
```
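
The derivation translates into a short keygen sketch. Everything here is illustrative: the function names are mine, the coefficient/offset draw order (C1-C4 first, then p1-p4) is an assumption, and I assume `LOAD` zero-extends the 32-bit PRNG values into the 64-bit registers. It also only holds when the anti-debug checks do not fire.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define FORGE_TARGET UINT64_C(0x13371337CAFEBABE)

uint32_t lcg_next(uint32_t state) { return state * 0x41c64e6du + 0x3039u; }

uint32_t fnv1a_hash(const char *s) {
    uint32_t hash = 0x811c9dc5u;
    for (; *s; s++)
        hash = (hash ^ (uint8_t)*s) * 0x1000193u;
    return hash;
}

/* Assumed draw order: C1..C4 (two steps each), then p1..p4 (one step each). */
void derive_params(const char *username, uint64_t c[4], uint64_t p[4]) {
    uint32_t state = fnv1a_hash(username);
    for (int i = 0; i < 4; i++) {
        uint32_t s1 = lcg_next(state);
        uint32_t s2 = lcg_next(s1);
        c[i] = ((s1 >> 16) & 0x7fffu) | (s2 & 0x7fff0000u);
        state = s2;
    }
    for (int i = 0; i < 4; i++) {
        state = lcg_next(state);
        p[i] = (state >> 16) & 0x7fffu;
    }
}

/* Evaluates the bytecode's equation; uint64_t arithmetic wraps mod 2^64. */
uint64_t forge_eval(const uint64_t c[4], const uint64_t p[4], const uint64_t s[4]) {
    return ((c[0] ^ s[0]) + p[0]) + ((c[1] + s[1]) ^ p[1])
         + ((c[2] - s[2]) + p[2]) + ((c[3] ^ s[3]) ^ p[3]);
}

/* S4 = C4 ^ ((TARGET - partial) ^ p4), for freely chosen S1..S3. */
uint64_t solve_s4(const uint64_t c[4], const uint64_t p[4],
                  uint64_t s1, uint64_t s2, uint64_t s3) {
    uint64_t partial = ((c[0] ^ s1) + p[0]) + ((c[1] + s2) ^ p[1])
                     + ((c[2] - s3) + p[2]);
    return c[3] ^ ((FORGE_TARGET - partial) ^ p[3]);
}

/* Writes "S1-S2-S3-S4" in hex; returns the snprintf result. */
int format_serial(const char *username, uint64_t s1, uint64_t s2, uint64_t s3,
                  char *buf, size_t len) {
    uint64_t c[4], p[4];
    derive_params(username, c, p);
    uint64_t s[4] = { s1, s2, s3, solve_s4(c, p, s1, s2, s3) };
    assert(forge_eval(c, p, s) == FORGE_TARGET);  /* self-check */
    return snprintf(buf, len, "%" PRIX64 "-%" PRIX64 "-%" PRIX64 "-%" PRIX64,
                    s[0], s[1], s[2], s[3]);
}
```

Calling `format_serial("name", 0x1111, 0x2222, 0x3333, buf, sizeof buf)` fills `buf` with a candidate serial. Since S1-S3 are free, every username has a very large number of valid serials.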

---

## Heimdall Anti-Debug (0x140001fc0)

The anti-debug system uses three checks:

1. **IsDebuggerPresent()** - Windows API
2. **Timing check** - `rdtsc` before and after a loop; trips if the loop takes more than 100000 cycles
3. **PEB.BeingDebugged** - Direct PEB flag check

If any check triggers, Heimdall injects additional bytecode:
```asm
LOAD R5, 0xBADF00D
ADD  R0, R5
```

This corrupts the calculation, making the serial fail even if mathematically correct.