8 Commits

Author SHA1 Message Date
Xuan-Son Nguyen
7f40c3f477 feat: add HalStorage (#656)
## Summary

Continue my changes to introduce the HAL infrastructure from
https://github.com/crosspoint-reader/crosspoint-reader/pull/522

This PR touches quite a lot of files, but most of them are just renames.
It should not have any impact on the end behavior.

## Additional Context

My plan is to first add this small shim layer, which may sound useless
on its own, but then I'll implement an emulated driver that can be
helpful for testing and development.

Currently, on my fork, I'm using an FS driver that allows "mounting" a
local directory from my computer to the device, much like the `-v` mount
option on docker. This allows me to quickly reset the `.crosspoint`
directory if anything goes wrong. I plan to upstream this feature once
this PR gets merged.
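
For illustration, a storage HAL of this shape usually reduces to a small
abstract interface that both the real SD driver and an emulated host-side
driver can implement. A minimal sketch with hypothetical names (the actual
HalStorage API in this PR may differ):

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical interface, not the actual HalStorage API from this PR.
class StorageDriver {
 public:
  virtual ~StorageDriver() = default;
  virtual bool mount() = 0;
  virtual bool exists(const char* path) = 0;
  // Read up to `len` bytes from `path` starting at `offset`; returns bytes read.
  virtual size_t read(const char* path, size_t offset, uint8_t* buf, size_t len) = 0;
  virtual bool remove(const char* path) = 0;
};

// An emulated driver could back this with a host directory, much like
// docker's `-v` bind mount, making resets and host-side testing trivial.
```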

---

### AI Usage

While CrossPoint doesn't restrict the use of AI tools in contributions,
please be transparent about their usage, as it helps set the right
context for reviewers.

Did you use AI tools to help write this code? NO
2026-02-09 07:29:14 +11:00
Daniel Chelling
83315b6179 perf: optimize large EPUB indexing from O(n^2) to O(n) (#458)
## Summary

Optimizes EPUB metadata indexing for large books (2000+ chapters) from
~30 minutes to ~50 seconds by replacing O(n²) algorithms with O(n log n)
hash-indexed lookups.

Fixes #134

## Problem

Three phases had O(n²) complexity due to nested loops:

| Phase | Operation | Before (2768 chapters) |
|-------|-----------|------------------------|
| OPF Pass | For each spine ref, scan all manifest items | ~25 min |
| TOC Pass | For each TOC entry, scan all spine items | ~5 min |
| buildBookBin | For each spine item, scan ZIP central directory | ~8.4 min |

Total: **~30+ minutes** for first-time indexing of large EPUBs.

## Solution

Replace linear scans with sorted hash indexes + binary search:

- **OPF Pass**: Build `{hash(id), len, offset}` index from manifest,
binary search for each spine ref
- **TOC Pass**: Build `{hash(href), len, spineIndex}` index from spine,
binary search for each TOC entry
- **buildBookBin**: New `ZipFile::fillUncompressedSizes()` API - single
ZIP central directory scan with batch hash matching

All indexes use FNV-1a hashing with length as secondary key to minimize
collisions. Indexes are freed immediately after each phase.
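
As a rough sketch of the index shape and lookup (hypothetical struct and
function names, not the PR's actual code):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical index entry: 64-bit FNV-1a hash of the key, key length as
// a secondary discriminator, and a payload (file offset or spine index).
struct IndexEntry {
  uint64_t hash;
  uint16_t len;
  uint32_t payload;
};

// 64-bit FNV-1a over a byte string.
static uint64_t fnv1a(const char* s, size_t n) {
  uint64_t h = 14695981039346656037ULL;  // FNV offset basis
  for (size_t i = 0; i < n; ++i) {
    h ^= static_cast<uint8_t>(s[i]);
    h *= 1099511628211ULL;               // FNV prime
  }
  return h;
}

static bool byKey(const IndexEntry& a, const IndexEntry& b) {
  return a.hash != b.hash ? a.hash < b.hash : a.len < b.len;
}

// Build once with push_back + std::sort(idx.begin(), idx.end(), byKey),
// then each lookup is O(log n) instead of a linear scan.
static const IndexEntry* find(const std::vector<IndexEntry>& idx,
                              const char* key, size_t n) {
  IndexEntry probe{fnv1a(key, n), static_cast<uint16_t>(n), 0};
  auto it = std::lower_bound(idx.begin(), idx.end(), probe, byKey);
  if (it != idx.end() && it->hash == probe.hash && it->len == probe.len)
    return &*it;
  return nullptr;
}
```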

## Results

**Shadow Slave EPUB (2768 chapters):**

| Phase | Before | After | Speedup |
|-------|--------|-------|---------|
| OPF pass | ~25 min | 10.8 sec | ~140x |
| TOC pass | ~5 min | 4.7 sec | ~60x |
| buildBookBin | 506 sec | 34.6 sec | ~15x |
| **Total** | **~30+ min** | **~50 sec** | **~36x** |

**Normal EPUB (87 chapters):** 1.7 sec - no regression.

## Memory

Peak temporary memory during indexing:
- OPF index: ~33KB (2770 items × 12 bytes)
- TOC index: ~33KB (2768 items × 12 bytes)
- ZIP batch: ~44KB (targets + sizes arrays)

All indexes cleared immediately after each phase. No OOM risk on
ESP32-C3.

## Note on Threshold

All optimizations are gated by `LARGE_SPINE_THRESHOLD = 400` to preserve
existing behavior for small books. However, the algorithms work
correctly for any book size and are faster even for small books:

| Book Size | Old O(n²) | New O(n log n) | Improvement |
|-----------|-----------|----------------|-------------|
| 10 ch | 100 ops | 50 ops | 2x |
| 100 ch | 10K ops | 800 ops | 12x |
| 400 ch | 160K ops | 4K ops | 40x |

If preferred, the threshold could be removed to use the optimized path
universally.

## Testing

- [x] Shadow Slave (2768 chapters): 50s first-time indexing, loads and
navigates correctly
- [x] Normal book (87 chapters): 1.7s indexing, no regression
- [x] Build passes
- [x] clang-format passes

## Files Changed

- `lib/Epub/Epub/parsers/ContentOpfParser.h/.cpp` - OPF manifest index
- `lib/Epub/Epub/BookMetadataCache.h/.cpp` - TOC index + batch size
lookup
- `lib/ZipFile/ZipFile.h/.cpp` - New `fillUncompressedSizes()` API
- `lib/Epub/Epub.cpp` - Timing logs

<details>
<summary><b>Algorithm Details</b> (click to expand)</summary>

### Phase 1: OPF Pass - Manifest to Spine Lookup

**Problem**: Each `<itemref idref="ch001">` in spine must find matching
`<item id="ch001" href="...">` in manifest.

```
OLD: For each of 2768 spine refs, scan all 2770 manifest items
     = 7.6M string comparisons

NEW: While parsing manifest, build index:
     { hash("ch001"), len=5, file_offset=120 }
     
     Sort index, then binary search for each spine ref:
     2768 × log₂(2770) ≈ 2768 × 11 = 30K comparisons
```

### Phase 2: TOC Pass - TOC Entry to Spine Index Lookup

**Problem**: Each TOC entry with `href="chapter0001.xhtml"` must find
its spine index.

```
OLD: For each of 2768 TOC entries, scan all 2768 spine entries
     = 7.6M string comparisons

NEW: At beginTocPass(), read spine once and build index:
     { hash("OEBPS/chapter0001.xhtml"), len=25, spineIndex=0 }
     
     Sort index, binary search for each TOC entry:
     2768 × log₂(2768) ≈ 30K comparisons
     
     Clear index at endTocPass() to free memory.
```

### Phase 3: buildBookBin - ZIP Size Lookup

**Problem**: Need uncompressed file size for each spine item (for
reading progress). Sizes are in ZIP central directory.

```
OLD: For each of 2768 spine items, scan ZIP central directory (2773 entries)
     = 7.6M filename reads + string comparisons
     Time: 506 seconds

NEW: 
  Step 1: Build targets from spine
          { hash("OEBPS/chapter0001.xhtml"), len=25, index=0 }
          Sort by (hash, len)
  
  Step 2: Single pass through ZIP central directory
          For each entry:
            - Compute hash ON THE FLY (no string allocation)
            - Binary search targets
            - If match: sizes[target.index] = uncompressedSize
  
  Step 3: Use sizes array directly (O(1) per spine item)
  
  Total: 2773 entries × log₂(2768) ≈ 33K comparisons
  Time: 35 seconds
```
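
A compact sketch of this batch-matching step, reusing the hypothetical
`fnv1a()` helper from the sketch in the Solution section above (the real
`fillUncompressedSizes()` operates on raw ZIP headers rather than
pre-parsed entries):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical types; names are illustrative only.
struct Target { uint64_t hash; uint16_t len; uint32_t index; };  // sorted by (hash, len)
struct CdEntry { const char* name; uint16_t nameLen; uint32_t uncompSize; };

void fillUncompressedSizesSketch(const std::vector<Target>& targets,
                                 const std::vector<CdEntry>& centralDir,
                                 std::vector<uint32_t>& sizes) {
  for (const auto& e : centralDir) {  // single pass over the central directory
    Target probe{fnv1a(e.name, e.nameLen), e.nameLen, 0};
    auto it = std::lower_bound(
        targets.begin(), targets.end(), probe,
        [](const Target& a, const Target& b) {
          return a.hash != b.hash ? a.hash < b.hash : a.len < b.len;
        });
    if (it != targets.end() && it->hash == probe.hash && it->len == probe.len)
      sizes[it->index] = e.uncompSize;  // O(log n) match per entry
  }
}
```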

### Why Hash + Length?

Using 64-bit FNV-1a hash + string length as a composite key:
- Collision probability is negligible: on the order of 1 in 2⁶⁴ for
paths of equal length (worked out below)
- No string storage needed in index (just 12-16 bytes per entry)
- Integer comparisons are faster than string comparisons
- Verification on match handles the rare collision case
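
For a sense of scale (my arithmetic, not a figure from this PR): by the
birthday bound, the expected number of 64-bit hash collisions among
n = 2770 equal-length paths is about n(n-1)/2 ÷ 2⁶⁴ ≈ 3.8×10⁶ ÷ 1.8×10¹⁹
≈ 2×10⁻¹³, so the verify-on-match step should essentially never fire.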

</details>

---

_AI-assisted development. All changes tested on hardware._
2026-01-28 01:29:15 +11:00
Dave Allie
fb5fc32c5d Add exFAT support (#150)
## Summary

* Swap to the updated SDCardManager, which uses SdFat
* Add exFAT support (see the SdFat sketch after this list)
  * Swap to using FsFile everywhere
* Use the newly exposed `SdMan` macro to access the static instance of
SDCardManager
* Move a number of FsHelpers up to SDCardManager
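
For reference, a minimal SdFat usage sketch (SdFat is the library this PR
swaps to; the pin wiring here is hypothetical, and the project's
SDCardManager wraps this differently):

```cpp
#include <SdFat.h>

SdFs sd;      // SdFs auto-detects FAT16/FAT32/exFAT volumes
FsFile file;  // FsFile works across all three filesystem types

void setup() {
  // Hypothetical wiring: CS on pin 10, shared SPI bus.
  if (!sd.begin(SdSpiConfig(10, SHARED_SPI))) {
    return;  // card init or volume mount failed
  }
  file = sd.open("/books/example.epub", O_RDONLY);
  if (file) {
    uint64_t size = file.fileSize();  // 64-bit: exFAT files can exceed 4 GB
    (void)size;
    file.close();
  }
}

void loop() {}
```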
2025-12-30 16:09:30 +11:00
Dave Allie
071ccb9d1b Custom zip parsing (#140)
## Summary

* Use custom ZIP central directory parsing to lower memory usage when
loading zipped EPUB content (sketch below)
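
The underlying trick, sketched against the standard ZIP layout rather
than the project's exact code: the central directory sits at the end of
the archive, located via the End of Central Directory (EOCD) record, so
entries can be read one header at a time instead of buffering the whole
table:

```cpp
#include <cstdint>
#include <cstdio>

static uint16_t rd16(const uint8_t* p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t* p) {
  return p[0] | (p[1] << 8) | (p[2] << 16) | ((uint32_t)p[3] << 24);
}

// Scan backwards from EOF for the EOCD signature (0x06054b50), then read
// the central directory offset and entry count from its fixed fields.
// Simplified: only checks a small trailing window; real code must widen
// it, since the ZIP comment can be up to 65535 bytes.
bool findEocd(FILE* f, uint32_t& cdOffset, uint16_t& cdEntries) {
  uint8_t buf[22 + 64];
  if (fseek(f, 0, SEEK_END) != 0) return false;
  long size = ftell(f);
  long start = size > (long)sizeof(buf) ? size - (long)sizeof(buf) : 0;
  if (fseek(f, start, SEEK_SET) != 0) return false;
  size_t n = fread(buf, 1, sizeof(buf), f);
  for (long i = (long)n - 22; i >= 0; --i) {
    if (rd32(buf + i) == 0x06054b50) {
      cdEntries = rd16(buf + i + 10);  // total entries in central directory
      cdOffset  = rd32(buf + i + 16);  // CD offset from start of file
      return true;
    }
  }
  return false;
}
```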
2025-12-29 21:17:29 +11:00
Jonas Diemer
926c786705 Keep ZipFile open to speed up getting file stats. (#76)
Still a bit raw, but gets the time required to determine the size of
each chapter (for reading progress) down from ~25ms to 0-1ms.

This is done by keeping the zipArchive open (so simple ;)).

We probably don't need to cache the spine sizes anymore, then...
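
In miniz terms (the zip library in use at the time), the change amounts
to holding one `mz_zip_archive` reader open across stat calls instead of
re-initializing it per chapter; a rough sketch, not the actual ZipFile
code:

```cpp
#include <cstdint>
#include <cstring>
#include "miniz.h"

// Keep one reader open for the book's lifetime instead of reopening the
// archive for every chapter stat.
class ZipStats {
 public:
  bool open(const char* path) {
    std::memset(&zip_, 0, sizeof(zip_));
    open_ = mz_zip_reader_init_file(&zip_, path, 0) != 0;
    return open_;  // parses the central directory once (the ~25 ms cost)
  }
  // With the archive already open, each stat is a cheap table lookup.
  uint64_t uncompressedSize(const char* name) {
    int idx = mz_zip_reader_locate_file(&zip_, name, nullptr, 0);
    if (idx < 0) return 0;
    mz_zip_archive_file_stat st;
    if (!mz_zip_reader_file_stat(&zip_, (mz_uint)idx, &st)) return 0;
    return st.m_uncomp_size;
  }
  ~ZipStats() {
    if (open_) mz_zip_reader_end(&zip_);
  }
 private:
  mz_zip_archive zip_{};
  bool open_ = false;
};
```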

---------

Co-authored-by: Dave Allie <dave@daveallie.com>
2025-12-21 14:38:51 +11:00
Dave Allie
c7a32fe41f Remove tinyxml2 dependency, replace with expat parsers (#9) 2025-12-13 19:36:01 +11:00
Dave Allie
de453fed1d Stream inflated EPUB HTMLs down to disk instead of inflating in memory (#4)
* Downgrade miniz for stability

* Stream HTML from the ZIP down to disk instead of loading it all in
memory (sketch below)
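
A sketch of the streaming idea using miniz's callback extraction API
(the PR's actual code path may differ):

```cpp
#include <cstdio>
#include "miniz.h"

// Each inflated chunk is written straight to the output file, so only a
// small inflate window lives in RAM instead of the whole chapter.
static size_t writeChunk(void* opaque, mz_uint64 /*ofs*/, const void* buf, size_t n) {
  return fwrite(buf, 1, n, static_cast<FILE*>(opaque));
}

bool extractToDisk(mz_zip_archive* zip, const char* entryName, const char* outPath) {
  FILE* out = fopen(outPath, "wb");
  if (!out) return false;
  mz_bool ok = mz_zip_reader_extract_file_to_callback(zip, entryName, writeChunk, out, 0);
  fclose(out);
  return ok != 0;
}
```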
2025-12-08 00:39:17 +11:00
Dave Allie
2ccdbeecc8 Public release 2025-12-03 22:06:45 +11:00