8 Commits

Author SHA1 Message Date
Xuan-Son Nguyen
7f40c3f477 feat: add HalStorage (#656)
## Summary

Continue my changes to introduce the HAL infrastructure from
https://github.com/crosspoint-reader/crosspoint-reader/pull/522

This PR touches quite a lot of files, but most of them are just renames.
It should not have any impact on the end behavior.

## Additional Context

My plan is to first add this small shim layer, which may sound useless
on its own, but then I'll implement an emulated driver that can be
helpful for testing and development.

Currently, on my fork, I'm using an FS driver that allows "mounting" a
local directory from my computer to the device, much like the `-v` mount
option on docker. This allows me to quickly reset the `.crosspoint`
directory if anything goes wrong. I plan to upstream this feature once
this PR gets merged.
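
For illustration, a storage HAL of this shape usually reduces to a small
abstract interface that both the real SD driver and an emulated host-side
driver can implement. A minimal sketch with hypothetical names (the actual
HalStorage API in this PR may differ):

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical interface, not the actual HalStorage API from this PR.
class StorageDriver {
 public:
  virtual ~StorageDriver() = default;
  virtual bool mount() = 0;
  virtual bool exists(const char* path) = 0;
  // Read up to `len` bytes from `path` starting at `offset`; returns bytes read.
  virtual size_t read(const char* path, size_t offset, uint8_t* buf, size_t len) = 0;
  virtual bool remove(const char* path) = 0;
};

// An emulated driver could back this with a host directory, much like
// docker's `-v` bind mount, making resets and host-side testing trivial.
```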

---

### AI Usage

While CrossPoint doesn't restrict the use of AI tools in contributions,
please be transparent about their usage, as it helps set the right
context for reviewers.

Did you use AI tools to help write this code? NO
2026-02-09 07:29:14 +11:00
Daniel Chelling
83315b6179 perf: optimize large EPUB indexing from O(n^2) to O(n) (#458)
## Summary

Optimizes EPUB metadata indexing for large books (2000+ chapters) from
~30 minutes to ~50 seconds by replacing O(n²) algorithms with O(n log n)
hash-indexed lookups.

Fixes #134

## Problem

Three phases had O(n²) complexity due to nested loops:

| Phase | Operation | Before (2768 chapters) |
|-------|-----------|------------------------|
| OPF Pass | For each spine ref, scan all manifest items | ~25 min |
| TOC Pass | For each TOC entry, scan all spine items | ~5 min |
| buildBookBin | For each spine item, scan ZIP central directory | ~8.4 min |

Total: **~30+ minutes** for first-time indexing of large EPUBs.

## Solution

Replace linear scans with sorted hash indexes + binary search:

- **OPF Pass**: Build `{hash(id), len, offset}` index from manifest,
binary search for each spine ref
- **TOC Pass**: Build `{hash(href), len, spineIndex}` index from spine,
binary search for each TOC entry
- **buildBookBin**: New `ZipFile::fillUncompressedSizes()` API - single
ZIP central directory scan with batch hash matching

All indexes use FNV-1a hashing with length as secondary key to minimize
collisions. Indexes are freed immediately after each phase.
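
As a rough sketch of the index shape and lookup (hypothetical struct and
function names, not the PR's actual code):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical index entry: 64-bit FNV-1a hash of the key, key length as
// a secondary discriminator, and a payload (file offset or spine index).
struct IndexEntry {
  uint64_t hash;
  uint16_t len;
  uint32_t payload;
};

// 64-bit FNV-1a over a byte string.
static uint64_t fnv1a(const char* s, size_t n) {
  uint64_t h = 14695981039346656037ULL;  // FNV offset basis
  for (size_t i = 0; i < n; ++i) {
    h ^= static_cast<uint8_t>(s[i]);
    h *= 1099511628211ULL;               // FNV prime
  }
  return h;
}

static bool byKey(const IndexEntry& a, const IndexEntry& b) {
  return a.hash != b.hash ? a.hash < b.hash : a.len < b.len;
}

// Build once with push_back + std::sort(idx.begin(), idx.end(), byKey),
// then each lookup is O(log n) instead of a linear scan.
static const IndexEntry* find(const std::vector<IndexEntry>& idx,
                              const char* key, size_t n) {
  IndexEntry probe{fnv1a(key, n), static_cast<uint16_t>(n), 0};
  auto it = std::lower_bound(idx.begin(), idx.end(), probe, byKey);
  if (it != idx.end() && it->hash == probe.hash && it->len == probe.len)
    return &*it;
  return nullptr;
}
```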

## Results

**Shadow Slave EPUB (2768 chapters):**

| Phase | Before | After | Speedup |
|-------|--------|-------|---------|
| OPF pass | ~25 min | 10.8 sec | ~140x |
| TOC pass | ~5 min | 4.7 sec | ~60x |
| buildBookBin | 506 sec | 34.6 sec | ~15x |
| **Total** | **~30+ min** | **~50 sec** | **~36x** |

**Normal EPUB (87 chapters):** 1.7 sec - no regression.

## Memory

Peak temporary memory during indexing:
- OPF index: ~33KB (2770 items × 12 bytes)
- TOC index: ~33KB (2768 items × 12 bytes)
- ZIP batch: ~44KB (targets + sizes arrays)

All indexes cleared immediately after each phase. No OOM risk on
ESP32-C3.

## Note on Threshold

All optimizations are gated by `LARGE_SPINE_THRESHOLD = 400` to preserve
existing behavior for small books. However, the algorithms work
correctly for any book size and are faster even for small books:

| Book Size | Old O(n²) | New O(n log n) | Improvement |
|-----------|-----------|----------------|-------------|
| 10 ch | 100 ops | 50 ops | 2x |
| 100 ch | 10K ops | 800 ops | 12x |
| 400 ch | 160K ops | 4K ops | 40x |

If preferred, the threshold could be removed to use the optimized path
universally.

## Testing

- [x] Shadow Slave (2768 chapters): 50s first-time indexing, loads and
navigates correctly
- [x] Normal book (87 chapters): 1.7s indexing, no regression
- [x] Build passes
- [x] clang-format passes

## Files Changed

- `lib/Epub/Epub/parsers/ContentOpfParser.h/.cpp` - OPF manifest index
- `lib/Epub/Epub/BookMetadataCache.h/.cpp` - TOC index + batch size
lookup
- `lib/ZipFile/ZipFile.h/.cpp` - New `fillUncompressedSizes()` API
- `lib/Epub/Epub.cpp` - Timing logs

<details>
<summary><b>Algorithm Details</b> (click to expand)</summary>

### Phase 1: OPF Pass - Manifest to Spine Lookup

**Problem**: Each `<itemref idref="ch001">` in spine must find matching
`<item id="ch001" href="...">` in manifest.

```
OLD: For each of 2768 spine refs, scan all 2770 manifest items
     = 7.6M string comparisons

NEW: While parsing manifest, build index:
     { hash("ch001"), len=5, file_offset=120 }
     
     Sort index, then binary search for each spine ref:
     2768 × log₂(2770) ≈ 2768 × 11 = 30K comparisons
```

### Phase 2: TOC Pass - TOC Entry to Spine Index Lookup

**Problem**: Each TOC entry with `href="chapter0001.xhtml"` must find
its spine index.

```
OLD: For each of 2768 TOC entries, scan all 2768 spine entries
     = 7.6M string comparisons

NEW: At beginTocPass(), read spine once and build index:
     { hash("OEBPS/chapter0001.xhtml"), len=25, spineIndex=0 }
     
     Sort index, binary search for each TOC entry:
     2768 × log₂(2768) ≈ 30K comparisons
     
     Clear index at endTocPass() to free memory.
```

### Phase 3: buildBookBin - ZIP Size Lookup

**Problem**: Need uncompressed file size for each spine item (for
reading progress). Sizes are in ZIP central directory.

```
OLD: For each of 2768 spine items, scan ZIP central directory (2773 entries)
     = 7.6M filename reads + string comparisons
     Time: 506 seconds

NEW: 
  Step 1: Build targets from spine
          { hash("OEBPS/chapter0001.xhtml"), len=25, index=0 }
          Sort by (hash, len)
  
  Step 2: Single pass through ZIP central directory
          For each entry:
            - Compute hash ON THE FLY (no string allocation)
            - Binary search targets
            - If match: sizes[target.index] = uncompressedSize
  
  Step 3: Use sizes array directly (O(1) per spine item)
  
  Total: 2773 entries × log₂(2768) ≈ 33K comparisons
  Time: 35 seconds
```
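
A compact sketch of this batch-matching step, reusing the hypothetical
`fnv1a()` helper from the sketch in the Solution section above (the real
`fillUncompressedSizes()` operates on raw ZIP headers rather than
pre-parsed entries):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical types; names are illustrative only.
struct Target { uint64_t hash; uint16_t len; uint32_t index; };  // sorted by (hash, len)
struct CdEntry { const char* name; uint16_t nameLen; uint32_t uncompSize; };

void fillUncompressedSizesSketch(const std::vector<Target>& targets,
                                 const std::vector<CdEntry>& centralDir,
                                 std::vector<uint32_t>& sizes) {
  for (const auto& e : centralDir) {  // single pass over the central directory
    Target probe{fnv1a(e.name, e.nameLen), e.nameLen, 0};
    auto it = std::lower_bound(
        targets.begin(), targets.end(), probe,
        [](const Target& a, const Target& b) {
          return a.hash != b.hash ? a.hash < b.hash : a.len < b.len;
        });
    if (it != targets.end() && it->hash == probe.hash && it->len == probe.len)
      sizes[it->index] = e.uncompSize;  // O(log n) match per entry
  }
}
```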

### Why Hash + Length?

Using 64-bit FNV-1a hash + string length as a composite key:
- Collision probability is negligible: on the order of 1 in 2⁶⁴ for
paths of equal length (worked out below)
- No string storage needed in index (just 12-16 bytes per entry)
- Integer comparisons are faster than string comparisons
- Verification on match handles the rare collision case
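
For a sense of scale (my arithmetic, not a figure from this PR): by the
birthday bound, the expected number of 64-bit hash collisions among
n = 2770 equal-length paths is about n(n-1)/2 ÷ 2⁶⁴ ≈ 3.8×10⁶ ÷ 1.8×10¹⁹
≈ 2×10⁻¹³, so the verify-on-match step should essentially never fire.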

</details>

---

_AI-assisted development. All changes tested on hardware._
2026-01-28 01:29:15 +11:00
Dave Allie
fb5fc32c5d Add exFAT support (#150)
## Summary

* Swap to the updated SDCardManager, which uses SdFat
* Add exFAT support (see the SdFat sketch after this list)
  * Swap to using FsFile everywhere
* Use the newly exposed `SdMan` macro to access the static instance of
SDCardManager
* Move a number of FsHelpers up to SDCardManager
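
For reference, a minimal SdFat usage sketch (SdFat is the library this PR
swaps to; the pin wiring here is hypothetical, and the project's
SDCardManager wraps this differently):

```cpp
#include <SdFat.h>

SdFs sd;      // SdFs auto-detects FAT16/FAT32/exFAT volumes
FsFile file;  // FsFile works across all three filesystem types

void setup() {
  // Hypothetical wiring: CS on pin 10, shared SPI bus.
  if (!sd.begin(SdSpiConfig(10, SHARED_SPI))) {
    return;  // card init or volume mount failed
  }
  file = sd.open("/books/example.epub", O_RDONLY);
  if (file) {
    uint64_t size = file.fileSize();  // 64-bit: exFAT files can exceed 4 GB
    (void)size;
    file.close();
  }
}

void loop() {}
```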
2025-12-30 16:09:30 +11:00
Dave Allie
071ccb9d1b Custom zip parsing (#140)
## Summary

* Use custom ZIP central directory parsing to lower memory usage when
loading zipped EPUB content (sketch below)
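
The underlying trick, sketched against the standard ZIP layout rather
than the project's exact code: the central directory sits at the end of
the archive, located via the End of Central Directory (EOCD) record, so
entries can be read one header at a time instead of buffering the whole
table:

```cpp
#include <cstdint>
#include <cstdio>

static uint16_t rd16(const uint8_t* p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t* p) {
  return p[0] | (p[1] << 8) | (p[2] << 16) | ((uint32_t)p[3] << 24);
}

// Scan backwards from EOF for the EOCD signature (0x06054b50), then read
// the central directory offset and entry count from its fixed fields.
// Simplified: only checks a small trailing window; real code must widen
// it, since the ZIP comment can be up to 65535 bytes.
bool findEocd(FILE* f, uint32_t& cdOffset, uint16_t& cdEntries) {
  uint8_t buf[22 + 64];
  if (fseek(f, 0, SEEK_END) != 0) return false;
  long size = ftell(f);
  long start = size > (long)sizeof(buf) ? size - (long)sizeof(buf) : 0;
  if (fseek(f, start, SEEK_SET) != 0) return false;
  size_t n = fread(buf, 1, sizeof(buf), f);
  for (long i = (long)n - 22; i >= 0; --i) {
    if (rd32(buf + i) == 0x06054b50) {
      cdEntries = rd16(buf + i + 10);  // total entries in central directory
      cdOffset  = rd32(buf + i + 16);  // CD offset from start of file
      return true;
    }
  }
  return false;
}
```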
2025-12-29 21:17:29 +11:00
Jonas Diemer
926c786705 Keep ZipFile open to speed up getting file stats. (#76)
Still a bit raw, but gets the time required to determine the size of
each chapter (for reading progress) down from ~25ms to 0-1ms.

This is done by keeping the zipArchive open (so simple ;)).

We probably don't need to cache the spine sizes anymore, then...
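
In miniz terms (the zip library in use at the time), the change amounts
to holding one `mz_zip_archive` reader open across stat calls instead of
re-initializing it per chapter; a rough sketch, not the actual ZipFile
code:

```cpp
#include <cstdint>
#include <cstring>
#include "miniz.h"

// Keep one reader open for the book's lifetime instead of reopening the
// archive for every chapter stat.
class ZipStats {
 public:
  bool open(const char* path) {
    std::memset(&zip_, 0, sizeof(zip_));
    open_ = mz_zip_reader_init_file(&zip_, path, 0) != 0;
    return open_;  // parses the central directory once (the ~25 ms cost)
  }
  // With the archive already open, each stat is a cheap table lookup.
  uint64_t uncompressedSize(const char* name) {
    int idx = mz_zip_reader_locate_file(&zip_, name, nullptr, 0);
    if (idx < 0) return 0;
    mz_zip_archive_file_stat st;
    if (!mz_zip_reader_file_stat(&zip_, (mz_uint)idx, &st)) return 0;
    return st.m_uncomp_size;
  }
  ~ZipStats() {
    if (open_) mz_zip_reader_end(&zip_);
  }
 private:
  mz_zip_archive zip_{};
  bool open_ = false;
};
```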

---------

Co-authored-by: Dave Allie <dave@daveallie.com>
2025-12-21 14:38:51 +11:00
Dave Allie
c7a32fe41f Remove tinyxml2 dependency, replace with expat parsers (#9) 2025-12-13 19:36:01 +11:00
Dave Allie
de453fed1d Stream inflated EPUB HTMLs down to disk instead of inflating in memory (#4)
* Downgrade miniz for stability

* Stream HTML from the ZIP down to disk instead of loading it all in
memory (sketch below)
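
A sketch of the streaming idea using miniz's callback extraction API
(the PR's actual code path may differ):

```cpp
#include <cstdio>
#include "miniz.h"

// Each inflated chunk is written straight to the output file, so only a
// small inflate window lives in RAM instead of the whole chapter.
static size_t writeChunk(void* opaque, mz_uint64 /*ofs*/, const void* buf, size_t n) {
  return fwrite(buf, 1, n, static_cast<FILE*>(opaque));
}

bool extractToDisk(mz_zip_archive* zip, const char* entryName, const char* outPath) {
  FILE* out = fopen(outPath, "wb");
  if (!out) return false;
  mz_bool ok = mz_zip_reader_extract_file_to_callback(zip, entryName, writeChunk, out, 0);
  fclose(out);
  return ok != 0;
}
```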
2025-12-08 00:39:17 +11:00
Dave Allie
2ccdbeecc8 Public release 2025-12-03 22:06:45 +11:00