CASE STUDY 01 · OPEN SOURCE · C++ / JSI / SIMD
Making base64 fast enough to disappear
Three pull requests to react-native-quick-base64 that took base64 off the flamegraph in a production React Native image pipeline — shipped in v3.0.0.
- ~150×: decode speedup on large payloads (SIMD micro-benchmark)
- 2–3×: end-to-end improvement in a real RN image pipeline
- v3.0.0: shipped in a major release, maintainer-validated
The problem
Base64 is the kind of thing you never think about until it’s the slowest line in a flamegraph.
I kept hitting it in a React Native image pipeline: every encode and decode of a large payload ran on the JS thread, and react-native-quick-base64 was already the fastest option reachable from JavaScript. So swapping libraries wasn’t going to help. Whatever was left to win sat inside the library itself: the work done between JavaScript and native on every single call, and the scalar codec underneath it. That’s where I went looking.
The decision
There were three separate costs, and I wanted to be honest about which ones were actually worth touching. The rule I gave myself: a speedup has to be worth the complexity it adds, and the complexity has to stay contained — no architecture-specific code leaking into the public API, no behaviour changes that force a migration.
That ruled out a rewrite. What it ruled in was a sequence of targeted changes, each one measurable on its own: first stop copying data needlessly across the JSI boundary, then make the actual codec use the CPU’s SIMD units, then delete an encoding round-trip I shouldn’t have been paying for at all.
What I did
Eliminating ArrayBuffer copies at the JSI boundary (#49)
Every call was copying the payload as it crossed between JavaScript and C++. For small strings that’s noise; for image-sized buffers it’s most of the cost. This PR reworked the JSI boundary to operate on the underlying buffer directly instead of duplicating it on the way in and out.
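As a minimal sketch of the idea (not the code from the PR), the difference comes down to handing the encoder a borrowed pointer and length instead of an owned copy of the payload. `ByteView`, `view_of`, and `encode_from_view` are hypothetical names introduced for illustration, and a `std::vector` stands in for the ArrayBuffer's storage.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical non-owning view: a pointer and a length, no allocation, no copy.
struct ByteView {
    const std::uint8_t* data;
    std::size_t size;
};

// Borrow the caller's storage; the caller must keep it alive during the call.
ByteView view_of(const std::vector<std::uint8_t>& buffer) {
    return ByteView{buffer.data(), buffer.size()};
}

// Scalar base64 encoder that reads straight from the borrowed bytes.
std::string encode_from_view(ByteView in) {
    static const char* kTable =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((in.size + 2) / 3 * 4);  // 4 output chars per 3 input bytes
    for (std::size_t i = 0; i < in.size; i += 3) {
        std::size_t n = in.size - i < 3 ? in.size - i : 3;  // bytes in this group
        std::uint32_t v = static_cast<std::uint32_t>(in.data[i]) << 16;
        if (n > 1) v |= static_cast<std::uint32_t>(in.data[i + 1]) << 8;
        if (n > 2) v |= in.data[i + 2];
        out += kTable[(v >> 18) & 63];
        out += kTable[(v >> 12) & 63];
        out += n > 1 ? kTable[(v >> 6) & 63] : '=';
        out += n > 2 ? kTable[v & 63] : '=';
    }
    return out;
}
```

Calling `encode_from_view(view_of(buffer))` touches the payload exactly once. The only allocation is the output string, which is sized up front.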
A SIMD-accelerated codec via simdutf (#50)
The core change: replace the scalar base64.h implementation with simdutf, which encodes and decodes using SIMD instructions, processing 16 or 32 bytes per instruction (depending on register width) rather than a byte at a time. simdutf picks the best instruction set available at runtime, so the same binary stays fast across the architectures React Native actually ships on.
Skipping the UTF-16 → UTF-8 round-trip on decode (#51)
JavaScript strings are UTF-16; the decoder was re-encoding them to UTF-8 before doing any work. Base64 text is plain ASCII, so that conversion bought nothing. Reading the string data directly through JSI’s getStringData removed the round-trip entirely. It is a smaller win than the SIMD codec, but it is free, and it applies to every decode.
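To illustrate why the conversion can be skipped, here is a hedged sketch of a scalar decoder that is generic over the code-unit type, so it accepts UTF-16 units as they are. It assumes the caller already holds a pointer to the string's code units, however obtained. `sextet` and `decode_units` are hypothetical names, not the library's API, and the sketch only handles padded input.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Map one code unit to its 6-bit value, or -1 if it is outside the alphabet.
// Any non-ASCII UTF-16 unit falls through to -1, so nothing is silently truncated.
template <typename CharT>
int sextet(CharT c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// Decode padded base64 from 8-bit or 16-bit code units without transcoding.
// Returns false on malformed input; `out` receives the decoded bytes.
template <typename CharT>
bool decode_units(const CharT* in, std::size_t len, std::string& out) {
    out.clear();
    if (len % 4 != 0) return false;  // sketch: padded input only
    std::size_t pad = 0;
    while (pad < 2 && pad < len && in[len - 1 - pad] == '=') ++pad;
    out.reserve(len / 4 * 3);
    std::uint32_t acc = 0;  // bit accumulator
    int bits = 0;           // number of unread bits held in acc
    for (std::size_t i = 0; i < len - pad; ++i) {
        int v = sextet(in[i]);
        if (v < 0) return false;
        acc = (acc << 6) | static_cast<std::uint32_t>(v);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out += static_cast<char>((acc >> bits) & 0xFF);
        }
    }
    return true;
}
```

The same template instantiates for `char` and `char16_t`, which is the point: the decoder only ever compares code units against ASCII values, so the width of the unit does not matter.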
The proof
On a SIMD micro-benchmark, large-payload decode came out roughly 150× faster than the scalar path. That number is the headline, but it’s also the least honest one on its own — micro-benchmarks measure the codec in isolation. End to end, inside the real image pipeline where base64 is one step among many, the improvement landed at a still-decisive 2–3×. Both numbers matter; neither means anything without the other.
All three changes landed in v3.0.0 — a major release of a library I don’t maintain. The external sign-off is the part I can’t give myself.
One honest moment
The first version of the SIMD path looked great in the benchmark and was subtly wrong on a payload-length edge case — the padding handling diverged from the scalar path on certain sizes. The benchmark was happy; the test suite was not. That’s exactly the trade-off I’d flagged going in: SIMD buys speed and charges correctness risk, so the path is only worth shipping behind tests that can catch it lying.
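A differential test is one way to catch this class of bug: run the reference and the fast path over every payload length in a range and report the first length where they disagree. The sketch below is an assumption about how such a check could look, not the project's test suite. `flawed_encode` is a deliberately broken stand-in for a fast path, constructed to fail only on some larger sizes.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

using Encoder = std::function<std::string(const std::vector<std::uint8_t>&)>;

// Reference scalar encoder with standard '=' padding.
std::string scalar_encode(const std::vector<std::uint8_t>& in) {
    static const char* kTable =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    for (std::size_t i = 0; i < in.size(); i += 3) {
        std::size_t n = in.size() - i < 3 ? in.size() - i : 3;  // bytes in this group
        std::uint32_t v = static_cast<std::uint32_t>(in[i]) << 16;
        if (n > 1) v |= static_cast<std::uint32_t>(in[i + 1]) << 8;
        if (n > 2) v |= in[i + 2];
        out += kTable[(v >> 18) & 63];
        out += kTable[(v >> 12) & 63];
        out += n > 1 ? kTable[(v >> 6) & 63] : '=';
        out += n > 2 ? kTable[v & 63] : '=';
    }
    return out;
}

// Hypothetical buggy fast path: drops one '=' on some larger payload sizes.
std::string flawed_encode(const std::vector<std::uint8_t>& in) {
    std::string out = scalar_encode(in);
    if (in.size() >= 16 && in.size() % 3 == 1) out.pop_back();
    return out;
}

// First payload length in [0, max_len] where the encoders disagree, or -1 if none.
long first_divergence(const Encoder& reference, const Encoder& candidate,
                      std::size_t max_len) {
    std::vector<std::uint8_t> payload;
    for (std::size_t len = 0; len <= max_len; ++len) {
        if (reference(payload) != candidate(payload)) return static_cast<long>(len);
        payload.push_back(static_cast<std::uint8_t>(len * 31 + 7));  // deterministic filler
    }
    return -1;
}
```

Here `first_divergence(scalar_encode, flawed_encode, 64)` returns 16, while a test that only tried short strings would pass. Sweeping every length, rather than a handful of fixtures, is what exposes size-dependent padding bugs.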
How it went
- Stop copying buffers at the boundary: reworked the JSI boundary to operate on the buffer directly instead of duplicating it on every call, which was most of the cost on image-sized payloads.
- Swap in the SIMD codec: replaced the scalar base64 with simdutf, which handles 16 or 32 bytes per instruction and chooses the best instruction set at runtime. The decisive win.
- Drop the UTF-16 round-trip: used getStringData to read the string directly on decode, removing an encoding round-trip that ran every single time.