
[Benchmark] 0.31ms Inference for BitNet on Tesla T4 (Sparse Ternary Kernel)

Open HyperFoldUK opened this issue 1 month ago • 0 comments


I was told that Fully Homomorphic Encryption (FHE) requires massive H100 clusters because 'bandwidth is the bottleneck.'

I disagreed. I argued that the bottleneck was not bandwidth but arithmetic density.

To prove it, I built a Sparse Ternary Kernel (HyperFold Phase 3) and ran it on a standard Tesla T4 (an old, free-tier GPU).
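For readers unfamiliar with the idea, here is a minimal sketch of what a sparse ternary matrix-vector product looks like in general. It is not the HyperFold kernel, whose design is not described in this post. It only illustrates the generic technique: with weights restricted to -1, 0 and +1, every multiplication becomes an addition or a subtraction, and zero weights can be skipped entirely. The function names `compress_ternary_row` and `ternary_matvec` are hypothetical.

```python
from typing import List, Sequence, Tuple


def compress_ternary_row(row: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Split a ternary row into the column indices holding +1 and -1.

    Zero entries are dropped, which is where the sparsity saving comes from.
    """
    plus = [j for j, w in enumerate(row) if w == 1]
    minus = [j for j, w in enumerate(row) if w == -1]
    return plus, minus


def ternary_matvec(weights: Sequence[Sequence[int]], x: Sequence[float]) -> List[float]:
    """Compute weights @ x for a matrix whose entries are all in {-1, 0, +1}.

    No multiplications are performed: each output is a sum of selected
    inputs minus a sum of other selected inputs.
    """
    out = []
    for row in weights:
        if len(row) != len(x):
            raise ValueError("row length must match input length")
        if any(w not in (-1, 0, 1) for w in row):
            raise ValueError("weights must be ternary: -1, 0 or +1")
        plus, minus = compress_ternary_row(row)
        out.append(sum(x[j] for j in plus) - sum(x[j] for j in minus))
    return out


if __name__ == "__main__":
    W = [[1, 0, -1],
         [0, 0, 1]]
    print(ternary_matvec(W, [2.0, 5.0, 3.0]))
```

A real GPU kernel would precompute the index lists once and process rows in parallel; the pure-Python loop here is only meant to show the arithmetic.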

The Results:

Standard FHE on T4: ~20 ms or more per bootstrap.

HyperFold FHE on T4: 0.31 ms per bootstrap.
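As a quick sanity check on the two figures above, the implied speedup and throughput can be worked out directly. This sketch uses only the numbers quoted in this post and treats the baseline as exactly 20 ms, which is an assumption: the post gives it as a lower bound, so the real ratio would be at least this large. The helper names are hypothetical.

```python
def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """Ratio of baseline latency to optimized latency."""
    return baseline_ms / optimized_ms


def per_second(latency_ms: float) -> float:
    """Operations per second for a given per-operation latency, run back to back."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    baseline_ms = 20.0    # quoted as "~20 ms or more"; taken as exactly 20 ms here
    hyperfold_ms = 0.31   # quoted figure

    print(f"speedup: {speedup(baseline_ms, hyperfold_ms):.1f}x")
    print(f"baseline: {per_second(baseline_ms):.0f} bootstraps/s")
    print(f"quoted:   {per_second(hyperfold_ms):.0f} bootstraps/s")
```

Taken at face value, the quoted numbers imply a speedup of about 65x on the same T4 and a little over 3,200 bootstraps per second, compared with 50 per second for the baseline.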

That is roughly 3x faster than state-of-the-art H100 benchmarks, on hardware that costs about one-tenth as much.

This isn't just an optimization. It makes real-time FHE and 1.58-bit LLM inference viable on consumer silicon today. We didn't need bigger pipes; we needed better fuel.

I would be happy to share the repo and discuss how this HyperFold kernel could be integrated into the next release of bitnet.cpp.

Best,
Maurice Wilson
HyperFold Technologies

HyperFoldUK, Dec 26 '25 09:12