How an SSD Works: An Introduction to Quantum Physics
Article URL: https://read.thecoder.cafe/p/how-an-ssd-works
Comments URL: https://news.ycombinator.com/item?id=47777284
Points: 3
# Comments: 0
Article URL: https://corelab.tech/samsung-990-evo-plus-linux-ssd-heat-test/
Comments URL: https://news.ycombinator.com/item?id=47735354
Points: 1
# Comments: 1
Article URL: https://9to5mac.com/2026/04/09/diy-macbook-neo-upgrade-can-boost-the-ssd-to-1tb-using-iphone-parts/
Comments URL: https://news.ycombinator.com/item?id=47707158
Points: 11
# Comments: 1
Article URL: https://github.com/apple/ml-ssd
Comments URL: https://news.ycombinator.com/item?id=47626941
Points: 3
# Comments: 0
We built a native inference engine that runs quantized LLMs directly from SSD using memory-mapped I/O. The model never fully loads into RAM; the OS pages weights on demand as each layer executes.
*What it does:*
- Mixtral 8x22B (80GB, 141B params) runs on a machine with 48GB RAM
- Model loads in 0.3 seconds (vs. 190s with llama.cpp)
- Produces correct output: "What is 2+2?" → "The sum of 2 and 2 is 4."
- Zero dependencies: custom tensor engine, custom GGUF parser, no ggml/llama.cpp
*How it works:*
- `mmap()` the GGUF file; the OS handles SSD→RAM paging transparently
- Quantize the input to Q8_K and compute dot products directly against Q4_K/Q5_K/Q6_K weights in the quantized domain, with no dequantization to float32
- AVX2 SIMD + 8-thread parallel matvec
- For MoE models, only 2 of 8 experts are active per token, so most weights stay cold on disk
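The mmap-and-page-on-demand idea in the first bullet can be sketched in a few lines. This is a hypothetical Python stand-in, not the project's native engine code; the file, sizes, and layer layout are all invented for illustration:

```python
import mmap
import os
import tempfile

# Create a stand-in "weight file" of 4 layers, each 1024 float32s.
LAYER_BYTES = 1024 * 4
fd, path = tempfile.mkstemp()
os.write(fd, b"\x00" * LAYER_BYTES * 4)
os.close(fd)

with open(path, "rb") as f:
    # Map the whole file; no data is read from disk yet.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Slicing layer 2 faults in only its pages; layers 0, 1, 3
    # stay cold on disk and never occupy physical memory.
    layer2 = mm[2 * LAYER_BYTES : 3 * LAYER_BYTES]
    mm.close()
os.remove(path)
print(len(layer2))  # 4096
```

The same mechanism is what makes the MoE case cheap: expert weights that are never sliced are never paged in.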
*The hard part we solved:* GGUF models are calibrated for a specific dot product computation path (ggml's "quantize input → integer multiply-accumulate → late float conversion"). If you naively dequantize weights to float32 and do a standard dot product, the per-operation error is tiny (~0.001%) but compounds across 56 transformer layers into completely wrong output. We had to reverse-engineer and match ggml's exact scalar computation (block-level integer accumulation with 8-lane parallel reduction) to get correct results.
*What it doesn't do (yet):*
- Speed: ~0.08 tok/s on the 80GB model (CPU-only, no GPU offload)
- No interactive chat UI
- Only K-quant GGUF formats (Q4_K_M, Q5_K_M, Q6_K), which cover ~90% of models on Hugging Face
- Windows only (Linux stubs exist but untested)
The architecture comes from my "work in progress" WayOS (https://github.com/cloudlinqed/WayOS), an AI-first OS that treats SSD/RAM/VRAM as a unified memory hierarchy.
GitHub: https://github.com/cloudlinqed/WayInfer
Comments URL: https://news.ycombinator.com/item?id=47614947
Points: 1
# Comments: 0
Article URL: https://github.com/SharpAI/SwiftLM
Comments URL: https://news.ycombinator.com/item?id=47604354
Points: 77
# Comments: 47
Article URL: https://www.tomshardware.com/pc-components/ssds/samsung-announces-bm9k1-pcie-5-0-qlc-ssd
Comments URL: https://news.ycombinator.com/item?id=47558931
Points: 3
# Comments: 1
I've been building an open-source backup CLI in Go: https://github.com/Cloudstic/cli
Docs: https://docs.cloudstic.com
Features:
- encrypted backups
- content-addressed deduplication
- local / S3 / B2 / SFTP storage
- local / Google Drive / OneDrive / SFTP sources
- restore to ZIP or directory
One thing I wanted to get right was portable drives. If the same external SSD moves between machines, the tool uses its GPT partition UUID to keep the backup history tied to the drive itself, instead of treating every new mount path as a different source.
Recent posts:
- https://blog.cloudstic.com/2026/03/12/backing-up-portable-drives/
- https://blog.cloudstic.com/2026/03/16/practical-backups-with-cloudstic-profiles/
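The portable-drive idea above boils down to choosing a stable key. A minimal sketch in illustrative Python (not Cloudstic's actual Go code or API; the names and UUID value are invented):

```python
# Backup history keyed by the drive's GPT partition UUID, which
# travels with the drive, rather than by mount path, which changes
# on every machine.
history = {}

def record_backup(part_uuid, mount_path, snapshot_id):
    # mount_path is logged but never used as the identity key.
    history.setdefault(part_uuid, []).append(snapshot_id)

drive = "example-partuuid"  # stand-in for a real GPT PARTUUID
record_backup(drive, "/media/alice/backup-ssd", "snap-001")  # Linux box
record_backup(drive, "D:\\", "snap-002")                     # Windows box
print(history[drive])  # ['snap-001', 'snap-002']
```

Both mounts resolve to one continuous history because the key lives on the drive, not in the OS.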
Would love feedback
Comments URL: https://news.ycombinator.com/item?id=47398576
Points: 1
# Comments: 0
I built a prototype GPU-based vector search system that runs locally on a consumer PC.
Hardware:
RTX 3090, consumer CPU, NVMe SSD
Dataset:
~70 million vectors (384 dimensions)
Performance:
~48 ms search latency for top-k results.
This corresponds to roughly ~1.45 billion vector comparisons per second on a single GPU.
The system uses a custom GPU kernel and a two-stage search pipeline (binary filtering + floating-point reranking).
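The two-stage pipeline can be re-created on CPU in a few lines. This is an illustrative Python sketch of the idea only, not the post's custom GPU kernel, and it uses a tiny synthetic dataset: stage 1 filters candidates with cheap Hamming distance over sign bits, stage 2 reranks the survivors with exact float dot products.

```python
import random

random.seed(0)
DIM, N = 32, 1000
db = [[random.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(N)]

def sign_bits(vec):
    """Collapse a float vector to one bit per dimension (its sign)."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (x > 0)
    return bits

db_bits = [sign_bits(v) for v in db]
query = db[42]  # query with a known true match in the database
q_bits = sign_bits(query)

# Stage 1: binary filter -- keep the 50 nearest by Hamming distance.
cand = sorted(range(N), key=lambda i: bin(db_bits[i] ^ q_bits).count("1"))[:50]

# Stage 2: exact float rerank over only the surviving candidates.
best = max(cand, key=lambda i: sum(a * b for a, b in zip(db[i], query)))
print(best)  # 42
```

The win is that the expensive float pass touches 50 vectors instead of all N; on a GPU with 70M vectors the same split keeps the bulk of the work in cheap XOR/popcount operations.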
My goal was to explore whether large-scale vector search could run efficiently on consumer hardware instead of large datacenter clusters.
After thousands of hours of work and many failed attempts, the results finally became stable enough to benchmark.
I'm currently exploring how far this approach can scale.
I'd be very interested to hear how others approach large-scale vector search on consumer hardware.
Happy to answer questions.
Comments URL: https://news.ycombinator.com/item?id=47400954
Points: 1
# Comments: 4
Article URL: https://news.ycombinator.com/submitted?id=redohmy
Comments URL: https://news.ycombinator.com/item?id=47372795
Points: 2
# Comments: 0
Article URL: https://www.macrumors.com/2026/03/10/macbook-neo-slower-ssd-speeds/
Comments URL: https://news.ycombinator.com/item?id=47332370
Points: 1
# Comments: 2
Hi HN,
I've been exploring how far large language models can be pushed on machines with limited memory.
I built an experimental runtime and architecture approach focused on making extremely large models more feasible on systems with around 32GB of RAM.
The core idea is combining several efficiency techniques:
- ternary weight representation {-1, 0, +1} (~1.58 bits per weight)
- sparse execution that skips zero weights
- memory-mapped layer streaming from NVMe storage
- lightweight tensor unpacking optimized for Apple Silicon
Instead of keeping the entire model in RAM, weights can be streamed from fast SSD storage and unpacked during execution. This shifts the bottleneck from memory capacity toward storage bandwidth and compute efficiency.
Early experiments show significant compression compared to FP16 weights (for example TinyLlama-1.1B shrinking from ~2.05GB to ~0.24GB with ternary packing).
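One way such packing can work (an illustrative sketch, not necessarily graviton-native's actual scheme): five ternary weights fit in one byte because 3^5 = 243 ≤ 256, giving 1.6 bits per weight versus 16 for FP16, a 10x reduction that is in the same ballpark as the ~2.05GB → ~0.24GB figure above once format overhead is included.

```python
def pack5(trits):
    """Pack five trits in {-1, 0, +1} into a single byte via base-3."""
    byte = 0
    for t in trits:
        byte = byte * 3 + (t + 1)  # map {-1, 0, 1} -> {0, 1, 2}
    return byte

def unpack5(byte):
    """Inverse of pack5: recover the five trits from one byte."""
    trits = []
    for _ in range(5):
        trits.append(byte % 3 - 1)
        byte //= 3
    return trits[::-1]

w = [-1, 0, 1, 1, -1]
packed = pack5(w)
print(packed, unpack5(packed))  # 51 [-1, 0, 1, 1, -1]
```

Unpacking is a handful of integer ops per weight, which is what makes streaming packed layers from NVMe and expanding them on the fly plausible.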
The project is still experimental, but the goal is to explore whether extreme compression + sparsity + SSD streaming can make much larger models practical on consumer machines.
Paper: https://opengraviton.github.io/paper.html
Runtime: https://github.com/opengraviton/graviton-native
I'd really appreciate feedback from people working on inference engines, quantization, or efficient model architectures.
Comments URL: https://news.ycombinator.com/item?id=47315029
Points: 2
# Comments: 1
Article URL: https://github.com/tanishqkumar/ssd
Comments URL: https://news.ycombinator.com/item?id=47252343
Points: 1
# Comments: 0
Article URL: https://github.com/jundot/omlx
Comments URL: https://news.ycombinator.com/item?id=47247294
Points: 4
# Comments: 1