Phison ran a 20B model on a phone SoC by tiering it onto flash instead of DRAM

Direct:

Phison (8299.TWO) ran aiDAPTIV on MediaTek's (2454.TW) Dimensity 9500 at MDDC 2026 and got a reported 20-billion-parameter model running on one mobile-class device. The mechanism is the same one Phison has pushed up and down the stack all year: treat NAND as a memory tier so the model doesn't have to live entirely in DRAM.

aiDAPTIV+ slices the model and the KV cache, parks inactive slices on flash, and pulls them into working memory on demand. The closest analogy is OS paging applied to transformer weights, though Phison hasn't published the architecture in that detail. On PCs the numbers are already public: Phison and Acer (2353.TW) showed gpt-oss-120b on a laptop with 32GB of memory against the roughly 96GB a conventional load needs. The Dimensity demo is that approach pushed onto a phone SoC with an NPU instead of a discrete GPU.

Why it matters now is timing. DRAM is in a severe shortage. HBM is taking a growing share of DRAM wafers, LPDDR5 contract pricing is up double digits, and low-end phones are reportedly reverting to 4GB to protect margins. In that environment, running a bigger model on less DRAM by leaning on flash the device already ships is a BOM argument aimed straight at OEMs squeezed on memory cost.

The hedge: this is a conference demo, and Phison sells controllers and flash, so the framing is self-interested. No token-rate or time-to-first-token figures for the Dimensity run were disclosed. Swapping weights off NAND has a latency floor no middleware erases. Sustained tokens/sec under real context length is the number that decides whether this is product or theater, and it isn't out yet.

For SSD readers: flash endurance and controller behavior are starting to look like inference specs, not just storage specs. The same NAND-as-memory-tier pitch Phison made for Pascari at GTC is scaling down to embedded. If it holds, write amplification and DWPD on the boot device start mattering for AI performance on hardware that never had a GPU.

Drafted with AI assistance against parallel reporting.

Sources:

DigiTimes, Phison aiDAPTIV / Dimensity 9500 edge AI (May 2026, paywalled stub)
Tom's Hardware, Phison aiDAPTIV+ PC inference demo (2026)
Blocks & Files, aiDAPTIV+ KV-cache memory extension, gpt-oss-120b on 32GB (Jan 2026)
Phison press release, Pascari as AI memory tier at GTC 2026 (Mar 2026)
TrendForce / IDC, 2026 DRAM shortage and mobile LPDDR pricing (2026)

Reddit: https://ift.tt/Geop3UO