Apple's AI Leaps: Flash Memory Powers New On-Device Models

9 Jun

Summary

New on-device AI models store parameters in NAND flash, not DRAM.
Routing decisions are made once per prompt, not per token.
Active parameter count scales from 1B to 4B based on task.

Apple announced its third-generation foundation models, AFM 3, at WWDC26. The models, developed with Google, move the entire weight set out of DRAM and into NAND flash storage. Keeping the weights in flash removes the DRAM capacity limit that previously capped the size of local AI models.

The AFM 3 Core Advanced model stores 20 billion parameters in flash. Because NAND-to-DRAM bandwidth is slow, the model makes its routing decision once per prompt rather than once per token, selecting the specific experts to load into DRAM. The active parameter count can therefore scale with the task, from 1 billion for simple tasks to 4 billion for complex ones.
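The per-prompt routing idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Expert` and `PromptSession` names, the 40 equally sized experts of 500 million parameters each, the relevance scores, and the greedy "highest score first" selection are all hypothetical. Apple has not published its routing mechanism, so this only shows the general shape of the technique: experts are chosen and loaded once when a prompt arrives, and every later token reuses the resident set without touching flash again.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expert:
    name: str
    params: int  # parameter count of this expert, stored in flash


def route_once(scores, experts, budget):
    """Greedily pick the highest-scoring experts that fit the parameter budget."""
    ranked = sorted(zip(scores, experts), key=lambda pair: pair[0], reverse=True)
    chosen, used = [], 0
    for _score, expert in ranked:
        if used + expert.params <= budget:
            chosen.append(expert)
            used += expert.params
    return chosen


class PromptSession:
    """Tracks which experts are resident in DRAM and how often flash is read."""

    def __init__(self, experts, budget):
        self.experts = experts
        self.budget = budget      # active-parameter budget for this task
        self.resident = []        # experts currently held in DRAM
        self.flash_loads = 0      # number of expert loads from flash

    def start_prompt(self, scores):
        # Routing happens here, once per prompt.
        self.resident = route_once(scores, self.experts, self.budget)
        self.flash_loads += len(self.resident)

    def decode_token(self):
        # Each token reuses the resident experts; no flash access occurs.
        return sum(e.params for e in self.resident)


# Hypothetical layout: 40 experts x 500M parameters = 20B parameters in flash.
EXPERTS = [Expert(f"e{i}", 500_000_000) for i in range(40)]
SCORES = list(range(40))  # made-up relevance scores for one prompt

simple = PromptSession(EXPERTS, budget=1_000_000_000)
simple.start_prompt(SCORES)
for _ in range(100):
    simple.decode_token()

complex_task = PromptSession(EXPERTS, budget=4_000_000_000)
complex_task.start_prompt(SCORES)
```

With these made-up numbers, the simple task loads two experts and the complex task loads eight, and generating 100 tokens adds no further flash reads. A per-token router would instead risk a fresh flash read on every token, which is the cost the article says the per-prompt design avoids.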

Apple has detailed the memory architecture, but practical deployment constraints such as energy usage and thermal performance remain undisclosed. A full technical report with benchmarks is expected later this summer. For enterprise architects, the development offers a 20-billion-parameter on-device option for agentic workloads and shifts the primary constraint from model capability to device hardware.

Disclaimer: This story has been auto-aggregated and auto-summarised by a computer program. This story has not been edited or created by the Feedzop team.