Apple's AI Leaps: Flash Memory Powers New On-Device Models
9 Jun
Summary
- New on-device AI models store parameters in NAND flash, not DRAM.
- Routing decisions are made once per prompt, not per token.
- Active parameter count scales from 1B to 4B based on task.

Apple announced its third-generation foundation models, AFM 3, at WWDC26. The models, developed with Google, keep the entire weight set in NAND flash storage rather than DRAM. This sidesteps the DRAM capacity limit that has capped the size of local AI models.
The AFM 3 Core Advanced model stores 20 billion parameters in flash. To work around limited NAND-to-DRAM bandwidth, it makes its routing decision once per prompt rather than once per token, selecting the specific experts to load into DRAM. The active parameter count then scales with the task, from 1 billion for simple tasks to 4 billion for complex ones.
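The per-prompt routing idea can be illustrated with a small Python sketch. It is not Apple's implementation, and every detail here is an assumption made for illustration: the `FlashExpertStore` class, the `route_once` and `generate` functions, the split into 20 experts of 1 billion parameters each, and the greedy score-based selection are all hypothetical. The only point it shows is that flash is read once when the prompt arrives, and every generated token then reuses the experts already resident in DRAM.

```python
from dataclasses import dataclass


@dataclass
class FlashExpertStore:
    """Stands in for NAND flash: holds every expert, counts each read."""
    experts: dict  # expert_id -> parameter count (hypothetical sizes)
    reads: int = 0

    def load(self, expert_id):
        # Each call models one slow flash-to-DRAM transfer.
        self.reads += 1
        return self.experts[expert_id]


def route_once(prompt_scores, active_budget, store):
    """Choose experts a single time for the whole prompt.

    prompt_scores: expert_id -> relevance score for this prompt. How a real
    router computes these is not described in the article (assumption).
    active_budget: maximum number of parameters to keep resident in DRAM.
    """
    resident = {}
    used = 0
    # Greedy selection by descending score (an illustrative policy only).
    for expert_id in sorted(prompt_scores, key=prompt_scores.get, reverse=True):
        size = store.experts[expert_id]
        if used + size > active_budget:
            continue
        resident[expert_id] = store.load(expert_id)
        used += size
    return resident, used


def generate(num_tokens, resident):
    """Decode loop: every token reuses the same resident experts."""
    return [sorted(resident) for _ in range(num_tokens)]


if __name__ == "__main__":
    store = FlashExpertStore({i: 1_000_000_000 for i in range(20)})
    scores = {i: float(i) for i in range(20)}  # made-up scores
    resident, used = route_once(scores, 4_000_000_000, store)
    generate(8, resident)
    print(f"active parameters: {used}, flash reads: {store.reads}")
```

With a 4-billion-parameter budget the sketch loads four experts and performs four flash reads no matter how many tokens are generated. A per-token router would instead risk new flash reads on every decoding step.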
Apple has detailed the memory architecture but has not disclosed practical deployment constraints such as energy usage and thermal performance. A full technical report with benchmarks is expected later this summer. For enterprise architects, the release offers a 20-billion-parameter on-device option for agentic workloads and shifts the primary constraint from model capability to device hardware.