NAND Organization – NewMaxx’s SSD Page

NAND is designed to be accessed in parallel for the best performance. To this end, it is organized in various ways to promote parallelization, from big to small. This section provides a basic understanding of this organization.

Old Section

NAND Package
This is probably the organizational level of flash you are most familiar with as any SSD will have one or more NAND packages on its printed circuit board (PCB). A package is made up of one or more NAND dies that are stacked vertically in series. There are limits to how many dies may be stacked due to height, signal integrity, and yield viability.
NAND packages can use retimers to repropagate the signal down wires that are attached to every die. The dies are stacked in an offset manner from one to the next to accommodate these wires, a layout which can require dummy dies to prevent warpage. The typical limit at this time is sixteen dies per package, known as a sixteen-die package or hex die package (16DP or HDP), which limits how much flash a drive can have depending on the form factor.
M.2 2280 consumer drives will usually have two to eight NAND packages in total, with packages on one or both sides. With 1Tb NAND dies becoming more common, one or two TB per package is possible, with up to 16TB per drive being technically feasible. NAND packages have other characteristics, such as the number of chip enable (CE) pins, which is a separate topic but effectively determines how the controller can interact with and use the dies.
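The capacity figures above follow from simple arithmetic, sketched here in Python. The die density, die limit, and package count come from the text; the function names are made up for illustration, and the result is raw capacity before any overprovisioning or spare area.

```python
# Die density is quoted in bits (Tb), drive capacity in bytes (TB).
DIE_DENSITY_TBIT = 1        # 1Tb die, from the text
MAX_DIES_PER_PACKAGE = 16   # 16DP / HDP limit, from the text
MAX_PACKAGES = 8            # upper end for M.2 2280, from the text


def package_capacity_tb(dies, die_density_tbit=DIE_DENSITY_TBIT):
    """Raw capacity of one package in terabytes (8 bits per byte)."""
    return dies * die_density_tbit / 8


def drive_capacity_tb(packages, dies_per_package):
    """Raw capacity of a drive built from identical packages."""
    return packages * package_capacity_tb(dies_per_package)


print(package_capacity_tb(8))                                  # 1.0
print(package_capacity_tb(MAX_DIES_PER_PACKAGE))               # 2.0
print(drive_capacity_tb(MAX_PACKAGES, MAX_DIES_PER_PACKAGE))   # 16.0
```

Eight 1Tb dies give the 1TB package and sixteen give the 2TB package, so eight fully stacked packages reach the 16TB ceiling.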
NAND Die
The NAND die is the logical unit of NAND flash, addressed by logical unit number (LUN), which the controller interfaces with directly. Ideally the controller will have at least one die per channel, but it is possible to interleave more. Peak performance on consumer parts is usually reached with four dies per channel, which with a powerful eight-channel controller implies 32 dies. The controller is able to switch between banks of dies and dies within a bank so that multiple are active simultaneously. A die with enough bad blocks effectively bricks the consumer SSD.
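As a rough illustration of the channel and bank arithmetic, here is a small Python sketch. The round-robin assignment of dies to channels is an assumption for the example, not how any particular controller enumerates its dies, and `die_location` is a hypothetical helper.

```python
from collections import defaultdict

CHANNELS = 8           # eight-channel controller, from the text
DIES_PER_CHANNEL = 4   # four-way interleave, from the text


def die_location(die_index, channels=CHANNELS):
    """Assumed round-robin layout: returns (channel, bank) for a die."""
    return die_index % channels, die_index // channels


total_dies = CHANNELS * DIES_PER_CHANNEL

# Group the dies by the channel they would share under this layout.
layout = defaultdict(list)
for die in range(total_dies):
    channel, bank = die_location(die)
    layout[channel].append(die)

print(total_dies)        # 32
print(layout[0])         # [0, 8, 16, 24]
print(die_location(13))  # (5, 1)
```

Each channel ends up with four dies, one per bank, which is the interleave the paragraph describes.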
Planes
Each die will have one or more planes for internal parallelization. Two-plane dies were very common until recently; four-plane (quad-plane) dies have since become typical and six-plane (hexa-plane) dies are a reality. Multi-planar operations and “awareness” of the internal structure of such dies allow the flash manufacturer to use special techniques to get better performance and to mitigate diminishing returns. The controller can access the same plane offset across multiple dies for a sort of “superplane,” and internally dies can access planes independently depending on the workload.
Sub-plane
Certain flash architectures can use sub-planes, or divisions within a plane, to increase performance for smaller I/O. There has also been a “tile” architecture which splits the buffers for each plane for a similar reason, and which may also have redundancy.
Superblock
Dies accessed in parallel may be accessed from the same block offset for a “superblock” that remains open for writes, ideally sequentially. The superblock is an important structure because blocks have characteristics related to their position in the flash dies, and these benefit from grouping. There is also parity information available to repair truly bad data, including when bad blocks must be retired and spares cycled into use.
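To make the grouping and parity idea concrete, here is a toy Python sketch. The text does not say which parity scheme is used; byte-wise XOR across the stripe is assumed here purely because it is the simplest one to show, and `superblock_members` and `xor_pages` are hypothetical names.

```python
from functools import reduce


def superblock_members(block_offset, die_count):
    """The (die, block) pairs that share one block offset across dies."""
    return [(die, block_offset) for die in range(die_count)]


def xor_pages(pages):
    """Byte-wise XOR of equally sized pages (assumed parity scheme)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pages))


print(superblock_members(5, 3))   # [(0, 5), (1, 5), (2, 5)]

# One tiny stand-in "page" per die in the stripe.
stripe = [bytes([value] * 8) for value in (0x11, 0x22, 0x44)]
parity = xor_pages(stripe)

# Pretend the page on die 1 is unreadable and rebuild it from the rest.
rebuilt = xor_pages([stripe[0], stripe[2], parity])
print(rebuilt == stripe[1])       # True
```

With a single parity page, only one lost page per stripe can be rebuilt this way.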
Block
Each block within a die has a large number of pages, and the block size has increased with flash layer count and density. A larger block size introduces multiple issues related to wear, not only due to the time a block may be kept open but also because garbage collection and maintenance are executed at the block level. This may even become a problem for reads on future workloads, as with read disturb, requiring additional writes for data refresh.
Sub-block
Blocks can also be accessed in smaller chunks or divisions for a variety of reasons, although usually to reduce wear by working around the normal block granularity.
Superpage
Pages with the same offset across multiple dies can be accessed or written at once in what is called a “superpage.” These pages will be of the same type or mode, such as SLC mode.
Physical Page
The other important granularity for NAND flash dies, aside from the block, is the page. Modern flash usually has a physical page size of 16KiB. The actual size is larger, as there is ECC and metadata stored outside of the user data. Flash with two or more bits per cell will have two or more pages per wordline, each with different characteristics. Upper and lower pages have different levels of impact and sensitivity.
Logical Page (Sub-page)
Filesystems and typical I/O work in units smaller than the physical page size, usually 4KiB. Four of these logical pages, or sub-pages, can fit into a single 16KiB physical page, often by write coalescing. There are also read techniques for smaller I/O.
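Write coalescing can be pictured as a small buffer that waits until it holds a full physical page of logical writes. This is a minimal Python sketch under that assumption; `CoalescingBuffer` is a hypothetical class, and real firmware also handles timeouts, flushes, and mapping tables that are left out here.

```python
PHYSICAL_PAGE_BYTES = 16 * 1024   # from the text
LOGICAL_PAGE_BYTES = 4 * 1024     # from the text
SLOTS_PER_PAGE = PHYSICAL_PAGE_BYTES // LOGICAL_PAGE_BYTES


class CoalescingBuffer:
    """Collects logical page writes until a physical page is full."""

    def __init__(self):
        self.pending = []      # logical addresses waiting to be programmed
        self.programmed = []   # one list of logical addresses per physical page

    def write(self, lba):
        self.pending.append(lba)
        if len(self.pending) == SLOTS_PER_PAGE:
            # A full physical page is ready, so program it in one go.
            self.programmed.append(self.pending)
            self.pending = []


buf = CoalescingBuffer()
for lba in [7, 42, 3, 19, 8]:
    buf.write(lba)

print(SLOTS_PER_PAGE)    # 4
print(buf.programmed)    # [[7, 42, 3, 19]]
print(buf.pending)       # [8]
```

The fifth write stays buffered until three more arrive, which is why small random writes benefit from being batched.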
Wordline
A wordline is a collection of cells forming one page per bit. Triple-level cells (TLC), as an example, will have three bits per cell and therefore three pages across the wordline of cells. Bits are organized from least significant bit (LSB) to most significant bit (MSB), and the programming steps work around the characteristics of these pages to mitigate program disturb for adjacent wordlines.
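The "one page per bit" idea can be shown by slicing the bits of each cell on a wordline into separate pages. This Python sketch only illustrates that slicing; the actual mapping between voltage states and bit patterns is device-specific and is not modeled, and `wordline_pages` is a hypothetical helper.

```python
BITS_PER_CELL = 3   # TLC, from the text


def wordline_pages(cell_values, bits_per_cell=BITS_PER_CELL):
    """Return one page per bit position; page 0 holds every cell's LSB."""
    return [
        [(value >> bit) & 1 for value in cell_values]
        for bit in range(bits_per_cell)
    ]


# Four cells on one wordline, each storing a 3-bit value.
cells = [0b101, 0b010, 0b111, 0b000]
lower, middle, upper = wordline_pages(cells)

print(lower)    # [1, 0, 1, 0]
print(middle)   # [0, 1, 1, 0]
print(upper)    # [1, 0, 1, 0]
```

Each page spans every cell on the wordline, so reading a single page only needs to resolve one bit per cell.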
Bitline
Bitlines, or strings, are the other dimension of flash which also has to be accessed carefully to reduce disturb.
Cell
The cell is essentially the atomic unit of flash, being the smallest effective structure of data relevance. The cell holds a charge from 1 (empty) to 0 by which data may be discerned. As NAND is non-volatile, the goal is to keep this charge intact or trapped, which involves voltage and quantum properties. Over time the charge will escape or change which impacts the voltage thresholds which are used to determine bit values.
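A single-bit read can be sketched as a comparison against a reference voltage. The voltages in this Python example are made-up numbers chosen only to show the effect of charge loss, and `read_slc` is a hypothetical helper rather than a model of any real part.

```python
READ_REFERENCE_V = 2.0   # hypothetical read reference voltage


def read_slc(threshold_v, reference_v=READ_REFERENCE_V):
    """An erased (empty) cell reads as 1, a programmed cell as 0."""
    return 1 if threshold_v < reference_v else 0


erased_v = 0.5       # hypothetical threshold voltage of an erased cell
programmed_v = 3.0   # hypothetical threshold voltage of a programmed cell

print(read_slc(erased_v))       # 1
print(read_slc(programmed_v))   # 0

# If enough charge escapes, the threshold voltage drifts past the
# reference and the stored 0 is misread as 1.
leaked_v = programmed_v - 1.2
print(read_slc(leaked_v))       # 1
```

Cells with more bits use several reference voltages with narrower margins between states, so the same drift causes errors sooner.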
There are multiple architectures, including floating gate and charge trap flash, but the basic idea is to etch memory holes or pillars for access to these memory cells (after deposition). This can become more complicated as the layer count increases, requiring string-stacking in decks (see above). Future flash may use split-gate (SG) architectures to double the cell count, with improved relative endurance due to the structural shape.