Introduction to SSDs
If you're reading this, you probably already know that a solid state drive is faster than a hard drive. That much is obvious the first time you boot off one. The more interesting question – and the reason this guide exists – is why. Why does one SSD hit 14 GB/s while another tops out at 500 MB/s? Why do some drives slow to a crawl when they're 80% full? Why does the same controller perform differently depending on which flash is behind it? The answers come down to how these drives are actually built and how the components interact.
An SSD replaces the spinning platters and mechanical read heads of a hard drive with NAND flash memory – silicon chips with no moving parts. There's no seek time and no rotational latency; the controller addresses data electronically. That alone makes random access hundreds of times faster than a hard drive, which is why even a basic SSD transforms the responsiveness of an older system. But the hardware is far from simple. Inside every SSD is a controller running specialized firmware, some form of volatile memory for metadata and caching, NAND flash organized into a hierarchy of dies, planes, blocks, and pages, and a flash translation layer that manages it all behind the scenes. The choices a manufacturer makes at each of these levels – controller architecture, flash type, caching strategy, interface – shape the drive's performance, endurance, and cost.
This guide walks through all of it. The goal is not just to help you pick the right drive, though it will do that. It's to give you enough understanding of the underlying technology that spec sheets and marketing claims stop being opaque. When you know what SLC caching actually does, or why QLC endurance numbers look the way they do, or what a controller's channel count means for sustained writes, you can evaluate drives on your own terms instead of relying on someone else's recommendation.
How to Use This Guide
The guide is split into two parts.
Part I: The Essentials (Chapters 1–8) covers what most readers need. We go through the hardware architecture of an SSD, how NAND flash stores and moves data, the interfaces and protocols that connect a drive to your system, practical buying guidance, drive health and maintenance, and how NAND is manufactured. If you're choosing a drive, building a system, or want to understand what the numbers on a spec sheet actually mean, Part I is self-contained.
Part II: Advanced Topics (Chapters 9–11) goes deeper into the engineering. This is where we get into voltage distributions and programming sequences, detailed error correction, enterprise storage topics, and emerging memory technologies. None of it is required for making purchasing decisions, but it's the kind of depth that's hard to find outside of whitepapers and ISSCC presentations. If you want to understand how TLC flash programs three bits into a single cell or what Flexible Data Placement changes at the protocol level, Part II is there for you.
You don't have to read linearly. Each chapter is reasonably self-contained, and cross-references point you to relevant sections when a concept builds on something covered elsewhere.
Buying an SSD and short on time? Chapters 1, 5, 6, and 7 cover what you need – how drives connect to your system, how to choose one, and how to maintain it. Come back for the rest when you're curious about what's happening under the hood.
SSD Hardware Architecture
A solid state drive is made up of a few basic components: a controller, some amount of volatile memory, NAND flash, and supporting circuitry on a printed circuit board (PCB). The choice and combination of these components determines the drive's performance, endurance, and intended role. A budget DRAM-less NVMe drive and a high-end PCIe 5.0 flagship can look identical from the outside – the differences are entirely in what's on the board.

Figure 1: Internal architecture of a solid state drive.
The Controller
The controller is the brain of the SSD – an application-specific integrated circuit (ASIC) whose embedded processor cores manage the flash and handle communication with the host. The "controller" label covers not just the CPU cores but also the ECC engine, the flash interface logic, the host interface, and other specialized hardware blocks that all work together.
Most SSD controllers are based on ARM architectures, commonly Cortex-R5 or Cortex-R8 variants, which are optimized for real-time, low-latency workloads. Some manufacturers use alternatives – Silicon Motion has used ARC cores in their SATA controllers, and RISC-V is emerging as an option – but ARM dominates. Current consumer controllers have anywhere from one to five cores, with clocks in the 500 MHz to 1.5+ GHz range. Newer PCIe 5.0 controllers like the Phison E28 and Silicon Motion SM2508 are built on 6nm process nodes and push higher core counts and clock speeds than their predecessors.
More cores generally means higher IOPS and better overall performance, but the core configuration matters too. Some designs specialize cores for different tasks – Samsung's controllers, for instance, use distinct core types for management, reads, writes, and host interaction. Phison's CoXProcessor technology adds co-processors alongside the main cores. These design choices shape a drive's performance profile in ways that raw core counts don't capture.
Beyond the CPU cores, the controller contains buffers, registers, and error correction and defect management logic operating at multiple levels – SRAM, DRAM, and NAND – to provide data path protection from the moment data enters the drive to when it's committed to flash. The controller also communicates with the flash packages over a bus, one per channel, with bandwidth determined by the I/O speed of the flash (measured in megatransfers per second, or MT/s) and the maximum the controller supports.
Volatile Memory: SRAM and DRAM
Volatile memory loses its contents on power loss. In an SSD, volatile memory serves several purposes: caching the controller's firmware, managing commands and instructions, temporarily storing data, and – most importantly – holding the metadata that the flash translation layer (FTL) needs to operate efficiently.
SRAM
Every SSD controller has some amount of SRAM (static random-access memory) embedded in its design, typically on the order of single-digit megabytes. SRAM is faster than DRAM but far more expensive per bit, so controllers only include a small amount. Think of it as the equivalent of a CPU's L1/L2 cache. It handles the most latency-sensitive tasks: boot code, write buffering, intermediate operations, and a portion of the metadata.
If a controller has no external DRAM chip, it is considered "DRAM-less." Some DRAM-less controllers do include a small amount of embedded DRAM on-die – enough for a limited write cache – but they still lack the dedicated external DRAM package that higher-end drives use.
Why DRAM Matters
DRAM (dynamic random-access memory) is orders of magnitude faster to access than NAND flash. On an SSD, its primary role is storing the mapping table – the data structure that translates between logical block addresses (LBAs, what the operating system sees) and physical block addresses (PBAs, where data actually sits on the flash). The FTL needs to look up and update this table constantly, and doing so from NAND would be far too slow for good performance.
The general rule is one byte of mapping data per kilobyte of stored data, because the FTL needs to track a 4-byte address for every 4 KiB block. This means a 1 TB drive typically needs about 1 GB of DRAM to hold its full mapping table. Having less than this can impact performance under certain workloads, particularly random writes, when the controller has to swap portions of the map in and out of NAND.
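This rule is easy to verify with a few lines of arithmetic (a sketch; the function name is illustrative):

```python
def mapping_table_bytes(capacity_bytes, entry_bytes=4, logical_block=4096):
    """DRAM needed for a flat LBA-to-PBA map: one 4-byte entry
    per 4 KiB logical block."""
    return capacity_bytes // logical_block * entry_bytes

print(mapping_table_bytes(10**12))   # 1 TB drive -> 976,562,500 bytes (~1 GB)
```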
DRAM also helps reduce write amplification by allowing the controller to defer and coalesce metadata updates. Instead of committing every address change to NAND immediately, the controller can batch those updates – resulting in fewer writes to flash and better endurance.
Current drives use DDR4 or LPDDR4/LPDDR4X, with DDR3 largely phased out of new designs. High-end PCIe 5.0 drives commonly use LPDDR4X for its balance of latency and power efficiency. Access latency is the most important factor for this use case, ahead of raw bandwidth. Unlike a hard drive, the SSD's DRAM is typically not used as a write cache for user data – its job is metadata.
DRAM-less Designs and Host Memory Buffer
Not every SSD has DRAM. Budget NVMe drives often skip the external DRAM chip to reduce cost, relying on the controller's SRAM and a feature called Host Memory Buffer (HMB). HMB allows the SSD controller to borrow a small portion of the system's main RAM to cache its mapping table. This requires OS and driver support, which modern versions of Windows provide.
Earlier DRAM-less controllers typically used 30–40 MB of HMB, with Windows capping the default allocation around 100 MB. Modern controllers supporting HMB 3.0 can request larger allocations, with 64 MB or more becoming common. Accessing HMB is significantly slower than the controller's own SRAM – it has to traverse the NVMe link to reach system memory – but it's still far faster than going to NAND for every mapping lookup.
For consumer workloads, a good DRAM-less drive with HMB performs well enough for most users. The penalty shows up primarily under sustained random writes or heavy multitasking, where the mapping table is constantly being updated. For a secondary drive or a system that mostly handles sequential workloads (gaming, media storage), the cost savings are usually worth it.
Non-Volatile Memory: NAND Flash
Non-volatile memory retains data even when powered off. In SSDs, this is NAND flash – the storage medium where your data actually lives. The "NVM" in NVMe (Non-Volatile Memory Express) refers to this. NAND is covered in depth in Chapters 3 and 4, but a few points are relevant to the hardware overview.
NAND is much faster to access than the magnetic storage on a hard drive, especially for random workloads. However, it's also much slower than DRAM, which is why the volatile memory hierarchy matters so much.
The NAND also stores a copy of all FTL metadata, including the mapping table. This is the master copy – the version in DRAM is a working cache. FTL metadata is typically stored in SLC (single-level cell) mode for faster access and better reliability, since losing or corrupting this data could make the drive unreadable.
One thing worth noting: data retention on NAND is not infinite. If a drive sits powered off for an extended period, charge leakage in the flash cells can eventually cause bit errors. The drive checks and refreshes data on power-on, but this is one reason you shouldn't use an unpowered SSD as a long-term archive.
Other Components
Beyond the main chips, an SSD's PCB includes several supporting components.
A power management integrated circuit (PMIC) regulates voltage and power delivery across the board. Some drives use discrete power management solutions, which can improve efficiency or thermal behavior. Enterprise drives often add power loss protection (PLP) in the form of capacitors or, less commonly, a small battery. PLP ensures that data in the volatile caches can be flushed to NAND during a sudden power loss – critical in server environments. For consumers, a UPS and surge protector provide a similar safety net at the system level.
The rest of the PCB contains the signal traces connecting controller to flash packages, plus the typical resistors, capacitors, and other passive components you'd find on any circuit board. NAND packaging and signal integrity are significant engineering concerns, particularly as flash I/O speeds increase – proper signal routing on the PCB matters more at 3,600 MT/s than it did at 800 MT/s.
The Flash Translation Layer
The flash translation layer (FTL) is not a physical component but the firmware running on the controller that makes the entire drive work. It handles address translation between logical and physical locations, schedules reads and writes, manages error correction, performs wear leveling and garbage collection, tracks bad blocks, and much more. The FTL is what makes raw NAND flash – which has significant constraints on how it can be written and erased – look like a simple block device to your operating system.
At its core, the FTL is a hardware abstraction layer. The host sends read and write commands to logical addresses; the FTL figures out where those addresses map to on the physical flash, handles all the complexity of NAND management behind the scenes, and returns the data. The quality of a drive's FTL implementation – how efficiently it manages mapping, how aggressively it garbage collects, how smartly it handles SLC caching – is one of the biggest differentiators between controllers, even when the underlying flash is the same.
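To make the abstraction concrete, here is a toy model of that translation – not any vendor's implementation, just the core idea of out-of-place writes plus a mapping table:

```python
# A toy flash translation layer: the host sees logical block addresses;
# the FTL redirects each write to a fresh physical page and remembers
# where everything lives. All names here are illustrative.
class ToyFTL:
    def __init__(self):
        self.map = {}          # LBA -> (physical page, data)
        self.next_page = 0     # append-only write pointer

    def write(self, lba, data):
        self.map[lba] = (self.next_page, data)
        self.next_page += 1    # never overwrite a page in place

    def read(self, lba):
        return self.map[lba][1]

ftl = ToyFTL()
ftl.write(5, "v1")
ftl.write(5, "v2")                   # the update lands on a new page
print(ftl.read(5), ftl.next_page)    # -> v2 2
```

The stale copy of "v1" left behind is exactly what garbage collection (Chapter 4) exists to reclaim.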
There's growing interest in shifting some FTL responsibilities to the host side. Host-Based FTL (HB-FTL) – related to concepts like Open-Channel SSDs and, more recently, NVMe features like Flexible Data Placement (FDP) – allows the host to make smarter decisions about data placement rather than leaving everything to the drive's firmware. This reduces the overhead of garbage collection and can significantly improve performance and endurance, particularly in datacenter environments. For consumer drives, the FTL remains fully managed by the controller. The deeper mechanics of FTL operation – mapping granularity, caching algorithms, journaling strategies – are covered in Chapter 9.
Understanding NAND Flash
NAND flash is the storage medium inside every SSD. Understanding how it works at a basic level – how data is stored, how it's read back, and what limits its lifespan – goes a long way toward making sense of SSD specifications and trade-offs. This chapter covers the fundamentals. The deeper mechanics of voltage programming, error correction algorithms, and disturb effects are covered in Chapter 9.
Cell Types: SLC, MLC, TLC, QLC, PLC
NAND flash cells are categorized by how many bits they store. SLC (single-level cell) stores one bit, MLC (multi-level cell) stores two, TLC (triple-level cell) stores three, QLC (quad-level cell) stores four, and PLC (penta-level cell) stores five. The term "MLC" technically means two or more bits per cell, but the industry convention is to treat it as meaning exactly two.
Fitting more bits into the same cell increases capacity but comes with trade-offs in both performance and endurance. Moving from MLC to TLC increases capacity by 50% (two bits to three), while moving from TLC to QLC only gains 33% (three to four). Each step also reduces endurance and increases programming latency, because distinguishing between more voltage levels requires greater precision and more error correction overhead.
The number of possible voltage states doubles with each additional bit: SLC has 2 states, MLC has 4, TLC has 8, QLC has 16, and PLC has 32. Programming a cell to one of 16 states takes more time and care than programming to one of 2. Reading also gets slower, because more reference voltages are needed to determine which state the cell is in. This is the fundamental relationship between bit density, performance, and endurance that shapes every SSD design decision.
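The scaling described above can be tabulated directly (a sketch; the function name is illustrative):

```python
def cell_stats(bits):
    """Voltage states needed, and capacity gained over the
    next-lower cell type."""
    states = 2 ** bits
    gain_pct = round((bits / (bits - 1) - 1) * 100) if bits > 1 else 0
    return states, gain_pct

for bits, name in [(1, "SLC"), (2, "MLC"), (3, "TLC"), (4, "QLC"), (5, "PLC")]:
    states, gain = cell_stats(bits)
    print(f"{name}: {states} states, +{gain}% capacity vs {bits - 1} bits/cell")
```

The output shows the diminishing returns: MLC to TLC gains 50%, TLC to QLC only 33%, PLC a mere 25% – while the state count keeps doubling.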

Figure 3: As bits per cell increase, voltage distributions become tighter and more numerous, requiring greater precision to read and program.
It's worth noting that a cell operating in SLC mode (one bit per cell, sometimes called pSLC) is not the same thing as a natively manufactured SLC cell. The underlying hardware is different. A TLC cell being used in SLC mode is faster and more durable than when used in TLC mode, but it's not equivalent to native SLC. This distinction matters for SLC caching, which is covered in Chapter 4.
The market has largely settled on TLC as the mainstream technology, with QLC growing in budget and high-capacity segments. MLC has been phased out of consumer drives. PLC remains in development, with limited production as of this writing.
How Data Is Stored
Each NAND cell stores data as a voltage level. A cell starts erased, with its default bit value of 1 (or 11 for MLC, 111 for TLC, and so on). Programming the cell means increasing its voltage to represent a different value. With SLC, this is straightforward – the cell is either in an erased state (1) or a programmed state (0). With TLC, the same cell needs to hold one of eight distinct voltage levels, each corresponding to a different three-bit value.
Voltage can only be increased during programming, not decreased. This is a fundamental constraint of NAND flash – to change data, you can't just overwrite the cell. The block containing that cell must first be erased back to its default state, then reprogrammed. This erase-before-write requirement drives much of the complexity in SSD firmware, from garbage collection to wear leveling (see Chapter 4).
Programming happens through a series of increasingly precise voltage pulses. The controller starts with larger steps to get close to the target level, then uses progressively smaller steps to fine-tune the final voltage. With multi-bit cells, the process involves multiple passes across the different bit pages (least significant bit, center significant bit, most significant bit for TLC), each building on the previous one. The exact programming sequences are a significant area of manufacturer optimization and are covered in detail in Chapter 9.
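The coarse-then-fine pulse idea can be sketched as a simple loop. This is an idealized model in millivolts – real programming sequences and step sizes are far more sophisticated and vendor-specific:

```python
def program_cell(target_mv, start_step_mv=400, min_step_mv=50):
    """Coarse-then-fine programming sketch: big pulses approach the
    target voltage, smaller ones finish. Voltage only ever increases."""
    v, step, pulses = 0, start_step_mv, 0
    while v + step <= target_mv:
        v += step                       # one program pulse
        pulses += 1
        if target_mv - v < 2 * step:    # near target: halve the step
            step = max(step // 2, min_step_mv)
    return v, pulses

print(program_cell(2000))   # -> (2000, 8): lands on target in 8 pulses
```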
Over time, the stored charge can drift due to leakage, shifting a cell's voltage from its intended level. This is one of the mechanisms behind data retention loss and is why cells with more voltage states – where the margins between levels are tighter – are more susceptible to errors as they age.
How Data Is Read
Reading a cell means determining its voltage level by applying reference voltages and checking where the cell's charge falls relative to those thresholds. An SLC cell needs just one reference voltage to distinguish between its two states. A TLC cell needs seven reference voltages to distinguish between eight states. More states means more sensing steps and higher read latency.
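The threshold-comparison logic – not the analog sensing circuitry itself – can be modeled in a few lines; the reference voltages here are made up:

```python
import bisect

def read_state(cell_voltage, ref_voltages):
    """Which of len(ref_voltages)+1 states the cell is in: count how
    many reference thresholds the stored charge exceeds (a logical
    model, not the analog sensing circuit)."""
    return bisect.bisect_left(ref_voltages, cell_voltage)

tlc_refs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]   # 7 thresholds -> 8 states
print(read_state(1.7, tlc_refs))   # -> state 3 (between the 3rd and 4th refs)
```

An SLC read would pass a single-element threshold list; the longer the list, the more sensing steps and the higher the latency.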
For most consumer workloads, you're reading from the base flash – TLC or QLC – while SLC mode is used primarily as a write cache. This means the read performance you experience day-to-day is determined by the native cell type, not the SLC cache speed. Read performance from the base flash is inherently slower than from SLC-mode cells, both because of the additional sensing steps and because of heavier error correction demands on higher-density cells.
Controllers can optimize read performance through calibration techniques, including adjusting reference voltage thresholds over the drive's lifetime as cells wear and voltage distributions shift. Read retry tables and, in some cases, machine learning-based calibration help maintain read performance as the flash ages.
Error Correction: Keeping Your Data Safe
Error correction is essential because NAND flash is inherently imperfect. Cells wear out over program/erase cycles, voltage levels drift over time, neighboring cells can interfere with each other, and even reading data can slightly disturb adjacent cells. Without robust error correction, SSDs would be unusable within a fraction of their rated lifespan.
Modern SSDs use low-density parity-check (LDPC) error correction codes, which replaced the older BCH (Bose-Chaudhuri-Hocquenghem) codes used in earlier drives. The practical advantage of LDPC is its ability to perform both hard-decision and soft-decision decoding. Hard-decision decoding is fast but limited – it reads each cell as being in one state or another with no ambiguity. Soft-decision decoding is more powerful: it recognizes that a cell's voltage might be on the boundary between two states and uses probabilistic methods to determine the most likely value.
In practice, the controller tries hard-decision decoding first because it's faster. Only when that fails does it fall back to soft-decision decoding, using progressively more sensing levels to resolve ambiguous cells. This approach keeps performance high for healthy data while still being able to recover marginal data when needed. The trade-off is latency – soft-decision decoding takes more time, which is why heavily worn SSDs can show increased read latencies.
If ECC fails entirely for a particular page, the drive can still attempt to recover data through parity information distributed across the NAND (similar in concept to a RAID stripe). This is a last resort, but it provides an additional safety net. The detailed mechanics of LDPC decoding and multi-step sensing are covered in Chapter 9.
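The escalating recovery ladder looks roughly like this in sketch form; the decoder callbacks are hypothetical stand-ins for controller hardware:

```python
def decode_page(raw, hard_decode, soft_decode, max_levels=3):
    """Read-recovery ladder, sketched: fast hard-decision decode first,
    then progressively deeper soft-decision passes with more sensing
    levels. Callbacks return (success, data)."""
    ok, data = hard_decode(raw)
    if ok:
        return data                       # the fast path for healthy data
    for levels in range(1, max_levels + 1):
        ok, data = soft_decode(raw, sensing_levels=levels)
        if ok:
            return data                   # slower, but recovers marginal cells
    raise IOError("uncorrectable: fall back to die-level parity recovery")
```

Each rung down the ladder adds latency, which is why heavily worn drives read more slowly before they fail outright.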
The Program/Erase Cycle
NAND flash must be erased before it can be rewritten. This erase-then-program sequence is called a program/erase cycle, or P/E cycle, and counting these cycles is how flash endurance is measured. Each cycle causes a small amount of physical damage to the cell – the high voltages involved in erasing gradually degrade the cell's ability to hold a precise charge. After enough cycles, the cell can no longer reliably distinguish between voltage levels, and it's effectively worn out.
The damage from programming and erasing far exceeds anything caused by reading, though reading does introduce its own minor effects (read disturb) that accumulate over time. This is why endurance specifications – typically expressed as terabytes written (TBW) or drive writes per day (DWPD) – are based on write volume, not read volume.
A key constraint is that erases happen at the block level, not the page level. A block contains many pages (the exact number varies by flash generation, but modern blocks are large – hundreds of pages). If the controller needs to update a single page within a block, it can't just erase and rewrite that page. It has to read out all the valid pages in the block, erase the entire block, and write everything back. This is the root cause of write amplification – the drive ends up writing more data to flash than the host actually sent. Managing write amplification through smart garbage collection and wear leveling is one of the FTL's most important jobs and is covered in Chapter 4.
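As a worst-case illustration, consider what updating a single page would cost if the drive really did rewrite blocks in place – the scenario the FTL's redirect-and-collect strategy exists to avoid. The 1,536-page figure assumes a 24 MiB block of 16 KiB pages:

```python
def naive_update_waf(valid_pages_in_block, pages_updated=1):
    """Write amplification if an update forced a read-erase-rewrite of
    the whole block: every still-valid page gets rewritten alongside
    the new data (worst case; real FTLs redirect writes instead)."""
    return valid_pages_in_block / pages_updated

# Updating one page in a full block of 1,536 valid pages:
print(naive_update_waf(1536))   # -> 1536.0 pages written per page changed
```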
Programming within a block is normally done sequentially – page after page in order – because this produces consistent performance and minimizes disturb effects on adjacent cells. When data flows through the SLC cache first and is then folded to the base TLC or QLC, this sequential write pattern is maintained. Direct-to-NAND writes (bypassing the SLC cache) tend to be more random, which can increase write amplification. The relationship between SLC caching, folding, and sustained write performance is covered in the next chapter.
Endurance varies significantly by cell type. SLC flash can endure on the order of 100,000 P/E cycles, MLC around 3,000–10,000, TLC around 1,000–3,000, and QLC around 500–1,000. These are rough figures – actual endurance depends on the specific flash generation, the ECC implementation, and the controller's wear management. In practice, consumer SSDs typically outlast their warranty period by a wide margin under normal workloads.
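These cycle counts translate into TBW figures with simple arithmetic. A sketch, assuming a round write amplification factor of 2.0 – real ratings fold in vendor margin:

```python
def rated_tbw(capacity_gb, pe_cycles, waf=2.0):
    """Back-of-envelope endurance in terabytes written: total raw
    program/erase capacity divided by write amplification. The WAF
    of 2.0 is an assumed round number, not a measured value."""
    return capacity_gb * pe_cycles / waf / 1000

print(rated_tbw(1000, 1500))   # hypothetical 1 TB TLC drive -> 750.0 TBW
```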
How SSDs Organize and Move Data
NAND flash is organized into a hierarchy designed for parallel access. Understanding this hierarchy – and the mechanisms the controller uses to manage data within it – explains most of the performance and endurance characteristics you'll see in practice.
NAND Topology: The Storage Hierarchy
NAND is organized from large to small: packages, dies, planes, blocks, and pages. Each level enables a form of parallelism that the controller exploits for performance.

Figure 2: NAND flash storage hierarchy – each level nests within the previous, from package down to individual pages.
Package
The NAND package is the physical chip you can see on the PCB. Each package contains one or more dies stacked vertically, with a practical limit of sixteen dies per package (16DP) due to height, signal integrity, and yield constraints. The dies are stacked in an offset pattern to accommodate wiring, and dummy dies may be added to prevent warpage. M.2 2280 drives typically have two to eight packages, on one or both sides of the board.
With current flash densities at 512 Gb to 1 Tb per die, a single package can hold one to two terabytes. The number of chip enable (CE) pins on each package determines how the controller can independently address the dies inside.
Die
The die is the logical unit of flash that the controller directly interfaces with. Current TLC dies are commonly 512 Gb to 1 Tb, and QLC dies can exceed 1 Tb, translating to 64–128 GiB or more per die. Peak consumer drive performance is usually reached with about four dies per channel – the controller can interleave operations across dies so that while one is busy programming, another is ready for the next command.
A die with enough accumulated bad blocks can effectively brick a consumer SSD, which is one reason spare blocks and wear leveling matter.
Plane
Each die has multiple planes for internal parallelism. Two-plane dies were standard until recently; four-plane (quad-plane) designs are now typical, and six-plane (hexa-plane) dies exist. Multi-plane operations let the controller issue the same command to several planes within a die at once, and grouping the same plane offset across multiple dies creates a "superplane" effect; planes can also operate independently depending on the workload. More planes per die means more potential bandwidth, up to the limits of the channel and controller.
Certain architectures also use sub-planes – divisions within a plane – to improve performance for smaller I/O operations.
Block
Each plane contains thousands of blocks. The block is the smallest unit of NAND that can be erased, which makes it a critical granularity for garbage collection and wear management. Modern blocks are large – on the order of 24 MiB for current flash – and block size has been growing with layer count and density. Larger blocks mean more data is affected by each erase, which has implications for write amplification and wear.
QLC blocks are larger than TLC blocks at equivalent die density because each cell stores more bits.
Page
The page is the smallest unit the SSD can write. Modern TLC flash typically has 16 KiB physical pages. Multi-bit cells (TLC, QLC) have multiple pages per wordline – a TLC wordline has three pages (one per bit: LSB, CSB, MSB) with different programming characteristics and sensitivities. The lower pages (LSB) are faster and more reliable to program; upper pages (MSB) take longer and are more vulnerable to errors.
The 16 KiB physical page is larger than the typical 4 KiB filesystem block, so each physical page contains four logical sub-pages. The FTL manages this mismatch, coalescing smaller writes where possible and performing read-modify-write operations when needed.
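The coalescing benefit is straightforward to quantify in an idealized model (function name illustrative; assumes perfect coalescing of consecutive writes):

```python
PHYS_PAGE_KIB = 16
LOGICAL_KIB = 4
SUBPAGES = PHYS_PAGE_KIB // LOGICAL_KIB   # 4 logical sub-pages per page

def pages_programmed(logical_writes):
    """Physical 16 KiB program operations needed if the FTL coalesces
    consecutive 4 KiB logical writes (idealized model)."""
    return -(-logical_writes // SUBPAGES)  # ceiling division

print(pages_programmed(9))   # nine 4 KiB writes -> 3 physical programs
```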
Superblock and Superpage
The controller groups blocks with the same offset across all dies and planes into a superblock, and pages with the same offset across a superblock form a superpage. The controller writes one superpage at a time for maximum parallelism – filling an entire superblock sequentially before moving to the next. This grouping also enables parity protection across the constituent blocks, similar in concept to RAID striping, which provides a recovery mechanism if individual blocks go bad.
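A quick sketch of how much data one full-stripe superpage write moves; the geometry values are hypothetical, not any specific drive:

```python
# Hypothetical geometry for a high-end consumer drive; real values vary.
CHANNELS, DIES_PER_CHANNEL, PLANES_PER_DIE, PAGE_KIB = 8, 4, 4, 16

def superpage_kib():
    """One superpage stripes the same page offset across every channel,
    die, and plane, so a single program keeps all units busy at once."""
    return CHANNELS * DIES_PER_CHANNEL * PLANES_PER_DIE * PAGE_KIB

print(superpage_kib())   # -> 2048 KiB (2 MiB) per full-stripe write
```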
Channels and Interleaving
The controller communicates with the flash packages over channels, each with its own dedicated bandwidth. Consumer controllers typically have four to eight channels. Within each channel, the controller can switch between banks of dies so that multiple are active simultaneously – this is interleaving.
The performance benefit comes from overlapping operations. While one die is busy with a relatively slow program operation, the controller can issue commands to other dies on the same channel or different channels. More dies per channel means more interleaving opportunities, up to the controller's saturation point. Ideally you want at least two dies per channel, with four being the sweet spot for most consumer controllers. Beyond that, the returns diminish.
This is why lower-capacity drives are often slower than their higher-capacity siblings – fewer dies means less parallelism. A 500 GB drive with eight dies across four channels will generally outperform a 250 GB version with only four dies across those same channels.

Figure 5: The controller maximizes throughput by interleaving operations across channels and dies, overlapping slow program operations with reads and writes on other dies.
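A simple pipelining model shows why dies per channel matter. The transfer and program times below are illustrative round numbers, not measured values:

```python
def channel_bus_utilization(dies, transfer_us, program_us):
    """Fraction of time a channel's bus stays busy: while one die runs
    a slow internal program, the bus can feed other dies. Utilization
    saturates once dies >= (transfer + program) / transfer."""
    return min(1.0, dies * transfer_us / (transfer_us + program_us))

for dies in (1, 2, 4, 8):
    print(dies, channel_bus_utilization(dies, transfer_us=50, program_us=350))
```

With these numbers the bus sits 87.5% idle with one die, half idle with four, and saturates at eight – the diminishing-returns curve described above.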
TRIM and Garbage Collection
TRIM
TRIM is a command (strictly an ATA command – the SCSI equivalent is UNMAP, and NVMe drives use the Deallocate attribute of the Dataset Management command) that tells the SSD which blocks of data the operating system no longer needs. When you delete a file, the OS marks that space as free in the filesystem, but the SSD has no way to know unless it's told explicitly. TRIM provides that notification.
Once the SSD knows which data is stale, it can factor that into its garbage collection decisions – reclaiming those blocks without having to preserve their contents. Modern operating systems run TRIM automatically, typically once a week in Windows. Formatting a drive will effectively TRIM the entire device within minutes.
TRIM is less critical on modern drives than it once was because garbage collection has become much more aggressive, but it still contributes to maintaining consistent performance and endurance over time.
Garbage Collection
Because NAND must be erased at the block level before it can be rewritten, the SSD needs a way to reclaim partially-used blocks. This is garbage collection. The process works by reading the valid pages from one or more partially-filled blocks, writing them into a new block, and then erasing the old blocks to make them available again.
Garbage collection typically runs in the background during idle time, keeping a pool of erased blocks ready for new writes. When the drive is under heavy load or nearly full, GC becomes more constrained – there are fewer idle windows and fewer free blocks to work with, which can impact write performance. This is one practical reason to avoid filling an SSD completely.
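The simplest victim-selection policy – greedy, by valid-page count – can be sketched in a few lines; real controllers also weigh block wear and data age:

```python
def pick_victim(blocks):
    """Greedy garbage collection: clean the block with the fewest valid
    pages, since it costs the least relocation work (a simplification;
    real policies also consider wear and data temperature)."""
    return min(blocks, key=lambda b: b["valid_pages"])

blocks = [
    {"id": 0, "valid_pages": 200},
    {"id": 1, "valid_pages": 12},    # mostly stale data
    {"id": 2, "valid_pages": 180},
]
print(pick_victim(blocks)["id"])   # -> 1: only 12 pages to relocate
```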
The efficiency of garbage collection – how it selects blocks to clean, when it runs, how it coordinates with wear leveling – is a significant differentiator between controllers. The detailed algorithms are covered in Chapter 9.
Wear Leveling
Wear leveling ensures that program/erase cycles are distributed evenly across all blocks rather than concentrating on a few. Without it, blocks that host frequently-updated data would wear out while blocks holding static data (your OS files, applications that rarely change) would sit nearly untouched.
There are two basic approaches. Dynamic wear leveling writes new data to the least-worn free block, which naturally spreads wear across blocks that are actively being written. Static wear leveling goes further by periodically relocating static data from low-wear blocks to higher-wear blocks, freeing up the low-wear blocks for active use. Most modern drives use both.
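Dynamic wear leveling's core allocation decision is essentially a one-liner in sketch form (the block records are illustrative):

```python
def allocate_block(free_blocks):
    """Dynamic wear leveling in one line: always hand out the free
    block with the fewest program/erase cycles."""
    return min(free_blocks, key=lambda b: b["pe_cycles"])

free_pool = [
    {"id": 7, "pe_cycles": 910},
    {"id": 3, "pe_cycles": 120},   # least worn: chosen next
    {"id": 9, "pe_cycles": 455},
]
print(allocate_block(free_pool)["id"])   # -> 3
```

Static wear leveling adds the second half of the picture: migrating cold data off blocks like id 3 so they re-enter this pool instead of sitting idle.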
The combination of garbage collection and wear leveling – deciding which blocks to clean, where to put the data, and how to balance wear across the device – is one of the FTL's core responsibilities.
SLC Caching
Most consumer SSDs use an SLC cache as a fast write buffer. The drive takes its base flash – TLC or QLC – and operates a portion of it in single-bit (SLC) mode, known as pseudo-SLC or pSLC. Writing one bit per cell instead of three or four is much faster and more durable, but it uses three to four times the physical capacity for the same amount of user data.
The SLC cache absorbs incoming writes at high speed. The data is then migrated to the base flash later, typically during idle time. This is why you'll see high sequential write speeds on a fresh drive that drop once the cache is exhausted.

Figure 4: Sequential write performance drops in tiers as the SLC cache fills – first to direct-to-TLC speeds, then to the folding-bottlenecked state.
Static SLC
Static pSLC is a fixed portion of the drive permanently reserved for SLC-mode operation. It lives in the over-provisioned space outside the user-accessible area, so it's always available regardless of how full the drive is. Static SLC provides consistent performance and has much higher endurance than the base flash – often an order of magnitude more P/E cycles – because it operates in the simpler single-bit mode. The trade-off is that it reduces the total capacity available for user data.
Dynamic SLC
Dynamic pSLC uses empty user space as a temporary SLC cache. As the drive fills up, the dynamic SLC pool shrinks because that space is needed for actual data. This is why a nearly-full SSD often has significantly worse burst write performance than the same drive at 50% capacity – there's less room for SLC caching.
Dynamic SLC shares a wear zone with the base flash, and the conversion process (SLC to native mode) introduces additional write amplification. For TLC, folding SLC data to native mode has an amplification factor of roughly 3x, since three SLC blocks consolidate into one TLC block.
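The capacity arithmetic behind these trade-offs is straightforward. Using the 3-bits-per-cell ratio for TLC from the text (the sizes below are illustrative):

```python
# Back-of-the-envelope pSLC math for a TLC drive: one bit per cell in SLC
# mode means each cached GB ties up three GB of native capacity, and
# folding consolidates three SLC blocks into one TLC block.

BITS_PER_CELL_TLC = 3

def flash_consumed_by_cache(cache_gb):
    """Native TLC capacity (GB) consumed to cache cache_gb in SLC mode."""
    return cache_gb * BITS_PER_CELL_TLC

def tlc_blocks_after_fold(slc_blocks):
    """Folding consolidates three full SLC blocks into one TLC block."""
    return slc_blocks // BITS_PER_CELL_TLC

# A 40 GB dynamic SLC cache ties up 120 GB of raw TLC capacity,
print(flash_consumed_by_cache(40))   # 120
# and 30 full SLC blocks fold down into 10 TLC blocks.
print(tlc_blocks_after_fold(30))     # 10
```

For QLC the ratio is 4, which is why QLC drives advertise proportionally larger dynamic caches when empty and lose them faster as they fill.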
Hybrid
Many drives combine static and dynamic SLC. A fixed static pool provides a baseline of fast, always-available cache, while dynamic SLC expands the cache when space permits. Samsung's TurboWrite is a well-known implementation of this approach. The balance between static and dynamic portions varies by drive and is a deliberate design trade-off between consistency and peak burst performance.
What Happens When the Cache Fills
Once the SLC cache is full, the drive must handle incoming data differently. The specific behavior depends on the drive's design, and this is where performance differences between drives become most visible.
Direct-to-NAND
Modern drives can bypass the SLC cache and write directly to the base flash (TLC or QLC). This gives you the native flash performance – slower than SLC mode but still reasonable on a good TLC drive. Direct-to-NAND writes tend to have higher write amplification than SLC-then-fold because the write pattern is more random, and there's a higher risk of data loss from power failure since upper pages have longer program times.
Folding
Folding is the process of migrating data from SLC blocks to the base flash. Multiple SLC blocks are consolidated into a single TLC or QLC block, written sequentially. This happens on-die without direct controller intervention – a form of direct memory access. Sequential writes during folding keep write amplification low, but the process takes time and can create a performance bottleneck while it's running.
The "folding state" is the sustained write scenario where the drive's incoming write speed is limited by how fast it can empty the SLC cache. This is typically the lowest-performance tier on drives with large SLC caches.
Performance Tiers
Many drives have more than two performance levels. A common pattern is: (1) SLC cache speed, (2) direct-to-TLC speed after the cache fills, and (3) a reduced speed when the drive is simultaneously writing new data and folding SLC data in the background. Drives with large dynamic SLC caches, QLC-based drives, and DRAM-less drives are most likely to show pronounced tier drops.
The severity of these transitions depends on the drive's design, the fill state, and the workload. In everyday consumer use – which is bursty rather than sustained – most drives never leave their SLC cache tier. Sustained write benchmarks show the worst case, which is useful for understanding the drive's limits but not representative of typical usage.
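As a toy model of the three-tier pattern, consider a drive whose speed depends only on how much has been written in one sustained burst. The cache size and speeds below are invented for illustration, not measurements of any real drive:

```python
# Toy model of tiered sustained-write behavior: burst writes land in the
# SLC cache, then drop to direct-to-TLC, then to the folding-limited
# floor. All thresholds and speeds are hypothetical.

def write_speed_mbs(gb_written, cache_gb=50):
    """Approximate write speed (MB/s) after gb_written GB of sustained writes."""
    if gb_written < cache_gb:
        return 5000     # tier 1: writes absorbed by the SLC cache
    if gb_written < cache_gb * 3:
        return 1500     # tier 2: direct-to-TLC after the cache fills
    return 600          # tier 3: new writes competing with background folding

for written in (10, 100, 400):
    print(written, "GB ->", write_speed_mbs(written), "MB/s")
```

A real drive's transitions also depend on fill level, idle time, and thermal state, but this step shape is what a sustained-write benchmark trace of such a drive typically resembles.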
Over-Provisioning and Free Space
Over-provisioning (OP) is flash capacity reserved by the drive for internal use – it's not visible to the user. This reserved space provides a pool of free blocks for garbage collection, wear leveling, and spare block replacement.
How It Works
Raw flash capacity is binary (powers of two), but drives are marketed in decimal. A drive with 512 GiB of raw flash sold as a "500 GB" drive has about 10% total OP – the difference between the binary capacity and the marketed size, plus any additional reservation. A "480 GB" version of the same flash would have roughly 15% OP. Drives with more OP have more room for the controller to work with, which can improve sustained write performance and endurance, with diminishing returns.
Beyond the marketed OP, the drive also reserves space for spare blocks, ECC data, static SLC cache, and other metadata. The exact allocation varies by manufacturer and isn't always disclosed.
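The binary-versus-decimal arithmetic above is easy to reproduce:

```python
# Over-provisioning from the GiB/GB gap: raw flash capacity is binary
# (GiB), marketed capacity is decimal (GB).

def op_percent(raw_gib, marketed_gb):
    """Minimum OP implied by raw binary capacity vs. marketed decimal size."""
    raw_gb = raw_gib * 2**30 / 1e9       # convert GiB to decimal GB
    return (raw_gb - marketed_gb) / marketed_gb * 100

print(round(op_percent(512, 500), 1))   # 10.0 -> "500 GB" drive, ~10% OP
print(round(op_percent(512, 480), 1))   # 14.5 -> "480 GB" drive, ~15% OP
```

The same flash sold at a lower marketed capacity simply shifts more of it into the reserved pool.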
Dynamic Over-Provisioning
Modern controllers treat any free user space as additional dynamic OP, thanks to TRIM. When the OS tells the drive that blocks are no longer in use, the controller can reclaim that space for internal operations. This means that simply leaving some space free on the drive provides a real benefit – more room for garbage collection, less write amplification, and better sustained performance.
This is especially relevant for DRAM-less and QLC-based drives, which benefit more from extra free space. As a general practice, avoiding filling any SSD past about 80–90% helps maintain consistent performance.
Interfaces and Protocols
An SSD needs two things to communicate with your system: a protocol (the language) and an interface (the physical connection). These determine the maximum bandwidth, latency, feature set, and compatibility of the drive.
SATA and AHCI
SATA (Serial ATA) is the older interface standard, paired with the AHCI (Advanced Host Controller Interface) protocol. SATA SSDs typically come in 2.5" form factors and connect through the same ports that hard drives use. AHCI was designed in the era of spinning disks and exposes features like Native Command Queuing (NCQ), but it was never optimized for the characteristics of solid state storage.
SATA III's 6 Gbps link yields about 600 MB/s after encoding overhead, with real-world throughput around 550 MB/s – more than enough for hard drives, but a hard ceiling for SSDs. A decent SATA SSD will saturate this interface easily. SATA drives are still common in older systems and as budget options, and they remain useful where NVMe isn't supported or where the workload doesn't demand higher throughput. For new builds, though, NVMe has effectively replaced SATA for primary storage.
PCIe and NVMe
NVMe (Non-Volatile Memory Express) is the protocol designed specifically for solid state storage, running over PCI Express (PCIe). The advantages over AHCI are significant: lower access latency, deeper command queues, more efficient interrupt handling, and direct communication with the CPU over the PCIe bus rather than going through a SATA controller.
PCIe bandwidth scales with the generation and lane count. A PCIe 3.0 x4 drive can reach roughly 3.5 GB/s, PCIe 4.0 x4 doubles that to about 7 GB/s, and PCIe 5.0 x4 doubles again to approximately 14 GB/s. Today, PCIe 4.0 drives are the mainstream sweet spot for most consumers, while PCIe 5.0 drives target the high end.
Modern Windows systems use the Microsoft inbox NVMe driver (StorNVMe), and manufacturers have largely moved away from proprietary drivers – Samsung dropped its NVMe driver starting with the 990-series, for example. DirectStorage 1.0, which allows games to load assets directly from NVMe storage to the GPU with minimal CPU overhead, also works with the inbox driver.

Figure 12: NVMe provides dramatically lower latency and deeper parallelism than AHCI, with a direct PCIe path to the CPU.
Other protocols exist in the storage space: SAS (Serial Attached SCSI) is common in enterprise servers, and UASP (USB Attached SCSI Protocol) is used for external drives over USB. These are relevant in specific contexts but PCIe/NVMe is the consumer standard.
NVMe 2.0+ and What's New
NVMe 2.0, released in 2021, restructured the specification from a monolithic document into a library of specifications – a base spec, separate command sets, and a management interface. The current major base revision is 2.1, with ongoing revision change documents published by NVM Express.
The most significant additions for SSD performance and endurance are:
Zoned Namespaces (ZNS) organize the drive's address space into zones that must be written sequentially. This aligns the host's write pattern with how NAND actually wants to be programmed, reducing the garbage collection burden and write amplification. ZNS requires application-level awareness, which has limited its adoption outside of datacenter environments.
Flexible Data Placement (FDP) is a ratified standard (TP4146) that takes a more pragmatic approach than ZNS. FDP lets the host provide hints about data placement – grouping data with similar lifetimes together – without requiring the strict sequential write model of ZNS. This reduces write amplification while being easier for applications to adopt. FDP is likely to see broader adoption than ZNS, including potentially in consumer drives.
Other additions include NVMe Management Interface (NVMe-MI) for device management and Controller Memory Buffer (CMB) improvements, though these are primarily enterprise-relevant.
Form Factors
2.5" is the standard form factor for SATA SSDs and also used by some U.2 NVMe drives. It connects via a SATA data and power connector, making it a drop-in replacement for laptop or desktop hard drives.
M.2 is the dominant form factor for NVMe consumer SSDs. M.2 drives come in various dimensions defined by width and length – 2280 (22 mm wide, 80 mm long) is by far the most common for desktops and laptops. Drives can be single-sided or double-sided, which matters for laptops with tight clearance – some accept only single-sided drives. M.2 also supports SATA through its keying system (B-key slots typically carry SATA or PCIe ×2, M-key slots PCIe ×4), so an M.2 slot doesn't necessarily mean NVMe – check the keying and the motherboard specifications.
Figure 6: M.2 keying notches in B and M positions. B-key (pins 12–19 removed) indicates SATA; M-key (pins 59–66 removed) indicates PCIe/NVMe. B+M keyed drives fit both but are limited to SATA or ×2 PCIe. (Diagram by NikNaks, CC BY-SA 3.0, via Wikimedia Commons.)
U.2 and U.3 (SFF-8639 connector) support up to four PCIe lanes and are compatible with both NVMe and SAS. These are primarily used in enterprise and workstation contexts. U.3 is a unified connector that supports SAS, SATA, and NVMe on the same backplane.
EDSFF (Enterprise and Data Center Standard Form Factor) has become the standard in datacenter and hyperscale deployments, with E1.S and E3.S widely adopted. These form factors are optimized for thermal management, hot-swap capability, and high-density rack installations. They're not relevant for consumer builds but are worth knowing about if you work with server hardware.
Adapters exist to convert between many of these form factors – M.2 to PCIe slot, M.2 to U.2, and so on – as long as the protocol is compatible.
External Enclosures and Bridge Chips
External SSDs and enclosures typically use a bridge chip to translate between the drive's native protocol (NVMe or SATA) and the external interface (USB or Thunderbolt). This translation introduces some overhead, particularly for latency-sensitive operations like 4K random writes. Bandwidth is also capped by the external interface – USB 3.2 Gen 2 at 10 Gbps, USB 3.2 Gen 2x2 at 20 Gbps, Thunderbolt 3/4 at 40 Gbps.
Bridge chips have their own firmware, and some features may not pass through the translation. HMB, for example, typically doesn't work through a bridge chip since the drive can't access the host's system memory over USB. TRIM support (via the SCSI UNMAP command over UASP) depends on the bridge chip and OS support.
Some newer controllers combine flash controller and bridge functionality in a single chip – Silicon Motion's SM2320 is one example – which can reduce overhead and board complexity for purpose-built portable SSDs.
USB4 and Thunderbolt 4, both based on the same underlying protocol, are increasingly common for external storage. USB4 supports up to 40 Gbps (matching Thunderbolt 3/4) with USB4 Version 2.0 extending to 80 Gbps. For external SSDs, USB4/Thunderbolt 4 provides enough bandwidth to approach internal NVMe performance for sequential workloads, though latency overhead still affects small random I/O. If you're considering an external NVMe enclosure, USB4 or Thunderbolt is the connection to target.
Encoding and Bandwidth Overhead
Interface bandwidth isn't entirely available for data. Encoding schemes add overhead that reduces the usable throughput.
SATA and PCIe 2.0 use 8b/10b encoding, which means 20% of the bandwidth goes to encoding overhead – every 8 bits of data requires 10 bits on the wire. PCIe 3.0 and later use 128b/130b encoding, reducing the overhead to roughly 1.5%. This is one reason the jump from PCIe 2.0 to 3.0 more than doubled effective bandwidth despite less than doubling the raw signaling rate.
PCIe 6.0, finalized in 2022, switches from NRZ (non-return-to-zero) to PAM4 (pulse-amplitude modulation) signaling to double the data rate without doubling the frequency. Enterprise PCIe 6.0 SSDs are shipping, but consumer drives aren't expected until roughly 2028–2030. PCIe 7.0 is in development.
USB connections carry their own overhead beyond encoding – protocol-level latency typically reduces throughput by around 15% compared to the theoretical maximum. Thunderbolt 3/4 allocates 22 Gbps of its 40 Gbps total bandwidth for data (the rest is reserved for display and other protocols), giving a practical ceiling of about 2.75 GB/s for storage.
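The external-interface arithmetic from this section can be checked directly. The ~15% USB figure is the approximate protocol overhead quoted above, not a measured constant:

```python
# Practical throughput ceilings for external interfaces. The Thunderbolt
# figure follows from its fixed 22 Gbps PCIe data allocation; the USB
# figure applies the ~15% protocol overhead quoted in the text.

def gbps_to_gbs(gbps):
    """Convert a line rate in Gbit/s to GB/s."""
    return gbps / 8

tb_data_gbps = 22                 # PCIe data allocation on TB3/4 (of 40 Gbps)
print(gbps_to_gbs(tb_data_gbps))  # 2.75 GB/s practical ceiling for storage

usb_gen2_gbps = 10                # USB 3.2 Gen 2
usb_overhead = 0.15               # approximate protocol overhead
print(round(gbps_to_gbs(usb_gen2_gbps) * (1 - usb_overhead), 2))  # ~1.06 GB/s
```

This is why a fast internal NVMe drive in a 10 Gbps USB enclosure benchmarks like a much slower drive: the enclosure, not the SSD, sets the ceiling.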
Choosing the Right SSD
Any SSD is a significant improvement over a mechanical hard drive. That said, there's a wide range of drives on the market, and the right choice depends on your system, your budget, and what you're actually doing with the drive. Breaking down those factors before you start shopping saves a lot of confusion.
Identifying Your Needs
Form Factor
Start with your system. Check your motherboard manual or laptop specifications to see what's available: 2.5" SATA bays, M.2 slots, or both. If the system has an M.2 slot, determine whether it supports SATA, PCIe (NVMe), or both – the keying and chipset routing matter. If you're limited to 2.5" bays, you're looking at SATA drives exclusively. If you have an M.2 slot with PCIe support, NVMe is the way to go for a new build.
Also consider the physical constraints. Desktop cases generally have no issues, but some laptops only accept single-sided M.2 drives due to clearance. Thermal conditions matter too – a drive crammed into a poorly ventilated laptop compartment will behave differently than one under a motherboard heatsink with active airflow.
Budget
You're generally looking for the best balance of performance per dollar and capacity per dollar. The SSD spreadsheet and buying guide are maintained as companion resources to help with specific drive comparisons. A few patterns hold consistently: Samsung charges a premium for its brand and vertical integration. QLC drives favor capacity at the expense of sustained write performance. TLC drives are better for mixed or write-heavy workloads. DRAM-less drives save cost at the expense of consistency under load.
PCIe 5.0 drives now exist and offer impressive sequential throughput, but PCIe 4.0 remains the value sweet spot for most users. The real-world performance gap in typical client workloads is narrow – most of the benefit shows up in sustained sequential transfers, which are a small fraction of daily use.
Don't fixate on the "up to" sequential speeds printed on the box. Those numbers are addressed in the next section.
Workload and Role
Most people just need a reliable SSD for their operating system, applications, and games. Almost any current drive handles that well. But if your workload has specific characteristics, it's worth tailoring your choice:
- Laptop on battery: Prioritize power efficiency, especially at idle. DRAM-less drives and newer low-power controllers (like those on 6nm process nodes) can make a measurable difference in battery life.
- Content creation or development: Favor drives with consistent performance under mixed workloads – a good TLC drive with DRAM and reasonable sustained write speeds.
- NAS caching or write-heavy workloads: Look at steady-state performance and endurance rather than burst speeds. Static SLC cache sizes matter more here than dynamic SLC capacity.
- Game storage: Sequential read performance is the primary concern. Most current NVMe drives are more than adequate, and any PCIe 3.0 or 4.0 NVMe drive clears the performance thresholds for the vast majority of titles. That said, modern games using DirectStorage with GPU decompression can sustain read demands that exceed SATA's ~550 MB/s ceiling, particularly at 4K with high-resolution texture streaming. At lower resolutions, SATA is still viable; at 4K with demanding titles, NVMe becomes a practical requirement rather than a theoretical one. The bottleneck is shifting from pure sequential bandwidth toward sustained throughput under decompression workloads.
Understanding Marketed Specs vs. Real-World Performance
The speeds listed on a drive's product page are marketing numbers. They represent the best-case scenario – typically large sequential transfers at high queue depth, reading from an SLC cache. You will rarely see these numbers in daily use.
Most client workloads operate at low queue depth (QD1–QD4) with small, random I/O patterns. What actually determines how responsive your system feels is latency at low queue depths, particularly for 4K random operations. This is the metric that governs how fast applications launch, how snappy your OS feels, and how quickly file operations complete. It's also the metric where the difference between a PCIe 3.0 drive and a PCIe 5.0 drive is smallest.
Sequential throughput matters for large file transfers – copying video files, installing games, writing disk images – but even here, you're often limited by the source, the filesystem, or the CPU's ability to feed the drive. The drive's SLC cache and single-die performance ceiling are usually the bottleneck, not the interface bandwidth.
The practical takeaway: you may not feel a subjective difference between two NVMe SSDs even if their marketed sequential speeds are an order of magnitude apart. The things that do create a noticeable difference are the presence or absence of DRAM, the quality of the controller's firmware, and the SLC caching strategy – none of which are advertised on the box.
Warranty, TBW, and Endurance
SSD warranties are defined by two limits: a time period (typically three or five years) and a Total Bytes Written (TBW) rating. Whichever limit is hit first applies.
For consumers, TBW is rarely the binding constraint. A typical 1 TB TLC drive rated for 600 TBW would need about 330 GB of writes per day, every day, for five years to hit that limit. Normal desktop usage is nowhere near this. TBW is more relevant for prosumer and write-heavy workloads.
From TBW you can derive Drive Writes Per Day (DWPD): the number of full drive writes per day within the warranty period.
DWPD = TBW / (365 × Years × Capacity in TB)
DWPD is useful if you have a minimum expected write volume – it tells you whether a drive's endurance matches your workload.
To put TBW in perspective with a concrete example: a 1 TB drive rated for 600 TBW over a five-year warranty gives you a DWPD of about 0.33 – meaning you can write roughly 330 GB per day, every day, for five years before hitting the warranted limit. A typical desktop user writes somewhere between 10–40 GB per day (OS operations, application data, downloads, browser cache). At 30 GB/day, that 600 TBW drive would last over 50 years of writes – well beyond any other component's lifespan. Even at 100 GB/day, which would be heavy consumer use, you're looking at over 16 years. TBW anxiety for normal consumer usage is almost always unwarranted.
The actual flash endurance is typically much higher than the warranted TBW, since manufacturers build in significant margin. TBW should never be the primary factor in a consumer purchasing decision.
Five-year warranties are generally the better indicator of a manufacturer's confidence in the drive.
Bill of Materials and Hardware Swaps
A drive's bill of materials (BOM) is the list of components that go into it: controller, flash, DRAM (if present), and supporting chips. For some drives, particularly budget models, the BOM is not fixed. Manufacturers may swap components depending on supply availability – changing the flash from one vendor to another, or substituting a different DRAM chip – without changing the model number.
This practice means two units of the "same" drive can have meaningfully different performance characteristics. The most notable recent example was the WD SN550, where a post-launch BOM change significantly reduced sustained write performance and prompted widespread consumer backlash.
Vertically-integrated manufacturers like Samsung, which produce their own flash and controllers, have more control over BOM consistency. Brands that source components externally are more likely to swap. If BOM consistency matters to you, check reviews and community reports (forums, r/NewMaxx) for the specific model – hardware swaps are usually identified quickly after they occur.
Counterfeit and Fake SSDs
A more extreme version of the BOM problem is outright counterfeit drives. These are typically sold through third-party marketplace sellers and feature manipulated firmware that reports a larger capacity than the flash actually supports. A drive advertised as 2 TB might contain 128 GB of actual flash – it will appear to work until you exceed the real capacity, at which point data is silently lost or corrupted.
Warning signs include prices significantly below market rate, unknown brand names with no verifiable history, and listings that lack specific controller or flash specifications. Tools like H2testw (Windows) or f3 (Linux) can verify that a drive's actual usable capacity matches its reported capacity by writing and reading back data across the full address space. If you're buying from a marketplace seller rather than an authorized retailer, testing a new drive before trusting it with important data is a reasonable precaution.
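The principle behind H2testw and f3 can be sketched in a few lines: write a unique, recognizable pattern to every block of the claimed capacity, read it all back, and compare. A firmware-faked drive wraps writes around (or drops them) past its real capacity, so early blocks fail verification. This demo runs against an in-memory simulated device – it is an illustration of the technique, not a replacement for the real tools:

```python
# Capacity verification sketch: unique pattern per block, write all,
# read all back. Runs against a simulated counterfeit, not real hardware.

import hashlib

def pattern(block_id):
    """Deterministic 16-byte pattern for a given block number."""
    return hashlib.sha256(block_id.to_bytes(8, "big")).digest()[:16]

def verify(device_write, device_read, num_blocks):
    """Write unique patterns to every block, then read back and compare."""
    for i in range(num_blocks):
        device_write(i, pattern(i))
    return [i for i in range(num_blocks) if device_read(i) != pattern(i)]

# Simulated counterfeit: claims 8 blocks but only stores 4 (writes wrap).
REAL = 4
store = {}
fake_write = lambda i, data: store.__setitem__(i % REAL, data)
fake_read = lambda i: store.get(i % REAL)

bad_blocks = verify(fake_write, fake_read, 8)
# Blocks 0-3 fail: they were silently overwritten by the wrapped writes
# to blocks 4-7, exactly how a fake drive corrupts early data.
```

Note the verification must cover the full address space – spot-checking the first few gigabytes of a fake drive passes, because the real flash sits at the start.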
Motherboard Compatibility
Your motherboard determines what drives you can use and how well they'll perform. A few things to check:
M.2 slot support: Not all M.2 slots are equal. Some support only SATA, some only NVMe (PCIe), and some support both. The keying (B-key, M-key, or B+M) tells you the physical compatibility, but the chipset routing determines the protocol. Check the manual.
Lane allocation and sharing: On many motherboards, M.2 slots share PCIe lanes or SATA ports with other connectors. Using a particular M.2 slot might disable a SATA port or reduce a PCIe x16 slot to x8. With multiple NVMe drives, the total upstream bandwidth from the chipset to the CPU can become a bottleneck, even if each individual slot has sufficient downstream bandwidth. This is primarily a concern for motherboards with three or more M.2 slots.
NVMe boot support: Most motherboards from the last several years support NVMe booting natively through UEFI. Older boards may require a BIOS modification or a UEFI wrapper like Clover to boot from NVMe. If you're adding NVMe to an older system via a PCIe adapter card, verify boot support before purchasing.
PCIe generation: A PCIe 5.0 drive in a PCIe 4.0 slot will work but be limited to PCIe 4.0 speeds, and vice versa. PCIe 5.0 M.2 slots for storage are available on AMD AM5 platforms (X670E, X670, and B650E, with optional support on B650) and Intel 700-series and later chipsets, typically limited to one or two slots connected directly to the CPU.
SSD Health, Maintenance, and Software
SSDs require very little maintenance compared to hard drives – no defragmentation, no head parking, no spindle concerns. But there are a few things worth understanding to keep your drive healthy and performing well over its lifespan.
SMART Monitoring
SMART (Self-Monitoring, Analysis and Reporting Technology) is a built-in health monitoring system on all modern storage devices. Your SSD continuously records values for temperature, total data written, error rates, wear-leveling counts, and other parameters. Reading this data gives you a picture of the drive's health and can provide early warning of potential failure.
Tools like CrystalDiskInfo, Hard Disk Sentinel, and smartmontools can read SMART data on both SATA and NVMe drives. NVMe drives report a standardized set of health attributes through the NVMe health information log, which is more consistent across manufacturers than the vendor-specific SMART attributes on SATA drives.
The most useful SMART values for consumers are the percentage of life used (or remaining), the total data written, and any critical warnings. A drive reporting critical warnings or rapidly increasing error counts should be backed up and replaced. Beyond that, periodic SMART checks are good practice but not something most users need to obsess over.
Temperature
SSD thermals are more nuanced than "keep it cool." NAND flash actually programs better when warm – heat reduces program variation, leading to tighter voltage distributions and fewer errors during writes. However, heat also accelerates charge leakage, which hurts data retention over time. The ideal thermal profile would be warm during writes and cool during storage, but that's not how real systems work.
Controllers are the more temperature-sensitive component. Most are designed to operate in the 0–70°C range, with thermal throttling kicking in around 70–80°C. The composite temperature reported by NVMe drives is what triggers power-state-based throttling, and it may factor in both controller and NAND temperatures.
For consumer use, the practical advice is straightforward: if your system has adequate airflow, thermal management is mostly an aesthetic decision – a motherboard heatsink or aftermarket heatspreader is fine. If the drive is in a poorly ventilated laptop compartment or an HTPC with restricted airflow, additional cooling may be warranted to prevent throttling under sustained workloads. Don't worry about cooling the NAND specifically; the controller is what throttles.
The cross-temperature effect is worth mentioning: data written at one temperature and read at a significantly different temperature can produce higher error rates. This is primarily a concern in enterprise environments with extreme temperature swings, not typical consumer use.
TRIM and OS Optimization
Modern operating systems handle SSD optimization automatically. Windows detects SSDs and schedules TRIM (called "optimize" in the UI) on a weekly basis by default. This tells the drive which blocks are no longer in use, allowing the controller to reclaim them during garbage collection. You generally don't need to touch this.
Defragmenting an SSD is unnecessary and counterproductive – SSDs have no seek time penalty for fragmented data, and defragmentation writes data needlessly. Windows knows this and runs TRIM instead of defrag when it detects an SSD. If you're using third-party defrag software, make sure it isn't defragmenting your SSD.
You can verify that TRIM is working through fsutil behavior query DisableDeleteNotify in an elevated command prompt – a result of 0 means TRIM is enabled. Your drive manufacturer's toolbox software, if available, can also confirm TRIM support.
Secure Erase and Sanitize
There are two levels of SSD wiping: secure erase and sanitize. A secure erase wipes the drive's mapping data, making the data inaccessible through normal means. A sanitize goes further by also erasing the underlying flash blocks, making data recovery effectively impossible.
A standard format in Windows sends TRIM to the drive, which achieves a similar result to a secure erase – the mapping is cleared and the drive will reclaim the blocks. For most consumer purposes, a format followed by a few minutes of idle time (for the drive to process the TRIM and erase blocks) is sufficient.
For more thorough wiping – selling a drive, decommissioning a system with sensitive data – use the manufacturer's toolbox if it offers a sanitize option. Alternatively, on NVMe drives, the nvme-cli tool under Linux provides both nvme format and nvme sanitize commands. The motherboard BIOS/UEFI may also offer a secure erase option.
Modern drives generally perform a full sanitize even when issued a secure erase command. For self-encrypting drives (SEDs), a crypto-erase – which destroys the encryption key, rendering all encrypted data unreadable – is the fastest and most complete option.
Security and Encryption
Self-encrypting drives (SEDs) automatically encrypt data using a hardware encryption engine, typically AES-256 under the TCG Opal 2.0 specification. The encryption is transparent – data is encrypted on write and decrypted on read, with no performance penalty. Access is controlled through an authentication key.
While most SSD controllers technically support Opal, many consumer drives don't enable the functionality. More importantly, Microsoft stopped trusting hardware SED implementations in late 2019 after security researchers found vulnerabilities in several drives' encryption implementations. BitLocker now defaults to software encryption rather than deferring to the drive's hardware encryption.
The practical implication: if you need full-disk encryption, use BitLocker (Windows) or LUKS (Linux) with software encryption. Don't rely on the drive's hardware encryption alone.
Backup Strategies
Follow the 3-2-1 backup rule: three copies of your data, on two different types of media, with at least one copy offsite (cloud storage, a drive at another location). SSDs are reliable but not infallible – controller failures, firmware bugs, and power events can cause sudden data loss without warning. SMART monitoring helps but doesn't catch everything.
Power Management
NVMe drives support Autonomous Power State Transition (APST), which allows the drive to move between power states based on idle time. Lower power states save energy but add latency when the drive needs to wake up for an I/O request. SATA drives have an equivalent in Aggressive Link Power Management (ALPM).
Power management behavior varies between desktop and laptop configurations. Many desktop systems don't allow NVMe drives to reach their lowest power states, which means laptop battery life testing can show different results than desktop power measurements. The OS negotiates power states with the drive based on reported capabilities, but the actual behavior can vary – thermal throttling, for example, may manifest as forced transitions to lower power states.
For laptop users, power efficiency differences between drives can be measurable. Newer controllers on smaller process nodes (6nm and below) tend to idle more efficiently, and DRAM-less designs inherently draw less power since there's no DRAM to keep refreshed.
Bad Blocks
Every SSD has some bad blocks from the start. Original bad blocks (OBB) are identified and mapped out during manufacturing – flash quality might guarantee 95%+ good blocks, with the rest replaced from the spare pool. Growth bad blocks (GBB) develop over the drive's lifetime as cells wear from programming and erasing. The controller monitors blocks continuously and retires them as needed, substituting spare blocks.
This process is transparent to the user. As long as spare blocks remain available, bad blocks have no user-visible impact. When spare blocks run out, the drive is nearing end of life – this is typically reflected in the SMART "percentage used" indicator approaching 100%.
Free Space and Performance
All SSDs slow down as they fill. The mechanism is straightforward: less free space means fewer free blocks for garbage collection, less room for SLC caching, and more overhead for the controller to manage writes. The severity depends on the drive's design – QLC drives, DRAM-less drives, and drives with large dynamic SLC caches are hit hardest.
A reasonable guideline is to keep at least 10–20% of user-accessible space free. For most standard TLC drives with DRAM, even 5–10% free is adequate. For DRAM-less or QLC drives, staying below 80% capacity helps maintain more consistent performance.
The benefit of free space comes through dynamic over-provisioning (covered in Chapter 4). With TRIM enabled, the controller treats any free user space as additional OP – more room for garbage collection and lower write amplification. The returns are real but diminishing: going from 90% full to 80% full has a much larger impact than going from 50% to 40%.

Figure 10: Write amplification drops steeply as over-provisioning increases from 7% to 20%, with diminishing returns beyond that. Under random write workloads, going from 10% to 20% OP roughly halves WAF.
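The shape of that curve can be reproduced with a commonly cited analytic approximation for write amplification under greedy garbage collection and sustained random writes. This is a textbook simplification, not any vendor's actual model:

```python
def waf_greedy(op_fraction: float) -> float:
    """Approximate write amplification factor under sustained random
    writes with greedy garbage collection. op_fraction is spare space
    as a fraction of total flash (7% OP -> 0.07). Simplified model."""
    assert 0.0 < op_fraction < 1.0
    return (1.0 - op_fraction) / (2.0 * op_fraction)

for op in (0.07, 0.10, 0.20, 0.28):
    print(f"OP {op:.0%}: WAF ~ {waf_greedy(op):.1f}")
```

Plugging in 10% and 20% OP gives WAF of roughly 4.5 and 2.0 – consistent with the "roughly halves" behavior in the figure caption – with clearly diminishing returns as OP rises further.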
Make sure your SSD is 4K-aligned, especially if you cloned from a hard drive. Misalignment can cause unnecessary read-modify-write operations that hurt performance. Most modern OS installers and cloning tools handle this correctly, but it's worth verifying.
Benchmarking
If you want to test your drive's performance, tools range from simple to comprehensive. CrystalDiskMark is the most common quick benchmark, giving you sequential and random read/write numbers at various queue depths. For more control, FIO (flexible I/O tester) lets you define custom workloads – queue depth, block size, read/write mix, duration – and is available on both Linux and Windows.
Keep in mind that benchmark results vary with the drive's fill state, thermal state, and how recently it was trimmed or preconditioned. A freshly-trimmed drive will benchmark differently than one at 80% capacity after sustained writes.
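FIO workloads are defined either on the command line or in a job file. The job file below sketches a 60-second 4K random-read test at queue depth 32; the ioengine and device path are assumptions for a Linux system, so adjust both for your environment (and point any write tests at a scratch file, never a device holding data):

```ini
# 4K random read, QD32, 60 seconds (illustrative values)
[global]
ioengine=libaio      # io_uring is also available on recent kernels
direct=1             # bypass the OS page cache
time_based=1
runtime=60

[randread-qd32]
rw=randread
bs=4k
iodepth=32
filename=/dev/nvme0n1   # read-only workload, but verify the target first
```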
Toolbox Software and Drivers
Most SSD manufacturers offer a downloadable toolbox application – Samsung Magician, WD Dashboard, Crucial Storage Executive, and so on. These provide SMART monitoring, firmware updates, secure erase options, and sometimes performance optimization features. None of this software is required for normal drive operation; it's supplementary.
Custom NVMe drivers are no longer necessary for most drives. The Microsoft inbox NVMe driver (StorNVMe) handles the vast majority of functionality, including HMB and DirectStorage support. Samsung was one of the last holdouts with a proprietary NVMe driver and dropped it starting with the 990-series.
Some manufacturer toolboxes offer DRAM caching features (Samsung's RAPID Mode, Crucial's Momentum Cache) that cache data in system RAM. These are generally not recommended – the modern Windows cache is already effective, and the additional caching layer adds complexity and another point of failure for power-loss scenarios without meaningful real-world performance improvement.
NAND Manufacturing and Evolution
NAND flash has undergone dramatic changes in how it's physically manufactured, and those changes directly affect the performance, endurance, and cost of the drives you buy. This chapter covers the major architectural shifts and the current state of manufacturing, with enough detail to understand why different flash generations and architectures behave differently.
From 2D to 3D: Why NAND Went Vertical
Early NAND flash was planar – cells were arranged in a single layer on the silicon surface, and increasing density meant shrinking the cells. This worked until the cells became so small that they couldn't reliably hold enough charge to distinguish between voltage states. Cell-to-cell interference, reduced data retention, and plummeting endurance made further planar scaling impractical.
The solution was to go vertical. Instead of shrinking cells, 3D NAND stacks layers of cells on top of each other. This allows each individual cell to be physically larger than it was in late-generation planar flash, which improves charge retention, reduces interference, and increases endurance – even as total density goes up. The trade-off is manufacturing complexity: building dozens or hundreds of layers of precisely aligned structures is significantly harder than etching a single layer.
Floating Gate vs. Charge Trap
The two fundamental cell architectures in NAND flash are floating gate and charge trap.
Floating gate (FG) stores charge on an electrically isolated conductive layer – the floating gate itself – sandwiched between insulating layers. This was the original NAND cell design and was used in most 2D flash and early 3D designs. Floating gate offers good data retention and is the more natural fit for split-gate technologies that could enable higher density in future generations. Intel (now Solidigm, under SK hynix) was the most prominent user of floating gate in 3D NAND. Micron used floating gate through their earlier 3D generations before transitioning to replacement gate at 128L and above.
Charge trap (CT) stores charge in an insulating material rather than a conductive one. The cells are formed as vertical pillars etched through the stacked layers, creating a 3D array. Charge trap has several advantages for scaling: smaller cell and pillar sizes, better immunity to coupling interference between adjacent cells, and a structure that's more amenable to placing control circuitry underneath the flash array – a design approach that goes by various names including periphery-under-cell (PUC), CMOS-under-Array (CuA), or core-over-periphery (CoP). Moving peripheral circuitry under the array frees up die area, which is critical as layer counts increase.
Most manufacturers have adopted charge trap for current and future development. The major variants include BiCS (Kioxia/WD), V-NAND (Samsung), replacement gate or TCAT (Micron), and P-BiCS/SP-BiCS (SK hynix). YMTC produces 3D NAND using its Xtacking architecture, which bonds the NAND array to its peripheral circuitry through a wafer-to-wafer bonding process.
Cell Architecture Families
While all current 3D NAND shares the basic charge trap structure, each manufacturer's implementation has distinct characteristics that affect performance, endurance, and scalability.
Samsung V-NAND is a form of TCAT (terabit cell array transistor) with peripheral circuitry placed under the cell array. Samsung is currently on V9 (286 layers) and demonstrated V10 (400+ layers) at ISSCC 2025.
Kioxia/WD BiCS (Bit Cost Scalable) is the architecture behind both companies' flash, developed from their long-standing partnership. Current production is BiCS8 at 218 layers.
Micron Replacement Gate uses a TCAT variant where certain layers are removed and replaced with gate material during fabrication, optimizing die size. Current generations are G8 (232 layers) and G9 (276 layers), both designed for their CuA peripheral architecture.
SK hynix uses P-BiCS and SP-BiCS charge trap designs. Their 321-layer NAND uses triple-deck stacking – the first manufacturer to ship a three-deck design. SK hynix also acquired Intel's NAND business (now Solidigm), which retains legacy floating gate designs alongside newer development.
These architectural differences influence plane count, page size, programming characteristics, and endurance – which in turn shape the performance profiles of the controllers designed to work with them. A controller optimized for four-plane Micron flash may behave differently than one tuned for Samsung's V-NAND, even if the headline specs look similar.
Layer Counts and String-Stacking
Layer count is the most commonly cited generation marker for 3D NAND. The progression has moved from 32L to 48L, 64L, 96L, 128L, 176L, and into the current 200L+ era, with 232L (Micron), 238L (SK hynix), 276L (Micron), 286L (Samsung), and 321L (SK hynix) in production or shipping. Roadmaps now extend to 500+ layers.
The marketed layer count refers to the number of active cell layers; the actual layer count is higher once you add dummy layers (which reduce program disturb at the edges), string select transistors, and ground select transistors. Samsung's original 32-layer V-NAND, for example, had 39 physical layers.
Pushing beyond roughly 100 active layers in a single stack becomes impractical due to the extreme aspect ratios required for memory hole etching – the holes must be drilled straight through the entire layer stack, and they tend to taper from wider at the top to narrower at the bottom, creating non-uniform cells. The solution is string-stacking: manufacturing two or more shorter stacks and bonding them together. Two 128L stacks become 256L, three 107L stacks become 321L (as with SK hynix's latest), and so on.

Figure 11: Cross-section of a 3D NAND flash structure. Two decks of wordline layers are stacked above CMOS peripheral circuitry, with vertical channel pillars passing through the array. Dummy wordlines at deck edges and the staircase contact area for wordline connections are visible.
String-stacking introduces its own challenges. The stacks must be precisely aligned where they meet, and different manufacturers handle this alignment differently. Each junction adds dummy layers for electrical stability. Yields are lower than single-stack designs. But string-stacking has become universal – it's the only practical path to 300+ layers and is how every manufacturer is scaling today.
Hybrid bonding is emerging as a next-generation technique to improve the connections between stacked layers and between the NAND array and peripheral circuitry, potentially enabling even higher layer counts.
Process Challenges and Scaling Limits
Several fundamental challenges shape where NAND manufacturing goes from here.
High aspect ratio etching is the most critical. Memory holes must be etched vertically through the entire layer stack with extreme precision. As layer counts increase, the aspect ratio – the ratio of depth to width – grows, making it harder to maintain uniform hole diameter and straight walls. Cells at the top of the stack end up physically larger than cells at the bottom, which creates non-uniform electrical characteristics that the controller must account for.
Alignment between decks in string-stacked designs must be precise. Misalignment introduces resistance and can create reliability issues at the junction. Each manufacturer has a different approach to this, and the quality of alignment is one of the factors that separates good flash from great flash.
Peripheral circuitry placement becomes more constrained as the array grows. Moving circuitry under the cell array (PUC/CuA/CoP) is now standard, but the power delivery, signal routing, and thermal management of that circuitry gets more complex with each generation.
Split-cell technology is the leading candidate for the next major density improvement. The idea is to divide each memory hole into two cells, effectively doubling the cell count for a given number of layers. This takes advantage of the curved topology of the pillar to create two distinct charge storage regions. SK hynix presented a split-cell charge trap design for 5-bit-per-cell (PLC) NAND at IEDM 2025, representing the most promising path to PLC production. Earlier split-gate work favored floating gate architectures, but charge trap variants are now being developed as well.
PLC (penta-level cell) itself remains in research. No commercial PLC drives have shipped as of early 2026. The challenge is that distinguishing 32 voltage states per cell requires extraordinary precision in both programming and reading, with error rates that push current ECC to its limits. Split-cell designs may provide the endurance improvement needed to make PLC viable.
Deep Dive: NAND Internals
Advanced Topic – This chapter goes deeper into NAND flash internals for enthusiasts and professionals. Part I (Chapters 1–8) covers everything needed for practical SSD understanding.
This chapter covers the engineering details behind the concepts introduced in Part I: how voltage programming actually works, how error correction handles marginal data, what disturb effects look like at the cell level, and how to calculate theoretical performance from flash specifications.
Voltage and Programming in Detail
A NAND cell's value is determined by its stored voltage, which can only be increased during programming – one of the fundamental reasons flash must be erased before rewriting. Programming proceeds through incremental step-pulse programming (ISPP), where voltage is increased in progressively smaller steps until it reaches the target level.
For multi-bit cells, programming is more involved. TLC uses foggy-fine programming: the least-significant bit (LSB) is programmed first with large voltage pulses, then the center- and most-significant bits (CSB, MSB) are layered on through smaller steps. A final fine programming pass uses the smallest pulses to set the voltage to its precise target state. The result is a multi-stage process where each stage builds on the previous one, and where upper pages (higher bits) take the majority of the total program time. For a detailed visualization of the ISPP voltage staircase and foggy-fine programming phases, see Cai et al., "Errors in Flash-Based Solid State Drives," arXiv:1711.11427, pp.17–18.
Modern 3D NAND allows shortcuts in this sequence. Because cell-to-cell interference is reduced compared to 2D planar flash, some programming steps can be skipped. QLC, for example, might use an 8-16 or 2-8-16 scheme instead of the full 2-4-8-16 sequence. A single-step 16-level program is fastest but has a significantly higher bit error rate; two-step 8-16 is a practical compromise between speed and reliability. These are sometimes called high-speed program (HSP) schemes. Gray code encoding is used to minimize the number of bit flips between adjacent voltage states, reducing the impact of small voltage errors.
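The Gray-code property is easy to verify: in a reflected binary Gray code, adjacent values differ in exactly one bit, so a cell misread by one voltage level corrupts at most one of the pages stored in it. A minimal sketch:

```python
def gray(n: int) -> int:
    """Reflected binary Gray code of n."""
    return n ^ (n >> 1)

# The eight TLC states in voltage order:
codes = [gray(i) for i in range(8)]
print([f"{c:03b}" for c in codes])
# -> ['000', '001', '011', '010', '110', '111', '101', '100']

# Adjacent voltage states differ in exactly one bit position:
for a, b in zip(codes, codes[1:]):
    assert bin(a ^ b).count("1") == 1
```

The actual state-to-bit assignments vary by manufacturer, but all share this one-bit-per-step property.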
Programming voltage (Vpgm) can be optimized on a per-block basis, adjusting subsequent strings based on the results of the first. Predictive and re-program schemes use coarse and fine adjustments. Verify voltage (Vvfy) is checked after programming to confirm the cell reached its target level; techniques like odd-state verification (checking only alternating reference levels) can reduce verification overhead.
Other optimization techniques include multi-pass programming based on threshold voltage (Vth) movement, unselected string pre-charge (USP) to compensate for background pattern dependency (BPD) on neighboring bitlines, and smart start bias control (SBCC), which adapts the programming bias based on results from programming the corresponding lower pages. Machine learning is an active area of research for optimizing these bias decisions.
Bypass voltage (Vpass) is applied to unselected wordlines during programming. Once a cell reaches its target voltage, its bitline is boosted above a threshold to inhibit further programming. Boosting optimization – predicting the right voltage levels for different pages within and across wordlines – is another area where manufacturers invest heavily.
The stored voltage drifts over time due to charge leakage, shifting left or right from the intended level. Additionally, the threshold boundaries themselves can shift through disturb effects. Threshold voltages can be recalibrated to some extent, but drift is an inherent characteristic of NAND that ECC must handle.
Power Loss Protection at the Cell Level
One method of protecting data-at-rest during programming is to back up lower pages (e.g., LSB or LSB+CSB) before programming upper pages. If power is lost mid-program, the lower pages can be recovered. Micron uses a differential storage device built into the NAND for this purpose. An alternative approach uses parity information to reconstruct lower page states. SLC cache writes are inherently less vulnerable because single-bit programming is fast and simple – the data-in-flight window is much shorter.
Read Operations and Voltage Calibration
Reading involves applying reference voltages to determine which state a cell is in. The process is more complex than it appears because retention, wear, and disturb effects shift voltage distributions over time.
Manufacturers use several techniques to maintain read accuracy. Calibrating reads (CALR) and smart Vth tracking reads (SVTR) adjust reference levels based on the actual distribution of voltages in a block. Wordline overdrive reduces resistance-capacitance (RC) delay when switching between wordline levels, minimizing the time and impact of read operations on adjacent cells.
Multi-plane reads and independent plane reads (IPR) improve read throughput by accessing multiple planes simultaneously. This requires splitting control gate drives, increasing the number of sense amplifiers, and providing dedicated voltage generators per plane. Subpage reads (partial page reads) can be faster than full page reads – pulling an 8 KiB chunk instead of the full 16 KiB page reduces bitline settling time, particularly with shielded bitline (SBL) configurations.
Data and cache latches (SADL, XDL) on each plane serve as fast staging registers for read and write operations. Block pairing (odd/even pairs for decoding via BLKSEL) and multi-deck independent reads (as used in Solidigm's 144L QLC) provide additional parallelism.
The raw bit error rate (RBER) increases over the drive's lifetime as cells wear and voltage distributions widen. Threshold calibration and voltage calibration techniques – including read retry tables and Vth distribution algorithms for partially-programmed blocks – are used to maintain read windows. Endurance testing (per JEDEC methodology) typically involves "baking" drives at elevated temperatures to simulate accelerated aging.
Error Correction in Depth
Modern SSDs use LDPC (low-density parity-check) codes, which support both hard-decision and soft-decision decoding. BCH (Bose-Chaudhuri-Hocquenghem) codes, which only support hard-decision decoding, were used in earlier drives.
Hard-decision decoding reads each cell as a definitive state and is fast but limited in correction capability. Soft-decision decoding recognizes that a cell's voltage may fall near a boundary between states and uses probabilistic methods – applying extra sensing levels progressively – to resolve ambiguous values. This is multi-step LDPC: hard-decision first, soft-decision only on failure, with progressively finer sensing as needed. The trade-off is latency – each additional sensing step takes time.
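The escalation logic can be sketched as control flow. Everything below is a toy model – `sense` stands in for a real decode attempt, with success odds improving as sensing levels are added – but the shape (fast hard-decision first, progressively costlier soft steps only on failure) matches the description above:

```python
import random

def sense(page_quality: float, extra_levels: int) -> bool:
    """Toy decode attempt: each extra sensing level improves the odds.
    Real decoders work on LLRs, not a scalar quality."""
    return random.random() < min(1.0, page_quality + 0.2 * extra_levels)

def read_page(page_quality: float, max_soft_steps: int = 3) -> str:
    if sense(page_quality, 0):            # hard-decision: one pass, lowest latency
        return "hard"
    for step in range(1, max_soft_steps + 1):
        if sense(page_quality, step):     # soft-decision: extra sensing levels
            return f"soft-{step}"
    return "raid-rebuild"                 # last resort: die-level parity
```

In a real controller each soft step adds sensing latency, which is why drives only pay that cost on marginal pages.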
The ECC codeword size affects correction strength. The typical codeword is 1 KiB, but some controllers use 2 KiB codewords (as on the SM2259/SM2259XT used in the Intel 545s) for a higher correction capability at the cost of more flash area, controller die space, and power. The coding rate – the ratio of user data to total data including ECC – determines the overhead.
Beyond page-level ECC, most SSDs implement a RAID-like parity scheme at the superpage level. RAIN (Redundant Array of Independent NAND) or RAISE distributes parity across dies so that a complete die failure can be tolerated without data loss. This is the last line of defense after ECC failure.
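The die-level parity scheme is ordinary XOR parity – the same math as RAID 5. A minimal sketch with two-byte "pages" across three "dies":

```python
def parity(stripes):
    """XOR all stripes together; with N data stripes, any one lost
    stripe can be rebuilt from the other N-1 plus the parity."""
    p = bytes(len(stripes[0]))
    for s in stripes:
        p = bytes(a ^ b for a, b in zip(p, s))
    return p

dies = [b"\x11\x22", b"\x33\x44", b"\x0f\xf0"]
p = parity(dies)

# Lose die 1 entirely; rebuild its data from the survivors plus parity:
rebuilt = parity([dies[0], dies[2], p])
assert rebuilt == dies[1]
```

Real implementations stripe parity across dies rather than dedicating one, and may protect at plane or superpage granularity, but the recovery math is the same.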
SSDs also internally scramble data using a linear feedback shift register (LFSR) with XOR-based descrambling. This is separate from AES encryption used in self-encrypting drives – scrambling is for electrical balancing and wear normalization, not security. Some drives also apply AES encryption to all data on-flash to enable fast cryptographic erase.
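A scrambler is just an XOR with a pseudorandom keystream, so descrambling is the same operation with the same seed. The sketch below uses a 16-bit LFSR with arbitrary illustrative taps – real drives use longer, proprietary polynomials seeded per physical page:

```python
def lfsr_stream(seed: int, nbytes: int) -> bytes:
    """Keystream from a 16-bit Fibonacci LFSR (illustrative taps)."""
    state = seed
    out = bytearray()
    for _ in range(nbytes):
        byte = 0
        for _ in range(8):
            bit = ((state >> 0) ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
            state = (state >> 1) | (bit << 15)   # shift in the feedback bit
            byte = (byte << 1) | (state & 1)
        out.append(byte)
    return bytes(out)

def scramble(data: bytes, seed: int) -> bytes:
    ks = lfsr_stream(seed, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))

# Pathological all-0/all-1 pattern gets electrically balanced:
page = b"\x00" * 4 + b"\xff" * 4
sc = scramble(page, seed=0xACE1)
assert scramble(sc, seed=0xACE1) == page   # XOR descrambling is symmetric
```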
Program Disturb and Interference
Programming a cell affects its neighbors. Wordline n introduces disturb on adjacent wordlines n-1 and n+1. For this reason, pages are programmed in a specific sequence to minimize disturb – a technique called shadow programming, which interleaves the programming steps of multiple wordlines.
There are several types of program disturb. In planar NAND, the primary concern was X-direction (along the wordline) interference. 3D NAND adds Y-direction (vertical, between layers) and XY (combined) disturb, each mitigated through different voltage biasing strategies.

Figure 8: Program disturb in 3D NAND – X-disturb (horizontal, along the wordline), Y-disturb (vertical, between layers), and XY-disturb (diagonal). The programmed cell affects its neighbors through electric field stress. During programming, most wordlines (e.g., 72 of 96) are biased with Vpass while neighboring wordlines receive specific bias voltages to counter electric field stress via control gate drivers.
The 3D structure significantly reduces cell-to-cell coupling compared to planar flash because of the larger effective process node (cells are physically larger). This reduction in coupling is one of the main advantages of going vertical and is why 3D NAND can sometimes use simpler, faster programming sequences. MONOS (metal-oxide-nitride-oxide-silicon) and SONOS structures have different coupling characteristics that influence disturb behavior.
Other disturb types include inhibit disturb (from the bias voltages used to prevent unselected cells from being programmed) and hot carrier injection (HCI) issues at string edges, which are mitigated by dummy wordlines. Read disturb, while less severe than program disturb, acts as a weak programming effect: the Vpass applied during reads nudges less-programmed cells, with the effect proportional to the voltage gap. Flash manufacturers manage these effects through multi-deck architectures where decks can be independently managed, and by placing circuitry underneath the array.
Advanced NAND Topology
Sub-planes
A sub-plane divides a plane into two or more units with dedicated circuitry (page buffers, sense amplifiers) for improved interleaving on smaller I/O. A 16 KiB page/wordline in a plane could be divided into two 8 KiB lines with two sub-planes, improving performance for small random reads. The legacy IMFT (Intel-Micron) tile architecture used 2 KiB tiles (4 KiB pairs) for partial reads. Samsung's 128L flash also uses sub-planes.
Pages, Subpages, and Wordlines
Multi-bit NAND has "strong" pages (first-programmed bits, e.g., LSB) and "weak" pages (subsequent bits) with different reliability characteristics. Subpage-level writes are possible but with significantly reduced reliability outside of pSLC mode.
The 16 KiB physical page contains four 4 KiB logical pages (subpages). Addressing tables are typically page-level but can use multi-level indexing to track subpages. When subpage writes cannot be coalesced, the FTL performs read-modify-write (RMW), with the associated concern of internal page fragmentation. Partial page reads at the subpage level are faster than full page reads – the reduced data volume means shorter bitline settling times.
Peripheral Circuitry
The flash array requires extensive support circuitry: bitline and row (wordline) decoders for cell/block selection, pass-transistors to transfer voltages from global wordlines to physical wordlines, sense amplifiers for reading, charge pumps for voltage generation, and data latches for buffering. Moving this circuitry under the array (PUC/CuA/CoP, as discussed in Chapter 8) introduces trade-offs in power routing, signal integrity, and thermal management that each manufacturer resolves differently.
Through-silicon vias (TSV) provide vertical electrical connections between stacked layers. Within the array itself, dummy wordlines (DWL), selector gates, string and ground selection transistors (SST, GST), and wordline pad connection areas (the "staircase" structure) are all part of the physical layout.
The NAND Interface
Modern NAND uses a double data rate (DDR) interface – sending data on both clock edges – known as "toggle mode" (Samsung/Kioxia) or ONFI DDR (for the ONFI specification). The transfer rate is measured in megatransfers per second (MT/s), which is twice the clock speed.
Interface speeds have scaled rapidly with each flash generation: 64L flash at 533–667 MT/s, 96L at 800 MT/s, 128L up to 1,200 MT/s, 176L at 1,600–2,000 MT/s, 232L at 2,400 MT/s, 276L at 3,600 MT/s, and upcoming 332L targeting 4,800 MT/s. ONFI 5.1 supports 3,600 MT/s; ONFI 6.0 (January 2026) extends NV-LPDDR4 to 4,800 MT/s. Samsung has demonstrated 5.6 GT/s for its 400+ layer V-NAND.
The bidirectional data bus (DQ) uses a strobe signal (DQS) with address latch enable (ALE), command latch enable (CLE), read enable (RE), and write enable (WE) signals. Flash dies typically have 8 or 16 I/O pins, with most consumer flash operating in 8-bit (one byte) mode. A flash die rated at 3,600 MT/s with 8-bit I/O has a peak per-die bandwidth of 3,600 MB/s, though actual usable bandwidth is lower after accounting for command overhead and protocol-level idle cycles.
The increasing MT/s ratings across flash generations reflect not just raw speed improvements but also the ability to interleave more efficiently, reducing the idle time between transfers.
Performance Math
Write Performance
Theoretical write throughput is derived from two values: average program time (tPROG) and page size. For 16 KiB pages with a tPROG of 500 µs (typical for 64L/96L era TLC):
Programs/second = 1 / 0.0005 = 2,000
Throughput = 2,000 × (16 / 1,000) = 32 MB/s per die-plane
With two planes per die and 16 dies (for a 512 GB drive at 256 Gb/die), you get 32-way interleaving:
Total write throughput = 32 × 32 = 1,024 MB/s
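The arithmetic above generalizes to any geometry. A quick sketch using the values from the worked example (real tPROG varies with wear, temperature, and programming scheme):

```python
def write_throughput_mb_s(t_prog_s: float, page_kib: int,
                          planes: int, dies: int) -> float:
    """Theoretical interleaved write throughput in MB/s. Ignores
    interface limits, SLC caching, and controller overhead."""
    per_die_plane = (1 / t_prog_s) * page_kib / 1000  # MB/s per die-plane
    return per_die_plane * planes * dies

# 16 KiB pages, tPROG 500 us, 2 planes, 16 dies (512 GB at 256 Gb/die):
print(write_throughput_mb_s(500e-6, 16, 2, 16))  # ~1024 MB/s
```

Doubling the die count (a 1 TB drive) or moving to four-plane flash doubles the theoretical figure, which is why larger drives of the same model are usually faster at sustained writes.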
TLC program time is typically reported as an average across all three page types (LSB, CSB, MSB), with upper pages taking the majority of the time. SLC mode has significantly lower tPROG – roughly 5× faster according to manufacturer specifications – which is why SLC cache speeds far exceed native flash speeds.
Multi-plane operations multiply the effective page size: four-plane flash programs 4 × 16 = 64 KiB per operation. Sub-planes can improve interleaving further for smaller I/O.
Read Performance
Read latency (tR) is typically an order of magnitude lower than native write latency. This means sequential reads often max out the interface or controller bandwidth long before the flash is the bottleneck. For PCIe 5.0 x4 (~14,000 MB/s theoretical), current controllers like the Phison E28 and SM2508 can approach the interface ceiling with 3,600 MT/s flash.
Per-channel bandwidth is limited by the flash interface speed. An eight-channel controller with 3,600 MT/s flash has a theoretical maximum of 3,600 × 8 = 28,800 MB/s raw bandwidth, but protocol overhead (typically 15–20%) and the PCIe interface limit bring actual sequential reads to the 12,000–14,900 MB/s range seen in current PCIe 5.0 drives.
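The same style of estimate works for the read path; the overhead fraction below is the midpoint of the 15–20% range given above, and the PCIe cap is the approximate 5.0 x4 figure:

```python
def usable_read_mb_s(mt_s: int, channels: int,
                     overhead: float = 0.175,
                     interface_cap: float = 14_000) -> float:
    """Sequential-read ceiling: flash-side channel bandwidth minus
    protocol overhead, clamped by the host interface."""
    flash_side = mt_s * channels * (1 - overhead)
    return min(flash_side, interface_cap)

print(usable_read_mb_s(3600, 8))  # flash could feed ~23,760 MB/s; PCIe caps it
print(usable_read_mb_s(1200, 4))  # older 4-channel design: flash-limited
```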
Four-channel controllers with fast flash can still achieve competitive sequential read speeds – 2,400+ MB/s was common with earlier designs, and current four-channel PCIe 4.0 controllers reach 5,000+ MB/s. The trade-off is in IOPS, where fewer channels mean less parallelism for random workloads.
QLC read latency is roughly twice that of TLC at equivalent geometry, which is one reason QLC drives show weaker random read performance even when sequential read speeds are competitive.
Practical Notes
These calculations provide useful approximations but are not precise predictors of real-world performance. Actual tPROG and tR vary with flash wear, temperature, voltage optimization, and the specific programming sequence used. Manufacturers optimize extensively around these parameters, and the gap between theoretical and measured performance can be significant. The math is most useful for understanding relative performance – why one configuration outperforms another – rather than predicting exact numbers.
Metadata, Tables, and Addressing
The FTL tracks numerous metadata structures in dedicated tables:
- Block wearing information (BWI): Erase cycle counts (total and recent) per block, typically 18–19 bits per entry.
- Block state table (BST): 3 bits per block indicating empty/used status and the presence of bad pages. An additional 2 bits can define the block's current mode (00=SLC, 01=MLC, 10=TLC, 11=QLC).
- Page status: For dynamic SLC, per-page tracking (3 bits) of SLC capability, since pages in multi-level cells don't wear evenly.
- Block erasing table (BET), static wear leveler (SWL), write error table (WET): Supporting structures for garbage collection and wear management.
Address Mapping
The standard mapping ratio is 4 bytes per 4 KiB sector (1 byte per KiB), using a flat indirection table that maps logical page numbers (LPN) to physical page numbers (PPN). With modern TLC at 16 KiB physical pages, the FTL must manage the mismatch between the 4 KiB logical page and the 16 KiB physical page, either through subpage indexing or coalescing.
Mapping table compression reduces DRAM requirements. One approach, described in Intel patents, groups contiguous logical addresses and stores a starting offset plus count, achieving compression ratios around 1.78×. Sequential workloads and I/O requests larger than 4 KiB benefit most from compression. Consoles like the PlayStation 5 use coarse-grain mapping optimized for sequential transfers, allowing them to rely primarily on SRAM without large DRAM allocations.
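At 4 bytes per 4 KiB, a flat table for a 1 TB drive needs roughly 1 GB of DRAM, which is why compression matters. The sketch below shows the offset-plus-count idea in miniature – the `compress` function and its run format are illustrative, not Intel's actual encoding:

```python
def compress(mapping):
    """Collapse runs where PPN advances in lockstep with LPN into
    (start_lpn, start_ppn, run_length) tuples."""
    runs = []
    for lpn in sorted(mapping):
        ppn = mapping[lpn]
        last = runs[-1] if runs else None
        if last and lpn == last[0] + last[2] and ppn == last[1] + last[2]:
            runs[-1] = (last[0], last[1], last[2] + 1)  # extend the run
        else:
            runs.append((lpn, ppn, 1))                  # start a new run
    return runs

# A sequential write of 8 logical pages lands in 8 consecutive physical pages:
seq = {lpn: 1000 + lpn for lpn in range(8)}
assert compress(seq) == [(0, 1000, 8)]   # 8 entries collapse to 1 run

# Random writes gain nothing:
rnd = {0: 500, 1: 42, 2: 900}
assert len(compress(rnd)) == 3
```

Sequential workloads collapse to a handful of runs while random writes don't compress at all – matching the observation that large, sequential I/O benefits most.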
Alternative FTL structures include binary trees (vs. flat tables), inode-based schemes, and subpage-aware FTLs (subFTL) that handle the 4 KiB-to-16 KiB mismatch more efficiently. Delta/differential compression and progressive SLC programming are other approaches to managing subpage writes. The exact FTL implementation varies by controller and is typically proprietary.
Data Retention, Stale Data, and Scrubbing
Static data that hasn't been rewritten for an extended period exhibits voltage drift and widening distributions, increasing the bit error rate and read latency. The controller mitigates this through scrubbing – periodically reading and evaluating blocks, and relocating data that has degraded beyond a threshold. This is analogous to the background patrol reads used in enterprise storage systems.
In-place refresh (re-pulsing cells without a full erase cycle) is sometimes possible but more often the controller merges partial blocks or rewrites the data entirely. If the drive is powered off for extended periods, charge leakage eventually exceeds what ECC can correct, resulting in data loss. The rate of retention loss depends on temperature (higher temperatures during storage accelerate leakage), cell type (QLC has tighter margins than TLC), and the wear level of the flash.
The relationship between programming temperature and retention temperature is known as cross-temperature or "swing." Data written at high temperature and read at low temperature (or vice versa) shows elevated error rates because the voltage distributions shift with temperature. This is primarily an enterprise concern – consumer drives rarely experience extreme temperature differentials – but it's one of the factors that JEDEC endurance testing accounts for.
Garbage Collection: Merge Operations
Garbage collection reclaims partially-used blocks through three types of merge operations:

Figure 9: The three merge types. Switch merge is most efficient (direct replacement), full merge is least efficient but most common (gathering valid pages from multiple blocks).
- Switch merge: The most efficient. All pages in the source block have been sequentially updated in the destination block, so the source is simply erased. This only applies when the write pattern aligns perfectly.
- Partial merge: A subset of pages has been sequentially updated. The remaining valid pages are copied from the source to the destination, then the source is erased.
- Full merge: The least efficient but most common. Valid pages from multiple source blocks are gathered and written sequentially into a destination block, then the source blocks are erased. This involves the most data movement and the highest write amplification.
The GC algorithm's block selection strategy – which blocks to clean and when – has a major impact on write amplification and latency spikes. Optimal GC scheduling balances the urgency of freeing blocks against the cost of interrupting user I/O and the benefit of choosing blocks with the most invalid pages. Advanced algorithms factor in block wear, data temperature (hot vs. cold), and the predicted idle window duration.
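The cost accounting behind these choices can be sketched in a few lines; the block records and page counts below are invented for illustration:

```python
def pick_victim(blocks):
    """Greedy victim selection: the block with the fewest valid pages
    costs the least to clean, since every valid page must be copied
    out before the erase."""
    return min(blocks, key=lambda b: sum(b["valid"]))

def full_merge(victims, pages_per_block=64):
    """Full-merge bookkeeping: pages copied (data movement, i.e. write
    amplification) vs. free pages reclaimed."""
    copied = sum(sum(b["valid"]) for b in victims)
    freed = len(victims) * pages_per_block - copied
    return copied, freed

blocks = [
    {"id": 0, "valid": [1] * 60 + [0] * 4},   # mostly valid: poor victim
    {"id": 1, "valid": [1] * 8 + [0] * 56},   # mostly stale: cheap to clean
]
assert pick_victim(blocks)["id"] == 1
print(full_merge([blocks[1]]))  # copy 8 pages, reclaim 56
```

Real firmware weighs more than valid-page count – block wear, data temperature, and idle-window prediction all factor in, as described above.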
Wear-leveling algorithms vary in approach. Evenness-aware schemes prevent static data from occupying any single block for too long, minimizing the maximum erase-count difference between blocks. Dual-pool algorithms classify blocks along two axes – data temperature (hot or cold, inversely proportional to the time between recent writes) and block age (young or old, based on erase count relative to the population) – and migrate cold data onto old blocks. Efficient reliability-aware (ERA) wear leveling goes a step further, using measured bit error rates rather than raw erase counts for a more accurate assessment of actual wear.
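The dual-pool idea reduces to a simple invariant: keep the erase-count spread bounded by occasionally swapping hot and cold data. A minimal sketch (threshold and data layout are assumptions, not a real algorithm's parameters):

```python
# One dual-pool-style wear-leveling step. The most-worn block likely
# hosts hot data; the least-worn likely sits under cold, static data.
# Swapping their contents lets the worn block "rest" under cold data.

def wear_level_step(blocks, threshold):
    most = max(blocks, key=lambda b: b["erase_count"])
    least = min(blocks, key=lambda b: b["erase_count"])
    if most["erase_count"] - least["erase_count"] > threshold:
        most["data"], least["data"] = least["data"], most["data"]
        return True   # cold data migrated onto the worn block
    return False      # spread still within bounds; do nothing
```

Real controllers trigger this lazily (the swap itself costs writes), which is why the threshold matters: too low and wear leveling inflates write amplification, too high and static data pins fresh blocks forever.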
Advanced SSD Topics
Advanced Topic – This chapter covers specialized and enterprise-adjacent SSD topics. Part I (Chapters 1–8) covers everything needed for practical SSD understanding.
Tiering and Caching Solutions
Tiering
Tiered storage splits data across devices of different performance levels – fast SSDs for hot (frequently accessed) data and slower HDDs or QLC drives for cold (archival) data. The system monitors access patterns and automatically migrates data between tiers based on a heatmap.
Microsoft's Storage Spaces is the most accessible tiering solution on Windows, supporting SSD and HDD tiers with automatic data movement. AMD's StoreMI (based on FuzeDrive by Enmotus) was a popular consumer option but has been discontinued. Enterprise environments have more options including ZFS tiering, Ceph, and purpose-built storage appliances.
On-drive tiering also exists. The SLC cache mechanism in consumer drives is itself a form of internal tiering – fast single-bit storage for incoming writes, with data migrated to the denser native flash. Intel's former H10 and H20 hybrid drives combined Optane memory with QLC NAND on a single M.2 stick, though both have been discontinued following Intel's exit from Optane.
Caching
Caching uses an SSD to accelerate I/O for a larger, slower storage pool – typically an HDD array. Unlike tiering, caching doesn't permanently move data; it maintains a fast copy of the most frequently accessed blocks. Solutions include DrivePool (filesystem-level), PrimoCache (block-level), and ZFS's L2ARC (read cache) and SLOG (write intent log).
For NAS environments using write or read/write caches, redundancy in the cache (mirrored SSDs) is important – a cache failure during a write can result in data loss. Read-only caches don't need redundancy since the authoritative data remains on the backing storage.
RAID with SSDs
SSDs can be used in RAID configurations like any other storage device. However, the performance case for SSD RAID is weaker than for HDDs. SSDs are already fast enough for most workloads, and the benefits of striping (RAID-0) only materialize at high queue depths that consumer workloads rarely reach. RAID-1 (mirroring) for redundancy remains useful.
Consumer-accessible RAID – through motherboard chipsets, Windows Disk Management, Storage Spaces, or Intel RST – is all effectively software RAID, handled by the system CPU with corresponding overhead. Intel's "Fake RAID" (UEFI-assisted RAID) adds firmware for boot management but is still CPU-processed. True hardware RAID controllers exist but are expensive and primarily enterprise-oriented.
RAID with SSDs introduces TRIM support complications. Not all RAID implementations pass TRIM commands to the underlying drives, which can degrade performance over time. Windows Storage Spaces and recent Linux mdadm versions support TRIM in RAID, but verify before relying on it.
PCIe Bifurcation
PCIe bifurcation splits a single high-lane-count PCIe slot into multiple lower-lane-count connections. A x16 slot can be bifurcated into x8/x8 or x8/x4/x4 configurations, allowing multiple NVMe drives to share a single physical slot.
An important distinction: CPU-direct PCIe lanes can be bifurcated (if the motherboard's firmware supports it), but chipset lanes sit behind the chipset – effectively a PCIe switch sharing a single uplink to the CPU – and are managed differently. Most consumer motherboards dedicate their CPU-direct lanes to the GPU, so bifurcation for storage usually means sharing with, or taking lanes from, the GPU slot.
Running multiple drives in one slot requires either firmware-level bifurcation support (a BIOS/UEFI option that splits the slot, after which an inexpensive passive adapter card suffices) or an adapter card with its own onboard PCIe switch – the latter works on any board but is expensive. Check your motherboard documentation before purchasing a passive bifurcation adapter.
Note that a PCIe link negotiates both generation and width down to the lowest common denominator: a PCIe 3.0 x4 drive in a PCIe 4.0 x2 slot runs as a x2 PCIe 3.0 link.
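The negotiation rule is easy to put in numbers. Using rounded per-lane throughput figures (approximate usable bandwidth after encoding overhead; exact values vary):

```python
# Approximate usable GB/s per lane by PCIe generation.
PER_LANE_GBPS = {3: 0.985, 4: 1.969, 5: 3.938}

def negotiated_bandwidth(drive_gen, drive_lanes, slot_gen, slot_lanes):
    # Generation and lane width each settle at the lower of the two sides.
    gen = min(drive_gen, slot_gen)
    lanes = min(drive_lanes, slot_lanes)
    return PER_LANE_GBPS[gen] * lanes
```

A PCIe 3.0 x4 drive in a PCIe 4.0 x2 slot negotiates to Gen3 x2 – roughly 1.97 GB/s, half of what either side could do with a matching partner.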
Console Storage
PlayStation 5
The PS5 uses Kioxia BiCS flash soldered directly to the mainboard, originally 825 GB (updated models ship with 1 TB). Sequential read speeds reach 5.5 GB/s raw, or up to 9 GB/s effective with the system's hardware-accelerated Oodle Kraken decompression (with Zlib as a fallback).
Sony's official specification for the PS5 expansion slot requires a PCIe Gen4 x4 M.2 NVMe drive in 2230, 2242, 2260, 2280, or 22110 sizes, with 5,500 MB/s sequential read recommended for optimal performance. In practice, any PCIe 4.0 NVMe drive works well regardless of whether it hits that sequential read target. Community testing has shown that even PCIe 3.0 NVMe drives function in the slot with slightly longer load times, but this is outside Sony's official compatibility and not guaranteed across firmware updates.
Xbox Series X|S
The Xbox Series X uses a proprietary expansion card based on the CFexpress standard, initially exclusive to Seagate (third-party cards are now available from WD and others). Internally, the Series X uses an OEM WD SN530 while the Series S uses an SSSTC CL1. The Seagate expansion card is built around a Phison E19T controller. The internal drive socket supports x2 PCIe 4.0 or x4 PCIe 3.0 operation.
Texture compression uses BCPACK (up to 2× for textures) with Zlib fallback.
DirectStorage and GPU-Direct I/O
DirectStorage allows games to load compressed assets directly from NVMe storage to the GPU, bypassing the traditional CPU decompression bottleneck. The technology works with any NVMe drive using the Microsoft inbox driver – no special SSD hardware is required.
Adoption has been gradual. Forspoken was the first PC title with DirectStorage support, followed by Ratchet & Clank: Rift Apart, Forza Motorsport, and Horizon Forbidden West, among others. Results have been mixed – some titles showed significant load time improvements while others exposed integration challenges. Microsoft has continued developing the ecosystem with Zstandard compression support and the Game Asset Conditioning Library (GACL).
DirectStorage is compatible with Windows 10 version 1909 and later, including the GPU decompression API. However, Windows 11 provides additional storage stack optimizations – specifically the bypass of legacy I/O path layers – that improve DirectStorage performance. Titles using DirectStorage will function on Windows 10, but the full performance benefit is realized on Windows 11.
On-Drive Compression
Some controllers can compress data on the fly before writing to flash, reducing write amplification (potentially below 1.0) and increasing effective capacity. SandForce's controllers popularized this approach, and the technology persists in some enterprise drives (Seagate's DuraWrite, Phison's SmartZIP).
The trade-off is that compression only benefits compressible data – incompressible data (already-compressed media, encrypted files) sees no benefit and may incur controller overhead. Compression also affects benchmarking: tests with zeroed or patterned data will show inflated results compared to real-world incompressible workloads. Data entropy is the key variable.
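The entropy effect is easy to reproduce on the host side; here zlib stands in for the controller's compressor (the drive's actual algorithm is proprietary):

```python
import os
import zlib

def compressed_ratio(data: bytes) -> float:
    """Compressed size / original size; below 1.0 means savings."""
    return len(zlib.compress(data)) / len(data)

patterned = b"\x00" * 1_000_000       # zero-fill, like naive benchmarks
high_entropy = os.urandom(1_000_000)  # models encrypted/compressed data
```

compressed_ratio(patterned) comes out well under 1% – a compressing controller barely touches the flash – while compressed_ratio(high_entropy) lands slightly above 1.0, since incompressible data gains nothing and even pays a small framing overhead. This is exactly why zero-fill benchmark numbers on such drives don't transfer to real workloads.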
PCIe 5.0 Thermal Challenges
PCIe 5.0 brought a significant increase in power consumption for SSD controllers. First-generation PCIe 5.0 controllers like the Phison E26 (12nm) drew 7–10W under active workloads, creating thermal challenges for the M.2 form factor. The industry responded with a process node reduction to 6nm for second-generation controllers (Phison E28, SM2508), substantially improving power efficiency at both active and idle states.
Many PCIe 5.0 M.2 drives require heatsinks or active cooling to sustain peak performance without throttling. Motherboard M.2 heatsink designs have evolved accordingly, with some boards offering dedicated airflow paths or integrated heatpipe solutions for M.2 slots.
CXL and Future Interconnects
Compute Express Link (CXL) is an open interconnect standard built on the PCIe physical layer that enables coherent memory sharing between CPUs, accelerators, and memory devices. CXL 2.0 is in production with devices from Samsung, Micron, and others. CXL 3.1 (finalized 2024) adds enhanced fabric capabilities for memory pooling.
CXL is relevant to storage through CXL-attached flash devices that can present NAND as byte-addressable memory – a potential successor to some use cases that Intel Optane Persistent Memory once filled. CXL 3.x fabric-attached memory pools may further blur the line between storage and memory tiers in datacenter environments.
For consumers, CXL is not yet directly relevant. Its impact will be felt indirectly if CXL-attached memory tiers reduce demand for high-endurance NAND in enterprise, potentially shifting NAND supply and pricing dynamics.
CRC and Data Protection
Cyclic redundancy check (CRC) is used both internally within the SSD (protecting data on the bus between controller and flash) and externally (protecting data during NVMe transfers). When NVMe protection types are enabled, each data block carries an additional 8-byte metadata field: 2 bytes for CRC on the user data and up to 6 bytes for additional reference and application tags. This end-to-end data protection is primarily used in enterprise environments where silent data corruption is unacceptable.
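The 2-byte guard tag in that format uses the T10-DIF CRC-16 polynomial (0x8BB7). A slow bit-at-a-time reference implementation, for illustration only – hardware uses table- or circuit-based versions:

```python
def crc16_t10dif(data: bytes) -> int:
    """CRC-16/T10-DIF: polynomial 0x8BB7, init 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8              # feed the next byte, MSB-first
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc
```

The standard check value for the ASCII string "123456789" is 0xD0DB, which this routine reproduces – a quick sanity test for any CRC implementation.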
Other Memory Technologies
Advanced Topic – This chapter covers memory technologies beyond NAND flash. Part I (Chapters 1–8) covers everything needed for practical SSD understanding.
3D XPoint and Optane (Historical)
3D XPoint, branded as Optane by Intel, was a non-volatile memory technology fundamentally different from NAND. It was write-in-place memory – data could be overwritten without the erase-then-program cycle that NAND requires. This eliminated the need for garbage collection and most over-provisioning, and it meant that performance didn't degrade as the drive filled. 3D XPoint was classified as storage-class memory (SCM), sitting between DRAM and NAND in the performance and cost hierarchy.
Optane's strengths were exceptional low-queue-depth performance, very low access latency, and endurance far beyond any NAND technology. It was particularly valued for write-intensive enterprise workloads and as a persistent memory tier (Intel Optane Persistent Memory) in server configurations. Some SLC NAND implementations were positioned as low-latency competitors, but Optane consistently held advantages in random write latency and endurance.
Intel announced the wind-down of Optane in July 2022, taking a $559M write-off on related equipment. Neither Intel nor Micron (who co-developed the underlying technology) produces 3D XPoint any longer. Micron sold its Dalian fabrication facility, and Intel's Optane SSDs reached end-of-life in 2024. No direct successor has emerged in commercial production, leaving a gap in the storage hierarchy that CXL-attached memory devices and advanced NAND configurations are attempting to fill.
NOR Flash
NOR flash, based on NOR gate logic, is a specialized memory technology with different characteristics than NAND. NOR is significantly faster for random reads – it provides byte-level addressability with enough SRAM to map the entire chip – but is more expensive per bit and slower to write and erase. These characteristics make NOR unsuitable for bulk data storage but ideal for applications requiring fast, reliable code execution: BIOS/UEFI firmware, automotive systems, IoT devices, and other embedded applications where read speed and data integrity matter more than density.
NOR comes in parallel and serial (SPI) variants, with serial NOR being the more common form in modern designs due to its simpler interface and lower pin count.
Hybrid Storage Devices
Some drives have combined multiple memory types on a single device. Intel's Optane Memory H10 and H20 put 3D XPoint and QLC NAND on the same M.2 stick, using the Optane as a cache tier for the QLC storage. Both were discontinued with the rest of the Optane line.
Enmotus's MiDrive (shipped briefly as the FuzeDrive SSD) combined SLC and QLC flash with software-driven data placement. Enmotus has since gone out of business.
The concept of heterogeneous memory on a single device remains valid – having a fast, high-endurance tier co-located with a dense, cost-effective tier is architecturally sound. No widely available consumer products currently use this approach, but the internal SLC/TLC or SLC/QLC caching structure in modern drives is a simpler version of the same idea.
Smartphone and Embedded Flash
Mobile devices use embedded flash standards rather than the SATA or NVMe interfaces found in PCs. eMMC (embedded MultiMediaCard) was the long-standing standard but has been largely replaced by UFS (Universal Flash Storage) in modern smartphones and tablets. UFS 4.0 (2022) and UFS 4.1 (2024) offer sequential speeds up to 4.2 GB/s over two lanes with significant improvements in power efficiency, making them competitive with entry-level NVMe performance while maintaining the low power profile required for battery-powered devices.
MRAM and Emerging Non-Volatile Memory
Several non-volatile memory technologies are in various stages of development and early production, though none has emerged as a clear successor to NAND for bulk storage.
MRAM (Magnetoresistive RAM) stores data using magnetic states rather than electrical charge. Embedded MRAM (eMRAM) is already in production for specific applications – GlobalFoundries offers it as a cost-effective solution for low-power, non-volatile code and data storage in embedded systems. MRAM's advantage is near-DRAM speed with non-volatility, but current densities are far below what NAND achieves.
Phase-change memory (PCM) was the basis for 3D XPoint and remains an area of research. Other PCM implementations may find use in embedded and automotive applications where NAND's write endurance limitations are a concern.
Resistive RAM (ReRAM/RRAM) and ferroelectric RAM (FeRAM) are additional emerging technologies targeting the gap between DRAM and NAND. Both offer faster write speeds and better endurance than NAND but at much higher cost per bit. Their near-term impact is most likely in embedded and edge computing rather than consumer storage.
Key:Value SSDs and Computational Storage
Traditional SSDs present a block-level interface – the host reads and writes fixed-size blocks at specific addresses. Key:value (KV) SSDs instead implement an object-level interface where the host stores and retrieves data by key, and the drive's firmware handles the mapping internally through an Object Translation Layer (OTL). This eliminates the host-side overhead of translating between application-level data structures and block addresses, potentially improving both performance and efficiency for database and object storage workloads. Samsung has been a leader in KV-SSD development.
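To make the interface difference concrete, here is a toy object-translation layer – a dictionary-backed sketch of what KV-SSD firmware internalizes. All names are illustrative; real drives expose this through NVMe KV commands, not Python:

```python
class ToyKVStore:
    """Key:value front end over a block store: the OTL maps each key
    to the physical blocks holding its value."""

    def __init__(self, block_size: int = 4096):
        self.block_size = block_size
        self.blocks = {}      # physical block number -> bytes
        self.otl = {}         # key -> list of physical block numbers
        self.next_block = 0

    def put(self, key: str, value: bytes) -> None:
        new_blocks = []
        for off in range(0, len(value), self.block_size):
            self.blocks[self.next_block] = value[off:off + self.block_size]
            new_blocks.append(self.next_block)
            self.next_block += 1
        # Blocks holding the old value become stale -> GC candidates.
        for stale in self.otl.get(key, []):
            self.blocks.pop(stale, None)
        self.otl[key] = new_blocks

    def get(self, key: str) -> bytes:
        return b"".join(self.blocks[b] for b in self.otl[key])
```

The host never computes block addresses; it just names objects. That address bookkeeping is the host-side overhead a KV-SSD removes from the software stack.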
Computational storage takes this further by offloading processing to the storage device itself. Rather than moving data from the SSD to the CPU for processing, the drive performs operations – compression, encryption, search, database queries – locally and returns only the results. The SNIA Computational Storage Services (CSS) standard defines the architectural framework. Samsung and ScaleFlux have shipped early devices, and the approach is particularly promising for analytics and AI workloads where the volume of data movement between storage and compute is a bottleneck.
Both technologies are currently enterprise-focused but represent the direction of storage-compute convergence that may eventually influence consumer and prosumer products.
Glossary of Terms
3D NAND – NAND flash with memory cells stacked vertically in multiple layers, as opposed to planar (2D) NAND where cells are arranged in a single layer.
AHCI – Advanced Host Controller Interface. The protocol standard used with SATA storage devices. Designed for hard drives and less efficient than NVMe for solid state storage.
APST – Autonomous Power State Transition. An NVMe feature allowing the drive to manage its own power state transitions based on idle time.
ASIC – Application-Specific Integrated Circuit. In SSDs, the controller is an ASIC designed specifically for managing flash storage.
BiCS – Bit Cost Scalable. Kioxia/WD's 3D NAND architecture, a form of charge trap flash.
Block – The smallest unit of NAND that can be erased. Contains hundreds of pages. Modern blocks are on the order of 24 MiB.
BOM – Bill of Materials. The list of components used to manufacture a drive. BOM swaps – changing components without changing the model number – are a known industry practice.
Charge Trap (CT) – A 3D NAND cell architecture that stores charge in an insulating material rather than a conductive floating gate. Most modern 3D NAND uses charge trap technology.
CXL – Compute Express Link. An open interconnect standard built on PCIe that enables coherent memory sharing between CPUs, accelerators, and memory devices.
Die – A single piece of silicon containing NAND flash cells. The logical unit that the controller directly interfaces with. Current dies hold 512 Gb to 1+ Tb of data.
DRAM – Dynamic Random-Access Memory. Volatile memory used on SSDs to cache the FTL mapping table and metadata. Orders of magnitude faster than NAND.
DRAM-less – An SSD design without external DRAM, relying on the controller's SRAM and Host Memory Buffer (HMB) for mapping metadata.
DWPD – Drive Writes Per Day. The number of times a drive's full capacity can be written per day within the warranty period. Derived from TBW.
ECC – Error Correction Code. Algorithms (typically LDPC in modern SSDs) that detect and correct bit errors in NAND flash data.
EDSFF – Enterprise and Data Center Standard Form Factor. Server-optimized SSD form factors including E1.S and E3.S.
FDP – Flexible Data Placement. An NVMe feature (TP4146) that allows the host to guide data placement on the SSD, reducing write amplification.
Floating Gate (FG) – The original NAND cell architecture, storing charge on an electrically isolated conductive layer. Largely replaced by charge trap in current 3D NAND.
Folding – The process of migrating data from SLC cache blocks to native (TLC/QLC) flash blocks. Done on-die, sequentially, to minimize write amplification.
FTL – Flash Translation Layer. The firmware on the SSD controller that manages address translation, wear leveling, garbage collection, error correction, and all other flash management functions.
Garbage Collection (GC) – The process by which an SSD reclaims partially-used blocks by moving valid data to new blocks and erasing the old ones.
HMB – Host Memory Buffer. A feature allowing NVMe SSDs to use a portion of the host system's RAM for mapping metadata. Used primarily by DRAM-less drives.
Interleaving – Overlapping operations across multiple NAND dies or planes to increase throughput. The primary mechanism for achieving high sequential performance.
IOPS – Input/Output Operations Per Second. A measure of how many individual read or write operations a drive can handle. Most relevant at high queue depths.
ISPP – Incremental Step-Pulse Programming. The process of programming a NAND cell by increasing voltage in progressively smaller steps until the target level is reached.
LBA – Logical Block Address. The address scheme the operating system uses to reference data locations. The FTL translates LBAs to physical locations on the NAND.
LDPC – Low-Density Parity-Check code. The error correction algorithm used in modern SSDs, supporting both hard-decision and soft-decision decoding.
M.2 – A compact form factor and connector standard for SSDs and other expansion cards. M.2 2280 (22 mm × 80 mm) is the most common size for consumer NVMe SSDs.
MLC – Multi-Level Cell. NAND storing 2 bits per cell. Technically refers to any cell with more than one bit, but industry convention uses MLC to mean specifically 2 bits. Largely phased out of consumer drives.
MT/s – Megatransfers Per Second. The data transfer rate of the NAND flash interface, equal to twice the clock speed (DDR).
NAND – A type of non-volatile flash memory named after the NOT-AND logic gate. The primary storage medium in SSDs.
NVMe – Non-Volatile Memory Express. The protocol standard designed for solid state storage over PCIe. Offers lower latency, deeper command queues, and better efficiency than AHCI.
Over-Provisioning (OP) – Flash capacity reserved for internal SSD operations – garbage collection, wear leveling, spare blocks – that is not visible to the user.
P/E Cycle – Program/Erase Cycle. One complete cycle of programming and then erasing a NAND block. The primary measure of flash endurance.
Page – The smallest unit an SSD can write. Typically 16 KiB for modern TLC flash.
Plane – A subdivision of a NAND die with its own page buffer and circuitry. Multi-plane operations enable parallel access within a die. Current flash typically has 4 planes per die.
PLC – Penta-Level Cell. NAND storing 5 bits per cell (32 voltage states). Not yet in commercial production as of early 2026.
pSLC – Pseudo-SLC. Base flash (TLC or QLC) operating in single-bit mode for use as a fast write cache. Not equivalent to native SLC.
PUC/CuA/CoP – Periphery-Under-Cell / CMOS-under-Array / Core-over-Periphery. Design approaches that place peripheral control circuitry underneath the NAND cell array to save die area.
QLC – Quad-Level Cell. NAND storing 4 bits per cell (16 voltage states). Favors capacity and cost over performance and endurance.
Queue Depth (QD) – The number of I/O commands outstanding at the SSD at once. Most consumer workloads operate at QD1–QD4.
RAIN/RAISE – Redundant Array of Independent NAND / Silicon Elements. RAID-like parity protection across NAND dies within an SSD, providing data recovery if a die fails.
SATA – Serial ATA. The older storage interface standard, supporting up to ~600 MB/s bandwidth. Used with 2.5" SSDs and some M.2 drives.
SLC – Single-Level Cell. NAND storing 1 bit per cell (2 voltage states). Fastest and most durable cell type. Native SLC is rare in consumer products; most SLC in consumer SSDs is pSLC.
SMART – Self-Monitoring, Analysis and Reporting Technology. A built-in health monitoring system that tracks drive parameters including temperature, data written, and error rates.
Superblock – A group of blocks with the same offset across all dies and planes, written as a unit for maximum parallelism.
String-Stacking – Manufacturing technique that bonds two or more shorter 3D NAND stacks together to achieve higher total layer counts. Now universally used for 200+ layer NAND.
TBW – Terabytes Written (or Total Bytes Written). The warranted write endurance of an SSD.
TLC – Triple-Level Cell. NAND storing 3 bits per cell (8 voltage states). The current mainstream technology for consumer SSDs, balancing capacity, performance, and endurance.
TRIM – An ATA command (UNMAP for SCSI) that tells the SSD which data blocks are no longer in use, allowing the controller to reclaim them during garbage collection.
V-NAND – Samsung's brand name for its 3D NAND technology, a form of charge trap flash with peripheral circuitry under the cell array.
Wear Leveling – FTL algorithms that distribute P/E cycles evenly across all NAND blocks to prevent premature wear of frequently-written blocks.
Write Amplification (WA) – The ratio of data written to NAND versus data written by the host. A WA of 2.0 means the drive writes twice as much to flash as the host requested. Lower is better.
ZNS – Zoned Namespaces. An NVMe feature that organizes the drive's address space into zones with sequential write requirements, reducing garbage collection overhead.
Standards Organizations
JEDEC
The JEDEC Solid State Technology Association is the primary standards body for the semiconductor industry. For SSDs, JEDEC defines standards for NAND flash specifications, endurance testing methodology, and reliability metrics. JEDEC's endurance and data retention test standards are the basis for the TBW and DWPD ratings on consumer and enterprise drives. JEDEC also standardizes DRAM specifications (DDR4, DDR5, LPDDR) used in SSD designs.
Website: jedec.org
ONFI
The Open NAND Flash Interface Working Group develops interface standards for NAND flash communication. ONFI specifications define the electrical interface, command set, and data transfer protocols between the SSD controller and the NAND flash packages. ONFI 5.1 supports transfer rates up to 3,600 MT/s; ONFI 6.0 (January 2026) extends NV-LPDDR4 to 4,800 MT/s. Samsung and Kioxia use their own "toggle mode" interface specification, though both converge under JEDEC standardization.
Website: onfi.org
SNIA
The Storage Networking Industry Association develops standards for the broader storage industry, including the Solid State Storage Performance Test Specification (SSS PTS) for SSD benchmarking, the Computational Storage Services (CSS) architecture, and various enterprise storage management standards.
Website: snia.org
NVM Express
The NVM Express organization develops and maintains the NVMe specification. The current major base revision is 2.1, structured as a library of specifications covering the base protocol, command sets (including Zoned Namespaces and Key Value), transport specifications (NVMe-oF for fabrics), and management interface (NVMe-MI). NVM Express publishes ongoing revision change documents between major versions.
Website: nvmexpress.org
JTAG
JTAG (Joint Test Action Group) is an industry standard for testing and debugging printed circuit boards. In the SSD context, JTAG access allows hardware-level diagnosis and data recovery through specialized tools like the PC-3000 SSD and software like OpenOCD. This is primarily relevant for professional data recovery when a drive has failed at the firmware or controller level.
Recommended Software and Tools
Health Monitoring
- CrystalDiskInfo – Free, widely-used SMART monitoring tool for both SATA and NVMe drives. Shows temperature, total data written, health percentage, and raw SMART attributes.
- Hard Disk Sentinel – More detailed health monitoring with performance tracking over time. Paid, with a trial version available.
- smartmontools – Command-line SMART monitoring for Linux and Windows. Useful for scripting and remote monitoring. Supports both SATA and NVMe.
Benchmarking
- CrystalDiskMark – The most common quick benchmark. Tests sequential and random read/write at configurable queue depths and thread counts. Good for comparing drives but be aware of SLC cache effects on write results.
- FIO (Flexible I/O Tester) – The industry-standard storage benchmarking tool. Fully configurable workloads: block size, queue depth, read/write mix, I/O pattern, duration, and more. Available on Linux and Windows. ezFIO provides a simplified frontend.
- elbencho – A newer storage benchmarking tool with support for distributed multi-host testing, object storage, and GPU data-path scenarios.
- ATTO Disk Benchmark – Tests sequential performance across a range of transfer sizes. Useful for identifying the point where a drive saturates its interface.
Manufacturer Toolboxes
Most major SSD manufacturers offer free management software:
- Samsung Magician – SMART monitoring, firmware updates, performance benchmarking, secure erase, over-provisioning management.
- WD Dashboard – Health monitoring, firmware updates, drive settings for Western Digital and SanDisk drives.
- Crucial Storage Executive – Health monitoring, firmware updates, Momentum Cache (generally not recommended).
- Kingston SSD Manager – Health monitoring and firmware updates for Kingston drives.
- Seagate SeaTools – Health monitoring and diagnostics for Seagate drives.
NVMe Management
- nvme-cli – Command-line NVMe management tool for Linux. Supports format, sanitize, firmware update, SMART log retrieval, and low-level NVMe commands that may not be available through manufacturer toolboxes or Windows.
- StorNVMe (inbox driver) – The Microsoft inbox NVMe driver. Sufficient for all consumer use including HMB and DirectStorage support. Custom NVMe drivers are no longer necessary for most drives.
Maintenance and Utilities
- Parted Magic – Bootable Linux environment with GParted, secure erase tools, and disk utilities. Useful for drive preparation and wiping.
- Link Shell Extension – Windows shell extension for creating hard links and junctions. Useful for managing data across multiple drives without duplicating files.
Additional Resources
- SSD Buying Guide – Interactive guide for selecting an SSD based on your needs.
- SSD Spreadsheet – Detailed hardware breakdown of current and recent SSDs (controller, flash, DRAM, cache design).
- r/NewMaxx – Community discussion, buying advice, and additional resources.
- borecraft.com – Technical documents, papers, and tools.
References and Further Reading
Textbooks
- Micheloni, R., Marelli, A. and Eshghi, K. (2018). Inside Solid State Drives. 2nd ed. New York: Springer. – The primary technical reference for SSD internals, covering controller architecture, FTL design, wear-leveling algorithms, NAND cell physics, and error correction. Cited throughout this guide.
- Micheloni, R. (ed.) (2024). 3D Flash Memories. 2nd ed. Springer. – Covers 3D NAND architectures including charge trap, replacement gate, string-stacking, and peripheral-under-cell designs.
- Brewer, J.E. and Gill, M. (2008). Nonvolatile Memory Technologies with Emphasis on Flash. Wiley-IEEE Press. – Foundational reference on non-volatile memory physics and engineering.
Technical Papers and Conference Proceedings
- Cai, Y., Haratsch, E.F., Mutlu, O. and Mai, K. (2013). "Error Patterns in MLC NAND Flash Memory: Measurement, Characterization, and Analysis." Proceedings of the Design, Automation & Test in Europe Conference (DATE). – Comprehensive analysis of error mechanisms in NAND flash. Available at borecraft.com/documents/Errors_in_Flash_SSDs.pdf
- Cai, Y., Ghose, S., Haratsch, E.F., Luo, Y. and Mutlu, O. (2017). "Errors in Flash-Based Solid State Drives: Analysis, Mitigation, and Recovery." arXiv:1711.11427. – Extended survey covering voltage programming (ISPP, foggy-fine), read disturb, error correction, data retention, and garbage collection. Available at arxiv.org/abs/1711.11427
- Zhao, W. and Zhang, T. (2013). "LDPC-in-SSD: Making Advanced Error Correction Codes Work Effectively in Solid State Drives." Proceedings of the 11th USENIX Conference on File and Storage Technologies (FAST '13). – Hard vs. soft decision LDPC decoding, progressive sensing, multi-step error correction.
- Cai, Y., Mutlu, O., Haratsch, E.F. and Mai, K. (2013). "Program Interference in MLC NAND Flash Memory: Characterization, Modeling, and Mitigation." Proceedings of the IEEE International Conference on Computer Design (ICCD). – Cell-to-cell coupling interference in floating gate NAND.
- Bjørling, M., Gonzalez, J. and Bonnet, P. (2017). "LightNVM: The Linux Open-Channel SSD Subsystem." Proceedings of the 15th USENIX Conference on File and Storage Technologies (FAST '17). – Open-channel SSD framework, host-based FTL concepts. Precursor to ZNS and FDP.
- Lee, S., Shin, D., Kim, Y. and Kim, J. (2008). "LAST: Locality-Aware Sector Translation for NAND Flash Memory-Based Storage Systems." ACM SIGOPS Operating Systems Review, 42(6). – Merge operation types (switch, partial, full) for garbage collection.
- Jin, Y. et al. (2017). "KAML: A Key-Addressable Multi-Log Structured Merge Tree for SSD." Proceedings of the 2017 IEEE International Conference on Computer Design (ICCD). – Key:value SSD architecture with custom FTL.
- Prabhakaran, V., Rodeheffer, T.L. and Zhou, L. (2008). "Transactional Flash." Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI '08). – Transactional semantics for flash memory systems.
Industry Standards and Specifications
- NVM Express (2024). NVM Express Base Specification, Revision 2.1. nvmexpress.org – Current NVMe protocol specification covering command sets, namespaces, and controller features.
- NVM Express (2023). NVMe Zoned Namespaces Command Set Specification (TP4053). nvmexpress.org – Sequential write zone model for reduced garbage collection.
- NVM Express (2023). NVMe Flexible Data Placement (TP4146). nvmexpress.org – Host-guided data placement for write amplification reduction. Ratified standard.
- ONFI (2024). Open NAND Flash Interface Specification, Revision 5.1. onfi.org – NAND flash interface standard supporting up to 3,600 MT/s.
- ONFI (2026). Open NAND Flash Interface Specification, Revision 6.0. onfi.org – NV-LPDDR4 interface extending to 4,800 MT/s. Published January 2026.
- JEDEC (2020). JESD218B.01: Solid-State Drive (SSD) Requirements and Endurance Test Method. jedec.org – SSD endurance and retention testing methodology. Basis for TBW and DWPD ratings.
- JEDEC (2022). JESD79-5B: DDR5 SDRAM Standard. jedec.org – DDR5 DRAM specification, relevant to SSD DRAM subsystems.
- PCI-SIG (2022). PCI Express Base Specification, Revision 6.0. pcisig.com – PCIe 6.0 with PAM4 signaling, 64 GT/s per lane.
- PCI-SIG (2019). PCI Express Base Specification, Revision 5.0. pcisig.com – PCIe 5.0, 32 GT/s per lane, NRZ signaling. Current high-end consumer SSD interface.
- CXL Consortium (2024). Compute Express Link Specification, Revision 3.1. computeexpresslink.org – CXL 3.1 with enhanced fabric capabilities for memory pooling and sharing.
- SNIA (2022). Computational Storage Architecture and Programming Model. snia.org – Framework for computational storage devices.
- SNIA (2013). Solid State Storage Performance Test Specification (SSS PTS), Enterprise v2.0.1. snia.org – Standardized SSD benchmarking methodology.
- UFS Association / JEDEC (2024). Universal Flash Storage (UFS) Specification, Version 4.1. jedec.org – Mobile/embedded flash storage standard, up to 23.2 Gbps per lane (about 4.2 GB/s sequential device throughput).
Manufacturer Technical Documents and White Papers
- Samsung (2021). "A 176-Stacked 512Gb 3b/Cell 3D-NAND Flash with 10.8 Gb/mm² Density." IEEE International Solid-State Circuits Conference (ISSCC 2021), Session 30.3. – Trade-offs of peripheral-under-cell design in Samsung 176L V-NAND.
- Samsung (2025). "400+ Layer V-NAND." IEEE International Solid-State Circuits Conference (ISSCC 2025). – Samsung V10 demonstration with 5.6 GT/s interface speed.
- SK hynix (2025). "Split-Cell Charge Trap 5-Bit-Per-Cell NAND." IEEE International Electron Devices Meeting (IEDM 2025). – Charge trap split-cell PLC research, most promising path toward commercial PLC.
- SK hynix (2025). "321-Layer Triple-Deck 3D NAND." – First triple-deck string-stacking design in production.
- Micron (2023). "232-Layer 3D NAND (G8) Technology Brief." micron.com – Current-generation Micron NAND with replacement gate architecture and CuA.
- Micron (2024). "276-Layer 3D NAND (G9) Technology Brief." micron.com – Micron G9 with 3,600 MT/s interface speed.
- Western Digital (2021). "SSD Endurance: Speeds, Feeds, and Needs." blog.westerndigital.com – Consumer-oriented explanation of TBW, DWPD, and P/E cycles.
- Seagate (2017). "Multi-Tier Caching Technology." seagate.com – White paper on FireCuda SSHD tiering architecture. Historical reference.
- Micron (2015). "SSD SMART Attributes and WAF Calculation (TN-FD-23)." micron.com – Write amplification factor calculation methodology using SMART data.
- Micron (2015). "Dynamic Write Acceleration Brief." micron.com – Write amplification additive factor from dynamic SLC caching.
- Hambrey, S. (2018). "Threshold Calibration." Flash Memory Summit 2018, Session FTEC-202-1. – Voltage threshold calibration techniques for reducing raw bit error rates.
- Haratsch, E.F. (2017). "Voltage Calibration." Flash Memory Summit 2017, Session FE22. – Voltage calibration methods for improving read accuracy in aging NAND.
- Cox, A. "Temperature and Data Retention in NAND Flash." JEDEC presentation. – Temperature dependence of data retention, cross-temperature effects.
- Microsoft (2022). "DirectStorage Developer Overview." developer.microsoft.com – DirectStorage API for GPU-direct decompression of NVMe storage.
U.S. Patents
- US 20190294345A1 – Controller endurance translation logic (ETL) for DRAM cache management.
- US 8,664,780 – Die stacking methods for NAND flash packages.
- US 10,453,533 – Tile floorplan architecture with shared page buffers (Intel/Micron).
- US 7,180,786 – Row decoder and wordline decoder for flash memory.
- US 10,192,626 – Differential storage device for on-NAND power loss protection (Micron).
- US 9,711,229 – Partial block erase techniques for NAND flash.
- US 10,325,665 – Multi-deck erase operations with independent deck floating (Intel, precursor to 144L QLC).
- US 20110090738A1 – Dummy cells in 3D vertical NAND structures.
- US 9,201,788 – Hybrid block scheme for reducing erase operations during SLC-to-TLC folding.
- US 20190102083A1 – Multi-mode NAND configuration (pTLC + SLC buffer + QLC).
- US 9,858,009 – Foldset management for on-die copyback operations.
- US 7,562,189 – Write-in-place memory architecture (3D XPoint related).
Online Resources
- borecraft.com – SSD technical resources, documents, and tools maintained by NewMaxx.
- guide.borecraft.com – Interactive SSD buying guide.
- SSD Spreadsheet – Detailed hardware breakdown of current and recent SSDs.
- r/NewMaxx – Community discussion, SSD guides, and buying advice.
- nvmexpress.org – NVMe specification documents, technical proposals, and ecosystem information.
- jedec.org – JEDEC standards for NAND, DRAM, and SSD testing methodology.
- onfi.org – Open NAND Flash Interface specifications.
- snia.org – Storage Networking Industry Association standards and educational resources.
- computeexpresslink.org – CXL specification and ecosystem documentation.