Deep Research · Platform & I/O

Project Helix
Zstandard, DirectStorage, and Streaming From SSD

The next Xbox is being built around a content pipeline that treats the NVMe SSD less like storage and more like an extension of the GPU's memory hierarchy. Zstandard is the codec that makes that pipeline practical, DirectStorage 1.4 is the runtime that ships it to PC, and the early shipped titles are showing why a fixed-hardware platform can pull this off where a heterogeneous PC install base still struggles.

Published 2026-05-07 · Platform: Xbox · Windows 11 · 12 sections

Executive Summary

The pitch for DirectStorage in 2021 was faster loading screens. The pitch in 2026 is structurally different. With the 1.4 release announced at GDC 2026 on March 11, Microsoft has reframed the runtime as a real-time content streaming pipeline: NVMe storage holds compressed asset chunks, the runtime moves them in flight, the GPU (or CPU, or platform-specific silicon) decompresses them, and the result lands in VRAM without round-tripping through a CPU staging buffer. Loading screens are a side effect. The actual product is open worlds that don't pause to think.

Three concrete things changed in DirectStorage 1.4. First, Microsoft added Zstandard alongside GDeflate, with both CPU and GPU decompression paths and an open-source GPU compute shader optimized for "content chunked to 256KB or smaller, consistent with modern game packaging patterns for streaming workloads." Second, it shipped the Game Asset Conditioning Library (GACL), which preconditions BC1, BC3, BC4, and BC5 textures so Zstd extracts up to a 50% better ratio out of them. Third, it added DStorageSetConfiguration2 with global D3D12 CreatorID support so the driver can schedule decompression alongside rendering work.

The reason any of this is worth a findings page is that it isn't only a PC initiative. The Xbox Wire announcement on March 11, 2026 commits the next Xbox to a custom AMD SoC with developer alpha hardware in 2027 and an "order of magnitude leap in ray tracing performance." The broader GDC 2026 keynote coverage (tbreak's writeup is the most detailed) adds DirectStorage and Zstd compression to the platform feature list, alongside Neural Texture Compression, GPU Directed Work Graph Execution, and FSR Next-class upscaling. Today's GamingBolt coverage of Jason Ronald's spring Xbox Game Dev Update has him saying Microsoft is "leaning in very heavily" on Zstandard for direct SSD asset streaming. The same content pipeline is being aimed at Xbox, Windows 11, and the handheld and PC-style devices in between.

  • ≤256 KB: chunk size the GPU decompression shader is tuned for
  • ~50%: GACL ratio lift on BCn textures
  • 4: BC formats supported (BC1, BC3, BC4, BC5)
  • H2 2026: vendor driver tuning lands

The thesis: Microsoft is standardizing the codec, the conditioning, and the runtime so a single asset packaging strategy scales from a fixed Helix console to a heterogeneous PC install base. Zstd is the format the rest of the industry already speaks (Linux kernel, Btrfs, ZFS, package managers, IETF RFC 8878), and the early shipped titles using GPU decompression on PC have made it clear why a tuned, fixed pipeline buys real value over the open one.

⚠️
The "Magnus / Zen 6 / RDNA 5 / FSR Diamond" specification leaks circulating for Helix come from third-party reporting, not Microsoft's announcement. The official platform feature list at GDC 2026 covered DirectX, DirectStorage with Zstd, Neural Texture Compression, GPU Directed Work Graph Execution, and FSR Next+. Specific SoC nomenclature, CPU/GPU microarchitectures, and NPU specs were not disclosed and are treated here as unverified.

Streaming, Not Loading

A modern open-world title at 4K with high-detail textures can need 8 to 16 GB of unique asset data resident at any given time, against a typical PC GPU memory budget of 8 to 16 GB total. The math doesn't allow for "load everything once and hold it." It demands constant eviction and refill, with the working set turning over as the camera moves.

The legacy I/O path makes that turnover expensive. Files are read through Win32 file APIs into pageable system memory, copied into a CPU-side staging buffer, decompressed by a CPU thread (typically zlib- or LZ4-derived), copied again into a GPU upload heap, and finally DMA-ed across PCIe into VRAM. Every step burns CPU cycles, RAM bandwidth, and PCIe transactions. On a 7 GB/s NVMe drive the SSD is rarely the bottleneck; the orchestration around it is.

DirectStorage cuts most of that orchestration. The runtime batches small reads into the queue depth NVMe is built for, bypasses the legacy I/O stack via BypassIO where the driver supports it, and lets compressed payloads land in a GPU-accessible buffer that the decompression shader can read directly. The relevant figure isn't peak bandwidth, it's request rate: an asset streamer paging textures, geometry, and material parameters at LOD granularity is issuing thousands of small reads per second, and the legacy I/O stack falls over at high request rates well before the drive does.

ℹ️
The "I/O Wall" framing is real, but it's a request-rate wall, not a bandwidth wall. A PCIe 4.0 NVMe drive can push 7 GB/s sequential. The same drive can spend that bandwidth servicing one 7 GB read or fourteen thousand 512 KB reads, and the second case is what asset streaming actually looks like. The win from DirectStorage is in the per-request CPU cost, not in raw throughput.

Compression earns its keep here twice. Smaller payloads mean lower install size and less data pulled per request, but they also act as a bandwidth multiplier. A 1.5x average ratio against a 10 GB/s drive yields ~15 GB/s of effective asset throughput at the GPU, and that's before any of the conditioning tricks below. The constraint that matters then becomes how fast the platform can decompress, and what it has to give up to do it.
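
The two quantities in play, effective throughput and request rate, can be made concrete. A sketch of the arithmetic above, using binary units for the request math; the figures are the article's, the helper names are mine:

```python
def effective_throughput_gbps(drive_gbps: float, ratio: float) -> float:
    """Compression acts as a bandwidth multiplier: decompressed bytes
    delivered per second, assuming the decoder keeps pace with the drive."""
    return drive_gbps * ratio

def reads_per_second(drive_gib: int, request_kib: int) -> int:
    """How many fixed-size reads the same sequential bandwidth pays for."""
    return drive_gib * 2**30 // (request_kib * 1024)

print(effective_throughput_gbps(10, 1.5))  # 15.0 GB/s effective at the GPU
print(reads_per_second(7, 512))            # 14336 reads/s at 512 KiB each
```

The second number is the one the legacy I/O stack cannot sustain: the drive has the bandwidth, but the per-request CPU cost dominates long before it is reached.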

What Zstandard Actually Is

Zstandard, or Zstd, is a lossless compressor authored by Yann Collet at Meta and released as an open-source reference implementation under a BSD/GPLv2 dual license. The wire format was published by the IETF as RFC 8878 in February 2021, which is the format version Microsoft is referencing in the DirectStorage SDK. Outside of games, Zstd is everywhere worth caring about: it's the default transparent-compression option for Btrfs, a first-class option for ZFS, the format Linux kernel images have been compressed with since 5.9, a supported codec in Hadoop, Kafka, and most modern container registries, and the format the FreeBSD installer ships in.

Why The Format, Specifically

Zstd combines LZ77-family dictionary compression with a finite-state-entropy coder (Duda's tabled ANS, the same family of entropy coding behind Oodle Kraken on PlayStation). Tabled ANS gets within a fraction of a percent of arithmetic coding's theoretical limit at a fraction of the decode cost, which is the actual reason Zstd decompresses as fast as it does. A modern libzstd reference decoder lands in the 500 MB/s to 2 GB/s range per CPU core for general data, with the SIMD-accelerated paths in recent versions substantially higher. For comparison, zlib's inflate sits in the 200 to 400 MB/s range and is not particularly SIMD-friendly.

The other practical reason Microsoft is centering Zstd is that it has tunable compression level (1 to 22, plus negative "fast" levels), supports trained dictionaries that can lift small-payload ratios significantly, and can be implemented as a streaming decoder with bounded memory. That last property matters when the runtime is feeding the codec 256 KB chunks instead of multi-megabyte files. Zstd's frame structure cleanly handles small independent frames, which is exactly the granularity the GPU shader operates on.
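
The small-payload dictionary effect is easy to demonstrate. Zstd's trained dictionaries pre-seed the match window so a tiny payload can reference structure it never contains itself; zlib's preset-dictionary feature (`zdict`, in the Python stdlib) is the same idea and stands in for libzstd here. The JSON-ish payload and dictionary are invented for illustration:

```python
import zlib

# A dictionary built from the shared "skeleton" of many small asset manifests.
dictionary = b'{"material":"","albedo":"","normal":"","roughness":0.0}' * 8
payload = b'{"material":"brick","albedo":"brick_d","normal":"brick_n","roughness":0.8}'

# Compress without and with the preset dictionary.
plain = zlib.compress(payload, 9)
co = zlib.compressobj(9, zdict=dictionary)
with_dict = co.compress(payload) + co.flush()

# Round-trips, and the dictionary version is smaller on this small payload.
do = zlib.decompressobj(zdict=dictionary)
assert do.decompress(with_dict) == payload
print(len(payload), len(plain), len(with_dict))
```

On payloads this small, compression without a dictionary barely pays for its own header; with one, the keys compress to back-references into shared state.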

None of this is to say Zstd dominates GDeflate (the GPU-friendly Deflate variant Microsoft shipped first in DirectStorage 1.1). GDeflate is more aggressively parallelized for SIMT execution and tends to win on raw GPU throughput per watt of compute. The point of adding Zstd alongside it is choice: Zstd has a wider ecosystem, a better CPU decompression story when the GPU is busy, and better tooling, while GDeflate retains the highest GPU-side throughput when the SMs have headroom. A real shipping game will probably use both, dispatched by workload.

DirectStorage 1.4: The Pipeline

DirectStorage 1.4 reached public preview at version 1.4.0-preview1-2603.504 on March 11, 2026, and the DirectX Developer Blog post is the cleanest place to see what Microsoft actually shipped. The runtime exposes file and memory queues, operates on small chunks with priorities and cancellation, and supplies first-class decompression hooks for both Zstd and GDeflate. The headline additions versus 1.2 / 1.3 are the three from the executive summary: the Zstd codec with CPU and GPU decompression paths, the GACL conditioning library, and the DStorageSetConfiguration2 CreatorID for driver-side scheduling.

Functionally, the pipeline looks like this: an engine submits a request to a DirectStorage queue naming a file, an offset, a length, a destination resource, and a codec. The runtime issues the NVMe read, lands the compressed bytes in a staging area accessible to the GPU, and dispatches the decompression compute shader (or runs it on the CPU if the policy says so). The decompressed bytes end up in the destination GPU resource without an explicit upload heap copy. The CPU's role is queue management, not data shoveling.
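
That request shape can be sketched as a logical model. This is not the real COM-based DirectStorage API, and zlib stands in for the Zstd / GDeflate decoders; what it shows is the division of labor the paragraph describes, with the CPU naming work and the service routine doing the read-then-decode near the destination:

```python
import os, tempfile, zlib
from dataclasses import dataclass

@dataclass
class StreamRequest:
    path: str
    offset: int       # byte offset of the compressed chunk in the package
    length: int       # compressed length on disk
    codec: str        # "zstd" | "gdeflate" in the real runtime
    destination: bytearray

def service(req: StreamRequest) -> None:
    """One request end to end: seek, read the compressed bytes, decode them
    straight into the destination (a GPU resource, in the real pipeline)."""
    with open(req.path, "rb") as f:
        f.seek(req.offset)
        payload = f.read(req.length)
    req.destination[:] = zlib.decompress(payload)

# Package two chunks into one file, then stream the second one back.
chunk_a, chunk_b = b"geometry" * 512, b"texture" * 512
frames = [zlib.compress(chunk_a), zlib.compress(chunk_b)]
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"".join(frames))
dest = bytearray()
service(StreamRequest(path, len(frames[0]), len(frames[1]), "zstd", dest))
assert bytes(dest) == chunk_b
os.remove(path)
```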

The 256 KB Number

The GPU shader's optimization for sub-256 KB chunks isn't arbitrary. It's the size at which a single compute dispatch can decompress the chunk with good occupancy across a typical modern GPU's SM count without spilling enough state to hurt cache behavior, and it's also roughly the granularity an asset streamer wants to issue at: a streaming geometry chunk, an audio bank slice, or a single 4K BC7 texture mip is in that ballpark. A pipeline aimed at sub-file streaming wants the codec, the runtime, and the engine all agreeing on roughly the same chunk shape.
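
Packaging at that granularity means every chunk is an independently decodable frame, so any dispatch (or CPU core) can decode any chunk with no shared state. A minimal sketch, with zlib standing in for Zstd's frame format:

```python
import zlib

CHUNK = 256 * 1024  # the granularity the DirectStorage 1.4 GPU shader targets

def package(asset: bytes, level: int = 9) -> list:
    """Split an asset into independently decodable frames of at most CHUNK
    uncompressed bytes each."""
    return [zlib.compress(asset[i:i + CHUNK], level)
            for i in range(0, len(asset), CHUNK)]

def unpack(frames: list) -> bytes:
    return b"".join(zlib.decompress(f) for f in frames)

asset = bytes(range(256)) * 4096        # 1 MiB of synthetic asset data
frames = package(asset)
assert len(frames) == 4                 # 1 MiB / 256 KiB chunks
assert unpack(frames) == asset          # every chunk round-trips on its own
```

The trade-off baked into the constant: smaller chunks mean finer-grained streaming and cancellation but worse ratios (less context per frame); larger chunks amortize codec setup but serialize more work behind one decode.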

The driver-side optimization for these new paths is staged. AMD, NVIDIA, and Qualcomm all have H2 2026 timelines for shipping driver-level Zstd decompression tuning. Intel's quote in the Microsoft DevBlog post is more open-ended: it will "share performance improvements in the months ahead." That means that in spring 2026 the runtime is shipped, the shader is shipped, and the actual silicon-level optimization is still in flight. That has practical consequences, covered below.

GACL and the BCn Problem

Block Compressed (BCn) texture formats are how GPUs actually consume textures in 2026, and they are notoriously hostile to general-purpose compressors. BC1 through BC7 store textures as fixed-size blocks, 16 bytes each (8 bytes for BC1 and BC4), containing endpoint colors and per-pixel index bits, with the index and endpoint fields tightly bit-packed. The bit-level layout is great for GPU sampling and bad for an entropy coder, because semantically similar pixels (smoothly varying color, repeated texture patterns) end up with bit patterns that look uncorrelated to a byte-oriented dictionary compressor.

The standard answer is to recondition the data before compression: separate the endpoint and index streams, deinterleave the bit fields, pre-apply a delta or shuffle pass, and feed the result to the general-purpose compressor. Each stream then has stronger internal structure, and the compressor can exploit it. Microsoft's GACL automates this for BC formats and adds a machine-learning-guided variant on top.

GACL applies three techniques, each with a stated mechanism and a reason it helps:

  • Shuffling. Microsoft's description: "Transforms BCn bit streams to promote additional and lower cost matches for Zstd." The lift comes from each separated bit-stream being more uniform, so the entropy coder sees lower per-symbol entropy.
  • Block / component entropy reduction. Uses "machine learning to improve outcomes" on top of static shuffling. The lift comes from learning asset-specific structure that hand-tuned shuffles miss, for a small offline training cost and a runtime-free win.
  • Inverse shuffle (runtime). "After a Zstd stream is decompressed at runtime, any shuffle transforms applied during content conditioning are seamlessly reversed by DirectStorage." The inverse pass is folded into the decompression dispatch, so engines see normal BCn bytes.
ℹ️
Format coverage in DirectStorage 1.4 is BC1, BC3, BC4, and BC5 only. Microsoft explicitly defers BC7 to a "future DirectStorage update." That's a real constraint: BC7 is the modern high-quality format used for albedo and material maps in most AAA pipelines. Until BC7 lands, GACL's lift applies primarily to BC1 (legacy color), BC3 (color + alpha), BC4 (single-channel), and BC5 (two-channel, common for normal maps). A title that already standardized on BC7 for surface albedo will get the GACL benefit on its normal maps and detail textures, but not on the largest part of its texture budget.

Microsoft's stated lift is "up to a 50% improvement in Zstd compression ratios for your assets," with no runtime cost beyond the inverse shuffle the decompression shader applies after Zstd decode. That's a build-time change, not a runtime one: ship size and bandwidth get smaller, decompression cost stays roughly the same. For a 100 GB texture-heavy install, "up to 50%" is the difference between fitting in a console's working set or not, even when the BC7-shaped portion of the texture budget waits for the next DirectStorage update.

Where Decompression Runs (And What It Costs)

The most misunderstood part of the DirectStorage pitch is the idea that GPU decompression is free. It isn't. On PC, the Zstd decompression shader runs on the same SMs that draw the frame. Compute time spent decompressing a chunk is compute time not spent on shading, ray tracing, post-processing, or DLSS / FSR upscaling. The cost is real and it matters. The reason developers still come out ahead is that the alternative (a CPU thread doing the decompression and copying the result across PCIe) is much worse, both in absolute throughput and in main-thread frame impact.

DirectStorage 1.4 lets the application place each request on the GPU path, the CPU path, or let the runtime decide. The right answer is workload-dependent. A title with heavy ray tracing on a midrange GPU can be SM-bound during the frame and prefer CPU decompression on idle cores; a title with a strong CPU bound (open-world simulation, AI, physics) wants the GPU path. A title with both bounds wants asynchronous compute and careful queue scheduling. The new DStorageSetConfiguration2 CreatorID is part of how the driver makes that scheduling decision sanely.
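
A hypothetical per-request policy in the spirit of that paragraph. DirectStorage 1.4 exposes GPU, CPU, and runtime-decides placement; the function name, thresholds, and return values here are illustrative, not Microsoft's heuristic:

```python
def pick_decompression_path(gpu_busy_pct: float, idle_cpu_cores: int,
                            request_is_urgent: bool) -> str:
    """Choose where a request's decompression runs. Purely illustrative."""
    if request_is_urgent:
        return "gpu"                 # lowest latency into VRAM
    if gpu_busy_pct > 90 and idle_cpu_cores >= 2:
        return "cpu"                 # SMs are the scarce resource this frame
    if gpu_busy_pct < 60:
        return "gpu"                 # SM headroom is cheaper than a PCIe copy
    return "runtime"                 # ambiguous: let the driver schedule it

assert pick_decompression_path(95, 4, False) == "cpu"
assert pick_decompression_path(40, 0, False) == "gpu"
```

The real decision has more inputs (queue priority, chunk size, codec, vendor driver state), which is precisely why the CreatorID hint exists: it lets the driver make this call with knowledge of the title's rendering workload.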

Console Asymmetry: Dedicated Decompression Silicon

The PC and console pictures diverge on this exact axis. The Xbox Series X|S Velocity Architecture ships with a hardware decompression block Microsoft has publicly described as equivalent to roughly four to five Zen 2 CPU cores worth of decompression throughput, sitting outside both the CPU and the GPU. The SSD-to-RAM path on those consoles never burns SM time on decompression at all.

Microsoft has not confirmed that Helix carries forward a dedicated decompression block of the same kind, but the precedent is strong, the codec target (Zstd) is amenable to a fixed-function implementation, and the platform incentive is obvious: a console with a fixed AMD SoC can afford to spend a few square millimeters of die on a decompression engine and reclaim significant GPU compute for upscaling, NPU-driven NPC behavior, and physics. On a heterogeneous PC install base, Microsoft cannot rely on dedicated decompression silicon, which is why DirectStorage 1.4 ships GPU and CPU decompression paths by default.

The compute-reclamation argument matters specifically for gaming workloads that are getting hungrier for GPU ML capacity: DLSS 4 / FSR Next+ frame generation, neural texture compression decoders, ML-driven NPC behavior, neural radiance caches. Every millisecond of GPU compute spent on decompression is one not spent on those, and the platform that gets to spend its GPU on rendering instead of pipeline plumbing wins per-frame budget the others don't have. The next section is about how that abstract argument has already shown up in shipping titles.

The Messy Reality on PC

DirectStorage with GPU decompression isn't theoretical. It's also not yet a clean win. The three most-discussed shipped titles using the GPU path each tell a different story, and together they explain why a tuned, fixed-platform implementation is going to feel so much better than the open PC version when Helix arrives.

Ratchet & Clank: Rift Apart (2023, Nixxes)

The success case. The first PC title to ship with GDeflate GPU decompression, built on DirectStorage 1.2 with the GPU path enabled at high settings for background asset streaming. The dimension-transition sequences (the title's signature mechanic) are the single most cited example of "this is what DirectStorage was built for." NVIDIA's launch case study for GDeflate uses Rift Apart's load-time numbers as the headline.

Marvel's Spider-Man 2 (2025, Nixxes)

The cautionary tale. Spider-Man 2 PC ships with GDeflate GPU decompression enabled, and on the RTX 4090 independent testing found the feature actively hurts performance:

Gains from disabling DirectStorage on the RTX 4090, by resolution:

  • 4K: +10% average framerate
  • 1440p: +6% average framerate
  • 1080p: +3% average framerate
  • 4K (1% lows): +18 to 25%

Tom's Hardware retested on the RTX 5090 and reported the regression had disappeared, with minor gains in some scenes. Their honest framing was that the 5090 is fast enough to absorb the cost regardless, so Blackwell's actual aptitude for the workload is still an open question pending broader testing across the midrange stack.

Resident Evil Requiem (2026, Capcom)

The weird case. RE:Requiem ships with GDeflate-compressed assets and the GPU decompression path enabled, but independent SpecialK traces show the runtime randomly choosing whether to actually use the GPU. On RTX 5090, 5070, and 5060 the GPU path engages; on a 4060 laptop the runtime falls back to CPU decompression despite the GPU fully supporting the feature. No published explanation for the heuristic exists.

⚠️
Two of the three highest-profile DirectStorage GPU-decompression titles in 2025 / 2026 have shipped with measurable problems. Spider-Man 2's regression on a flagship-class GPU and Resident Evil Requiem's inconsistent path selection are both signals that the runtime, the driver, and the engine integration are still maturing on PC. None of these are codec problems. They're integration and policy problems, the exact things a fixed-hardware platform can settle once.

Conspicuously absent from any of this: a public, shipped title using the Zstd path. As of the 1.4 release, Zstd is documented and downloadable but not yet in any retail title's runtime. The first wave of titles built against 1.4 will be the empirical test of whether Zstd's CPU decompression performance and ecosystem fit translate into better real-world streaming behavior than GDeflate's GPU-friendliness. Expect that question to be answered, with measurements, sometime in late 2026 or early 2027.

What Helix Officially Adds

The Xbox Wire post is short on silicon detail and long on platform framing. The wider GDC 2026 keynote coverage fills in the technical feature list. Splitting the two sources cleanly:

Stated in Xbox Wire

  • A next-generation Xbox console built on a custom AMD SoC.
  • Developer alpha hardware in 2027.
  • An "order of magnitude leap in ray tracing performance."

Reported from the GDC 2026 keynote

The following features are sourced from third-party coverage of the keynote (tbreak, Tom's Hardware) rather than the Xbox Wire post. Treat as Microsoft's GDC messaging:

  • DirectStorage with Zstandard compression for direct SSD asset streaming.
  • Neural Texture Compression.
  • GPU Directed Work Graph Execution.
  • FSR Next-class upscaling.

What Microsoft did not disclose: the SoC's commercial name, the CPU and GPU microarchitecture generations, the NPU's TOPS budget, the memory architecture (unified vs split, bandwidth, capacity), or whether a dedicated decompression block carries forward from the Velocity Architecture. The "Magnus" / "Zen 6" / "RDNA 5" / "FSR Diamond" specifications widely repeated in the trade press (tbreak and others) come from third-party reporting, not Microsoft's announcement, and should be treated as unverified until a later disclosure confirms them.

The interesting part of Helix isn't the silicon, anyway. It's that Microsoft is shipping the same content pipeline across a fixed-target console and a wildly variable PC install base, and has the ability to tune the pipeline differently at each end while keeping the asset format portable. That's the structural advantage a fixed-hardware platform has always had, and it shows up everywhere asset streaming touches the silicon:

  • Where decompression executes: plausibly fixed-function silicon on the console, a compute shader or spare CPU cores on PC.
  • How decompression is scheduled against rendering: a known, tunable pipeline on a fixed SoC versus the CreatorID hint and per-vendor driver heuristics on PC.
  • How much per-vendor driver tuning remains: none on a single SoC, staged through H2 2026 across AMD, NVIDIA, Intel, and Qualcomm on PC.

The asset format, though, is the same. A title built against DirectStorage 1.4 with GACL-conditioned Zstd chunks ships one set of asset packages and lets the runtime resolve them differently per platform. That's the standardization play, and the GamingBolt coverage of Ronald's spring Game Dev Update reinforces it: Microsoft is "leaning in very heavily" on Zstd precisely because the codec works the same everywhere it runs. The variable is where the decompression executes, not what gets decompressed.

Advanced Shader Delivery

Asset streaming is one of two pipelines Microsoft has been quietly rebuilding. The other is Advanced Shader Delivery, and it's worth describing here because the same pattern (move work off the client device, ship a smaller payload, decode just-in-time) shows up in both. The two pipelines are independent in the SDK but converge on the same end goal: a frame that doesn't stutter and a session that doesn't pause.

Just-in-time shader compilation has been the single most visible source of stutter in shipping DX12 / Vulkan titles since the API generation that introduced PSOs. Microsoft's August 2025 introduction post describes the fix: collect each title's shader requirements into a State Object Database (SODB) at build time, and on the server side pair the SODB with a specific GPU and driver profile to produce a Precompiled Shader Database (PSDB). When a player downloads the title through the Xbox app on PC, the app picks the PSDB matching the local GPU and driver and ships precompiled binaries instead of source-level shaders. JIT compilation is bypassed entirely.
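
The client-side step reduces to a lookup: match the local GPU and driver against the available precompiled databases, and fall back to JIT only when nothing matches. The keys, filenames, and fallback policy below are illustrative, not the Xbox app's actual logic:

```python
def pick_psdb(available: dict, gpu: str, driver: str):
    """Select the Precompiled Shader Database for this machine, if any.
    Returns None when no PSDB matches, i.e. the legacy JIT-compile path."""
    return (available.get((gpu, driver))      # exact GPU + driver match
            or available.get((gpu, "any"))    # driver-agnostic fallback
            or None)

psdbs = {("rtx4070", "572.16"): "psdb_a.bin",
         ("rtx4070", "any"): "psdb_b.bin"}
assert pick_psdb(psdbs, "rtx4070", "572.16") == "psdb_a.bin"
assert pick_psdb(psdbs, "rtx4070", "560.00") == "psdb_b.bin"
assert pick_psdb(psdbs, "rx7800", "24.1") is None
```

The interesting engineering is upstream of this lookup: producing and hosting a PSDB per GPU-and-driver permutation is a server-side cost Microsoft absorbs so the client never compiles at the moment the player needs the shader.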

Avowed (Obsidian, 2025) is Microsoft's headline case study for the pipeline, with engineering teams reporting "launch times reduced by as much as 85%" once Advanced Shader Delivery was applied. That 85% is a vendor figure attached to the August 2025 announcement, not an independent measurement, and it applies only to the specific GPU and driver permutations covered by the PSDB.

The GDC 2026 update formalized the developer integration story: AgilitySDK 1.619 ships an App Identity API (apps declare identity to D3D12 before device creation), a Stats API (PSDB cache hit rates exposed to runtime), PIX integration in the May 2026 release showing those stats as real-time counters, and a feature called Partial Graphics Programs that splits pipeline creation in two so titles with very large PSO counts can reuse common graphics-program prefixes. The initial PSDB delivery is debuting on the ROG Xbox Ally and ROG Xbox Ally X, distributed through the Xbox PC app.

The relevance to asset streaming is the topology, not the mechanism. Both pipelines push a pre-conditioned, platform-aware payload from a server (or local SSD) to the device, do a small amount of decode work near the GPU, and avoid running an expensive process on the client at the moment the player needs the result. Together they're the difference between a 2024-era open-world title that hitches into a new biome and a 2026-era one that doesn't.

What the Research Says

GPU decompression and texture conditioning are active research areas, and the published work explains why Microsoft's design choices in DirectStorage 1.4 land where they do.

GPU decompression architecture: CODAG

CODAG (Sitar et al., 2023) is the most useful recent paper on architecting GPU decompression kernels. Its central finding pushes back on a common assumption: prior GPU decompression schemes assigned specialized thread groups to different decoding stages, which left most of the SMs idle waiting on the critical path. CODAG eliminates the specialization, frees compute resources to run more parallel decompression streams, and lets the GPU's hardware scheduling absorb the multi-latency profile of an LZ-family decoder. They report 13.46x and 5.69x speedups over NVIDIA's RAPIDS RLE implementations, plus a 1.18x lift on Deflate, on the codecs they evaluated.

Worth flagging: CODAG benchmarks RLE and Deflate, not Zstd directly. The architectural lesson (skip thread specialization, exploit GPU hardware scheduling) is what carries over to the DirectStorage 1.4 GPU shader, and it's why a 256 KB chunk size is reasonable: small enough that the dispatch can occupy SMs efficiently, large enough that the entropy coder's setup cost amortizes.

Texture compression that targets the right bottleneck

Two recent papers are directly relevant to the Helix-era "Neural Texture Compression" line. Neural Graphics Texture Compression Supporting Random Access (2024) tackles the constraint that makes neural texture codecs hard for real-time rendering: a sampler needs to fetch a single texel without decoding the entire image. The paper pairs a convolutional encoder with a fully connected decoder operating on positional features, so the GPU can sample at run-time from a learned latent representation rather than reconstructing the whole texture upfront. Hardware Accelerated Neural Block Texture Compression with Cooperative Vectors (2025) takes the next step, mapping the decode onto NVIDIA's cooperative-vectors hardware path in a form usable inside the rendering pipeline.

These are the techniques Helix's "Neural Texture Compression" framing is pointing at. Note the layering: Zstd compresses the bitstream of an asset (BCn block bytes, mesh data, audio); a neural texture codec compresses the image content at a higher level, before the bitstream stage. They're complementary, not competing. A future asset pipeline could ship neural-texture-compressed content packaged as Zstd-compressed chunks streamed via DirectStorage, with each layer recovering a different kind of redundancy.

Entropy coding context

The entropy stage at the heart of Zstd is finite state entropy (Yann Collet's tabled-ANS implementation), descending from Duda's 2013 paper on asymmetric numeral systems. The same family underlies Oodle Kraken on PlayStation. ANS reaches within fractions of a percent of arithmetic coding's theoretical optimum at decode speeds substantially closer to Huffman's, which is the property that lets Zstd compete with Deflate on speed while handily beating it on ratio. The state-table form of tANS is also more amenable to fixed-function implementation than arithmetic coding's range arithmetic, which is one reason a console SoC could plausibly bake a Zstd decoder into silicon. That's an inference, not a confirmed Helix design point.
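
The ANS idea is compact enough to demonstrate. Below is a toy range-variant ANS (rANS) coder, using Python's unbounded integers in place of the renormalized fixed-width state a production tANS implementation (like Zstd's) would use; what it shows is the encode/decode symmetry that makes the decoder so cheap, with both directions reducing to one division and one table lookup per symbol:

```python
def _cumulative(freqs):
    """Cumulative frequency table; dicts preserve insertion order."""
    cum, total = {}, 0
    for sym, f in freqs.items():
        cum[sym] = total
        total += f
    return cum, total

def rans_encode(message, freqs):
    """Fold the whole message into one integer state. Symbols go in
    reverse so the decoder can emit them in forward order."""
    cum, total = _cumulative(freqs)
    x = 1
    for s in reversed(message):
        x = (x // freqs[s]) * total + cum[s] + (x % freqs[s])
    return x

def rans_decode(x, n, freqs):
    cum, total = _cumulative(freqs)
    out = []
    for _ in range(n):
        slot = x % total               # identifies the symbol's slot
        s = next(k for k in freqs if cum[k] <= slot < cum[k] + freqs[k])
        out.append(s)
        x = freqs[s] * (x // total) + slot - cum[s]
    return "".join(out)

freqs = {"a": 5, "b": 2, "r": 2, "c": 1, "d": 1}  # counts from the message
msg = "abracadabra"
assert rans_decode(rans_encode(msg, freqs), len(msg), freqs) == msg
```

Frequent symbols grow the state slowly and rare ones grow it fast, which is how the coder approaches the entropy limit; tANS precomputes these transitions into tables, the form most amenable to fixed-function hardware.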

Verdict & Open Questions

Project Helix and DirectStorage 1.4 together are best understood as a single platform bet rather than two adjacent announcements. Microsoft is committing to a content pipeline (Zstd-compressed, GACL-conditioned, small-chunk, GPU-decompressable) that scales from a fixed AMD console to a heterogeneous PC install base, with the same asset packages working in both. The codec choice (Zstd over a proprietary alternative) is a deliberate ecosystem play.

What's solid

  • DirectStorage 1.4 ships Zstd CPU and GPU paths, with an open-source GPU shader optimized for chunks of 256 KB or smaller, and a new DStorageSetConfiguration2 CreatorID for driver-side scheduling.
  • GACL delivers up to 50% better Zstd ratio on BC1, BC3, BC4, and BC5 textures, with the inverse shuffle folded into the decompression dispatch. BC7 conditioning is in flight, not shipped.
  • Advanced Shader Delivery is shipping with the AgilitySDK 1.619 toolchain, debuting on the ROG Xbox Ally / Ally X, with Avowed's 85% load-time reduction as the headline case study.
  • Project Helix is officially confirmed as a custom AMD SoC console with developer alpha in 2027, and DirectStorage + Zstd is on the GDC 2026 platform-feature list.

What's still open

  • The widely repeated Helix specs (Magnus / Zen 6 / RDNA 5 / FSR Diamond) are third-party reporting, not Microsoft's announcement. Treat as unverified.
  • Whether Helix carries forward a Velocity-Architecture-style dedicated decompression block. Microsoft hasn't said. The precedent and incentive are both strong; the silicon area cost is modest. Best guess: yes, but unconfirmed.
  • Whether Zstd or GDeflate becomes the primary GPU codec in shipping titles. Likely both, with Zstd preferred when CPU decompression is acceptable and GDeflate preferred when SM headroom allows.
  • How 2026 PC titles handle the integration challenges that Spider-Man 2 and Resident Evil Requiem have surfaced. Both are policy / runtime issues, not codec issues, but they're affecting players right now.
  • BC7 conditioning support and its arrival timeline. The largest portion of a modern AAA texture budget is BC7, and GACL doesn't help it yet.

The thesis, restated

The SSD is becoming an extension of the GPU's memory hierarchy. Zstd is the codec that makes the cost of using it acceptable. DirectStorage 1.4 is the runtime that lets engines reach it at the request rates and chunk sizes modern asset streamers actually issue. Project Helix is the platform where Microsoft can tune the whole path end-to-end while shipping the same asset format everywhere else. None of the pieces are individually new. The bet is on shipping them as a coherent whole, and on closing the gap between "what the runtime can do" and "what the runtime does on a given player's machine."

Sources & Further Reading

Primary Microsoft sources

  • Xbox Wire announcement of the next-generation Xbox (March 11, 2026)
  • DirectX Developer Blog: DirectStorage 1.4 preview release (1.4.0-preview1-2603.504)
  • Microsoft's August 2025 introduction of Advanced Shader Delivery
  • AgilitySDK 1.619 release notes (App Identity API, Stats API, Partial Graphics Programs)

Reporting

  • tbreak's GDC 2026 keynote coverage (platform feature list)
  • Tom's Hardware: Marvel's Spider-Man 2 DirectStorage retest on the RTX 5090
  • GamingBolt's coverage of Jason Ronald's spring Xbox Game Dev Update
  • NVIDIA Technical Blog: GDeflate case study with Ratchet & Clank: Rift Apart load-time numbers

Zstandard as a format

  • IETF RFC 8878: Zstandard Compression and the application/zstd Media Type (February 2021)
  • The open-source Zstandard reference implementation (Yann Collet / Meta, BSD/GPLv2 dual license)

Scholarly references (verified, on arXiv)

  • CODAG (Sitar et al., 2023), on GPU decompression kernel architecture
  • Neural Graphics Texture Compression Supporting Random Access (2024)
  • Hardware Accelerated Neural Block Texture Compression with Cooperative Vectors (2025)
  • Duda (2013), Asymmetric Numeral Systems