Balancing The System With Other Hardware Features

The biggest technological advantage consoles have over PCs is that consoles are a fully-integrated fixed platform specified by a single manufacturer. In theory, the manufacturer can ensure that the system is properly balanced for the use case, something PC OEMs are notoriously bad at. Consoles generally don't have the problem of wasting a large chunk of the budget on a single high-end component that the rest of the system cannot keep up with, and consoles can more easily incorporate custom hardware when suitable off-the-shelf components aren't available. (This is why the outgoing console generation didn't use desktop-class CPU cores, but dedicated a huge amount of the silicon budget to the GPUs.)

By now, PC gaming has thoroughly demonstrated that increasing SSD speed has little or no impact on gaming performance. NVMe SSDs are several times faster than SATA SSDs on paper, but for almost all PC games that extra performance goes largely unused. In part, this is due to bottlenecks elsewhere in the system that are revealed when storage performance is fast enough to no longer be a serious limitation. The upcoming consoles will include a number of hardware features designed to make it easier for games to take advantage of fast storage, and to alleviate bottlenecks that would be troublesome on a standard PC platform. This is where the console storage tech actually gets interesting, since the SSDs themselves are relatively unremarkable.

Compression: Amplifying SSD Performance

The most important specialized hardware feature the consoles will include to complement storage performance is dedicated data decompression hardware. Game assets must be stored on disk in a compressed form to keep storage requirements somewhat reasonable. Games usually rely on multiple compression methods—some lossy compression methods specialized for certain types of data (e.g. audio and images), and some lossless general-purpose algorithm, but almost everything goes through at least one compression method that is fairly computationally complex. GPU architectures have long included hardware to handle decoding video streams and support simple but fast lossy texture compression methods like S3TC and its successors, but that leaves a lot of data to be decompressed by the CPU. Desktop CPUs don't have dedicated decompression engines or instructions, though many instructions in the various SIMD extensions are intended to help with tasks like this. Even so, decompressing a stream of data at several GB/s is not trivial, and special-purpose hardware can do it more efficiently while freeing up CPU time for other tasks. The decompression offload hardware in the upcoming consoles is implemented on the main SoC so that it can unpack data after it has traversed the PCIe link from the SSD and landed in the main RAM pool shared by the GPU and CPU cores.
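
To get a sense of why multi-GB/s decompression is a real CPU burden, here is a minimal Python sketch that measures single-core decompression throughput. The synthetic 64 MiB buffer and the use of zlib (rather than the consoles' proprietary codecs) are assumptions made purely for illustration:

    import os
    import time
    import zlib

    # Synthetic, moderately compressible data: each 4 KiB page is half
    # random bytes and half zeros, loosely standing in for mixed game assets.
    pages = [os.urandom(2048) + bytes(2048) for _ in range(16384)]  # ~64 MiB
    raw = b"".join(pages)
    compressed = zlib.compress(raw, 6)

    # Time a single-threaded decompression pass on the CPU.
    start = time.perf_counter()
    out = zlib.decompress(compressed)
    elapsed = time.perf_counter() - start

    assert out == raw
    print(f"compression ratio: {len(raw) / len(compressed):.2f}x")
    print(f"single-core output rate: {len(raw) / elapsed / 1e9:.2f} GB/s")

Even with zlib's C implementation doing the work, a single core generally lands well short of the multi-GB/s output rates quoted below, which is exactly the gap the dedicated hardware is meant to close.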

Decompression offload hardware like this isn't found on typical desktop PC platforms, but it's hardly a novel idea. Previous consoles have included decompression hardware, though nothing that would be able to keep pace with NVMe SSDs. Server platforms often include compression accelerators, usually paired with cryptography accelerators: Intel has done such accelerators both as discrete peripherals and integrated into some server chipsets, and IBM's POWER9 and later CPUs have similar accelerator units. These server accelerators are more comparable to what the new consoles need, with throughput of several GB/s.

Microsoft and Sony have each tuned their decompression units to fit the performance expected from their chosen SSD designs, and they've chosen different proprietary compression algorithms to target. Sony is using RAD's Kraken, a general-purpose algorithm originally designed for the current consoles, which have relatively weak CPUs but vastly lower throughput requirements. Microsoft focused specifically on texture compression, reasoning that textures account for the largest volume of data that games need to read and decompress. They developed a new texture compression algorithm and dubbed it BCPack, a slight departure from their existing DirectX naming conventions for the texture compression methods already supported by GPUs.

Compression Offload Hardware
                               Microsoft              Sony
                               Xbox Series X          Playstation 5
Algorithm                      BCPack                 Kraken (and ZLib?)
Maximum Output Rate            6 GB/s                 22 GB/s
Typical Output Rate            4.8 GB/s               8–9 GB/s
Equivalent Zen 2 CPU Cores     5                      9

Sony states that their Kraken-based decompression hardware can unpack the 5.5 GB/s stream from the SSD into a typical 8–9 GB/s of uncompressed data, and theoretically up to 22 GB/s if the data is redundant enough to be highly compressible. Microsoft states their BCPack decompressor can output a typical 4.8 GB/s from the 2.4 GB/s input, and potentially up to 6 GB/s. So Microsoft is claiming slightly higher typical compression ratios, but still a slower output stream due to the much slower SSD, and Microsoft's hardware decompression is apparently only for texture data.
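
The implied compression ratios fall straight out of those figures; a quick back-of-the-envelope check (using the midpoint of Sony's 8–9 GB/s typical figure):

    # Vendor-quoted raw SSD rates vs. decompressed output rates, in GB/s.
    ps5_raw, ps5_typical, ps5_peak = 5.5, 8.5, 22.0
    xsx_raw, xsx_typical, xsx_peak = 2.4, 4.8, 6.0

    print(f"PS5 typical ratio: {ps5_typical / ps5_raw:.2f}x, peak {ps5_peak / ps5_raw:.1f}x")
    print(f"XSX typical ratio: {xsx_typical / xsx_raw:.2f}x, peak {xsx_peak / xsx_raw:.1f}x")
    # -> roughly 1.5x typical for the PS5 versus 2x typical for the Series X,
    #    consistent with Microsoft claiming higher typical compression ratios.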

The CPU time saved by these decompression units sounds astounding: the equivalent of about 9 Zen 2 CPU cores for the PS5, and about 5 for the Xbox Series X. Keep in mind these are peak numbers that assume the SSD bandwidth is being fully utilized—real games won't be able to keep these SSDs 100% busy, so they wouldn't need quite so much CPU power for decompression.

The storage acceleration features on the console SoCs aren't limited to just compression offload, and Sony in particular has described quite a few features, but this is where the information released so far is really vague, unsatisfying and open to interpretation. Most of this functionality seems to be intended to reduce overhead, handling some of the more mundane aspects of moving data around without having to get the CPU involved as often, and making sure the hardware decompression process is invisible to the game software.

DMA Engines

Direct Memory Access (DMA) refers to the ability for a peripheral device to read and write to the CPU's RAM without the CPU being involved. All modern high-speed peripherals use DMA for most of their communication with the CPU, but that's not the only use for DMA. A DMA Engine is a peripheral device that exists solely to move data around; it usually doesn't do anything to that data. The CPU can instruct the DMA engine to perform a copy from one region of RAM to another, and the DMA engine does the rote work of copying potentially gigabytes of data without the CPU having to do a mov (or SIMD equivalent) instruction for every piece, and without polluting CPU caches. DMA engines can also often do more than just offload simple copy operations: they commonly support scatter/gather operations to rearrange data somewhat in the process of moving it around. NVMe already has features like scatter/gather lists that can remove the need for a separate DMA engine to provide that feature, but the NVMe commands in these consoles are acting mostly on compressed data.
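
As a rough illustration of what a scatter/gather transfer looks like from software's point of view, here is a hypothetical descriptor list being walked in Python. A real DMA engine would be programmed by a driver with physical addresses, not byte offsets into a Python buffer; the descriptor format below is purely illustrative.

    # Each descriptor names a source range and a destination offset. The
    # "engine" walks the list and copies each piece; the caller never touches
    # individual bytes.
    def dma_scatter_gather(src: bytes, dst: bytearray, descriptors):
        for src_off, length, dst_off in descriptors:
            dst[dst_off:dst_off + length] = src[src_off:src_off + length]

    src = bytes(range(256)) * 16      # 4 KiB of source data
    dst = bytearray(4096)

    # Gather three non-contiguous source chunks into one contiguous region:
    # (source offset, length, destination offset)
    descriptors = [
        (0,    512, 0),
        (1024, 512, 512),
        (3072, 512, 1024),
    ]
    dma_scatter_gather(src, dst, descriptors)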

Even though DMA engines are a peripheral device, you usually won't find them as a standalone PCIe card. It makes the most sense for them to be as close to the memory controller as possible, which means on the chipset or on the CPU die itself. The PS5 SoC includes a DMA engine to handle copying around data coming out of the compression unit. As with the compression engines, this isn't a novel invention so much as a feature missing from standard desktop PCs, which means it's something custom that Sony has to add to what would otherwise be a fairly straightforward AMD APU configuration.

IO Coprocessor

The IO complex in the PS5's SoC also includes a dual-core processor with its own pool of SRAM. Sony has said almost nothing about the internals of this: Mark Cerny describes one core as dedicated to SSD IO, allowing games to "bypass traditional file IO", while the other core is described simply as helping with "memory mapping". For more detail, we have to turn to a patent Sony filed years ago, and hope it reflects what's actually in the PS5.

The IO coprocessor described in Sony's patent offloads portions of what would normally be the operating system's storage drivers. One of its most important duties is to translate between various address spaces. When the game requests a certain range of bytes from one of its files, the game is looking for the uncompressed data. The IO coprocessor figures out which chunks of compressed data are needed and sends NVMe read commands to the SSD. Once the SSD has returned the data, the IO coprocessor sets up the decompression unit to process that data, and the DMA engine to deliver it to the requested locations in the game's memory.
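
Sony hasn't published how that mapping actually works, but the general shape of the problem is easy to sketch: translate a request for a range of uncompressed bytes into reads of whichever compressed chunks contain them. The fixed 64 KiB chunk size and the index structure below are illustrative assumptions, not details from the patent.

    # Hypothetical per-file index: for each fixed-size uncompressed chunk,
    # where its compressed form lives on the SSD.
    CHUNK = 64 * 1024
    # (compressed offset, compressed length) for chunks 0, 1, 2, ...
    chunk_index = [(0, 41000), (41000, 52213), (93213, 17800), (111013, 60102)]

    def compressed_reads(uncompressed_offset: int, length: int):
        """Return the (offset, length) SSD reads needed to cover a request."""
        first = uncompressed_offset // CHUNK
        last = (uncompressed_offset + length - 1) // CHUNK
        return [chunk_index[i] for i in range(first, last + 1)]

    # A game asks for 100 KiB starting 10 KiB into the file: the coprocessor
    # would read compressed chunks 0 and 1, feed them to the decompression
    # unit, then have the DMA engine place the requested bytes in game memory.
    print(compressed_reads(10 * 1024, 100 * 1024))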

Since the IO coprocessor's two cores are each much less powerful than a Zen 2 CPU core, they cannot be in charge of all interaction with the SSD. The coprocessor handles the most common cases of reading data, and the system falls back to the OS running on the Zen 2 cores for the rest. The coprocessor's SRAM isn't used to buffer the vast amounts of game data flowing through the IO complex; instead this memory holds the various lookup tables used by the IO coprocessor. In this respect, it is similar to an SSD controller with a pool of RAM for its mapping tables, but the job of the IO coprocessor is completely different from what an SSD controller does. This is why it will be useful even with aftermarket third-party SSDs.

Cache Coherency

The last somewhat storage-related hardware feature Sony has disclosed is a set of cache coherency engines. The CPU and GPU on the PS5 SoC share the same 16 GB of RAM, which eliminates the step of copying assets from main RAM to VRAM after they're loaded from the SSD and decompressed. But to get the most benefit from the shared pool of memory, the hardware has to ensure cache coherency not just between the several CPU cores, but also with the GPU's various caches. That's all normal for an APU, but what's novel with the PS5 is that the IO complex also participates. When new graphics assets are loaded into memory through the IO complex and overwrite older assets, it sends invalidation signals to any relevant caches, discarding only the stale data rather than flushing the GPU caches entirely.

What about the Xbox Series X?

There's a lot of information above about the Playstation 5's custom IO complex, and it's natural to wonder whether the Xbox Series X will have similar capabilities or if it's limited to just the decompression hardware. Microsoft has lumped the storage-related technologies in the new Xbox under the heading of "Xbox Velocity Architecture".

Microsoft defines this as having four components: the SSD itself, the compression engine, a new software API for accessing storage (more on this later), and a hardware feature called Sampler Feedback Streaming. That last one is only distantly related to storage; it's a GPU feature that makes partially resident textures more useful by allowing shader programs to keep a record of which portions of a texture are actually being used. This information can be used to decide what data to evict from RAM and what to load next—such as a higher-resolution version of the texture regions that are actually visible at the moment.
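
As a loose sketch of how a game might act on that feedback on the CPU side, the following assumes a per-tile record of the finest mip level the GPU actually sampled last frame. The data structures and names are illustrative, not the actual DirectX Sampler Feedback API.

    # Hypothetical feedback map: finest mip level sampled per texture tile
    # last frame; tiles absent from the dict were not sampled at all.
    feedback = {
        (0, 0): 0,   # tile (0, 0) was sampled at full resolution
        (0, 1): 2,   # tile (0, 1) only needed mip 2
    }

    # Mip level currently resident in memory for each tile.
    resident = {(0, 0): 3, (0, 1): 2, (5, 7): 0}

    to_load, to_evict = [], []
    for tile, needed_mip in feedback.items():
        if resident.get(tile, 99) > needed_mip:   # finer data needed than what's loaded
            to_load.append((tile, needed_mip))
    for tile in resident:
        if tile not in feedback:                  # not sampled recently: eviction candidate
            to_evict.append(tile)

    print("stream in:", to_load)   # [((0, 0), 0)]
    print("evict:", to_evict)      # [(5, 7)]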

Since Microsoft doesn't mention anything like the other PS5 IO complex features, it's reasonable to assume the Xbox Series X doesn't have those capabilities and its IO is largely managed by the CPU cores. But I wouldn't be too surprised to find out the Series X has a comparable DMA engine, because that kind of feature has historically shown up in many console architectures.

Comments

  • eddman - Monday, June 15, 2020 - link

    Yes, I added the CPU in the paths simply because the data goes through the CPU complex, but not necessarily through the cores.

    "Data coming in from the SSD can be forwarded .... to the GPU (P2P DMA)"

    You mean the data does not go through system RAM? The CPU still has to process the I/O related operations, right?

    It seems nvidia has tackled this issue with a proprietary solution for their workstation products:
    https://developer.nvidia.com/gpudirect
    https://devblogs.nvidia.com/gpudirect-storage/

    They talk about the data path between GPU and storage.

    "The standard path between GPU memory and NVMe drives uses a bounce buffer in system memory that hangs off of the CPU.

    GPU DMA engines cannot target storage. Storage DMA engines cannot target GPU memory through the file system without GPUDirect Storage.

    DMA engines, however, need to be programmed by a driver on the CPU."

    Maybe MS' DirectStorage is similar to nvidia's solution.
  • Oxford Guy - Monday, June 15, 2020 - link

    "Consoles" are nothing more than artificial walled software gardens that exist because of consumer stupidity.

    They offer absolutely nothing the PC platform can't offer, via Linux + Vulkan + OpenGL.

    Period.
  • Oxford Guy - Monday, June 15, 2020 - link

    "but also going a step beyond the PC market to get the most benefit out of solid state storage."

    In order to justify their existence. Too bad it doesn't justify it.

    It's more console smoke and mirrors. People fall for it, though.
  • Oxford Guy - Monday, June 15, 2020 - link

    Consoles made sense when personal computer hardware was too expensive for just playing games, for most consumers.

    Back in the day, when real consoles existed, even computer expansion modules didn't take off. Why? Cost. Those "consoles" were really personal computers. All they needed was a keyboard, writable storage, etc. But, people didn't upgrade ANY console to a computer in large numbers. Even the NES had an expansion port on the bottom that sat unused. Lots of companies had wishful thinking about turning a console into a PC and some of them used that in marketing and were sued for vaporware/inadequate-and-delayed-ware (Intellivision).

    Just the cost of adding a real keyboard was more than consumers were willing to pay. Even inexpensive personal computers (PCs!) had chicklet keyboards, like the Atari 400. That thing cost a lot to build because of the stricter EMI emissions standards of its time but Atari used a chicklet keyboard anyway to save money. Sinclair also used them. Many inexpensive "home" computers that had full-travel keyboards were so mushy they were terrible to use. Early home PCs like the VideoBrain highlight just how much companies tried to cut corners just on the keyboard.

    Then, there is the writable storage. Cassettes were too slow and were extremely unreliable. Floppy drives were way too expensive for most PC consumers until the Apple II (where Wozniak developed a software controller to reduce cost a great deal vs. a mechanical one). They remained too expensive for gaming boxes, with the small exception of the shoddy Famicom FDS in Japan.

    All of these problems were solved a long time ago. Writable storage is dirt cheap. Keyboards are dirt cheap. Full-quality graphics display hardware is dirt cheap (as opposed to the true console days when a computer with more pixels/characters would cost a bundle and "consoles" would have much less resolution).

    The only thing remaining is the question: "Is the PC software ecosystem good enough". The answer was a firm no when it was Windows + DirectX. Now that we have Vulkan, though, there is no need for DirectX. Developers can use either the low-latency lower-level Vulkan or the high-level OpenGL, depending upon their needs for specific titles. Consumers and companies don't have to pay the Microsoft tax because Linux is viable.

    There literally is no credible justification for the existence of non-handheld "consoles" anymore. There hasn't been for some time now. The hardware is the same. In the old days a console would have much less RAM memory, due to cost. It would have much lower resolution, typically, due to cost. It wouldn't have high storage capacity, due to cost.

    All of that is moot. There is NOT ONE IOTA of difference between today's "console" and a PC. The walled software garden can evaporate. All it takes is Dorothy to use her bucket of water instead of continuing to drink the Kool-Aid.
  • Oxford Guy - Monday, June 15, 2020 - link

    Back in the day:

    A console had:

    much lower-resolution graphics, designed for TV sets at low cost
    much less RAM
    no floppy drive
    no keyboard
    no hard disk

    A quality personal computer had:

    more RAM, plus expansion (except for Jobs perversities like the original Mac)
    80 column character-based or, later, high-resolution bitmapped monitor graphics
    (there were some home PCs that used televisions but had things like disk drives)
    floppy drive support
    hard disk support (except, again, for the first Mac, which was a bad joke)
    a full-travel full-size non-mushy keyboard
    expansion slots (typically — not the first Mac!)
    an operating system and first-party software (both of which cost)
    thick paperbook manuals
    typically, a more powerful CPU (although not always)

    Today:

    A console has:

    Nothing a PC doesn’t have except for a stupid walled software garden.

    A PC has:

    Everything a console has except for the ludicrous walled software garden, a thing that offers no added value for consumers — quite the opposite.
  • Oxford Guy - Monday, June 15, 2020 - link

    The common claim that "consoles" of today offer more simplicity is a lie, too.

    In the true console days, you'd stick a cartridge in, turn on the power, and press start.

    Today, just as with the "PC" (really the same thing) — you have a complex operating system that needs to be patched relentlessly. You have games that have to be patched relentlessly. You have microtransactions. You have log-ins/accounts and software stores. Literally, you have games on disc that you can't even play until you patch the software to be compatible with the latest OS DRM. Developers also helpfully use that as an opportunity to drastically change gameplay (as with PS3 Topspin) and you have no choice in the matter. Remember, it's always an "upgrade".

    The hardware is identical. Even the controllers, once one of the few advantages of consoles (except for some, like the Atari 5200, which were boneheaded), are the same. They use the same USB ports and such. There is no difference. Even if there were, the rise of Chinese manufacturing and the Internet means you could get a cheap and effective adapter with minimal fuss.

    You want fast storage so badly? You can get it on the PC. You want software that is honed to be fast and efficient? Easily done. It's all x86 stuff.

    Give me justified elaborate custom chips (not frivolous garbage like Apple's T2), truly novel form factors that are needed for special gameplay, and things like that and then, maybe, you might be able to sell to people on the higher end of the Bell curve.

    If I were writing an article on consoles I'd use a headline something like this: "Consoles of 2020: The SSD Speed Gimmick — Betting on the Bell Curve"

    It would be bad enough if there were only one extra stupid walled garden (beyond Windows + DirectX). But to have three is even more irksome.
  • edzieba - Monday, June 15, 2020 - link

    "partially resident textures"

    Megatexturing is back!

    "The most interesting bit about DirectStorage is that Microsoft plans to bring it to Windows, so the new API cannot be relying on any custom hardware and it has to be something that would work on top of a regular NTFS filesystem. "

    The latter does not imply the former. API support just means that the API calls will not fail. It doesn't mean they will be as fast as a system using dedicated hardware to handle those calls. Just like with DXR: you can easily support DXR calls on a GPU without dedicated BVH traversal hardware, they'll just be as slow as unaccelerated raytracing has always been.
    Soft API support for DirectStorage makes sense to aid in Microsoft's quest for 'cross play' between PC and Xbox. If the same API calls can be used for both, developers are more likely to put the work into implementing DirectStorage. As long as DirectStorage doesn't have too large a penalty when used on a PC without dedicated hardware, the reduction in dev overhead is attractive.
  • eddman - Monday, June 15, 2020 - link

    "The latter does not imply the former. API support just means that the API calls will not fail. It doesn't mean they will be as fast as a system using dedicated hardware to handle those calls."

    True, but apparently nvidia's GPUDirect Storage, which enables direct transfer between GPU and storage, is a software only solution and doesn't require specialized hardware.

    If that's the case, then there's a good chance MS' DirectStorage is a software solution too.

    AFA I can tell, the custom I/O chips in XSX and PS5 are used for compressing the assets to increase the bandwidth, not enable direct GPU-to-storage access.

    We'll know soon enough.
  • ichaya - Monday, June 15, 2020 - link

    You have to ask: What is causing low FPS for current gen games? I think loading textures is by far the largest culprit, and even in cases where it's only a few levels or a few sections of a few levels, it does affect the overall immersion and playability of games where all of this storage tech should help.
  • Oxford Guy - Monday, June 15, 2020 - link

    I love how people forget how there is fast storage available on the "PC" (in quotes because, except for the Switch, these Sony/MS "consoles" are PCs with smoke and mirrors trickery to disguise that fact — the fact that all they are are stupidity taxes).

    Yes, stupidity taxes. That's exactly what "consoles" are, except for the Switch, which has a form factor that differs from PC.
