Samsung HBM2E ‘Flashbolt’ Memory for GPUs: 16 GB Per Stack, 3.2 Gbps
by Anton Shilov on March 20, 2019 11:00 AM EST

Samsung has introduced the industry's first memory that corresponds to the HBM2E specification. The company's new Flashbolt memory stacks increase per-pin performance by 33% and double both per-die and per-package capacity. Samsung introduced its HBM2E DRAMs at GTC, a fitting location since NVIDIA is one of the biggest HBM2 consumers thanks to its popular GV100 processor.
Samsung’s Flashbolt KGSDs (known good stacked die) are based on eight 16-Gb memory dies interconnected using TSVs (through silicon vias) in an 8-Hi stack configuration. Every Flashbolt package features a 1024-bit bus with a 3.2 Gbps data transfer speed per pin, thus offering up to 410 GB/s of bandwidth per KGSD.
Samsung positions its Flashbolt KGSDs for next-gen datacenter, HPC, AI/ML, and graphics applications. By using four Flashbolt stacks with a processor featuring a 4096-bit memory interface, developers can get 64 GB of memory with a 1.64 TB/s peak bandwidth, something that will be a great advantage for capacity and bandwidth-hungry chips. With two KGSDs they get 32 GB of DRAM with an 820 GB/s peak bandwidth.
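The per-stack and per-system figures above follow directly from the bus width and the per-pin data rate. A quick sanity-check sketch, using only the numbers given in the announcement (1024-bit bus, 3.2 Gb/s per pin):

```python
# Back-of-the-envelope HBM2E bandwidth math, using the figures from
# Samsung's announcement. The helper name is ours, not Samsung's.

BUS_WIDTH_BITS = 1024   # pins per HBM stack
PIN_RATE_GBPS = 3.2     # data rate per pin, in Gb/s

def stack_bandwidth_gbs(bus_width_bits=BUS_WIDTH_BITS,
                        pin_rate_gbps=PIN_RATE_GBPS):
    """Peak bandwidth in GB/s: pins * per-pin rate / 8 bits-per-byte."""
    return bus_width_bits * pin_rate_gbps / 8

per_stack = stack_bandwidth_gbs()          # 409.6 GB/s, "410 GB/s" rounded
two_stacks = 2 * per_stack                 # 819.2 GB/s, "820 GB/s" rounded
four_stacks = 4 * per_stack / 1000         # ~1.64 TB/s for a 4096-bit interface
print(per_stack, two_stacks, four_stacks)
```

The same formula reproduces the older parts as well: at 2.4 Gb/s per pin an Aquabolt stack works out to 307.2 GB/s, matching the table below.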
**Samsung's HBM2 Memory Comparison**

| | Flashbolt | Aquabolt | Flarebolt | Flarebolt | Flarebolt | Flarebolt |
|---|---|---|---|---|---|---|
| Total Capacity | 16 GB | 8 GB | 8 GB | 4 GB | 8 GB | 4 GB |
| Bandwidth Per Pin | 3.2 Gb/s | 2.4 Gb/s | 2 Gb/s | 2 Gb/s | 1.6 Gb/s | 1.6 Gb/s |
| Number of DRAM ICs per Stack | 8 | 8 | 8 | 4 | 8 | 4 |
| DRAM IC Process Technology | ? | 20 nm | 20 nm | 20 nm | 20 nm | 20 nm |
| Effective Bus Width | 1024-bit | 1024-bit | 1024-bit | 1024-bit | 1024-bit | 1024-bit |
| Voltage | ? | 1.2 V | 1.35 V | 1.35 V | 1.2 V | 1.2 V |
| Bandwidth per Stack | 410 GB/s | 307.2 GB/s | 256 GB/s | 256 GB/s | 204.8 GB/s | 204.8 GB/s |
To increase DRAM transfer speed per pin to 3.2 Gbps, Samsung probably had to employ various methods to reduce collateral clock interference between the 5,000+ TSVs and ensure clean signals, yet the company does not discuss this in its current announcement. Last year the company did disclose some of the tricks used by its Aquabolt HBM2 DRAMs to increase bandwidth per pin to 2.4 Gbps, so most of those methods have likely evolved further for Flashbolt.
In fact, Samsung’s announcement does not state that the company has started mass production of its Flashbolt HBM2E memory. It therefore looks like the company has finished development of the technology, but is not yet ready to ship such chips in mass quantities.
Related Reading:
- JEDEC Updates HBM Spec to Boost Capacity & Performance: 24 GB, 307 GB/s Per Stack
- Samsung Starts Production of HBM2 “Aquabolt” Memory: 8 GB, 2.4 Gbps
- JEDEC Publishes HBM2 Specification as Samsung Begins Mass Production of Chips
- SK Hynix Adds HBM2 to Catalog: 4 GB Stacks Set to Be Available in Q3
Source: Samsung
25 Comments
marees - Thursday, March 21, 2019 - link
Arcturus is rumoured to be the GPU inside XBox Two

marees - Thursday, March 21, 2019 - link
Here is the source of the rumour.
Arcturus is a GPU, not an architecture
It may or may not be post GCN
If it is inside XBOX it is extremely unlikely to have HBM due to cost reasons
https://www.reddit.com/r/Amd/comments/ag3ufk/radeo...
darkswordsman17 - Friday, March 22, 2019 - link
There's been a lot of rumors. If I remember correctly, some AMD people have said Navi is a new architecture, but supposedly the Navi we'll be getting in Navi 10 in dGPU isn't "real Navi" that was developed for Sony for the PS5.

And so it seems likely that Navi 10 will be an evolution of Vega but designed solely for 7nm, and featuring significant changes. I'm not sure that "GCN based" is that bad if they fix some of the issues they're alleged to (like improving geometry throughput, both through the base hardware units - think there's patents suggesting a 50% improvement per SP, going from 4 to 6, but also in software with Navi allegedly enabling the "NGG Fastpath" that was slated for Vega; there's been suggestions that the ratio of various components that has existed in GCN won't be there either, so we should probably see higher ROPs relative to shader count than GCN had). But I'd expect that to some extent, future GPUs will be built on previous ones. That's not to say they can't make big improvements, but I think the talk of "new architecture vs GCN" is somewhat meaningless, as some GCN changes were big enough to be considered new architectures - Vega for instance was quite a big change and features a lot more programmability than previous GCN, but was built around the GCN ratios - which I think was a mistake, and it seems like Navi is going to be Vega but not built around GCN ratios.
Xajel - Wednesday, March 20, 2019 - link
I wonder when Samsung will release the long-announced LCHBM chips? Are they also targeting HBM3 with them?

Teckk - Wednesday, March 20, 2019 - link
Curious, does the spec for HBM2E not have requirements for voltage, or is that not its place?

wumpus - Wednesday, March 20, 2019 - link
Nice. But nothing to indicate that it isn't limited to high-price systems like nvidia's HPC units.

If Intel/Micron ever get around to trying to make Optane a DRAM replacement, you'll likely want something like this as cache.
tmnvnbl - Wednesday, March 20, 2019 - link
Doubling the density is a huge feat for HBM memories. Because of the close integration with the GPU on the interposer, you are limited to 4 stacks of HBM because of space. Because of the TSVs, you are limited to stacking 8 dies for now. So any capacity boost needs to come from the dies themselves.

I am wondering which process they used, and if the dies are a lot larger now.
Also, it seems they improved the IO speed without any architectural changes. Then this must mean they also increased the internal core memory clock to fairly high speeds, pushing into GDDR core clock speeds. So where HBM could relax the need for high internal clock speeds, I guess Samsung just wants more BW at any cost. I wonder what this means for the memory core energy use and temperature in the HBM dies.
mczak - Wednesday, March 20, 2019 - link
I think all their 16gb dram chips use their 18nm process. And yes, density for this increased by 33% or so over the older 20nm tech, whereas capacity doubled, so the dies should be a fair bit larger. Although I thought that the hbm2 size was actually quite a bit larger than the samsung dram die size for some reason, so maybe it didn't grow that much...

Santoval - Thursday, March 21, 2019 - link
Huge feat and frankly quite unexpected, as that capacity was expected for HBM3. It doesn't appear that HBM2E is even an official JEDEC spec though. I couldn't find anything about it after a quick googling.ksec - Wednesday, March 20, 2019 - link
Is this the first time we broke the 1 TB/s memory bandwidth barrier? Or has this been done before, theoretically?

And what happened to HBM3 and HBM4? I think both were announced (not shipping, but intention to develop) quite a while ago.