AMD Announces Radeon Instinct MI60 & MI50 Accelerators: Powered By 7nm Vega
by Ryan Smith on November 6, 2018 4:00 PM EST
Posted in: Machine Learning, AMD Instinct
As part of this morning’s Next Horizon event, AMD formally announced the first two accelerator cards based on the company’s previously revealed 7nm Vega GPU. Dubbed the Radeon Instinct MI60 and Radeon Instinct MI50, the two cards are aimed squarely at the enterprise accelerator market, with AMD looking to significantly improve their performance competitiveness in everything from HPC to machine learning.
Both cards are based on AMD’s 7nm GPU, which, although we’ve known about it at a high level for some time now, we’re only finally getting some more details on. The GPU is based on a refined version of AMD’s existing Vega architecture, essentially adding compute-focused features to the chip that are necessary for the accelerator market. Interestingly, in terms of functional blocks, 7nm Vega is actually rather close to the existing 14nm “Vega 10” GPU: both feature 64 CUs and HBM2. The difference comes down to these extra accelerator features, and the die size itself.
With respect to accelerator features, 7nm Vega and the resulting MI60 & MI50 cards differentiate themselves from the previous Vega 10-powered MI25 in a few key areas. 7nm Vega brings support for half-rate double precision – up from 1/16 rate – and AMD is supporting new low precision data types as well. These INT8 and INT4 instructions are especially useful for machine learning inferencing, where high precision isn’t necessary; AMD can get up to 4x the performance of an FP16/INT16 data type when using the smallest INT4 data type. However it’s not clear from AMD’s presentation how flexible these new data types are – and with what instructions they can be used – which will be important for understanding the full capabilities of the new GPU. All told, AMD is claiming a peak throughput of 7.4 TFLOPS FP64, 14.7 TFLOPS FP32, and 118 TOPS for INT4.
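Those headline figures are internally consistent with the 64-CU configuration. A quick back-of-the-envelope check bears this out; note that the ~1.8 GHz peak clock below is our assumption, not a figure from AMD's presentation:

```python
# Sketch: reproducing AMD's claimed peak throughput from the CU count.
# The boost clock is assumed (~1.8 GHz); AMD has not confirmed final clocks.
CUS = 64
LANES_PER_CU = 64          # stream processors per CU
CLOCK_GHZ = 1.8            # assumed peak clock

fp32_tflops = CUS * LANES_PER_CU * 2 * CLOCK_GHZ / 1000   # 2 ops per FMA
fp64_tflops = fp32_tflops / 2                             # new half-rate FP64
fp16_tflops = fp32_tflops * 2                             # packed FP16
int4_tops   = fp16_tflops * 4                             # claimed 4x over FP16/INT16

print(round(fp32_tflops, 1), round(fp64_tflops, 1), round(int4_tops))
# ~14.7 TFLOPS FP32, ~7.4 TFLOPS FP64, ~118 TOPS INT4
```

At that assumed clock the arithmetic lines up with all three of AMD's quoted numbers, which suggests INT4 is indeed rated at a straight 8x the FP32 rate.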
7nm Vega also buffs up AMD’s memory capabilities. The GPU adds another pair of HBM2 memory controllers, giving it 4 in total. Combined with a modest increase in memory clockspeeds to 2Gbps, AMD now has a full 1TB/sec of memory bandwidth in the GPU’s fastest configuration. This is even more than NVIDIA’s flagship GV100 GPU, giving AMD the edge in bandwidth. Meanwhile, as this is an enterprise-focused GPU, it offers end-to-end ECC, marking the first AMD GPU to offer complete ECC support in several years.
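The 1TB/sec figure falls straight out of the stack count and pin speed. A quick sketch, where the 1024-bit interface per stack is the standard HBM2 bus width rather than anything AMD disclosed:

```python
# Sketch: deriving the quoted 1 TB/s memory bandwidth.
STACKS = 4                 # one HBM2 stack per memory controller
BUS_WIDTH_BITS = 1024      # standard HBM2 interface width per stack
PIN_SPEED_GBPS = 2.0       # 2 Gbps per pin, as quoted

bandwidth_gbs = STACKS * BUS_WIDTH_BITS * PIN_SPEED_GBPS / 8  # bits -> bytes
print(bandwidth_gbs)  # 1024.0 GB/s, i.e. a full 1 TB/s
```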
The enterprise flourishes also apply to 7nm Vega’s I/O options. On the PCIe front, AMD has revealed that the GPU supports the recently finalized PCIe 4 standard, which doubles the amount of bandwidth per x16 slot to 31.5GB/sec in each direction. However AMD isn’t stopping there. The new GPU also includes a pair of off-chip Infinity Fabric links, allowing the Radeon Instinct cards to be directly connected to each other via the coherent links. I’m still waiting for a confirmed breakdown on the numbers, but it looks like each link supports 50GB/sec down and 50GB/sec up in bandwidth.
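The 31.5GB/sec figure is the usable rate after line encoding, not the raw signaling rate; a quick sketch of where it comes from, using the PCIe 4.0 spec's 16 GT/s per lane and 128b/130b encoding:

```python
# Sketch: deriving PCIe 4.0 x16 usable bandwidth (per direction).
GT_PER_LANE = 16           # PCIe 4.0 signaling rate, GT/s per lane
LANES = 16                 # x16 slot
ENCODING = 128 / 130       # 128b/130b line-encoding efficiency

gbs_per_direction = GT_PER_LANE * LANES * ENCODING / 8  # bits -> bytes
print(round(gbs_per_direction, 1))  # ~31.5 GB/s each way
```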
Notably, since there are only 2 links per GPU, AMD’s topology options will be limited to variations on rings, so GPUs in 4-way configurations won’t all be able to directly address each other. Meanwhile AMD is still sticking with PCIe cards as their base form factor here – no custom mezzanine-style cards like NVIDIA – so the cards are connected via a bridge on the top. Backhaul to the CPU (AMD suggests an Epyc, of course) is handled over PCIe 4.
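The ring limitation is easy to see in miniature: with two links per GPU, the minimum hop count between any two GPUs in a 4-way ring looks like this (a toy sketch; the function is ours, purely for illustration):

```python
# Sketch: link hops between GPUs on a 4-node Infinity Fabric ring.
def ring_hops(src: int, dst: int, n: int = 4) -> int:
    """Minimum number of link hops between two GPUs on an n-node ring."""
    d = abs(src - dst)
    return min(d, n - d)

# Adjacent GPUs talk directly; opposite corners need a relay.
print(ring_hops(0, 1))  # 1: direct neighbors
print(ring_hops(0, 2))  # 2: traffic must pass through GPU 1 or GPU 3
```

In other words, in a 4-way box half of the GPU pairs are one coherent hop apart and the other half must relay through a neighbor.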
Finally, looking at the GPU itself, it’s interesting to note just how small it is. Because AMD didn’t significantly bulk up the GPU on CUs, thanks to the 7nm process the new GPU is actually a good bit smaller than the original 484mm2 Vega 10 GPU. The new GPU comes in at 331mm2, packing in 13.2B transistors. It should be noted that AMD’s performance estimates are relatively conservative here; while 7nm does bring power consumption down, AMD is still only touting >1.25x the performance of MI25 at the same power consumption. The true power in the new cards lies in their new features, rather than the standard FP16/FP32 calculations that the existing MI25 card was already geared for.
Wrapping things up, Radeon Instinct MI60 will be shipping in Q4 of this year. AMD has not announced a price, but as a cutting-edge 7nm GPU, don’t expect it to be cheap. MI60 will then be followed by MI50 in Q1 of next year, giving AMD’s customers a second, cheaper option to access 7nm Vega.
Comments
shabby - Tuesday, November 6, 2018 - link
3rd slide shows Nfinity Fabric? Is that a typo or is amd trolling nvidia?
Samus - Thursday, November 8, 2018 - link
LOL good catch, and hilarious!
FredWebsters - Wednesday, December 26, 2018 - link
I think that it is a good catch too!
olafgarten - Tuesday, November 6, 2018 - link
It will be interesting to see if AMD can gain any traction in HPC considering how deeply CUDA is ingrained in most applications.
Yojimbo - Tuesday, November 6, 2018 - link
If AMD gains any traction it won't be with this product. Even setting aside the CUDA moat, this AMD card will be "available" in Q4; in fact, they won't have any significant volume with it then. But regardless, they are targeting data centers, and since AMD doesn't have wide-scale deployment of their GPU tech in data centers now, there would be a lengthy verification process before full deployment. Once that happens, and it would probably take at least 6 months, AMD would, again setting aside NVIDIA's superior software library support and any application code users may already have optimized for NVIDIA's hardware, perform roughly equal to a card that NVIDIA will have had in general availability for over a year and a half. Roughly equal, that is, except for AI training workloads, where NVIDIA's Tensor Cores would give NVIDIA an advantage. Furthermore, by the time all that happens NVIDIA will be close to releasing their next-generation data center compute card on 7 nm, which could arrive in late 2019 or in 2020 in terms of availability (they may announce something at GTC San Jose in March, or whenever it will be held, but it wouldn't have real availability until months later, much like this AMD MI60 card). NVIDIA, already having their GPUs in data centers, can get their products verified much faster. This MI60 card might end up having to go toe-to-toe with NVIDIA's 7 nm card, in which case there will be no contest, both in hardware capabilities and software support.
ABR - Wednesday, November 7, 2018 - link
Could AMD just implement CUDA, or are there copyright issues there?
Bulat Ziganshin - Wednesday, November 7, 2018 - link
CUDA provides access to specifics of the GeForce architecture; e.g. in a lot of places it depends on 32-wide warps. OpenCL tries to hide GPU arch differences, so it's more universal.
zangheiv - Wednesday, November 7, 2018 - link
Wrong. And I beg to differ:
1) Nvidia's Volta V100 has similar FP32 and FP16 and AI deep learning performance but Volta is 21B transistors compared to 13B on Vega. Obviously Volta v100 cannot die-shrink to 7nm easily and feasibly.
2) MI60 is for training and performs slightly better than Volta. It goes head-to-head with Volta in training use-cases. Inference is where tensor cores take over.
3) CUDA with AI is not difficult. We're not talking about a game optimization engine and all that fancy geometry and architecture-dependant draw-calls etc. It's AI training and it's quite straight-forward if you have all the math libraries which currently exist with OpenCL and ROCm.
4) CUDA moat you talk about will be the rope around Nvidia's neck. Even Jensen knows that opensource is the future. Intel will also use OpenCL, Xilinx that currently holds the world record in Inference uses Open CL. Google uses OpenCL. MacOS, Microsoft literally everyone. Android is already Vulkan.
5) Currently AMD doesn't need tensor-cores, Xilinx already has that covered. MI60 and Xilinx solutions are way more cost-effective. Not because margins are lower but because 21B monolith V100 is super expensive to produce.
6) MI60 will most certainly gain traction. I'm certain AMD knows more about their customers and the market than you do.
7) Rome and MI60 will use HSA. This is AMD specific and requires proprietary logic within the CPU and GPU. For large-scale simulation use-cases AMD has a definitive advantage with that.
8)You forgot hardware virtualization. This is unique to AMD's solution.
Point is, there's A LOT of things MI60 does better. And architecturally Vega20 is clearly superior in the sense of a smaller footprint, better efficiency and better yield.
Yojimbo - Thursday, November 8, 2018 - link
"1) Nvidia's Volta V100 has similar FP32 and FP16 and AI deep learning performance but Volta is 21B transistors compared to 13B on Vega. Obviously Volta v100 cannot die-shrink to 7nm easily and feasibly."
Vega 20 is on 7 nm; it can run at faster clocks while using less power. I'm also curious, did they give us performance per watt comparisons? NVIDIA could shrink Volta, but why would they? NVIDIA will introduce a new architecture on 7 nm. As far as the 21B transistors, that is including 6 NVLink controllers.
"2) MI60 is for training and performs slightly better than Volta. It goes head-to-head with Volta in training use-cases. Inference is where tensor cores take over."
We don't have benchmarks that show that. You think AMD's selected benchmarks mean anything for practical real-world training runs? And even AMD's benchmarks aren't making such a claim, since they are not using the Tensor Cores for the V100. V100 Tensor Cores are for training. This shows you don't know what you are talking about.
"3) CUDA with AI is not difficult. We're not talking about a game optimization engine and all that fancy geometry and architecture-dependant draw-calls etc. It's AI training and it's quite straight-forward if you have all the math libraries which currently exist with OpenCL and ROCm."
Sure, that's what Raja Koduri said years ago with the MI25 release, and what have we seen since then? It still takes lots of architecture optimization in software libraries to make robust tools. Last I knew cuDNN and cuBLAS were well ahead of anything available for use on AMD architecture. Being "straightforward" is not the issue. The issue is performance.
"4) CUDA moat you talk about will be the rope around Nvidia's neck"
Uh huh. As if NVIDIA doesn't already have OpenCL tools that perform better on NVIDIA's hardware than AMD's OpenCL tools do on AMD's hardware...
"5) Currently AMD doesn't need tensor-cores, Xilinx already has that covered. MI60 and Xilinx solutions are way more cost-effective. Not because margins are lower but because 21B monolith V100 is super expensive to produce."
"6) MI60 will most certainly gain traction. I'm certain AMD knows more about their customers and the market than you do."
Your simply stating it to be so without giving valid reasons doesn't convince me somehow. You can assume that because AMD paper launches a card that that card will be successful if you want. Just don't read AMD's history and you'll be fine.
"7) Rome and MI60 will use HSA. This is AMD specific and requires proprietary logic within the CPU and GPU. For large-scale simulation use-cases AMD has a definitive advantage with that."
"8)You forgot hardware virtualization. This is unique to AMD's solution."
AMD has had hardware virtualization for a while and has not gained much market share in the data center virtualization market thus far.
What about it? Both the V100 and the M60 have it.
"Point is, there's A LOT of things MI60 does better."
Hardware virtualization is a lot? What else have you mentioned?
"And architecturally Vega20 is clearly superior in the sense of a smaller footprint, better efficiency and better yield."
Vega20 is architecturally better because it uses a full node advance and an extra 1 1/2 years to market to match the (theoretical) performance of the V100? That's a strange conclusion. Surely you can see your bias.
Again, Tensor Cores are useful for training. And you might want to inform the boards of AMD and Xilinx that they are going to be sharing resources and profits from now on... FPGAs in inference are more unproven than even GPUs in inference, btw.
At the same price point, I think a 7 nm GP100 would be preferable for most use cases to the MI60. I doubt AMD has the software stack necessary to make GPU inference that attractive for most workloads. NVIDIA has put a lot of work in neural network optimization compilers, container support, kubernetes support, etc.
The MI60 will never see widespread use. Widespread meaning, say, 5% of the market share for GPU compute. It will be used in small scale for evaluation purposes and perhaps by people for which the hardware virtualization is important but yet still want to put their data in the cloud (which is inherently less secure and more nervy, anyway). It remains to be seen whether the MI60 is a product that allows AMD to begin to get their foot in the door or if it is just another product that will be forgotten to history.
Yojimbo - Thursday, November 8, 2018 - link
Oh, I missed #5:
I'm not sure why you think AMD and Xilinx are a team. FPGAs have certainly not captured the inference market, anyway. But, again, Tensor Cores are for training, not just inference. AMD is developing their own Tensor Core-like technology and I guess when they come out with it you will decide it will then suddenly become something necessary for them and when you squint and cock your head at a steep enough angle it will almost look like their solution is better. Don't worry about the cost to make the V100. They can charge a lot more for it than AMD can charge for the MI60 (when it is actually available) because the demand for it is a lot higher.