AMD's CDNA 4 flagship accelerator, with 288 GB of HBM3E and native FP4/FP6 support.
Pros
- Largest memory capacity per GPU (288 GB)
- Open-source ROCm software stack, with less vendor lock-in than CUDA
- Strong FP64 for HPC, plus FP4/FP6 for AI
Cons
- ROCm ecosystem still less mature than CUDA
- 1,400W peak board power requires liquid cooling
- Smaller scale-up (GPU-to-GPU) fabric domain than NVIDIA's NVLink
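The power requirement compounds at platform scale. A minimal sketch, assuming the 1,400W peak board power applies to every GPU and ignoring host CPUs, networking, and cooling overhead (all of which add to the real facility load); the four-platform rack is a hypothetical example:

```python
PEAK_BOARD_POWER_W = 1_400  # per-GPU peak board power (assumed for every GPU)

def gpu_peak_power_kw(num_gpus: int, board_power_w: float = PEAK_BOARD_POWER_W) -> float:
    """Peak GPU board power in kW; excludes host CPUs, NICs, and cooling."""
    if num_gpus < 0:
        raise ValueError("num_gpus must be non-negative")
    return num_gpus * board_power_w / 1_000

# One 8-GPU platform, then a hypothetical rack holding four such platforms.
print(gpu_peak_power_kw(8))      # 11.2
print(gpu_peak_power_kw(8 * 4))  # 44.8
```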
✓ Where it shines / best for
- Frontier-scale LLM training and high-throughput inference
- Operators wanting maximum memory and FP4/FP6 throughput per GPU
- Liquid-cooled, high-density AI data center deployments
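To make "maximum memory" concrete, here is a weights-only sizing sketch. It assumes 288 GB means 288 × 10^9 bytes and counts only raw parameter bits. Real deployments also need room for KV cache, activations, optimizer state (for training), and any per-block scale factors that low-precision formats carry, so treat these numbers as upper bounds:

```python
HBM_BYTES = 288 * 10**9  # 288 GB, assuming decimal gigabytes

BITS_PER_PARAM = {"FP16": 16, "FP8": 8, "FP6": 6, "FP4": 4}

def max_params_weights_only(bits_per_param: int, capacity_bytes: int = HBM_BYTES) -> int:
    """Upper bound on parameter count whose raw weights alone fit in capacity_bytes."""
    return capacity_bytes * 8 // bits_per_param

for fmt, bits in BITS_PER_PARAM.items():
    print(f"{fmt}: ~{max_params_weights_only(bits) / 1e9:.0f}B parameters")
```

Halving the bits per parameter doubles the ceiling, which is why FP4/FP6 support and large HBM capacity reinforce each other for single-GPU model hosting.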
✕ Not the best fit for
- Anyone without liquid-cooling or high-power data-center infrastructure
- CUDA-only software stacks with no ROCm migration plan
- Edge, on-device, or budget-constrained use
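For teams weighing a ROCm migration, a practical first check is whether the installed PyTorch build targets ROCm. ROCm builds of PyTorch expose AMD GPUs through the existing `torch.cuda` API and set `torch.version.hip`, which is `None` on CUDA or CPU-only builds. A minimal sketch; the helper name `describe_torch_backend` is ours, and it degrades gracefully when PyTorch is not installed:

```python
def describe_torch_backend() -> str:
    """Report whether PyTorch is a ROCm (HIP) build and which GPU it sees."""
    try:
        import torch
    except ImportError:
        return "PyTorch not installed"

    hip_version = getattr(torch.version, "hip", None)  # None on CUDA/CPU builds
    backend = f"ROCm/HIP {hip_version}" if hip_version else "non-ROCm build"

    # ROCm builds reuse the torch.cuda namespace for AMD GPUs.
    if torch.cuda.is_available():
        return f"{backend}; device 0: {torch.cuda.get_device_name(0)}"
    return f"{backend}; no GPU visible"

print(describe_torch_backend())
```

Because ROCm builds keep the `torch.cuda` device API, much PyTorch-level code runs unchanged; the harder migration work tends to be custom CUDA kernels and CUDA-only third-party libraries.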
Features
- ✓ AI inference
- ✓ Data-center scale
- ✓ LLM
- ✓ HBM3E
- ✓ AI training
- ✓ FP4
- ✓ Liquid-cooled
- ✓ High memory
- ✓ CDNA 4
- ✓ FP6
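FP4 values are usually stored in block-scaled form so that 4-bit elements can still cover a wide dynamic range. The sketch below assumes the OCP Microscaling (MX) MXFP4 layout: FP4 E2M1 elements sharing one power-of-two scale per block (32 elements in the spec). It is an illustration of that format, not AMD's hardware implementation, and the names `E2M1_VALUES` and `quantize_mxfp4_block` are hypothetical:

```python
import math

# Non-negative magnitudes representable in FP4 E2M1 (sign is stored separately).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2  # exponent of the largest normal value: 6.0 = 1.5 * 2**2

def quantize_mxfp4_block(block):
    """Quantize one block to (shared scale, FP4 element values).

    Each dequantized element is scale * value. Ties round toward the
    smaller magnitude here, a simplification of round-to-nearest-even.
    """
    max_abs = max(abs(x) for x in block)
    if max_abs == 0:
        return 1.0, [0.0] * len(block)
    shared_exp = math.floor(math.log2(max_abs)) - E2M1_EMAX
    scale = 2.0 ** shared_exp  # power-of-two shared scale
    values = []
    for x in block:
        mag = min(abs(x) / scale, E2M1_VALUES[-1])  # saturate at 6.0
        nearest = min(E2M1_VALUES, key=lambda v: abs(v - mag))
        values.append(math.copysign(nearest, x))
    return scale, values

scale, values = quantize_mxfp4_block([0.2, -1.3, 2.9, 0.05])
print(scale, values)                   # 0.5 [0.5, -3.0, 6.0, 0.0]
print([scale * v for v in values])     # [0.25, -1.5, 3.0, 0.0]
```

The shared scale is what keeps the coarse 8-value grid usable: each block is rescaled so its largest element lands near the top of the E2M1 range.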
Pricing
| Plan | Price | Billing | Notes |
|---|---|---|---|
| List/MSRP | Not publicly listed | one-time | Enterprise CDNA 4 accelerator sold through server OEMs/ODMs and cloud providers, not at retail; ships in 8-GPU platforms (air- and liquid-cooled). Contact AMD or its partners for quotes. |
| Cloud access | Usage-based | hourly | Offered by cloud/neocloud providers on a per-GPU-hour basis; rates vary by provider. |
Pricing verified from the official source. Prices change often — confirm on the vendor's site before buying.
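Since cloud access is billed per GPU-hour, job cost is simple arithmetic once you have a provider's quote. A minimal sketch; no real rate is assumed, so the example uses a rate of 1.0 to show the GPU-hour count you would multiply by your quoted price:

```python
def cloud_job_cost(gpus: int, hours: float, rate_per_gpu_hour: float) -> float:
    """Estimated cost of a job billed per GPU-hour (rate is provider-specific)."""
    if gpus < 0 or hours < 0 or rate_per_gpu_hour < 0:
        raise ValueError("inputs must be non-negative")
    return gpus * hours * rate_per_gpu_hour

# One 8-GPU node for 24 hours; with rate 1.0 the result is the GPU-hour count.
print(cloud_job_cost(8, 24, rate_per_gpu_hour=1.0))  # 192.0
```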
Specifications
| Spec | Value |
|---|---|
| Power | Up to 1,400W peak board power |
| Memory | 288 GB HBM3E, 8 TB/s bandwidth |
| Architecture | CDNA 4 (4th-gen CDNA) |
| Compute units | 256 |
| FP8 performance | 10.1 PFLOPS |
| FP16 performance | 5 PFLOPS |
| FP64 performance | 78.6 TFLOPS |
| FP4/FP6 performance | ~20.1 PFLOPS |
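The compute and bandwidth figures can be combined into two quick roofline-style estimates. A minimal sketch, taking the listed peaks at face value (vendor peak figures may assume ideal conditions or sparsity, so real ridge points and throughput will be lower). The decode ceiling assumes batch-1 generation that streams every weight from HBM once per token and ignores KV-cache traffic; the 70B FP8 model is hypothetical:

```python
BANDWIDTH_BYTES_PER_S = 8e12                   # 8 TB/s HBM3E
PEAK_FLOPS = {"FP16": 5e15, "FP8": 10.1e15}    # as listed in the spec table

def ridge_point(peak_flops: float, bandwidth: float = BANDWIDTH_BYTES_PER_S) -> float:
    """Arithmetic intensity (FLOP/byte) above which a kernel becomes compute-bound."""
    return peak_flops / bandwidth

def decode_tokens_per_s_upper_bound(weight_bytes: float,
                                    bandwidth: float = BANDWIDTH_BYTES_PER_S) -> float:
    """Batch-1 decode ceiling if each token must read all weights from HBM once."""
    return bandwidth / weight_bytes

for fmt, flops in PEAK_FLOPS.items():
    print(f"{fmt} ridge point: {ridge_point(flops):.1f} FLOP/byte")

# Hypothetical 70B-parameter model stored in FP8 (1 byte per parameter).
print(f"~{decode_tokens_per_s_upper_bound(70e9):.0f} tokens/s upper bound")
```

Kernels below the ridge point (such as small-batch decode) are limited by the 8 TB/s bandwidth rather than peak FLOPS, which is why memory bandwidth matters as much as the headline compute numbers for inference.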