← All hardware
A single wafer-scale chip the size of a dinner plate for record inference speed.
Pros
- Fastest measured LLM inference token throughput
- Eliminates multi-GPU model-sharding complexity
- Huge on-chip memory bandwidth (21 PB/s)
Cons
- Very high power and physical footprint (~23 kW per CS-3)
- Niche, specialized ecosystem vs GPUs
- Limited on-chip memory (44 GB) for largest weights
✓ Where it shines / best for
- Ultra-fast, low-latency LLM inference at scale
- Training large models without complex GPU-cluster sharding
- Research labs and enterprises needing extreme compute density
✕ Not the best fit for
- Small teams wanting cheap, commodity GPUs
- On-premises deployments without significant power/cooling
- On-device or edge inference
Features
- ✓ AI inference
- ✓ Data-center scale
- ✓ LLM
- ✓ API access
- ✓ AI Training
- ✓ High Throughput
- ✓ Free tier
- ✓ Openai Compatible
- ✓ Real-time
- ✓ Wafer Scale
Pricing
| Plan | Price | Billing | Notes |
|---|---|---|---|
| CS-3 system (purchase) | Not publicly listed | one-time | The CS-3 wafer-scale system is sold/leased to enterprises and labs via direct sales; widely reported in the low-to-mid seven figures (around ~$2M+ per system). Contact Cerebras for quotes. |
| Inference – Free tier | $0 | monthly | Free API access with rate limits to get started on Cerebras Inference. |
| Inference – Pay-as-you-go | Usage-based | per token | Per-million-token pricing for hosted models (e.g., Llama 3.3 70B, 8B). Specific rates published on the Cerebras inference pricing page; competitive per-token rates. |
| Inference – Enterprise | Custom | custom | Dedicated capacity, higher rate limits, and custom pricing via sales. |
| Cloud compute | Usage-based | varies | Cerebras Cloud offers access to CS-3 clusters without purchasing hardware; custom pricing. |
Pricing verified from the official source. Prices change often — confirm on the vendor's site before buying.
Specifications
| cores | 900,000 |
| form_factor | CS-3 system (~23 kW) |
| performance | 125 PFLOPS peak AI |
| transistors | 4 trillion |
| architecture | WSE-3 wafer-scale engine (TSMC 5nm) |
| on_chip_sram | 44 GB at 21 PB/s |
Sponsored
A full review is being generated for this product and will appear here shortly.