← All hardware
Cerebras WSE-3 (CS-3) logo
hardware Data Center AI Accelerators

Cerebras WSE-3 (CS-3)

by Cerebras Systems

A single wafer-scale chip the size of a dinner plate for record inference speed.

Pros

  • Fastest measured LLM inference token throughput
  • Eliminates multi-GPU model-sharding complexity
  • Huge on-chip memory bandwidth (21 PB/s)

Cons

  • Very high power and physical footprint (~23 kW per CS-3)
  • Niche, specialized ecosystem vs GPUs
  • Limited on-chip memory (44 GB) for largest weights

✓ Where it shines / best for

  • Ultra-fast, low-latency LLM inference at scale
  • Training large models without complex GPU-cluster sharding
  • Research labs and enterprises needing extreme compute density

✕ Not the best fit for

  • Small teams wanting cheap, commodity GPUs
  • On-premises deployments without significant power/cooling
  • On-device or edge inference

Features

  • ✓ AI inference
  • ✓ Data-center scale
  • ✓ LLM
  • ✓ API access
  • ✓ AI Training
  • ✓ High Throughput
  • ✓ Free tier
  • ✓ Openai Compatible
  • ✓ Real-time
  • ✓ Wafer Scale

Pricing

PlanPriceBillingNotes
CS-3 system (purchase)Not publicly listedone-timeThe CS-3 wafer-scale system is sold/leased to enterprises and labs via direct sales; widely reported in the low-to-mid seven figures (around ~$2M+ per system). Contact Cerebras for quotes.
Inference – Free tier$0monthlyFree API access with rate limits to get started on Cerebras Inference.
Inference – Pay-as-you-goUsage-basedper tokenPer-million-token pricing for hosted models (e.g., Llama 3.3 70B, 8B). Specific rates published on the Cerebras inference pricing page; competitive per-token rates.
Inference – EnterpriseCustomcustomDedicated capacity, higher rate limits, and custom pricing via sales.
Cloud computeUsage-basedvariesCerebras Cloud offers access to CS-3 clusters without purchasing hardware; custom pricing.

Pricing verified from the official source. Prices change often — confirm on the vendor's site before buying.

Specifications

cores900,000
form_factorCS-3 system (~23 kW)
performance125 PFLOPS peak AI
transistors4 trillion
architectureWSE-3 wafer-scale engine (TSMC 5nm)
on_chip_sram44 GB at 21 PB/s
Sponsored

A full review is being generated for this product and will appear here shortly.

Compare with

Compare
Compare