← All hardware

hardware Data Center AI Accelerators

Groq LPU

by Groq

A deterministic inference engine built for the lowest latency per token.

Pros

Industry-leading low latency and TTFT
Predictable, deterministic performance
Easy OpenAI-compatible cloud access

Cons

Tiny per-chip SRAM (230 MB) needs many chips for big models
Inference-only; no training
Throughput trails wafer-scale rivals on some models

✓ Where it shines / best for

Real-time, latency-critical LLM applications (chat, agents, voice)
Developers wanting fast, cheap open-model inference via API
High-throughput token generation at scale

✕ Not the best fit for

Model training (LPU is inference-only)
Workloads needing proprietary closed models not on Groq
Very large memory-footprint single-model serving without sharding

Features

✓ API access
✓ Inference
✓ High Throughput
✓ Free tier
✓ Open source
✓ Openai Compatible
✓ Real-time
✓ Low latency
✓ Lpu
✓ Open Models

Pricing

Plan	Price	Billing	Notes
Free tier	$0	ongoing	Free GroqCloud developer access with rate limits for evaluation.
Developer / Pay-as-you-go	from ~$0.05–$0.79	per 1M input tokens	Per-token pricing varies by model (e.g., Llama models among the cheapest; larger/MoE models higher).
Developer / Pay-as-you-go (output)	from ~$0.08–$0.99	per 1M output tokens	Output tokens priced higher than input; exact rate is model-dependent.
Enterprise / On-prem	Custom quote	contract	Dedicated capacity, higher rate limits, and on-prem LPU hardware deployments via sales.

Pricing verified from the official source. Prices change often — confirm on the vendor's site before buying.

Specifications

use	inference-only
latency	sub-100ms time-to-first-token
throughput	hundreds to 1,000+ tokens/sec (model-dependent)
architecture	LPU (deterministic dataflow), SRAM-based
on_chip_sram	230 MB per chip

A full review is being generated for this product and will appear here shortly.

Compare with

NVIDIA GB200 NVL72

A rack-scale exaflop AI supercomputer that acts as one giant GPU.

9.6/10 hardware From $10.50/per hour

NVIDIA GB300 NVL72

Rack-scale Blackwell Ultra: 72 GPUs + 36 Grace CPUs as one giant accelerator

9.5/10 hardware From $12/per hour

NVIDIA GB300 NVL72 (Blackwell Ultra)

Blackwell Ultra rack-scale system tuned for the age of AI reasoning.

9.5/10 hardware From $12/per hour

NVIDIA B300 (Blackwell Ultra)

Blackwell Ultra single-GPU module for AI reasoning at scale

9.4/10 hardware From $30000/one-time

Compare

Compare