LLM VRAM Usage Estimator
Estimated VRAM usage will be displayed here.
Calculation Method & Disclaimer:
- This calculator provides a rough estimate. Actual VRAM usage can vary.
- Calculation: VRAM ≈ Model Weights + KV Cache + Fixed Overhead.
- Model Weights: `Model Size (B) * Bits per Weight / 8` GB (billions of parameters times bits per weight, divided by 8 bits per byte).
- KV Cache (Approx.): Uses a simplified formula based on model size, context length, and quantization level. This is the roughest part of the estimate.
- Fixed Overhead (Approx.): Assumed to be ~1.0-1.5 GB for the inference software, CUDA context, buffers, etc.
- The bit values shown next to quantization formats are approximate averages.
- Real-world usage depends heavily on loader software (llama.cpp, ExLlamav2, vLLM, etc.), batch size, drivers, and specific model implementation.
- Treat this as a guideline, not an exact figure.
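The estimate described above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's exact code: the function name `estimate_vram_gb`, the default 1.25 GB overhead, and the use of the standard per-layer KV cache formula (`2 * layers * context * hidden_dim * bytes_per_element`) in place of the page's unspecified "simplified formula" are all assumptions.

```python
def estimate_vram_gb(params_b, bits_per_weight, context_len,
                     n_layers, hidden_dim, kv_bits=16, overhead_gb=1.25):
    """Rough VRAM estimate in GB. Every term is an approximation."""
    # Model weights: params (billions) * bits per weight / 8 -> ~GB
    weights_gb = params_b * bits_per_weight / 8
    # KV cache: 2 tensors (K and V) per layer, one entry per token
    # per hidden dimension, at kv_bits precision (assumed formula).
    kv_bytes = 2 * n_layers * context_len * hidden_dim * (kv_bits / 8)
    kv_gb = kv_bytes / 1e9
    return weights_gb + kv_gb + overhead_gb

# Example: a hypothetical 7B model at 4 bits per weight, 4096 context,
# 32 layers, hidden dimension 4096, fp16 KV cache.
print(round(estimate_vram_gb(7, 4, 4096, 32, 4096), 2))
```

For these inputs the weights contribute ~3.5 GB, the fp16 KV cache ~2.1 GB, and the fixed overhead 1.25 GB, for roughly 6.9 GB total. As the disclaimer notes, real loaders will land above or below this depending on batch size, drivers, and implementation details.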