LLM VRAM Usage Estimator
Estimated VRAM usage will be displayed here.
Calculation Method & Disclaimer:
- This calculator provides a rough estimate. Actual VRAM usage can vary.
- Calculation: VRAM ≈ Model Weights + KV Cache + Fixed Overhead.
- Model Weights: `Model Size (B) * Bits per Weight / 8` GB (billions of parameters times bits per weight, divided by 8 bits per byte).
- KV Cache (Approx.): Uses a simplified formula based on model size, context length, and quantization level. This is the roughest part of the estimate.
- Fixed Overhead (Approx.): Assumed to be ~1.0-1.5 GB for the inference software, CUDA context, buffers, etc.
- The bit values shown next to quantization formats are approximate averages.
- Real-world usage depends heavily on loader software (llama.cpp, ExLlamav2, vLLM, etc.), batch size, drivers, and specific model implementation.
- Treat this as a guideline, not an exact figure.
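The estimate described above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's exact code: the function name `estimate_vram_gb`, the default 1.25 GB overhead, and the use of the standard per-layer KV cache formula (`2 * layers * context * hidden_dim * bytes_per_element`) in place of the page's unspecified "simplified formula" are all assumptions.

```python
def estimate_vram_gb(params_b, bits_per_weight, context_len,
                     n_layers, hidden_dim, kv_bits=16, overhead_gb=1.25):
    """Rough VRAM estimate in GB. Every term is an approximation."""
    # Model weights: params (billions) * bits per weight / 8 -> ~GB
    weights_gb = params_b * bits_per_weight / 8
    # KV cache: 2 tensors (K and V) per layer, one entry per token
    # per hidden dimension, at kv_bits precision (assumed formula).
    kv_bytes = 2 * n_layers * context_len * hidden_dim * (kv_bits / 8)
    kv_gb = kv_bytes / 1e9
    return weights_gb + kv_gb + overhead_gb

# Example: a hypothetical 7B model at 4 bits per weight, 4096 context,
# 32 layers, hidden dimension 4096, fp16 KV cache.
print(round(estimate_vram_gb(7, 4, 4096, 32, 4096), 2))
```

For these inputs the weights contribute ~3.5 GB, the fp16 KV cache ~2.1 GB, and the fixed overhead 1.25 GB, for roughly 6.9 GB total. As the disclaimer notes, real loaders will land above or below this depending on batch size, drivers, and implementation details.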