If you must quantize, start by quantizing only your MLP weights and KV cache to int8 (or fp8). The reduced-precision errors are largely washed out by the softmaxes in attention. Additional finetuning can recover any remaining mixed-precision quality loss, but it is usually unnecessary.
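As a minimal sketch of what int8 weight quantization looks like (symmetric, per-output-channel scales — one common scheme, though real libraries offer several variants), here is a NumPy round-trip on a hypothetical MLP weight matrix:

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-output-channel quantization: one scale per row,
    # chosen so the row's largest magnitude maps to 127.
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an fp32 approximation; the error per element is
    # bounded by half the row's scale (one rounding step).
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((1024, 1024)).astype(np.float32)  # stand-in MLP weight
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max abs round-trip error:", float(np.abs(w - w_hat).max()))
```

The per-element error stays within half a quantization step, which is the small, roughly uniform noise that downstream softmaxes tend to attenuate.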