If you must quantize, start by quantizing only your MLP weights and KV cache to int8 (or fp8). The reduced-precision errors are largely washed out by the softmaxes in attention. Additional finetuning can recover any remaining mixed-precision quality loss, but it is usually unnecessary.
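As a minimal sketch of what int8 weight quantization looks like (symmetric, per-output-channel scales — one common scheme, though real libraries offer several variants), here is a NumPy round-trip on a hypothetical MLP weight matrix:

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-output-channel quantization: one scale per row,
    # chosen so the row's largest magnitude maps to 127.
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an fp32 approximation; the error per element is
    # bounded by half the row's scale (one rounding step).
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((1024, 1024)).astype(np.float32)  # stand-in MLP weight
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max abs round-trip error:", float(np.abs(w - w_hat).max()))
```

The per-element error stays within half a quantization step, which is the small, roughly uniform noise that downstream softmaxes tend to attenuate.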