FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving • arXiv:2501.01005 • Published Jan 2
INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats • arXiv:2510.25602 • Published Oct 29
SINQ Collection • Models quantized with the SINQ quantization method • 19 items
SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights • arXiv:2509.22944 • Published Sep 26
The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm • arXiv:2507.18553 • Published Jul 24