Thank you for this!
#2 opened by arvnoodle
Tried it on vLLM. Pretty much working! I do have a question, though: do you think the 405B Hermes could be quantized to 4-bit too?
Thank you for trying my model :) I would really love to quantize the 405B Hermes, but it does not fit on my local setup, and unfortunately financial constraints don't allow me to rent cloud GPUs.
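For a rough sense of scale, here is a back-of-the-envelope estimate of weight storage. It assumes the parameter count is exactly the "405B" in the model name and counts weights only. It ignores quantization scales and zero-points, activations, KV cache, and any framework overhead, so real requirements are higher.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return n_params * bits_per_param / 8 / 1e9


# Assumption: parameter count taken from the "405B" in the model name.
N_PARAMS = 405e9

for label, bits in [("16-bit (FP16/BF16)", 16), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

Even the 4-bit result is around 200 GB. The 16-bit checkpoint that a quantization run starts from is roughly 800 GB, which is far beyond a typical local setup.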