please GGUF
#1
by
houxiaowei
- opened
please GGUF
A GGUF wouldn't help much: this model isn't supported by llama.cpp, so you need to follow the vLLM or SGLang instructions instead.
@zhanghanxiao llama.cpp now has a reference implementation of linear attention, added for Qwen3-Next. Can you add support for your model as well?