Inspired by the heroes of day-zero quants (@TheBloke, @danielhanchen, @shimmyshimmer, @bartowski), I decided to join the race by releasing the first FP8 quant of glm-4.7-flash! It was not as easy as I expected, but I'm happy I was still able to get it working within a few hours of the original model's release! Interested in feedback if anyone wants to try it out!
marksverdhei/GLM-4.7-Flash-FP8
Note: If my PR to vLLM isn't merged yet, you might have to use my fork. Cheers! 🤗