Guide to Run Qwen3.5 locally! πŸ’œ

#14
by danielhanchen - opened

Hey guys, we made a guide to run Qwen3.5 locally on your own device.

Run 3-bit on a 192GB RAM Mac, or 4-bit (MXFP4) on an M3 Ultra with 256GB RAM (or less).

Guide: https://unsloth.ai/docs/models/qwen3.5

GGUF: https://huggingface.co/unsloth/Qwen3.5-397B-A17B-GGUF

Let us know if you have any questions!


Thank you!

I was wondering if you can help explain (and maybe recommend) which 4-bit quant would be best for my use case: Apple Silicon Mac Studio 512GB -> llama-server -> Roo Code.

  1. Which 4-bit quant is best for accuracy?
  2. Which is best for speed?
  3. Is there one that offers the best of both for my Apple Silicon use case?
    (Knowing that on your blog, you often say: "We use the UD-Q4_K_XL quant for the best size/accuracy balance")

IQ4_XS
Q4_K_S
IQ4_NL
Q4_0
Q4_1
Q4_K_M
Q4_K_XL
MXFP4

Would this answer change model-to-model?
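One rough way to compare the quants listed above is by weights-only memory footprint. A minimal sketch, assuming approximate average bits-per-weight (bpw) figures for each llama.cpp quant type; real GGUF files mix quant types across tensors, so treat these as ballpark numbers, not exact file sizes:

```python
# Rough weights-only size estimates for a ~397B-parameter model at
# common 4-bit quant bit-widths. The bpw values below are approximate
# averages (assumptions), and KV cache / activations come on top.

PARAMS = 397e9  # Qwen3.5-397B-A17B total parameter count

# Approximate average bits per weight per quant type (assumption).
BPW = {
    "IQ4_XS": 4.25,
    "MXFP4":  4.25,
    "Q4_0":   4.5,
    "IQ4_NL": 4.5,
    "Q4_K_S": 4.6,
    "Q4_K_M": 4.8,
    "Q4_1":   5.0,
}

def est_size_gb(params: float, bpw: float) -> float:
    """Weights-only size in GB: params * bits / 8 bits-per-byte / 1e9."""
    return params * bpw / 8 / 1e9

for name, bpw in sorted(BPW.items(), key=lambda kv: kv[1]):
    print(f"{name:8s} ~{est_size_gb(PARAMS, bpw):5.0f} GB")
```

At 4–5 bpw the gaps between these quants are on the order of tens of GB at this model size, which matters less on a 512GB machine than the accuracy/speed trade-off itself, so the answer can indeed change model-to-model.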

Convert to MLX

Will this model outperform the Qwen3 Next 80B Thinking model at Q6?

I hope we can come up with a way to run this on 98GB of RAM and 24GB VRAM...
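A quick back-of-envelope check for that 98GB RAM + 24GB VRAM setup: what average bits-per-weight would the 397B weights have to hit to fit at all? This sketch ignores KV cache, OS overhead, and activations, so the real budget is tighter:

```python
# Memory-budget sketch (assumption: weights fully split across RAM + VRAM,
# nothing reserved for context or the OS).

PARAMS = 397e9           # total parameter count
budget_gb = 98 + 24      # RAM + VRAM from the comment above

# Invert size = params * bpw / 8 / 1e9 to solve for bpw.
max_bpw = budget_gb * 1e9 * 8 / PARAMS
print(f"max average bits per weight: {max_bpw:.2f}")
# -> 2.46, i.e. roughly 2-bit territory before any runtime overhead.
```

So on that hardware you would be looking at an IQ2-class quant at best, not a 3- or 4-bit one.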
