
# Recommended llama.cpp parameters for different models

The following are the llama.cpp settings I use to run different models efficiently.

## Gemma3

**Gemma3 27b**

```bash
./llama-cli --model bartowski/google_gemma-3-27b-it-qat-GGUF/gemma-3-27b-it-Q8_0.gguf \
  --ctx-size 131072 --temp 1.0 --repeat-penalty 1.0 --min-p 0.01 --top-k 64 --top-p 0.95
```
> **Note:** According to the Unsloth team, the recommended `--min-p` is 0.00 (optional), but 0.01 works well in llama.cpp.
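The same sampler settings carry over if you prefer serving the model over HTTP instead of using the CLI; a minimal sketch, assuming the `llama-server` binary from the same llama.cpp build (the port is an arbitrary choice):

```bash
# Sketch: Gemma3 27b behind llama-server with the sampler settings above;
# --port 8080 is an arbitrary example value, not a recommendation.
./llama-server --model bartowski/google_gemma-3-27b-it-qat-GGUF/gemma-3-27b-it-Q8_0.gguf \
  --ctx-size 131072 --temp 1.0 --repeat-penalty 1.0 --min-p 0.01 --top-k 64 --top-p 0.95 \
  --port 8080
```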

## Microsoft/Phi-4-reasoning

**Microsoft Phi-4 Reasoning Plus**

```bash
./llama-cli --model bartowski/microsoft_Phi-4-reasoning-plus-GGUF/microsoft_Phi-4-reasoning-plus-Q8_0.gguf \
  --ctx-size 32768 --temp 0.8 --top-k 50 --top-p 0.95 --reasoning-format deepseek
```
> **Note:** For more complex queries, set `--predict 32768` to allow for a longer chain of thought (CoT), as shown below.
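Concretely, that variant just appends `--predict` to the command above (a sketch; all other flags unchanged):

```bash
# Same Phi-4 Reasoning Plus invocation, with --predict 32768 added to leave
# room for a longer chain of thought on harder queries.
./llama-cli --model bartowski/microsoft_Phi-4-reasoning-plus-GGUF/microsoft_Phi-4-reasoning-plus-Q8_0.gguf \
  --ctx-size 32768 --temp 0.8 --top-k 50 --top-p 0.95 --reasoning-format deepseek \
  --predict 32768
```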


## Mistralai/Devstral-Small-2505

**Devstral-Small-2505**

```bash
./llama-cli --model bartowski/Devstral-Small-2505-GGUF/Devstral-Small-2505-Q8_0.gguf \
  --ctx-size 131072 --temp 0.15
```
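For non-interactive coding tasks, you can feed the prompt from a file instead of chatting; a minimal sketch, where `prompt.txt` is a hypothetical placeholder for your own prompt file:

```bash
# Sketch: one-shot run reading the prompt from a file; prompt.txt is a
# placeholder, the model and sampler flags match the command above.
./llama-cli --model bartowski/Devstral-Small-2505-GGUF/Devstral-Small-2505-Q8_0.gguf \
  --ctx-size 131072 --temp 0.15 --file prompt.txt
```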


## Qwen/Qwen3

### Thinking

**Qwen3 Thinking**

```bash
./llama-cli --model bartowski/Qwen_Qwen3-8B-GGUF/Qwen_Qwen3-8B-Q8_0.gguf --jinja \
  --temp 0.6 --top-k 20 --top-p 0.95 --min-p 0 --ctx-size 40960 --predict 32768
```
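Since Qwen3's thinking mode wraps its reasoning in `<think>` tags like DeepSeek R1, the same reasoning-format flag used in the Phi-4 example above should apply; a sketch under that assumption:

```bash
# Sketch: same Qwen3 thinking-mode command, assuming the deepseek reasoning
# format also parses Qwen3's <think> blocks so the CoT is split from the answer.
./llama-cli --model bartowski/Qwen_Qwen3-8B-GGUF/Qwen_Qwen3-8B-Q8_0.gguf --jinja \
  --temp 0.6 --top-k 20 --top-p 0.95 --min-p 0 --ctx-size 40960 --predict 32768 \
  --reasoning-format deepseek
```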

### Non-thinking

**Qwen3 Non-thinking**

```bash
./llama-cli --model bartowski/Qwen_Qwen3-8B-GGUF/Qwen_Qwen3-8B-Q8_0.gguf --jinja \
  --temp 0.7 --top-k 20 --top-p 0.8 --min-p 0 --ctx-size 40960 --predict 32768
```
> **Note:** The Qwen team suggests setting `--presence-penalty` between 0 and 2 to reduce endless repetitions, but adds that higher values may occasionally cause language mixing and a slight drop in model performance; see the sketch below.
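As an illustration of that suggestion, the non-thinking command with a mid-range penalty might look like this (1.5 is an arbitrary example value, not a Qwen recommendation):

```bash
# Sketch: non-thinking Qwen3 with a presence penalty to curb repetition;
# 1.5 is an arbitrary pick from the 0-2 range the Qwen team mentions.
./llama-cli --model bartowski/Qwen_Qwen3-8B-GGUF/Qwen_Qwen3-8B-Q8_0.gguf --jinja \
  --temp 0.7 --top-k 20 --top-p 0.8 --min-p 0 --ctx-size 40960 --predict 32768 \
  --presence-penalty 1.5
```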