📄️ LLaMA.cpp HTTP server knobs
Fast, lightweight, pure C/C++ HTTP server based on httplib, nlohmann::json and llama.cpp.
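As a minimal sketch of starting that server (the model path, bind address, port, context size, and GPU layer count here are illustrative placeholders, not recommendations from this guide):

```shell
# Launch llama.cpp's HTTP server with a few common knobs.
# All paths and values below are placeholders -- adjust for your setup.
llama-server \
  -m /models/example-model-q4_k_m.gguf \
  --host 127.0.0.1 \
  --port 8080 \
  -c 8192 \
  -ngl 99
```

Once running, the server exposes an OpenAI-compatible API (e.g. `/v1/chat/completions`) on the chosen host and port.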
📄️ Recommended LLaMA.cpp parameters for different models
Following are the llama.cpp settings I use to run different models efficiently.
📄️ My LlamaSwap Configuration
My LlamaSwap configuration. The individual model configurations can also be used to run llama.cpp standalone.
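A minimal sketch of what one model entry in a llama-swap config can look like (the model name, file path, and flags are assumptions for illustration; `${PORT}` is the placeholder llama-swap substitutes when it launches the command):

```yaml
# Sketch of a llama-swap config entry; names and paths are illustrative.
models:
  "example-model":
    cmd: >
      llama-server --port ${PORT}
      -m /models/example-model-q4_k_m.gguf
      -c 8192 -ngl 99
```

Because each entry's `cmd` is a complete llama-server invocation, it can be copied out and run on its own without llama-swap.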
📄️ Qwen 3 LLaMA.cpp tips and tricks
To disable thinking, use /no_think (or you can set it in the system prompt).
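As a sketch of the system-prompt variant against a local llama.cpp server (the host, port, and prompt text are assumptions; `/no_think` is Qwen 3's documented soft switch for disabling thinking):

```shell
# Disable Qwen 3 thinking via the system prompt on a local llama-server
# instance. Host/port are placeholders for your own deployment.
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "system", "content": "You are a helpful assistant. /no_think"},
          {"role": "user", "content": "Hello"}
        ]
      }'
```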