LLaMA.cpp HTTP server knobs
Fast, lightweight, pure C/C++ HTTP server based on httplib, nlohmann::json and llama.cpp.
My llamaSwap configuration. Each model's configuration can also be used on its own to run llama.cpp standalone.
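As a rough illustration of running one configuration standalone, the sketch below assembles a `llama-server` command line from a handful of settings. The helper name `build_server_command`, the model path, and the chosen values are hypothetical and only for illustration; the flags shown (`-m`, `--port`, `-c`, `-ngl`) are common llama.cpp server options, but check `llama-server --help` for your build before relying on them.

```python
import shlex


def build_server_command(model_path, port=8080, ctx_size=4096,
                         gpu_layers=0, binary="llama-server"):
    """Return an argv list for launching llama-server with the given settings.

    Hypothetical helper: it only builds the command, it does not start the server.
    """
    return [
        binary,
        "-m", model_path,         # path to the GGUF model file
        "--port", str(port),      # HTTP port the server listens on
        "-c", str(ctx_size),      # context size in tokens
        "-ngl", str(gpu_layers),  # number of layers to offload to the GPU
    ]


# Placeholder values; substitute the settings from your own configuration.
cmd = build_server_command("/models/example.gguf", port=9001,
                           ctx_size=8192, gpu_layers=99)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd)` to launch the server directly, or the printed string can be pasted into a shell.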
To disable thinking, use the following (alternatively, you can set it in the system prompt):
The following are my llama.cpp settings for running different models efficiently.