📄️ LLaMA.cpp HTTP server knobs
Fast, lightweight, pure C/C++ HTTP server based on httplib, nlohmann::json and llama.cpp.
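As a minimal sketch of starting that server (the model path, bind address, port, context size, and GPU layer count here are illustrative placeholders, not recommendations from this guide):

```shell
# Launch llama.cpp's HTTP server with a few common knobs.
# All paths and values below are placeholders -- adjust for your setup.
llama-server \
  -m /models/example-model-q4_k_m.gguf \
  --host 127.0.0.1 \
  --port 8080 \
  -c 8192 \
  -ngl 99
```

Once running, the server exposes an OpenAI-compatible API (e.g. `/v1/chat/completions`) on the chosen host and port.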
📄️ Recommended LLaMA.cpp parameters for different models
Following are the llama.cpp settings I use to run different models efficiently.
📄️ My LlamaSwap Configuration
My LlamaSwap configuration. The individual model configurations can also be used to run llama.cpp standalone.
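A minimal sketch of what one model entry in a llama-swap config can look like (the model name, file path, and flags are assumptions for illustration; `${PORT}` is the placeholder llama-swap substitutes when it launches the command):

```yaml
# Sketch of a llama-swap config entry; names and paths are illustrative.
models:
  "example-model":
    cmd: >
      llama-server --port ${PORT}
      -m /models/example-model-q4_k_m.gguf
      -c 8192 -ngl 99
```

Because each entry's `cmd` is a complete llama-server invocation, it can be copied out and run on its own without llama-swap.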
📄️ Qwen 3 LLaMA.cpp tips and tricks
To disable thinking, use /no_think (or you can set it in the system prompt).
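As a sketch of the system-prompt variant against a local llama.cpp server (the host, port, and prompt text are assumptions; `/no_think` is Qwen 3's documented soft switch for disabling thinking):

```shell
# Disable Qwen 3 thinking via the system prompt on a local llama-server
# instance. Host/port are placeholders for your own deployment.
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "system", "content": "You are a helpful assistant. /no_think"},
          {"role": "user", "content": "Hello"}
        ]
      }'
```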