Hosting
Become a Hoster
Share your GPU with the world — or just your team. Create a free plan to share privately, or set up paid subscriptions and keep 80% of revenue.
🚧 LLMFinder is currently in beta. Access is invite-only.
Request an invite →
Requirements
- A GPU server with at least 4GB VRAM (or a CPU with 8GB+ RAM)
- A publicly reachable endpoint (not localhost or private IPs)
- An OpenAI-compatible server (setup guide)
- An invite code — register to request one
One-command setup
The fastest way to get started. The setup wizard detects your GPU, picks a model, configures Docker Compose, and registers with LLMFinder:
```shell
curl -O https://llmfinder.net/llmfinder-hoster.py && python3 llmfinder-hoster.py
```
The wizard:
- Detects your GPU and available VRAM
- Downloads the right model into `~/llmfinder-models/`
- Calculates optimal context size from GGUF metadata + VRAM
- Generates a `docker-compose.yml` with your backend (llama.cpp, Ollama, or vLLM) plus a Cloudflare tunnel for public access
- Registers your endpoint with LLMFinder and runs a verification test
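The context-size step above can be sketched roughly: the VRAM left after loading the model weights bounds the KV cache, and the KV cache grows linearly with context length. A minimal illustration under a simple KV-cache cost model — the function name, parameters, and the 32k cap are hypothetical, and the actual wizard's formula may differ:

```python
def estimate_context_size(vram_bytes, model_bytes, n_layers, n_kv_heads,
                          head_dim, kv_bytes_per_elem=2):
    """Rough context-window estimate from free VRAM (hypothetical sketch)."""
    # KV cache cost per token: keys + values, across every layer
    per_token = 2 * n_layers * n_kv_heads * head_dim * kv_bytes_per_elem
    free = vram_bytes - model_bytes
    if free <= 0:
        return 0  # model alone doesn't fit; no room for a KV cache
    # Round down to a multiple of 256 and cap at an assumed 32k maximum
    ctx = (free // per_token) // 256 * 256
    return min(ctx, 32768)
```

For example, an 8 GB GPU loading a ~5 GB GGUF with Llama-3-8B-like dimensions (32 layers, 8 KV heads, head dim 128, fp16 cache) leaves room for roughly a 24k-token context under this model.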
Top-level menu (7 options): 1) Setup wizard, 2) Add/update models, 3) Server verification test, 4) Update server URL, 5) Rotate bearer token, 6) Uninstall, 7) Exit.
💡 The Cloudflare tunnel URL changes on every restart (free tier). The script auto-syncs the URL with LLMFinder on each menu open. For a permanent URL, use a named Cloudflare tunnel.
Manual registration
If you already have a server running, register it directly:
```shell
curl -X POST https://api.llmfinder.net/hosters/register \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My GPU Server",
    "email": "[email protected]",
    "endpoint_url": "https://my-server.example.com",
    "api_key": "my-bearer-token",
    "invite_code": "BETA2026",
    "models": [
      {
        "model_id": "llama-3-8b",
        "model_alias": "Llama 3 8B",
        "price_per_input_token": 100,
        "price_per_output_token": 300,
        "context_window": 8192,
        "max_tokens": 2048
      }
    ]
  }'
```
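The same request body can be assembled programmatically and validated before it ever hits the API. A hedged sketch that only builds the JSON from the fields shown in the curl example — `build_registration` is a hypothetical helper, not part of any official SDK:

```python
import json

# Field names taken from the curl example above
REQUIRED_FIELDS = ("name", "email", "endpoint_url", "api_key",
                   "invite_code", "models")

def build_registration(name, email, endpoint_url, api_key, invite_code, models):
    """Build the JSON body for POST /hosters/register (hypothetical helper)."""
    body = {
        "name": name,
        "email": email,
        "endpoint_url": endpoint_url,
        "api_key": api_key,
        "invite_code": invite_code,
        "models": models,
    }
    # Fail fast on empty fields instead of waiting for an API error
    missing = [f for f in REQUIRED_FIELDS if not body[f]]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return json.dumps(body)
```

The returned string can then be sent with any HTTP client as the POST body, with `Content-Type: application/json`.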
Supported server software
| Software | Compatible | Notes |
|---|---|---|
| llama.cpp server | ✅ | Recommended. Supports GGUF models. |
| vLLM | ✅ | Best for large HuggingFace models. |
| Ollama | ✅ | Natively OpenAI-compatible. No bridge needed. |
| Any OpenAI-compatible server | ✅ | Must expose `/health` and `/v1/chat/completions`. |
Blocked endpoints
The following cannot be registered (ToS violation):
- Localhost or private IPs (127.x, 10.x, 192.168.x) — must be publicly reachable
- Commercial API providers (OpenAI, Anthropic, Google, etc.)
- Commercial model IDs (gpt-4, claude-*, gemini-*)
Verification
After registration, LLMFinder runs two checks:
- Health check — `GET /health` must return HTTP 200
- Inference test — sends a test prompt and expects a valid response
Once both pass, your server goes live and starts receiving traffic.
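The pass/fail logic of the two checks can be sketched as follows, assuming the inference test expects an OpenAI-style chat completion with at least one non-empty choice — `verification_passed` is a hypothetical illustration of that logic, not LLMFinder's actual code:

```python
def verification_passed(health_status: int, inference_reply: dict) -> bool:
    """Combine the two post-registration checks (logic inferred from the docs)."""
    # Check 1: GET /health must return HTTP 200
    if health_status != 200:
        return False
    # Check 2: the test prompt must yield an OpenAI-style response
    # with at least one choice carrying non-empty message content
    choices = inference_reply.get("choices", [])
    return bool(choices) and bool(choices[0].get("message", {}).get("content"))
```

Running the equivalent of these two requests against your own endpoint before registering is a quick way to avoid a failed verification.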