- Switch default model from llama3.1:8b to llama3.2:1b (2x faster on CPU)
- Limit Ollama context to 2048 tokens and max output to 512 tokens
- Reduce retrieval chunks from 4 to 3 and chunk content from 800 to 500 chars
- Trim conversation history from 10 to 6 messages
- Shorten system prompt to reduce input tokens

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Changed files:

- app
- backlog
- .env.example
- .gitignore
- Dockerfile
- docker-compose.yml
- requirements.txt
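The tuned values above can be sketched as a single config object. This is a minimal, hypothetical illustration: the class and field names (`ChatConfig`, `retrieval_chunks`, etc.) are invented for this sketch and not taken from the repository, though `num_ctx` and `num_predict` are the actual Ollama option names for context window and max output tokens.

```python
# Hypothetical sketch of the settings this commit tunes; names are
# illustrative, not the repo's real identifiers.
from dataclasses import dataclass


@dataclass(frozen=True)
class ChatConfig:
    model: str = "llama3.2:1b"   # was "llama3.1:8b"
    num_ctx: int = 2048          # Ollama context window, in tokens
    num_predict: int = 512       # Ollama max output tokens
    retrieval_chunks: int = 3    # chunks retrieved per query (was 4)
    chunk_chars: int = 500       # chars kept per chunk (was 800)
    history_messages: int = 6    # conversation turns kept (was 10)

    def ollama_options(self) -> dict:
        # Ollama's /api/chat and /api/generate accept these under "options"
        return {"num_ctx": self.num_ctx, "num_predict": self.num_predict}


cfg = ChatConfig()
print(cfg.ollama_options())  # {'num_ctx': 2048, 'num_predict': 512}
```

Capping both `num_ctx` and the retrieved context (fewer, shorter chunks and a trimmed history) attacks latency from both ends: less prompt to process and less text to generate.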