
Advanced Configuration

For power users, Rephlo exposes finer control over performance, AI behavior, and prompt caching. Unlike the basics, these settings are spread across a few places — some are per-Space, some are per-provider or per-command, and some are app-wide. This page maps out where each one lives and what it does.

Figure: The General tab's Performance section, showing the Default Space Token Budget and chat browse-window settings.

Token Budgets

The Default Space Token Budget lives in Settings → General → Performance. It sets the starting token limit for each new Space — how much document content the Space can hold before Smart Compaction becomes relevant.

  • Slider range: 5,000 to 1,000,000 tokens (the input box allows down to 1,000).
  • Default: 100,000 tokens.
  • Applies to new Spaces only. Each Space can be re-budgeted individually afterward, and changing this default never rewrites existing Spaces.
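
The limits above can be sketched as a small validation helper. This is an illustration only: the function and constant names are hypothetical, and it assumes out-of-range values are clamped rather than rejected and that the input box shares the slider's 1,000,000 upper bound, neither of which this page states.

```python
# Limits taken from this page; names are hypothetical, not Rephlo's API.
SLIDER_MIN = 5_000        # lowest value the slider offers
INPUT_MIN = 1_000         # lowest value the input box accepts
BUDGET_MAX = 1_000_000    # upper bound (assumed to apply to both controls)
DEFAULT_BUDGET = 100_000  # default for new Spaces


def normalize_budget(value=None, *, from_slider=False):
    """Return a token budget inside the documented range.

    Assumption: out-of-range values are clamped to the nearest bound.
    """
    if value is None:
        return DEFAULT_BUDGET
    lower = SLIDER_MIN if from_slider else INPUT_MIN
    return max(lower, min(int(value), BUDGET_MAX))


print(normalize_budget())                         # default budget
print(normalize_budget(2_000))                    # typed into the input box
print(normalize_budget(2_000, from_slider=True))  # below the slider minimum
```

Keeping the two lower bounds separate mirrors the difference between the slider and the input box described above.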

A separate chat browse-window size in the same section controls how many recent conversations load into the chat sidebar at once (10–5,000; older chats remain reachable through search). It is a performance tuning knob, not a data limit.

Smart Compaction & Strategy

Smart Compaction keeps a large Space within its token budget by summarizing older or less-critical documents. It is configured per Space, not globally:

  • Smart Compaction is a per-Space toggle and is off by default (opt-in). When on, newly added documents in that Space are compacted automatically.
  • When you compact a Space, Rephlo asks you to pick a strategy and which model should do the summarizing. The model choice is remembered for the rest of your session.

The three strategies trade fidelity against token savings:

Strategy | What it does | Best for
Conservative | Keeps the most detail; smallest token reduction. | When accuracy matters more than saving tokens.
Balanced | Summarizes older documents; the recommended middle ground (default). | Most projects.
Aggressive | Keeps only the most recent key points; maximum savings, more detail loss. | Fitting very long material into a small context window.

Whichever mode a Space uses, Rephlo tracks both a raw token total and a compact token total per Space, so you can see the effect of compaction. For the full picture of raw vs. compact data and how compaction interacts with on-device retrieval (RAG), see Smart Compaction & Data Modes.
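
Given the two per-Space totals mentioned above, the effect of compaction can be expressed as a simple ratio. The sketch below is a hypothetical helper, not part of Rephlo; the example numbers are made up for illustration.

```python
def compaction_savings(raw_tokens: int, compact_tokens: int) -> float:
    """Fraction of tokens saved by compaction (0.0 means no savings)."""
    if raw_tokens <= 0:
        return 0.0  # empty Space: nothing to save
    return 1.0 - compact_tokens / raw_tokens


# Illustrative numbers only: a Space holding 250,000 raw tokens
# that compacts to 90,000 tokens.
savings = compaction_savings(250_000, 90_000)
print(f"{savings:.0%} of tokens saved")
```

A higher ratio means more tokens saved, at the cost of more detail lost, which is the trade-off the three strategies expose.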

Provider & Model Tuning

Temperature, maximum response length, and context window are per-provider and per-model settings (with optional per-command overrides) — they are not a single global control. Configure them in the LLM Providers tab; see Providers and Managing Providers.

Temperature

Controls how deterministic vs. creative responses are. Default is 0.7 (balanced).

  • 0.0 – 0.3 — precise and repeatable. Good for code and facts.
  • ~0.7 — balanced. Good for general writing.
  • 1.0+ — more creative and varied. Good for brainstorming.

Max Response Tokens

Caps the length of the AI's reply so it doesn't write an essay when you wanted a sentence. A value around 2,000 is a typical default.

Context Window Limit

Caps how much text is sent to the model. Leave it on Auto to use the provider's default, or set a manual limit to control cost on paid APIs and to avoid out-of-memory errors with local on-device models.

Per-Command Overrides

Individual commands can override the provider, model, temperature, and max tokens. When a command runs, its own settings take precedence over the provider-level and model-level defaults. See Execution Workflows.
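
The precedence rule can be pictured as a field-by-field merge, where any value a command sets wins and everything else falls back to the defaults. This is a minimal sketch under stated assumptions: the `Settings` type, the `resolve` function, and the example provider and model names are all hypothetical, and Rephlo's actual resolution logic is not documented here.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings record; None means 'not set at this level'."""
    provider: Optional[str] = None
    model: Optional[str] = None
    temperature: Optional[float] = None
    max_tokens: Optional[int] = None


def resolve(defaults: Settings, override: Settings) -> Settings:
    """Merge settings: a command override wins wherever it is set."""
    merged = {}
    for f in fields(Settings):
        own = getattr(override, f.name)
        # Compare against None so that a temperature of 0.0 counts as set.
        merged[f.name] = own if own is not None else getattr(defaults, f.name)
    return Settings(**merged)


defaults = Settings("example-provider", "example-model", 0.7, 2000)
command = Settings(temperature=0.2)  # this command overrides temperature only
effective = resolve(defaults, command)
print(effective)
```

Checking for `None` rather than truthiness matters here, because 0.0 is a valid temperature and would otherwise be treated as unset.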

Prompt Caching

Rephlo can reuse a cached prefix of your prompt to cut cost and latency on supported providers. Caching settings live with provider configuration:

  • Enable prompt caching — turns caching on for supported requests (on by default).
  • Cache TTL — how long a cache entry lives:
    • Ephemeral — about 5 minutes.
    • Extended — about 1 hour.
  • Cache logging — records detailed cache-performance information.

Cache reads are heavily discounted, and the History detail view shows cache-creation and cache-read tokens per request so you can see the savings. Caching only kicks in when the request has a cacheable prefix (for example, a chat with a Space attached). For the full behavior, see Prompt Caching.
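
One way to read the per-request numbers is as the share of input tokens served from cache. The helper below is a hypothetical sketch: the parameter names are not Rephlo's field names, and it assumes the three counts are disjoint (uncached input, cache-creation, and cache-read tokens do not overlap), which may differ by provider. It deliberately leaves out prices, since this page does not state the discount rates.

```python
def cache_read_share(uncached_input: int, cache_creation: int, cache_read: int) -> float:
    """Share of a request's input tokens that were read from cache.

    Assumption: the three counts are disjoint parts of the total input.
    """
    total = uncached_input + cache_creation + cache_read
    if total == 0:
        return 0.0
    return cache_read / total


# Illustrative numbers only: the first request writes the cache,
# and a follow-up request within the TTL reads it back.
first = cache_read_share(uncached_input=200, cache_creation=1_800, cache_read=0)
follow_up = cache_read_share(uncached_input=200, cache_creation=0, cache_read=1_800)
print(f"first: {first:.0%}, follow-up: {follow_up:.0%}")
```

A share near zero on repeated requests suggests the prefix is not being reused, for example because the TTL expired between requests.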

Developer Tools

These are aids for troubleshooting, not everyday settings:

  • Debug Mode — enables verbose logging to the app's log file. Useful when diagnosing provider connection problems.
  • Prompt Inspection — surfaces the exact JSON payload sent to the provider, viewable in the History details.

Environment switching (Production / Staging / Local API endpoints) is available only in debug builds and is not part of the regular user experience.


Related: Smart Compaction & Data Modes · Providers · Prompt Caching · History & Audit