# Global Configuration
This guide covers the configuration options for the Semantic Router. The system uses a single YAML configuration file that controls all aspects of routing, classification, and security.
## Configuration File

The configuration file is located at `config/config.yaml`. Here's the structure based on the actual implementation:
```yaml
# config/config.yaml - Actual configuration structure

# BERT model for semantic similarity
bert_model:
  model_id: sentence-transformers/all-MiniLM-L12-v2
  threshold: 0.6
  use_cpu: true

# Semantic caching
semantic_cache:
  backend_type: "memory"  # Options: "memory" or "milvus"
  enabled: false
  similarity_threshold: 0.8
  max_entries: 1000
  ttl_seconds: 3600
  eviction_policy: "fifo"  # Options: "fifo", "lru", "lfu"

# Tool auto-selection
tools:
  enabled: false
  top_k: 3
  similarity_threshold: 0.2
  tools_db_path: "config/tools_db.json"
  fallback_to_empty: true

# Jailbreak protection
prompt_guard:
  enabled: false
  use_modernbert: true
  model_id: "models/jailbreak_classifier_modernbert-base_model"
  threshold: 0.7
  use_cpu: true

# vLLM endpoints - your backend models
vllm_endpoints:
  - name: "endpoint1"
    address: "your-server.com"  # Replace with your server
    port: 11434
    models:
      - "your-model"  # Replace with your model
    weight: 1

# Classification models
classifier:
  category_model:
    model_id: "models/category_classifier_modernbert-base_model"
    use_modernbert: true
    threshold: 0.6
    use_cpu: true
  pii_model:
    model_id: "models/pii_classifier_modernbert-base_presidio_token_model"
    use_modernbert: true
    threshold: 0.7
    use_cpu: true

# Categories and routing rules
categories:
  - name: math
    use_reasoning: true  # Enable reasoning for math
    model_scores:
      - model: your-model
        score: 1.0
  - name: computer science
    use_reasoning: true  # Enable reasoning for code
    model_scores:
      - model: your-model
        score: 1.0
  - name: other
    use_reasoning: false  # No reasoning for general queries
    model_scores:
      - model: your-model
        score: 0.8

default_model: your-model

# Reasoning family configurations - define how different model families
# handle reasoning syntax
reasoning_families:
  deepseek:
    type: "chat_template_kwargs"
    parameter: "thinking"
  qwen3:
    type: "chat_template_kwargs"
    parameter: "enable_thinking"
  gpt-oss:
    type: "reasoning_effort"
    parameter: "reasoning_effort"
  gpt:
    type: "reasoning_effort"
    parameter: "reasoning_effort"

# Global default reasoning effort level
default_reasoning_effort: "medium"

# Model configurations - PII policies, preferred endpoints, and reasoning
# family assignments. Note: this is a single block; YAML does not allow
# model_config to appear twice at the top level.
model_config:
  "your-model":
    pii_policy:
      allow_by_default: true
      pii_types_allowed: ["EMAIL_ADDRESS", "PERSON"]
    preferred_endpoints: ["endpoint1"]
  # Example: DeepSeek model with custom name
  "ds-v31-custom":
    reasoning_family: "deepseek"  # This model uses DeepSeek reasoning syntax
    preferred_endpoints: ["endpoint1"]
  # Example: Qwen3 model with custom name
  "my-qwen3-model":
    reasoning_family: "qwen3"  # This model uses Qwen3 reasoning syntax
    preferred_endpoints: ["endpoint2"]  # Requires a second endpoint named "endpoint2" in vllm_endpoints
  # Example: Model without reasoning support
  "phi4":
    # No reasoning_family field - this model doesn't support reasoning mode
    preferred_endpoints: ["endpoint1"]
```
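How these pieces interact: when a query lands in a category with `use_reasoning: true`, the router enables reasoning on the upstream request using the syntax defined by the selected model's `reasoning_family`. The sketch below is illustrative, not part of `config.yaml`; it shows the request fields implied by the `reasoning_families` definitions above.

```yaml
# Illustrative request fields injected by the router (a sketch, not config):
# a "math" query routed to "ds-v31-custom" (deepseek family,
# type: chat_template_kwargs, parameter: thinking) would carry:
chat_template_kwargs:
  thinking: true

# the same query routed to a gpt-oss-family model would instead carry:
reasoning_effort: "medium"  # falls back to default_reasoning_effort
```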
## Key Configuration Sections
### Backend Endpoints

Configure your LLM servers:

```yaml
vllm_endpoints:
  - name: "my_endpoint"
    address: "127.0.0.1"  # Your server IP
    port: 8000            # Your server port
    models:
      - "llama2-7b"       # Model name
    weight: 1             # Load balancing weight
```
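The `weight` field comes into play when the same model is served from more than one endpoint; traffic is presumably split in proportion to the weights. A minimal sketch (the endpoint names, addresses, and the 3:1 split are hypothetical):

```yaml
vllm_endpoints:
  - name: "primary"
    address: "10.0.0.1"
    port: 8000
    models:
      - "llama2-7b"
    weight: 3  # receives roughly three times the traffic of "secondary"
  - name: "secondary"
    address: "10.0.0.2"
    port: 8000
    models:
      - "llama2-7b"
    weight: 1
```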
### Model Settings

Configure model-specific settings:

```yaml
model_config:
  "llama2-7b":
    pii_policy:
      allow_by_default: true  # Allow PII by default
      pii_types_allowed: ["EMAIL_ADDRESS", "PERSON"]
    preferred_endpoints: ["my_endpoint"]
```
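To invert the policy and block PII unless explicitly allowed, set `allow_by_default: false`. A sketch, assuming `pii_types_allowed` then acts as an allowlist (the type names follow the Presidio entity naming used by the PII classifier):

```yaml
model_config:
  "llama2-7b":
    pii_policy:
      allow_by_default: false  # Block detected PII types by default
      pii_types_allowed: ["EMAIL_ADDRESS"]  # Only email addresses pass through
    preferred_endpoints: ["my_endpoint"]
```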