Integrate Novita AI as exclusive inference provider - Add Novita AI API integration with DeepSeek-R1-Distill-Qwen-7B model - Remove all local model dependencies - Optimize token allocation for user inputs and context - Add Anaconda environment setup files - Add comprehensive test scripts and documentation
927854c
Novita AI Implementation Summary
✅ Implementation Complete
All changes have been implemented to switch from local models to the Novita AI API as the only inference source.
Files Modified
1. ✅ `src/config.py`
- Added Novita AI configuration section with:
  - `novita_api_key` (required, validated)
  - `novita_base_url` (default: `https://api.novita.ai/dedicated/v1/openai`)
  - `novita_model` (default: `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2`)
  - `deepseek_r1_temperature` (default: 0.6, validated 0.5-0.7 range)
  - `deepseek_r1_force_reasoning` (default: `True`)
- Added token allocation configuration:
  - `user_input_max_tokens` (default: 8000)
  - `context_preparation_budget` (default: 28000)
  - `context_pruning_threshold` (default: 28000)
  - `prioritize_user_input` (default: `True`)
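The configuration section above could be sketched as a validated settings object. This is an illustration only: the class name, the dataclass mechanism, and the exact error messages are assumptions, not the actual contents of `src/config.py`.

```python
from dataclasses import dataclass

@dataclass
class NovitaSettings:
    """Hypothetical sketch of the Novita AI config block described above."""
    novita_api_key: str
    novita_base_url: str = "https://api.novita.ai/dedicated/v1/openai"
    novita_model: str = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2"
    deepseek_r1_temperature: float = 0.6
    deepseek_r1_force_reasoning: bool = True
    user_input_max_tokens: int = 8000
    context_preparation_budget: int = 28000
    context_pruning_threshold: int = 28000
    prioritize_user_input: bool = True

    def __post_init__(self):
        # Mirrors the "required, validated" and "0.5-0.7 range" notes above.
        if not self.novita_api_key:
            raise ValueError("NOVITA_API_KEY is required")
        if not 0.5 <= self.deepseek_r1_temperature <= 0.7:
            raise ValueError("deepseek_r1_temperature must be within 0.5-0.7")
```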
2. ✅ `requirements.txt`
- Added `openai>=1.0.0` package
3. ✅ `src/models_config.py`
- Changed `primary_provider` from "local" to "novita_api"
- Updated all model IDs to the Novita model ID
- Added DeepSeek-R1 optimized parameters:
  - Temperature: 0.6 for reasoning, 0.5 for classification/safety
  - Top_p: 0.95 for reasoning, 0.9 for classification
  - `force_reasoning_prefix: True` for reasoning tasks
- Removed all local model configuration (quantization, fallbacks)
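The per-task parameters listed above can be pictured as a small lookup table. The task names and the fallback rule are illustrative assumptions; only the numeric values come from the list above.

```python
# Hypothetical per-task parameter table built from the values above.
TASK_PARAMS = {
    "reasoning":      {"temperature": 0.6, "top_p": 0.95, "force_reasoning_prefix": True},
    "classification": {"temperature": 0.5, "top_p": 0.9,  "force_reasoning_prefix": False},
    "safety":         {"temperature": 0.5, "top_p": 0.9,  "force_reasoning_prefix": False},
}

def params_for(task: str) -> dict:
    # Unknown task types fall back to the reasoning defaults (an assumption).
    return TASK_PARAMS.get(task, TASK_PARAMS["reasoning"])
```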
4. ✅ `src/llm_router.py` (complete rewrite)
- Removed all local model loading code
- Removed `LocalModelLoader` dependencies
- Added OpenAI client initialization
- Implemented the `_call_novita_api()` method
- Added DeepSeek-R1 optimizations:
  - `_format_deepseek_r1_prompt()` - reasoning trigger and math directives
  - `_is_math_query()` - automatic math detection
  - `_clean_reasoning_tags()` - response cleanup
- Updated `prepare_context_for_llm()` with:
  - User input priority (never truncated)
  - Dedicated 8K-token budget for user input
  - 28K-token context preparation budget
  - Dynamic context allocation
- Updated `health_check()` for the Novita API
- Removed all local model methods
5. ✅ `flask_api_standalone.py`
- Updated `initialize_orchestrator()`:
  - Changed to "Novita AI API Only" mode
  - Removed the HF_TOKEN dependency
  - Set `use_local_models=False`
- Updated error handling for configuration errors
- Increased `MAX_MESSAGE_LENGTH` from 10KB to 100KB
- Updated logging messages
6. ✅ `src/context_manager.py`
- Updated `prune_context()` to use the config threshold (28,000 tokens)
- Increased user input storage from 500 to 5,000 characters
- Increased system response storage from 1,000 to 2,000 characters
- Updated interaction context generation to use more of the user input
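The pruning behavior described above can be sketched as keeping the most recent entries under the token threshold. This is a minimal sketch: the real `prune_context()` signature is unknown, and the 4-characters-per-token estimate is an assumption, not the project's tokenizer.

```python
def prune_context(entries, max_tokens=28000, estimate=lambda s: max(1, len(s) // 4)):
    """Keep the newest entries that fit under the 28,000-token threshold.

    `entries` is oldest-first; pruning drops from the oldest end.
    """
    kept, total = [], 0
    for entry in reversed(entries):      # walk newest-first
        cost = estimate(entry)
        if total + cost > max_tokens:
            break                        # everything older is pruned
        kept.append(entry)
        total += cost
    return list(reversed(kept))          # restore chronological order
```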
Environment Variables Required
Create a `.env` file with the following (see `.env.example` for the full template):
```bash
# REQUIRED - Novita AI Configuration
NOVITA_API_KEY=your_api_key_here
NOVITA_BASE_URL=https://api.novita.ai/dedicated/v1/openai
NOVITA_MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2

# DeepSeek-R1 Optimized Settings
DEEPSEEK_R1_TEMPERATURE=0.6
DEEPSEEK_R1_FORCE_REASONING=True

# Token Allocation (Optional - defaults provided)
USER_INPUT_MAX_TOKENS=8000
CONTEXT_PREPARATION_BUDGET=28000
CONTEXT_PRUNING_THRESHOLD=28000
PRIORITIZE_USER_INPUT=True
```
Installation Steps
1. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
2. Create the `.env` file:
   ```bash
   cp .env.example .env
   # Edit .env and add your NOVITA_API_KEY
   ```
3. Set environment variables:
   ```bash
   export NOVITA_API_KEY=your_api_key_here
   export NOVITA_BASE_URL=https://api.novita.ai/dedicated/v1/openai
   export NOVITA_MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2
   ```
4. Start the application:
   ```bash
   python flask_api_standalone.py
   ```
✨ Key Features Implemented
DeepSeek-R1 Optimizations
- ✅ Temperature set to 0.6 (recommended range 0.5-0.7)
- ✅ Reasoning trigger (`<think>` prefix) for reasoning tasks
- ✅ Automatic math directive detection
- ✅ No system prompts (all instructions in the user prompt)
Token Allocation
- ✅ User input: dedicated 8K-token budget (never truncated)
- ✅ Context preparation: 28K-token total budget
- ✅ Context pruning: 28K-token threshold
- ✅ User input always prioritized over historical context
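One reading of this allocation scheme: reserve the user-input budget first, never truncate the input, and hand the remainder of the 28K budget to historical context. This is a sketch of the policy as described, not the actual `prepare_context_for_llm()` code.

```python
def allocate_budget(user_input_tokens: int,
                    total_budget: int = 28000,
                    user_reserve: int = 8000) -> dict:
    """Split the context-preparation budget between user input and history.

    The user input is never truncated, even past its 8K reserve (an
    interpretation of "prioritized" above); context gets what remains.
    """
    charged = max(user_input_tokens, user_reserve)   # reserve is always held back
    context_tokens = max(0, total_budget - charged)
    return {"user_input": user_input_tokens, "context": context_tokens}
```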
API Improvements
- ✅ Message length limit: 100KB (increased from 10KB)
- ✅ Better error messages with token estimates
- ✅ Configuration validation with helpful error messages
Database Storage
- ✅ User input storage: 5,000 characters (increased from 500)
- ✅ System response storage: 2,000 characters (increased from 1,000)
🧪 Testing Checklist
- Test API health check endpoint
- Test simple inference request
- Test large user input (5K+ tokens)
- Test reasoning tasks (should see reasoning trigger)
- Test math queries (should see math directive)
- Test context preparation (user input should not be truncated)
- Test error handling (missing API key, invalid endpoint)
Expected Behavior
Startup:
- System initializes Novita AI client
- Validates API key is present
- Logs Novita AI configuration
Inference:
- All requests routed to Novita AI API
- DeepSeek-R1 optimizations applied automatically
- User input prioritized in context preparation
Error Handling:
- Clear error messages if API key missing
- Helpful guidance for configuration issues
- Graceful handling of API failures
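The inference path above maps each request onto an OpenAI-compatible chat completion. A sketch of the payload that would be passed to `client.chat.completions.create(**payload)`; the helper name and the `max_tokens` value are illustrative, and the defaults mirror the environment variables listed earlier.

```python
import os

def build_chat_payload(user_prompt: str) -> dict:
    """Assemble a chat-completion request in the shape described above:
    a single user message, no system prompt, DeepSeek-R1 sampling settings."""
    return {
        "model": os.environ.get(
            "NOVITA_MODEL",
            "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2"),
        "messages": [{"role": "user", "content": user_prompt}],  # no system prompt
        "temperature": float(os.environ.get("DEEPSEEK_R1_TEMPERATURE", "0.6")),
        "top_p": 0.95,
        "max_tokens": 4096,  # reasoning-task output limit (see Configuration Reference)
    }
```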
🔧 Troubleshooting
Issue: "NOVITA_API_KEY is required"
Solution: set the environment variable:
```bash
export NOVITA_API_KEY=your_key_here
```
Issue: "openai package not available"
Solution: install dependencies:
```bash
pip install -r requirements.txt
```
Issue: API connection errors
Solution:
- Verify the API key is correct
- Check that the base URL matches your endpoint
- Verify the model ID matches your deployment
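The first two troubleshooting checks can be automated before startup. This hypothetical pre-flight helper only inspects the environment; it does not call the API, and its name and messages are illustrative.

```python
import os

def check_novita_config(env=os.environ) -> list:
    """Return a list of configuration problems, mirroring the issues above."""
    problems = []
    if not env.get("NOVITA_API_KEY"):
        problems.append("NOVITA_API_KEY is required")
    base = env.get("NOVITA_BASE_URL", "")
    if base and not base.startswith("https://"):
        problems.append("NOVITA_BASE_URL should be an https URL")
    return problems
```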
Configuration Reference
Model Configuration
- Model ID: `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2`
- Context Window: 131,072 tokens (131K)
- Optimized Settings: Temperature 0.6, Top_p 0.95
Token Allocation
- User Input: 8,000 tokens (dedicated, never truncated)
- Context Budget: 28,000 tokens (includes user input + context)
- Output Limits:
- Reasoning: 4,096 tokens
- Synthesis: 2,000 tokens
- Classification: 512 tokens
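The output limits above can be read as a per-task `max_tokens` table. The dictionary values come from the list above; the helper and its fallback rule are assumptions.

```python
# Per-task output limits, applied as the `max_tokens` request parameter.
OUTPUT_TOKEN_LIMITS = {
    "reasoning": 4096,      # long chain-of-thought answers
    "synthesis": 2000,      # consolidated final answers
    "classification": 512,  # short labels
}

def max_tokens_for(task: str) -> int:
    # Unknown task types fall back to the most generous limit (an assumption).
    return OUTPUT_TOKEN_LIMITS.get(task, 4096)
```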
🎯 Next Steps
- Set your `NOVITA_API_KEY` in environment variables
- Test the health check endpoint: `GET /api/health`
- Send a test request: `POST /api/chat`
- Monitor logs for Novita AI API calls
- Verify DeepSeek-R1 optimizations are working
Notes
- All local model code has been removed
- System now depends entirely on Novita AI API
- No GPU/quantization configuration needed
- No model downloading required
- Faster startup (no model loading)