Commit · 927854c
1 Parent(s): ea87e33
Integrate Novita AI as exclusive inference provider

- Add Novita AI API integration with DeepSeek-R1-Distill-Qwen-7B model
- Remove all local model dependencies
- Optimize token allocation for user inputs and context
- Add Anaconda environment setup files
- Add comprehensive test scripts and documentation
- CONDA_SETUP_GUIDE.md +166 -0
- ENV_EXAMPLE_CONTENT.txt +163 -0
- NOVITA_AI_IMPLEMENTATION_SUMMARY.md +212 -0
- QUICK_TEST_NOVITA.md +88 -0
- TEST_NOVITA_CONNECTION.md +220 -0
- environment.yml +43 -0
- flask_api_standalone.py +30 -40
- requirements.txt +3 -0
- setup_conda_env.bat +37 -0
- setup_conda_env.sh +41 -0
- src/config.py +92 -0
- src/context_manager.py +23 -9
- src/llm_router.py +238 -326
- src/models_config.py +29 -45
- test_novita_conda.bat +53 -0
- test_novita_connection.py +275 -0
CONDA_SETUP_GUIDE.md
ADDED
|
@@ -0,0 +1,166 @@
|
| 1 |
+
# Anaconda Environment Setup Guide
|
| 2 |
+
|
| 3 |
+
## Quick Start
|
| 4 |
+
|
| 5 |
+
### 1. Create Conda Environment
|
| 6 |
+
|
| 7 |
+
```bash
|
| 8 |
+
# Create environment from environment.yml
|
| 9 |
+
conda env create -f environment.yml
|
| 10 |
+
|
| 11 |
+
# OR create manually
|
| 12 |
+
conda create -n research-ai-assistant python=3.10
|
| 13 |
+
conda activate research-ai-assistant
|
| 14 |
+
```
|
| 15 |
+
|
| 16 |
+
### 2. Activate Environment
|
| 17 |
+
|
| 18 |
+
```bash
|
| 19 |
+
# Windows
|
| 20 |
+
conda activate research-ai-assistant
|
| 21 |
+
|
| 22 |
+
# Linux/Mac (legacy syntax from conda releases before 4.4)
|
| 23 |
+
source activate research-ai-assistant
|
| 24 |
+
# OR
|
| 25 |
+
conda activate research-ai-assistant
|
| 26 |
+
```
|
| 27 |
+
|
| 28 |
+
### 3. Install Dependencies
|
| 29 |
+
|
| 30 |
+
```bash
|
| 31 |
+
# Install from requirements.txt
|
| 32 |
+
pip install -r requirements.txt
|
| 33 |
+
|
| 34 |
+
# OR install openai package directly
|
| 35 |
+
pip install "openai>=1.0.0"
|
| 36 |
+
```
|
| 37 |
+
|
| 38 |
+
### 4. Set Environment Variables
|
| 39 |
+
|
| 40 |
+
```bash
|
| 41 |
+
# Windows (PowerShell)
|
| 42 |
+
$env:NOVITA_API_KEY="your_api_key_here"
|
| 43 |
+
$env:NOVITA_BASE_URL="https://api.novita.ai/dedicated/v1/openai"
|
| 44 |
+
$env:NOVITA_MODEL="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2"
|
| 45 |
+
|
| 46 |
+
# Windows (CMD)
|
| 47 |
+
set NOVITA_API_KEY=your_api_key_here
|
| 48 |
+
set NOVITA_BASE_URL=https://api.novita.ai/dedicated/v1/openai
|
| 49 |
+
set NOVITA_MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2
|
| 50 |
+
|
| 51 |
+
# Linux/Mac
|
| 52 |
+
export NOVITA_API_KEY=your_api_key_here
|
| 53 |
+
export NOVITA_BASE_URL=https://api.novita.ai/dedicated/v1/openai
|
| 54 |
+
export NOVITA_MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2
|
| 55 |
+
```
|
| 56 |
+
|
| 57 |
+
### 5. Test Connection
|
| 58 |
+
|
| 59 |
+
```bash
|
| 60 |
+
# Run the test script
|
| 61 |
+
python test_novita_connection.py
|
| 62 |
+
|
| 63 |
+
# OR use the batch script (Windows)
|
| 64 |
+
test_novita_conda.bat
|
| 65 |
+
```
|
| 66 |
+
|
| 67 |
+
## Using Anaconda Prompt (Windows)
|
| 68 |
+
|
| 69 |
+
1. **Open Anaconda Prompt** (search for "Anaconda Prompt" in Start menu)
|
| 70 |
+
|
| 71 |
+
2. **Navigate to project directory:**
|
| 72 |
+
```bash
|
| 73 |
+
cd C:\Users\85jat\GenAI_work_V2\Prototyping\Research_AI_Assistant_V2\Research_AI_Assistant_API
|
| 74 |
+
```
|
| 75 |
+
|
| 76 |
+
3. **Create/activate environment:**
|
| 77 |
+
```bash
|
| 78 |
+
conda env create -f environment.yml
|
| 79 |
+
conda activate research-ai-assistant
|
| 80 |
+
```
|
| 81 |
+
|
| 82 |
+
4. **Install dependencies:**
|
| 83 |
+
```bash
|
| 84 |
+
pip install -r requirements.txt
|
| 85 |
+
```
|
| 86 |
+
|
| 87 |
+
5. **Set environment variables:**
|
| 88 |
+
```bash
|
| 89 |
+
set NOVITA_API_KEY=your_api_key_here
|
| 90 |
+
set NOVITA_BASE_URL=https://api.novita.ai/dedicated/v1/openai
|
| 91 |
+
set NOVITA_MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2
|
| 92 |
+
```
|
| 93 |
+
|
| 94 |
+
6. **Run test:**
|
| 95 |
+
```bash
|
| 96 |
+
python test_novita_connection.py
|
| 97 |
+
```
|
| 98 |
+
|
| 99 |
+
## Environment Management
|
| 100 |
+
|
| 101 |
+
### List environments
|
| 102 |
+
```bash
|
| 103 |
+
conda env list
|
| 104 |
+
```
|
| 105 |
+
|
| 106 |
+
### Activate environment
|
| 107 |
+
```bash
|
| 108 |
+
conda activate research-ai-assistant
|
| 109 |
+
```
|
| 110 |
+
|
| 111 |
+
### Deactivate environment
|
| 112 |
+
```bash
|
| 113 |
+
conda deactivate
|
| 114 |
+
```
|
| 115 |
+
|
| 116 |
+
### Remove environment (if needed)
|
| 117 |
+
```bash
|
| 118 |
+
conda env remove -n research-ai-assistant
|
| 119 |
+
```
|
| 120 |
+
|
| 121 |
+
### Update environment
|
| 122 |
+
```bash
|
| 123 |
+
conda env update -f environment.yml --prune
|
| 124 |
+
```
|
| 125 |
+
|
| 126 |
+
## Verification
|
| 127 |
+
|
| 128 |
+
After setup, verify everything works:
|
| 129 |
+
|
| 130 |
+
```bash
|
| 131 |
+
# Activate environment
|
| 132 |
+
conda activate research-ai-assistant
|
| 133 |
+
|
| 134 |
+
# Check Python
|
| 135 |
+
python --version
|
| 136 |
+
|
| 137 |
+
# Check openai package
|
| 138 |
+
python -c "import openai; print(openai.__version__)"
|
| 139 |
+
|
| 140 |
+
# Check configuration
|
| 141 |
+
python -c "from src.config import get_settings; s = get_settings(); print(f'API Key: {s.novita_api_key[:10]}...' if s.novita_api_key else 'API Key: NOT SET')"
|
| 142 |
+
|
| 143 |
+
# Run full test
|
| 144 |
+
python test_novita_connection.py
|
| 145 |
+
```
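The configuration check above can also be run as a small standalone script. This is a minimal sketch, assuming only the three `NOVITA_*` variable names used in this guide; the helper `check_novita_env` is hypothetical and does not go through `src.config`. It reports whether each variable is set without printing any part of the key.

```python
import os

# Variable names taken from this guide; the helper itself is illustrative.
REQUIRED_VARS = ("NOVITA_API_KEY", "NOVITA_BASE_URL", "NOVITA_MODEL")


def check_novita_env(env=None):
    """Return {name: bool} where True means the variable is set and non-blank."""
    env = os.environ if env is None else env
    return {name: bool(env.get(name, "").strip()) for name in REQUIRED_VARS}


if __name__ == "__main__":
    for name, present in check_novita_env().items():
        # Report presence only; the secret value is never echoed.
        print(f"{name}: {'set' if present else 'NOT SET'}")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching the real environment.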
|
| 146 |
+
|
| 147 |
+
## Troubleshooting
|
| 148 |
+
|
| 149 |
+
### Conda command not found
|
| 150 |
+
- **Windows:** Open Anaconda Prompt instead of regular PowerShell/CMD
|
| 151 |
+
- **Linux/Mac:** Ensure conda is initialized: `conda init bash` or `conda init zsh`
|
| 152 |
+
|
| 153 |
+
### Environment activation fails
|
| 154 |
+
- Try: `conda activate base` first, then `conda activate research-ai-assistant`
|
| 155 |
+
- On Windows: Use Anaconda Prompt instead of regular terminal
|
| 156 |
+
|
| 157 |
+
### Package installation fails
|
| 158 |
+
- Update conda: `conda update conda`
|
| 159 |
+
- Update pip: `pip install --upgrade pip`
|
| 160 |
+
- Try installing from conda-forge: `conda install -c conda-forge openai`
|
| 161 |
+
|
| 162 |
+
### Import errors
|
| 163 |
+
- Ensure environment is activated: `conda activate research-ai-assistant`
|
| 164 |
+
- Verify package is installed: `pip show openai`
|
| 165 |
+
- Reinstall if needed: `pip install --force-reinstall "openai>=1.0.0"`
|
| 166 |
+
|
ENV_EXAMPLE_CONTENT.txt
ADDED
|
@@ -0,0 +1,163 @@
|
| 1 |
+
# =============================================================================
|
| 2 |
+
# Research AI Assistant API - Environment Configuration
|
| 3 |
+
# =============================================================================
|
| 4 |
+
# Copy this content to a file named .env and fill in your actual values
|
| 5 |
+
# Never commit .env to version control!
|
| 6 |
+
|
| 7 |
+
# =============================================================================
|
| 8 |
+
# Novita AI Configuration (REQUIRED)
|
| 9 |
+
# =============================================================================
|
| 10 |
+
# Get your API key from: https://novita.ai
|
| 11 |
+
NOVITA_API_KEY=your_novita_api_key_here
|
| 12 |
+
|
| 13 |
+
# Dedicated endpoint base URL (default for dedicated endpoints)
|
| 14 |
+
NOVITA_BASE_URL=https://api.novita.ai/dedicated/v1/openai
|
| 15 |
+
|
| 16 |
+
# Your dedicated endpoint model ID
|
| 17 |
+
# Format: model-name:endpoint-id
|
| 18 |
+
NOVITA_MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2
|
| 19 |
+
|
| 20 |
+
# =============================================================================
|
| 21 |
+
# DeepSeek-R1 Optimized Settings
|
| 22 |
+
# =============================================================================
|
| 23 |
+
# Temperature: 0.5-0.7 range (0.6 recommended for DeepSeek-R1)
|
| 24 |
+
DEEPSEEK_R1_TEMPERATURE=0.6
|
| 25 |
+
|
| 26 |
+
# Force reasoning trigger: Enable to ensure DeepSeek-R1 uses reasoning pattern
|
| 27 |
+
# Set to True to add `<think>` prefix for reasoning tasks
|
| 28 |
+
DEEPSEEK_R1_FORCE_REASONING=True
|
| 29 |
+
|
| 30 |
+
# =============================================================================
|
| 31 |
+
# Token Allocation Configuration
|
| 32 |
+
# =============================================================================
|
| 33 |
+
# Maximum tokens dedicated for user input (prioritized over context)
|
| 34 |
+
# Recommended: 8000 tokens for large queries
|
| 35 |
+
USER_INPUT_MAX_TOKENS=8000
|
| 36 |
+
|
| 37 |
+
# Maximum tokens for context preparation (includes user input + context)
|
| 38 |
+
# Recommended: 28000 tokens for 32K context window models
|
| 39 |
+
CONTEXT_PREPARATION_BUDGET=28000
|
| 40 |
+
|
| 41 |
+
# Context pruning threshold (should match CONTEXT_PREPARATION_BUDGET)
|
| 42 |
+
CONTEXT_PRUNING_THRESHOLD=28000
|
| 43 |
+
|
| 44 |
+
# Always prioritize user input over historical context
|
| 45 |
+
PRIORITIZE_USER_INPUT=True
|
| 46 |
+
|
| 47 |
+
# =============================================================================
|
| 48 |
+
# Database Configuration
|
| 49 |
+
# =============================================================================
|
| 50 |
+
# SQLite database path (default: sessions.db)
|
| 51 |
+
# Use /tmp/ for Docker/containerized environments
|
| 52 |
+
DB_PATH=sessions.db
|
| 53 |
+
|
| 54 |
+
# FAISS index path for embeddings (default: embeddings.faiss)
|
| 55 |
+
FAISS_INDEX_PATH=embeddings.faiss
|
| 56 |
+
|
| 57 |
+
# =============================================================================
|
| 58 |
+
# Cache Configuration
|
| 59 |
+
# =============================================================================
|
| 60 |
+
# HuggingFace cache directory (for any remaining model downloads)
|
| 61 |
+
HF_HOME=~/.cache/huggingface
|
| 62 |
+
TRANSFORMERS_CACHE=~/.cache/huggingface
|
| 63 |
+
|
| 64 |
+
# HuggingFace token (optional - only needed if using gated models)
|
| 65 |
+
HF_TOKEN=
|
| 66 |
+
|
| 67 |
+
# Cache TTL in seconds (default: 3600 = 1 hour)
|
| 68 |
+
CACHE_TTL=3600
|
| 69 |
+
|
| 70 |
+
# =============================================================================
|
| 71 |
+
# Session Configuration
|
| 72 |
+
# =============================================================================
|
| 73 |
+
# Session timeout in seconds (default: 3600 = 1 hour)
|
| 74 |
+
SESSION_TIMEOUT=3600
|
| 75 |
+
|
| 76 |
+
# Maximum session size in megabytes (default: 10 MB)
|
| 77 |
+
MAX_SESSION_SIZE_MB=10
|
| 78 |
+
|
| 79 |
+
# =============================================================================
|
| 80 |
+
# Performance Configuration
|
| 81 |
+
# =============================================================================
|
| 82 |
+
# Maximum worker threads for parallel processing (default: 4)
|
| 83 |
+
MAX_WORKERS=4
|
| 84 |
+
|
| 85 |
+
# =============================================================================
|
| 86 |
+
# Mobile Optimization
|
| 87 |
+
# =============================================================================
|
| 88 |
+
# Maximum tokens for mobile responses (default: 1200)
|
| 89 |
+
# Increased from 800 to allow better responses on mobile
|
| 90 |
+
MOBILE_MAX_TOKENS=1200
|
| 91 |
+
|
| 92 |
+
# Mobile request timeout in milliseconds (default: 15000)
|
| 93 |
+
MOBILE_TIMEOUT=15000
|
| 94 |
+
|
| 95 |
+
# =============================================================================
|
| 96 |
+
# API Configuration
|
| 97 |
+
# =============================================================================
|
| 98 |
+
# Flask/Gradio server port (default: 7860)
|
| 99 |
+
GRADIO_PORT=7860
|
| 100 |
+
|
| 101 |
+
# Server host (default: 0.0.0.0 for all interfaces)
|
| 102 |
+
GRADIO_HOST=0.0.0.0
|
| 103 |
+
|
| 104 |
+
# =============================================================================
|
| 105 |
+
# Logging Configuration
|
| 106 |
+
# =============================================================================
|
| 107 |
+
# Logging level: DEBUG, INFO, WARNING, ERROR, CRITICAL (default: INFO)
|
| 108 |
+
LOG_LEVEL=INFO
|
| 109 |
+
|
| 110 |
+
# Log format: json or text (default: json)
|
| 111 |
+
LOG_FORMAT=json
|
| 112 |
+
|
| 113 |
+
# Log directory (default: /tmp/logs)
|
| 114 |
+
LOG_DIR=/tmp/logs
|
| 115 |
+
|
| 116 |
+
# =============================================================================
|
| 117 |
+
# Context Configuration
|
| 118 |
+
# =============================================================================
|
| 119 |
+
# Maximum context tokens (default: 4000)
|
| 120 |
+
# Note: This is overridden by CONTEXT_PREPARATION_BUDGET if set
|
| 121 |
+
MAX_CONTEXT_TOKENS=4000
|
| 122 |
+
|
| 123 |
+
# Cache TTL for context in seconds (default: 300 = 5 minutes)
|
| 124 |
+
CACHE_TTL_SECONDS=300
|
| 125 |
+
|
| 126 |
+
# Maximum cache size (default: 100)
|
| 127 |
+
MAX_CACHE_SIZE=100
|
| 128 |
+
|
| 129 |
+
# Enable parallel processing (default: True)
|
| 130 |
+
PARALLEL_PROCESSING=True
|
| 131 |
+
|
| 132 |
+
# Context decay factor (default: 0.8)
|
| 133 |
+
CONTEXT_DECAY_FACTOR=0.8
|
| 134 |
+
|
| 135 |
+
# Maximum interactions to keep in context (default: 10)
|
| 136 |
+
MAX_INTERACTIONS_TO_KEEP=10
|
| 137 |
+
|
| 138 |
+
# Enable metrics collection (default: True)
|
| 139 |
+
ENABLE_METRICS=True
|
| 140 |
+
|
| 141 |
+
# Enable context compression (default: True)
|
| 142 |
+
COMPRESSION_ENABLED=True
|
| 143 |
+
|
| 144 |
+
# Summarization threshold in tokens (default: 2000)
|
| 145 |
+
SUMMARIZATION_THRESHOLD=2000
|
| 146 |
+
|
| 147 |
+
# =============================================================================
|
| 148 |
+
# Model Selection (for context operations - if still using local models)
|
| 149 |
+
# =============================================================================
|
| 150 |
+
# These are optional and only used if local models are still needed
|
| 151 |
+
# for context summarization or other operations
|
| 152 |
+
CONTEXT_SUMMARIZATION_MODEL=Qwen/Qwen2.5-7B-Instruct
|
| 153 |
+
CONTEXT_INTENT_MODEL=Qwen/Qwen2.5-7B-Instruct
|
| 154 |
+
CONTEXT_SYNTHESIS_MODEL=Qwen/Qwen2.5-7B-Instruct
|
| 155 |
+
|
| 156 |
+
# =============================================================================
|
| 157 |
+
# Security Notes
|
| 158 |
+
# =============================================================================
|
| 159 |
+
# - Never commit .env file to version control
|
| 160 |
+
# - Keep API keys secret and rotate them regularly
|
| 161 |
+
# - Use environment variables in production (not .env files)
|
| 162 |
+
# - Set proper file permissions: chmod 600 .env
|
| 163 |
+
|
NOVITA_AI_IMPLEMENTATION_SUMMARY.md
ADDED
|
@@ -0,0 +1,212 @@
|
| 1 |
+
# Novita AI Implementation Summary
|
| 2 |
+
|
| 3 |
+
## ✅ Implementation Complete
|
| 4 |
+
|
| 5 |
+
All changes needed to switch from local models to the Novita AI API as the sole inference provider have been implemented.
|
| 6 |
+
|
| 7 |
+
## Files Modified
|
| 8 |
+
|
| 9 |
+
### 1. ✅ `src/config.py`
|
| 10 |
+
- Added Novita AI configuration section with:
|
| 11 |
+
- `novita_api_key` (required, validated)
|
| 12 |
+
- `novita_base_url` (default: https://api.novita.ai/dedicated/v1/openai)
|
| 13 |
+
- `novita_model` (default: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2)
|
| 14 |
+
- `deepseek_r1_temperature` (default: 0.6, validated 0.5-0.7 range)
|
| 15 |
+
- `deepseek_r1_force_reasoning` (default: True)
|
| 16 |
+
- Token allocation configuration:
|
| 17 |
+
- `user_input_max_tokens` (default: 8000)
|
| 18 |
+
- `context_preparation_budget` (default: 28000)
|
| 19 |
+
- `context_pruning_threshold` (default: 28000)
|
| 20 |
+
- `prioritize_user_input` (default: True)
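The settings listed above could be modelled roughly as follows. This is a sketch under stated assumptions, not the actual contents of `src/config.py`: the class name `NovitaSettings` and the function `load_settings` are hypothetical, and raising `ValueError` for a missing key or an out-of-range temperature is only one possible reading of "validated". Field names and defaults are taken from the list.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class NovitaSettings:
    """Hypothetical stand-in for the Novita section of src/config.py."""

    novita_api_key: str
    novita_base_url: str = "https://api.novita.ai/dedicated/v1/openai"
    novita_model: str = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2"
    deepseek_r1_temperature: float = 0.6
    deepseek_r1_force_reasoning: bool = True
    user_input_max_tokens: int = 8000
    context_preparation_budget: int = 28000
    context_pruning_threshold: int = 28000
    prioritize_user_input: bool = True

    def __post_init__(self):
        # Assumption: invalid values are rejected rather than clamped.
        if not self.novita_api_key:
            raise ValueError("NOVITA_API_KEY is required")
        if not 0.5 <= self.deepseek_r1_temperature <= 0.7:
            raise ValueError("deepseek_r1_temperature must be within 0.5-0.7")


def load_settings(env=None):
    """Build settings from environment variables, keeping defaults otherwise."""
    env = os.environ if env is None else env
    kwargs = {"novita_api_key": env.get("NOVITA_API_KEY", "")}
    if "NOVITA_BASE_URL" in env:
        kwargs["novita_base_url"] = env["NOVITA_BASE_URL"]
    if "NOVITA_MODEL" in env:
        kwargs["novita_model"] = env["NOVITA_MODEL"]
    if "DEEPSEEK_R1_TEMPERATURE" in env:
        kwargs["deepseek_r1_temperature"] = float(env["DEEPSEEK_R1_TEMPERATURE"])
    return NovitaSettings(**kwargs)
```

Failing at construction time means a missing key surfaces at startup rather than on the first inference request.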
|
| 21 |
+
|
| 22 |
+
### 2. ✅ `requirements.txt`
|
| 23 |
+
- Added `openai>=1.0.0` package
|
| 24 |
+
|
| 25 |
+
### 3. ✅ `src/models_config.py`
|
| 26 |
+
- Changed `primary_provider` from "local" to "novita_api"
|
| 27 |
+
- Updated all model IDs to Novita model ID
|
| 28 |
+
- Added DeepSeek-R1 optimized parameters:
|
| 29 |
+
- Temperature: 0.6 for reasoning, 0.5 for classification/safety
|
| 30 |
+
- Top_p: 0.95 for reasoning, 0.9 for classification
|
| 31 |
+
- `force_reasoning_prefix: True` for reasoning tasks
|
| 32 |
+
- Removed all local model configuration (quantization, fallbacks)
|
| 33 |
+
|
| 34 |
+
### 4. ✅ `src/llm_router.py` (Complete Rewrite)
|
| 35 |
+
- Removed all local model loading code
|
| 36 |
+
- Removed `LocalModelLoader` dependencies
|
| 37 |
+
- Added OpenAI client initialization
|
| 38 |
+
- Implemented `_call_novita_api()` method
|
| 39 |
+
- Added DeepSeek-R1 optimizations:
|
| 40 |
+
- `_format_deepseek_r1_prompt()` - reasoning trigger and math directives
|
| 41 |
+
- `_is_math_query()` - automatic math detection
|
| 42 |
+
- `_clean_reasoning_tags()` - response cleanup
|
| 43 |
+
- Updated `prepare_context_for_llm()` with:
|
| 44 |
+
- User input priority (never truncated)
|
| 45 |
+
- Dedicated 8K token budget for user input
|
| 46 |
+
- 28K token context preparation budget
|
| 47 |
+
- Dynamic context allocation
|
| 48 |
+
- Updated `health_check()` for Novita API
|
| 49 |
+
- Removed all local model methods
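The router changes above can be sketched as follows. This is an illustration under stated assumptions, not the code in `src/llm_router.py`: the function names are hypothetical counterparts of the private methods listed, and appending the `<think>` trigger to the end of the user prompt is an assumed placement. The `OpenAI(api_key=..., base_url=...)` client and `chat.completions.create` call come from the `openai>=1.0.0` package.

```python
import re


def format_prompt(prompt: str, force_reasoning: bool = True) -> str:
    """Add a `<think>` trigger to the user prompt (placement is an assumption)."""
    return f"{prompt}\n<think>\n" if force_reasoning else prompt


def clean_reasoning_tags(text: str) -> str:
    """Drop `<think>...</think>` blocks and return the remaining answer."""
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    # If the opening tag was supplied in the prompt, only the closing tag
    # shows up in the output; keep what follows it.
    if "</think>" in text:
        text = text.split("</think>", 1)[1]
    return text.strip()


def call_novita(prompt, api_key, base_url, model,
                temperature=0.6, top_p=0.95, max_tokens=4096):
    """Send one user-only message (no system prompt) to the endpoint."""
    from openai import OpenAI  # imported lazily; requires openai>=1.0.0

    client = OpenAI(api_key=api_key, base_url=base_url)
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": format_prompt(prompt)}],
        temperature=temperature,
        top_p=top_p,
        max_tokens=max_tokens,
    )
    return clean_reasoning_tags(response.choices[0].message.content or "")
```

Keeping the prompt formatting and tag cleanup as pure functions lets them be tested without network access.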
|
| 50 |
+
|
| 51 |
+
### 5. ✅ `flask_api_standalone.py`
|
| 52 |
+
- Updated `initialize_orchestrator()`:
|
| 53 |
+
- Changed to "Novita AI API Only" mode
|
| 54 |
+
- Removed HF_TOKEN dependency
|
| 55 |
+
- Set `use_local_models=False`
|
| 56 |
+
- Updated error handling for configuration errors
|
| 57 |
+
- Increased `MAX_MESSAGE_LENGTH` from 10KB to 100KB
|
| 58 |
+
- Updated logging messages
|
| 59 |
+
|
| 60 |
+
### 6. ✅ `src/context_manager.py`
|
| 61 |
+
- Updated `prune_context()` to use config threshold (28000 tokens)
|
| 62 |
+
- Increased user input storage from 500 to 5000 characters
|
| 63 |
+
- Increased system response storage from 1000 to 2000 characters
|
| 64 |
+
- Updated interaction context generation to include more of the user input
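The new storage limits amount to character-based truncation before an interaction is saved. A minimal sketch, assuming simple slicing; the constant and function names are hypothetical, and only the two limits (5000 and 2000 characters) come from the list above.

```python
# Limits from the summary; names are illustrative.
USER_INPUT_MAX_CHARS = 5000
SYSTEM_RESPONSE_MAX_CHARS = 2000


def truncate_for_storage(user_input: str, system_response: str):
    """Return both texts cut to their storage limits."""
    return (
        user_input[:USER_INPUT_MAX_CHARS],
        system_response[:SYSTEM_RESPONSE_MAX_CHARS],
    )
```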
|
| 65 |
+
|
| 66 |
+
## Environment Variables Required
|
| 67 |
+
|
| 68 |
+
Create a `.env` file with the following (see `ENV_EXAMPLE_CONTENT.txt` for the full template):
|
| 69 |
+
|
| 70 |
+
```bash
|
| 71 |
+
# REQUIRED - Novita AI Configuration
|
| 72 |
+
NOVITA_API_KEY=your_api_key_here
|
| 73 |
+
NOVITA_BASE_URL=https://api.novita.ai/dedicated/v1/openai
|
| 74 |
+
NOVITA_MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2
|
| 75 |
+
|
| 76 |
+
# DeepSeek-R1 Optimized Settings
|
| 77 |
+
DEEPSEEK_R1_TEMPERATURE=0.6
|
| 78 |
+
DEEPSEEK_R1_FORCE_REASONING=True
|
| 79 |
+
|
| 80 |
+
# Token Allocation (Optional - defaults provided)
|
| 81 |
+
USER_INPUT_MAX_TOKENS=8000
|
| 82 |
+
CONTEXT_PREPARATION_BUDGET=28000
|
| 83 |
+
CONTEXT_PRUNING_THRESHOLD=28000
|
| 84 |
+
PRIORITIZE_USER_INPUT=True
|
| 85 |
+
```
|
| 86 |
+
|
| 87 |
+
## Installation Steps
|
| 88 |
+
|
| 89 |
+
1. **Install dependencies:**
|
| 90 |
+
```bash
|
| 91 |
+
pip install -r requirements.txt
|
| 92 |
+
```
|
| 93 |
+
|
| 94 |
+
2. **Create `.env` file:**
|
| 95 |
+
```bash
|
| 96 |
+
cp ENV_EXAMPLE_CONTENT.txt .env
|
| 97 |
+
# Edit .env and add your NOVITA_API_KEY
|
| 98 |
+
```
|
| 99 |
+
|
| 100 |
+
3. **Set environment variables:**
|
| 101 |
+
```bash
|
| 102 |
+
export NOVITA_API_KEY=your_api_key_here
|
| 103 |
+
export NOVITA_BASE_URL=https://api.novita.ai/dedicated/v1/openai
|
| 104 |
+
export NOVITA_MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2
|
| 105 |
+
```
|
| 106 |
+
|
| 107 |
+
4. **Start the application:**
|
| 108 |
+
```bash
|
| 109 |
+
python flask_api_standalone.py
|
| 110 |
+
```
|
| 111 |
+
|
| 112 |
+
## ✨ Key Features Implemented
|
| 113 |
+
|
| 114 |
+
### DeepSeek-R1 Optimizations
|
| 115 |
+
- ✅ Temperature set to 0.6 (recommended range 0.5-0.7)
|
| 116 |
+
- ✅ Reasoning trigger (`<think>` prefix) for reasoning tasks
|
| 117 |
+
- ✅ Automatic math directive detection
|
| 118 |
+
- ✅ No system prompts (all instructions in user prompt)
|
| 119 |
+
|
| 120 |
+
### Token Allocation
|
| 121 |
+
- ✅ User input: 8K tokens dedicated budget (never truncated)
|
| 122 |
+
- ✅ Context preparation: 28K tokens total budget
|
| 123 |
+
- ✅ Context pruning: 28K token threshold
|
| 124 |
+
- ✅ User input always prioritized over historical context
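The allocation rules above can be illustrated with a small budget calculation. This is a sketch under assumptions, not the logic in `prepare_context_for_llm()`: the four-characters-per-token estimate, the function names, and the newest-first selection of history are all illustrative. The user input is passed through untouched and historical context receives whatever remains of the 28K budget.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4) if text else 0


def allocate_context(user_input, history, context_budget=28000):
    """Return (user_input, kept_history) that fits within context_budget.

    `history` is ordered oldest to newest. The user input is never
    truncated; the newest history entries are kept while budget remains.
    """
    remaining = max(0, context_budget - estimate_tokens(user_input))
    kept = []
    for entry in reversed(history):
        cost = estimate_tokens(entry)
        if cost > remaining:
            break
        kept.append(entry)
        remaining -= cost
    kept.reverse()  # restore chronological order
    return user_input, kept
```

With a long query, history is dropped first, which matches the stated priority of user input over historical context.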
|
| 125 |
+
|
| 126 |
+
### API Improvements
|
| 127 |
+
- ✅ Message length limit: 100KB (increased from 10KB)
|
| 128 |
+
- ✅ Better error messages with token estimates
|
| 129 |
+
- ✅ Configuration validation with helpful error messages
|
| 130 |
+
|
| 131 |
+
### Database Storage
|
| 132 |
+
- ✅ User input storage: 5000 characters (increased from 500)
|
| 133 |
+
- ✅ System response storage: 2000 characters (increased from 1000)
|
| 134 |
+
|
| 135 |
+
## 🧪 Testing Checklist
|
| 136 |
+
|
| 137 |
+
- [ ] Test API health check endpoint
|
| 138 |
+
- [ ] Test simple inference request
|
| 139 |
+
- [ ] Test large user input (5K+ tokens)
|
| 140 |
+
- [ ] Test reasoning tasks (should see reasoning trigger)
|
| 141 |
+
- [ ] Test math queries (should see math directive)
|
| 142 |
+
- [ ] Test context preparation (user input should not be truncated)
|
| 143 |
+
- [ ] Test error handling (missing API key, invalid endpoint)
|
| 144 |
+
|
| 145 |
+
## Expected Behavior
|
| 146 |
+
|
| 147 |
+
1. **Startup:**
|
| 148 |
+
- System initializes Novita AI client
|
| 149 |
+
- Validates API key is present
|
| 150 |
+
- Logs Novita AI configuration
|
| 151 |
+
|
| 152 |
+
2. **Inference:**
|
| 153 |
+
- All requests routed to Novita AI API
|
| 154 |
+
- DeepSeek-R1 optimizations applied automatically
|
| 155 |
+
- User input prioritized in context preparation
|
| 156 |
+
|
| 157 |
+
3. **Error Handling:**
|
| 158 |
+
- Clear error messages if API key missing
|
| 159 |
+
- Helpful guidance for configuration issues
|
| 160 |
+
- Graceful handling of API failures
|
| 161 |
+
|
| 162 |
+
## 🔧 Troubleshooting
|
| 163 |
+
|
| 164 |
+
### Issue: "NOVITA_API_KEY is required"
|
| 165 |
+
**Solution:** Set the environment variable:
|
| 166 |
+
```bash
|
| 167 |
+
export NOVITA_API_KEY=your_key_here
|
| 168 |
+
```
|
| 169 |
+
|
| 170 |
+
### Issue: "openai package not available"
|
| 171 |
+
**Solution:** Install dependencies:
|
| 172 |
+
```bash
|
| 173 |
+
pip install -r requirements.txt
|
| 174 |
+
```
|
| 175 |
+
|
| 176 |
+
### Issue: API connection errors
|
| 177 |
+
**Solution:**
|
| 178 |
+
- Verify API key is correct
|
| 179 |
+
- Check base URL matches your endpoint
|
| 180 |
+
- Verify model ID matches your deployment
|
| 181 |
+
|
| 182 |
+
## Configuration Reference
|
| 183 |
+
|
| 184 |
+
### Model Configuration
|
| 185 |
+
- **Model ID:** `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2`
|
| 186 |
+
- **Context Window:** 131,072 tokens (128K)
|
| 187 |
+
- **Optimized Settings:** Temperature 0.6, Top_p 0.95
|
| 188 |
+
|
| 189 |
+
### Token Allocation
|
| 190 |
+
- **User Input:** 8,000 tokens (dedicated, never truncated)
|
| 191 |
+
- **Context Budget:** 28,000 tokens (includes user input + context)
|
| 192 |
+
- **Output Limits:**
|
| 193 |
+
- Reasoning: 4,096 tokens
|
| 194 |
+
- Synthesis: 2,000 tokens
|
| 195 |
+
- Classification: 512 tokens
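The per-task settings in this summary can be collected into a lookup table. A minimal sketch: the names `TASK_PARAMS` and `params_for` are hypothetical, the sampling values for synthesis are not stated in this document and are assumed to match the reasoning values, and falling back to the reasoning profile for unknown task types is also an assumption.

```python
# Values for reasoning and classification come from this summary.
TASK_PARAMS = {
    "reasoning": {"temperature": 0.6, "top_p": 0.95, "max_tokens": 4096},
    "classification": {"temperature": 0.5, "top_p": 0.9, "max_tokens": 512},
    # Assumption: synthesis reuses the reasoning sampling values.
    "synthesis": {"temperature": 0.6, "top_p": 0.95, "max_tokens": 2000},
}


def params_for(task_type: str) -> dict:
    """Return a copy of the generation parameters for a task type."""
    # Assumption: unknown task types fall back to the reasoning profile.
    return dict(TASK_PARAMS.get(task_type, TASK_PARAMS["reasoning"]))
```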
|
| 196 |
+
|
| 197 |
+
## 🎯 Next Steps
|
| 198 |
+
|
| 199 |
+
1. Set your `NOVITA_API_KEY` in environment variables
|
| 200 |
+
2. Test the health check endpoint: `GET /api/health`
|
| 201 |
+
3. Send a test request: `POST /api/chat`
|
| 202 |
+
4. Monitor logs for Novita AI API calls
|
| 203 |
+
5. Verify DeepSeek-R1 optimizations are working
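Steps 2 and 3 can be scripted with the standard library. This is a sketch under assumptions: the host and port (`localhost:7860`, borrowed from `GRADIO_PORT` in the environment template) and the `{"message": ...}` request body are hypothetical, since this summary does not document the request schema. Only the two endpoint paths come from the list above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # assumed host and port


def build_health_request(base_url=BASE_URL):
    """Build the GET /api/health request."""
    return urllib.request.Request(f"{base_url}/api/health", method="GET")


def build_chat_request(message, base_url=BASE_URL):
    """Build the POST /api/chat request (payload shape is hypothetical)."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send a request and return (status_code, body_text)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

With the server running, `send(build_health_request())` returns the status code and response body for the health check.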
|
| 204 |
+
|
| 205 |
+
## Notes
|
| 206 |
+
|
| 207 |
+
- All local model code has been removed
|
| 208 |
+
- System now depends entirely on Novita AI API
|
| 209 |
+
- No GPU/quantization configuration needed
|
| 210 |
+
- No model downloading required
|
| 211 |
+
- Faster startup (no model loading)
|
| 212 |
+
|
QUICK_TEST_NOVITA.md
ADDED
|
@@ -0,0 +1,88 @@
|
| 1 |
+
# Quick Test: Novita AI Connection with Anaconda
|
| 2 |
+
|
| 3 |
+
## Step-by-Step Instructions
|
| 4 |
+
|
| 5 |
+
### 1. Open Anaconda Prompt
|
| 6 |
+
- Search for "Anaconda Prompt" in Windows Start menu
|
| 7 |
+
- This ensures conda commands work properly
|
| 8 |
+
|
| 9 |
+
### 2. Navigate to Project Directory
|
| 10 |
+
```bash
|
| 11 |
+
cd C:\Users\85jat\GenAI_work_V2\Prototyping\Research_AI_Assistant_V2\Research_AI_Assistant_API
|
| 12 |
+
```
|
| 13 |
+
|
| 14 |
+
### 3. Create Conda Environment (First Time Only)
|
| 15 |
+
```bash
|
| 16 |
+
conda create -n research-ai-assistant python=3.10 -y
|
| 17 |
+
```
|
| 18 |
+
|
| 19 |
+
### 4. Activate Environment
|
| 20 |
+
```bash
|
| 21 |
+
conda activate research-ai-assistant
|
| 22 |
+
```
|
| 23 |
+
|
| 24 |
+
### 5. Install Required Packages
|
| 25 |
+
```bash
|
| 26 |
+
pip install "openai>=1.0.0"
|
| 27 |
+
pip install -r requirements.txt
|
| 28 |
+
```
|
| 29 |
+
|
| 30 |
+
### 6. Set Environment Variables
|
| 31 |
+
```bash
|
| 32 |
+
# Set your Novita API key
|
| 33 |
+
set NOVITA_API_KEY=your_api_key_here
|
| 34 |
+
set NOVITA_BASE_URL=https://api.novita.ai/dedicated/v1/openai
|
| 35 |
+
set NOVITA_MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2
|
| 36 |
+
```
|
| 37 |
+
|
| 38 |
+
### 7. Run Test
|
| 39 |
+
```bash
|
| 40 |
+
python test_novita_connection.py
|
| 41 |
+
```
|
| 42 |
+
|
| 43 |
+
## Alternative: Use Batch Script
|
| 44 |
+
|
| 45 |
+
Simply double-click or run:
|
| 46 |
+
```bash
|
| 47 |
+
test_novita_conda.bat
|
| 48 |
+
```
|
| 49 |
+
|
| 50 |
+
## Expected Output
|
| 51 |
+
|
| 52 |
+
You should see:
|
| 53 |
+
```
|
| 54 |
+
============================================================
|
| 55 |
+
NOVITA AI CONNECTION TEST
|
| 56 |
+
============================================================
|
| 57 |
+
|
| 58 |
+
============================================================
|
| 59 |
+
TEST 1: Configuration Loading
|
| 60 |
+
============================================================
|
| 61 |
+
✓ Configuration loaded successfully
|
| 62 |
+
Novita API Key: Set
|
| 63 |
+
Base URL: https://api.novita.ai/dedicated/v1/openai
|
| 64 |
+
Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2
|
| 65 |
+
...
|
| 66 |
+
|
| 67 |
+
============================================================
|
| 68 |
+
TEST 4: Simple API Call
|
| 69 |
+
============================================================
|
| 70 |
+
✓ API call successful!
|
| 71 |
+
Response: ...
|
| 72 |
+
|
| 73 |
+
All tests passed! Novita AI connection is working correctly.
|
| 74 |
+
```
|
| 75 |
+
|
| 76 |
+
## Troubleshooting
|
| 77 |
+
|
| 78 |
+
**If the `conda` command is not found:**
|
| 79 |
+
- Use Anaconda Prompt instead of regular PowerShell
|
| 80 |
+
- Or run: `C:\Users\85jat\anaconda3\Scripts\activate.bat` (adjust path as needed)
|
| 81 |
+
|
| 82 |
+
**If environment activation fails:**
|
| 83 |
+
- Create environment first: `conda create -n research-ai-assistant python=3.10`
|
| 84 |
+
|
| 85 |
+
**If import errors occur:**
|
| 86 |
+
- Ensure environment is activated: `conda activate research-ai-assistant`
|
| 87 |
+
- Install packages: `pip install "openai>=1.0.0"`
|
| 88 |
+
|
TEST_NOVITA_CONNECTION.md
ADDED
|
@@ -0,0 +1,220 @@
|
| 1 |
+
# Testing Novita AI Connection
|
| 2 |
+
|
| 3 |
+
## Quick Test Instructions
|
| 4 |
+
|
| 5 |
+
### Option 1: Run Test Script (Recommended)
|
| 6 |
+
|
| 7 |
+
1. **Ensure Python is available:**
|
| 8 |
+
```bash
|
| 9 |
+
# Check Python version
|
| 10 |
+
python --version
|
| 11 |
+
# OR
|
| 12 |
+
python3 --version
|
| 13 |
+
# OR (Windows)
|
| 14 |
+
py --version
|
| 15 |
+
```
|
| 16 |
+
|
| 17 |
+
2. **Install dependencies if needed:**
|
| 18 |
+
```bash
|
| 19 |
+
pip install "openai>=1.0.0"
|
| 20 |
+
pip install -r requirements.txt
|
| 21 |
+
```
|
| 22 |
+
|
| 23 |
+
3. **Set environment variables:**
|
| 24 |
+
```bash
|
| 25 |
+
# Windows (PowerShell)
|
| 26 |
+
$env:NOVITA_API_KEY="your_api_key_here"
|
| 27 |
+
$env:NOVITA_BASE_URL="https://api.novita.ai/dedicated/v1/openai"
|
| 28 |
+
$env:NOVITA_MODEL="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2"
|
| 29 |
+
|
| 30 |
+
# Windows (CMD)
|
| 31 |
+
set NOVITA_API_KEY=your_api_key_here
|
| 32 |
+
set NOVITA_BASE_URL=https://api.novita.ai/dedicated/v1/openai
|
| 33 |
+
set NOVITA_MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2
|
| 34 |
+
|
| 35 |
+
# Linux/Mac
|
| 36 |
+
export NOVITA_API_KEY=your_api_key_here
|
| 37 |
+
export NOVITA_BASE_URL=https://api.novita.ai/dedicated/v1/openai
|
| 38 |
+
export NOVITA_MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2
|
| 39 |
+
```
|
| 40 |
+
|
| 41 |
+
4. **Run the test script:**
|
| 42 |
+
```bash
|
| 43 |
+
python test_novita_connection.py
|
| 44 |
+
# OR
|
| 45 |
+
python3 test_novita_connection.py
|
| 46 |
+
# OR (Windows)
|
| 47 |
+
py test_novita_connection.py
|
| 48 |
+
```
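
Before running the script, the environment variables from step 3 can be checked programmatically. The following is a minimal sketch, assuming only `NOVITA_API_KEY` is mandatory (the other two variables have defaults in the configuration shown elsewhere in this commit); the helper name `missing_vars` is hypothetical.

```python
import os

# Variable names come from step 3 above. Only the API key lacks a default.
REQUIRED_VARS = ("NOVITA_API_KEY",)
OPTIONAL_VARS = ("NOVITA_BASE_URL", "NOVITA_MODEL")


def missing_vars(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing required variables:", ", ".join(missing))
    else:
        print("Required variables are set.")
    for name in OPTIONAL_VARS:
        print(f"{name}: {'set' if os.environ.get(name) else 'using default'}")
```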
|
| 49 |
+
|
| 50 |
+
### Option 2: Manual Python Test
|
| 51 |
+
|
| 52 |
+
Create a simple test file `quick_test.py`:
|
| 53 |
+
|
| 54 |
+
```python
import os
import sys

from openai import OpenAI

# Get API key and endpoint settings from the environment
api_key = os.getenv("NOVITA_API_KEY")
base_url = os.getenv("NOVITA_BASE_URL", "https://api.novita.ai/dedicated/v1/openai")
model = os.getenv("NOVITA_MODEL", "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2")

if not api_key:
    print("ERROR: NOVITA_API_KEY not set!")
    sys.exit(1)

print("Testing Novita AI connection...")
print(f"Base URL: {base_url}")
print(f"Model: {model}")

client = OpenAI(
    base_url=base_url,
    api_key=api_key,
)

try:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say 'Hello' if you can hear me."}],
        max_tokens=20,
        temperature=0.6,
    )

    if response.choices:
        print("\nSUCCESS! Connection working.")
        print(f"Response: {response.choices[0].message.content}")
    else:
        print("\nNo response received")

except Exception as e:
    print(f"\nERROR: {e}")
```
|
| 93 |
+
|
| 94 |
+
Run it:
|
| 95 |
+
```bash
|
| 96 |
+
python quick_test.py
|
| 97 |
+
```
|
| 98 |
+
|
| 99 |
+
### Option 3: Test via API Endpoint
|
| 100 |
+
|
| 101 |
+
If the Flask server is running:
|
| 102 |
+
|
| 103 |
+
1. **Start the server:**
|
| 104 |
+
```bash
|
| 105 |
+
python flask_api_standalone.py
|
| 106 |
+
```
|
| 107 |
+
|
| 108 |
+
2. **Test health endpoint:**
|
| 109 |
+
```bash
|
| 110 |
+
curl http://localhost:7860/api/health
|
| 111 |
+
# OR
|
| 112 |
+
# Visit http://localhost:7860/api/health in browser
|
| 113 |
+
```
|
| 114 |
+
|
| 115 |
+
3. **Test chat endpoint:**
|
| 116 |
+
```bash
|
| 117 |
+
curl -X POST http://localhost:7860/api/chat \
|
| 118 |
+
-H "Content-Type: application/json" \
|
| 119 |
+
-d '{"message": "Hello", "session_id": "test-123"}'
|
| 120 |
+
```
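
The same chat request can be issued from Python instead of `curl`. This is a minimal sketch using only the standard library; the host, port, path and payload fields are taken from the command above, while the helper names `build_chat_request` and `send` are hypothetical. The response is assumed to be JSON, which the source does not document in detail.

```python
import json
import urllib.request


def build_chat_request(message, session_id, base="http://localhost:7860"):
    """Build a POST request equivalent to the curl command above."""
    payload = json.dumps({"message": message, "session_id": session_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/api/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and decode the body, assuming the server returns JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_chat_request("Hello", "test-123")
    print(req.get_method(), req.full_url)
    # Uncomment once the Flask server is running:
    # print(send(req))
```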
|
| 121 |
+
|
| 122 |
+
## Expected Test Results
|
| 123 |
+
|
| 124 |
+
### Successful Test Output:
|
| 125 |
+
```
|
| 126 |
+
============================================================
|
| 127 |
+
NOVITA AI CONNECTION TEST
|
| 128 |
+
============================================================
|
| 129 |
+
|
| 130 |
+
============================================================
|
| 131 |
+
TEST 1: Configuration Loading
|
| 132 |
+
============================================================
|
| 133 |
+
Configuration loaded successfully
|
| 134 |
+
Novita API Key: Set
|
| 135 |
+
Base URL: https://api.novita.ai/dedicated/v1/openai
|
| 136 |
+
Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2
|
| 137 |
+
Temperature: 0.6
|
| 138 |
+
Force Reasoning: True
|
| 139 |
+
User Input Max Tokens: 8000
|
| 140 |
+
Context Preparation Budget: 28000
|
| 141 |
+
|
| 142 |
+
============================================================
|
| 143 |
+
TEST 2: OpenAI Package Check
|
| 144 |
+
============================================================
|
| 145 |
+
OpenAI package is available
|
| 146 |
+
|
| 147 |
+
============================================================
|
| 148 |
+
TEST 3: Novita AI Client Initialization
|
| 149 |
+
============================================================
|
| 150 |
+
Novita AI client initialized successfully
|
| 151 |
+
Base URL: https://api.novita.ai/dedicated/v1/openai
|
| 152 |
+
API Key: nv-****
|
| 153 |
+
|
| 154 |
+
============================================================
|
| 155 |
+
TEST 4: Simple API Call
|
| 156 |
+
============================================================
|
| 157 |
+
Sending test request to: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2
|
| 158 |
+
Prompt: 'Hello, this is a test. Please respond briefly.'
|
| 159 |
+
API call successful!
|
| 160 |
+
Response length: XX characters
|
| 161 |
+
Response preview: ...
|
| 162 |
+
|
| 163 |
+
============================================================
|
| 164 |
+
TEST 5: LLM Router Initialization
|
| 165 |
+
============================================================
|
| 166 |
+
Initializing LLM Router...
|
| 167 |
+
LLM Router initialized successfully
|
| 168 |
+
|
| 169 |
+
Testing health check...
|
| 170 |
+
Health check result: {'provider': 'novita_api', 'status': 'healthy', ...}
|
| 171 |
+
|
| 172 |
+
============================================================
|
| 173 |
+
TEST 6: Inference Test
|
| 174 |
+
============================================================
|
| 175 |
+
Test prompt: What is the capital of France? Answer in one sentence.
|
| 176 |
+
Inference successful!
|
| 177 |
+
Response length: XX characters
|
| 178 |
+
Response: ...
|
| 179 |
+
|
| 180 |
+
============================================================
|
| 181 |
+
TEST SUMMARY
|
| 182 |
+
============================================================
|
| 183 |
+
CONFIG: PASS
|
| 184 |
+
PACKAGE: PASS
|
| 185 |
+
CLIENT: PASS
|
| 186 |
+
API_CALL: PASS
|
| 187 |
+
ROUTER: PASS
|
| 188 |
+
INFERENCE: PASS
|
| 189 |
+
|
| 190 |
+
Total: 6/6 tests passed
|
| 191 |
+
|
| 192 |
+
All tests passed! Novita AI connection is working correctly.
|
| 193 |
+
```
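
For orientation, a summary block like the one above could be produced from a dictionary of per-test results. This is an illustrative sketch only; `format_summary` is a hypothetical helper and is not claimed to match the actual `test_novita_connection.py`.

```python
def format_summary(results: dict) -> str:
    """Render one PASS/FAIL line per test plus a total, in insertion order."""
    lines = [f"{name.upper()}: {'PASS' if ok else 'FAIL'}" for name, ok in results.items()]
    passed = sum(1 for ok in results.values() if ok)
    lines.append(f"Total: {passed}/{len(results)} tests passed")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_summary({"config": True, "package": True, "client": False}))
```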
|
| 194 |
+
|
| 195 |
+
## Troubleshooting
|
| 196 |
+
|
| 197 |
+
### Error: "NOVITA_API_KEY is required"
|
| 198 |
+
**Solution:** Set the environment variable:
|
| 199 |
+
```bash
|
| 200 |
+
export NOVITA_API_KEY=your_key_here
|
| 201 |
+
```
|
| 202 |
+
|
| 203 |
+
### Error: "openai package not available"
|
| 204 |
+
**Solution:** Install the package:
|
| 205 |
+
```bash
|
| 206 |
+
pip install "openai>=1.0.0"
|
| 207 |
+
```
|
| 208 |
+
|
| 209 |
+
### Error: "Failed to initialize Novita AI client"
|
| 210 |
+
**Solution:**
|
| 211 |
+
- Verify API key is correct
|
| 212 |
+
- Check base URL matches your endpoint
|
| 213 |
+
- Verify network connectivity
|
| 214 |
+
|
| 215 |
+
### Error: "API call failed"
|
| 216 |
+
**Solution:**
|
| 217 |
+
- Check API key has proper permissions
|
| 218 |
+
- Verify model ID matches your deployment
|
| 219 |
+
- Check Novita AI service status
|
| 220 |
+
|
environment.yml
ADDED
|
@@ -0,0 +1,43 @@
|
| 1 |
+
name: research-ai-assistant
|
| 2 |
+
channels:
|
| 3 |
+
- conda-forge
|
| 4 |
+
- defaults
|
| 5 |
+
dependencies:
|
| 6 |
+
- python>=3.10,<3.12
|
| 7 |
+
- pip
|
| 8 |
+
- pip:
|
| 9 |
+
# LLM API Client (required for Novita AI API)
|
| 10 |
+
- openai>=1.0.0
|
| 11 |
+
# Web Framework & Interface
|
| 12 |
+
- aiohttp>=3.9.0
|
| 13 |
+
- httpx>=0.25.0
|
| 14 |
+
# Flask API for external integrations
|
| 15 |
+
- flask>=3.0.0
|
| 16 |
+
- flask-cors>=4.0.0
|
| 17 |
+
- flask-limiter>=3.5.0
|
| 18 |
+
# Security & Validation
|
| 19 |
+
- pydantic-settings>=2.1.0
|
| 20 |
+
- python-dotenv>=1.0.0
|
| 21 |
+
# Database & Persistence
|
| 22 |
+
- sqlalchemy>=2.0.0
|
| 23 |
+
# Data Processing & Utilities
|
| 24 |
+
- pandas>=2.1.0
|
| 25 |
+
- numpy>=1.24.0,<2.0.0
|
| 26 |
+
# Caching & Performance
|
| 27 |
+
- cachetools>=5.3.0
|
| 28 |
+
# Async & Concurrency
|
| 29 |
+
- aiofiles>=23.2.0
|
| 30 |
+
# Logging & Monitoring
|
| 31 |
+
- structlog>=23.2.0
|
| 32 |
+
- prometheus-client>=0.19.0
|
| 33 |
+
- psutil>=5.9.0
|
| 34 |
+
# Utility Libraries
|
| 35 |
+
- python-dateutil>=2.8.0
|
| 36 |
+
- pytz>=2023.3
|
| 37 |
+
- requests>=2.31.0
|
| 38 |
+
# Production WSGI Server
|
| 39 |
+
- gunicorn>=21.2.0
|
| 40 |
+
# Development & Testing
|
| 41 |
+
- pytest>=7.4.0
|
| 42 |
+
- pytest-asyncio>=0.21.0
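
The `python>=3.10,<3.12` constraint in this file can be verified from inside the activated environment. This is a minimal sketch; the helper name `python_supported` is hypothetical, and the bounds are copied from the dependency line above.

```python
import sys

MIN_VERSION = (3, 10)  # inclusive lower bound from environment.yml
MAX_VERSION = (3, 12)  # exclusive upper bound from environment.yml


def python_supported(version=None) -> bool:
    """Return True if the (major, minor) version falls inside the pinned range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    verdict = "supported" if python_supported() else "outside the pinned range"
    print(f"Python {current}: {verdict}")
```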
|
| 43 |
+
|
flask_api_standalone.py
CHANGED
|
@@ -145,7 +145,7 @@ initialization_attempted = False
|
|
| 145 |
initialization_error = None
|
| 146 |
|
| 147 |
def initialize_orchestrator():
|
| 148 |
-
"""Initialize the AI orchestrator with
|
| 149 |
global orchestrator, orchestrator_available, initialization_attempted, initialization_error
|
| 150 |
|
| 151 |
initialization_attempted = True
|
|
@@ -153,7 +153,7 @@ def initialize_orchestrator():
|
|
| 153 |
|
| 154 |
try:
|
| 155 |
logger.info("=" * 60)
|
| 156 |
-
logger.info("INITIALIZING AI ORCHESTRATOR (
|
| 157 |
logger.info("=" * 60)
|
| 158 |
|
| 159 |
from src.agents.intent_agent import create_intent_agent
|
|
@@ -166,27 +166,16 @@ def initialize_orchestrator():
|
|
| 166 |
|
| 167 |
logger.info("Imports successful")
|
| 168 |
|
| 169 |
-
# Initialize LLM Router -
|
| 170 |
-
|
| 171 |
-
if not hf_token:
|
| 172 |
-
logger.warning("HF_TOKEN not set - may be needed for gated model access")
|
| 173 |
-
else:
|
| 174 |
-
logger.info(f"HF_TOKEN available (for model download only)")
|
| 175 |
-
|
| 176 |
-
# Import GatedRepoError for better error handling
|
| 177 |
try:
|
| 178 |
-
|
| 179 |
-
|
| 180 |
-
|
| 181 |
-
|
| 182 |
-
logger.info("Initializing LLM Router (local models only, no API fallback)...")
|
| 183 |
-
try:
|
| 184 |
-
# Always use local models - API fallback removed
|
| 185 |
-
llm_router = LLMRouter(hf_token=hf_token, use_local_models=True)
|
| 186 |
-
logger.info("LLM Router initialized (local models only)")
|
| 187 |
except Exception as e:
|
| 188 |
logger.error(f"Failed to initialize LLM Router: {e}", exc_info=True)
|
| 189 |
-
logger.error("This is a critical error -
|
|
|
|
| 190 |
raise
|
| 191 |
|
| 192 |
logger.info("Initializing Agents...")
|
|
@@ -221,28 +210,29 @@ def initialize_orchestrator():
|
|
| 221 |
orchestrator_available = True
|
| 222 |
logger.info("=" * 60)
|
| 223 |
logger.info("AI ORCHESTRATOR READY")
|
| 224 |
-
logger.info(" -
|
| 225 |
logger.info(" - MAX_WORKERS: 4")
|
| 226 |
logger.info("=" * 60)
|
| 227 |
|
| 228 |
return True
|
| 229 |
|
| 230 |
-
except
|
| 231 |
-
|
| 232 |
-
|
| 233 |
-
|
| 234 |
-
|
| 235 |
-
|
| 236 |
-
|
| 237 |
-
|
| 238 |
-
|
| 239 |
-
|
| 240 |
-
|
| 241 |
-
|
| 242 |
-
|
| 243 |
-
|
| 244 |
-
|
| 245 |
-
|
|
|
|
| 246 |
return False
|
| 247 |
except Exception as e:
|
| 248 |
logger.error("=" * 60)
|
|
@@ -351,12 +341,12 @@ def chat():
|
|
| 351 |
'error': 'Message cannot be empty'
|
| 352 |
}), 400
|
| 353 |
|
| 354 |
-
# Length limit (
|
| 355 |
-
MAX_MESSAGE_LENGTH =
|
| 356 |
if len(message) > MAX_MESSAGE_LENGTH:
|
| 357 |
return jsonify({
|
| 358 |
'success': False,
|
| 359 |
-
'error': f'Message too long. Maximum length is {MAX_MESSAGE_LENGTH} characters'
|
| 360 |
}), 400
|
| 361 |
|
| 362 |
history = data.get('history', [])
|
|
|
|
| 145 |
initialization_error = None
|
| 146 |
|
| 147 |
def initialize_orchestrator():
|
| 148 |
+
"""Initialize the AI orchestrator with Novita AI API only"""
|
| 149 |
global orchestrator, orchestrator_available, initialization_attempted, initialization_error
|
| 150 |
|
| 151 |
initialization_attempted = True
|
|
|
|
| 153 |
|
| 154 |
try:
|
| 155 |
logger.info("=" * 60)
|
| 156 |
+
logger.info("INITIALIZING AI ORCHESTRATOR (Novita AI API Only)")
|
| 157 |
logger.info("=" * 60)
|
| 158 |
|
| 159 |
from src.agents.intent_agent import create_intent_agent
|
|
|
|
| 166 |
|
| 167 |
logger.info("Imports successful")
|
| 168 |
|
| 169 |
+
# Initialize LLM Router - Novita AI API only
|
| 170 |
+
logger.info("Initializing LLM Router (Novita AI API only)...")
|
| 171 |
try:
|
| 172 |
+
# Always use Novita AI API (local models disabled)
|
| 173 |
+
llm_router = LLMRouter(hf_token=None, use_local_models=False)
|
| 174 |
+
logger.info("LLM Router initialized (Novita AI API)")
|
| 175 |
except Exception as e:
|
| 176 |
logger.error(f"Failed to initialize LLM Router: {e}", exc_info=True)
|
| 177 |
+
logger.error("This is a critical error - Novita AI API is required")
|
| 178 |
+
logger.error("Please ensure NOVITA_API_KEY is set in environment variables")
|
| 179 |
raise
|
| 180 |
|
| 181 |
logger.info("Initializing Agents...")
|
|
|
|
| 210 |
orchestrator_available = True
|
| 211 |
logger.info("=" * 60)
|
| 212 |
logger.info("AI ORCHESTRATOR READY")
|
| 213 |
+
logger.info(" - Novita AI API enabled")
|
| 214 |
logger.info(" - MAX_WORKERS: 4")
|
| 215 |
logger.info("=" * 60)
|
| 216 |
|
| 217 |
return True
|
| 218 |
|
| 219 |
+
except ValueError as e:
|
| 220 |
+
# Handle configuration errors (e.g., missing NOVITA_API_KEY)
|
| 221 |
+
if "NOVITA_API_KEY" in str(e) or "required" in str(e).lower():
|
| 222 |
+
logger.error("=" * 60)
|
| 223 |
+
logger.error("CONFIGURATION ERROR")
|
| 224 |
+
logger.error("=" * 60)
|
| 225 |
+
logger.error(f"Error: {e}")
|
| 226 |
+
logger.error("")
|
| 227 |
+
logger.error("SOLUTION:")
|
| 228 |
+
logger.error("1. Set NOVITA_API_KEY in environment variables")
|
| 229 |
+
logger.error("2. Ensure NOVITA_BASE_URL is correct")
|
| 230 |
+
logger.error("3. Verify NOVITA_MODEL matches your endpoint")
|
| 231 |
+
logger.error("=" * 60)
|
| 232 |
+
orchestrator_available = False
|
| 233 |
+
initialization_error = f"Configuration Error: {str(e)}"
|
| 234 |
+
else:
|
| 235 |
+
raise
|
| 236 |
return False
|
| 237 |
except Exception as e:
|
| 238 |
logger.error("=" * 60)
|
|
|
|
| 341 |
'error': 'Message cannot be empty'
|
| 342 |
}), 400
|
| 343 |
|
| 344 |
+
# Length limit (allow larger inputs for complex queries)
|
| 345 |
+
MAX_MESSAGE_LENGTH = 100000 # limit in characters, roughly 100KB of ASCII text (increased from 10KB)
|
| 346 |
if len(message) > MAX_MESSAGE_LENGTH:
|
| 347 |
return jsonify({
|
| 348 |
'success': False,
|
| 349 |
+
'error': f'Message too long. Maximum length is {MAX_MESSAGE_LENGTH} characters (approximately {MAX_MESSAGE_LENGTH // 4} tokens)'
|
| 350 |
}), 400
|
| 351 |
|
| 352 |
history = data.get('history', [])
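
The length check in the `chat()` hunk above can be isolated as a small function. This is a minimal sketch, assuming the 4-characters-per-token approximation used in the error message; `validate_message` is a hypothetical helper, and the exact emptiness check in the real handler is not shown in this diff.

```python
MAX_MESSAGE_LENGTH = 100_000  # characters, as in the hunk above
CHARS_PER_TOKEN = 4           # rough approximation used in the error message


def validate_message(message: str):
    """Return (ok, error): error is None when the message is acceptable."""
    if not message or not message.strip():
        return False, "Message cannot be empty"
    if len(message) > MAX_MESSAGE_LENGTH:
        return False, (
            f"Message too long. Maximum length is {MAX_MESSAGE_LENGTH} characters "
            f"(approximately {MAX_MESSAGE_LENGTH // CHARS_PER_TOKEN} tokens)"
        )
    return True, None


if __name__ == "__main__":
    print(validate_message("Hello"))
    print(validate_message("x" * (MAX_MESSAGE_LENGTH + 1)))
```

Note that the limit counts characters rather than bytes or tokens, so the token figure in the message is only an estimate.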
|
requirements.txt
CHANGED
|
@@ -107,3 +107,6 @@ debugpy>=1.7.0
|
|
| 107 |
bandit>=1.7.5 # Security linter for Python code
|
| 108 |
safety>=2.3.5 # Dependency vulnerability scanner
|
| 109 |
|
|
|
|
|
|
|
|
|
|
|
|
| 107 |
bandit>=1.7.5 # Security linter for Python code
|
| 108 |
safety>=2.3.5 # Dependency vulnerability scanner
|
| 109 |
|
| 110 |
+
# LLM API Client (required for Novita AI API)
|
| 111 |
+
openai>=1.0.0
|
| 112 |
+
|
setup_conda_env.bat
ADDED
|
@@ -0,0 +1,37 @@
|
| 1 |
+
@echo off
|
| 2 |
+
REM Setup script for Anaconda environment (Windows)
|
| 3 |
+
REM This script creates a conda environment for the Research AI Assistant and prints activation instructions
|
| 4 |
+
|
| 5 |
+
echo ============================================================
|
| 6 |
+
echo Setting up Anaconda environment for Research AI Assistant
|
| 7 |
+
echo ============================================================
|
| 8 |
+
|
| 9 |
+
REM Check if conda is available
|
| 10 |
+
where conda >nul 2>&1
|
| 11 |
+
if %ERRORLEVEL% NEQ 0 (
|
| 12 |
+
echo ERROR: conda command not found
|
| 13 |
+
echo Please install Anaconda or Miniconda first
|
| 14 |
+
echo Download from: https://www.anaconda.com/products/distribution
|
| 15 |
+
exit /b 1
|
| 16 |
+
)
|
| 17 |
+
|
| 18 |
+
echo Conda found
|
| 19 |
+
|
| 20 |
+
REM Create environment from environment.yml
|
| 21 |
+
echo.
|
| 22 |
+
echo Creating conda environment from environment.yml...
|
| 23 |
+
conda env create -f environment.yml
|
| 24 |
+
|
| 25 |
+
if %ERRORLEVEL% EQU 0 (
|
| 26 |
+
echo Environment created successfully
|
| 27 |
+
echo.
|
| 28 |
+
echo To activate the environment, run:
|
| 29 |
+
echo conda activate research-ai-assistant
|
| 30 |
+
echo.
|
| 31 |
+
echo Then install remaining dependencies:
|
| 32 |
+
echo pip install -r requirements.txt
|
| 33 |
+
) else (
|
| 34 |
+
echo Environment creation failed
|
| 35 |
+
exit /b 1
|
| 36 |
+
)
|
| 37 |
+
|
setup_conda_env.sh
ADDED
|
@@ -0,0 +1,41 @@
|
| 1 |
+
#!/bin/bash
|
| 2 |
+
# Setup script for Anaconda environment
|
| 3 |
+
# This script creates a conda environment for the Research AI Assistant and prints activation instructions
|
| 4 |
+
|
| 5 |
+
echo "============================================================"
|
| 6 |
+
echo "Setting up Anaconda environment for Research AI Assistant"
|
| 7 |
+
echo "============================================================"
|
| 8 |
+
|
| 9 |
+
# Check if conda is available
|
| 10 |
+
if ! command -v conda &> /dev/null; then
|
| 11 |
+
echo "Error: conda command not found"
|
| 12 |
+
echo " Please install Anaconda or Miniconda first"
|
| 13 |
+
echo " Download from: https://www.anaconda.com/products/distribution"
|
| 14 |
+
exit 1
|
| 15 |
+
fi
|
| 16 |
+
|
| 17 |
+
echo "Conda found"
|
| 18 |
+
|
| 19 |
+
# Create environment from environment.yml
|
| 20 |
+
echo ""
|
| 21 |
+
echo "Creating conda environment from environment.yml..."
|
| 22 |
+
conda env create -f environment.yml
|
| 23 |
+
|
| 24 |
+
if [ $? -eq 0 ]; then
|
| 25 |
+
echo "Environment created successfully"
|
| 26 |
+
else
|
| 27 |
+
echo "Environment creation failed"
|
| 28 |
+
exit 1
|
| 29 |
+
fi
|
| 30 |
+
|
| 31 |
+
# Print activation instructions (the script does not activate the environment itself)
|
| 32 |
+
echo ""
|
| 33 |
+
echo "To activate the environment, run:"
|
| 34 |
+
echo " conda activate research-ai-assistant"
|
| 35 |
+
echo ""
|
| 36 |
+
echo "Or on Windows:"
|
| 37 |
+
echo " conda activate research-ai-assistant"
|
| 38 |
+
echo ""
|
| 39 |
+
echo "Then install remaining dependencies:"
|
| 40 |
+
echo " pip install -r requirements.txt"
|
| 41 |
+
|
src/config.py
CHANGED
|
@@ -174,6 +174,98 @@ class Settings(BaseSettings):
|
|
| 174 |
|
| 175 |
return self._cached_cache_dir
|
| 176 |
|
| 177 |
# ==================== Model Configuration ====================
|
| 178 |
|
| 179 |
default_model: str = Field(
|
|
|
|
| 174 |
|
| 175 |
return self._cached_cache_dir
|
| 176 |
|
| 177 |
+
# ==================== Novita AI Configuration ====================
|
| 178 |
+
|
| 179 |
+
novita_api_key: str = Field(
|
| 180 |
+
default="",
|
| 181 |
+
description="Novita AI API key (required)",
|
| 182 |
+
env="NOVITA_API_KEY"
|
| 183 |
+
)
|
| 184 |
+
|
| 185 |
+
novita_base_url: str = Field(
|
| 186 |
+
default="https://api.novita.ai/dedicated/v1/openai",
|
| 187 |
+
description="Novita AI dedicated endpoint base URL",
|
| 188 |
+
env="NOVITA_BASE_URL"
|
| 189 |
+
)
|
| 190 |
+
|
| 191 |
+
novita_model: str = Field(
|
| 192 |
+
default="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2",
|
| 193 |
+
description="Novita AI dedicated endpoint model ID",
|
| 194 |
+
env="NOVITA_MODEL"
|
| 195 |
+
)
|
| 196 |
+
|
| 197 |
+
# DeepSeek-R1 optimized settings
|
| 198 |
+
deepseek_r1_temperature: float = Field(
|
| 199 |
+
default=0.6,
|
| 200 |
+
description="Temperature for DeepSeek-R1 models (0.5-0.7 range, 0.6 recommended)",
|
| 201 |
+
env="DEEPSEEK_R1_TEMPERATURE"
|
| 202 |
+
)
|
| 203 |
+
|
| 204 |
+
deepseek_r1_force_reasoning: bool = Field(
|
| 205 |
+
default=True,
|
| 206 |
+
description="Force DeepSeek-R1 to start with reasoning trigger",
|
| 207 |
+
env="DEEPSEEK_R1_FORCE_REASONING"
|
| 208 |
+
)
|
| 209 |
+
|
| 210 |
+
# Token Allocation Configuration
|
| 211 |
+
user_input_max_tokens: int = Field(
|
| 212 |
+
default=8000,
|
| 213 |
+
description="Maximum tokens dedicated for user input (prioritized over context)",
|
| 214 |
+
env="USER_INPUT_MAX_TOKENS"
|
| 215 |
+
)
|
| 216 |
+
|
| 217 |
+
context_preparation_budget: int = Field(
|
| 218 |
+
default=28000,
|
| 219 |
+
description="Maximum tokens for context preparation (includes user input + context)",
|
| 220 |
+
env="CONTEXT_PREPARATION_BUDGET"
|
| 221 |
+
)
|
| 222 |
+
|
| 223 |
+
context_pruning_threshold: int = Field(
|
| 224 |
+
default=28000,
|
| 225 |
+
description="Context pruning threshold (should match context_preparation_budget)",
|
| 226 |
+
env="CONTEXT_PRUNING_THRESHOLD"
|
| 227 |
+
)
|
| 228 |
+
|
| 229 |
+
prioritize_user_input: bool = Field(
|
| 230 |
+
default=True,
|
| 231 |
+
description="Always prioritize user input over historical context",
|
| 232 |
+
env="PRIORITIZE_USER_INPUT"
|
| 233 |
+
)
|
| 234 |
+
|
| 235 |
+
@validator("novita_api_key", pre=True)
|
| 236 |
+
def validate_novita_api_key(cls, v):
|
| 237 |
+
"""Validate and clean Novita API key"""
|
| 238 |
+
if v is None:
|
| 239 |
+
return ""
|
| 240 |
+
return str(v).strip()
|
| 241 |
+
|
| 242 |
+
@validator("deepseek_r1_temperature", pre=True)
|
| 243 |
+
def validate_deepseek_temperature(cls, v):
|
| 244 |
+
"""Validate DeepSeek-R1 temperature is in recommended range"""
|
| 245 |
+
if isinstance(v, str):
|
| 246 |
+
v = float(v)
|
| 247 |
+
temp = float(v) if v else 0.6
|
| 248 |
+
return max(0.5, min(0.7, temp))
|
| 249 |
+
|
| 250 |
+
@validator("deepseek_r1_force_reasoning", pre=True)
|
| 251 |
+
def validate_force_reasoning(cls, v):
|
| 252 |
+
"""Convert string to boolean for force_reasoning"""
|
| 253 |
+
if isinstance(v, str):
|
| 254 |
+
return v.lower() in ("true", "1", "yes", "on")
|
| 255 |
+
return bool(v)
|
| 256 |
+
|
| 257 |
+
@validator("user_input_max_tokens", pre=True)
|
| 258 |
+
def validate_user_input_tokens(cls, v):
|
| 259 |
+
"""Validate user input token limit"""
|
| 260 |
+
val = int(v) if v else 8000
|
| 261 |
+
return max(1000, min(20000, val))
|
| 262 |
+
|
| 263 |
+
@validator("context_preparation_budget", pre=True)
|
| 264 |
+
def validate_context_budget(cls, v):
|
| 265 |
+
"""Validate context preparation budget"""
|
| 266 |
+
val = int(v) if v else 28000
|
| 267 |
+
return max(4000, min(120000, val))
|
| 268 |
+
|
| 269 |
# ==================== Model Configuration ====================
|
| 270 |
|
| 271 |
default_model: str = Field(
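
The clamping behaviour of the validators added in this hunk can be illustrated without pydantic. This is a minimal sketch that mirrors the ranges and defaults shown above; the function names (`clamp`, `clean_temperature`, and so on) are hypothetical and do not exist in `src/config.py`.

```python
def clamp(value, low, high):
    """Constrain value to the closed interval [low, high]."""
    return max(low, min(high, value))


def clean_temperature(raw, default=0.6):
    # Mirrors the validator: strings are converted first, falsy values use the default.
    if isinstance(raw, str):
        raw = float(raw)
    temp = float(raw) if raw else default
    return clamp(temp, 0.5, 0.7)


def clean_bool(raw):
    if isinstance(raw, str):
        return raw.lower() in ("true", "1", "yes", "on")
    return bool(raw)


def clean_user_input_tokens(raw, default=8000):
    return clamp(int(raw) if raw else default, 1000, 20000)


def clean_context_budget(raw, default=28000):
    return clamp(int(raw) if raw else default, 4000, 120000)


if __name__ == "__main__":
    print(clean_temperature("0.9"), clean_bool("Yes"), clean_user_input_tokens("500"))
```

One consequence of the `if v` test is that a temperature of `0` is treated as unset and becomes `0.6` rather than being clamped up to `0.5`.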
|
src/context_manager.py
CHANGED
|
@@ -439,10 +439,13 @@ Keep the summary concise and focused (approximately 500 tokens)."""
|
|
| 439 |
if not self.llm_router:
|
| 440 |
return ""
|
| 441 |
|
|
|
|
|
|
|
|
|
|
| 442 |
prompt = f"""Summarize this interaction in approximately 50 tokens:
|
| 443 |
|
| 444 |
-
User Input: {
|
| 445 |
-
System Response: {system_response[:
|
| 446 |
|
| 447 |
Provide a brief summary capturing the key exchange."""
|
| 448 |
|
|
@@ -466,8 +469,8 @@ Provide a brief summary capturing the key exchange."""
|
|
| 466 |
""", (
|
| 467 |
interaction_id,
|
| 468 |
session_id,
|
| 469 |
-
user_input[:
|
| 470 |
-
system_response[:
|
| 471 |
summary.strip(),
|
| 472 |
created_at
|
| 473 |
))
|
|
@@ -607,8 +610,8 @@ Keep the summary concise (approximately 100 tokens)."""
|
|
| 607 |
|
| 608 |
Applies smart pruning before formatting.
|
| 609 |
"""
|
| 610 |
-
# Step 4: Prune context if it exceeds token limits
|
| 611 |
-
pruned_context = self.prune_context(context
|
| 612 |
|
| 613 |
# Get context mode (fresh or relevant)
|
| 614 |
session_id = pruned_context.get("session_id")
|
|
@@ -735,19 +738,30 @@ Keep the summary concise (approximately 100 tokens)."""
|
|
| 735 |
# Simple approximation: 4 characters per token
|
| 736 |
return len(text) // 4
|
| 737 |
|
| 738 |
-
def prune_context(self, context: dict, max_tokens: int =
|
| 739 |
"""
|
| 740 |
-
Step 4: Implement Smart Context Pruning
|
| 741 |
|
| 742 |
Prune context to stay within token limit while keeping most recent and relevant content.
|
| 743 |
|
| 744 |
Args:
|
| 745 |
context: Context dictionary to prune
|
| 746 |
-
max_tokens: Maximum token count (default
|
| 747 |
|
| 748 |
Returns:
|
| 749 |
Pruned context dictionary
|
| 750 |
"""
|
| 751 |
try:
|
| 752 |
# Calculate current token count
|
| 753 |
current_tokens = self._calculate_context_tokens(context)
|
|
|
|
| 439 |
if not self.llm_router:
|
| 440 |
return ""
|
| 441 |
|
| 442 |
+
# Build a 500-character preview of the user input for the summary prompt;
|
| 443 |
+
# the database insert below stores a longer slice (up to 5000 characters).
|
| 444 |
+
user_input_preview = user_input[:500]
|
| 445 |
prompt = f"""Summarize this interaction in approximately 50 tokens:
|
| 446 |
|
| 447 |
+
User Input: {user_input_preview}
|
| 448 |
+
System Response: {system_response[:500]}
|
| 449 |
|
| 450 |
Provide a brief summary capturing the key exchange."""
|
| 451 |
|
|
|
|
| 469 |
""", (
|
| 470 |
interaction_id,
|
| 471 |
session_id,
|
| 472 |
+
user_input[:5000], # Increased from 500 to 5000 characters
|
| 473 |
+
system_response[:2000], # Increased from 1000 to 2000
|
| 474 |
summary.strip(),
|
| 475 |
created_at
|
| 476 |
))
|
|
|
|
| 610 |
|
| 611 |
Applies smart pruning before formatting.
|
| 612 |
"""
|
| 613 |
+
# Step 4: Prune context if it exceeds token limits (uses config threshold)
|
| 614 |
+
pruned_context = self.prune_context(context)
|
| 615 |
|
| 616 |
# Get context mode (fresh or relevant)
|
| 617 |
session_id = pruned_context.get("session_id")
|
|
|
|
| 738 |
# Simple approximation: 4 characters per token
|
| 739 |
return len(text) // 4
|
| 740 |
|
| 741 |
+
def prune_context(self, context: dict, max_tokens: Optional[int] = None) -> dict:
|
| 742 |
"""
|
| 743 |
+
Step 4: Implement Smart Context Pruning with configurable threshold
|
| 744 |
|
| 745 |
Prune context to stay within token limit while keeping most recent and relevant content.
|
| 746 |
|
| 747 |
Args:
|
| 748 |
context: Context dictionary to prune
|
| 749 |
+
max_tokens: Maximum token count (uses config default if None)
|
| 750 |
|
| 751 |
Returns:
|
| 752 |
Pruned context dictionary
|
| 753 |
"""
|
| 754 |
+
# Use config threshold if not provided
|
| 755 |
+
if max_tokens is None:
|
| 756 |
+
try:
|
| 757 |
+
from .config import get_settings
|
| 758 |
+
settings = get_settings()
|
| 759 |
+
max_tokens = settings.context_pruning_threshold
|
| 760 |
+
logger.debug(f"Using config pruning threshold: {max_tokens} tokens")
|
| 761 |
+
except Exception:
|
| 762 |
+
max_tokens = 2000 # Fallback to default
|
| 763 |
+
logger.warning("Could not load config, using default pruning threshold: 2000")
|
| 764 |
+
|
| 765 |
try:
|
| 766 |
# Calculate current token count
|
| 767 |
current_tokens = self._calculate_context_tokens(context)
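
The threshold resolution and token estimate in this hunk can be sketched in isolation. The 4-characters-per-token estimate and the 2000-token fallback come from the diff; `resolve_threshold` and `prune_oldest` are hypothetical names, and the drop-oldest-first strategy is an assumption, since the body of `prune_context` is not shown here.

```python
from typing import Callable, List, Optional

FALLBACK_THRESHOLD = 2000  # used in the diff when the config cannot be loaded


def estimate_tokens(text: str) -> int:
    """Simple approximation from the diff: 4 characters per token."""
    return len(text) // 4


def resolve_threshold(max_tokens: Optional[int], load_config: Callable[[], int]) -> int:
    """Prefer an explicit limit, then the configured one, then the fallback."""
    if max_tokens is not None:
        return max_tokens
    try:
        return load_config()
    except Exception:
        return FALLBACK_THRESHOLD


def prune_oldest(interactions: List[str], max_tokens: int) -> List[str]:
    """Keep the most recent interactions whose combined estimate fits the limit."""
    kept, used = [], 0
    for text in reversed(interactions):
        cost = estimate_tokens(text)
        if used + cost > max_tokens:
            break
        kept.append(text)
        used += cost
    return list(reversed(kept))


if __name__ == "__main__":
    limit = resolve_threshold(None, lambda: 28000)
    print(limit, prune_oldest(["a" * 40, "b" * 40, "c" * 40], 20))
```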
|
src/llm_router.py
CHANGED
|
@@ -1,290 +1,213 @@
|
|
| 1 |
-
# llm_router.py -
|
| 2 |
import logging
|
| 3 |
import asyncio
|
| 4 |
from typing import Dict, Optional
|
| 5 |
from .models_config import LLM_CONFIG
|
|
|
|
| 6 |
|
| 7 |
-
# Import
|
| 8 |
try:
|
| 9 |
-
from
|
|
|
|
| 10 |
except ImportError:
|
| 11 |
-
|
| 12 |
-
|
|
|
|
| 13 |
|
| 14 |
logger = logging.getLogger(__name__)
|
| 15 |
|
| 16 |
class LLMRouter:
|
| 17 |
-
def __init__(self, hf_token=None, use_local_models: bool =
|
| 18 |
-
|
| 19 |
-
|
| 20 |
-
self.hf_token = hf_token
|
| 21 |
-
self.health_status = {}
|
| 22 |
-
self.use_local_models = use_local_models
|
| 23 |
-
self.local_loader = None
|
| 24 |
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
|
|
|
|
| 30 |
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
|
| 35 |
-
|
| 36 |
-
|
| 37 |
-
|
| 38 |
-
|
| 39 |
-
|
| 40 |
-
|
| 41 |
-
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
|
| 45 |
-
|
| 46 |
-
|
| 47 |
-
|
| 48 |
-
|
| 49 |
-
|
| 50 |
-
|
| 51 |
|
| 52 |
async def route_inference(self, task_type: str, prompt: str, **kwargs):
|
| 53 |
"""
|
| 54 |
-
|
| 55 |
-
|
| 56 |
"""
|
| 57 |
-
logger.info(f"Routing inference for task: {task_type}")
|
| 58 |
-
model_config = self._select_model(task_type)
|
| 59 |
-
logger.info(f"Selected model: {model_config['model_id']}")
|
| 60 |
|
| 61 |
-
|
| 62 |
-
|
| 63 |
-
raise RuntimeError("Local model loader not available - cannot perform inference")
|
| 64 |
|
| 65 |
try:
|
| 66 |
-
# Handle embedding generation
|
| 67 |
if task_type == "embedding_generation":
|
| 68 |
-
|
|
|
|
|
|
|
| 69 |
else:
|
| 70 |
-
result = await self.
|
| 71 |
|
| 72 |
if result is None:
|
| 73 |
-
logger.error(f"
|
| 74 |
raise RuntimeError(f"Inference failed for task: {task_type}")
|
| 75 |
|
| 76 |
-
logger.info(f"Inference complete for {task_type} (
|
| 77 |
return result
|
| 78 |
|
| 79 |
except Exception as e:
|
| 80 |
-
logger.error(f"
|
| 81 |
-
# Try fallback model if configured
|
| 82 |
-
fallback_model_id = model_config.get("fallback")
|
| 83 |
-
if fallback_model_id and fallback_model_id != model_config["model_id"]:
|
| 84 |
-
logger.warning(f"Attempting fallback model: {fallback_model_id}")
|
| 85 |
-
try:
|
| 86 |
-
fallback_config = model_config.copy()
|
| 87 |
-
fallback_config["model_id"] = fallback_model_id
|
| 88 |
-
fallback_config.pop("fallback", None) # Prevent infinite recursion
|
| 89 |
-
|
| 90 |
-
if task_type == "embedding_generation":
|
| 91 |
-
result = await self._call_local_embedding(fallback_config, prompt, **kwargs)
|
| 92 |
-
else:
|
| 93 |
-
result = await self._call_local_model(fallback_config, prompt, task_type, **{**kwargs, '_is_fallback': True})
|
| 94 |
-
|
| 95 |
-
if result is not None:
|
| 96 |
-
logger.info(f"Inference complete using fallback model: {fallback_model_id}")
|
| 97 |
-
return result
|
| 98 |
-
except Exception as fallback_error:
|
| 99 |
-
logger.error(f"Fallback model also failed: {fallback_error}")
|
| 100 |
-
|
| 101 |
-
# No API fallback - raise error
|
| 102 |
raise RuntimeError(
|
| 103 |
f"Inference failed for task: {task_type}. "
|
| 104 |
-
f"
|
| 105 |
) from e
|
| 106 |
|
| 107 |
-
async def
|
| 108 |
-
"""Call
|
| 109 |
-
if not self.
|
| 110 |
return None
|
| 111 |
|
| 112 |
-
#
|
| 113 |
-
|
| 114 |
|
| 115 |
-
|
| 116 |
-
|
| 117 |
-
|
| 118 |
|
| 119 |
try:
|
| 120 |
-
|
| 121 |
-
|
| 122 |
-
|
| 123 |
-
|
| 124 |
-
|
| 125 |
-
|
| 126 |
-
|
| 127 |
-
|
| 128 |
-
|
| 129 |
-
|
| 130 |
-
|
| 131 |
|
| 132 |
-
try:
|
| 133 |
-
self.local_loader.load_chat_model(
|
| 134 |
-
model_id,
|
| 135 |
-
load_in_8bit=use_8bit,
|
| 136 |
-
load_in_4bit=use_4bit
|
| 137 |
-
)
|
| 138 |
-
except GatedRepoError as e:
|
| 139 |
-
logger.error(f"Cannot access gated repository {model_id}")
|
| 140 |
-
logger.error(f" Visit https://huggingface.co/{model_id.split(':')[0] if ':' in model_id else model_id} to request access.")
|
| 141 |
-
|
| 142 |
-
# Prevent infinite loops: if this is already a fallback attempt, don't try another fallback
|
| 143 |
-
if is_fallback_attempt:
|
| 144 |
-
logger.error("Fallback model also failed with gated repository error")
|
| 145 |
-
raise RuntimeError("Both primary and fallback models are gated repositories") from e
|
| 146 |
-
|
| 147 |
-
# Try fallback models in order (fallback, then fallback2)
|
| 148 |
-
fallback_chain = []
|
| 149 |
-
if model_config.get("fallback") and model_config.get("fallback") != model_id:
|
| 150 |
-
fallback_chain.append(model_config.get("fallback"))
|
| 151 |
-
if model_config.get("fallback2") and model_config.get("fallback2") != model_id:
|
| 152 |
-
fallback_chain.append(model_config.get("fallback2"))
|
| 153 |
-
|
| 154 |
-
if fallback_chain:
|
| 155 |
-
last_error = e
|
| 156 |
-
for fallback_idx, fallback_model_id in enumerate(fallback_chain):
|
| 157 |
-
logger.warning(f"Attempting fallback model {fallback_idx + 1}/{len(fallback_chain)}: {fallback_model_id}")
|
| 158 |
-
try:
|
| 159 |
-
# Create fallback config
|
| 160 |
-
fallback_config = model_config.copy()
|
| 161 |
-
fallback_config["model_id"] = fallback_model_id
|
| 162 |
-
# Remove this fallback and subsequent ones to prevent infinite recursion
|
| 163 |
-
fallback_config.pop("fallback", None)
|
| 164 |
-
fallback_config.pop("fallback2", None)
|
| 165 |
-
|
| 166 |
-
# Retry with fallback model (mark as fallback attempt if this is the last fallback)
|
| 167 |
-
is_last_fallback = (fallback_idx == len(fallback_chain) - 1)
|
| 168 |
-
return await self._call_local_model(
|
| 169 |
-
fallback_config,
|
| 170 |
-
prompt,
|
| 171 |
-
task_type,
|
| 172 |
-
**{**kwargs, '_is_fallback': is_last_fallback}
|
| 173 |
-
)
|
| 174 |
-
except GatedRepoError as fallback_gated_error:
|
| 175 |
-
logger.error(f"Fallback model {fallback_model_id} is also gated")
|
| 176 |
-
last_error = fallback_gated_error
|
| 177 |
-
if fallback_idx == len(fallback_chain) - 1:
|
| 178 |
-
# Last fallback failed
|
| 179 |
-
raise RuntimeError("All models (primary and fallbacks) are gated repositories") from fallback_gated_error
|
| 180 |
-
# Continue to next fallback
|
| 181 |
-
continue
|
| 182 |
-
except Exception as fallback_error:
|
| 183 |
-
logger.error(f"Fallback model {fallback_model_id} failed: {fallback_error}")
|
| 184 |
-
last_error = fallback_error
|
| 185 |
-
if fallback_idx == len(fallback_chain) - 1:
|
| 186 |
-
# Last fallback failed
|
| 187 |
-
raise
|
| 188 |
-
# Continue to next fallback
|
| 189 |
-
continue
|
| 190 |
-
# All fallbacks exhausted
|
| 191 |
-
raise RuntimeError(f"All models failed. Last error: {last_error}") from last_error
|
| 192 |
-
else:
|
| 193 |
-
raise RuntimeError(f"Model {model_id} is a gated repository and no fallback available") from e
|
| 194 |
-
except (RuntimeError, ModuleNotFoundError, ImportError) as e:
|
| 195 |
-
# Check if this is a bitsandbytes error (not a gated repo error)
|
| 196 |
-
error_str = str(e).lower()
|
| 197 |
-
if "bitsandbytes" in error_str or "int8_mm_dequant" in error_str or "validate_bnb_backend" in error_str:
|
| 198 |
-
logger.warning(f"BitsAndBytes compatibility issue detected: {e}")
|
| 199 |
-
logger.warning(f"Model {model_id} will be loaded without quantization")
|
| 200 |
-
# Retry without quantization
|
| 201 |
-
try:
|
| 202 |
-
# Disable quantization for this attempt
|
| 203 |
-
fallback_config = model_config.copy()
|
| 204 |
-
fallback_config["use_4bit_quantization"] = False
|
| 205 |
-
fallback_config["use_8bit_quantization"] = False
|
| 206 |
-
return await self._call_local_model(
|
| 207 |
-
fallback_config,
|
| 208 |
-
prompt,
|
| 209 |
-
task_type,
|
| 210 |
-
**kwargs
|
| 211 |
-
)
|
| 212 |
-
except Exception as retry_error:
|
| 213 |
-
logger.error(f"Failed to load model even without quantization: {retry_error}")
|
| 214 |
-
raise RuntimeError(f"Model loading failed: {retry_error}") from retry_error
|
| 215 |
-
else:
|
| 216 |
-
# Not a bitsandbytes error, re-raise
|
| 217 |
-
raise
|
| 218 |
-
|
| 219 |
-
# Format as chat messages if needed
|
| 220 |
-
messages = [{"role": "user", "content": prompt}]
|
| 221 |
-
|
| 222 |
-
# Generate using local model
|
| 223 |
-
result = await asyncio.to_thread(
|
| 224 |
-
self.local_loader.generate_chat_completion,
|
| 225 |
-
model_id=model_id,
|
| 226 |
-
messages=messages,
|
| 227 |
-
max_tokens=max_tokens,
|
| 228 |
-
temperature=temperature
|
| 229 |
-
)
|
| 230 |
-
|
| 231 |
-
logger.info(f"Local model {model_id} generated response (length: {len(result)})")
|
| 232 |
-
logger.info("=" * 80)
|
| 233 |
-
logger.info("LOCAL MODEL RESPONSE:")
|
| 234 |
-
logger.info("=" * 80)
|
| 235 |
-
logger.info(f"Model: {model_id}")
|
| 236 |
-
logger.info(f"Task Type: {task_type}")
|
| 237 |
-
logger.info(f"Response Length: {len(result)} characters")
|
| 238 |
-
logger.info("-" * 40)
|
| 239 |
-
logger.info("FULL RESPONSE CONTENT:")
|
| 240 |
-
logger.info("-" * 40)
|
| 241 |
-
logger.info(result)
|
| 242 |
-
logger.info("-" * 40)
|
| 243 |
-
logger.info("END OF RESPONSE")
|
| 244 |
-
logger.info("=" * 80)
|
| 245 |
-
|
| 246 |
-
return result
|
| 247 |
-
|
| 248 |
-
except GatedRepoError:
|
| 249 |
-
# Re-raise to be handled by caller
|
| 250 |
-
raise
|
| 251 |
except Exception as e:
|
| 252 |
-
logger.error(f"Error calling
|
| 253 |
raise
|
| 254 |
|
| 255 |
-
|
| 256 |
-
"""
|
| 257 |
-
|
| 258 |
-
|
| 259 |
|
| 260 |
-
|
| 261 |
|
| 262 |
-
|
| 263 |
-
#
|
| 264 |
-
|
| 265 |
-
|
| 266 |
-
|
| 267 |
-
|
| 268 |
-
|
| 269 |
-
|
| 270 |
-
|
| 271 |
-
|
| 272 |
-
|
| 273 |
-
|
| 274 |
-
|
| 275 |
-
|
| 276 |
-
|
| 277 |
-
|
| 278 |
-
|
| 279 |
-
|
| 280 |
-
|
| 281 |
-
|
| 282 |
-
|
| 283 |
-
|
| 284 |
-
|
| 285 |
-
|
| 286 |
|
| 287 |
def _select_model(self, task_type: str) -> dict:
|
|
|
|
| 288 |
model_map = {
|
| 289 |
"intent_classification": LLM_CONFIG["models"]["classification_specialist"],
|
| 290 |
"embedding_generation": LLM_CONFIG["models"]["embedding_specialist"],
|
|
@@ -294,64 +217,73 @@ class LLMRouter:
|
|
| 294 |
}
|
| 295 |
return model_map.get(task_type, LLM_CONFIG["models"]["reasoning_primary"])
|
| 296 |
|
| 297 |
-
# REMOVED: _is_model_healthy - no longer needed (local models only)
|
| 298 |
-
# REMOVED: _get_fallback_model - no longer needed (local models only)
|
| 299 |
-
# REMOVED: _call_hf_endpoint - HF API inference removed
|
| 300 |
-
|
| 301 |
async def get_available_models(self):
|
| 302 |
-
"""
|
| 303 |
-
|
| 304 |
-
"""
|
| 305 |
-
return list(LLM_CONFIG["models"].keys())
|
| 306 |
|
| 307 |
async def health_check(self):
|
| 308 |
"""
|
| 309 |
-
|
| 310 |
"""
|
| 311 |
-
|
| 312 |
-
if
|
| 313 |
-
|
| 314 |
|
| 315 |
-
|
| 316 |
-
|
| 317 |
-
# Check if model is loaded (for chat models)
|
| 318 |
-
is_loaded = model_id in self.local_loader.loaded_models or model_id in self.local_loader.loaded_embedding_models
|
| 319 |
-
health_status[model_name] = {
|
| 320 |
-
"model_id": model_id,
|
| 321 |
-
"loaded": is_loaded,
|
| 322 |
-
"healthy": is_loaded # Consider loaded models healthy
|
| 323 |
-
}
|
| 324 |
|
| 325 |
-
|
| 326 |
-
|
| 327 |
-
def prepare_context_for_llm(self, raw_context: Dict, max_tokens: int = 4000) -> str:
|
| 328 |
-
"""Smart context windowing for LLM calls"""
|
| 329 |
|
| 330 |
-
|
| 331 |
-
|
| 332 |
-
|
| 333 |
-
|
| 334 |
-
|
| 335 |
-
|
| 336 |
-
|
| 337 |
-
primary_model_id = LLM_CONFIG["models"]["reasoning_primary"]["model_id"]
|
| 338 |
-
# Strip API suffix if present (though we don't use them anymore)
|
| 339 |
-
base_model_id = primary_model_id.split(':')[0] if ':' in primary_model_id else primary_model_id
|
| 340 |
-
self.tokenizer = AutoTokenizer.from_pretrained(base_model_id)
|
| 341 |
-
except GatedRepoError as e:
|
| 342 |
-
logger.warning(f"Gated repository error loading tokenizer: {e}")
|
| 343 |
-
logger.warning("Using character count estimation instead")
|
| 344 |
-
self.tokenizer = None
|
| 345 |
-
except Exception as e:
|
| 346 |
-
logger.warning(f"Could not load tokenizer: {e}, using character count estimation")
|
| 347 |
-
self.tokenizer = None
|
| 348 |
-
except ImportError:
|
| 349 |
-
logger.warning("transformers library not available, using character count estimation")
|
| 350 |
-
self.tokenizer = None
|
| 351 |
|
| 352 |
-
#
|
| 353 |
priority_elements = [
|
| 354 |
-
('current_query', 1.0),
|
| 355 |
('recent_interactions', 0.8),
|
| 356 |
('user_preferences', 0.6),
|
| 357 |
('session_summary', 0.4),
|
|
@@ -359,12 +291,15 @@ class LLMRouter:
|
|
| 359 |
]
|
| 360 |
|
| 361 |
formatted_context = []
|
| 362 |
-
total_tokens =
|
| 363 |
|
| 364 |
for element, priority in priority_elements:
|
| 365 |
-
# Map element names to context keys
|
| 366 |
element_key_map = {
|
| 367 |
-
'current_query': raw_context.get('user_input', ''),
|
| 368 |
'recent_interactions': raw_context.get('interaction_contexts', []),
|
| 369 |
'user_preferences': raw_context.get('preferences', {}),
|
| 370 |
'session_summary': raw_context.get('session_context', {}),
|
|
@@ -377,55 +312,32 @@ class LLMRouter:
|
|
| 377 |
if isinstance(content, dict):
|
| 378 |
content = str(content)
|
| 379 |
elif isinstance(content, list):
|
| 380 |
-
content = "\n".join([str(item) for item in content[:10]])
|
| 381 |
|
| 382 |
if not content:
|
| 383 |
continue
|
| 384 |
|
| 385 |
-
# Estimate tokens
|
| 386 |
-
|
| 387 |
-
try:
|
| 388 |
-
tokens = len(self.tokenizer.encode(content))
|
| 389 |
-
except:
|
| 390 |
-
# Fallback to character-based estimation (rough: 1 token ≈ 4 chars)
|
| 391 |
-
tokens = len(content) // 4
|
| 392 |
-
else:
|
| 393 |
-
# Character-based estimation (rough: 1 token ≈ 4 chars)
|
| 394 |
-
tokens = len(content) // 4
|
| 395 |
|
| 396 |
if total_tokens + tokens <= max_tokens:
|
| 397 |
formatted_context.append(f"=== {element.upper()} ===\n{content}")
|
| 398 |
total_tokens += tokens
|
| 399 |
-
elif priority > 0.5: # Critical elements - truncate if needed
|
| 400 |
available = max_tokens - total_tokens
|
| 401 |
if available > 100: # Only truncate if we have meaningful space
|
| 402 |
truncated = self._truncate_to_tokens(content, available)
|
| 403 |
formatted_context.append(f"=== {element.upper()} (TRUNCATED) ===\n{truncated}")
|
|
|
|
| 404 |
break
|
| 405 |
|
|
|
|
| 406 |
return "\n\n".join(formatted_context)
|
| 407 |
|
| 408 |
def _truncate_to_tokens(self, content: str, max_tokens: int) -> str:
|
| 409 |
"""Truncate content to fit within token limit"""
|
| 410 |
-
|
| 411 |
-
|
| 412 |
-
|
| 413 |
-
|
| 414 |
-
|
| 415 |
-
return content[:max_chars-3] + "..."
|
| 416 |
-
|
| 417 |
-
try:
|
| 418 |
-
# Tokenize and truncate
|
| 419 |
-
tokens = self.tokenizer.encode(content)
|
| 420 |
-
if len(tokens) <= max_tokens:
|
| 421 |
-
return content
|
| 422 |
-
|
| 423 |
-
truncated_tokens = tokens[:max_tokens-3] # Leave room for "..."
|
| 424 |
-
truncated_text = self.tokenizer.decode(truncated_tokens)
|
| 425 |
-
return truncated_text + "..."
|
| 426 |
-
except Exception as e:
|
| 427 |
-
logger.warning(f"Error truncating with tokenizer: {e}, using character truncation")
|
| 428 |
-
max_chars = max_tokens * 4
|
| 429 |
-
if len(content) <= max_chars:
|
| 430 |
-
return content
|
| 431 |
-
return content[:max_chars-3] + "..."
|
|
|
|
| 1 |
+
# llm_router.py - NOVITA AI API ONLY
|
| 2 |
import logging
|
| 3 |
import asyncio
|
| 4 |
from typing import Dict, Optional
|
| 5 |
from .models_config import LLM_CONFIG
|
| 6 |
+
from .config import get_settings
|
| 7 |
|
| 8 |
+
# Import OpenAI client for Novita AI API
|
| 9 |
try:
|
| 10 |
+
from openai import OpenAI
|
| 11 |
+
OPENAI_AVAILABLE = True
|
| 12 |
except ImportError:
|
| 13 |
+
OPENAI_AVAILABLE = False
|
| 14 |
+
logger = logging.getLogger(__name__)
|
| 15 |
+
logger.error("openai package not available - Novita AI API requires openai package")
|
| 16 |
|
| 17 |
logger = logging.getLogger(__name__)
|
| 18 |
|
| 19 |
class LLMRouter:
|
| 20 |
+
def __init__(self, hf_token=None, use_local_models: bool = False):
|
| 21 |
+
"""
|
| 22 |
+
Initialize LLM Router with Novita AI API only.
|
| 23 |
|
| 24 |
+
Args:
|
| 25 |
+
hf_token: Not used (kept for backward compatibility)
|
| 26 |
+
use_local_models: Must be False (local models disabled)
|
| 27 |
+
"""
|
| 28 |
+
if use_local_models:
|
| 29 |
+
raise ValueError("Local models are disabled. Only Novita AI API is supported.")
|
| 30 |
|
| 31 |
+
self.settings = get_settings()
|
| 32 |
+
self.novita_client = None
|
| 33 |
+
|
| 34 |
+
# Validate OpenAI package
|
| 35 |
+
if not OPENAI_AVAILABLE:
|
| 36 |
+
raise ImportError(
|
| 37 |
+
"openai package is required for Novita AI API. "
|
| 38 |
+
"Install it with: pip install openai>=1.0.0"
|
| 39 |
+
)
|
| 40 |
+
|
| 41 |
+
# Validate API key
|
| 42 |
+
if not self.settings.novita_api_key:
|
| 43 |
+
raise ValueError(
|
| 44 |
+
"NOVITA_API_KEY is required. "
|
| 45 |
+
"Set it in environment variables or .env file"
|
| 46 |
+
)
|
| 47 |
+
|
| 48 |
+
# Initialize Novita AI client
|
| 49 |
+
try:
|
| 50 |
+
self.novita_client = OpenAI(
|
| 51 |
+
base_url=self.settings.novita_base_url,
|
| 52 |
+
api_key=self.settings.novita_api_key,
|
| 53 |
+
)
|
| 54 |
+
logger.info(f"✓ Novita AI API client initialized")
|
| 55 |
+
logger.info(f" Base URL: {self.settings.novita_base_url}")
|
| 56 |
+
logger.info(f" Model: {self.settings.novita_model}")
|
| 57 |
+
except Exception as e:
|
| 58 |
+
logger.error(f"Failed to initialize Novita AI client: {e}")
|
| 59 |
+
raise RuntimeError(f"Could not initialize Novita AI API client: {e}") from e
|
| 60 |
|
| 61 |
async def route_inference(self, task_type: str, prompt: str, **kwargs):
|
| 62 |
"""
|
| 63 |
+
Route inference to Novita AI API.
|
| 64 |
+
|
| 65 |
+
Args:
|
| 66 |
+
task_type: Type of task (general_reasoning, intent_classification, etc.)
|
| 67 |
+
prompt: Input prompt
|
| 68 |
+
**kwargs: Additional parameters (max_tokens, temperature, etc.)
|
| 69 |
+
|
| 70 |
+
Returns:
|
| 71 |
+
Generated text response
|
| 72 |
"""
|
| 73 |
+
logger.info(f"Routing inference to Novita AI API for task: {task_type}")
|
| 74 |
|
| 75 |
+
if not self.novita_client:
|
| 76 |
+
raise RuntimeError("Novita AI client not initialized")
|
|
|
|
| 77 |
|
| 78 |
try:
|
| 79 |
+
# Handle embedding generation (may need special handling)
|
| 80 |
if task_type == "embedding_generation":
|
| 81 |
+
logger.warning("Embedding generation via Novita API may require special implementation")
|
| 82 |
+
# For now, use chat completion (may need adjustment based on Novita API capabilities)
|
| 83 |
+
result = await self._call_novita_api(task_type, prompt, **kwargs)
|
| 84 |
else:
|
| 85 |
+
result = await self._call_novita_api(task_type, prompt, **kwargs)
|
| 86 |
|
| 87 |
if result is None:
|
| 88 |
+
logger.error(f"Novita AI API returned None for task: {task_type}")
|
| 89 |
raise RuntimeError(f"Inference failed for task: {task_type}")
|
| 90 |
|
| 91 |
+
logger.info(f"Inference complete for {task_type} (Novita AI API)")
|
| 92 |
return result
|
| 93 |
|
| 94 |
except Exception as e:
|
| 95 |
+
logger.error(f"Novita AI API inference failed: {e}", exc_info=True)
|
| 96 |
raise RuntimeError(
|
| 97 |
f"Inference failed for task: {task_type}. "
|
| 98 |
+
f"Novita AI API error: {e}"
|
| 99 |
) from e
|
| 100 |
|
| 101 |
+
async def _call_novita_api(self, task_type: str, prompt: str, **kwargs) -> Optional[str]:
|
| 102 |
+
"""Call Novita AI API for inference."""
|
| 103 |
+
if not self.novita_client:
|
| 104 |
return None
|
| 105 |
|
| 106 |
+
# Get model config
|
| 107 |
+
model_config = self._select_model(task_type)
|
| 108 |
+
model_name = kwargs.get('model', self.settings.novita_model)
|
| 109 |
+
|
| 110 |
+
# Get optimized parameters
|
| 111 |
+
max_tokens = kwargs.get('max_tokens', model_config.get('max_tokens', 4096))
|
| 112 |
+
temperature = kwargs.get('temperature',
|
| 113 |
+
model_config.get('temperature', self.settings.deepseek_r1_temperature))
|
| 114 |
+
top_p = kwargs.get('top_p', model_config.get('top_p', 0.95))
|
| 115 |
+
stream = kwargs.get('stream', False)
|
| 116 |
+
|
| 117 |
+
# Format prompt according to DeepSeek-R1 best practices
|
| 118 |
+
formatted_prompt = self._format_deepseek_r1_prompt(prompt, task_type, model_config)
|
| 119 |
|
| 120 |
+
# IMPORTANT: No system prompt - all instructions in user prompt
|
| 121 |
+
messages = [{"role": "user", "content": formatted_prompt}]
|
| 122 |
+
|
| 123 |
+
# Build request parameters
|
| 124 |
+
request_params = {
|
| 125 |
+
"model": model_name,
|
| 126 |
+
"messages": messages,
|
| 127 |
+
"stream": stream,
|
| 128 |
+
"max_tokens": max_tokens,
|
| 129 |
+
"temperature": temperature,
|
| 130 |
+
"top_p": top_p,
|
| 131 |
+
}
|
| 132 |
|
| 133 |
try:
|
| 134 |
+
if stream:
|
| 135 |
+
# Handle streaming response
|
| 136 |
+
response_text = ""
|
| 137 |
+
stream_response = self.novita_client.chat.completions.create(**request_params)
|
| 138 |
+
|
| 139 |
+
for chunk in stream_response:
|
| 140 |
+
if chunk.choices and len(chunk.choices) > 0:
|
| 141 |
+
delta = chunk.choices[0].delta
|
| 142 |
+
if delta and delta.content:
|
| 143 |
+
response_text += delta.content
|
| 144 |
+
|
| 145 |
+
# Clean up reasoning tags if present
|
| 146 |
+
response_text = self._clean_reasoning_tags(response_text)
|
| 147 |
+
logger.info(f"Novita AI API generated response (length: {len(response_text)})")
|
| 148 |
+
return response_text
|
| 149 |
+
else:
|
| 150 |
+
# Handle non-streaming response
|
| 151 |
+
response = self.novita_client.chat.completions.create(**request_params)
|
| 152 |
+
|
| 153 |
+
if response.choices and len(response.choices) > 0:
|
| 154 |
+
result = response.choices[0].message.content
|
| 155 |
+
# Clean up reasoning tags if present
|
| 156 |
+
result = self._clean_reasoning_tags(result)
|
| 157 |
+
logger.info(f"Novita AI API generated response (length: {len(result)})")
|
| 158 |
+
return result
|
| 159 |
+
else:
|
| 160 |
+
logger.error("Novita AI API returned empty response")
|
| 161 |
+
return None
|
| 162 |
|
| 163 |
except Exception as e:
|
| 164 |
+
logger.error(f"Error calling Novita AI API: {e}", exc_info=True)
|
| 165 |
raise
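A hedged aside on the call above: the OpenAI-compatible client used here is synchronous, so invoking it directly inside an `async def` holds up the event loop for the duration of the request. The removed local-model path already used `asyncio.to_thread` for this. The sketch below shows the same pattern in isolation; `blocking_completion` is a hypothetical stand-in for the real SDK call, not part of this commit.

```python
import asyncio
import time


def blocking_completion(prompt: str) -> str:
    """Hypothetical stand-in for a synchronous SDK request."""
    time.sleep(0.01)  # simulate network latency
    return f"echo: {prompt}"


async def complete_async(prompt: str) -> str:
    """Run the blocking call in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(blocking_completion, prompt)


if __name__ == "__main__":
    print(asyncio.run(complete_async("hi")))
```

Whether this matters in practice depends on how many requests the service handles concurrently; for a single-user tool the direct call may be acceptable.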
|
| 166 |
|
| 167 |
+
def _format_deepseek_r1_prompt(self, prompt: str, task_type: str, model_config: dict) -> str:
|
| 168 |
+
"""
|
| 169 |
+
Format prompt according to DeepSeek-R1 best practices:
|
| 170 |
+
- No system prompt (all instructions in user prompt)
|
| 171 |
+
- Force reasoning trigger for reasoning tasks
|
| 172 |
+
- Add math directive for mathematical problems
|
| 173 |
+
"""
|
| 174 |
+
formatted_prompt = prompt
|
| 175 |
|
| 176 |
+
# Check if we should force reasoning prefix
|
| 177 |
+
force_reasoning = (
|
| 178 |
+
self.settings.deepseek_r1_force_reasoning and
|
| 179 |
+
model_config.get("force_reasoning_prefix", False)
|
| 180 |
+
)
|
| 181 |
|
| 182 |
+
if force_reasoning:
|
| 183 |
+
# Force model to start with reasoning trigger
|
| 184 |
+
formatted_prompt = f"<think>\n\n{formatted_prompt}"
|
| 185 |
+
|
| 186 |
+
# Add math directive for mathematical problems
|
| 187 |
+
if self._is_math_query(prompt):
|
| 188 |
+
math_directive = "Please reason step by step, and put your final answer within \\boxed{}."
|
| 189 |
+
formatted_prompt = f"{formatted_prompt}\n\n{math_directive}"
|
| 190 |
+
|
| 191 |
+
return formatted_prompt
|
| 192 |
+
|
| 193 |
+
def _is_math_query(self, prompt: str) -> bool:
|
| 194 |
+
"""Detect if query is mathematical"""
|
| 195 |
+
math_keywords = [
|
| 196 |
+
"solve", "calculate", "compute", "equation", "formula",
|
| 197 |
+
"mathematical", "algebra", "geometry", "calculus", "integral",
|
| 198 |
+
"derivative", "theorem", "proof", "problem"
|
| 199 |
+
]
|
| 200 |
+
prompt_lower = prompt.lower()
|
| 201 |
+
return any(keyword in prompt_lower for keyword in math_keywords)
|
| 202 |
+
|
| 203 |
+
def _clean_reasoning_tags(self, text: str) -> str:
|
| 204 |
+
"""Clean up reasoning tags from response"""
|
| 205 |
+
text = text.replace("<think>", "").replace("</think>", "")
|
| 206 |
+
text = text.strip()
|
| 207 |
+
return text
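A note on the tag cleanup above: DeepSeek-R1 style models typically place their reasoning between `<think>` and `</think>`, so removing only the tag strings leaves the reasoning text in the returned response. The following is a minimal sketch of an alternative, assuming the caller wants the reasoning block dropped entirely and that the input may be `None`. `strip_reasoning` is a hypothetical helper, not part of this commit.

```python
import re
from typing import Optional

# Matches a complete <think>...</think> block, including newlines inside it.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_reasoning(text: Optional[str]) -> str:
    """Hypothetical helper: drop reasoning blocks and any stray tags."""
    if not text:
        return ""
    # Remove whole reasoning blocks first.
    cleaned = _THINK_BLOCK.sub("", text)
    # Then remove any unpaired tag left over (e.g. from a truncated response).
    cleaned = cleaned.replace("<think>", "").replace("</think>", "")
    return cleaned.strip()
```

If downstream code wants to log or display the reasoning, keeping the committed behaviour (tags removed, text kept) may be the better choice.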
|
| 208 |
|
| 209 |
def _select_model(self, task_type: str) -> dict:
|
| 210 |
+
"""Select model configuration based on task type"""
|
| 211 |
model_map = {
|
| 212 |
"intent_classification": LLM_CONFIG["models"]["classification_specialist"],
|
| 213 |
"embedding_generation": LLM_CONFIG["models"]["embedding_specialist"],
|
|
|
|
| 217 |
}
|
| 218 |
return model_map.get(task_type, LLM_CONFIG["models"]["reasoning_primary"])
|
| 219 |
|
| 220 |
async def get_available_models(self):
|
| 221 |
+
"""Get list of available models (Novita AI only)"""
|
| 222 |
+
return ["Novita AI API - DeepSeek-R1-Distill-Qwen-7B"]
|
| 223 |
|
| 224 |
async def health_check(self):
|
| 225 |
+
"""Perform health check on Novita AI API"""
|
| 226 |
+
try:
|
| 227 |
+
# Test API with a simple request
|
| 228 |
+
test_response = self.novita_client.chat.completions.create(
|
| 229 |
+
model=self.settings.novita_model,
|
| 230 |
+
messages=[{"role": "user", "content": "test"}],
|
| 231 |
+
max_tokens=5
|
| 232 |
+
)
|
| 233 |
+
|
| 234 |
+
return {
|
| 235 |
+
"provider": "novita_api",
|
| 236 |
+
"status": "healthy",
|
| 237 |
+
"model": self.settings.novita_model,
|
| 238 |
+
"base_url": self.settings.novita_base_url
|
| 239 |
+
}
|
| 240 |
+
except Exception as e:
|
| 241 |
+
logger.error(f"Health check failed: {e}")
|
| 242 |
+
return {
|
| 243 |
+
"provider": "novita_api",
|
| 244 |
+
"status": "unhealthy",
|
| 245 |
+
"error": str(e)
|
| 246 |
+
}
|
| 247 |
+
|
| 248 |
+
def prepare_context_for_llm(self, raw_context: Dict, max_tokens: Optional[int] = None,
|
| 249 |
+
user_input: Optional[str] = None) -> str:
|
| 250 |
"""
|
| 251 |
+
Smart context windowing with user input priority.
|
| 252 |
+
User input is never cut to make room for context; it is only capped if it exceeds user_input_max_tokens.
|
| 253 |
+
|
| 254 |
+
Args:
|
| 255 |
+
raw_context: Context dictionary
|
| 256 |
+
max_tokens: Optional override (uses config default if None)
|
| 257 |
+
user_input: Optional explicit user input (takes priority over raw_context['user_input'])
|
| 258 |
"""
|
| 259 |
+
# Use config budget if not provided
|
| 260 |
+
if max_tokens is None:
|
| 261 |
+
max_tokens = self.settings.context_preparation_budget
|
| 262 |
|
| 263 |
+
# Get user input (explicit parameter takes priority)
|
| 264 |
+
actual_user_input = user_input or raw_context.get('user_input', '')
|
| 265 |
|
| 266 |
+
# Calculate user input tokens (simple estimation: 1 token ≈ 4 chars)
|
| 267 |
+
user_input_tokens = len(actual_user_input) // 4
|
| 268 |
|
| 269 |
+
# Ensure user input fits within dedicated budget
|
| 270 |
+
user_input_max = self.settings.user_input_max_tokens
|
| 271 |
+
if user_input_tokens > user_input_max:
|
| 272 |
+
logger.warning(f"User input ({user_input_tokens} tokens) exceeds max ({user_input_max}), truncating")
|
| 273 |
+
max_chars = user_input_max * 4
|
| 274 |
+
actual_user_input = actual_user_input[:max_chars - 3] + "..."
|
| 275 |
+
user_input_tokens = user_input_max
|
| 276 |
|
| 277 |
+
# Reserve space for user input (it has highest priority)
|
| 278 |
+
remaining_tokens = max_tokens - user_input_tokens
|
| 279 |
+
if remaining_tokens < 0:
|
| 280 |
+
logger.warning(f"User input ({user_input_tokens} tokens) exceeds total budget ({max_tokens})")
|
| 281 |
+
remaining_tokens = 0
|
| 282 |
+
|
| 283 |
+
logger.info(f"Token allocation: User input={user_input_tokens}, Context budget={remaining_tokens}, Total={max_tokens}")
|
| 284 |
+
|
| 285 |
+
# Priority order for context elements (user input already handled)
|
| 286 |
priority_elements = [
|
|
|
|
| 287 |
('recent_interactions', 0.8),
|
| 288 |
('user_preferences', 0.6),
|
| 289 |
('session_summary', 0.4),
|
|
|
|
| 291 |
]
|
| 292 |
|
| 293 |
formatted_context = []
|
| 294 |
+
total_tokens = user_input_tokens # Start with user input tokens
|
| 295 |
|
| 296 |
+
# Add user input first (unconditionally; only capped above at user_input_max_tokens)
|
| 297 |
+
if actual_user_input:
|
| 298 |
+
formatted_context.append(f"=== USER INPUT ===\n{actual_user_input}")
|
| 299 |
+
|
| 300 |
+
# Now add context elements within remaining budget
|
| 301 |
for element, priority in priority_elements:
|
|
|
|
| 302 |
element_key_map = {
|
|
|
|
| 303 |
'recent_interactions': raw_context.get('interaction_contexts', []),
|
| 304 |
'user_preferences': raw_context.get('preferences', {}),
|
| 305 |
'session_summary': raw_context.get('session_context', {}),
|
|
|
|
| 312 |
if isinstance(content, dict):
|
| 313 |
content = str(content)
|
| 314 |
elif isinstance(content, list):
|
| 315 |
+
content = "\n".join([str(item) for item in content[:10]])
|
| 316 |
|
| 317 |
if not content:
|
| 318 |
continue
|
| 319 |
|
| 320 |
+
# Estimate tokens (simple: 1 token ≈ 4 chars)
|
| 321 |
+
tokens = len(content) // 4
|
| 322 |
|
| 323 |
if total_tokens + tokens <= max_tokens:
|
| 324 |
formatted_context.append(f"=== {element.upper()} ===\n{content}")
|
| 325 |
total_tokens += tokens
|
| 326 |
+
elif priority > 0.5 and remaining_tokens > 0: # Critical elements - truncate if needed
|
| 327 |
available = max_tokens - total_tokens
|
| 328 |
if available > 100: # Only truncate if we have meaningful space
|
| 329 |
truncated = self._truncate_to_tokens(content, available)
|
| 330 |
formatted_context.append(f"=== {element.upper()} (TRUNCATED) ===\n{truncated}")
|
| 331 |
+
total_tokens += available
|
| 332 |
break
|
| 333 |
|
| 334 |
+
logger.info(f"Context prepared: {total_tokens}/{max_tokens} tokens (user input: {user_input_tokens}, context: {total_tokens - user_input_tokens})")
|
| 335 |
return "\n\n".join(formatted_context)
|
| 336 |
|
| 337 |
def _truncate_to_tokens(self, content: str, max_tokens: int) -> str:
|
| 338 |
"""Truncate content to fit within token limit"""
|
| 339 |
+
# Simple character-based truncation (1 token ≈ 4 chars)
|
| 340 |
+
max_chars = max_tokens * 4
|
| 341 |
+
if len(content) <= max_chars:
|
| 342 |
+
return content
|
| 343 |
+
return content[:max_chars - 3] + "..."
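The budgeting idea in `prepare_context_for_llm` can be illustrated in isolation. The following is a minimal sketch under stated assumptions, not the committed implementation: `estimate_tokens` and `allocate_budget` are hypothetical names, the 4-characters-per-token ratio is the same rough heuristic the diff uses rather than a tokenizer count, and parts that do not fit are skipped here instead of truncated.

```python
from typing import List, Tuple

CHARS_PER_TOKEN = 4  # rough heuristic from the diff, not a tokenizer count


def estimate_tokens(text: str) -> int:
    """Approximate token count as characters // 4."""
    return len(text) // CHARS_PER_TOKEN


def allocate_budget(
    user_input: str,
    context_parts: List[Tuple[str, str]],
    max_tokens: int,
) -> List[str]:
    """Hypothetical helper: keep user input, then fit context in priority order.

    context_parts is a list of (label, text) pairs, highest priority first.
    """
    sections = [f"=== USER INPUT ===\n{user_input}"] if user_input else []
    used = estimate_tokens(user_input)
    for label, text in context_parts:
        cost = estimate_tokens(text)
        if not text or used + cost > max_tokens:
            continue  # skip empty parts and parts that do not fit
        sections.append(f"=== {label.upper()} ===\n{text}")
        used += cost
    return sections
```

Because the estimate is character based, the real token count sent to the API can differ noticeably, especially for code or non-English text, so leaving some headroom in `max_tokens` is a reasonable precaution.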
|
src/models_config.py
CHANGED
|
@@ -1,61 +1,45 @@
|
|
| 1 |
# models_config.py
|
| 2 |
-
#
|
| 3 |
-
# UPDATED: Local models only - no API fallback
|
| 4 |
LLM_CONFIG = {
|
| 5 |
-
"primary_provider": "
|
| 6 |
"models": {
|
| 7 |
"reasoning_primary": {
|
| 8 |
-
|
| 9 |
-
"model_id": "Qwen/Qwen2.5-7B-Instruct", # Single primary model for all text tasks
|
| 10 |
"task": "general_reasoning",
|
| 11 |
-
"max_tokens":
|
| 12 |
-
"temperature": 0.
|
| 13 |
-
|
| 14 |
-
"
|
| 15 |
-
"
|
| 16 |
-
"is_chat_model": True,
|
| 17 |
-
"use_4bit_quantization": True, # Enable 4-bit quantization for 16GB T4
|
| 18 |
-
"use_8bit_quantization": False
|
| 19 |
-
},
|
| 20 |
-
"embedding_specialist": {
|
| 21 |
-
"model_id": "intfloat/e5-large-v2", # 1024-dim embeddings for semantic similarity
|
| 22 |
-
"task": "embeddings",
|
| 23 |
-
"vector_dimensions": 1024,
|
| 24 |
-
"purpose": "semantic_similarity",
|
| 25 |
-
"is_chat_model": False
|
| 26 |
},
|
| 27 |
"classification_specialist": {
|
| 28 |
-
"model_id": "
|
| 29 |
"task": "intent_classification",
|
| 30 |
-
"
|
| 31 |
-
"
|
| 32 |
-
"
|
| 33 |
-
"
|
| 34 |
-
"
|
| 35 |
-
"fallback": "mistralai/Mistral-7B-Instruct-v0.2", # Non-gated, stable
|
| 36 |
-
"fallback2": "microsoft/Phi-3-mini-4k-instruct" # Secondary fallback with DynamicCache workaround
|
| 37 |
},
|
| 38 |
"safety_checker": {
|
| 39 |
-
"model_id": "
|
| 40 |
"task": "content_moderation",
|
| 41 |
-
"
|
| 42 |
-
"
|
| 43 |
-
"
|
| 44 |
-
"
|
| 45 |
-
"
|
| 46 |
-
|
| 47 |
}
|
| 48 |
},
|
| 49 |
"routing_logic": {
|
| 50 |
-
"strategy": "
|
| 51 |
-
"fallback_chain": [
|
| 52 |
-
"load_balancing": "
|
| 53 |
-
},
|
| 54 |
-
"quantization_settings": {
|
| 55 |
-
"default_4bit": True, # Enable 4-bit quantization by default for T4 16GB
|
| 56 |
-
"default_8bit": False,
|
| 57 |
-
"bnb_4bit_compute_dtype": "float16",
|
| 58 |
-
"bnb_4bit_use_double_quant": True,
|
| 59 |
-
"bnb_4bit_quant_type": "nf4"
|
| 60 |
}
|
| 61 |
}
|
|
|
|
| 1 |
# models_config.py
|
| 2 |
+
# UPDATED: Novita AI API only - no local models
|
|
|
|
| 3 |
LLM_CONFIG = {
|
| 4 |
+
"primary_provider": "novita_api",
|
| 5 |
"models": {
|
| 6 |
"reasoning_primary": {
|
| 7 |
+
"model_id": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2",
|
|
|
|
| 8 |
"task": "general_reasoning",
|
| 9 |
+
"max_tokens": 4096,
|
| 10 |
+
"temperature": 0.6, # Recommended for DeepSeek-R1
|
| 11 |
+
"top_p": 0.95,
|
| 12 |
+
"force_reasoning_prefix": True,
|
| 13 |
+
"is_chat_model": True
|
| 14 |
},
|
| 15 |
"classification_specialist": {
|
| 16 |
+
"model_id": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2",
|
| 17 |
"task": "intent_classification",
|
| 18 |
+
"max_tokens": 512,
|
| 19 |
+
"temperature": 0.5, # Lower for consistency
|
| 20 |
+
"top_p": 0.9,
|
| 21 |
+
"force_reasoning_prefix": False,
|
| 22 |
+
"is_chat_model": True
|
|
|
|
|
|
|
| 23 |
},
|
| 24 |
"safety_checker": {
|
| 25 |
+
"model_id": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2",
|
| 26 |
"task": "content_moderation",
|
| 27 |
+
"max_tokens": 1024,
|
| 28 |
+
"temperature": 0.5,
|
| 29 |
+
"top_p": 0.9,
|
| 30 |
+
"force_reasoning_prefix": False,
|
| 31 |
+
"is_chat_model": True
|
| 32 |
+
},
|
| 33 |
+
"embedding_specialist": {
|
| 34 |
+
"model_id": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2",
|
| 35 |
+
"task": "embeddings",
|
| 36 |
+
"note": "Embeddings via Novita API - may require special handling",
|
| 37 |
+
"is_chat_model": True
|
| 38 |
}
|
| 39 |
},
|
| 40 |
"routing_logic": {
|
| 41 |
+
"strategy": "novita_api_only",
|
| 42 |
+
"fallback_chain": [],
|
| 43 |
+
"load_balancing": "single_endpoint"
|
| 44 |
}
|
| 45 |
}
|
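As a rough illustration of how a caller might consume the new configuration, the sketch below looks up a model entry by its `task` field. The `select_model_config` helper and the trimmed `LLM_CONFIG` copy are assumptions made for illustration; the routing code in `src/llm_router.py` is not shown here and may work differently.

```python
# Illustrative sketch only: select_model_config is a hypothetical helper,
# and this LLM_CONFIG is a trimmed copy of the dictionary in the diff above.
LLM_CONFIG = {
    "primary_provider": "novita_api",
    "models": {
        "reasoning_primary": {
            "task": "general_reasoning",
            "max_tokens": 4096,
            "temperature": 0.6,
        },
        "classification_specialist": {
            "task": "intent_classification",
            "max_tokens": 512,
            "temperature": 0.5,
        },
    },
    "routing_logic": {"strategy": "novita_api_only", "fallback_chain": []},
}


def select_model_config(config, task_type):
    """Return (name, settings) for the first model whose task matches."""
    for name, settings in config["models"].items():
        if settings.get("task") == task_type:
            return name, settings
    # With an empty fallback_chain there is nothing else to try.
    raise KeyError(f"No model configured for task {task_type!r}")


name, settings = select_model_config(LLM_CONFIG, "intent_classification")
print(name, settings["max_tokens"])
```

Raising on an unknown task, rather than silently returning a default, matches the empty `fallback_chain`: with a single provider there is no second option to fall back to.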
test_novita_conda.bat
ADDED
|
@@ -0,0 +1,53 @@
|
| 1 |
+
@echo off
|
| 2 |
+
REM Test Novita AI connection using Anaconda environment
|
| 3 |
+
REM This script activates the conda environment and runs the test
|
| 4 |
+
|
| 5 |
+
echo ============================================================
|
| 6 |
+
echo Testing Novita AI Connection with Anaconda
|
| 7 |
+
echo ============================================================
|
| 8 |
+
echo.
|
| 9 |
+
|
| 10 |
+
REM Check if conda is available
|
| 11 |
+
where conda >nul 2>&1
|
| 12 |
+
if %ERRORLEVEL% NEQ 0 (
|
| 13 |
+
echo ERROR: conda command not found
|
| 14 |
+
echo Please activate Anaconda Prompt first or add conda to PATH
|
| 15 |
+
goto :end
|
| 16 |
+
)
|
| 17 |
+
|
| 18 |
+
echo Step 1: Checking conda environments...
|
| 19 |
+
call conda env list
|
| 20 |
+
|
| 21 |
+
echo.
|
| 22 |
+
echo Step 2: Creating environment if it doesn't exist...
|
| 23 |
+
call conda env create -f environment.yml --name research-ai-assistant 2>nul
|
| 24 |
+
if %ERRORLEVEL% NEQ 0 (
|
| 25 |
+
echo Environment may already exist, continuing...
|
| 26 |
+
)
|
| 27 |
+
|
| 28 |
+
echo.
|
| 29 |
+
echo Step 3: Activating environment and running test...
|
| 30 |
+
call conda activate research-ai-assistant
|
| 31 |
+
if %ERRORLEVEL% NEQ 0 (
|
| 32 |
+
echo ERROR: Failed to activate environment
|
| 33 |
+
echo Try: conda activate research-ai-assistant
|
| 34 |
+
goto :end
|
| 35 |
+
)
|
| 36 |
+
|
| 37 |
+
echo.
|
| 38 |
+
echo Step 4: Installing openai package if needed...
|
| 39 |
+
python -c "import openai" 2>nul
|
| 40 |
+
if %ERRORLEVEL% NEQ 0 (
|
| 41 |
+
echo Installing openai package...
|
| 42 |
+
pip install "openai>=1.0.0"
|
| 43 |
+
)
|
| 44 |
+
|
| 45 |
+
echo.
|
| 46 |
+
echo Step 5: Running Novita AI connection test...
|
| 47 |
+
python test_novita_connection.py
|
| 48 |
+
|
| 49 |
+
:end
|
| 50 |
+
echo.
|
| 51 |
+
echo Test complete!
|
| 52 |
+
pause
|
| 53 |
+
|
test_novita_connection.py
ADDED
|
@@ -0,0 +1,275 @@
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
Test script for Novita AI API connection
|
| 4 |
+
Tests configuration, client initialization, and API calls
|
| 5 |
+
"""
|
| 6 |
+
|
| 7 |
+
import os
|
| 8 |
+
import sys
|
| 9 |
+
import asyncio
|
| 10 |
+
from pathlib import Path
|
| 11 |
+
|
| 12 |
+
# Add project root to path
|
| 13 |
+
project_root = Path(__file__).parent
|
| 14 |
+
sys.path.insert(0, str(project_root))
|
| 15 |
+
|
| 16 |
+
def test_configuration():
|
| 17 |
+
"""Test configuration loading"""
|
| 18 |
+
print("=" * 60)
|
| 19 |
+
print("TEST 1: Configuration Loading")
|
| 20 |
+
print("=" * 60)
|
| 21 |
+
|
| 22 |
+
try:
|
| 23 |
+
from src.config import get_settings
|
| 24 |
+
settings = get_settings()
|
| 25 |
+
|
| 26 |
+
print(f"✅ Configuration loaded successfully")
|
| 27 |
+
print(f" Novita API Key: {'Set' if settings.novita_api_key else 'NOT SET'}")
|
| 28 |
+
print(f" Base URL: {settings.novita_base_url}")
|
| 29 |
+
print(f" Model: {settings.novita_model}")
|
| 30 |
+
print(f" Temperature: {settings.deepseek_r1_temperature}")
|
| 31 |
+
print(f" Force Reasoning: {settings.deepseek_r1_force_reasoning}")
|
| 32 |
+
print(f" User Input Max Tokens: {settings.user_input_max_tokens}")
|
| 33 |
+
print(f" Context Preparation Budget: {settings.context_preparation_budget}")
|
| 34 |
+
|
| 35 |
+
if not settings.novita_api_key:
|
| 36 |
+
print("\n❌ ERROR: NOVITA_API_KEY is not set!")
|
| 37 |
+
print(" Please set it in environment variables or .env file")
|
| 38 |
+
return False
|
| 39 |
+
|
| 40 |
+
return True
|
| 41 |
+
|
| 42 |
+
except Exception as e:
|
| 43 |
+
print(f"❌ Configuration loading failed: {e}")
|
| 44 |
+
import traceback
|
| 45 |
+
traceback.print_exc()
|
| 46 |
+
return False
|
| 47 |
+
|
| 48 |
+
def test_openai_package():
|
| 49 |
+
"""Test OpenAI package availability"""
|
| 50 |
+
print("\n" + "=" * 60)
|
| 51 |
+
print("TEST 2: OpenAI Package Check")
|
| 52 |
+
print("=" * 60)
|
| 53 |
+
|
| 54 |
+
try:
|
| 55 |
+
from openai import OpenAI
|
| 56 |
+
print("✅ OpenAI package is available")
|
| 57 |
+
print(f" OpenAI version: {sys.modules['openai'].__version__}")
|
| 58 |
+
return True
|
| 59 |
+
except ImportError as e:
|
| 60 |
+
print(f"❌ OpenAI package not available: {e}")
|
| 61 |
+
print(" Install with: pip install openai>=1.0.0")
|
| 62 |
+
return False
|
| 63 |
+
|
| 64 |
+
def test_client_initialization():
|
| 65 |
+
"""Test Novita AI client initialization"""
|
| 66 |
+
print("\n" + "=" * 60)
|
| 67 |
+
print("TEST 3: Novita AI Client Initialization")
|
| 68 |
+
print("=" * 60)
|
| 69 |
+
|
| 70 |
+
try:
|
| 71 |
+
from src.config import get_settings
|
| 72 |
+
from openai import OpenAI
|
| 73 |
+
|
| 74 |
+
settings = get_settings()
|
| 75 |
+
|
| 76 |
+
if not settings.novita_api_key:
|
| 77 |
+
print("❌ Cannot test - NOVITA_API_KEY not set")
|
| 78 |
+
return False
|
| 79 |
+
|
| 80 |
+
client = OpenAI(
|
| 81 |
+
base_url=settings.novita_base_url,
|
| 82 |
+
api_key=settings.novita_api_key,
|
| 83 |
+
)
|
| 84 |
+
|
| 85 |
+
print("✅ Novita AI client initialized successfully")
|
| 86 |
+
print(f" Base URL: {settings.novita_base_url}")
|
| 87 |
+
print(f" API Key: {settings.novita_api_key[:10]}...{settings.novita_api_key[-4:] if len(settings.novita_api_key) > 14 else '***'}")
|
| 88 |
+
|
| 89 |
+
return True, client
|
| 90 |
+
|
| 91 |
+
except Exception as e:
|
| 92 |
+
print(f"❌ Client initialization failed: {e}")
|
| 93 |
+
import traceback
|
| 94 |
+
traceback.print_exc()
|
| 95 |
+
return False, None
|
| 96 |
+
|
| 97 |
+
def test_simple_api_call(client):
|
| 98 |
+
"""Test a simple API call to Novita AI"""
|
| 99 |
+
print("\n" + "=" * 60)
|
| 100 |
+
print("TEST 4: Simple API Call")
|
| 101 |
+
print("=" * 60)
|
| 102 |
+
|
| 103 |
+
if not client:
|
| 104 |
+
print("❌ Cannot test - client not initialized")
|
| 105 |
+
return False
|
| 106 |
+
|
| 107 |
+
try:
|
| 108 |
+
from src.config import get_settings
|
| 109 |
+
settings = get_settings()
|
| 110 |
+
|
| 111 |
+
print(f"Sending test request to: {settings.novita_model}")
|
| 112 |
+
print("Prompt: 'Hello, this is a test. Please respond briefly.'")
|
| 113 |
+
|
| 114 |
+
response = client.chat.completions.create(
|
| 115 |
+
model=settings.novita_model,
|
| 116 |
+
messages=[
|
| 117 |
+
{"role": "user", "content": "Hello, this is a test. Please respond briefly."}
|
| 118 |
+
],
|
| 119 |
+
max_tokens=50,
|
| 120 |
+
temperature=0.6
|
| 121 |
+
)
|
| 122 |
+
|
| 123 |
+
if response.choices and len(response.choices) > 0:
|
| 124 |
+
result = response.choices[0].message.content
|
| 125 |
+
print(f"✅ API call successful!")
|
| 126 |
+
print(f" Response length: {len(result)} characters")
|
| 127 |
+
print(f" Response preview: {result[:100]}...")
|
| 128 |
+
print(f" Model used: {response.model if hasattr(response, 'model') else 'N/A'}")
|
| 129 |
+
return True
|
| 130 |
+
else:
|
| 131 |
+
print("❌ API call returned empty response")
|
| 132 |
+
return False
|
| 133 |
+
|
| 134 |
+
except Exception as e:
|
| 135 |
+
print(f"❌ API call failed: {e}")
|
| 136 |
+
import traceback
|
| 137 |
+
traceback.print_exc()
|
| 138 |
+
return False
|
| 139 |
+
|
| 140 |
+
def test_llm_router():
|
| 141 |
+
"""Test LLM Router initialization and health check"""
|
| 142 |
+
print("\n" + "=" * 60)
|
| 143 |
+
print("TEST 5: LLM Router Initialization")
|
| 144 |
+
print("=" * 60)
|
| 145 |
+
|
| 146 |
+
try:
|
| 147 |
+
from src.llm_router import LLMRouter
|
| 148 |
+
|
| 149 |
+
print("Initializing LLM Router...")
|
| 150 |
+
router = LLMRouter(hf_token=None, use_local_models=False)
|
| 151 |
+
|
| 152 |
+
print("✅ LLM Router initialized successfully")
|
| 153 |
+
|
| 154 |
+
# Test health check
|
| 155 |
+
print("\nTesting health check...")
|
| 156 |
+
async def test_health():
|
| 157 |
+
health = await router.health_check()
|
| 158 |
+
return health
|
| 159 |
+
|
| 160 |
+
health = asyncio.run(test_health())
|
| 161 |
+
print(f"✅ Health check result: {health}")
|
| 162 |
+
|
| 163 |
+
return True
|
| 164 |
+
|
| 165 |
+
except Exception as e:
|
| 166 |
+
print(f"❌ LLM Router initialization failed: {e}")
|
| 167 |
+
import traceback
|
| 168 |
+
traceback.print_exc()
|
| 169 |
+
return False
|
| 170 |
+
|
| 171 |
+
async def test_inference():
|
| 172 |
+
"""Test actual inference through LLM Router"""
|
| 173 |
+
print("\n" + "=" * 60)
|
| 174 |
+
print("TEST 6: Inference Test")
|
| 175 |
+
print("=" * 60)
|
| 176 |
+
|
| 177 |
+
try:
|
| 178 |
+
from src.llm_router import LLMRouter
|
| 179 |
+
|
| 180 |
+
router = LLMRouter(hf_token=None, use_local_models=False)
|
| 181 |
+
|
| 182 |
+
test_prompt = "What is the capital of France? Answer in one sentence."
|
| 183 |
+
print(f"Test prompt: {test_prompt}")
|
| 184 |
+
|
| 185 |
+
result = await router.route_inference(
|
| 186 |
+
task_type="general_reasoning",
|
| 187 |
+
prompt=test_prompt,
|
| 188 |
+
max_tokens=100,
|
| 189 |
+
temperature=0.6
|
| 190 |
+
)
|
| 191 |
+
|
| 192 |
+
if result:
|
| 193 |
+
print(f"✅ Inference successful!")
|
| 194 |
+
print(f" Response length: {len(result)} characters")
|
| 195 |
+
print(f" Response: {result}")
|
| 196 |
+
return True
|
| 197 |
+
else:
|
| 198 |
+
print("❌ Inference returned None")
|
| 199 |
+
return False
|
| 200 |
+
|
| 201 |
+
except Exception as e:
|
| 202 |
+
print(f"❌ Inference test failed: {e}")
|
| 203 |
+
import traceback
|
| 204 |
+
traceback.print_exc()
|
| 205 |
+
return False
|
| 206 |
+
|
| 207 |
+
def main():
|
| 208 |
+
"""Run all tests"""
|
| 209 |
+
print("\n" + "=" * 60)
|
| 210 |
+
print("NOVITA AI CONNECTION TEST")
|
| 211 |
+
print("=" * 60)
|
| 212 |
+
print()
|
| 213 |
+
|
| 214 |
+
results = {}
|
| 215 |
+
|
| 216 |
+
# Test 1: Configuration
|
| 217 |
+
results['config'] = test_configuration()
|
| 218 |
+
if not results['config']:
|
| 219 |
+
print("\n❌ Configuration test failed. Please check your environment variables.")
|
| 220 |
+
return 1
|
| 221 |
+
|
| 222 |
+
# Test 2: OpenAI package
|
| 223 |
+
results['package'] = test_openai_package()
|
| 224 |
+
if not results['package']:
|
| 225 |
+
print("\n❌ Package test failed. Please install: pip install openai>=1.0.0")
|
| 226 |
+
return 1
|
| 227 |
+
|
| 228 |
+
# Test 3: Client initialization
|
| 229 |
+
client_init_result = test_client_initialization()
|
| 230 |
+
if isinstance(client_init_result, tuple):
|
| 231 |
+
results['client'] = client_init_result[0]
|
| 232 |
+
client = client_init_result[1]
|
| 233 |
+
else:
|
| 234 |
+
results['client'] = client_init_result
|
| 235 |
+
client = None
|
| 236 |
+
|
| 237 |
+
if not results['client']:
|
| 238 |
+
print("\n❌ Client initialization failed. Check your API key and base URL.")
|
| 239 |
+
return 1
|
| 240 |
+
|
| 241 |
+
# Test 4: Simple API call
|
| 242 |
+
results['api_call'] = test_simple_api_call(client)
|
| 243 |
+
|
| 244 |
+
# Test 5: LLM Router
|
| 245 |
+
results['router'] = test_llm_router()
|
| 246 |
+
|
| 247 |
+
# Test 6: Inference
|
| 248 |
+
if results['router']:
|
| 249 |
+
results['inference'] = asyncio.run(test_inference())
|
| 250 |
+
|
| 251 |
+
# Summary
|
| 252 |
+
print("\n" + "=" * 60)
|
| 253 |
+
print("TEST SUMMARY")
|
| 254 |
+
print("=" * 60)
|
| 255 |
+
|
| 256 |
+
total_tests = len(results)
|
| 257 |
+
passed_tests = sum(1 for v in results.values() if v)
|
| 258 |
+
|
| 259 |
+
for test_name, result in results.items():
|
| 260 |
+
status = "✅ PASS" if result else "❌ FAIL"
|
| 261 |
+
print(f" {test_name.upper()}: {status}")
|
| 262 |
+
|
| 263 |
+
print(f"\nTotal: {passed_tests}/{total_tests} tests passed")
|
| 264 |
+
|
| 265 |
+
if passed_tests == total_tests:
|
| 266 |
+
print("\n🎉 All tests passed! Novita AI connection is working correctly.")
|
| 267 |
+
return 0
|
| 268 |
+
else:
|
| 269 |
+
print("\n⚠️ Some tests failed. Please review the errors above.")
|
| 270 |
+
return 1
|
| 271 |
+
|
| 272 |
+
if __name__ == "__main__":
|
| 273 |
+
exit_code = main()
|
| 274 |
+
sys.exit(exit_code)
|
| 275 |
+
|
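The summary step at the end of `main()` can be viewed as a pure function from a results dictionary to a process exit code. The sketch below is one way to write that mapping; `exit_code_from_results` is a hypothetical name and is not part of the commit.

```python
# Illustrative sketch only: exit_code_from_results is a hypothetical helper.
def exit_code_from_results(results):
    """Return 0 when every recorded test passed, 1 otherwise.

    An empty dictionary counts as failure, so a run that stops before
    recording any result does not report success.
    """
    if not results:
        return 1
    return 0 if all(results.values()) else 1


print(exit_code_from_results({"config": True, "package": True}))
print(exit_code_from_results({"config": True, "api_call": False}))
```

Keeping this logic separate from the printing makes it straightforward to reuse the same rule on every exit path of a test runner, so that callers such as a batch script can rely on the exit code.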