JatsTheAIGen committed
Commit 927854c · 1 Parent(s): ea87e33

Integrate Novita AI as exclusive inference provider

- Add Novita AI API integration with the DeepSeek-R1-Distill-Qwen-7B model
- Remove all local model dependencies
- Optimize token allocation for user inputs and context
- Add Anaconda environment setup files
- Add comprehensive test scripts and documentation
CONDA_SETUP_GUIDE.md ADDED
@@ -0,0 +1,166 @@
# Anaconda Environment Setup Guide

## Quick Start

### 1. Create Conda Environment

```bash
# Create environment from environment.yml
conda env create -f environment.yml

# OR create manually
conda create -n research-ai-assistant python=3.10
conda activate research-ai-assistant
```

### 2. Activate Environment

```bash
# Windows
conda activate research-ai-assistant

# Linux/Mac
source activate research-ai-assistant
# OR
conda activate research-ai-assistant
```

### 3. Install Dependencies

```bash
# Install from requirements.txt
pip install -r requirements.txt

# OR install the openai package directly (quoted so the shell
# does not treat >= as a redirection)
pip install "openai>=1.0.0"
```

### 4. Set Environment Variables

```bash
# Windows (PowerShell)
$env:NOVITA_API_KEY="your_api_key_here"
$env:NOVITA_BASE_URL="https://api.novita.ai/dedicated/v1/openai"
$env:NOVITA_MODEL="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2"

# Windows (CMD)
set NOVITA_API_KEY=your_api_key_here
set NOVITA_BASE_URL=https://api.novita.ai/dedicated/v1/openai
set NOVITA_MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2

# Linux/Mac
export NOVITA_API_KEY=your_api_key_here
export NOVITA_BASE_URL=https://api.novita.ai/dedicated/v1/openai
export NOVITA_MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2
```

### 5. Test Connection

```bash
# Run the test script
python test_novita_connection.py

# OR use the batch script (Windows)
test_novita_conda.bat
```

## Using Anaconda Prompt (Windows)

1. **Open Anaconda Prompt** (search for "Anaconda Prompt" in the Start menu)

2. **Navigate to the project directory:**
   ```bash
   cd C:\Users\85jat\GenAI_work_V2\Prototyping\Research_AI_Assistant_V2\Research_AI_Assistant_API
   ```

3. **Create/activate the environment:**
   ```bash
   conda env create -f environment.yml
   conda activate research-ai-assistant
   ```

4. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```

5. **Set environment variables:**
   ```bash
   set NOVITA_API_KEY=your_api_key_here
   set NOVITA_BASE_URL=https://api.novita.ai/dedicated/v1/openai
   set NOVITA_MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2
   ```

6. **Run the test:**
   ```bash
   python test_novita_connection.py
   ```

## Environment Management

### List environments
```bash
conda env list
```

### Activate environment
```bash
conda activate research-ai-assistant
```

### Deactivate environment
```bash
conda deactivate
```

### Remove environment (if needed)
```bash
conda env remove -n research-ai-assistant
```

### Update environment
```bash
conda env update -f environment.yml --prune
```

## Verification

After setup, verify everything works:

```bash
# Activate environment
conda activate research-ai-assistant

# Check Python
python --version

# Check openai package
python -c "import openai; print(openai.__version__)"

# Check configuration
python -c "from src.config import get_settings; s = get_settings(); print(f'API Key: {s.novita_api_key[:10]}...' if s.novita_api_key else 'API Key: NOT SET')"

# Run full test
python test_novita_connection.py
```

## Troubleshooting

### Conda command not found
- **Windows:** Open Anaconda Prompt instead of regular PowerShell/CMD
- **Linux/Mac:** Ensure conda is initialized: `conda init bash` or `conda init zsh`

### Environment activation fails
- Try `conda activate base` first, then `conda activate research-ai-assistant`
- On Windows: use Anaconda Prompt instead of a regular terminal

### Package installation fails
- Update conda: `conda update conda`
- Update pip: `pip install --upgrade pip`
- Try installing from conda-forge: `conda install -c conda-forge openai`

### Import errors
- Ensure the environment is activated: `conda activate research-ai-assistant`
- Verify the package is installed: `pip list | grep openai`
- Reinstall if needed: `pip install --force-reinstall "openai>=1.0.0"`
ENV_EXAMPLE_CONTENT.txt ADDED
@@ -0,0 +1,163 @@
# =============================================================================
# Research AI Assistant API - Environment Configuration
# =============================================================================
# Copy this content to a file named .env and fill in your actual values
# Never commit .env to version control!

# =============================================================================
# Novita AI Configuration (REQUIRED)
# =============================================================================
# Get your API key from: https://novita.ai
NOVITA_API_KEY=your_novita_api_key_here

# Dedicated endpoint base URL (default for dedicated endpoints)
NOVITA_BASE_URL=https://api.novita.ai/dedicated/v1/openai

# Your dedicated endpoint model ID
# Format: model-name:endpoint-id
NOVITA_MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2

# =============================================================================
# DeepSeek-R1 Optimized Settings
# =============================================================================
# Temperature: 0.5-0.7 range (0.6 recommended for DeepSeek-R1)
DEEPSEEK_R1_TEMPERATURE=0.6

# Force reasoning trigger: enable to ensure DeepSeek-R1 uses its reasoning pattern
# Set to True to add a `<think>` prefix for reasoning tasks
DEEPSEEK_R1_FORCE_REASONING=True

# =============================================================================
# Token Allocation Configuration
# =============================================================================
# Maximum tokens dedicated to user input (prioritized over context)
# Recommended: 8000 tokens for large queries
USER_INPUT_MAX_TOKENS=8000

# Maximum tokens for context preparation (includes user input + context)
# Recommended: 28000 tokens for 32K context window models
CONTEXT_PREPARATION_BUDGET=28000

# Context pruning threshold (should match CONTEXT_PREPARATION_BUDGET)
CONTEXT_PRUNING_THRESHOLD=28000

# Always prioritize user input over historical context
PRIORITIZE_USER_INPUT=True

# =============================================================================
# Database Configuration
# =============================================================================
# SQLite database path (default: sessions.db)
# Use /tmp/ for Docker/containerized environments
DB_PATH=sessions.db

# FAISS index path for embeddings (default: embeddings.faiss)
FAISS_INDEX_PATH=embeddings.faiss

# =============================================================================
# Cache Configuration
# =============================================================================
# HuggingFace cache directory (for any remaining model downloads)
HF_HOME=~/.cache/huggingface
TRANSFORMERS_CACHE=~/.cache/huggingface

# HuggingFace token (optional - only needed if using gated models)
HF_TOKEN=

# Cache TTL in seconds (default: 3600 = 1 hour)
CACHE_TTL=3600

# =============================================================================
# Session Configuration
# =============================================================================
# Session timeout in seconds (default: 3600 = 1 hour)
SESSION_TIMEOUT=3600

# Maximum session size in megabytes (default: 10 MB)
MAX_SESSION_SIZE_MB=10

# =============================================================================
# Performance Configuration
# =============================================================================
# Maximum worker threads for parallel processing (default: 4)
MAX_WORKERS=4

# =============================================================================
# Mobile Optimization
# =============================================================================
# Maximum tokens for mobile responses (default: 1200)
# Increased from 800 to allow better responses on mobile
MOBILE_MAX_TOKENS=1200

# Mobile request timeout in milliseconds (default: 15000)
MOBILE_TIMEOUT=15000

# =============================================================================
# API Configuration
# =============================================================================
# Flask/Gradio server port (default: 7860)
GRADIO_PORT=7860

# Server host (default: 0.0.0.0 for all interfaces)
GRADIO_HOST=0.0.0.0

# =============================================================================
# Logging Configuration
# =============================================================================
# Logging level: DEBUG, INFO, WARNING, ERROR, CRITICAL (default: INFO)
LOG_LEVEL=INFO

# Log format: json or text (default: json)
LOG_FORMAT=json

# Log directory (default: /tmp/logs)
LOG_DIR=/tmp/logs

# =============================================================================
# Context Configuration
# =============================================================================
# Maximum context tokens (default: 4000)
# Note: This is overridden by CONTEXT_PREPARATION_BUDGET if set
MAX_CONTEXT_TOKENS=4000

# Cache TTL for context in seconds (default: 300 = 5 minutes)
CACHE_TTL_SECONDS=300

# Maximum cache size (default: 100)
MAX_CACHE_SIZE=100

# Enable parallel processing (default: True)
PARALLEL_PROCESSING=True

# Context decay factor (default: 0.8)
CONTEXT_DECAY_FACTOR=0.8

# Maximum interactions to keep in context (default: 10)
MAX_INTERACTIONS_TO_KEEP=10

# Enable metrics collection (default: True)
ENABLE_METRICS=True

# Enable context compression (default: True)
COMPRESSION_ENABLED=True

# Summarization threshold in tokens (default: 2000)
SUMMARIZATION_THRESHOLD=2000

# =============================================================================
# Model Selection (for context operations - if still using local models)
# =============================================================================
# These are optional and only used if local models are still needed
# for context summarization or other operations
CONTEXT_SUMMARIZATION_MODEL=Qwen/Qwen2.5-7B-Instruct
CONTEXT_INTENT_MODEL=Qwen/Qwen2.5-7B-Instruct
CONTEXT_SYNTHESIS_MODEL=Qwen/Qwen2.5-7B-Instruct

# =============================================================================
# Security Notes
# =============================================================================
# - Never commit the .env file to version control
# - Keep API keys secret and rotate them regularly
# - Use environment variables in production (not .env files)
# - Set proper file permissions: chmod 600 .env
NOVITA_AI_IMPLEMENTATION_SUMMARY.md ADDED
@@ -0,0 +1,212 @@
# Novita AI Implementation Summary

## ✅ Implementation Complete

All changes have been implemented to switch from local models to the Novita AI API as the only inference source.

## 📋 Files Modified

### 1. ✅ `src/config.py`
- Added a Novita AI configuration section with:
  - `novita_api_key` (required, validated)
  - `novita_base_url` (default: https://api.novita.ai/dedicated/v1/openai)
  - `novita_model` (default: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2)
  - `deepseek_r1_temperature` (default: 0.6, validated to the 0.5-0.7 range)
  - `deepseek_r1_force_reasoning` (default: True)
- Token allocation configuration:
  - `user_input_max_tokens` (default: 8000)
  - `context_preparation_budget` (default: 28000)
  - `context_pruning_threshold` (default: 28000)
  - `prioritize_user_input` (default: True)

### 2. ✅ `requirements.txt`
- Added the `openai>=1.0.0` package

### 3. ✅ `src/models_config.py`
- Changed `primary_provider` from "local" to "novita_api"
- Updated all model IDs to the Novita model ID
- Added DeepSeek-R1 optimized parameters:
  - Temperature: 0.6 for reasoning, 0.5 for classification/safety
  - Top_p: 0.95 for reasoning, 0.9 for classification
  - `force_reasoning_prefix: True` for reasoning tasks
- Removed all local model configuration (quantization, fallbacks)

### 4. ✅ `src/llm_router.py` (Complete Rewrite)
- Removed all local model loading code
- Removed `LocalModelLoader` dependencies
- Added OpenAI client initialization
- Implemented the `_call_novita_api()` method (sketched below)
- Added DeepSeek-R1 optimizations:
  - `_format_deepseek_r1_prompt()` - reasoning trigger and math directives
  - `_is_math_query()` - automatic math detection
  - `_clean_reasoning_tags()` - response cleanup
- Updated `prepare_context_for_llm()` with:
  - User input priority (never truncated)
  - Dedicated 8K token budget for user input
  - 28K token context preparation budget
  - Dynamic context allocation
- Updated `health_check()` for the Novita API
- Removed all local model methods
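For orientation, here is a minimal sketch of what the `_call_novita_api()` path boils down to, assuming the OpenAI-compatible async client from `openai>=1.0`. The names `client` and `call_novita_api` are illustrative, not the exact code in `src/llm_router.py`:

```python
# Minimal sketch, not the actual src/llm_router.py implementation.
import os

from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url=os.getenv("NOVITA_BASE_URL", "https://api.novita.ai/dedicated/v1/openai"),
    api_key=os.environ["NOVITA_API_KEY"],  # required; KeyError here means the key is unset
)

async def call_novita_api(prompt: str, max_tokens: int = 512, temperature: float = 0.6) -> str:
    """Send a single-turn chat completion to the dedicated Novita endpoint."""
    response = await client.chat.completions.create(
        model=os.getenv("NOVITA_MODEL", "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2"),
        # DeepSeek-R1 guidance: no system prompt; all instructions go in the user message
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        temperature=temperature,
    )
    return response.choices[0].message.content
```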
### 5. ✅ `flask_api_standalone.py`
- Updated `initialize_orchestrator()`:
  - Changed to "Novita AI API Only" mode
  - Removed the HF_TOKEN dependency
  - Set `use_local_models=False`
- Updated error handling for configuration errors
- Increased `MAX_MESSAGE_LENGTH` from 10KB to 100KB
- Updated logging messages

### 6. ✅ `src/context_manager.py`
- Updated `prune_context()` to use the config threshold (28000 tokens)
- Increased user input storage from 500 to 5000 characters
- Increased system response storage from 1000 to 2000 characters
- Updated interaction context generation to use more of the user input

## 📝 Environment Variables Required

Create a `.env` file with the following (see `.env.example` for the full template):

```bash
# REQUIRED - Novita AI Configuration
NOVITA_API_KEY=your_api_key_here
NOVITA_BASE_URL=https://api.novita.ai/dedicated/v1/openai
NOVITA_MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2

# DeepSeek-R1 Optimized Settings
DEEPSEEK_R1_TEMPERATURE=0.6
DEEPSEEK_R1_FORCE_REASONING=True

# Token Allocation (Optional - defaults provided)
USER_INPUT_MAX_TOKENS=8000
CONTEXT_PREPARATION_BUDGET=28000
CONTEXT_PRUNING_THRESHOLD=28000
PRIORITIZE_USER_INPUT=True
```

## 🚀 Installation Steps

1. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```

2. **Create a `.env` file:**
   ```bash
   cp .env.example .env
   # Edit .env and add your NOVITA_API_KEY
   ```

3. **Set environment variables:**
   ```bash
   export NOVITA_API_KEY=your_api_key_here
   export NOVITA_BASE_URL=https://api.novita.ai/dedicated/v1/openai
   export NOVITA_MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2
   ```

4. **Start the application:**
   ```bash
   python flask_api_standalone.py
   ```

## ✨ Key Features Implemented

### DeepSeek-R1 Optimizations
- ✅ Temperature set to 0.6 (recommended range 0.5-0.7)
- ✅ Reasoning trigger (`<think>` prefix) for reasoning tasks (see the sketch below)
- ✅ Automatic math directive detection
- ✅ No system prompts (all instructions go in the user prompt)
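A minimal sketch of the reasoning trigger and response cleanup, assuming plain string handling; `format_deepseek_r1_prompt` and `clean_reasoning_tags` here are simplified stand-ins for the router's private methods:

```python
# Illustrative sketch; the real _format_deepseek_r1_prompt() and
# _clean_reasoning_tags() in src/llm_router.py may be more elaborate.
import re

def format_deepseek_r1_prompt(prompt: str, force_reasoning: bool = True) -> str:
    """Append the <think> trigger so the model starts with a reasoning block."""
    return f"{prompt}\n<think>\n" if force_reasoning else prompt

def clean_reasoning_tags(response: str) -> str:
    """Strip <think>...</think> blocks so only the final answer remains."""
    return re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL).strip()
```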
### Token Allocation
- ✅ User input: dedicated 8K token budget (never truncated)
- ✅ Context preparation: 28K token total budget
- ✅ Context pruning: 28K token threshold
- ✅ User input always prioritized over historical context (see the sketch below)
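A hedged sketch of that allocation rule, using the 4-characters-per-token approximation from `src/context_manager.py`; the constants mirror the defaults above, and `context_budget` is illustrative only:

```python
# User input is reserved whole; historical context fills whatever remains
# of the overall preparation budget.
USER_INPUT_MAX_TOKENS = 8000
CONTEXT_PREPARATION_BUDGET = 28000

def estimate_tokens(text: str) -> int:
    return len(text) // 4  # simple approximation: 4 characters per token

def context_budget(user_input: str) -> int:
    """Tokens left for historical context after reserving the user input."""
    return max(0, CONTEXT_PREPARATION_BUDGET - estimate_tokens(user_input))
```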
### API Improvements
- ✅ Message length limit: 100KB (increased from 10KB)
- ✅ Better error messages with token estimates
- ✅ Configuration validation with helpful error messages

### Database Storage
- ✅ User input storage: 5000 characters (increased from 500)
- ✅ System response storage: 2000 characters (increased from 1000)

## 🧪 Testing Checklist

- [ ] Test the API health check endpoint
- [ ] Test a simple inference request
- [ ] Test large user input (5K+ tokens)
- [ ] Test reasoning tasks (should see the reasoning trigger)
- [ ] Test math queries (should see the math directive)
- [ ] Test context preparation (user input should not be truncated)
- [ ] Test error handling (missing API key, invalid endpoint)

## 📊 Expected Behavior

1. **Startup:**
   - System initializes the Novita AI client
   - Validates that the API key is present
   - Logs the Novita AI configuration

2. **Inference:**
   - All requests routed to the Novita AI API
   - DeepSeek-R1 optimizations applied automatically
   - User input prioritized in context preparation

3. **Error Handling:**
   - Clear error messages if the API key is missing
   - Helpful guidance for configuration issues
   - Graceful handling of API failures

## 🔧 Troubleshooting

### Issue: "NOVITA_API_KEY is required"
**Solution:** Set the environment variable:
```bash
export NOVITA_API_KEY=your_key_here
```

### Issue: "openai package not available"
**Solution:** Install dependencies:
```bash
pip install -r requirements.txt
```

### Issue: API connection errors
**Solution:**
- Verify the API key is correct
- Check that the base URL matches your endpoint
- Verify the model ID matches your deployment

## 📚 Configuration Reference

### Model Configuration
- **Model ID:** `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2`
- **Context Window:** 131,072 tokens (131K)
- **Optimized Settings:** Temperature 0.6, Top_p 0.95

### Token Allocation
- **User Input:** 8,000 tokens (dedicated, never truncated)
- **Context Budget:** 28,000 tokens (includes user input + context)
- **Output Limits:**
  - Reasoning: 4,096 tokens
  - Synthesis: 2,000 tokens
  - Classification: 512 tokens

## 🎯 Next Steps

1. Set your `NOVITA_API_KEY` in environment variables
2. Test the health check endpoint: `GET /api/health`
3. Send a test request: `POST /api/chat`
4. Monitor logs for Novita AI API calls
5. Verify the DeepSeek-R1 optimizations are working

## 📝 Notes

- All local model code has been removed
- The system now depends entirely on the Novita AI API
- No GPU/quantization configuration needed
- No model downloading required
- Faster startup (no model loading)
QUICK_TEST_NOVITA.md ADDED
@@ -0,0 +1,88 @@
# Quick Test: Novita AI Connection with Anaconda

## Step-by-Step Instructions

### 1. Open Anaconda Prompt
- Search for "Anaconda Prompt" in the Windows Start menu
- This ensures conda commands work properly

### 2. Navigate to the Project Directory
```bash
cd C:\Users\85jat\GenAI_work_V2\Prototyping\Research_AI_Assistant_V2\Research_AI_Assistant_API
```

### 3. Create the Conda Environment (First Time Only)
```bash
conda create -n research-ai-assistant python=3.10 -y
```

### 4. Activate the Environment
```bash
conda activate research-ai-assistant
```

### 5. Install Required Packages
```bash
pip install "openai>=1.0.0"
pip install -r requirements.txt
```

### 6. Set Environment Variables
```bash
# Set your Novita API key
set NOVITA_API_KEY=your_api_key_here
set NOVITA_BASE_URL=https://api.novita.ai/dedicated/v1/openai
set NOVITA_MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2
```

### 7. Run the Test
```bash
python test_novita_connection.py
```

## Alternative: Use the Batch Script

Simply double-click or run:
```bash
test_novita_conda.bat
```

## Expected Output

You should see:
```
============================================================
NOVITA AI CONNECTION TEST
============================================================

============================================================
TEST 1: Configuration Loading
============================================================
✓ Configuration loaded successfully
  Novita API Key: Set
  Base URL: https://api.novita.ai/dedicated/v1/openai
  Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2
...

============================================================
TEST 4: Simple API Call
============================================================
✓ API call successful!
  Response: ...

🎉 All tests passed! Novita AI connection is working correctly.
```

## Troubleshooting

**If the conda command is not found:**
- Use Anaconda Prompt instead of regular PowerShell
- Or run: `C:\Users\85jat\anaconda3\Scripts\activate.bat` (adjust the path as needed)

**If environment activation fails:**
- Create the environment first: `conda create -n research-ai-assistant python=3.10`

**If you hit import errors:**
- Ensure the environment is activated: `conda activate research-ai-assistant`
- Install the packages: `pip install "openai>=1.0.0"`
TEST_NOVITA_CONNECTION.md ADDED
@@ -0,0 +1,220 @@
# Testing Novita AI Connection

## Quick Test Instructions

### Option 1: Run the Test Script (Recommended)

1. **Ensure Python is available:**
   ```bash
   # Check Python version
   python --version
   # OR
   python3 --version
   # OR (Windows)
   py --version
   ```

2. **Install dependencies if needed:**
   ```bash
   pip install "openai>=1.0.0"
   pip install -r requirements.txt
   ```

3. **Set environment variables:**
   ```bash
   # Windows (PowerShell)
   $env:NOVITA_API_KEY="your_api_key_here"
   $env:NOVITA_BASE_URL="https://api.novita.ai/dedicated/v1/openai"
   $env:NOVITA_MODEL="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2"

   # Windows (CMD)
   set NOVITA_API_KEY=your_api_key_here
   set NOVITA_BASE_URL=https://api.novita.ai/dedicated/v1/openai
   set NOVITA_MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2

   # Linux/Mac
   export NOVITA_API_KEY=your_api_key_here
   export NOVITA_BASE_URL=https://api.novita.ai/dedicated/v1/openai
   export NOVITA_MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2
   ```

4. **Run the test script:**
   ```bash
   python test_novita_connection.py
   # OR
   python3 test_novita_connection.py
   # OR (Windows)
   py test_novita_connection.py
   ```

### Option 2: Manual Python Test

Create a simple test file `quick_test.py`:

```python
import os
import sys

from openai import OpenAI

# Get configuration from the environment
api_key = os.getenv("NOVITA_API_KEY")
base_url = os.getenv("NOVITA_BASE_URL", "https://api.novita.ai/dedicated/v1/openai")
model = os.getenv("NOVITA_MODEL", "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2")

if not api_key:
    print("ERROR: NOVITA_API_KEY not set!")
    sys.exit(1)

print("Testing Novita AI connection...")
print(f"Base URL: {base_url}")
print(f"Model: {model}")

client = OpenAI(
    base_url=base_url,
    api_key=api_key,
)

try:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say 'Hello' if you can hear me."}],
        max_tokens=20,
        temperature=0.6,
    )

    if response.choices:
        print("\n✓ SUCCESS! Connection working.")
        print(f"Response: {response.choices[0].message.content}")
    else:
        print("\n❌ No response received")

except Exception as e:
    print(f"\n❌ ERROR: {e}")
```

Run it:
```bash
python quick_test.py
```

### Option 3: Test via API Endpoint

If the Flask server is running:

1. **Start the server:**
   ```bash
   python flask_api_standalone.py
   ```

2. **Test the health endpoint:**
   ```bash
   curl http://localhost:7860/api/health
   # OR visit http://localhost:7860/api/health in a browser
   ```

3. **Test the chat endpoint:**
   ```bash
   curl -X POST http://localhost:7860/api/chat \
     -H "Content-Type: application/json" \
     -d '{"message": "Hello", "session_id": "test-123"}'
   ```

## Expected Test Results

### Successful Test Output:
```
============================================================
NOVITA AI CONNECTION TEST
============================================================

============================================================
TEST 1: Configuration Loading
============================================================
✓ Configuration loaded successfully
  Novita API Key: Set
  Base URL: https://api.novita.ai/dedicated/v1/openai
  Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2
  Temperature: 0.6
  Force Reasoning: True
  User Input Max Tokens: 8000
  Context Preparation Budget: 28000

============================================================
TEST 2: OpenAI Package Check
============================================================
✓ OpenAI package is available

============================================================
TEST 3: Novita AI Client Initialization
============================================================
✓ Novita AI client initialized successfully
  Base URL: https://api.novita.ai/dedicated/v1/openai
  API Key: nv-****

============================================================
TEST 4: Simple API Call
============================================================
Sending test request to: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2
Prompt: 'Hello, this is a test. Please respond briefly.'
✓ API call successful!
  Response length: XX characters
  Response preview: ...

============================================================
TEST 5: LLM Router Initialization
============================================================
Initializing LLM Router...
✓ LLM Router initialized successfully

Testing health check...
✓ Health check result: {'provider': 'novita_api', 'status': 'healthy', ...}

============================================================
TEST 6: Inference Test
============================================================
Test prompt: What is the capital of France? Answer in one sentence.
✓ Inference successful!
  Response length: XX characters
  Response: ...

============================================================
TEST SUMMARY
============================================================
CONFIG: ✓ PASS
PACKAGE: ✓ PASS
CLIENT: ✓ PASS
API_CALL: ✓ PASS
ROUTER: ✓ PASS
INFERENCE: ✓ PASS

Total: 6/6 tests passed

🎉 All tests passed! Novita AI connection is working correctly.
```

## Troubleshooting

### Error: "NOVITA_API_KEY is required"
**Solution:** Set the environment variable:
```bash
export NOVITA_API_KEY=your_key_here
```

### Error: "openai package not available"
**Solution:** Install the package:
```bash
pip install "openai>=1.0.0"
```

### Error: "Failed to initialize Novita AI client"
**Solution:**
- Verify the API key is correct
- Check that the base URL matches your endpoint
- Verify network connectivity

### Error: "API call failed"
**Solution:**
- Check that the API key has proper permissions
- Verify the model ID matches your deployment
- Check Novita AI service status
environment.yml ADDED
@@ -0,0 +1,43 @@
name: research-ai-assistant
channels:
  - conda-forge
  - defaults
dependencies:
  - python>=3.10,<3.12
  - pip
  - pip:
      # LLM API Client (required for Novita AI API)
      - openai>=1.0.0
      # Web Framework & Interface
      - aiohttp>=3.9.0
      - httpx>=0.25.0
      # Flask API for external integrations
      - flask>=3.0.0
      - flask-cors>=4.0.0
      - flask-limiter>=3.5.0
      # Security & Validation
      - pydantic-settings>=2.1.0
      - python-dotenv>=1.0.0
      # Database & Persistence
      - sqlalchemy>=2.0.0
      # Data Processing & Utilities
      - pandas>=2.1.0
      - numpy>=1.24.0,<2.0.0
      # Caching & Performance
      - cachetools>=5.3.0
      # Async & Concurrency
      - aiofiles>=23.2.0
      # Logging & Monitoring
      - structlog>=23.2.0
      - prometheus-client>=0.19.0
      - psutil>=5.9.0
      # Utility Libraries
      - python-dateutil>=2.8.0
      - pytz>=2023.3
      - requests>=2.31.0
      # Production WSGI Server
      - gunicorn>=21.2.0
      # Development & Testing
      - pytest>=7.4.0
      - pytest-asyncio>=0.21.0
flask_api_standalone.py CHANGED
@@ -145,7 +145,7 @@
 initialization_error = None
 
 def initialize_orchestrator():
-    """Initialize the AI orchestrator with local GPU models"""
+    """Initialize the AI orchestrator with Novita AI API only"""
     global orchestrator, orchestrator_available, initialization_attempted, initialization_error
 
     initialization_attempted = True
@@ -153,7 +153,7 @@
 
     try:
         logger.info("=" * 60)
-        logger.info("INITIALIZING AI ORCHESTRATOR (Local GPU Models)")
+        logger.info("INITIALIZING AI ORCHESTRATOR (Novita AI API Only)")
         logger.info("=" * 60)
 
         from src.agents.intent_agent import create_intent_agent
@@ -166,27 +166,16 @@
 
         logger.info("✓ Imports successful")
 
-        # Initialize LLM Router - local models only (no API fallback)
-        hf_token = os.getenv('HF_TOKEN', '')  # Optional - only needed for downloading gated models
-        if not hf_token:
-            logger.warning("HF_TOKEN not set - may be needed for gated model access")
-        else:
-            logger.info(f"HF_TOKEN available (for model download only)")
-
-        # Import GatedRepoError for better error handling
+        # Initialize LLM Router - Novita AI API only
+        logger.info("Initializing LLM Router (Novita AI API only)...")
         try:
-            from huggingface_hub.exceptions import GatedRepoError
-        except ImportError:
-            GatedRepoError = Exception
-
-        logger.info("Initializing LLM Router (local models only, no API fallback)...")
-        try:
-            # Always use local models - API fallback removed
-            llm_router = LLMRouter(hf_token=hf_token, use_local_models=True)
-            logger.info("✓ LLM Router initialized (local models only)")
+            # Always use Novita AI API (local models disabled)
+            llm_router = LLMRouter(hf_token=None, use_local_models=False)
+            logger.info("✓ LLM Router initialized (Novita AI API)")
         except Exception as e:
             logger.error(f"❌ Failed to initialize LLM Router: {e}", exc_info=True)
-            logger.error("This is a critical error - local models are required")
+            logger.error("This is a critical error - Novita AI API is required")
+            logger.error("Please ensure NOVITA_API_KEY is set in environment variables")
             raise
 
         logger.info("Initializing Agents...")
@@ -221,28 +210,29 @@
         orchestrator_available = True
         logger.info("=" * 60)
         logger.info("✓ AI ORCHESTRATOR READY")
-        logger.info(" - Local GPU models enabled" if llm_router.use_local_models else " - API-only mode (local models disabled)")
+        logger.info(" - Novita AI API enabled")
         logger.info(" - MAX_WORKERS: 4")
         logger.info("=" * 60)
 
         return True
 
-    except GatedRepoError as e:
-        logger.error("=" * 60)
-        logger.error("❌ GATED REPOSITORY ERROR DURING INITIALIZATION")
-        logger.error("=" * 60)
-        logger.error(f"Error: {e}")
-        logger.error("")
-        logger.error("SOLUTION:")
-        logger.error("1. Visit the model repository on Hugging Face")
-        logger.error("2. Click 'Agree and access repository'")
-        logger.error("3. Wait for approval (usually instant)")
-        logger.error("4. Ensure HF_TOKEN is set with your access token")
-        logger.error("")
-        logger.error("NOTE: API fallback has been removed. Local models are required.")
-        logger.error("=" * 60)
-        orchestrator_available = False
-        initialization_error = f"GatedRepoError: {str(e)}"
+    except ValueError as e:
+        # Handle configuration errors (e.g., missing NOVITA_API_KEY)
+        if "NOVITA_API_KEY" in str(e) or "required" in str(e).lower():
+            logger.error("=" * 60)
+            logger.error("❌ CONFIGURATION ERROR")
+            logger.error("=" * 60)
+            logger.error(f"Error: {e}")
+            logger.error("")
+            logger.error("SOLUTION:")
+            logger.error("1. Set NOVITA_API_KEY in environment variables")
+            logger.error("2. Ensure NOVITA_BASE_URL is correct")
+            logger.error("3. Verify NOVITA_MODEL matches your endpoint")
+            logger.error("=" * 60)
+            orchestrator_available = False
+            initialization_error = f"Configuration Error: {str(e)}"
+        else:
+            raise
        return False
    except Exception as e:
        logger.error("=" * 60)
@@ -351,12 +341,12 @@
             'error': 'Message cannot be empty'
         }), 400
 
-    # Length limit (prevent abuse)
-    MAX_MESSAGE_LENGTH = 10000  # 10KB limit
+    # Length limit (allow larger inputs for complex queries)
+    MAX_MESSAGE_LENGTH = 100000  # 100KB limit (increased from 10KB)
     if len(message) > MAX_MESSAGE_LENGTH:
         return jsonify({
             'success': False,
-            'error': f'Message too long. Maximum length is {MAX_MESSAGE_LENGTH} characters'
+            'error': f'Message too long. Maximum length is {MAX_MESSAGE_LENGTH} characters (approximately {MAX_MESSAGE_LENGTH // 4} tokens)'
         }), 400
 
     history = data.get('history', [])
requirements.txt CHANGED
@@ -107,3 +107,6 @@ debugpy>=1.7.0
 bandit>=1.7.5  # Security linter for Python code
 safety>=2.3.5  # Dependency vulnerability scanner
 
+# LLM API Client (required for Novita AI API)
+openai>=1.0.0
setup_conda_env.bat ADDED
@@ -0,0 +1,37 @@
@echo off
REM Setup script for Anaconda environment (Windows)
REM This script creates and activates a conda environment for the Research AI Assistant

echo ============================================================
echo Setting up Anaconda environment for Research AI Assistant
echo ============================================================

REM Check if conda is available
where conda >nul 2>&1
if %ERRORLEVEL% NEQ 0 (
    echo ERROR: conda command not found
    echo Please install Anaconda or Miniconda first
    echo Download from: https://www.anaconda.com/products/distribution
    exit /b 1
)

echo Conda found

REM Create environment from environment.yml
echo.
echo Creating conda environment from environment.yml...
conda env create -f environment.yml

if %ERRORLEVEL% EQU 0 (
    echo Environment created successfully
    echo.
    echo To activate the environment, run:
    echo     conda activate research-ai-assistant
    echo.
    echo Then install remaining dependencies:
    echo     pip install -r requirements.txt
) else (
    echo Environment creation failed
    exit /b 1
)
setup_conda_env.sh ADDED
@@ -0,0 +1,41 @@
#!/bin/bash
# Setup script for Anaconda environment
# This script creates and activates a conda environment for the Research AI Assistant

echo "============================================================"
echo "Setting up Anaconda environment for Research AI Assistant"
echo "============================================================"

# Check if conda is available
if ! command -v conda &> /dev/null; then
    echo "❌ Error: conda command not found"
    echo "   Please install Anaconda or Miniconda first"
    echo "   Download from: https://www.anaconda.com/products/distribution"
    exit 1
fi

echo "✓ Conda found"

# Create environment from environment.yml
echo ""
echo "Creating conda environment from environment.yml..."
conda env create -f environment.yml

if [ $? -eq 0 ]; then
    echo "✓ Environment created successfully"
else
    echo "❌ Environment creation failed"
    exit 1
fi

# Activate environment (same command on Linux, Mac, and Windows)
echo ""
echo "To activate the environment, run:"
echo "    conda activate research-ai-assistant"
echo ""
echo "Then install remaining dependencies:"
echo "    pip install -r requirements.txt"
src/config.py CHANGED
@@ -174,6 +174,98 @@ class Settings(BaseSettings):
 
         return self._cached_cache_dir
 
+    # ==================== Novita AI Configuration ====================
+
+    novita_api_key: str = Field(
+        default="",
+        description="Novita AI API key (required)",
+        env="NOVITA_API_KEY"
+    )
+
+    novita_base_url: str = Field(
+        default="https://api.novita.ai/dedicated/v1/openai",
+        description="Novita AI dedicated endpoint base URL",
+        env="NOVITA_BASE_URL"
+    )
+
+    novita_model: str = Field(
+        default="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2",
+        description="Novita AI dedicated endpoint model ID",
+        env="NOVITA_MODEL"
+    )
+
+    # DeepSeek-R1 optimized settings
+    deepseek_r1_temperature: float = Field(
+        default=0.6,
+        description="Temperature for DeepSeek-R1 models (0.5-0.7 range, 0.6 recommended)",
+        env="DEEPSEEK_R1_TEMPERATURE"
+    )
+
+    deepseek_r1_force_reasoning: bool = Field(
+        default=True,
+        description="Force DeepSeek-R1 to start with reasoning trigger",
+        env="DEEPSEEK_R1_FORCE_REASONING"
+    )
+
+    # Token Allocation Configuration
+    user_input_max_tokens: int = Field(
+        default=8000,
+        description="Maximum tokens dedicated for user input (prioritized over context)",
+        env="USER_INPUT_MAX_TOKENS"
+    )
+
+    context_preparation_budget: int = Field(
+        default=28000,
+        description="Maximum tokens for context preparation (includes user input + context)",
+        env="CONTEXT_PREPARATION_BUDGET"
+    )
+
+    context_pruning_threshold: int = Field(
+        default=28000,
+        description="Context pruning threshold (should match context_preparation_budget)",
+        env="CONTEXT_PRUNING_THRESHOLD"
+    )
+
+    prioritize_user_input: bool = Field(
+        default=True,
+        description="Always prioritize user input over historical context",
+        env="PRIORITIZE_USER_INPUT"
+    )
+
+    @validator("novita_api_key", pre=True)
+    def validate_novita_api_key(cls, v):
+        """Validate and clean Novita API key"""
+        if v is None:
+            return ""
+        return str(v).strip()
+
+    @validator("deepseek_r1_temperature", pre=True)
+    def validate_deepseek_temperature(cls, v):
+        """Validate DeepSeek-R1 temperature is in recommended range"""
+        if isinstance(v, str):
+            v = float(v)
+        temp = float(v) if v else 0.6
+        return max(0.5, min(0.7, temp))
+
+    @validator("deepseek_r1_force_reasoning", pre=True)
+    def validate_force_reasoning(cls, v):
+        """Convert string to boolean for force_reasoning"""
+        if isinstance(v, str):
+            return v.lower() in ("true", "1", "yes", "on")
+        return bool(v)
+
+    @validator("user_input_max_tokens", pre=True)
+    def validate_user_input_tokens(cls, v):
+        """Validate user input token limit"""
+        val = int(v) if v else 8000
+        return max(1000, min(20000, val))
+
+    @validator("context_preparation_budget", pre=True)
+    def validate_context_budget(cls, v):
+        """Validate context preparation budget"""
+        val = int(v) if v else 28000
+        return max(4000, min(120000, val))
+
     # ==================== Model Configuration ====================
 
     default_model: str = Field(
src/context_manager.py CHANGED
@@ -439,10 +439,13 @@ Keep the summary concise and focused (approximately 500 tokens)."""
         if not self.llm_router:
             return ""
 
+        # Use full user input for context generation (not truncated in prompt)
+        # Only truncate for display in prompt if extremely long
+        user_input_preview = user_input[:500] if len(user_input) > 500 else user_input
         prompt = f"""Summarize this interaction in approximately 50 tokens:
 
-User Input: {user_input[:200]}
-System Response: {system_response[:300]}
+User Input: {user_input_preview}
+System Response: {system_response[:500]}
 
 Provide a brief summary capturing the key exchange."""
@@ -466,8 +469,8 @@ Provide a brief summary capturing the key exchange."""
         """, (
             interaction_id,
             session_id,
-            user_input[:500],
-            system_response[:1000],
+            user_input[:5000],  # Increased from 500 to 5000 characters
+            system_response[:2000],  # Increased from 1000 to 2000
             summary.strip(),
             created_at
         ))
@@ -607,8 +610,8 @@ Keep the summary concise (approximately 100 tokens)."""
 
         Applies smart pruning before formatting.
         """
-        # Step 4: Prune context if it exceeds token limits
-        pruned_context = self.prune_context(context, max_tokens=2000)
+        # Step 4: Prune context if it exceeds token limits (uses config threshold)
+        pruned_context = self.prune_context(context)
 
         # Get context mode (fresh or relevant)
         session_id = pruned_context.get("session_id")
@@ -735,19 +738,30 @@ Keep the summary concise (approximately 100 tokens)."""
         # Simple approximation: 4 characters per token
         return len(text) // 4
 
-    def prune_context(self, context: dict, max_tokens: int = 2000) -> dict:
+    def prune_context(self, context: dict, max_tokens: Optional[int] = None) -> dict:
         """
-        Step 4: Implement Smart Context Pruning
+        Step 4: Implement Smart Context Pruning with configurable threshold
 
         Prune context to stay within token limit while keeping most recent and relevant content.
 
         Args:
             context: Context dictionary to prune
-            max_tokens: Maximum token count (default 2000)
+            max_tokens: Maximum token count (uses config default if None)
 
         Returns:
             Pruned context dictionary
         """
+        # Use config threshold if not provided
+        if max_tokens is None:
+            try:
+                from .config import get_settings
+                settings = get_settings()
+                max_tokens = settings.context_pruning_threshold
+                logger.debug(f"Using config pruning threshold: {max_tokens} tokens")
+            except Exception:
+                max_tokens = 2000  # Fallback to default
+                logger.warning("Could not load config, using default pruning threshold: 2000")
+
         try:
             # Calculate current token count
             current_tokens = self._calculate_context_tokens(context)
src/llm_router.py CHANGED
@@ -1,290 +1,213 @@
-# llm_router.py - UPDATED FOR LOCAL GPU MODEL LOADING
 import logging
 import asyncio
 from typing import Dict, Optional
 from .models_config import LLM_CONFIG
 
-# Import GatedRepoError for handling gated repositories
 try:
-    from huggingface_hub.exceptions import GatedRepoError
 except ImportError:
-    # Fallback if huggingface_hub is not available
-    GatedRepoError = Exception
 
 logger = logging.getLogger(__name__)
 
 class LLMRouter:
-    def __init__(self, hf_token=None, use_local_models: bool = True):
-        # hf_token kept for backward compatibility but not used for API calls
-        # Only needed for downloading gated models from HuggingFace Hub
-        self.hf_token = hf_token
-        self.health_status = {}
-        self.use_local_models = use_local_models
-        self.local_loader = None
 
-        logger.info("LLMRouter initialized (local models only, no API fallback)")
-        if hf_token:
-            logger.info("HF token available (for model download only)")
-        else:
-            logger.warning("HF_TOKEN not set - may be needed for gated model access")
 
-        # Initialize local model loader - REQUIRED
-        if self.use_local_models:
-            try:
-                from .local_model_loader import LocalModelLoader
-                self.local_loader = LocalModelLoader()
-                logger.info("✓ Local model loader initialized (GPU-based inference)")
-
-                # Note: Pre-loading will happen on first request (lazy loading)
-                # Models will be loaded on-demand to avoid blocking startup
-                logger.info("Models will be loaded on-demand for faster startup")
-            except Exception as e:
-                logger.error(f"❌ CRITICAL: Could not initialize local model loader: {e}")
-                logger.error("Local models are required - API fallback has been removed")
-                raise RuntimeError(
-                    "Local model loader is required but could not be initialized. "
-                    "Please ensure transformers and torch are installed."
-                ) from e
-        else:
-            logger.error("use_local_models=False but API fallback removed - this will fail")
-            raise ValueError("use_local_models must be True - API fallback has been removed")
 
     async def route_inference(self, task_type: str, prompt: str, **kwargs):
         """
-        Smart routing based on task specialization
-        Uses ONLY local models - no API fallback
         """
-        logger.info(f"Routing inference for task: {task_type}")
-        model_config = self._select_model(task_type)
-        logger.info(f"Selected model: {model_config['model_id']}")
 
-        # Use local models only
-        if not self.local_loader:
-            raise RuntimeError("Local model loader not available - cannot perform inference")
 
         try:
-            # Handle embedding generation separately
             if task_type == "embedding_generation":
-                result = await self._call_local_embedding(model_config, prompt, **kwargs)
             else:
-                result = await self._call_local_model(model_config, prompt, task_type, **kwargs)
 
             if result is None:
-                logger.error(f"Local model returned None for task: {task_type}")
                 raise RuntimeError(f"Inference failed for task: {task_type}")
 
-            logger.info(f"Inference complete for {task_type} (local model)")
             return result
 
         except Exception as e:
-            logger.error(f"Local model inference failed: {e}", exc_info=True)
-            # Try fallback model if configured
-            fallback_model_id = model_config.get("fallback")
-            if fallback_model_id and fallback_model_id != model_config["model_id"]:
-                logger.warning(f"Attempting fallback model: {fallback_model_id}")
-                try:
-                    fallback_config = model_config.copy()
-                    fallback_config["model_id"] = fallback_model_id
-                    fallback_config.pop("fallback", None)  # Prevent infinite recursion
-
-                    if task_type == "embedding_generation":
-                        result = await self._call_local_embedding(fallback_config, prompt, **kwargs)
-                    else:
-                        result = await self._call_local_model(fallback_config, prompt, task_type, **{**kwargs, '_is_fallback': True})
-
-                    if result is not None:
-                        logger.info(f"Inference complete using fallback model: {fallback_model_id}")
-                        return result
-                except Exception as fallback_error:
-                    logger.error(f"Fallback model also failed: {fallback_error}")
-
-            # No API fallback - raise error
             raise RuntimeError(
                 f"Inference failed for task: {task_type}. "
-                f"Local models are required - ensure models are properly loaded and accessible."
             ) from e
 
-    async def _call_local_model(self, model_config: dict, prompt: str, task_type: str, **kwargs) -> Optional[str]:
-        """Call local model for inference."""
-        if not self.local_loader:
             return None
 
-        # Check if this is already a fallback attempt (prevent infinite loops)
-        is_fallback_attempt = kwargs.get('_is_fallback', False)
 
-        model_id = model_config["model_id"]
-        max_tokens = kwargs.get('max_tokens', 512)
-        temperature = kwargs.get('temperature', 0.7)
 
         try:
-            # Ensure model is loaded
-            if model_id not in self.local_loader.loaded_models:
-                logger.info(f"Loading model {model_id} on demand...")
-                # Check if model config specifies quantization
-                use_4bit = model_config.get("use_4bit_quantization", False)
-                use_8bit = model_config.get("use_8bit_quantization", False)
-                # Fallback to default quantization settings if not specified
-                if not use_4bit and not use_8bit:
-                    quantization_config = LLM_CONFIG.get("quantization_settings", {})
-                    use_4bit = quantization_config.get("default_4bit", True)
-                    use_8bit = quantization_config.get("default_8bit", False)
 
-                try:
-                    self.local_loader.load_chat_model(
-                        model_id,
-                        load_in_8bit=use_8bit,
-                        load_in_4bit=use_4bit
-                    )
-                except GatedRepoError as e:
-                    logger.error(f"❌ Cannot access gated repository {model_id}")
-                    logger.error(f"   Visit https://huggingface.co/{model_id.split(':')[0] if ':' in model_id else model_id} to request access.")
-
-                    # Prevent infinite loops: if this is already a fallback attempt, don't try another fallback
-                    if is_fallback_attempt:
-                        logger.error("❌ Fallback model also failed with gated repository error")
-                        raise RuntimeError("Both primary and fallback models are gated repositories") from e
-
-                    # Try fallback models in order (fallback, then fallback2)
-                    fallback_chain = []
-                    if model_config.get("fallback") and model_config.get("fallback") != model_id:
-                        fallback_chain.append(model_config.get("fallback"))
-                    if model_config.get("fallback2") and model_config.get("fallback2") != model_id:
-                        fallback_chain.append(model_config.get("fallback2"))
-
-                    if fallback_chain:
-                        last_error = e
-                        for fallback_idx, fallback_model_id in enumerate(fallback_chain):
-                            logger.warning(f"Attempting fallback model {fallback_idx + 1}/{len(fallback_chain)}: {fallback_model_id}")
-                            try:
-                                # Create fallback config
-                                fallback_config = model_config.copy()
-                                fallback_config["model_id"] = fallback_model_id
-                                # Remove this fallback and subsequent ones to prevent infinite recursion
-                                fallback_config.pop("fallback", None)
-                                fallback_config.pop("fallback2", None)
-
-                                # Retry with fallback model (mark as fallback attempt if this is the last fallback)
-                                is_last_fallback = (fallback_idx == len(fallback_chain) - 1)
-                                return await self._call_local_model(
-                                    fallback_config,
-                                    prompt,
-                                    task_type,
-                                    **{**kwargs, '_is_fallback': is_last_fallback}
-                                )
-                            except GatedRepoError as fallback_gated_error:
-                                logger.error(f"❌ Fallback model {fallback_model_id} is also gated")
-                                last_error = fallback_gated_error
-                                if fallback_idx == len(fallback_chain) - 1:
-                                    # Last fallback failed
-                                    raise RuntimeError("All models (primary and fallbacks) are gated repositories") from fallback_gated_error
-                                # Continue to next fallback
-                                continue
-                            except Exception as fallback_error:
-                                logger.error(f"Fallback model {fallback_model_id} failed: {fallback_error}")
-                                last_error = fallback_error
-                                if fallback_idx == len(fallback_chain) - 1:
-                                    # Last fallback failed
-                                    raise
-                                # Continue to next fallback
-                                continue
-                        # All fallbacks exhausted
-                        raise RuntimeError(f"All models failed. Last error: {last_error}") from last_error
-                    else:
-                        raise RuntimeError(f"Model {model_id} is a gated repository and no fallback available") from e
-                except (RuntimeError, ModuleNotFoundError, ImportError) as e:
-                    # Check if this is a bitsandbytes error (not a gated repo error)
-                    error_str = str(e).lower()
-                    if "bitsandbytes" in error_str or "int8_mm_dequant" in error_str or "validate_bnb_backend" in error_str:
-                        logger.warning(f"⚠ BitsAndBytes compatibility issue detected: {e}")
-                        logger.warning(f"⚠ Model {model_id} will be loaded without quantization")
-                        # Retry without quantization
-                        try:
-                            # Disable quantization for this attempt
-                            fallback_config = model_config.copy()
-                            fallback_config["use_4bit_quantization"] = False
-                            fallback_config["use_8bit_quantization"] = False
-                            return await self._call_local_model(
-                                fallback_config,
-                                prompt,
-                                task_type,
-                                **kwargs
-                            )
-                        except Exception as retry_error:
-                            logger.error(f"Failed to load model even without quantization: {retry_error}")
-                            raise RuntimeError(f"Model loading failed: {retry_error}") from retry_error
-                    else:
-                        # Not a bitsandbytes error, re-raise
-                        raise
-
-            # Format as chat messages if needed
-            messages = [{"role": "user", "content": prompt}]
-
-            # Generate using local model
-            result = await asyncio.to_thread(
-                self.local_loader.generate_chat_completion,
-                model_id=model_id,
-                messages=messages,
-                max_tokens=max_tokens,
-                temperature=temperature
-            )
-
-            logger.info(f"Local model {model_id} generated response (length: {len(result)})")
-            logger.info("=" * 80)
-            logger.info("LOCAL MODEL RESPONSE:")
-            logger.info("=" * 80)
-            logger.info(f"Model: {model_id}")
-            logger.info(f"Task Type: {task_type}")
-            logger.info(f"Response Length: {len(result)} characters")
-            logger.info("-" * 40)
-            logger.info("FULL RESPONSE CONTENT:")
-            logger.info("-" * 40)
-            logger.info(result)
-            logger.info("-" * 40)
-            logger.info("END OF RESPONSE")
-            logger.info("=" * 80)
-
-            return result
-
-        except GatedRepoError:
-            # Re-raise to be handled by caller
-            raise
         except Exception as e:
252
- logger.error(f"Error calling local model: {e}", exc_info=True)
253
  raise
254
 
255
- async def _call_local_embedding(self, model_config: dict, text: str, **kwargs) -> Optional[list]:
256
- """Call local embedding model."""
257
- if not self.local_loader:
258
- raise RuntimeError("Local model loader not available")
 
 
 
 
259
 
260
- model_id = model_config["model_id"]
 
 
 
 
261
 
262
- try:
263
- # Ensure model is loaded
264
- if model_id not in self.local_loader.loaded_embedding_models:
265
- logger.info(f"Loading embedding model {model_id} on demand...")
266
- try:
267
- self.local_loader.load_embedding_model(model_id)
268
- except GatedRepoError as e:
269
- logger.error(f"❌ Cannot access gated repository {model_id}")
270
- logger.error(f" Visit https://huggingface.co/{model_id.split(':')[0] if ':' in model_id else model_id} to request access.")
271
- raise RuntimeError(f"Embedding model {model_id} is a gated repository") from e
272
-
273
- # Generate embedding
274
- embedding = await asyncio.to_thread(
275
- self.local_loader.get_embedding,
276
- model_id=model_id,
277
- text=text
278
- )
279
-
280
- logger.info(f"Local embedding model {model_id} generated vector (dim: {len(embedding)})")
281
- return embedding
282
-
283
- except Exception as e:
284
- logger.error(f"Error calling local embedding model: {e}", exc_info=True)
285
- raise
 
 
286
 
287
  def _select_model(self, task_type: str) -> dict:
 
288
  model_map = {
289
  "intent_classification": LLM_CONFIG["models"]["classification_specialist"],
290
  "embedding_generation": LLM_CONFIG["models"]["embedding_specialist"],
@@ -294,64 +217,73 @@ class LLMRouter:
294
  }
295
  return model_map.get(task_type, LLM_CONFIG["models"]["reasoning_primary"])
296
 
297
- # REMOVED: _is_model_healthy - no longer needed (local models only)
298
- # REMOVED: _get_fallback_model - no longer needed (local models only)
299
- # REMOVED: _call_hf_endpoint - HF API inference removed
300
-
301
  async def get_available_models(self):
302
- """
303
- Get list of available models for testing
304
- """
305
- return list(LLM_CONFIG["models"].keys())
306
 
307
  async def health_check(self):
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
308
  """
309
- Perform health check on local models only
 
 
 
 
 
 
310
  """
311
- health_status = {}
312
- if not self.local_loader:
313
- return {"error": "Local model loader not available"}
314
 
315
- for model_name, model_config in LLM_CONFIG["models"].items():
316
- model_id = model_config["model_id"]
317
- # Check if model is loaded (for chat models)
318
- is_loaded = model_id in self.local_loader.loaded_models or model_id in self.local_loader.loaded_embedding_models
319
- health_status[model_name] = {
320
- "model_id": model_id,
321
- "loaded": is_loaded,
322
- "healthy": is_loaded # Consider loaded models healthy
323
- }
324
 
325
- return health_status
326
-
327
- def prepare_context_for_llm(self, raw_context: Dict, max_tokens: int = 4000) -> str:
328
- """Smart context windowing for LLM calls"""
329
 
330
- try:
331
- from transformers import AutoTokenizer
332
-
333
- # Initialize tokenizer lazily
334
- if not hasattr(self, 'tokenizer'):
335
- try:
336
- # Use the primary model for tokenization
337
- primary_model_id = LLM_CONFIG["models"]["reasoning_primary"]["model_id"]
338
- # Strip API suffix if present (though we don't use them anymore)
339
- base_model_id = primary_model_id.split(':')[0] if ':' in primary_model_id else primary_model_id
340
- self.tokenizer = AutoTokenizer.from_pretrained(base_model_id)
341
- except GatedRepoError as e:
342
- logger.warning(f"Gated repository error loading tokenizer: {e}")
343
- logger.warning("Using character count estimation instead")
344
- self.tokenizer = None
345
- except Exception as e:
346
- logger.warning(f"Could not load tokenizer: {e}, using character count estimation")
347
- self.tokenizer = None
348
- except ImportError:
349
- logger.warning("transformers library not available, using character count estimation")
350
- self.tokenizer = None
351
 
352
- # Priority order for context elements
 
 
 
 
 
 
 
 
353
  priority_elements = [
354
- ('current_query', 1.0),
355
  ('recent_interactions', 0.8),
356
  ('user_preferences', 0.6),
357
  ('session_summary', 0.4),
@@ -359,12 +291,15 @@ class LLMRouter:
359
  ]
360
 
361
  formatted_context = []
362
- total_tokens = 0
363
 
 
 
 
 
 
364
  for element, priority in priority_elements:
365
- # Map element names to context keys
366
  element_key_map = {
367
- 'current_query': raw_context.get('user_input', ''),
368
  'recent_interactions': raw_context.get('interaction_contexts', []),
369
  'user_preferences': raw_context.get('preferences', {}),
370
  'session_summary': raw_context.get('session_context', {}),
@@ -377,55 +312,32 @@ class LLMRouter:
377
  if isinstance(content, dict):
378
  content = str(content)
379
  elif isinstance(content, list):
380
- content = "\n".join([str(item) for item in content[:10]]) # Limit to 10 items
381
 
382
  if not content:
383
  continue
384
 
385
- # Estimate tokens
386
- if self.tokenizer:
387
- try:
388
- tokens = len(self.tokenizer.encode(content))
389
- except:
390
- # Fallback to character-based estimation (rough: 1 token β‰ˆ 4 chars)
391
- tokens = len(content) // 4
392
- else:
393
- # Character-based estimation (rough: 1 token β‰ˆ 4 chars)
394
- tokens = len(content) // 4
395
 
396
  if total_tokens + tokens <= max_tokens:
397
  formatted_context.append(f"=== {element.upper()} ===\n{content}")
398
  total_tokens += tokens
399
- elif priority > 0.5: # Critical elements - truncate if needed
400
  available = max_tokens - total_tokens
401
  if available > 100: # Only truncate if we have meaningful space
402
  truncated = self._truncate_to_tokens(content, available)
403
  formatted_context.append(f"=== {element.upper()} (TRUNCATED) ===\n{truncated}")
 
404
  break
405
 
 
406
  return "\n\n".join(formatted_context)
407
 
408
  def _truncate_to_tokens(self, content: str, max_tokens: int) -> str:
409
  """Truncate content to fit within token limit"""
410
- if not self.tokenizer:
411
- # Simple character-based truncation
412
- max_chars = max_tokens * 4
413
- if len(content) <= max_chars:
414
- return content
415
- return content[:max_chars-3] + "..."
416
-
417
- try:
418
- # Tokenize and truncate
419
- tokens = self.tokenizer.encode(content)
420
- if len(tokens) <= max_tokens:
421
- return content
422
-
423
- truncated_tokens = tokens[:max_tokens-3] # Leave room for "..."
424
- truncated_text = self.tokenizer.decode(truncated_tokens)
425
- return truncated_text + "..."
426
- except Exception as e:
427
- logger.warning(f"Error truncating with tokenizer: {e}, using character truncation")
428
- max_chars = max_tokens * 4
429
- if len(content) <= max_chars:
430
- return content
431
- return content[:max_chars-3] + "..."
 
+ # llm_router.py - NOVITA AI API ONLY
  import logging
  import asyncio
  from typing import Dict, Optional
  from .models_config import LLM_CONFIG
+ from .config import get_settings

+ # Import OpenAI client for Novita AI API
  try:
+     from openai import OpenAI
+     OPENAI_AVAILABLE = True
  except ImportError:
+     OPENAI_AVAILABLE = False
+     logger = logging.getLogger(__name__)
+     logger.error("openai package not available - Novita AI API requires openai package")

  logger = logging.getLogger(__name__)

  class LLMRouter:
+     def __init__(self, hf_token=None, use_local_models: bool = False):
+         """
+         Initialize LLM Router with Novita AI API only.
+
+         Args:
+             hf_token: Not used (kept for backward compatibility)
+             use_local_models: Must be False (local models disabled)
+         """
+         if use_local_models:
+             raise ValueError("Local models are disabled. Only Novita AI API is supported.")
+
+         self.settings = get_settings()
+         self.novita_client = None
+
+         # Validate OpenAI package
+         if not OPENAI_AVAILABLE:
+             raise ImportError(
+                 "openai package is required for Novita AI API. "
+                 "Install it with: pip install openai>=1.0.0"
+             )
+
+         # Validate API key
+         if not self.settings.novita_api_key:
+             raise ValueError(
+                 "NOVITA_API_KEY is required. "
+                 "Set it in environment variables or .env file"
+             )
+
+         # Initialize Novita AI client
+         try:
+             self.novita_client = OpenAI(
+                 base_url=self.settings.novita_base_url,
+                 api_key=self.settings.novita_api_key,
+             )
+             logger.info("βœ“ Novita AI API client initialized")
+             logger.info(f"  Base URL: {self.settings.novita_base_url}")
+             logger.info(f"  Model: {self.settings.novita_model}")
+         except Exception as e:
+             logger.error(f"Failed to initialize Novita AI client: {e}")
+             raise RuntimeError(f"Could not initialize Novita AI API client: {e}") from e
    async def route_inference(self, task_type: str, prompt: str, **kwargs):
        """
+         Route inference to Novita AI API.
+
+         Args:
+             task_type: Type of task (general_reasoning, intent_classification, etc.)
+             prompt: Input prompt
+             **kwargs: Additional parameters (max_tokens, temperature, etc.)
+
+         Returns:
+             Generated text response
        """
+         logger.info(f"Routing inference to Novita AI API for task: {task_type}")
+
+         if not self.novita_client:
+             raise RuntimeError("Novita AI client not initialized")

        try:
+             # Handle embedding generation (may need special handling)
            if task_type == "embedding_generation":
+                 logger.warning("Embedding generation via Novita API may require special implementation")
+                 # For now, use chat completion (may need adjustment based on Novita API capabilities)
+                 result = await self._call_novita_api(task_type, prompt, **kwargs)
            else:
+                 result = await self._call_novita_api(task_type, prompt, **kwargs)

            if result is None:
+                 logger.error(f"Novita AI API returned None for task: {task_type}")
                raise RuntimeError(f"Inference failed for task: {task_type}")

+             logger.info(f"Inference complete for {task_type} (Novita AI API)")
            return result

        except Exception as e:
+             logger.error(f"Novita AI API inference failed: {e}", exc_info=True)
            raise RuntimeError(
                f"Inference failed for task: {task_type}. "
+                 f"Novita AI API error: {e}"
            ) from e

+     async def _call_novita_api(self, task_type: str, prompt: str, **kwargs) -> Optional[str]:
+         """Call Novita AI API for inference."""
+         if not self.novita_client:
            return None

+         # Get model config
+         model_config = self._select_model(task_type)
+         model_name = kwargs.get('model', self.settings.novita_model)
+
+         # Get optimized parameters
+         max_tokens = kwargs.get('max_tokens', model_config.get('max_tokens', 4096))
+         temperature = kwargs.get('temperature',
+                                  model_config.get('temperature', self.settings.deepseek_r1_temperature))
+         top_p = kwargs.get('top_p', model_config.get('top_p', 0.95))
+         stream = kwargs.get('stream', False)
+
+         # Format prompt according to DeepSeek-R1 best practices
+         formatted_prompt = self._format_deepseek_r1_prompt(prompt, task_type, model_config)

+         # IMPORTANT: No system prompt - all instructions in user prompt
+         messages = [{"role": "user", "content": formatted_prompt}]
+
+         # Build request parameters
+         request_params = {
+             "model": model_name,
+             "messages": messages,
+             "stream": stream,
+             "max_tokens": max_tokens,
+             "temperature": temperature,
+             "top_p": top_p,
+         }

        try:
+             if stream:
+                 # Handle streaming response
+                 response_text = ""
+                 stream_response = self.novita_client.chat.completions.create(**request_params)
+
+                 for chunk in stream_response:
+                     if chunk.choices and len(chunk.choices) > 0:
+                         delta = chunk.choices[0].delta
+                         if delta and delta.content:
+                             response_text += delta.content
+
+                 # Clean up reasoning tags if present
+                 response_text = self._clean_reasoning_tags(response_text)
+                 logger.info(f"Novita AI API generated response (length: {len(response_text)})")
+                 return response_text
+             else:
+                 # Handle non-streaming response
+                 response = self.novita_client.chat.completions.create(**request_params)
+
+                 if response.choices and len(response.choices) > 0:
+                     result = response.choices[0].message.content
+                     # Clean up reasoning tags if present
+                     result = self._clean_reasoning_tags(result)
+                     logger.info(f"Novita AI API generated response (length: {len(result)})")
+                     return result
+                 else:
+                     logger.error("Novita AI API returned empty response")
+                     return None

        except Exception as e:
+             logger.error(f"Error calling Novita AI API: {e}", exc_info=True)
            raise

+     def _format_deepseek_r1_prompt(self, prompt: str, task_type: str, model_config: dict) -> str:
+         """
+         Format prompt according to DeepSeek-R1 best practices:
+         - No system prompt (all instructions in user prompt)
+         - Force reasoning trigger for reasoning tasks
+         - Add math directive for mathematical problems
+         """
+         formatted_prompt = prompt

+         # Check if we should force reasoning prefix
+         force_reasoning = (
+             self.settings.deepseek_r1_force_reasoning and
+             model_config.get("force_reasoning_prefix", False)
+         )

+         if force_reasoning:
+             # Force model to start with reasoning trigger
+             formatted_prompt = f"<think>\n\n{formatted_prompt}"
+
+         # Add math directive for mathematical problems
+         if self._is_math_query(prompt):
+             math_directive = "Please reason step by step, and put your final answer within \\boxed{}."
+             formatted_prompt = f"{formatted_prompt}\n\n{math_directive}"
+
+         return formatted_prompt
+
+     def _is_math_query(self, prompt: str) -> bool:
+         """Detect if query is mathematical"""
+         math_keywords = [
+             "solve", "calculate", "compute", "equation", "formula",
+             "mathematical", "algebra", "geometry", "calculus", "integral",
+             "derivative", "theorem", "proof", "problem"
+         ]
+         prompt_lower = prompt.lower()
+         return any(keyword in prompt_lower for keyword in math_keywords)
+
+     def _clean_reasoning_tags(self, text: str) -> str:
+         """Clean up reasoning tags from response"""
+         text = text.replace("<think>", "").replace("</think>", "")
+         text = text.strip()
+         return text
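A quick standalone trace of the two helpers above, showing what actually goes over the wire for a math-flavored query. The `<think>` trigger and `\boxed{}` directive mirror the DeepSeek-R1 prompting conventions the code encodes; note that `_clean_reasoning_tags` strips only the tags, not the reasoning text between them:

```python
# Standalone trace of the formatting logic above (same rules, inlined).
prompt = "Solve 2x + 3 = 11 for x."

# "solve" matches a math keyword, and force_reasoning_prefix is True for
# reasoning_primary, so the outgoing user message becomes:
formatted = f"<think>\n\n{prompt}\n\nPlease reason step by step, and put your final answer within \\boxed{{}}."

# A typical DeepSeek-R1 reply wraps its reasoning in the tags; the cleanup
# removes the tags themselves but keeps the enclosed reasoning text.
raw_reply = "<think>Subtract 3, then divide by 2.</think>The answer is \\boxed{4}."
cleaned = raw_reply.replace("<think>", "").replace("</think>", "").strip()
print(cleaned)  # Subtract 3, then divide by 2.The answer is \boxed{4}.
```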
    def _select_model(self, task_type: str) -> dict:
+         """Select model configuration based on task type"""
        model_map = {
            "intent_classification": LLM_CONFIG["models"]["classification_specialist"],
            "embedding_generation": LLM_CONFIG["models"]["embedding_specialist"],
        }
        return model_map.get(task_type, LLM_CONFIG["models"]["reasoning_primary"])

    async def get_available_models(self):
+         """Get list of available models (Novita AI only)"""
+         return ["Novita AI API - DeepSeek-R1-Distill-Qwen-7B"]

    async def health_check(self):
+         """Perform health check on Novita AI API"""
+         try:
+             # Test API with a simple request
+             test_response = self.novita_client.chat.completions.create(
+                 model=self.settings.novita_model,
+                 messages=[{"role": "user", "content": "test"}],
+                 max_tokens=5
+             )
+
+             return {
+                 "provider": "novita_api",
+                 "status": "healthy",
+                 "model": self.settings.novita_model,
+                 "base_url": self.settings.novita_base_url
+             }
+         except Exception as e:
+             logger.error(f"Health check failed: {e}")
+             return {
+                 "provider": "novita_api",
+                 "status": "unhealthy",
+                 "error": str(e)
+             }
+
+     def prepare_context_for_llm(self, raw_context: Dict, max_tokens: Optional[int] = None,
+                                 user_input: Optional[str] = None) -> str:
        """
+         Smart context windowing with user input priority.
+         User input is NEVER truncated - context is reduced to fit.
+
+         Args:
+             raw_context: Context dictionary
+             max_tokens: Optional override (uses config default if None)
+             user_input: Optional explicit user input (takes priority over raw_context['user_input'])
        """
+         # Use config budget if not provided
+         if max_tokens is None:
+             max_tokens = self.settings.context_preparation_budget

+         # Get user input (explicit parameter takes priority)
+         actual_user_input = user_input or raw_context.get('user_input', '')

+         # Calculate user input tokens (simple estimation: 1 token β‰ˆ 4 chars)
+         user_input_tokens = len(actual_user_input) // 4

+         # Ensure user input fits within dedicated budget
+         user_input_max = self.settings.user_input_max_tokens
+         if user_input_tokens > user_input_max:
+             logger.warning(f"User input ({user_input_tokens} tokens) exceeds max ({user_input_max}), truncating")
+             max_chars = user_input_max * 4
+             actual_user_input = actual_user_input[:max_chars - 3] + "..."
+             user_input_tokens = user_input_max

+         # Reserve space for user input (it has highest priority)
+         remaining_tokens = max_tokens - user_input_tokens
+         if remaining_tokens < 0:
+             logger.warning(f"User input ({user_input_tokens} tokens) exceeds total budget ({max_tokens})")
+             remaining_tokens = 0
+
+         logger.info(f"Token allocation: User input={user_input_tokens}, Context budget={remaining_tokens}, Total={max_tokens}")
+
+         # Priority order for context elements (user input already handled)
        priority_elements = [
            ('recent_interactions', 0.8),
            ('user_preferences', 0.6),
            ('session_summary', 0.4),
        ]

        formatted_context = []
+         total_tokens = user_input_tokens  # Start with user input tokens

+         # Add user input first (unconditionally, never truncated)
+         if actual_user_input:
+             formatted_context.append(f"=== USER INPUT ===\n{actual_user_input}")
+
+         # Now add context elements within remaining budget
        for element, priority in priority_elements:
            element_key_map = {
                'recent_interactions': raw_context.get('interaction_contexts', []),
                'user_preferences': raw_context.get('preferences', {}),
                'session_summary': raw_context.get('session_context', {}),

            if isinstance(content, dict):
                content = str(content)
            elif isinstance(content, list):
+                 content = "\n".join([str(item) for item in content[:10]])

            if not content:
                continue

+             # Estimate tokens (simple: 1 token β‰ˆ 4 chars)
+             tokens = len(content) // 4

            if total_tokens + tokens <= max_tokens:
                formatted_context.append(f"=== {element.upper()} ===\n{content}")
                total_tokens += tokens
+             elif priority > 0.5 and remaining_tokens > 0:  # Critical elements - truncate if needed
                available = max_tokens - total_tokens
                if available > 100:  # Only truncate if we have meaningful space
                    truncated = self._truncate_to_tokens(content, available)
                    formatted_context.append(f"=== {element.upper()} (TRUNCATED) ===\n{truncated}")
+                 total_tokens += available
                break

+         logger.info(f"Context prepared: {total_tokens}/{max_tokens} tokens (user input: {user_input_tokens}, context: {total_tokens - user_input_tokens})")
        return "\n\n".join(formatted_context)

    def _truncate_to_tokens(self, content: str, max_tokens: int) -> str:
        """Truncate content to fit within token limit"""
+         # Simple character-based truncation (1 token β‰ˆ 4 chars)
+         max_chars = max_tokens * 4
+         if len(content) <= max_chars:
+             return content
+         return content[:max_chars - 3] + "..."
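Taken together, a minimal usage sketch of the rewritten router (assuming `NOVITA_API_KEY` is set and the script runs from the project root, so `src.llm_router` is importable):

```python
# Usage sketch for the rewritten router (assumes NOVITA_API_KEY is set).
import asyncio
from src.llm_router import LLMRouter

async def main():
    router = LLMRouter(use_local_models=False)

    # User input is packed first and never truncated below its dedicated
    # budget; whatever remains of the total budget goes to context elements.
    context = router.prepare_context_for_llm(
        raw_context={
            "user_input": "Summarize my last three queries.",
            "interaction_contexts": ["q1: ...", "q2: ...", "q3: ..."],
            "preferences": {"style": "concise"},
        },
    )

    answer = await router.route_inference(
        task_type="general_reasoning",
        prompt=context,
        max_tokens=512,
        temperature=0.6,
    )
    print(answer)

asyncio.run(main())
```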
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
src/models_config.py CHANGED
@@ -1,61 +1,45 @@
  # models_config.py
- # Optimized for NVIDIA T4 Medium (16GB VRAM) with 4-bit quantization
- # UPDATED: Local models only - no API fallback
  LLM_CONFIG = {
-     "primary_provider": "local",
      "models": {
          "reasoning_primary": {
-             # Primary: Qwen (gated, requires access) - Fallback: Mistral (non-gated, stable)
-             "model_id": "Qwen/Qwen2.5-7B-Instruct",  # Single primary model for all text tasks
              "task": "general_reasoning",
-             "max_tokens": 8000,  # Reduced from 10000
-             "temperature": 0.7,
-             # Fallback to Mistral (non-gated, no DynamicCache issues) before Phi-3
-             "fallback": "mistralai/Mistral-7B-Instruct-v0.2",  # Non-gated, stable, no DynamicCache issues
-             "fallback2": "microsoft/Phi-3-mini-4k-instruct",  # Secondary fallback (3.8B, has DynamicCache workaround)
-             "is_chat_model": True,
-             "use_4bit_quantization": True,  # Enable 4-bit quantization for 16GB T4
-             "use_8bit_quantization": False
-         },
-         "embedding_specialist": {
-             "model_id": "intfloat/e5-large-v2",  # 1024-dim embeddings for semantic similarity
-             "task": "embeddings",
-             "vector_dimensions": 1024,
-             "purpose": "semantic_similarity",
-             "is_chat_model": False
          },
          "classification_specialist": {
-             "model_id": "Qwen/Qwen2.5-7B-Instruct",  # Same model for all text tasks
              "task": "intent_classification",
-             "max_length": 512,
-             "specialization": "fast_inference",
-             "latency_target": "<100ms",
-             "is_chat_model": True,
-             "use_4bit_quantization": True,
-             "fallback": "mistralai/Mistral-7B-Instruct-v0.2",  # Non-gated, stable
-             "fallback2": "microsoft/Phi-3-mini-4k-instruct"  # Secondary fallback with DynamicCache workaround
          },
          "safety_checker": {
-             "model_id": "Qwen/Qwen2.5-7B-Instruct",  # Same model for all text tasks
              "task": "content_moderation",
-             "confidence_threshold": 0.85,
-             "purpose": "bias_detection",
-             "is_chat_model": True,
-             "use_4bit_quantization": True,
-             "fallback": "mistralai/Mistral-7B-Instruct-v0.2",  # Non-gated, stable
-             "fallback2": "microsoft/Phi-3-mini-4k-instruct"  # Secondary fallback with DynamicCache workaround
          }
      },
      "routing_logic": {
-         "strategy": "task_based_routing",
-         "fallback_chain": ["primary"],  # No API fallback
-         "load_balancing": "single_model_reuse"
-     },
-     "quantization_settings": {
-         "default_4bit": True,  # Enable 4-bit quantization by default for T4 16GB
-         "default_8bit": False,
-         "bnb_4bit_compute_dtype": "float16",
-         "bnb_4bit_use_double_quant": True,
-         "bnb_4bit_quant_type": "nf4"
      }
  }
 
  # models_config.py
+ # UPDATED: Novita AI API only - no local models
  LLM_CONFIG = {
+     "primary_provider": "novita_api",
      "models": {
          "reasoning_primary": {
+             "model_id": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2",
              "task": "general_reasoning",
+             "max_tokens": 4096,
+             "temperature": 0.6,  # Recommended for DeepSeek-R1
+             "top_p": 0.95,
+             "force_reasoning_prefix": True,
+             "is_chat_model": True
          },
          "classification_specialist": {
+             "model_id": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2",
              "task": "intent_classification",
+             "max_tokens": 512,
+             "temperature": 0.5,  # Lower for consistency
+             "top_p": 0.9,
+             "force_reasoning_prefix": False,
+             "is_chat_model": True
          },
          "safety_checker": {
+             "model_id": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2",
              "task": "content_moderation",
+             "max_tokens": 1024,
+             "temperature": 0.5,
+             "top_p": 0.9,
+             "force_reasoning_prefix": False,
+             "is_chat_model": True
+         },
+         "embedding_specialist": {
+             "model_id": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2",
+             "task": "embeddings",
+             "note": "Embeddings via Novita API - may require special handling",
+             "is_chat_model": True
          }
      },
      "routing_logic": {
+         "strategy": "novita_api_only",
+         "fallback_chain": [],
+         "load_balancing": "single_endpoint"
      }
  }
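Every entry now points at the same Novita deployment, so the per-task differences are purely generation parameters. A small sketch of how a `_select_model`-style lookup resolves them (the `params_for` helper is illustrative, not part of the commit):

```python
# Sketch: resolving per-task generation parameters from LLM_CONFIG.
from src.models_config import LLM_CONFIG

def params_for(task_type: str) -> dict:
    model_map = {
        "intent_classification": LLM_CONFIG["models"]["classification_specialist"],
        "content_moderation": LLM_CONFIG["models"]["safety_checker"],
    }
    # Unknown tasks fall back to the primary reasoning configuration.
    cfg = model_map.get(task_type, LLM_CONFIG["models"]["reasoning_primary"])
    return {
        "max_tokens": cfg.get("max_tokens", 4096),
        "temperature": cfg.get("temperature", 0.6),
        "top_p": cfg.get("top_p", 0.95),
    }

print(params_for("intent_classification"))  # {'max_tokens': 512, 'temperature': 0.5, 'top_p': 0.9}
print(params_for("general_reasoning"))      # resolves to reasoning_primary's values
```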
test_novita_conda.bat ADDED
@@ -0,0 +1,53 @@
@echo off
REM Test Novita AI connection using Anaconda environment
REM This script activates the conda environment and runs the test

echo ============================================================
echo Testing Novita AI Connection with Anaconda
echo ============================================================
echo.

REM Check if conda is available
where conda >nul 2>&1
if %ERRORLEVEL% NEQ 0 (
    echo ERROR: conda command not found
    echo Please activate Anaconda Prompt first or add conda to PATH
    goto :end
)

echo Step 1: Checking conda environments...
call conda env list

echo.
echo Step 2: Creating environment if it doesn't exist...
call conda env create -f environment.yml --name research-ai-assistant 2>nul
if %ERRORLEVEL% NEQ 0 (
    echo Environment may already exist, continuing...
)

echo.
echo Step 3: Activating environment and running test...
call conda activate research-ai-assistant
if %ERRORLEVEL% NEQ 0 (
    echo ERROR: Failed to activate environment
    echo Try: conda activate research-ai-assistant
    goto :end
)

echo.
echo Step 4: Installing openai package if needed...
python -c "import openai" 2>nul
if %ERRORLEVEL% NEQ 0 (
    echo Installing openai package...
    REM Quote the requirement so cmd does not treat ">=" as redirection
    pip install "openai>=1.0.0"
)

echo.
echo Step 5: Running Novita AI connection test...
python test_novita_connection.py

:end
echo.
echo Test complete!
pause
test_novita_connection.py ADDED
@@ -0,0 +1,275 @@
#!/usr/bin/env python3
"""
Test script for Novita AI API connection
Tests configuration, client initialization, and API calls
"""

import os
import sys
import asyncio
from pathlib import Path

# Add project root to path
project_root = Path(__file__).parent
sys.path.insert(0, str(project_root))

def test_configuration():
    """Test configuration loading"""
    print("=" * 60)
    print("TEST 1: Configuration Loading")
    print("=" * 60)

    try:
        from src.config import get_settings
        settings = get_settings()

        print("βœ“ Configuration loaded successfully")
        print(f"  Novita API Key: {'Set' if settings.novita_api_key else 'NOT SET'}")
        print(f"  Base URL: {settings.novita_base_url}")
        print(f"  Model: {settings.novita_model}")
        print(f"  Temperature: {settings.deepseek_r1_temperature}")
        print(f"  Force Reasoning: {settings.deepseek_r1_force_reasoning}")
        print(f"  User Input Max Tokens: {settings.user_input_max_tokens}")
        print(f"  Context Preparation Budget: {settings.context_preparation_budget}")

        if not settings.novita_api_key:
            print("\n❌ ERROR: NOVITA_API_KEY is not set!")
            print("   Please set it in environment variables or .env file")
            return False

        return True

    except Exception as e:
        print(f"❌ Configuration loading failed: {e}")
        import traceback
        traceback.print_exc()
        return False

def test_openai_package():
    """Test OpenAI package availability"""
    print("\n" + "=" * 60)
    print("TEST 2: OpenAI Package Check")
    print("=" * 60)

    try:
        import openai
        print("βœ“ OpenAI package is available")
        print(f"  OpenAI version: {openai.__version__}")
        return True
    except ImportError as e:
        print(f"❌ OpenAI package not available: {e}")
        print("   Install with: pip install openai>=1.0.0")
        return False

def test_client_initialization():
    """Test Novita AI client initialization"""
    print("\n" + "=" * 60)
    print("TEST 3: Novita AI Client Initialization")
    print("=" * 60)

    try:
        from src.config import get_settings
        from openai import OpenAI

        settings = get_settings()

        if not settings.novita_api_key:
            print("❌ Cannot test - NOVITA_API_KEY not set")
            return False, None

        client = OpenAI(
            base_url=settings.novita_base_url,
            api_key=settings.novita_api_key,
        )

        print("βœ“ Novita AI client initialized successfully")
        print(f"  Base URL: {settings.novita_base_url}")
        print(f"  API Key: {settings.novita_api_key[:10]}...{settings.novita_api_key[-4:] if len(settings.novita_api_key) > 14 else '***'}")

        return True, client

    except Exception as e:
        print(f"❌ Client initialization failed: {e}")
        import traceback
        traceback.print_exc()
        return False, None

def test_simple_api_call(client):
    """Test a simple API call to Novita AI"""
    print("\n" + "=" * 60)
    print("TEST 4: Simple API Call")
    print("=" * 60)

    if not client:
        print("❌ Cannot test - client not initialized")
        return False

    try:
        from src.config import get_settings
        settings = get_settings()

        print(f"Sending test request to: {settings.novita_model}")
        print("Prompt: 'Hello, this is a test. Please respond briefly.'")

        response = client.chat.completions.create(
            model=settings.novita_model,
            messages=[
                {"role": "user", "content": "Hello, this is a test. Please respond briefly."}
            ],
            max_tokens=50,
            temperature=0.6
        )

        if response.choices and len(response.choices) > 0:
            result = response.choices[0].message.content
            print("βœ“ API call successful!")
            print(f"  Response length: {len(result)} characters")
            print(f"  Response preview: {result[:100]}...")
            print(f"  Model used: {response.model if hasattr(response, 'model') else 'N/A'}")
            return True
        else:
            print("❌ API call returned empty response")
            return False

    except Exception as e:
        print(f"❌ API call failed: {e}")
        import traceback
        traceback.print_exc()
        return False

def test_llm_router():
    """Test LLM Router initialization and health check"""
    print("\n" + "=" * 60)
    print("TEST 5: LLM Router Initialization")
    print("=" * 60)

    try:
        from src.llm_router import LLMRouter

        print("Initializing LLM Router...")
        router = LLMRouter(hf_token=None, use_local_models=False)

        print("βœ“ LLM Router initialized successfully")

        # Test health check
        print("\nTesting health check...")
        async def test_health():
            health = await router.health_check()
            return health

        health = asyncio.run(test_health())
        print(f"βœ“ Health check result: {health}")

        return True

    except Exception as e:
        print(f"❌ LLM Router initialization failed: {e}")
        import traceback
        traceback.print_exc()
        return False

async def test_inference():
    """Test actual inference through LLM Router"""
    print("\n" + "=" * 60)
    print("TEST 6: Inference Test")
    print("=" * 60)

    try:
        from src.llm_router import LLMRouter

        router = LLMRouter(hf_token=None, use_local_models=False)

        test_prompt = "What is the capital of France? Answer in one sentence."
        print(f"Test prompt: {test_prompt}")

        result = await router.route_inference(
            task_type="general_reasoning",
            prompt=test_prompt,
            max_tokens=100,
            temperature=0.6
        )

        if result:
            print("βœ“ Inference successful!")
            print(f"  Response length: {len(result)} characters")
            print(f"  Response: {result}")
            return True
        else:
            print("❌ Inference returned None")
            return False

    except Exception as e:
        print(f"❌ Inference test failed: {e}")
        import traceback
        traceback.print_exc()
        return False

def main():
    """Run all tests"""
    print("\n" + "=" * 60)
    print("NOVITA AI CONNECTION TEST")
    print("=" * 60)
    print()

    results = {}

    # Test 1: Configuration
    results['config'] = test_configuration()
    if not results['config']:
        print("\n❌ Configuration test failed. Please check your environment variables.")
        return 1

    # Test 2: OpenAI package
    results['package'] = test_openai_package()
    if not results['package']:
        print("\n❌ Package test failed. Please install: pip install openai>=1.0.0")
        return 1

    # Test 3: Client initialization
    client_init_result = test_client_initialization()
    if isinstance(client_init_result, tuple):
        results['client'] = client_init_result[0]
        client = client_init_result[1]
    else:
        results['client'] = client_init_result
        client = None

    if not results['client']:
        print("\n❌ Client initialization failed. Check your API key and base URL.")
        return 1

    # Test 4: Simple API call
    results['api_call'] = test_simple_api_call(client)

    # Test 5: LLM Router
    results['router'] = test_llm_router()

    # Test 6: Inference
    if results['router']:
        results['inference'] = asyncio.run(test_inference())

    # Summary
    print("\n" + "=" * 60)
    print("TEST SUMMARY")
    print("=" * 60)

    total_tests = len(results)
    passed_tests = sum(1 for v in results.values() if v)

    for test_name, result in results.items():
        status = "βœ“ PASS" if result else "❌ FAIL"
        print(f"  {test_name.upper()}: {status}")

    print(f"\nTotal: {passed_tests}/{total_tests} tests passed")

    if passed_tests == total_tests:
        print("\nπŸŽ‰ All tests passed! Novita AI connection is working correctly.")
        return 0
    else:
        print("\n⚠️ Some tests failed. Please review the errors above.")
        return 1

if __name__ == "__main__":
    exit_code = main()
    sys.exit(exit_code)
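One path the script above does not exercise is streaming: `_call_novita_api` accepts `stream=True` and accumulates chunk deltas into a single string. A possible extra check, sketched under the same assumptions as the other tests:

```python
# Optional extra check (not in the script above): exercise the streaming
# path of _call_novita_api via route_inference(stream=True).
import asyncio
from src.llm_router import LLMRouter

async def test_streaming():
    router = LLMRouter(hf_token=None, use_local_models=False)
    result = await router.route_inference(
        task_type="general_reasoning",
        prompt="Count from 1 to 5.",
        max_tokens=50,
        stream=True,  # router accumulates chunk deltas into one string
    )
    print(f"Streamed response ({len(result)} chars): {result}")

asyncio.run(test_streaming())
```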