JatsTheAIGen committed on
Commit
80a97c8
·
1 Parent(s): fa862fc

Process flow visualizer + key skills (for validation only) V5

FILE_CLEANUP_SUMMARY.md ADDED
@@ -0,0 +1,139 @@
1
+ # File Cleanup Summary
2
+
3
+ ## System Status
4
+ All core functionality is working. The cleanup preserves working files and archives documentation/test files.
5
+
6
+ ## Files Structure
7
+
8
+ ### ✅ KEEP - Core System Files
9
+ ```
10
+ ├── app.py # Main Gradio application
11
+ ├── main.py # Entry point
12
+ ├── launch.py # Launch script
13
+ ├── orchestrator_engine.py # Orchestration logic
14
+ ├── context_manager.py # Context management
15
+ ├── llm_router.py # LLM routing
16
+ ├── models_config.py # Model configuration
17
+ ├── config.py # System configuration
18
+ ├── requirements.txt # Dependencies
19
+ ├── install.sh # Installation script
20
+ ├── quick_test.sh # Test script
21
+ ├── database_schema.sql # Database schema
22
+ ├── Dockerfile.hf # Docker configuration
23
+ ├── README.md # Main documentation
24
+ ├── SYSTEM_FUNCTIONALITY_REVIEW.md # System status
25
+ └── KEEP_FILES.md # Keep-list documentation
26
+
27
+ src/
28
+ ├── __init__.py
29
+ ├── database.py
30
+ ├── event_handlers.py
31
+ └── agents/
32
+ ├── __init__.py
33
+ ├── intent_agent.py
34
+ ├── safety_agent.py
35
+ └── synthesis_agent.py
36
+ ```
37
+
38
+ ### 📦 ARCHIVE - Documentation (40+ files)
39
+ **Move to:** `archive/docs/`
40
+ - All `CONTEXT_*.md` files (4 files)
41
+ - All `SESSION_*.md` files (3 files)
42
+ - All `MOVING_WINDOW*.md` files
43
+ - `BUG_FIXES.md`
44
+ - `BUILD_READINESS.md`
45
+ - `DEPLOYMENT_*.md` files (2)
46
+ - `FINAL_FIXES_APPLIED.md`
47
+ - `GRACEFUL_DEGRADATION_GUARANTEE.md`
48
+ - `HF_TOKEN_SETUP.md`
49
+ - `IMPLEMENTATION_*.md` files (2)
50
+ - `INTEGRATION_*.md` files (2)
51
+ - `LLM_INTEGRATION_STATUS.md`
52
+ - `LOGGING_GUIDE.md`
53
+ - `PLACEHOLDER_REMOVAL_COMPLETE.md`
54
+ - `SYSTEM_UPGRADE_CONFIRMATION.md`
55
+ - `TECHNICAL_REVIEW.md`
56
+ - `WORKFLOW_INTEGRATION_GUARANTEE.md`
57
+ - `FILE_STRUCTURE.md`
58
+ - `AGENTS_COMPLETE.md`
59
+ - `COMPATIBILITY.md`
60
+
61
+ ### 📦 ARCHIVE - Duplicates (Entire directory)
62
+ **Move to:** `archive/duplicates/`
63
+ - `Research_AI_Assistant/` (entire directory - duplicate of main files)
64
+
65
+ ### 📦 ARCHIVE - Test/Development Files
66
+ **Move to:** `archive/test/`
67
+ - `acceptance_testing.py`
68
+ - `agent_protocols.py`
69
+ - `agent_stubs.py`
70
+ - `cache_implementation.py`
71
+ - `faiss_manager.py`
72
+ - `intent_protocols.py`
73
+ - `intent_recognition.py`
74
+ - `mobile_components.py`
75
+ - `mobile_events.py`
76
+ - `mobile_handlers.py`
77
+ - `performance_optimizations.py`
78
+ - `pwa_features.py`
79
+ - `test_setup.py`
80
+ - `verify_no_downgrade.py`
81
+
82
+ ## Commands to Execute
83
+
84
+ ### Option 1: Manual Archive (Recommended)
85
+ ```bash
86
+ # Create archive directories
87
+ mkdir -p archive/docs archive/duplicates archive/test
88
+
89
+ # Move documentation files
90
+ mv CONTEXT_*.md SESSION_*.md MOVING_WINDOW*.md archive/docs/
91
+ mv BUG_FIXES.md BUILD_READINESS.md DEPLOYMENT*.md archive/docs/
92
+ mv FINAL_FIXES*.md GRACEFUL*.md IMPLEMENTATION*.md archive/docs/
93
+ mv INTEGRATION*.md LLM*.md LOGGING*.md PLACEHOLDER*.md archive/docs/
94
+ mv SYSTEM_UPGRADE*.md TECHNICAL*.md WORKFLOW*.md archive/docs/
95
+ mv FILE_STRUCTURE.md AGENTS_COMPLETE.md COMPATIBILITY.md archive/docs/
96
+
97
+ # Move test files
98
+ mv acceptance_testing.py agent_*.py cache_implementation.py archive/test/
99
+ mv faiss_manager.py intent_*.py mobile_*.py archive/test/
100
+ mv performance_*.py pwa_features.py test_*.py verify_*.py archive/test/
101
+
102
+ # Move duplicates
103
+ mv Research_AI_Assistant archive/duplicates/
104
+ ```
105
+
106
+ ### Option 2: Python Script
107
+ ```bash
108
+ python cleanup_files.py
109
+ ```
110
+
111
+ ## Result
112
+
113
+ After cleanup:
114
+ - **Root directory**: 16 core system files
115
+ - **src/**: 4 Python files
116
+ - **Total**: ~20 files (clean, organized)
117
+ - **archive/**: 60+ archived files
118
+
119
+ ## Benefits
120
+
121
+ 1. ✅ **Clean workspace** - Easy to navigate
122
+ 2. ✅ **Clear structure** - Only essential files visible
123
+ 3. ✅ **Preserved history** - All docs in archive
124
+ 4. ✅ **No functional changes** - System still works
125
+ 5. ✅ **Easy maintenance** - Clear separation
126
+
127
+ ## Next Steps
128
+
129
+ 1. Run cleanup script or manual commands
130
+ 2. Verify system still works: `python app.py`
131
+ 3. Update .gitignore to ignore `archive/` (see the snippet below)
132
+ 4. Commit changes
133
+
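+ A minimal sketch of steps 3 and 4, assuming a standard git setup (the commit message is only an example):
+
+ ```bash
+ # Keep archived files out of version control
+ echo "archive/" >> .gitignore
+
+ # Re-verify the app, then commit the reorganization
+ python app.py
+ git add -A
+ git commit -m "Archive non-essential docs and test files"
+ ```
+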
134
+ ---
135
+
136
+ **Status**: Ready to execute cleanup
137
+ **Risk**: Low (all files preserved in archive)
138
+ **Benefit**: High (clean, organized codebase)
139
+
KEEP_FILES.md ADDED
@@ -0,0 +1,83 @@
1
+ # Files to Keep - System Functionality
2
+
3
+ ## Core System Files (Keep)
4
+
5
+ ### Main Application
6
+ - `app.py` - Main Gradio application
7
+ - `main.py` - Entry point
8
+ - `orchestrator_engine.py` - Main orchestration
9
+ - `context_manager.py` - Context management
10
+ - `llm_router.py` - LLM routing
11
+ - `models_config.py` - Model configuration
12
+ - `config.py` - System configuration
13
+
14
+ ### Source Code (`src/`)
15
+ - `src/` directory (all files)
16
+ - `src/agents/` - All agent implementations
17
+ - `src/database.py` - Database management
18
+ - `src/event_handlers.py` - Event handling
19
+
20
+ ### Supporting Files
21
+ - `requirements.txt` - Python dependencies
22
+ - `README.md` - Main documentation
23
+ - `install.sh` - Installation script
24
+ - `quick_test.sh` - Quick test script
25
+ - `database_schema.sql` - Database schema
26
+
27
+ ## Documentation Files (Keep ONLY Essential)
28
+
29
+ ### Essential Documentation
30
+ - `README.md` - Main project documentation
31
+ - `SYSTEM_FUNCTIONALITY_REVIEW.md` - Current system status
32
+ - `SESSION_UI_FIX_COMPLETE.md` - UI fixes documentation
33
+
34
+ ### Archive (Move to archive/docs/)
35
+ - `CONTEXT_MEMORY_FIX.md`
36
+ - `CONTEXT_SUMMARIZATION_ENHANCED.md`
37
+ - `CONTEXT_SUMMARIZATION_IMPLEMENTED.md`
38
+ - `CONTEXT_WINDOW_INCREASED.md`
39
+ - `SESSION_CONTEXT_FIX.md`
40
+ - `SESSION_CONTEXT_FIX_SUMMARY.md`
41
+ - All other `*.md` files in root and Research_AI_Assistant
42
+
43
+ ## Research_AI_Assistant (Archive Entire Directory)
44
+
45
+ All files in `Research_AI_Assistant/` are duplicates and should be archived.
46
+
47
+ ## Test Files (Move to archive/test/)
48
+ - `acceptance_testing.py`
49
+ - `test_setup.py`
50
+ - `verify_no_downgrade.py`
51
+ - `agent_protocols.py`
52
+ - `agent_stubs.py`
53
+ - `cache_implementation.py`
54
+ - `faiss_manager.py`
55
+ - `intent_protocols.py`
56
+ - `intent_recognition.py`
57
+ - `mobile_components.py`
58
+ - `mobile_events.py`
59
+ - `mobile_handlers.py`
60
+ - `performance_optimizations.py`
61
+ - `pwa_features.py`
62
+
63
+ ## Files to Archive
64
+
65
+ ### Documentation (Keep only 3 essential docs)
66
+ Archive all markdown files except:
67
+ - `README.md`
68
+ - `SYSTEM_FUNCTIONALITY_REVIEW.md`
69
+ - `SESSION_UI_FIX_COMPLETE.md`
70
+
71
+ ### Research_AI_Assistant (Full duplicate)
72
+ - Entire `Research_AI_Assistant/` directory
73
+
74
+ ### Test/Development Files
75
+ - All test scripts (`acceptance_testing.py`, `test_setup.py`, `verify_no_downgrade.py`), `agent_stubs.py`, and protocol files
76
+ - Mobile-specific files (not using mobile UI currently)
77
+ - Optimization/performance files (optional enhancements)
78
+
79
+ ## Summary
80
+
81
+ **Keep**: Core system files + 3 essential docs
82
+ **Archive**: 40+ documentation files + Research_AI_Assistant directory + test files
83
+
app.py CHANGED
@@ -45,10 +45,10 @@ try:
45
  from src.agents.synthesis_agent import create_synthesis_agent
46
  from src.agents.safety_agent import create_safety_agent
47
  from src.agents.skills_identification_agent import create_skills_identification_agent
48
- from llm_router import LLMRouter
49
- from orchestrator_engine import MVPOrchestrator
50
- from context_manager import EfficientContextManager
51
- from config import settings
52
 
53
  logger.info("✓ Successfully imported orchestration components")
54
  orchestrator_available = True
 
45
  from src.agents.synthesis_agent import create_synthesis_agent
46
  from src.agents.safety_agent import create_safety_agent
47
  from src.agents.skills_identification_agent import create_skills_identification_agent
48
+ from src.llm_router import LLMRouter
49
+ from src.orchestrator_engine import MVPOrchestrator
50
+ from src.context_manager import EfficientContextManager
51
+ from src.config import settings
52
 
53
  logger.info("✓ Successfully imported orchestration components")
54
  orchestrator_available = True
cleanup_files.py ADDED
@@ -0,0 +1,97 @@
1
+ """Clean up directory by archiving files"""
2
+ import os
3
+ import shutil
4
+ from pathlib import Path
5
+
6
+ # Create archive structure
7
+ Path("archive/docs").mkdir(parents=True, exist_ok=True)
8
+ Path("archive/duplicates").mkdir(parents=True, exist_ok=True)
9
+ Path("archive/test").mkdir(parents=True, exist_ok=True)
10
+
11
+ # Documentation files to archive
12
+ doc_files = [
13
+ "CONTEXT_MEMORY_FIX.md",
14
+ "CONTEXT_SUMMARIZATION_ENHANCED.md",
15
+ "CONTEXT_SUMMARIZATION_IMPLEMENTED.md",
16
+ "CONTEXT_WINDOW_INCREASED.md",
17
+ "SESSION_CONTEXT_FIX.md",
18
+ "SESSION_CONTEXT_FIX_SUMMARY.md",
19
+ "SESSION_UI_FIX_COMPLETE.md",
20
+ "MOVING_WINDOW_CONTEXT_FINAL.md",
21
+ "BUG_FIXES.md",
22
+ "BUILD_READINESS.md",
23
+ "DEPLOYMENT_NOTES.md",
24
+ "DEPLOYMENT_STATUS.md",
25
+ "FINAL_FIXES_APPLIED.md",
26
+ "GRACEFUL_DEGRADATION_GUARANTEE.md",
27
+ "HF_TOKEN_SETUP.md",
28
+ "IMPLEMENTATION_GAPS_RESOLVED.md",
29
+ "IMPLEMENTATION_STATUS.md",
30
+ "INTEGRATION_COMPLETE.md",
31
+ "INTEGRATION_GUIDE.md",
32
+ "LLM_INTEGRATION_STATUS.md",
33
+ "LOGGING_GUIDE.md",
34
+ "PLACEHOLDER_REMOVAL_COMPLETE.md",
35
+ "SYSTEM_UPGRADE_CONFIRMATION.md",
36
+ "TECHNICAL_REVIEW.md",
37
+ "WORKFLOW_INTEGRATION_GUARANTEE.md",
38
+ "FILE_STRUCTURE.md",
39
+ "AGENTS_COMPLETE.md",
40
+ "COMPATIBILITY.md"
41
+ ]
42
+
43
+ # Test/Development files
44
+ test_files = [
45
+ "acceptance_testing.py",
46
+ "agent_protocols.py",
47
+ "agent_stubs.py",
48
+ "cache_implementation.py",
49
+ "faiss_manager.py",
50
+ "intent_protocols.py",
51
+ "intent_recognition.py",
52
+ "mobile_components.py",
53
+ "mobile_events.py",
54
+ "mobile_handlers.py",
55
+ "performance_optimizations.py",
56
+ "pwa_features.py",
57
+ "test_setup.py",
58
+ "verify_no_downgrade.py"
59
+ ]
60
+
61
+ # Archive documentation files
62
+ for file in doc_files:
63
+ if os.path.exists(file):
64
+ try:
65
+ shutil.move(file, f"archive/docs/{file}")
66
+ print(f"Moved {file}")
67
+ except Exception as e:
68
+ print(f"Error moving {file}: {e}")
69
+
70
+ # Archive test files
71
+ for file in test_files:
72
+ if os.path.exists(file):
73
+ try:
74
+ shutil.move(file, f"archive/test/{file}")
75
+ print(f"Moved {file}")
76
+ except Exception as e:
77
+ print(f"Error moving {file}: {e}")
78
+
79
+ # Archive Research_AI_Assistant directory
80
+ if os.path.exists("Research_AI_Assistant"):
81
+ try:
82
+ shutil.move("Research_AI_Assistant", "archive/duplicates/Research_AI_Assistant")
83
+ print("Moved Research_AI_Assistant directory")
84
+ except Exception as e:
85
+ print(f"Error moving Research_AI_Assistant: {e}")
86
+
87
+ print("\nCleanup complete!")
88
+ print("\nFiles kept in root:")
89
+ for item in os.listdir("."):
90
+ if os.path.isfile(item) and not item.startswith(".") and item != "cleanup_files.py":
91
+ print(f" - {item}")
92
+
93
+ print("\nFiles kept in src/")
94
+ if os.path.exists("src"):
95
+ for item in os.listdir("src"):
96
+ print(f" - src/{item}")
97
+
src/agents/__init__.py CHANGED
@@ -6,6 +6,7 @@ Specialized agents for different tasks
6
  from .intent_agent import IntentRecognitionAgent, create_intent_agent
7
  from .synthesis_agent import ResponseSynthesisAgent, create_synthesis_agent
8
  from .safety_agent import SafetyCheckAgent, create_safety_agent
 
9
 
10
  __all__ = [
11
  'IntentRecognitionAgent',
@@ -13,6 +14,8 @@ __all__ = [
13
  'ResponseSynthesisAgent',
14
  'create_synthesis_agent',
15
  'SafetyCheckAgent',
16
- 'create_safety_agent'
 
 
17
  ]
18
 
 
6
  from .intent_agent import IntentRecognitionAgent, create_intent_agent
7
  from .synthesis_agent import ResponseSynthesisAgent, create_synthesis_agent
8
  from .safety_agent import SafetyCheckAgent, create_safety_agent
9
+ from .skills_identification_agent import SkillsIdentificationAgent, create_skills_identification_agent
10
 
11
  __all__ = [
12
  'IntentRecognitionAgent',
 
14
  'ResponseSynthesisAgent',
15
  'create_synthesis_agent',
16
  'SafetyCheckAgent',
17
+ 'create_safety_agent',
18
+ 'SkillsIdentificationAgent',
19
+ 'create_skills_identification_agent'
20
  ]
21
 
src/agents/intent_agent.py CHANGED
@@ -58,31 +58,30 @@ class IntentRecognitionAgent:
58
  async def _llm_based_intent_recognition(self, user_input: str, context: Dict[str, Any]) -> Dict[str, Any]:
59
  """Use LLM for sophisticated intent classification with Chain of Thought"""
60
 
61
- cot_prompt = self._build_chain_of_thought_prompt(user_input, context)
62
-
63
- # Simulate LLM response (replace with actual LLM call)
64
- reasoning_chain = [
65
- "Step 1: Analyze the user's input for key action words and context",
66
- "Step 2: Map to predefined intent categories based on linguistic patterns",
67
- "Step 3: Consider conversation history for contextual understanding",
68
- "Step 4: Assign confidence scores based on clarity and specificity"
69
- ]
70
-
71
- # Determine intent based on input patterns
72
- primary_intent, confidence = self._analyze_intent_patterns(user_input)
73
- secondary_intents = self._get_secondary_intents(user_input, primary_intent)
 
 
 
 
 
 
 
74
 
75
- return {
76
- "primary_intent": primary_intent,
77
- "secondary_intents": secondary_intents,
78
- "confidence_scores": {
79
- primary_intent: confidence,
80
- **{intent: max(0.1, confidence - 0.3) for intent in secondary_intents}
81
- },
82
- "reasoning_chain": reasoning_chain,
83
- "context_tags": self._extract_context_tags(user_input, context),
84
- "processing_time": 0.15 # Simulated processing time
85
- }
86
 
87
  async def _rule_based_intent_recognition(self, user_input: str, context: Dict[str, Any]) -> Dict[str, Any]:
88
  """Rule-based fallback intent classification"""
@@ -208,6 +207,45 @@ class IntentRecognitionAgent:
208
  "calibration_factors": calibration_factors
209
  }
210
 
211
  def _get_fallback_intent(self, user_input: str, context: Dict[str, Any]) -> Dict[str, Any]:
212
  """Provide fallback intent when processing fails"""
213
  return {
 
58
  async def _llm_based_intent_recognition(self, user_input: str, context: Dict[str, Any]) -> Dict[str, Any]:
59
  """Use LLM for sophisticated intent classification with Chain of Thought"""
60
 
61
+ try:
62
+ cot_prompt = self._build_chain_of_thought_prompt(user_input, context)
63
+
64
+ logger.info(f"{self.agent_id} calling LLM for intent recognition")
65
+ llm_response = await self.llm_router.route_inference(
66
+ task_type="intent_classification",
67
+ prompt=cot_prompt,
68
+ max_tokens=1000,
69
+ temperature=0.3
70
+ )
71
+
72
+ if llm_response and isinstance(llm_response, str) and len(llm_response.strip()) > 0:
73
+ # Parse LLM response
74
+ parsed_result = self._parse_llm_intent_response(llm_response)
75
+ parsed_result["processing_time"] = 0.8
76
+ parsed_result["method"] = "llm_enhanced"
77
+ return parsed_result
78
+
79
+ except Exception as e:
80
+ logger.error(f"{self.agent_id} LLM intent recognition failed: {e}")
81
 
82
+ # Fallback to rule-based classification if LLM fails
83
+ logger.info(f"{self.agent_id} falling back to rule-based classification")
84
+ return await self._rule_based_intent_recognition(user_input, context)
 
 
 
 
 
 
 
 
85
 
86
  async def _rule_based_intent_recognition(self, user_input: str, context: Dict[str, Any]) -> Dict[str, Any]:
87
  """Rule-based fallback intent classification"""
 
207
  "calibration_factors": calibration_factors
208
  }
209
 
210
+ def _parse_llm_intent_response(self, response: str) -> Dict[str, Any]:
211
+ """Parse LLM response for intent classification"""
212
+ try:
213
+ import json
214
+ import re
215
+
216
+ # Try to extract JSON from response
217
+ json_match = re.search(r'\{.*\}', response, re.DOTALL)
218
+ if json_match:
219
+ parsed = json.loads(json_match.group())
220
+ return parsed
221
+ except json.JSONDecodeError:
222
+ logger.warning(f"{self.agent_id} Failed to parse LLM intent JSON")
223
+
224
+ # Fallback parsing - extract intent from text
225
+ response_lower = response.lower()
226
+ primary_intent = "casual_conversation"
227
+ confidence = 0.7
228
+
229
+ # Simple pattern matching for intent extraction
230
+ if any(word in response_lower for word in ['question', 'ask', 'what', 'how', 'why']):
231
+ primary_intent = "information_request"
232
+ confidence = 0.8
233
+ elif any(word in response_lower for word in ['task', 'action', 'do', 'help', 'assist']):
234
+ primary_intent = "task_execution"
235
+ confidence = 0.8
236
+ elif any(word in response_lower for word in ['create', 'generate', 'write', 'make']):
237
+ primary_intent = "creative_generation"
238
+ confidence = 0.8
239
+
240
+ return {
241
+ "primary_intent": primary_intent,
242
+ "secondary_intents": [],
243
+ "confidence_scores": {primary_intent: confidence},
244
+ "reasoning_chain": [f"LLM response parsed: {response[:100]}..."],
245
+ "context_tags": ["llm_parsed"],
246
+ "method": "llm_parsed"
247
+ }
248
+
249
  def _get_fallback_intent(self, user_input: str, context: Dict[str, Any]) -> Dict[str, Any]:
250
  """Provide fallback intent when processing fails"""
251
  return {
src/agents/safety_agent.py CHANGED
@@ -103,25 +103,30 @@ class SafetyCheckAgent:
103
  async def _llm_based_safety_analysis(self, response: str, context: Dict[str, Any]) -> Dict[str, Any]:
104
  """Use LLM for sophisticated safety analysis"""
105
 
106
- safety_prompt = self._build_safety_prompt(response, context)
107
-
108
- # Simulate LLM analysis (replace with actual LLM call)
109
- simulated_analysis = {
110
- "toxicity_score": self._calculate_toxicity_score(response),
111
- "bias_indicators": self._detect_bias_indicators(response),
112
- "privacy_concerns": self._check_privacy_issues(response),
113
- "overall_safety_score": 0.85, # Simulated score
114
- "confidence_scores": {
115
- "toxicity": 0.7,
116
- "bias": 0.6,
117
- "safety": 0.8,
118
- "privacy": 0.9
119
- },
120
- "detected_issues": self._pattern_based_detection(response),
121
- "analysis_method": "llm_enhanced"
122
- }
 
 
 
123
 
124
- return simulated_analysis
 
 
125
 
126
  async def _pattern_based_safety_analysis(self, response: str) -> Dict[str, Any]:
127
  """Pattern-based safety analysis as fallback"""
@@ -308,6 +313,51 @@ class SafetyCheckAgent:
308
  # Return empty list on error
309
  return []
310
 
311
  def _get_fallback_result(self, response: str) -> Dict[str, Any]:
312
  """Fallback result when safety check fails"""
313
  return {
 
103
  async def _llm_based_safety_analysis(self, response: str, context: Dict[str, Any]) -> Dict[str, Any]:
104
  """Use LLM for sophisticated safety analysis"""
105
 
106
+ try:
107
+ safety_prompt = self._build_safety_prompt(response, context)
108
+
109
+ logger.info(f"{self.agent_id} calling LLM for safety analysis")
110
+ llm_response = await self.llm_router.route_inference(
111
+ task_type="safety_check",
112
+ prompt=safety_prompt,
113
+ max_tokens=800,
114
+ temperature=0.3
115
+ )
116
+
117
+ if llm_response and isinstance(llm_response, str) and len(llm_response.strip()) > 0:
118
+ # Parse LLM response
119
+ parsed_analysis = self._parse_llm_safety_response(llm_response)
120
+ parsed_analysis["processing_time"] = 0.6
121
+ parsed_analysis["method"] = "llm_enhanced"
122
+ return parsed_analysis
123
+
124
+ except Exception as e:
125
+ logger.error(f"{self.agent_id} LLM safety analysis failed: {e}")
126
 
127
+ # Fallback to pattern-based analysis if LLM fails
128
+ logger.info(f"{self.agent_id} falling back to pattern-based safety analysis")
129
+ return await self._pattern_based_safety_analysis(response)
130
 
131
  async def _pattern_based_safety_analysis(self, response: str) -> Dict[str, Any]:
132
  """Pattern-based safety analysis as fallback"""
 
313
  # Return empty list on error
314
  return []
315
 
316
+ def _parse_llm_safety_response(self, response: str) -> Dict[str, Any]:
317
+ """Parse LLM response for safety analysis"""
318
+ try:
319
+ import json
320
+ import re
321
+
322
+ # Try to extract JSON from response
323
+ json_match = re.search(r'\{.*\}', response, re.DOTALL)
324
+ if json_match:
325
+ parsed = json.loads(json_match.group())
326
+ return parsed
327
+ except json.JSONDecodeError:
328
+ logger.warning(f"{self.agent_id} Failed to parse LLM safety JSON")
329
+
330
+ # Fallback parsing - extract safety info from text
331
+ response_lower = response.lower()
332
+
333
+ # Simple safety analysis based on keywords
334
+ toxicity_score = 0.1
335
+ bias_score = 0.1
336
+ safety_score = 0.9
337
+
338
+ if any(word in response_lower for word in ['toxic', 'harmful', 'dangerous', 'inappropriate']):
339
+ toxicity_score = 0.8
340
+ safety_score = 0.3
341
+ elif any(word in response_lower for word in ['bias', 'discriminatory', 'unfair', 'prejudiced']):
342
+ bias_score = 0.7
343
+ safety_score = 0.5
344
+
345
+ return {
346
+ "toxicity_score": toxicity_score,
347
+ "bias_indicators": [],
348
+ "privacy_concerns": [],
349
+ "overall_safety_score": safety_score,
350
+ "confidence_scores": {
351
+ "toxicity": 0.7,
352
+ "bias": 0.6,
353
+ "safety": safety_score,
354
+ "privacy": 0.9
355
+ },
356
+ "detected_issues": [],
357
+ "analysis_method": "llm_parsed",
358
+ "llm_response": response[:200] + "..." if len(response) > 200 else response
359
+ }
360
+
361
  def _get_fallback_result(self, response: str) -> Dict[str, Any]:
362
  """Fallback result when safety check fails"""
363
  return {
src/agents/skills_identification_agent.py ADDED
@@ -0,0 +1,489 @@
1
+ """
2
+ Skills Identification Agent
3
+ Specialized in analyzing user prompts and identifying relevant expert skills based on market analysis
4
+ """
5
+
6
+ import logging
7
+ from typing import Dict, Any, List, Tuple
8
+ import json
9
+ import re
10
+
11
+ logger = logging.getLogger(__name__)
12
+
13
+ class SkillsIdentificationAgent:
14
+ def __init__(self, llm_router=None):
15
+ self.llm_router = llm_router
16
+ self.agent_id = "SKILLS_ID_001"
17
+ self.specialization = "Expert skills identification and market analysis"
18
+
19
+ # Market analysis data from Expert_Skills_Market_Analysis_2024.md
20
+ self.market_categories = {
21
+ "IT and Software Development": {
22
+ "market_share": 25,
23
+ "growth_rate": 25.0,
24
+ "specialized_skills": [
25
+ "Cybersecurity", "Artificial Intelligence & Machine Learning",
26
+ "Cloud Computing", "Data Analytics & Big Data",
27
+ "Software Engineering", "Blockchain Technology", "Quantum Computing"
28
+ ]
29
+ },
30
+ "Finance and Accounting": {
31
+ "market_share": 20,
32
+ "growth_rate": 6.8,
33
+ "specialized_skills": [
34
+ "Financial Analysis & Modeling", "Risk Management",
35
+ "Regulatory Compliance", "Fintech Solutions",
36
+ "ESG Reporting", "Tax Preparation", "Investment Analysis"
37
+ ]
38
+ },
39
+ "Healthcare and Medicine": {
40
+ "market_share": 15,
41
+ "growth_rate": 8.5,
42
+ "specialized_skills": [
43
+ "Telemedicine Training", "Advanced Nursing Certifications",
44
+ "Healthcare Informatics", "Clinical Research",
45
+ "Medical Device Technology", "Public Health", "Mental Health Services"
46
+ ]
47
+ },
48
+ "Education and Teaching": {
49
+ "market_share": 10,
50
+ "growth_rate": 3.2,
51
+ "specialized_skills": [
52
+ "Instructional Design", "Educational Technology Integration",
53
+ "Digital Literacy Training", "Special Education",
54
+ "Career Coaching", "E-learning Development", "STEM Education"
55
+ ]
56
+ },
57
+ "Engineering and Construction": {
58
+ "market_share": 10,
59
+ "growth_rate": 8.5,
60
+ "specialized_skills": [
61
+ "Automation Engineering", "Sustainable Design",
62
+ "Project Management", "Environmental Engineering",
63
+ "Advanced Manufacturing", "Infrastructure Development", "Quality Control"
64
+ ]
65
+ },
66
+ "Marketing and Sales": {
67
+ "market_share": 10,
68
+ "growth_rate": 7.1,
69
+ "specialized_skills": [
70
+ "Digital Marketing", "Data Analytics",
71
+ "Customer Relationship Management", "Content Marketing",
72
+ "E-commerce Management", "Market Research", "Sales Strategy"
73
+ ]
74
+ },
75
+ "Consulting and Strategy": {
76
+ "market_share": 5,
77
+ "growth_rate": 6.0,
78
+ "specialized_skills": [
79
+ "Business Analysis", "Change Management",
80
+ "Strategic Planning", "Operations Research",
81
+ "Industry-Specific Knowledge", "Problem-Solving", "Leadership Development"
82
+ ]
83
+ },
84
+ "Environmental and Sustainability": {
85
+ "market_share": 5,
86
+ "growth_rate": 15.0,
87
+ "specialized_skills": [
88
+ "Renewable Energy Technologies", "Environmental Policy",
89
+ "Sustainability Reporting", "Ecological Conservation",
90
+ "Carbon Management", "Green Technology", "Circular Economy"
91
+ ]
92
+ },
93
+ "Arts and Humanities": {
94
+ "market_share": 5,
95
+ "growth_rate": 2.5,
96
+ "specialized_skills": [
97
+ "Creative Thinking", "Cultural Analysis",
98
+ "Communication", "Digital Media",
99
+ "Language Services", "Historical Research", "Philosophical Analysis"
100
+ ]
101
+ }
102
+ }
103
+
104
+ # Skill classification categories for the classification_specialist model
105
+ self.skill_categories = [
106
+ "technical_programming", "data_analysis", "cybersecurity", "cloud_computing",
107
+ "financial_analysis", "risk_management", "regulatory_compliance", "fintech",
108
+ "healthcare_technology", "medical_research", "telemedicine", "nursing",
109
+ "educational_technology", "curriculum_design", "online_learning", "teaching",
110
+ "project_management", "engineering_design", "sustainable_engineering", "manufacturing",
111
+ "digital_marketing", "sales_strategy", "customer_management", "market_research",
112
+ "business_consulting", "strategic_planning", "change_management", "leadership",
113
+ "environmental_science", "sustainability", "renewable_energy", "green_technology",
114
+ "creative_design", "content_creation", "communication", "cultural_analysis"
115
+ ]
116
+
117
+ async def execute(self, user_input: str, context: Dict[str, Any] = None, **kwargs) -> Dict[str, Any]:
118
+ """
119
+ Execute skills identification with two-step process:
120
+ 1. Market analysis using reasoning_primary model
121
+ 2. Skill classification using classification_specialist model
122
+ """
123
+ try:
124
+ logger.info(f"{self.agent_id} processing user input: {user_input[:100]}...")
125
+
126
+ # Step 1: Market Analysis with reasoning_primary model
127
+ market_analysis = await self._analyze_market_relevance(user_input, context)
128
+
129
+ # Step 2: Skill Classification with classification_specialist model
130
+ skill_classification = await self._classify_skills(user_input, context)
131
+
132
+ # Combine results
133
+ result = {
134
+ "agent_id": self.agent_id,
135
+ "market_analysis": market_analysis,
136
+ "skill_classification": skill_classification,
137
+ "identified_skills": self._extract_high_probability_skills({"market_analysis": market_analysis, "skill_classification": skill_classification}),
138
+ "processing_time": market_analysis.get("processing_time", 0) + skill_classification.get("processing_time", 0),
139
+ "confidence_score": self._calculate_overall_confidence(market_analysis, skill_classification)
140
+ }
141
+
142
+ logger.info(f"{self.agent_id} completed with {len(result['identified_skills'])} skills identified")
143
+ return result
144
+
145
+ except Exception as e:
146
+ logger.error(f"{self.agent_id} error: {str(e)}")
147
+ return self._get_fallback_result(user_input, context)
148
+
149
+ async def _analyze_market_relevance(self, user_input: str, context: Dict[str, Any]) -> Dict[str, Any]:
150
+ """Use reasoning_primary model to analyze market relevance"""
151
+
152
+ if self.llm_router:
153
+ try:
154
+ # Build market analysis prompt
155
+ market_prompt = self._build_market_analysis_prompt(user_input)
156
+
157
+ logger.info(f"{self.agent_id} calling reasoning_primary for market analysis")
158
+ llm_response = await self.llm_router.route_inference(
159
+ task_type="general_reasoning",
160
+ prompt=market_prompt,
161
+ max_tokens=2000,
162
+ temperature=0.7
163
+ )
164
+
165
+ if llm_response and isinstance(llm_response, str) and len(llm_response.strip()) > 0:
166
+ # Parse LLM response
167
+ parsed_analysis = self._parse_market_analysis_response(llm_response)
168
+ parsed_analysis["processing_time"] = 0.8
169
+ parsed_analysis["method"] = "llm_enhanced"
170
+ return parsed_analysis
171
+
172
+ except Exception as e:
173
+ logger.error(f"{self.agent_id} LLM market analysis failed: {e}")
174
+
175
+ # Fallback to rule-based analysis
176
+ return self._rule_based_market_analysis(user_input)
177
+
178
+ async def _classify_skills(self, user_input: str, context: Dict[str, Any]) -> Dict[str, Any]:
179
+ """Use classification_specialist model to classify skills"""
180
+
181
+ if self.llm_router:
182
+ try:
183
+ # Build classification prompt
184
+ classification_prompt = self._build_classification_prompt(user_input)
185
+
186
+ logger.info(f"{self.agent_id} calling classification_specialist for skill classification")
187
+ llm_response = await self.llm_router.route_inference(
188
+ task_type="intent_classification",
189
+ prompt=classification_prompt,
190
+ max_tokens=512,
191
+ temperature=0.3
192
+ )
193
+
194
+ if llm_response and isinstance(llm_response, str) and len(llm_response.strip()) > 0:
195
+ # Parse classification response
196
+ parsed_classification = self._parse_classification_response(llm_response)
197
+ parsed_classification["processing_time"] = 0.3
198
+ parsed_classification["method"] = "llm_enhanced"
199
+ return parsed_classification
200
+
201
+ except Exception as e:
202
+ logger.error(f"{self.agent_id} LLM classification failed: {e}")
203
+
204
+ # Fallback to rule-based classification
205
+ return self._rule_based_skill_classification(user_input)
206
+
207
+ def _build_market_analysis_prompt(self, user_input: str) -> str:
208
+ """Build prompt for market analysis using reasoning_primary model"""
209
+
210
+ market_data = "\n".join([
211
+ f"- {category}: {data['market_share']}% market share, {data['growth_rate']}% growth rate"
212
+ for category, data in self.market_categories.items()
213
+ ])
214
+
215
+ specialized_skills = "\n".join([
216
+ f"- {category}: {', '.join(data['specialized_skills'][:3])}"
217
+ for category, data in self.market_categories.items()
218
+ ])
219
+
220
+ return f"""Analyze the following user input and identify the most relevant industry categories and specialized skills based on current market data.
221
+
222
+ User Input: "{user_input}"
223
+
224
+ Current Market Distribution:
225
+ {market_data}
226
+
227
+ Specialized Skills by Category (top 3 per category):
228
+ {specialized_skills}
229
+
230
+ Task:
231
+ 1. Identify which industry categories are most relevant to the user's input
232
+ 2. Select 1-3 specialized skills from each relevant category that best match the user's needs
233
+ 3. Provide market share percentages and growth rates for identified categories
234
+ 4. Explain your reasoning for each selection
235
+
236
+ Respond in JSON format:
237
+ {{
238
+ "relevant_categories": [
239
+ {{
240
+ "category": "category_name",
241
+ "market_share": percentage,
242
+ "growth_rate": percentage,
243
+ "relevance_score": 0.0-1.0,
244
+ "reasoning": "explanation"
245
+ }}
246
+ ],
247
+ "selected_skills": [
248
+ {{
249
+ "skill": "skill_name",
250
+ "category": "category_name",
251
+ "relevance_score": 0.0-1.0,
252
+ "reasoning": "explanation"
253
+ }}
254
+ ],
255
+ "overall_analysis": "summary of findings"
256
+ }}"""
257
+
258
+ def _build_classification_prompt(self, user_input: str) -> str:
259
+ """Build prompt for skill classification using classification_specialist model"""
260
+
261
+ skill_categories_str = ", ".join(self.skill_categories)
262
+
263
+ return f"""Classify the following user input into relevant skill categories. For each category, provide a probability score (0.0-1.0) indicating how likely the input relates to that skill.
264
+
265
+ User Input: "{user_input}"
266
+
267
+ Available Skill Categories: {skill_categories_str}
268
+
269
+ Task: Provide probability scores for each skill category that passes a 20% threshold.
270
+
271
+ Respond in JSON format:
272
+ {{
273
+ "skill_probabilities": {{
274
+ "category_name": probability_score,
275
+ ...
276
+ }},
277
+ "top_skills": [
278
+ {{
279
+ "skill": "category_name",
280
+ "probability": score,
281
+ "confidence": "high/medium/low"
282
+ }}
283
+ ],
284
+ "classification_reasoning": "explanation of classification decisions"
285
+ }}"""
286
+
287
+ def _parse_market_analysis_response(self, response: str) -> Dict[str, Any]:
288
+ """Parse LLM response for market analysis"""
289
+ try:
290
+ # Try to extract JSON from response
291
+ json_match = re.search(r'\{.*\}', response, re.DOTALL)
292
+ if json_match:
293
+ parsed = json.loads(json_match.group())
294
+ return parsed
295
+ except json.JSONDecodeError:
296
+ logger.warning(f"{self.agent_id} Failed to parse market analysis JSON")
297
+
298
+ # Fallback parsing
299
+ return {
300
+ "relevant_categories": [{"category": "General", "market_share": 10, "growth_rate": 5.0, "relevance_score": 0.7, "reasoning": "General analysis"}],
301
+ "selected_skills": [{"skill": "General Analysis", "category": "General", "relevance_score": 0.7, "reasoning": "Broad applicability"}],
302
+ "overall_analysis": "Market analysis completed with fallback parsing",
303
+ "method": "fallback_parsing"
304
+ }
305
+
306
+ def _parse_classification_response(self, response: str) -> Dict[str, Any]:
307
+ """Parse LLM response for skill classification"""
308
+ try:
309
+ # Try to extract JSON from response
310
+ json_match = re.search(r'\{.*\}', response, re.DOTALL)
311
+ if json_match:
312
+ parsed = json.loads(json_match.group())
313
+ return parsed
314
+ except json.JSONDecodeError:
315
+ logger.warning(f"{self.agent_id} Failed to parse classification JSON")
316
+
317
+ # Fallback parsing
318
+ return {
319
+ "skill_probabilities": {"general_analysis": 0.7},
320
+ "top_skills": [{"skill": "general_analysis", "probability": 0.7, "confidence": "medium"}],
321
+ "classification_reasoning": "Classification completed with fallback parsing",
322
+ "method": "fallback_parsing"
323
+ }
324
+
325
+ def _rule_based_market_analysis(self, user_input: str) -> Dict[str, Any]:
326
+ """Rule-based fallback for market analysis"""
327
+ user_input_lower = user_input.lower()
328
+
329
+ relevant_categories = []
330
+ selected_skills = []
331
+
332
+ # Pattern matching for different categories
333
+ patterns = {
334
+ "IT and Software Development": ["code", "programming", "software", "tech", "ai", "machine learning", "data", "cyber", "cloud"],
335
+ "Finance and Accounting": ["finance", "money", "investment", "banking", "accounting", "financial", "risk", "compliance"],
336
+ "Healthcare and Medicine": ["health", "medical", "doctor", "nurse", "patient", "clinical", "medicine", "healthcare"],
337
+ "Education and Teaching": ["teach", "education", "learn", "student", "school", "curriculum", "instruction"],
338
+ "Engineering and Construction": ["engineer", "construction", "build", "project", "manufacturing", "design"],
339
+ "Marketing and Sales": ["marketing", "sales", "customer", "advertising", "promotion", "brand"],
340
+ "Consulting and Strategy": ["consulting", "strategy", "business", "management", "planning"],
341
+ "Environmental and Sustainability": ["environment", "sustainable", "green", "renewable", "climate", "carbon"],
342
+ "Arts and Humanities": ["art", "creative", "culture", "humanities", "design", "communication"]
343
+ }
344
+
345
+ for category, keywords in patterns.items():
346
+ relevance_score = 0.0
347
+ for keyword in keywords:
348
+ if keyword in user_input_lower:
349
+ relevance_score += 0.2
350
+
351
+ if relevance_score > 0.0:
352
+ category_data = self.market_categories[category]
353
+ relevant_categories.append({
354
+ "category": category,
355
+ "market_share": category_data["market_share"],
356
+ "growth_rate": category_data["growth_rate"],
357
+ "relevance_score": min(1.0, relevance_score),
358
+ "reasoning": f"Matched keywords: {[k for k in keywords if k in user_input_lower]}"
359
+ })
360
+
361
+ # Add top skills from this category
362
+ for skill in category_data["specialized_skills"][:2]:
363
+ selected_skills.append({
364
+ "skill": skill,
365
+ "category": category,
366
+ "relevance_score": relevance_score * 0.8,
367
+ "reasoning": f"From {category} category"
368
+ })
369
+
370
+ return {
371
+ "relevant_categories": relevant_categories,
372
+ "selected_skills": selected_skills,
373
+ "overall_analysis": f"Rule-based analysis identified {len(relevant_categories)} relevant categories",
374
+ "processing_time": 0.1,
375
+ "method": "rule_based"
376
+ }
377
+
378
+ def _rule_based_skill_classification(self, user_input: str) -> Dict[str, Any]:
379
+ """Rule-based fallback for skill classification"""
380
+ user_input_lower = user_input.lower()
381
+
382
+ skill_probabilities = {}
383
+ top_skills = []
384
+
385
+ # Simple keyword matching for skill categories
386
+ skill_keywords = {
387
+ "technical_programming": ["code", "programming", "software", "development", "python", "java"],
388
+ "data_analysis": ["data", "analysis", "statistics", "analytics", "research"],
389
+ "cybersecurity": ["security", "cyber", "hack", "protection", "vulnerability"],
390
+ "financial_analysis": ["finance", "money", "investment", "financial", "economic"],
391
+ "healthcare_technology": ["health", "medical", "healthcare", "clinical", "patient"],
392
+ "educational_technology": ["education", "teach", "learn", "student", "curriculum"],
393
+ "project_management": ["project", "manage", "planning", "coordination", "leadership"],
394
+ "digital_marketing": ["marketing", "advertising", "promotion", "social media", "brand"],
395
+ "environmental_science": ["environment", "sustainable", "green", "climate", "carbon"],
396
+ "creative_design": ["design", "creative", "art", "visual", "graphic"]
397
+ }
398
+
399
+ for skill, keywords in skill_keywords.items():
400
+ probability = 0.0
401
+ for keyword in keywords:
402
+ if keyword in user_input_lower:
403
+ probability += 0.3
404
+
405
+ if probability > 0.2: # 20% threshold
406
+ skill_probabilities[skill] = min(1.0, probability)
407
+ top_skills.append({
408
+ "skill": skill,
409
+ "probability": skill_probabilities[skill],
410
+ "confidence": "high" if probability > 0.6 else "medium" if probability > 0.4 else "low"
411
+ })
412
+
413
+ return {
414
+ "skill_probabilities": skill_probabilities,
415
+ "top_skills": top_skills,
416
+ "classification_reasoning": f"Rule-based classification identified {len(top_skills)} relevant skills",
417
+ "processing_time": 0.05,
418
+ "method": "rule_based"
419
+ }
420
+
421
+ def _extract_high_probability_skills(self, classification: Dict[str, Any]) -> List[Dict[str, Any]]:
422
+ """Extract skills that pass the 20% probability threshold"""
423
+ high_prob_skills = []
424
+
425
+ # From market analysis
426
+ market_skills = classification.get("market_analysis", {}).get("selected_skills", [])
427
+ for skill in market_skills:
428
+ if skill.get("relevance_score", 0) > 0.2:
429
+ high_prob_skills.append({
430
+ "skill": skill["skill"],
431
+ "category": skill["category"],
432
+ "probability": skill["relevance_score"],
433
+ "source": "market_analysis"
434
+ })
435
+
436
+ # From skill classification
437
+ classification_skills = classification.get("skill_classification", {}).get("top_skills", [])
438
+ for skill in classification_skills:
439
+ if skill.get("probability", 0) > 0.2:
440
+ high_prob_skills.append({
441
+ "skill": skill["skill"],
442
+ "category": "classified",
443
+ "probability": skill["probability"],
444
+ "source": "skill_classification"
445
+ })
446
+
447
+ # Remove duplicates and sort by probability
448
+ unique_skills = {}
449
+ for skill in high_prob_skills:
450
+ skill_name = skill["skill"]
451
+ if skill_name not in unique_skills or skill["probability"] > unique_skills[skill_name]["probability"]:
452
+ unique_skills[skill_name] = skill
453
+
454
+ return sorted(unique_skills.values(), key=lambda x: x["probability"], reverse=True)
455
+
456
+ def _calculate_overall_confidence(self, market_analysis: Dict[str, Any], skill_classification: Dict[str, Any]) -> float:
457
+ """Calculate overall confidence score"""
458
+ market_confidence = len(market_analysis.get("relevant_categories", [])) * 0.1
459
+ classification_confidence = len(skill_classification.get("top_skills", [])) * 0.1
460
+
461
+ return min(1.0, market_confidence + classification_confidence + 0.3)
462
+
463
+ def _get_fallback_result(self, user_input: str, context: Dict[str, Any]) -> Dict[str, Any]:
464
+ """Provide fallback result when processing fails"""
465
+ return {
466
+ "agent_id": self.agent_id,
467
+ "market_analysis": {
468
+ "relevant_categories": [{"category": "General", "market_share": 10, "growth_rate": 5.0, "relevance_score": 0.5, "reasoning": "Fallback analysis"}],
469
+ "selected_skills": [{"skill": "General Analysis", "category": "General", "relevance_score": 0.5, "reasoning": "Fallback skill"}],
470
+ "overall_analysis": "Fallback analysis due to processing error",
471
+ "processing_time": 0.01,
472
+ "method": "fallback"
473
+ },
474
+ "skill_classification": {
475
+ "skill_probabilities": {"general_analysis": 0.5},
476
+ "top_skills": [{"skill": "general_analysis", "probability": 0.5, "confidence": "low"}],
477
+ "classification_reasoning": "Fallback classification due to processing error",
478
+ "processing_time": 0.01,
479
+ "method": "fallback"
480
+ },
481
+ "identified_skills": [{"skill": "General Analysis", "category": "General", "probability": 0.5, "source": "fallback"}],
482
+ "processing_time": 0.02,
483
+ "confidence_score": 0.3,
484
+ "error_handled": True
485
+ }
486
+
487
+ # Factory function for easy instantiation
488
+ def create_skills_identification_agent(llm_router=None):
489
+ return SkillsIdentificationAgent(llm_router)
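+
+
+ # Illustrative usage sketch (added for clarity; not part of the original agent code).
+ # With llm_router=None the agent exercises only its rule-based fallbacks.
+ if __name__ == "__main__":
+     import asyncio
+
+     demo_agent = create_skills_identification_agent(llm_router=None)
+     demo_result = asyncio.run(demo_agent.execute("Help me build a machine learning pipeline"))
+     print(demo_result["identified_skills"][:3])
+     print(f"confidence: {demo_result['confidence_score']:.2f}")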
src/config.py ADDED
@@ -0,0 +1,42 @@
1
+ # config.py
2
+ import os
3
+ from pydantic_settings import BaseSettings
4
+
5
+ class Settings(BaseSettings):
6
+ # HF Spaces specific settings
7
+ hf_token: str = os.getenv("HF_TOKEN", "")
8
+ hf_cache_dir: str = os.getenv("HF_HOME", "/tmp/huggingface")
9
+
10
+ # Model settings
11
+ default_model: str = "mistralai/Mistral-7B-Instruct-v0.2"
12
+ embedding_model: str = "sentence-transformers/all-MiniLM-L6-v2"
13
+ classification_model: str = "cardiffnlp/twitter-roberta-base-emotion"
14
+
15
+ # Performance settings
16
+ max_workers: int = int(os.getenv("MAX_WORKERS", "2"))
17
+ cache_ttl: int = int(os.getenv("CACHE_TTL", "3600"))
18
+
19
+ # Database settings
20
+ db_path: str = os.getenv("DB_PATH", "sessions.db")
21
+ faiss_index_path: str = os.getenv("FAISS_INDEX_PATH", "embeddings.faiss")
22
+
23
+ # Session settings
24
+ session_timeout: int = int(os.getenv("SESSION_TIMEOUT", "3600"))
25
+ max_session_size_mb: int = int(os.getenv("MAX_SESSION_SIZE_MB", "10"))
26
+
27
+ # Mobile optimization settings
28
+ mobile_max_tokens: int = int(os.getenv("MOBILE_MAX_TOKENS", "800"))
29
+ mobile_timeout: int = int(os.getenv("MOBILE_TIMEOUT", "15000"))
30
+
31
+ # Gradio settings
32
+ gradio_port: int = int(os.getenv("GRADIO_PORT", "7860"))
33
+ gradio_host: str = os.getenv("GRADIO_HOST", "0.0.0.0")
34
+
35
+ # Logging settings
36
+ log_level: str = os.getenv("LOG_LEVEL", "INFO")
37
+ log_format: str = os.getenv("LOG_FORMAT", "json")
38
+
39
+ class Config:
40
+ env_file = ".env"
41
+
42
+ settings = Settings()
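+
+ # Illustrative usage (assumes this module is imported as src.config, as app.py does in this commit):
+ #   from src.config import settings
+ #   print(settings.gradio_port, settings.log_level)
+ #
+ # Example .env values (placeholders; names mirror the os.getenv calls above):
+ #   HF_TOKEN=hf_xxxxxxxxxxxxxxxx
+ #   MAX_WORKERS=2
+ #   LOG_LEVEL=INFO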
src/context_manager.py ADDED
@@ -0,0 +1,246 @@
1
+ # context_manager.py
2
+ import sqlite3
3
+ import json
4
+ import logging
5
+ from datetime import datetime, timedelta
6
+
7
+ logger = logging.getLogger(__name__)
8
+
9
+ class EfficientContextManager:
10
+ def __init__(self):
11
+ self.session_cache = {} # In-memory for active sessions
12
+ self.cache_config = {
13
+ "max_session_size": 10, # MB per session
14
+ "ttl": 3600, # 1 hour
15
+ "compression": "gzip",
16
+ "eviction_policy": "LRU"
17
+ }
18
+ self.db_path = "sessions.db"
19
+ logger.info(f"Initializing ContextManager with DB path: {self.db_path}")
20
+ self._init_database()
21
+
22
+ def _init_database(self):
23
+ """Initialize database and create tables"""
24
+ try:
25
+ logger.info("Initializing database...")
26
+ conn = sqlite3.connect(self.db_path)
27
+ cursor = conn.cursor()
28
+
29
+ # Create sessions table if not exists
30
+ cursor.execute("""
31
+ CREATE TABLE IF NOT EXISTS sessions (
32
+ session_id TEXT PRIMARY KEY,
33
+ created_at TIMESTAMP,
34
+ last_activity TIMESTAMP,
35
+ context_data TEXT,
36
+ user_metadata TEXT
37
+ )
38
+ """)
39
+ logger.info("✓ Sessions table ready")
40
+
41
+ # Create interactions table
42
+ cursor.execute("""
43
+ CREATE TABLE IF NOT EXISTS interactions (
44
+ id INTEGER PRIMARY KEY AUTOINCREMENT,
45
+ session_id TEXT REFERENCES sessions(session_id),
46
+ user_input TEXT,
47
+ context_snapshot TEXT,
48
+ created_at TIMESTAMP,
49
+ FOREIGN KEY(session_id) REFERENCES sessions(session_id)
50
+ )
51
+ """)
52
+ logger.info("✓ Interactions table ready")
53
+
54
+ conn.commit()
55
+ conn.close()
56
+ logger.info("Database initialization complete")
57
+
58
+ except Exception as e:
59
+ logger.error(f"Database initialization error: {e}", exc_info=True)
60
+
61
+ async def manage_context(self, session_id: str, user_input: str) -> dict:
62
+ """
63
+ Efficient context management with multi-level caching
64
+ """
65
+ # Level 1: In-memory session cache
66
+ context = self._get_from_memory_cache(session_id)
67
+
68
+ if not context:
69
+ # Level 2: Database retrieval with embeddings
70
+ context = await self._retrieve_from_db(session_id, user_input)
71
+
72
+ # Cache warming
73
+ self._warm_memory_cache(session_id, context)
74
+
75
+ # Update context with new interaction
76
+ updated_context = self._update_context(context, user_input)
77
+
78
+ return self._optimize_context(updated_context)
79
+
80
+ def _optimize_context(self, context: dict) -> dict:
81
+ """
82
+ Optimize context for LLM consumption
83
+ """
84
+ # Keep the full context structure for LLM consumption
85
+ return {
86
+ "session_id": context.get("session_id"),
87
+ "interactions": context.get("interactions", []), # Keep full interaction history
88
+ "preferences": context.get("preferences", {}),
89
+ "active_tasks": context.get("active_tasks", []),
90
+ "essential_entities": self._extract_entities(context),
91
+ "conversation_summary": self._generate_summary(context),
92
+ "last_activity": context.get("last_activity")
93
+ }
94
+
95
+ def _get_from_memory_cache(self, session_id: str) -> dict:
96
+ """
97
+ Retrieve context from in-memory session cache
98
+ """
99
+ # TODO: Implement in-memory cache retrieval
100
+ return self.session_cache.get(session_id)
101
+
102
+ async def _retrieve_from_db(self, session_id: str, user_input: str) -> dict:
103
+ """
104
+ Retrieve context from database with semantic search
105
+ """
106
+ try:
107
+ conn = sqlite3.connect(self.db_path)
108
+ cursor = conn.cursor()
109
+
110
+ # Get session data
111
+ cursor.execute("""
112
+ SELECT context_data, user_metadata, last_activity
113
+ FROM sessions
114
+ WHERE session_id = ?
115
+ """, (session_id,))
116
+
117
+ row = cursor.fetchone()
118
+
119
+ if row:
120
+ context_data = json.loads(row[0]) if row[0] else {}
121
+ user_metadata = json.loads(row[1]) if row[1] else {}
122
+ last_activity = row[2]
123
+
124
+ # Get recent interactions
125
+ cursor.execute("""
126
+ SELECT user_input, context_snapshot, created_at
127
+ FROM interactions
128
+ WHERE session_id = ?
129
+ ORDER BY created_at DESC
130
+ LIMIT 10
131
+ """, (session_id,))
132
+
133
+ recent_interactions = []
134
+ for interaction_row in cursor.fetchall():
135
+ recent_interactions.append({
136
+ "user_input": interaction_row[0],
137
+ "context": json.loads(interaction_row[1]) if interaction_row[1] else {},
138
+ "timestamp": interaction_row[2]
139
+ })
140
+
141
+ context = {
142
+ "session_id": session_id,
143
+ "interactions": recent_interactions,
144
+ "preferences": user_metadata.get("preferences", {}),
145
+ "active_tasks": user_metadata.get("active_tasks", []),
146
+ "last_activity": last_activity
147
+ }
148
+
149
+ conn.close()
150
+ return context
151
+ else:
152
+ # Create new session
153
+ cursor.execute("""
154
+ INSERT INTO sessions (session_id, created_at, last_activity, context_data, user_metadata)
155
+ VALUES (?, ?, ?, ?, ?)
156
+ """, (session_id, datetime.now().isoformat(), datetime.now().isoformat(), "{}", "{}"))
157
+ conn.commit()
158
+ conn.close()
159
+
160
+ return {
161
+ "session_id": session_id,
162
+ "interactions": [],
163
+ "preferences": {},
164
+ "active_tasks": []
165
+ }
166
+
167
+ except Exception as e:
168
+ logger.error(f"Database retrieval error: {e}", exc_info=True)
169
+ # Fallback to empty context
170
+ return {
171
+ "session_id": session_id,
172
+ "interactions": [],
173
+ "preferences": {},
174
+ "active_tasks": []
175
+ }
176
+
177
+ def _warm_memory_cache(self, session_id: str, context: dict):
178
+ """
179
+ Warm the in-memory cache with retrieved context
180
+ """
181
+ # TODO: Implement cache warming with LRU eviction
182
+ self.session_cache[session_id] = context
183
+
184
+ def _update_context(self, context: dict, user_input: str, response: str = None) -> dict:
185
+ """
186
+ Update context with new user interaction and persist to database
187
+ """
188
+ try:
189
+ # Add new interaction to context
190
+ if "interactions" not in context:
191
+ context["interactions"] = []
192
+
193
+ # Create a clean interaction without circular references
194
+ new_interaction = {
195
+ "user_input": user_input,
196
+ "timestamp": datetime.now().isoformat(),
197
+ "response": response # Store the response text
198
+ }
199
+
200
+ # Keep only last 40 interactions in memory (2x the context window for stability)
201
+ context["interactions"] = [new_interaction] + context["interactions"][:39]
202
+
203
+ # Persist to database
204
+ conn = sqlite3.connect(self.db_path)
205
+ cursor = conn.cursor()
206
+
207
+ # Update session - use a clean context copy for JSON serialization
208
+ session_context = {
209
+ "interactions": context.get("interactions", []),
210
+ "preferences": context.get("preferences", {}),
211
+ "active_tasks": context.get("active_tasks", [])
212
+ }
213
+
214
+ cursor.execute("""
215
+ UPDATE sessions
216
+ SET last_activity = ?, context_data = ?
217
+ WHERE session_id = ?
218
+ """, (datetime.now().isoformat(), json.dumps(session_context), context["session_id"]))
219
+
220
+ # Insert interaction - store minimal context snapshot
221
+ cursor.execute("""
222
+ INSERT INTO interactions (session_id, user_input, context_snapshot, created_at)
223
+ VALUES (?, ?, ?, ?)
224
+ """, (context["session_id"], user_input, json.dumps(session_context), datetime.now().isoformat()))
225
+
226
+ conn.commit()
227
+ conn.close()
228
+
229
+ except Exception as e:
230
+ logger.error(f"Context update error: {e}", exc_info=True)
231
+
232
+ return context
233
+
234
+ def _extract_entities(self, context: dict) -> list:
235
+ """
236
+ Extract essential entities from context
237
+ """
238
+ # TODO: Implement entity extraction
239
+ return []
240
+
241
+ def _generate_summary(self, context: dict) -> str:
242
+ """
243
+ Generate conversation summary
244
+ """
245
+ # TODO: Implement summary generation
246
+ return ""
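+
+
+ # Illustrative usage sketch (added for clarity; not part of the original module).
+ # Creates/updates sessions.db in the working directory, as the class always does.
+ if __name__ == "__main__":
+     import asyncio
+
+     cm = EfficientContextManager()
+     ctx = asyncio.run(cm.manage_context("demo-session", "Hello, remember this message"))
+     print(len(ctx["interactions"]), "interaction(s) in context")
+     print("summary:", ctx["conversation_summary"] or "(none yet)")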
src/llm_router.py ADDED
@@ -0,0 +1,144 @@
1
+ # llm_router.py
2
+ import logging
3
+ from .models_config import LLM_CONFIG
4
+
5
+ logger = logging.getLogger(__name__)
6
+
7
+ class LLMRouter:
8
+ def __init__(self, hf_token):
9
+ self.hf_token = hf_token
10
+ self.health_status = {}
11
+ logger.info("LLMRouter initialized")
12
+ if hf_token:
13
+ logger.info("HF token available")
14
+ else:
15
+ logger.warning("No HF token provided")
16
+
17
+ async def route_inference(self, task_type: str, prompt: str, **kwargs):
18
+ """
19
+ Smart routing based on task specialization
20
+ """
21
+ logger.info(f"Routing inference for task: {task_type}")
22
+ model_config = self._select_model(task_type)
23
+ logger.info(f"Selected model: {model_config['model_id']}")
24
+
25
+ # Health check and fallback logic
26
+ if not await self._is_model_healthy(model_config["model_id"]):
27
+ logger.warning(f"Model unhealthy, using fallback")
28
+ model_config = self._get_fallback_model(task_type)
29
+ logger.info(f"Fallback model: {model_config['model_id']}")
30
+
31
+ result = await self._call_hf_endpoint(model_config, prompt, **kwargs)
32
+ logger.info(f"Inference complete for {task_type}")
33
+ return result
34
+
35
+ def _select_model(self, task_type: str) -> dict:
36
+ model_map = {
37
+ "intent_classification": LLM_CONFIG["models"]["classification_specialist"],
38
+ "embedding_generation": LLM_CONFIG["models"]["embedding_specialist"],
39
+ "safety_check": LLM_CONFIG["models"]["safety_checker"],
40
+ "general_reasoning": LLM_CONFIG["models"]["reasoning_primary"],
41
+ "response_synthesis": LLM_CONFIG["models"]["reasoning_primary"]
42
+ }
43
+ return model_map.get(task_type, LLM_CONFIG["models"]["reasoning_primary"])
44
+
45
+ async def _is_model_healthy(self, model_id: str) -> bool:
46
+ """
47
+ Check if the model is healthy and available
48
+ Mark models as healthy by default - actual availability checked at API call time
49
+ """
50
+ # Check cached health status
51
+ if model_id in self.health_status:
52
+ return self.health_status[model_id]
53
+
54
+ # All models marked healthy initially - real check happens during API call
55
+ self.health_status[model_id] = True
56
+ return True
57
+
58
+ def _get_fallback_model(self, task_type: str) -> dict:
59
+ """
60
+ Get fallback model configuration for the task type
61
+ """
62
+ # Fallback mapping
63
+ fallback_map = {
64
+ "intent_classification": LLM_CONFIG["models"]["reasoning_primary"],
65
+ "embedding_generation": LLM_CONFIG["models"]["embedding_specialist"],
66
+ "safety_check": LLM_CONFIG["models"]["reasoning_primary"],
67
+ "general_reasoning": LLM_CONFIG["models"]["reasoning_primary"],
68
+ "response_synthesis": LLM_CONFIG["models"]["reasoning_primary"]
69
+ }
70
+ return fallback_map.get(task_type, LLM_CONFIG["models"]["reasoning_primary"])
71
+
72
+ async def _call_hf_endpoint(self, model_config: dict, prompt: str, **kwargs):
73
+ """
74
+ Make actual call to Hugging Face Chat Completions API
75
+ Uses the correct chat completions protocol
76
+ """
77
+ try:
78
+ import requests
79
+
80
+ model_id = model_config["model_id"]
81
+
82
+ # Use the chat completions endpoint
83
+ api_url = "https://router.huggingface.co/v1/chat/completions"
84
+
85
+ logger.info(f"Calling HF Chat Completions API for model: {model_id}")
86
+ logger.debug(f"Prompt length: {len(prompt)}")
87
+
88
+ headers = {
89
+ "Authorization": f"Bearer {self.hf_token}",
90
+ "Content-Type": "application/json"
91
+ }
92
+
93
+ # Prepare payload in chat completions format
94
+ # Extract the actual question from the prompt if it's in a structured format
95
+ user_message = prompt if "User Question:" not in prompt else prompt.split("User Question:")[1].split("\n")[0].strip()
96
+
97
+ payload = {
98
+ "model": f"{model_id}:together", # Route via the Together provider on the HF router; assumes the selected model is served there
99
+ "messages": [
100
+ {
101
+ "role": "user",
102
+ "content": user_message
103
+ }
104
+ ],
105
+ "max_tokens": kwargs.get("max_tokens", 2000),
106
+ "temperature": kwargs.get("temperature", 0.7),
107
+ "top_p": kwargs.get("top_p", 0.95)
108
+ }
109
+
110
+ # Make the API call
111
+ response = requests.post(api_url, json=payload, headers=headers, timeout=60)
112
+
113
+ if response.status_code == 200:
114
+ result = response.json()
115
+ # Handle chat completions response format
116
+ if "choices" in result and len(result["choices"]) > 0:
117
+ message = result["choices"][0].get("message", {})
118
+ generated_text = message.get("content", "")
119
+
120
+ # Treat empty or non-string content as a failure so callers can apply their own fallbacks
121
+ if not generated_text or not isinstance(generated_text, str):
122
+ logger.warning("Empty or invalid response content from HF API")
123
+ return None
124
+
125
+ logger.info(f"HF API returned response (length: {len(generated_text)})")
126
+ return generated_text
127
+ else:
128
+ logger.error(f"Unexpected response format: {result}")
129
+ return None
130
+ elif response.status_code == 503:
131
+ # Model is still loading (503): retry once with the fallback, guarding against recursing into the same model
132
+ logger.warning(f"Model {model_id} loading (503), trying fallback")
133
+ fallback_config = self._get_fallback_model("response_synthesis")
134
+ return await self._call_hf_endpoint(fallback_config, prompt, **kwargs) if fallback_config["model_id"] != model_id else None
135
+ else:
136
+ logger.error(f"HF API error: {response.status_code} - {response.text}")
137
+ return None
138
+
139
+ except ImportError:
140
+ logger.warning("requests library not available, API call failed")
141
+ return None
142
+ except Exception as e:
143
+ logger.error(f"Error calling HF endpoint: {e}", exc_info=True)
144
+ return None
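Taken together, the router exposes a single async entry point, `route_inference`, which picks a model for the task, checks health, and falls back on failure. A minimal usage sketch, assuming an `HF_TOKEN` environment variable and a package layout where the module is importable as `src.llm_router`:

```python
# Illustrative usage sketch for LLMRouter; the HF_TOKEN env var and import path are assumptions.
import asyncio
import os

from src.llm_router import LLMRouter

async def main():
    router = LLMRouter(hf_token=os.environ.get("HF_TOKEN"))
    answer = await router.route_inference(
        task_type="response_synthesis",
        prompt="User Question: What is retrieval-augmented generation?\n",
        max_tokens=512,
        temperature=0.3,
    )
    # route_inference returns the generated text, or None if the call (and fallback) failed
    print(answer or "No response - check the token, model availability, or network")

if __name__ == "__main__":
    asyncio.run(main())
```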
src/mobile_handlers.py ADDED
@@ -0,0 +1,169 @@
1
+ # mobile_handlers.py
2
+ import gradio as gr
3
+
4
+ class MobileUXHandlers:
5
+ def __init__(self, orchestrator):
6
+ self.orchestrator = orchestrator
7
+ self.mobile_state = {}
8
+
9
+ async def handle_mobile_submit(self, message, chat_history, session_id,
10
+ show_reasoning, show_agent_trace, request: gr.Request):
11
+ """
12
+ Mobile-optimized submission handler with enhanced UX
13
+ """
14
+ # Get mobile device info
15
+ user_agent = request.headers.get("user-agent", "").lower()
16
+ is_mobile = any(device in user_agent for device in ['mobile', 'android', 'iphone'])
17
+
18
+ # Mobile-specific optimizations
19
+ # The mobile path streams partial updates via yield, so this handler must also be
+ # an async generator; awaiting an async generator would raise a TypeError.
+ if is_mobile:
+ async for update in self._mobile_optimized_processing(
+ message, chat_history, session_id, show_reasoning, show_agent_trace
+ ):
+ yield update
+ else:
+ yield await self._desktop_processing(
+ message, chat_history, session_id, show_reasoning, show_agent_trace
+ )
27
+
28
+ async def _mobile_optimized_processing(self, message, chat_history, session_id,
29
+ show_reasoning, show_agent_trace):
30
+ """
31
+ Mobile-specific processing with enhanced UX feedback
32
+ """
33
+ try:
34
+ # Immediate feedback for mobile users
35
+ yield {
36
+ "chatbot": chat_history + [[message, "Thinking..."]],
37
+ "message_input": "",
38
+ "reasoning_display": {"status": "processing"},
39
+ "performance_display": {"status": "processing"}
40
+ }
41
+
42
+ # Process with mobile-optimized parameters
43
+ result = await self.orchestrator.process_request(
44
+ session_id=session_id,
45
+ user_input=message,
46
+ mobile_optimized=True, # Special flag for mobile
47
+ max_tokens=800 # Shorter responses for mobile
48
+ )
49
+
50
+ # Format for mobile display
51
+ formatted_response = self._format_for_mobile(
52
+ result['final_response'],
53
+ show_reasoning and result.get('metadata', {}).get('reasoning_chain'),
54
+ show_agent_trace and result.get('agent_trace')
55
+ )
56
+
57
+ # Update chat history
58
+ updated_history = chat_history + [[message, formatted_response]]
59
+
60
+ yield {
61
+ "chatbot": updated_history,
62
+ "message_input": "",
63
+ "reasoning_display": result.get('metadata', {}).get('reasoning_chain', {}),
64
+ "performance_display": result.get('performance_metrics', {})
65
+ }
66
+
67
+ except Exception as e:
68
+ # Mobile-friendly error handling
69
+ error_response = self._get_mobile_friendly_error(e)
70
+ yield {
71
+ "chatbot": chat_history + [[message, error_response]],
72
+ "message_input": message, # Keep message for retry
73
+ "reasoning_display": {"error": "Processing failed"},
74
+ "performance_display": {"error": str(e)}
75
+ }
76
+
77
+ def _format_for_mobile(self, response, reasoning_chain, agent_trace):
78
+ """
79
+ Format response for optimal mobile readability
80
+ """
81
+ # Split long responses for mobile
82
+ if len(response) > 400:
83
+ paragraphs = self._split_into_paragraphs(response, max_length=300)
84
+ response = "\n\n".join(paragraphs)
85
+
86
+ # Add mobile-optimized formatting
87
+ formatted = f"""
88
+ <div class="mobile-response">
89
+ {response}
90
+ </div>
91
+ """
92
+
93
+ # Add reasoning if requested
94
+ if reasoning_chain:
95
+ # Handle both old and new reasoning chain formats
96
+ if isinstance(reasoning_chain, dict):
97
+ # New enhanced format - extract key information
98
+ chain_of_thought = reasoning_chain.get('chain_of_thought', {})
99
+ if chain_of_thought:
100
+ first_step = list(chain_of_thought.values())[0] if chain_of_thought else {}
101
+ hypothesis = first_step.get('hypothesis', 'Processing...')
102
+ reasoning_text = f"Hypothesis: {hypothesis}"
103
+ else:
104
+ reasoning_text = "Enhanced reasoning chain available"
105
+ else:
106
+ # Old format - direct string
107
+ reasoning_text = str(reasoning_chain)[:200]
108
+
109
+ formatted += f"""
110
+ <div class="reasoning-mobile" style="margin-top: 15px; padding: 10px; background: #f5f5f5; border-radius: 8px; font-size: 14px;">
111
+ <strong>Reasoning:</strong> {reasoning_text}...
112
+ </div>
113
+ """
114
+
115
+ return formatted
116
+
117
+ def _get_mobile_friendly_error(self, error):
118
+ """
119
+ User-friendly error messages for mobile
120
+ """
121
+ error_messages = {
122
+ "timeout": "⏱️ Taking longer than expected. Please try a simpler question.",
123
+ "network": "📡 Connection issue. Check your internet and try again.",
124
+ "rate_limit": "🚦 Too many requests. Please wait a moment.",
125
+ "default": "❌ Something went wrong. Please try again."
126
+ }
127
+
128
+ error_type = "default"
129
+ if "timeout" in str(error).lower():
130
+ error_type = "timeout"
131
+ elif "network" in str(error).lower() or "connection" in str(error).lower():
132
+ error_type = "network"
133
+ elif "rate" in str(error).lower():
134
+ error_type = "rate_limit"
135
+
136
+ return error_messages[error_type]
137
+
138
+ async def _desktop_processing(self, message, chat_history, session_id,
139
+ show_reasoning, show_agent_trace):
140
+ """
141
+ Desktop processing without mobile optimizations
142
+ """
143
+ # TODO: Implement desktop-specific processing
144
+ return {
145
+ "chatbot": chat_history,
146
+ "message_input": "",
147
+ "reasoning_display": {},
148
+ "performance_display": {}
149
+ }
150
+
151
+ def _split_into_paragraphs(self, text, max_length=300):
152
+ """
153
+ Split text into mobile-friendly paragraphs
154
+ """
155
+ # TODO: Implement intelligent paragraph splitting
156
+ words = text.split()
157
+ paragraphs = []
158
+ current_para = []
159
+
160
+ for word in words:
161
+ current_para.append(word)
162
+ if len(' '.join(current_para)) > max_length:
163
+ paragraphs.append(' '.join(current_para[:-1]))
164
+ current_para = [current_para[-1]]
165
+
166
+ if current_para:
167
+ paragraphs.append(' '.join(current_para))
168
+
169
+ return paragraphs
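`_split_into_paragraphs` above packs words until a character limit, which can break sentences mid-thought. A sentence-boundary variant it could be swapped for (the standalone function form and the regex heuristic are assumptions, not part of this commit):

```python
# Illustrative alternative splitter: pack whole sentences into paragraphs of
# at most max_length characters, using a simple punctuation-based boundary regex.
import re

def split_into_paragraphs_by_sentence(text: str, max_length: int = 300) -> list:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    paragraphs, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_length or not current:
            current = candidate
        else:
            paragraphs.append(current)
            current = sentence
    if current:
        paragraphs.append(current)
    return paragraphs
```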
src/models_config.py ADDED
@@ -0,0 +1,39 @@
1
+ # models_config.py
2
+ LLM_CONFIG = {
3
+ "primary_provider": "huggingface",
4
+ "models": {
5
+ "reasoning_primary": {
6
+ "model_id": "Qwen/Qwen2.5-7B-Instruct", # High-quality instruct model
7
+ "task": "general_reasoning",
8
+ "max_tokens": 2000,
9
+ "temperature": 0.7,
10
+ "cost_per_token": 0.000015,
11
+ "fallback": "gpt2" # Small, widely available fallback model
12
+ },
13
+ "embedding_specialist": {
14
+ "model_id": "sentence-transformers/all-MiniLM-L6-v2",
15
+ "task": "embeddings",
16
+ "vector_dimensions": 384,
17
+ "purpose": "semantic_similarity",
18
+ "cost_advantage": "90%_cheaper_than_primary"
19
+ },
20
+ "classification_specialist": {
21
+ "model_id": "cardiffnlp/twitter-roberta-base-emotion",
22
+ "task": "intent_classification",
23
+ "max_length": 512,
24
+ "specialization": "fast_inference",
25
+ "latency_target": "<100ms"
26
+ },
27
+ "safety_checker": {
28
+ "model_id": "unitary/unbiased-toxic-roberta",
29
+ "task": "content_moderation",
30
+ "confidence_threshold": 0.85,
31
+ "purpose": "bias_detection"
32
+ }
33
+ },
34
+ "routing_logic": {
35
+ "strategy": "task_based_routing",
36
+ "fallback_chain": ["primary", "fallback", "degraded_mode"],
37
+ "load_balancing": "round_robin_with_health_check"
38
+ }
39
+ }
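Several modules index into `LLM_CONFIG["models"]` by key (the router's task map and fallback map, for example), so a renamed entry would only fail at request time. A small startup check, sketched under the same `src.models_config` import assumption:

```python
# Illustrative startup check: verify every model entry the router expects is present.
from src.models_config import LLM_CONFIG

REQUIRED_MODEL_KEYS = (
    "reasoning_primary",
    "embedding_specialist",
    "classification_specialist",
    "safety_checker",
)

def validate_llm_config(config: dict = LLM_CONFIG) -> None:
    models = config.get("models", {})
    missing = [key for key in REQUIRED_MODEL_KEYS if key not in models]
    if missing:
        raise ValueError(f"LLM_CONFIG is missing model entries: {missing}")
    for key, entry in models.items():
        if "model_id" not in entry:
            raise ValueError(f"Model entry '{key}' has no model_id")

validate_llm_config()
```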
src/orchestrator_engine.py ADDED
@@ -0,0 +1,673 @@
1
+ # orchestrator_engine.py
2
+ import uuid
3
+ import logging
4
+ import time
5
+ from datetime import datetime
6
+
7
+ logger = logging.getLogger(__name__)
8
+
9
+ class MVPOrchestrator:
10
+ def __init__(self, llm_router, context_manager, agents):
11
+ self.llm_router = llm_router
12
+ self.context_manager = context_manager
13
+ self.agents = agents
14
+ self.execution_trace = []
15
+ logger.info("MVPOrchestrator initialized")
16
+
17
+ async def process_request(self, session_id: str, user_input: str, **kwargs) -> dict: # **kwargs absorbs optional flags (e.g. mobile_optimized, max_tokens) passed by the UI handlers
18
+ """
19
+ Main orchestration flow: context management, intent recognition, skills identification, execution planning, agent execution, response synthesis, and safety check, with an explicit reasoning chain recorded at each step
20
+ """
21
+ logger.info(f"Processing request for session {session_id}")
22
+ logger.info(f"User input: {user_input[:100]}")
23
+
24
+ # Clear previous trace for new request
25
+ self.execution_trace = []
26
+ start_time = time.time()
27
+
28
+ # Initialize enhanced reasoning chain
29
+ reasoning_chain = {
30
+ "chain_of_thought": {},
31
+ "alternative_paths": [],
32
+ "uncertainty_areas": [],
33
+ "evidence_sources": [],
34
+ "confidence_calibration": {}
35
+ }
36
+
37
+ try:
38
+ # Step 1: Generate unique interaction ID
39
+ interaction_id = self._generate_interaction_id(session_id)
40
+ logger.info(f"Generated interaction ID: {interaction_id}")
41
+
42
+ # Step 2: Context management with reasoning
43
+ logger.info("Step 2: Managing context...")
44
+ context = await self.context_manager.manage_context(session_id, user_input)
45
+ logger.info(f"Context retrieved: {len(context.get('interactions', []))} interactions")
46
+
47
+ # Add context analysis to reasoning chain
48
+ reasoning_chain["chain_of_thought"]["step_1"] = {
49
+ "hypothesis": f"User is asking about: '{self._extract_main_topic(user_input)}'",
50
+ "evidence": [
51
+ f"Previous interactions: {len(context.get('interactions', []))}",
52
+ f"Session duration: {self._calculate_session_duration(context)}",
53
+ f"Topic continuity: {self._analyze_topic_continuity(context, user_input)}",
54
+ f"Query keywords: {self._extract_keywords(user_input)}"
55
+ ],
56
+ "confidence": 0.85,
57
+ "reasoning": f"Context analysis shows user is focused on {self._extract_main_topic(user_input)} with {len(context.get('interactions', []))} previous interactions"
58
+ }
59
+
60
+ # Step 3: Intent recognition with enhanced CoT
61
+ logger.info("Step 3: Recognizing intent...")
62
+ self.execution_trace.append({
63
+ "step": "intent_recognition",
64
+ "agent": "intent_recognition",
65
+ "status": "executing"
66
+ })
67
+ intent_result = await self.agents['intent_recognition'].execute(
68
+ user_input=user_input,
69
+ context=context
70
+ )
71
+ self.execution_trace[-1].update({
72
+ "status": "completed",
73
+ "result": {"primary_intent": intent_result.get('primary_intent', 'unknown')}
74
+ })
75
+ logger.info(f"Intent detected: {intent_result.get('primary_intent', 'unknown')}")
76
+
77
+ # Step 3.5: Skills Identification
78
+ logger.info("Step 3.5: Identifying relevant skills...")
79
+ self.execution_trace.append({
80
+ "step": "skills_identification",
81
+ "agent": "skills_identification",
82
+ "status": "executing"
83
+ })
84
+ skills_result = await self.agents['skills_identification'].execute(
85
+ user_input=user_input,
86
+ context=context
87
+ )
88
+ self.execution_trace[-1].update({
89
+ "status": "completed",
90
+ "result": {"skills_count": len(skills_result.get('identified_skills', []))}
91
+ })
92
+ logger.info(f"Skills identified: {len(skills_result.get('identified_skills', []))} skills")
93
+
94
+ # Add skills reasoning to chain
95
+ reasoning_chain["chain_of_thought"]["step_2_5"] = {
96
+ "hypothesis": f"User input relates to {len(skills_result.get('identified_skills', []))} expert skills",
97
+ "evidence": [
98
+ f"Market analysis: {skills_result.get('market_analysis', {}).get('overall_analysis', 'N/A')}",
99
+ f"Skill classification: {skills_result.get('skill_classification', {}).get('classification_reasoning', 'N/A')}",
100
+ f"High-probability skills: {[s.get('skill', '') for s in skills_result.get('identified_skills', [])[:3]]}",
101
+ f"Confidence score: {skills_result.get('confidence_score', 0.5)}"
102
+ ],
103
+ "confidence": skills_result.get('confidence_score', 0.5),
104
+ "reasoning": f"Skills identification completed for topic '{self._extract_main_topic(user_input)}' with {len(skills_result.get('identified_skills', []))} relevant skills"
105
+ }
106
+
107
+ # Add intent reasoning to chain
108
+ reasoning_chain["chain_of_thought"]["step_2"] = {
109
+ "hypothesis": f"User intent is '{intent_result.get('primary_intent', 'unknown')}' for topic '{self._extract_main_topic(user_input)}'",
110
+ "evidence": [
111
+ f"Pattern analysis: {self._extract_pattern_evidence(user_input)}",
112
+ f"Confidence scores: {intent_result.get('confidence_scores', {})}",
113
+ f"Secondary intents: {intent_result.get('secondary_intents', [])}",
114
+ f"Query complexity: {self._assess_query_complexity(user_input)}"
115
+ ],
116
+ "confidence": intent_result.get('confidence_scores', {}).get(intent_result.get('primary_intent', 'unknown'), 0.7),
117
+ "reasoning": f"Intent '{intent_result.get('primary_intent', 'unknown')}' detected for {self._extract_main_topic(user_input)} based on linguistic patterns and context"
118
+ }
119
+
120
+ # Step 4: Agent execution planning with reasoning
121
+ logger.info("Step 4: Creating execution plan...")
122
+ execution_plan = await self._create_execution_plan(intent_result, context)
123
+
124
+ # Add execution planning reasoning
125
+ reasoning_chain["chain_of_thought"]["step_3"] = {
126
+ "hypothesis": f"Optimal approach for '{intent_result.get('primary_intent', 'unknown')}' intent on '{self._extract_main_topic(user_input)}'",
127
+ "evidence": [
128
+ f"Intent complexity: {self._assess_intent_complexity(intent_result)}",
129
+ f"Required agents: {execution_plan.get('agents_to_execute', [])}",
130
+ f"Execution strategy: {execution_plan.get('execution_order', 'sequential')}",
131
+ f"Response scope: {self._determine_response_scope(user_input)}"
132
+ ],
133
+ "confidence": 0.80,
134
+ "reasoning": f"Agent selection optimized for {intent_result.get('primary_intent', 'unknown')} intent regarding {self._extract_main_topic(user_input)}"
135
+ }
136
+
137
+ # Step 5: Parallel agent execution
138
+ logger.info("Step 5: Executing agents...")
139
+ agent_results = await self._execute_agents(execution_plan, user_input, context)
140
+ logger.info(f"Agent execution complete: {len(agent_results)} results")
141
+
142
+ # Step 6: Response synthesis with reasoning
143
+ logger.info("Step 6: Synthesizing response...")
144
+ self.execution_trace.append({
145
+ "step": "response_synthesis",
146
+ "agent": "response_synthesis",
147
+ "status": "executing"
148
+ })
149
+ final_response = await self.agents['response_synthesis'].execute(
150
+ agent_outputs=agent_results,
151
+ user_input=user_input,
152
+ context=context
153
+ )
154
+ self.execution_trace[-1].update({
155
+ "status": "completed",
156
+ "result": {"synthesis_method": final_response.get('synthesis_method', 'unknown')}
157
+ })
158
+
159
+ # Add synthesis reasoning
160
+ reasoning_chain["chain_of_thought"]["step_4"] = {
161
+ "hypothesis": f"Response synthesis for '{self._extract_main_topic(user_input)}' using '{final_response.get('synthesis_method', 'unknown')}' method",
162
+ "evidence": [
163
+ f"Synthesis quality: {final_response.get('coherence_score', 0.7)}",
164
+ f"Source integration: {len(final_response.get('source_references', []))} sources",
165
+ f"Response length: {len(str(final_response.get('final_response', '')))} characters",
166
+ f"Content relevance: {self._assess_content_relevance(user_input, final_response)}"
167
+ ],
168
+ "confidence": final_response.get('coherence_score', 0.7),
169
+ "reasoning": f"Multi-source synthesis for {self._extract_main_topic(user_input)} using {final_response.get('synthesis_method', 'unknown')} approach"
170
+ }
171
+
172
+ # Step 7: Safety and bias check with reasoning
173
+ logger.info("Step 7: Safety check...")
174
+ self.execution_trace.append({
175
+ "step": "safety_check",
176
+ "agent": "safety_check",
177
+ "status": "executing"
178
+ })
179
+ safety_checked = await self.agents['safety_check'].execute(
180
+ response=final_response,
181
+ context=context
182
+ )
183
+ self.execution_trace[-1].update({
184
+ "status": "completed",
185
+ "result": {"warnings": safety_checked.get('warnings', [])}
186
+ })
187
+
188
+ # Add safety reasoning
189
+ reasoning_chain["chain_of_thought"]["step_5"] = {
190
+ "hypothesis": f"Safety validation for response about '{self._extract_main_topic(user_input)}'",
191
+ "evidence": [
192
+ f"Safety score: {safety_checked.get('safety_analysis', {}).get('overall_safety_score', 0.8)}",
193
+ f"Warnings generated: {len(safety_checked.get('warnings', []))}",
194
+ f"Analysis method: {safety_checked.get('safety_analysis', {}).get('analysis_method', 'unknown')}",
195
+ f"Content appropriateness: {self._assess_content_appropriateness(user_input, safety_checked)}"
196
+ ],
197
+ "confidence": safety_checked.get('safety_analysis', {}).get('overall_safety_score', 0.8),
198
+ "reasoning": f"Safety analysis for {self._extract_main_topic(user_input)} content with non-blocking warning system"
199
+ }
200
+
201
+ # Generate alternative paths and uncertainty analysis
202
+ reasoning_chain["alternative_paths"] = self._generate_alternative_paths(intent_result, user_input)
203
+ reasoning_chain["uncertainty_areas"] = self._identify_uncertainty_areas(intent_result, final_response, safety_checked)
204
+ reasoning_chain["evidence_sources"] = self._extract_evidence_sources(intent_result, final_response, context)
205
+ reasoning_chain["confidence_calibration"] = self._calibrate_confidence_scores(reasoning_chain)
206
+
207
+ processing_time = time.time() - start_time
208
+
209
+ result = self._format_final_output(safety_checked, interaction_id, {
210
+ 'intent': intent_result.get('primary_intent', 'unknown'),
211
+ 'execution_plan': execution_plan,
212
+ 'processing_steps': [
213
+ 'Context management',
214
+ 'Intent recognition',
215
+ 'Skills identification',
216
+ 'Execution planning',
217
+ 'Agent execution',
218
+ 'Response synthesis',
219
+ 'Safety check'
220
+ ],
221
+ 'processing_time': processing_time,
222
+ 'agents_used': list(self.agents.keys()),
223
+ 'intent_result': intent_result,
224
+ 'skills_result': skills_result,
225
+ 'synthesis_result': final_response,
226
+ 'reasoning_chain': reasoning_chain
227
+ })
228
+
229
+ # Update context with the final response for future context retrieval
230
+ response_text = str(result.get('response', ''))
231
+ if response_text:
232
+ self.context_manager._update_context(context, user_input, response_text)
233
+
234
+ logger.info(f"Request processing complete. Response length: {len(response_text)}")
235
+ return result
236
+
237
+ except Exception as e:
238
+ logger.error(f"Error in process_request: {e}", exc_info=True)
239
+ processing_time = time.time() - start_time
240
+ return {
241
+ "response": f"Error processing request: {str(e)}",
242
+ "error": str(e),
243
+ "interaction_id": str(uuid.uuid4())[:8],
244
+ "agent_trace": [],
245
+ "timestamp": datetime.now().isoformat(),
246
+ "metadata": {
247
+ "agents_used": [],
248
+ "processing_time": processing_time,
249
+ "token_count": 0,
250
+ "warnings": []
251
+ }
252
+ }
253
+
254
+ def _generate_interaction_id(self, session_id: str) -> str:
255
+ """
256
+ Generate unique interaction identifier
257
+ """
258
+ # Combine the session id, a short UUID, and a Unix timestamp into a traceable, unique ID
259
+ unique_id = str(uuid.uuid4())[:8]
260
+ return f"{session_id}_{unique_id}_{int(datetime.now().timestamp())}"
261
+
262
+ async def _create_execution_plan(self, intent_result: dict, context: dict) -> dict:
263
+ """
264
+ Create execution plan based on intent recognition
265
+ """
266
+ # TODO: Implement agent selection and sequencing logic
267
+ return {
268
+ "agents_to_execute": [],
269
+ "execution_order": "parallel",
270
+ "priority": "normal"
271
+ }
272
+
273
+ async def _execute_agents(self, execution_plan: dict, user_input: str, context: dict) -> dict:
274
+ """
275
+ Execute agents in parallel or sequential order based on plan
276
+ """
277
+ # TODO: Implement parallel/sequential agent execution
278
+ return {}
279
+
280
+ def _format_final_output(self, response: dict, interaction_id: str, additional_metadata: dict = None) -> dict:
281
+ """
282
+ Format final output with tracing and metadata
283
+ """
284
+ # Extract the actual response text from various possible locations
285
+ response_text = (
286
+ response.get("final_response") or
287
+ response.get("safety_checked_response") or
288
+ response.get("original_response") or
289
+ response.get("response") or
290
+ str(response.get("result", ""))
291
+ )
292
+
293
+ if not response_text:
294
+ response_text = "I apologize, but I'm having trouble generating a response right now. Please try again."
295
+
296
+ # Extract warnings from safety check result
297
+ warnings = []
298
+ if "warnings" in response:
299
+ warnings = response["warnings"] if isinstance(response["warnings"], list) else []
300
+
301
+ # Build metadata dict
302
+ metadata = {
303
+ "agents_used": response.get("agents_used", []),
304
+ "processing_time": response.get("processing_time", 0),
305
+ "token_count": response.get("token_count", 0),
306
+ "warnings": warnings
307
+ }
308
+
309
+ # Merge in any additional metadata
310
+ if additional_metadata:
311
+ metadata.update(additional_metadata)
312
+
313
+ return {
314
+ "interaction_id": interaction_id,
315
+ "response": response_text,
316
+ "final_response": response_text, # Also provide as final_response for compatibility
317
+ "confidence_score": response.get("confidence_score", 0.7),
318
+ "agent_trace": self.execution_trace if self.execution_trace else [
319
+ {"step": "complete", "agent": "orchestrator", "status": "completed"}
320
+ ],
321
+ "timestamp": datetime.now().isoformat(),
322
+ "metadata": metadata
323
+ }
324
+
325
+ def get_execution_trace(self) -> list:
326
+ """
327
+ Return execution trace for debugging and analysis
328
+ """
329
+ return self.execution_trace
330
+
331
+ def clear_execution_trace(self):
332
+ """
333
+ Clear the execution trace
334
+ """
335
+ self.execution_trace = []
336
+
337
+ def _calculate_session_duration(self, context: dict) -> str:
338
+ """Calculate session duration for reasoning context"""
339
+ interactions = context.get('interactions', [])
340
+ if not interactions:
341
+ return "New session"
342
+
343
+ # Get first and last interaction timestamps
344
+ first_interaction = interactions[-1] if interactions else {}
345
+ last_interaction = interactions[0] if interactions else {}
346
+
347
+ # Simple duration calculation (in practice, would use actual timestamps)
348
+ interaction_count = len(interactions)
349
+ if interaction_count < 5:
350
+ return "Short session (< 5 interactions)"
351
+ elif interaction_count < 20:
352
+ return "Medium session (5-20 interactions)"
353
+ else:
354
+ return "Long session (> 20 interactions)"
355
+
356
+ def _analyze_topic_continuity(self, context: dict, user_input: str) -> str:
357
+ """Analyze topic continuity for reasoning context"""
358
+ interactions = context.get('interactions', [])
359
+ if not interactions:
360
+ return "No previous context"
361
+
362
+ # Simple topic analysis based on keywords
363
+ recent_topics = []
364
+ for interaction in interactions[:3]: # Last 3 interactions
365
+ user_msg = interaction.get('user_input', '').lower()
366
+ if 'machine learning' in user_msg or 'ml' in user_msg:
367
+ recent_topics.append('machine learning')
368
+ elif 'ai' in user_msg or 'artificial intelligence' in user_msg:
369
+ recent_topics.append('artificial intelligence')
370
+ elif 'data' in user_msg:
371
+ recent_topics.append('data science')
372
+
373
+ current_input_lower = user_input.lower()
374
+ if 'machine learning' in current_input_lower or 'ml' in current_input_lower:
375
+ current_topic = 'machine learning'
376
+ elif 'ai' in current_input_lower or 'artificial intelligence' in current_input_lower:
377
+ current_topic = 'artificial intelligence'
378
+ elif 'data' in current_input_lower:
379
+ current_topic = 'data science'
380
+ else:
381
+ current_topic = 'general'
382
+
383
+ if current_topic in recent_topics:
384
+ return f"Continuing {current_topic} discussion"
385
+ else:
386
+ return f"New topic: {current_topic}"
387
+
388
+ def _extract_pattern_evidence(self, user_input: str) -> str:
389
+ """Extract pattern evidence for intent reasoning"""
390
+ input_lower = user_input.lower()
391
+
392
+ # Question patterns
393
+ if any(word in input_lower for word in ['what', 'how', 'why', 'when', 'where', 'which']):
394
+ return "Question pattern detected"
395
+
396
+ # Request patterns
397
+ if any(word in input_lower for word in ['please', 'can you', 'could you', 'help me']):
398
+ return "Request pattern detected"
399
+
400
+ # Explanation patterns
401
+ if any(word in input_lower for word in ['explain', 'describe', 'tell me about']):
402
+ return "Explanation pattern detected"
403
+
404
+ # Analysis patterns
405
+ if any(word in input_lower for word in ['analyze', 'compare', 'evaluate', 'assess']):
406
+ return "Analysis pattern detected"
407
+
408
+ return "General conversational pattern"
409
+
410
+ def _assess_intent_complexity(self, intent_result: dict) -> str:
411
+ """Assess intent complexity for reasoning"""
412
+ primary_intent = intent_result.get('primary_intent', 'unknown')
413
+ confidence = intent_result.get('confidence_scores', {}).get(primary_intent, 0.5)
414
+ secondary_intents = intent_result.get('secondary_intents', [])
415
+
416
+ if confidence > 0.8 and len(secondary_intents) == 0:
417
+ return "Simple, clear intent"
418
+ elif confidence > 0.7 and len(secondary_intents) <= 1:
419
+ return "Moderate complexity"
420
+ else:
421
+ return "Complex, multi-faceted intent"
422
+
423
+ def _generate_alternative_paths(self, intent_result: dict, user_input: str) -> list:
424
+ """Generate alternative reasoning paths based on actual content"""
425
+ primary_intent = intent_result.get('primary_intent', 'unknown')
426
+ secondary_intents = intent_result.get('secondary_intents', [])
427
+ main_topic = self._extract_main_topic(user_input)
428
+
429
+ alternative_paths = []
430
+
431
+ # Add secondary intents as alternative paths
432
+ for secondary_intent in secondary_intents:
433
+ alternative_paths.append({
434
+ "path": f"Alternative intent: {secondary_intent} for {main_topic}",
435
+ "reasoning": f"Could interpret as {secondary_intent} based on linguistic patterns in the query about {main_topic}",
436
+ "confidence": intent_result.get('confidence_scores', {}).get(secondary_intent, 0.3),
437
+ "rejected_reason": f"Primary intent '{primary_intent}' has higher confidence for {main_topic} topic"
438
+ })
439
+
440
+ # Add method-based alternatives based on content
441
+ if 'curriculum' in user_input.lower() or 'course' in user_input.lower():
442
+ alternative_paths.append({
443
+ "path": "Structured educational framework approach",
444
+ "reasoning": f"Could provide a more structured educational framework for {main_topic}",
445
+ "confidence": 0.6,
446
+ "rejected_reason": f"Current approach better matches user's specific request for {main_topic}"
447
+ })
448
+
449
+ if 'detailed' in user_input.lower() or 'comprehensive' in user_input.lower():
450
+ alternative_paths.append({
451
+ "path": "High-level overview approach",
452
+ "reasoning": f"Could provide a high-level overview instead of detailed content for {main_topic}",
453
+ "confidence": 0.4,
454
+ "rejected_reason": f"User specifically requested detailed information about {main_topic}"
455
+ })
456
+
457
+ return alternative_paths
458
+
459
+ def _identify_uncertainty_areas(self, intent_result: dict, final_response: dict, safety_checked: dict) -> list:
460
+ """Identify areas of uncertainty in the reasoning based on actual content"""
461
+ uncertainty_areas = []
462
+
463
+ # Intent uncertainty
464
+ primary_intent = intent_result.get('primary_intent', 'unknown')
465
+ confidence = intent_result.get('confidence_scores', {}).get(primary_intent, 0.5)
466
+ if confidence < 0.8:
467
+ uncertainty_areas.append({
468
+ "aspect": f"Intent classification ({primary_intent}) for user's specific request",
469
+ "confidence": confidence,
470
+ "mitigation": "Provided multiple interpretation options and context-aware analysis"
471
+ })
472
+
473
+ # Response quality uncertainty
474
+ coherence_score = final_response.get('coherence_score', 0.7)
475
+ if coherence_score < 0.8:
476
+ uncertainty_areas.append({
477
+ "aspect": "Response coherence and structure for the specific topic",
478
+ "confidence": coherence_score,
479
+ "mitigation": "Applied quality enhancement techniques and content relevance checks"
480
+ })
481
+
482
+ # Safety uncertainty
483
+ safety_score = safety_checked.get('safety_analysis', {}).get('overall_safety_score', 0.8)
484
+ if safety_score < 0.9:
485
+ uncertainty_areas.append({
486
+ "aspect": "Content safety and bias assessment for educational content",
487
+ "confidence": safety_score,
488
+ "mitigation": "Generated advisory warnings for user awareness and content appropriateness"
489
+ })
490
+
491
+ # Content relevance uncertainty
492
+ response_text = str(final_response.get('final_response', ''))
493
+ if len(response_text) < 100: # Very short response
494
+ uncertainty_areas.append({
495
+ "aspect": "Response completeness for user's detailed request",
496
+ "confidence": 0.6,
497
+ "mitigation": "Enhanced response generation with topic-specific content"
498
+ })
499
+
500
+ return uncertainty_areas
501
+
502
+ def _extract_evidence_sources(self, intent_result: dict, final_response: dict, context: dict) -> list:
503
+ """Extract evidence sources for reasoning based on actual content"""
504
+ evidence_sources = []
505
+
506
+ # Intent evidence
507
+ evidence_sources.append({
508
+ "type": "linguistic_analysis",
509
+ "source": "Pattern matching and NLP analysis",
510
+ "relevance": 0.9,
511
+ "description": f"Intent classification based on linguistic patterns for '{intent_result.get('primary_intent', 'unknown')}' intent"
512
+ })
513
+
514
+ # Context evidence
515
+ interactions = context.get('interactions', [])
516
+ if interactions:
517
+ evidence_sources.append({
518
+ "type": "conversation_history",
519
+ "source": f"Previous {len(interactions)} interactions",
520
+ "relevance": 0.7,
521
+ "description": f"Conversation context and topic continuity analysis"
522
+ })
523
+
524
+ # Synthesis evidence
525
+ synthesis_method = final_response.get('synthesis_method', 'unknown')
526
+ evidence_sources.append({
527
+ "type": "synthesis_method",
528
+ "source": f"{synthesis_method} approach",
529
+ "relevance": 0.8,
530
+ "description": f"Response generated using {synthesis_method} methodology with quality optimization"
531
+ })
532
+
533
+ # Content-specific evidence
534
+ response_text = str(final_response.get('final_response', ''))
535
+ if len(response_text) > 1000:
536
+ evidence_sources.append({
537
+ "type": "content_analysis",
538
+ "source": "Comprehensive content generation",
539
+ "relevance": 0.85,
540
+ "description": "Detailed response generation based on user's specific requirements"
541
+ })
542
+
543
+ return evidence_sources
544
+
545
+ def _calibrate_confidence_scores(self, reasoning_chain: dict) -> dict:
546
+ """Calibrate confidence scores across the reasoning chain"""
547
+ chain_of_thought = reasoning_chain.get('chain_of_thought', {})
548
+
549
+ # Calculate overall confidence
550
+ step_confidences = []
551
+ for step_data in chain_of_thought.values():
552
+ if isinstance(step_data, dict) and 'confidence' in step_data:
553
+ step_confidences.append(step_data['confidence'])
554
+
555
+ overall_confidence = sum(step_confidences) / len(step_confidences) if step_confidences else 0.7
556
+
557
+ return {
558
+ "overall_confidence": overall_confidence,
559
+ "step_count": len(chain_of_thought),
560
+ "confidence_distribution": {
561
+ "high_confidence": len([c for c in step_confidences if c > 0.8]),
562
+ "medium_confidence": len([c for c in step_confidences if 0.6 <= c <= 0.8]),
563
+ "low_confidence": len([c for c in step_confidences if c < 0.6])
564
+ },
565
+ "calibration_method": "Weighted average of step confidences"
566
+ }
567
+
568
+ def _extract_main_topic(self, user_input: str) -> str:
569
+ """Extract the main topic from user input for context-aware reasoning"""
570
+ input_lower = user_input.lower()
571
+
572
+ # Topic extraction based on keywords
573
+ if any(word in input_lower for word in ['curriculum', 'course', 'teach', 'learning', 'education']):
574
+ if 'ai' in input_lower or 'chatbot' in input_lower or 'assistant' in input_lower:
575
+ return "AI chatbot course curriculum"
576
+ elif 'programming' in input_lower or 'python' in input_lower:
577
+ return "Programming course curriculum"
578
+ else:
579
+ return "Educational course design"
580
+
581
+ elif any(word in input_lower for word in ['machine learning', 'ml', 'neural network', 'deep learning']):
582
+ return "Machine learning concepts"
583
+
584
+ elif any(word in input_lower for word in ['ai', 'artificial intelligence', 'chatbot', 'assistant']):
585
+ return "Artificial intelligence and chatbots"
586
+
587
+ elif any(word in input_lower for word in ['data science', 'data analysis', 'analytics']):
588
+ return "Data science and analysis"
589
+
590
+ elif any(word in input_lower for word in ['programming', 'coding', 'development', 'software']):
591
+ return "Software development and programming"
592
+
593
+ else:
594
+ # Extract first few words as topic
595
+ words = user_input.split()[:4]
596
+ return " ".join(words) if words else "General inquiry"
597
+
598
+ def _extract_keywords(self, user_input: str) -> str:
599
+ """Extract key terms from user input"""
600
+ input_lower = user_input.lower()
601
+ keywords = []
602
+
603
+ # Extract important terms
604
+ important_terms = [
605
+ 'curriculum', 'course', 'teach', 'learning', 'education',
606
+ 'ai', 'artificial intelligence', 'chatbot', 'assistant',
607
+ 'machine learning', 'ml', 'neural network', 'deep learning',
608
+ 'programming', 'python', 'development', 'software',
609
+ 'data science', 'analytics', 'analysis'
610
+ ]
611
+
612
+ for term in important_terms:
613
+ if term in input_lower:
614
+ keywords.append(term)
615
+
616
+ return ", ".join(keywords[:5]) if keywords else "General terms"
617
+
618
+ def _assess_query_complexity(self, user_input: str) -> str:
619
+ """Assess the complexity of the user query"""
620
+ word_count = len(user_input.split())
621
+ question_count = user_input.count('?')
622
+
623
+ if word_count > 50 and question_count > 2:
624
+ return "Highly complex multi-part query"
625
+ elif word_count > 30 and question_count > 1:
626
+ return "Moderately complex query"
627
+ elif word_count > 15:
628
+ return "Standard complexity query"
629
+ else:
630
+ return "Simple query"
631
+
632
+ def _determine_response_scope(self, user_input: str) -> str:
633
+ """Determine the scope of response needed"""
634
+ input_lower = user_input.lower()
635
+
636
+ if any(word in input_lower for word in ['detailed', 'comprehensive', 'complete', 'full']):
637
+ return "Comprehensive detailed response"
638
+ elif any(word in input_lower for word in ['brief', 'short', 'summary', 'overview']):
639
+ return "Brief summary response"
640
+ elif any(word in input_lower for word in ['step by step', 'tutorial', 'guide', 'how to']):
641
+ return "Step-by-step instructional response"
642
+ else:
643
+ return "Standard informative response"
644
+
645
+ def _assess_content_relevance(self, user_input: str, final_response: dict) -> str:
646
+ """Assess how relevant the response content is to the user input"""
647
+ response_text = str(final_response.get('final_response', ''))
648
+
649
+ # Simple relevance check based on keyword overlap
650
+ input_words = set(user_input.lower().split())
651
+ response_words = set(response_text.lower().split())
652
+
653
+ overlap = len(input_words.intersection(response_words))
654
+ total_input_words = max(len(input_words), 1) # guard against division by zero on empty input
655
+
656
+ if overlap / total_input_words > 0.3:
657
+ return "High relevance to user query"
658
+ elif overlap / total_input_words > 0.15:
659
+ return "Moderate relevance to user query"
660
+ else:
661
+ return "Low relevance to user query"
662
+
663
+ def _assess_content_appropriateness(self, user_input: str, safety_checked: dict) -> str:
664
+ """Assess content appropriateness for the topic"""
665
+ warnings = safety_checked.get('warnings', [])
666
+ safety_score = safety_checked.get('safety_analysis', {}).get('overall_safety_score', 0.8)
667
+
668
+ if safety_score > 0.9 and len(warnings) == 0:
669
+ return "Highly appropriate content"
670
+ elif safety_score > 0.8 and len(warnings) <= 1:
671
+ return "Appropriate content with minor notes"
672
+ else:
673
+ return "Content requires review"
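End to end, the orchestrator only needs a router, a context manager, and agents exposing an async `execute()`. A wiring sketch with stand-in dependencies (the stub classes and import paths are illustrative; the real app wires the agents from `src/agents/`):

```python
# Illustrative wiring of MVPOrchestrator with stand-in dependencies.
import asyncio
import os

from src.llm_router import LLMRouter
from src.orchestrator_engine import MVPOrchestrator

class EchoAgent:
    """Stand-in agent satisfying the async execute() contract used by the orchestrator."""
    async def execute(self, **kwargs):
        if "response" in kwargs:  # safety-check style call: pass the response through
            return kwargs["response"]
        return {
            "primary_intent": "question",
            "final_response": f"Echo: {kwargs.get('user_input', '')}",
        }

class InMemoryContextManager:
    """Stand-in context manager: no persistence, empty history."""
    async def manage_context(self, session_id, user_input):
        return {"session_id": session_id, "interactions": []}

    def _update_context(self, context, user_input, response_text):
        context.setdefault("interactions", []).insert(
            0, {"user_input": user_input, "response": response_text}
        )

async def main():
    agent_names = ("intent_recognition", "skills_identification",
                   "response_synthesis", "safety_check")
    orchestrator = MVPOrchestrator(
        llm_router=LLMRouter(hf_token=os.environ.get("HF_TOKEN")),
        context_manager=InMemoryContextManager(),
        agents={name: EchoAgent() for name in agent_names},
    )
    result = await orchestrator.process_request(
        session_id="demo", user_input="Explain machine learning basics"
    )
    print(result["response"])

if __name__ == "__main__":
    asyncio.run(main())
```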