# LLM-Based Topic Extraction Implementation (Option 2)
## Summary
Successfully implemented Option 2: LLM-based zero-shot classification for topic extraction and continuity analysis, replacing hardcoded pattern matching.
## Changes Implemented
### 1. Topic Cache Infrastructure
**Location**: `src/orchestrator_engine.py` - `__init__()` (lines 34-36)
**Added**:
```python
# Cache for topic extraction to reduce API calls
self._topic_cache = {}
self._topic_cache_max_size = 100 # Limit cache size
```
**Purpose**: Cache topic extraction results to minimize LLM API calls for identical queries.
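A minimal sketch of how the cache key and FIFO eviction can work; the helper names (`_topic_cache_key`, `_cache_topic`) are illustrative, and only the two attributes above come from the actual code:

```python
import hashlib

def _topic_cache_key(self, user_input: str) -> str:
    # Key by an MD5 hash of the normalized query so identical queries hit the cache.
    return hashlib.md5(user_input.strip().lower().encode("utf-8")).hexdigest()

def _cache_topic(self, key: str, topic: str) -> None:
    # FIFO eviction: dicts preserve insertion order (Python 3.7+),
    # so the first key is always the oldest entry.
    if len(self._topic_cache) >= self._topic_cache_max_size:
        oldest_key = next(iter(self._topic_cache))
        del self._topic_cache[oldest_key]
    self._topic_cache[key] = topic
```

A plain-dict FIFO keeps the hot path to a single dictionary lookup; an LRU variant (e.g. `OrderedDict.move_to_end`) would be a drop-in alternative if hit rates become important.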
---
### 2. LLM-Based Topic Extraction
**Location**: `src/orchestrator_engine.py` - `_extract_main_topic()` (lines 1276-1343)
**Changes**:
- **Method signature**: Changed to `async def _extract_main_topic(self, user_input: str, context: dict = None) -> str`
- **Implementation**: Uses LLM zero-shot classification instead of hardcoded keywords
- **Context-aware**: Uses session_context and interaction_contexts from cache when available
- **Caching**: Implements cache with FIFO eviction (max 100 entries)
- **Fallback**: Falls back to simple word extraction if LLM unavailable
**LLM Prompt**:
```
Classify the main topic of this query in 2-5 words. Be specific and concise.
Query: "{user_input}"
[Session context if available]
Respond with ONLY the topic name (e.g., "Machine Learning", "Healthcare Analytics").
```
**Temperature**: 0.3 (for consistency)
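For illustration, a hedged sketch of the extraction flow with caching and fallback. The `llm_router.generate(...)` call, the module-level `logger`, and the cache helpers from the previous section are assumptions, not the exact code:

```python
async def _extract_main_topic(self, user_input: str, context: dict = None) -> str:
    """Zero-shot topic classification via the LLM, with caching and a fallback."""
    cache_key = self._topic_cache_key(user_input)
    if cache_key in self._topic_cache:
        return self._topic_cache[cache_key]  # cache hit: no LLM call

    # Fallback when no LLM router is available: simple word extraction.
    if not getattr(self, "llm_router", None):
        return " ".join(user_input.split()[:4]) or "General query"

    session_summary = (context or {}).get("session_context", "")
    prompt = (
        "Classify the main topic of this query in 2-5 words. Be specific and concise.\n"
        f'Query: "{user_input}"\n'
        + (f"Session context: {session_summary}\n" if session_summary else "")
        + 'Respond with ONLY the topic name (e.g., "Machine Learning", "Healthcare Analytics").'
    )
    try:
        # Hypothetical router interface; the real call signature may differ.
        response = await self.llm_router.generate(prompt, temperature=0.3, max_tokens=20)
        topic = response.strip().strip('"')
    except Exception as exc:
        logger.error("Topic extraction failed: %s", exc)  # logger assumed at module level
        topic = " ".join(user_input.split()[:4]) or "General query"

    self._cache_topic(cache_key, topic)
    return topic
```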
---
### 3. LLM-Based Topic Continuity Analysis
**Location**: `src/orchestrator_engine.py` - `_analyze_topic_continuity()` (lines 1029-1094)
**Changes**:
- **Method signature**: Changed to `async def _analyze_topic_continuity(self, context: dict, user_input: str) -> str`
- **Implementation**: Uses LLM to determine if query continues previous topic or introduces new topic
- **Context-aware**: Uses session_context and interaction_contexts from cache
- **Format validation**: Validates the LLM response format ("Continuing [topic] discussion" or "New topic: [topic]")
- **Fallback**: Returns descriptive message if LLM unavailable
**LLM Prompt**:
```
Determine if the current query continues the previous conversation topic or introduces a new topic.
Session Summary: {session_summary}
Recent Interactions: {recent_interactions}
Current Query: "{user_input}"
Respond with EXACTLY one of:
- "Continuing [topic name] discussion" if same topic
- "New topic: [topic name]" if different topic
```
**Temperature**: 0.3 (for consistency)
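A small sketch of the format validation step, assuming it is factored into a helper (the real code may validate inline):

```python
from typing import Optional

def _validate_continuity_response(self, response: str) -> Optional[str]:
    """Accept only 'Continuing <topic> discussion' or 'New topic: <topic>'."""
    text = response.strip()
    if text.startswith("Continuing ") and text.endswith(" discussion"):
        return text
    if text.startswith("New topic: ") and len(text) > len("New topic: "):
        return text
    return None  # caller falls back to a descriptive default message
```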
---
### 4. Keyword Extraction Update
**Location**: `src/orchestrator_engine.py` - `_extract_keywords()` (lines 1345-1361)
**Changes**:
- **Method signature**: Changed to `async def _extract_keywords(self, user_input: str) -> str`
- **Implementation**: Simple regex-based extraction (deliberately not LLM-based, to keep per-request latency low)
- **Stop word filtering**: Filters common stop words
- **Note**: Can be enhanced with LLM if needed, but kept simple for performance
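A minimal sketch of a regex-based extractor with stop-word filtering; the stop-word list and keyword limit are illustrative:

```python
import re

# Illustrative stop-word list; the real code may use a longer one.
_STOP_WORDS = {
    "the", "a", "an", "is", "are", "was", "of", "to", "in", "on", "for",
    "and", "or", "with", "about", "what", "how", "why", "can", "does",
}

async def _extract_keywords(self, user_input: str) -> str:
    """Regex-based keyword extraction with stop-word filtering (no LLM call)."""
    try:
        words = re.findall(r"[a-zA-Z]{3,}", user_input.lower())
        keywords = list(dict.fromkeys(w for w in words if w not in _STOP_WORDS))
        return ", ".join(keywords[:8]) or "General terms"
    except Exception:
        return "General terms"
```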
---
### 5. Updated All Usage Sites
**Location**: `src/orchestrator_engine.py` - `process_request()` (lines 184-200)
**Changes**:
- **Extract topic once**: `main_topic = await self._extract_main_topic(user_input, context)`
- **Extract continuity**: `topic_continuity = await self._analyze_topic_continuity(context, user_input)`
- **Extract keywords**: `query_keywords = await self._extract_keywords(user_input)`
- **Reuse main_topic**: All 18+ usage sites now use the `main_topic` variable instead of calling the method repeatedly
**Updated Reasoning Chain Steps**:
- Step 1: Uses `main_topic` (line 190)
- Step 2: Uses `main_topic` (lines 251, 259)
- Step 3: Uses `main_topic` (lines 268, 276)
- Step 4: Uses `main_topic` (lines 304, 312)
- Step 5: Uses `main_topic` (lines 384, 392)
- Alternative paths: Use `main_topic` (lines 403, 1146-1166)
**Error Recovery**: Simplified to avoid async complexity (line 1733)
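Putting the pieces together, a condensed sketch of how the three calls could sit at the top of `process_request()`; the signature and the `intent_result` placeholder are assumptions, and the surrounding pipeline is omitted:

```python
async def process_request(self, user_input: str, context: dict = None):
    context = context or {}

    # Extract once per request; every later step reuses these variables
    # instead of re-invoking the extraction methods (and the LLM).
    main_topic = await self._extract_main_topic(user_input, context)
    topic_continuity = await self._analyze_topic_continuity(context, user_input)
    query_keywords = await self._extract_keywords(user_input)

    # Downstream reasoning-chain steps and alternative paths all receive main_topic.
    intent_result = {}  # placeholder; produced earlier in the real pipeline
    alternative_paths = self._generate_alternative_paths(intent_result, user_input, main_topic)
    # ... rest of the request pipeline (agent routing, synthesis, response assembly) unchanged ...
```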
---
### 6. Alternative Paths Method Update
**Location**: `src/orchestrator_engine.py` - `_generate_alternative_paths()` (lines 1136-1169)
**Changes**:
- **Method signature**: Added `main_topic` parameter
- **Before**: `def _generate_alternative_paths(self, intent_result: dict, user_input: str) -> list:`
- **After**: `def _generate_alternative_paths(self, intent_result: dict, user_input: str, main_topic: str) -> list:`
- **Updated call site**: Line 403 passes `main_topic` as third parameter
---
## Performance Characteristics
### Latency Impact
**Per Request**:
- 2 LLM calls per request (topic extraction + continuity analysis)
- Estimated latency: ~200-500ms total (depending on the LLM router)
- Caching removes repeat calls for identical queries: a cache hit skips the LLM call, adding effectively no latency
**Mitigation**:
- Topic extraction cached per unique query (MD5 hash)
- Cache size limited to 100 entries (FIFO eviction)
- Keywords extraction kept simple (no LLM, minimal latency)
### API Costs
**Per Request**:
- Topic extraction: ~50-100 tokens
- Topic continuity: ~100-150 tokens
- Total: ~150-250 tokens per request (first time)
- Cached requests: 0 tokens
**Monthly Estimate** (assuming 1000 unique queries/day):
- First requests: ~150-250k tokens/day = ~4.5-7.5M tokens/month
- Subsequent requests: Cached, 0 tokens
- Actual usage depends on cache hit rate
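The arithmetic behind this estimate, as a quick back-of-envelope check:

```python
# Back-of-envelope check of the monthly token estimate above.
tokens_per_request = (150, 250)     # first-time request: topic + continuity
unique_queries_per_day = 1000       # assumption used in the estimate

daily = tuple(t * unique_queries_per_day for t in tokens_per_request)
monthly = tuple(d * 30 for d in daily)

print(daily)    # (150000, 250000)   -> ~150-250k tokens/day
print(monthly)  # (4500000, 7500000) -> ~4.5-7.5M tokens/month
```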
---
## Error Handling
### Fallback Mechanisms
1. **Topic Extraction**:
- If LLM unavailable: Falls back to first 4 words of query
- If LLM error: Logs error, returns fallback
- Cache miss handling: Generates and caches
2. **Topic Continuity**:
- If LLM unavailable: Returns "Topic continuity analysis unavailable"
- If no context: Returns "No previous context"
- If LLM error: Logs error, returns "Topic continuity analysis failed"
3. **Keywords**:
- Simple extraction, no LLM dependency
- Error handling: Returns "General terms" on exception
---
## Testing Recommendations
### Unit Tests
1. **Topic Extraction**:
- Test LLM-based extraction with various queries
- Test caching behavior (cache hit/miss); see the pytest sketch at the end of this section
- Test fallback behavior when LLM unavailable
- Test context-aware extraction
2. **Topic Continuity**:
- Test continuation detection
- Test new topic detection
- Test with empty context
- Test format validation
3. **Integration Tests**:
- Test full request flow with LLM calls
- Test cache persistence across requests
- Test error recovery with LLM failures
### Performance Tests
1. **Latency Measurement**:
- Measure average latency with LLM calls
- Measure latency with cache hits
- Compare to previous pattern-based approach
2. **Cache Effectiveness**:
- Measure cache hit rate
- Test cache eviction behavior
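A minimal pytest sketch for the cache-hit and fallback cases (requires `pytest-asyncio`; the `OrchestratorEngine` class name, its no-argument constructor, and the router's `generate()` interface are assumptions):

```python
import pytest

from src.orchestrator_engine import OrchestratorEngine  # class name assumed


class StubRouter:
    """Fake LLM router that records calls and returns a fixed topic."""
    def __init__(self):
        self.calls = 0

    async def generate(self, prompt, **kwargs):  # interface assumed
        self.calls += 1
        return "Machine Learning"


@pytest.mark.asyncio
async def test_topic_extraction_uses_cache():
    engine = OrchestratorEngine()
    engine.llm_router = StubRouter()

    first = await engine._extract_main_topic("What is machine learning?")
    second = await engine._extract_main_topic("What is machine learning?")

    assert first == second == "Machine Learning"
    assert engine.llm_router.calls == 1  # second call served from the cache


@pytest.mark.asyncio
async def test_topic_extraction_fallback_without_llm():
    engine = OrchestratorEngine()
    engine.llm_router = None

    topic = await engine._extract_main_topic("Explain neural network pruning techniques")
    assert topic  # falls back to simple word extraction, never raises
```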
---
## Migration Notes
### Breaking Changes
**None**: All changes are internal to the orchestrator; the external API is unchanged.
### Compatibility
- **LLM Router Required**: System requires `llm_router` to be available
- **Graceful Degradation**: Falls back to simple extraction if LLM unavailable
- **Backward Compatible**: The old pattern-based code has been removed, but the fallbacks preserve functionality when the LLM is unavailable
---
## Benefits Realized
✅ **Accurate Topic Classification**: LLM understands context, synonyms, nuances
✅ **Domain Adaptive**: Works for any domain without code changes
✅ **Context-Aware**: Uses session_context and interaction_contexts
✅ **Human-Readable**: Maintains descriptive reasoning chain strings
✅ **Scalable**: No manual keyword list maintenance
✅ **Cached**: Reduces API calls for repeated queries
---
## Trade-offs
⚠️ **Latency**: Adds ~200-500ms per request (first time, cached after)
⚠️ **API Costs**: ~150-250 tokens per request (first time)
⚠️ **LLM Dependency**: Requires LLM router to be functional
⚠️ **Complexity**: More code to maintain (async handling, caching, error handling)
⚠️ **Inconsistency Risk**: LLM responses may vary slightly (mitigated by temperature=0.3)
---
## Files Modified
1. `src/orchestrator_engine.py`:
- Added topic cache infrastructure
- Rewrote `_extract_main_topic()` to use LLM
- Rewrote `_analyze_topic_continuity()` to use LLM
- Updated `_extract_keywords()` to async
- Updated all 18+ usage sites to use cached `main_topic`
- Updated `_generate_alternative_paths()` signature
---
## Next Steps
1. **Monitor Performance**: Track latency and cache hit rates
2. **Tune Caching**: Adjust cache size based on usage patterns
3. **Optional Enhancements**:
- Consider LLM-based keyword extraction if needed
- Add topic extraction metrics/logging
- Implement cache persistence across restarts
---
## Conclusion
Option 2 implementation complete. System now uses LLM-based zero-shot classification for topic extraction and continuity analysis, providing accurate, context-aware topic classification without hardcoded patterns. Caching minimizes latency and API costs for repeated queries.