# LLM-Based Topic Extraction Implementation (Option 2)
## Summary
Successfully implemented Option 2: LLM-based zero-shot classification for topic extraction and continuity analysis, replacing hardcoded pattern matching.
## Changes Implemented
### 1. Topic Cache Infrastructure
**Location**: `src/orchestrator_engine.py` - `__init__()` (lines 34-36)
**Added**:
```python
# Cache for topic extraction to reduce API calls
self._topic_cache = {}
self._topic_cache_max_size = 100  # Limit cache size
```
**Purpose**: Cache topic extraction results to minimize LLM API calls for identical queries.
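A minimal sketch of how the cache key and FIFO eviction described below (MD5-hashed query, 100-entry limit) could be implemented as methods on the orchestrator class; the helper names `_topic_cache_key` and `_cache_topic` are illustrative, not the actual code:

```python
import hashlib

# Sketched as methods on the orchestrator class; names are illustrative.
def _topic_cache_key(self, user_input: str) -> str:
    # Key on a hash of the normalized query so identical queries share an entry.
    return hashlib.md5(user_input.strip().lower().encode("utf-8")).hexdigest()

def _cache_topic(self, key: str, topic: str) -> None:
    # FIFO eviction: drop the oldest entry once the cache reaches its size limit.
    if len(self._topic_cache) >= self._topic_cache_max_size:
        oldest_key = next(iter(self._topic_cache))  # dicts preserve insertion order
        del self._topic_cache[oldest_key]
    self._topic_cache[key] = topic
```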
---
### 2. LLM-Based Topic Extraction
**Location**: `src/orchestrator_engine.py` - `_extract_main_topic()` (lines 1276-1343)
**Changes**:
- **Method signature**: Changed to `async def _extract_main_topic(self, user_input: str, context: dict = None) -> str`
- **Implementation**: Uses LLM zero-shot classification instead of hardcoded keywords
- **Context-aware**: Uses `session_context` and `interaction_contexts` from the cache when available
- **Caching**: Implements a cache with FIFO eviction (max 100 entries)
- **Fallback**: Falls back to simple word extraction if the LLM is unavailable
**LLM Prompt**:
```
Classify the main topic of this query in 2-5 words. Be specific and concise.
Query: "{user_input}"
[Session context if available]
Respond with ONLY the topic name (e.g., "Machine Learning", "Healthcare Analytics").
```
**Temperature**: 0.3 (for consistency)
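A hedged sketch of the overall flow (cache lookup, LLM call, fallback). It reuses the illustrative cache helpers from the Section 1 sketch; the `self.llm_router.generate(prompt, temperature=...)` call and the module-level `logger` are assumptions about the surrounding code, not the exact implementation:

```python
import logging

logger = logging.getLogger(__name__)

# Sketched as a method on the orchestrator class.
async def _extract_main_topic(self, user_input: str, context: dict = None) -> str:
    cache_key = self._topic_cache_key(user_input)
    if cache_key in self._topic_cache:
        return self._topic_cache[cache_key]  # cache hit: no LLM call

    fallback = " ".join(user_input.split()[:4]) or "General query"
    if not self.llm_router:
        return fallback  # graceful degradation when no LLM is available

    session_summary = (context or {}).get("session_context", "")
    prompt = (
        "Classify the main topic of this query in 2-5 words. Be specific and concise.\n"
        f'Query: "{user_input}"\n'
        + (f"Session context: {session_summary}\n" if session_summary else "")
        + 'Respond with ONLY the topic name (e.g., "Machine Learning", "Healthcare Analytics").'
    )
    try:
        topic = (await self.llm_router.generate(prompt, temperature=0.3)).strip()
    except Exception as exc:
        logger.error("Topic extraction failed: %s", exc)
        return fallback

    self._cache_topic(cache_key, topic)
    return topic
```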
---
### 3. LLM-Based Topic Continuity Analysis
**Location**: `src/orchestrator_engine.py` - `_analyze_topic_continuity()` (lines 1029-1094)
**Changes**:
- **Method signature**: Changed to `async def _analyze_topic_continuity(self, context: dict, user_input: str) -> str`
- **Implementation**: Uses the LLM to determine whether the query continues the previous topic or introduces a new one
- **Context-aware**: Uses `session_context` and `interaction_contexts` from the cache
- **Format validation**: Validates the LLM response format ("Continuing X" or "New topic: X")
- **Fallback**: Returns a descriptive message if the LLM is unavailable
**LLM Prompt**:
```
Determine if the current query continues the previous conversation topic or introduces a new topic.
Session Summary: {session_summary}
Recent Interactions: {recent_interactions}
Current Query: "{user_input}"
Respond with EXACTLY one of:
- "Continuing [topic name] discussion" if same topic
- "New topic: [topic name]" if different topic
```
**Temperature**: 0.3 (for consistency)
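A similar hedged sketch for the continuity check, including the response-format validation. The router call, the assumed `logger` from the previous sketch, and the context keys (`session_context`, `interaction_contexts`) follow the descriptions above rather than the exact code:

```python
# Sketched as a method on the orchestrator class; llm_router.generate() is assumed.
async def _analyze_topic_continuity(self, context: dict, user_input: str) -> str:
    if not context or not context.get("interaction_contexts"):
        return "No previous context"
    if not self.llm_router:
        return "Topic continuity analysis unavailable"

    prompt = (
        "Determine if the current query continues the previous conversation topic "
        "or introduces a new topic.\n"
        f"Session Summary: {context.get('session_context', '')}\n"
        f"Recent Interactions: {context.get('interaction_contexts', [])}\n"
        f'Current Query: "{user_input}"\n'
        "Respond with EXACTLY one of:\n"
        '- "Continuing [topic name] discussion" if same topic\n'
        '- "New topic: [topic name]" if different topic'
    )
    try:
        answer = (await self.llm_router.generate(prompt, temperature=0.3)).strip()
    except Exception as exc:
        logger.error("Topic continuity analysis failed: %s", exc)
        return "Topic continuity analysis failed"

    # Accept only the two expected response shapes; anything else is treated as a failure.
    if answer.startswith("Continuing") or answer.startswith("New topic:"):
        return answer
    return "Topic continuity analysis failed"
```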
---
### 4. Keyword Extraction Update
**Location**: `src/orchestrator_engine.py` - `_extract_keywords()` (lines 1345-1361)
**Changes**:
- **Method signature**: Changed to `async def _extract_keywords(self, user_input: str) -> str`
- **Implementation**: Simple regex-based extraction (deliberately not LLM-based, for performance)
- **Stop word filtering**: Filters common stop words
- **Note**: Can be upgraded to LLM-based extraction if needed, but is kept simple for performance
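A minimal sketch of the non-LLM extraction path; the specific regex and stop-word set shown here are illustrative choices, not the actual lists used in the code:

```python
import re

# Illustrative stop-word set; the real list in the code may differ.
_STOP_WORDS = {"the", "a", "an", "is", "are", "was", "were", "what", "how", "why",
               "of", "to", "for", "and", "or", "in", "on", "with", "about"}

async def _extract_keywords(self, user_input: str) -> str:
    try:
        words = re.findall(r"[a-zA-Z]{3,}", user_input.lower())
        keywords = [w for w in words if w not in _STOP_WORDS]
        return ", ".join(keywords[:8]) if keywords else "General terms"
    except Exception:
        return "General terms"  # safe default on any unexpected error
```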
---
### 5. Updated All Usage Sites
**Location**: `src/orchestrator_engine.py` - `process_request()` (lines 184-200)
**Changes**:
- **Extract topic once**: `main_topic = await self._extract_main_topic(user_input, context)`
- **Extract continuity**: `topic_continuity = await self._analyze_topic_continuity(context, user_input)`
- **Extract keywords**: `query_keywords = await self._extract_keywords(user_input)`
- **Reuse main_topic**: All 18+ usage sites now use the `main_topic` variable instead of calling the method repeatedly
**Updated Reasoning Chain Steps**:
- Step 1: Uses `main_topic` (line 190)
- Step 2: Uses `main_topic` (lines 251, 259)
- Step 3: Uses `main_topic` (lines 268, 276)
- Step 4: Uses `main_topic` (lines 304, 312)
- Step 5: Uses `main_topic` (lines 384, 392)
- Alternative paths: Use `main_topic` (lines 403, 1146-1166)
**Error Recovery**: Simplified to avoid async complexity (line 1733)
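Taken together, the top of `process_request()` now looks roughly like the following sketch; the surrounding code is elided and the reasoning-chain variable name is assumed for illustration:

```python
# Near the top of process_request(); surrounding code elided.
main_topic = await self._extract_main_topic(user_input, context)
topic_continuity = await self._analyze_topic_continuity(context, user_input)
query_keywords = await self._extract_keywords(user_input)

# Later steps reuse the already-extracted values instead of re-calling the methods, e.g.:
reasoning_chain.append(f"Step 1: Analyzing query about {main_topic} ({topic_continuity})")
```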
---
### 6. Alternative Paths Method Update
**Location**: `src/orchestrator_engine.py` - `_generate_alternative_paths()` (lines 1136-1169)
**Changes**:
- **Method signature**: Added a `main_topic` parameter
- **Before**: `def _generate_alternative_paths(self, intent_result: dict, user_input: str) -> list:`
- **After**: `def _generate_alternative_paths(self, intent_result: dict, user_input: str, main_topic: str) -> list:`
- **Updated call site**: Line 403 passes `main_topic` as the third parameter
---
## Performance Characteristics
### Latency Impact
**Per Request**:
- Two LLM calls per request (topic extraction + continuity analysis)
- Estimated latency: ~200-500ms total (depending on the LLM router)
- Caching eliminates repeat calls: a cache hit adds effectively no LLM latency
**Mitigation**:
- Topic extraction cached per unique query (keyed by MD5 hash)
- Cache size limited to 100 entries (FIFO eviction)
- Keyword extraction kept simple (no LLM, minimal latency)
### API Costs
**Per Request**:
- Topic extraction: ~50-100 tokens
- Topic continuity: ~100-150 tokens
- Total: ~150-250 tokens per request (first time)
- Cached requests: 0 tokens
**Monthly Estimate** (assuming 1000 unique queries/day):
- First requests: ~150-250k tokens/day = ~4.5-7.5M tokens/month
- Repeat requests: served from cache, 0 tokens
- Actual usage depends on the cache hit rate
---
## Error Handling
### Fallback Mechanisms
1. **Topic Extraction**:
   - If the LLM is unavailable: falls back to the first 4 words of the query
   - On LLM error: logs the error and returns the fallback
   - On cache miss: generates the topic and caches it
2. **Topic Continuity**:
   - If the LLM is unavailable: returns "Topic continuity analysis unavailable"
   - If there is no context: returns "No previous context"
   - On LLM error: logs the error and returns "Topic continuity analysis failed"
3. **Keywords**:
   - Simple extraction, no LLM dependency
   - Error handling: returns "General terms" on exception
---
## Testing Recommendations
### Unit Tests
1. **Topic Extraction**:
   - Test LLM-based extraction with various queries
   - Test caching behavior (cache hit/miss)
   - Test fallback behavior when LLM unavailable
   - Test context-aware extraction
2. **Topic Continuity**:
   - Test continuation detection
   - Test new topic detection
   - Test with empty context
   - Test format validation
3. **Integration Tests**:
   - Test full request flow with LLM calls
   - Test cache persistence across requests
   - Test error recovery with LLM failures
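A hedged sketch of what the caching and fallback unit tests could look like, assuming pytest with pytest-asyncio and an orchestrator class named `OrchestratorEngine` whose constructor takes no required arguments; adapt the construction and router mocking to the real API:

```python
import pytest
from unittest.mock import AsyncMock

from src.orchestrator_engine import OrchestratorEngine  # class name assumed


@pytest.mark.asyncio
async def test_topic_extraction_caches_llm_result():
    engine = OrchestratorEngine()  # adapt if the constructor needs arguments
    engine.llm_router = AsyncMock()
    engine.llm_router.generate.return_value = "Healthcare Analytics"

    first = await engine._extract_main_topic("How do hospitals use predictive models?")
    second = await engine._extract_main_topic("How do hospitals use predictive models?")

    assert first == second == "Healthcare Analytics"
    engine.llm_router.generate.assert_awaited_once()  # second call served from cache


@pytest.mark.asyncio
async def test_topic_extraction_falls_back_without_llm():
    engine = OrchestratorEngine()
    engine.llm_router = None

    topic = await engine._extract_main_topic("Explain transformer attention mechanisms please")
    assert topic  # falls back to a simple word-based topic instead of raising
```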
### Performance Tests
1. **Latency Measurement**:
   - Measure average latency with LLM calls
   - Measure latency with cache hits
   - Compare to previous pattern-based approach
2. **Cache Effectiveness**:
   - Measure cache hit rate
   - Test cache eviction behavior
---
## Migration Notes
### Breaking Changes
**None**: All changes are internal to the orchestrator; the external API is unchanged.
### Compatibility
- **LLM Router Required**: LLM-based extraction requires `llm_router` to be available
- **Graceful Degradation**: Falls back to simple extraction if the LLM is unavailable
- **Backward Compatible**: The old pattern-based code has been removed, but fallbacks maintain existing functionality
---
## Benefits Realized
✅ **Accurate Topic Classification**: The LLM understands context, synonyms, and nuance
✅ **Domain Adaptive**: Works for any domain without code changes
✅ **Context-Aware**: Uses `session_context` and `interaction_contexts`
✅ **Human-Readable**: Maintains descriptive reasoning chain strings
✅ **Scalable**: No manual keyword lists to maintain
✅ **Cached**: Reduces API calls for repeated queries
---
## Trade-offs
⚠️ **Latency**: Adds ~200-500ms per request (first occurrence; identical queries are then cached)
⚠️ **API Costs**: ~150-250 tokens per request (first time)
⚠️ **LLM Dependency**: Requires the LLM router to be functional
⚠️ **Complexity**: More code to maintain (async handling, caching, error handling)
⚠️ **Inconsistency Risk**: LLM responses may vary slightly (mitigated by temperature=0.3)
---
## Files Modified
1. `src/orchestrator_engine.py`:
   - Added topic cache infrastructure
   - Rewrote `_extract_main_topic()` to use the LLM
   - Rewrote `_analyze_topic_continuity()` to use the LLM
   - Updated `_extract_keywords()` to be async
   - Updated all 18+ usage sites to use the cached `main_topic`
   - Updated the `_generate_alternative_paths()` signature
---
## Next Steps
1. **Monitor Performance**: Track latency and cache hit rates
2. **Tune Caching**: Adjust cache size based on usage patterns
3. **Optional Enhancements**:
   - Consider LLM-based keyword extraction if needed
   - Add topic extraction metrics/logging
   - Implement cache persistence across restarts
---
## Conclusion
The Option 2 implementation is complete. The system now uses LLM-based zero-shot classification for topic extraction and continuity analysis, providing accurate, context-aware topic classification without hardcoded patterns. Caching minimizes latency and API costs for repeated queries.