# LLM-Based Topic Extraction Implementation (Option 2)

## Summary

Successfully implemented Option 2: LLM-based zero-shot classification for topic extraction and continuity analysis, replacing hardcoded pattern matching.

## Changes Implemented

### 1. Topic Cache Infrastructure

**Location**: `src/orchestrator_engine.py` - `__init__()` (lines 34-36)

**Added**:

```python
# Cache for topic extraction to reduce API calls
self._topic_cache = {}
self._topic_cache_max_size = 100  # Limit cache size
```

**Purpose**: Cache topic extraction results to minimize LLM API calls for identical queries.

---

### 2. LLM-Based Topic Extraction

**Location**: `src/orchestrator_engine.py` - `_extract_main_topic()` (lines 1276-1343)

**Changes**:

- **Method signature**: Changed to `async def _extract_main_topic(self, user_input: str, context: dict = None) -> str`
- **Implementation**: Uses LLM zero-shot classification instead of hardcoded keywords
- **Context-aware**: Uses session_context and interaction_contexts from cache when available
- **Caching**: Implements cache with FIFO eviction (max 100 entries)
- **Fallback**: Falls back to simple word extraction if LLM unavailable

**LLM Prompt**:

```
Classify the main topic of this query in 2-5 words. Be specific and concise.

Query: "{user_input}"

[Session context if available]

Respond with ONLY the topic name (e.g., "Machine Learning", "Healthcare Analytics").
```

**Temperature**: 0.3 (for consistency)

---

### 3. LLM-Based Topic Continuity Analysis

**Location**: `src/orchestrator_engine.py` - `_analyze_topic_continuity()` (lines 1029-1094)

**Changes**:

- **Method signature**: Changed to `async def _analyze_topic_continuity(self, context: dict, user_input: str) -> str`
- **Implementation**: Uses LLM to determine if query continues previous topic or introduces new topic
- **Context-aware**: Uses session_context and interaction_contexts from cache
- **Format validation**: Validates LLM response format ("Continuing X" or "New topic: X")
- **Fallback**: Returns descriptive message if LLM unavailable

**LLM Prompt**:

```
Determine if the current query continues the previous conversation topic or introduces a new topic.

Session Summary: {session_summary}

Recent Interactions: {recent_interactions}

Current Query: "{user_input}"

Respond with EXACTLY one of:
- "Continuing [topic name] discussion" if same topic
- "New topic: [topic name]" if different topic
```

**Temperature**: 0.3 (for consistency)

---

### 4. Keyword Extraction Update

**Location**: `src/orchestrator_engine.py` - `_extract_keywords()` (lines 1345-1361)

**Changes**:

- **Method signature**: Changed to `async def _extract_keywords(self, user_input: str) -> str`
- **Implementation**: Simple regex-based extraction (not LLM-based, for performance)
- **Stop word filtering**: Filters common stop words
- **Note**: Can be enhanced with LLM if needed, but kept simple for performance
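To show how the pieces above fit together, here is a minimal sketch of sections 1, 2, and 4 in one place. It is illustrative, not the code in `src/orchestrator_engine.py`: the `TopicExtractorSketch` class, the `llm_router.generate(prompt, temperature=...)` call, the stop-word list, and the way session context is spliced into the prompt are assumptions; the cache fields, MD5 cache key, FIFO eviction, prompt wording, temperature, and fallbacks follow the descriptions in this document.

```python
import hashlib
import logging
import re

logger = logging.getLogger(__name__)

# Prompt from section 2; the {session_context} placeholder stands in for
# "[Session context if available]".
TOPIC_PROMPT = (
    "Classify the main topic of this query in 2-5 words. Be specific and concise.\n\n"
    'Query: "{user_input}"\n'
    "{session_context}\n"
    'Respond with ONLY the topic name (e.g., "Machine Learning", "Healthcare Analytics").'
)


class TopicExtractorSketch:
    """Illustrative stand-in for the orchestrator pieces described above."""

    def __init__(self, llm_router=None):
        self.llm_router = llm_router       # assumed to expose an async generate()
        # Cache for topic extraction to reduce API calls (section 1)
        self._topic_cache = {}
        self._topic_cache_max_size = 100   # Limit cache size

    async def _extract_main_topic(self, user_input: str, context: dict = None) -> str:
        # Cache key: MD5 of the raw query (see "Performance Characteristics")
        cache_key = hashlib.md5(user_input.encode("utf-8")).hexdigest()
        if cache_key in self._topic_cache:
            return self._topic_cache[cache_key]   # cache hit -> no LLM call

        session_context = ""
        if context and context.get("session_context"):
            session_context = f"\nSession context: {context['session_context']}\n"
        prompt = TOPIC_PROMPT.format(user_input=user_input, session_context=session_context)

        try:
            if self.llm_router is None:
                raise RuntimeError("LLM router unavailable")
            # Assumed router interface -- the real llm_router call may differ
            topic = (await self.llm_router.generate(prompt, temperature=0.3)).strip().strip('"')
        except Exception as exc:
            logger.error("Topic extraction failed, using fallback: %s", exc)
            # Fallback: first 4 words of the query (not cached in this sketch)
            return " ".join(user_input.split()[:4])

        # FIFO eviction: drop the oldest entry once the cache is full
        if len(self._topic_cache) >= self._topic_cache_max_size:
            self._topic_cache.pop(next(iter(self._topic_cache)))
        self._topic_cache[cache_key] = topic
        return topic

    async def _extract_keywords(self, user_input: str) -> str:
        # Simple regex-based extraction with stop-word filtering (section 4)
        stop_words = {"the", "a", "an", "and", "or", "of", "to", "in", "for",
                      "is", "are", "what", "how", "why", "can", "does"}
        words = re.findall(r"[a-zA-Z]{3,}", user_input.lower())
        keywords = [w for w in words if w not in stop_words]
        # Deduplicate while preserving order; cap at five terms
        return ", ".join(list(dict.fromkeys(keywords))[:5]) if keywords else "General terms"
```

In `process_request()` these coroutines are awaited once per request and their results reused, as described in section 5 below.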
---

### 5. Updated All Usage Sites

**Location**: `src/orchestrator_engine.py` - `process_request()` (lines 184-200)

**Changes**:

- **Extract topic once**: `main_topic = await self._extract_main_topic(user_input, context)`
- **Extract continuity**: `topic_continuity = await self._analyze_topic_continuity(context, user_input)`
- **Extract keywords**: `query_keywords = await self._extract_keywords(user_input)`
- **Reuse main_topic**: All 18+ usage sites now use the `main_topic` variable instead of calling the method repeatedly

**Updated Reasoning Chain Steps**:

- Step 1: Uses `main_topic` (line 190)
- Step 2: Uses `main_topic` (lines 251, 259)
- Step 3: Uses `main_topic` (lines 268, 276)
- Step 4: Uses `main_topic` (lines 304, 312)
- Step 5: Uses `main_topic` (lines 384, 392)
- Alternative paths: Use `main_topic` (lines 403, 1146-1166)

**Error Recovery**: Simplified to avoid async complexity (line 1733)

---

### 6. Alternative Paths Method Update

**Location**: `src/orchestrator_engine.py` - `_generate_alternative_paths()` (lines 1136-1169)

**Changes**:

- **Method signature**: Added `main_topic` parameter
- **Before**: `def _generate_alternative_paths(self, intent_result: dict, user_input: str) -> list:`
- **After**: `def _generate_alternative_paths(self, intent_result: dict, user_input: str, main_topic: str) -> list:`
- **Updated call site**: Line 403 passes `main_topic` as the third parameter

---

## Performance Characteristics

### Latency Impact

**Per Request**:

- 2 LLM calls per request (topic extraction + continuity analysis)
- Estimated latency: ~200-500ms total (depending on the LLM router)
- Caching reduces repeat calls: a cache hit adds no LLM latency

**Mitigation**:

- Topic extraction cached per unique query (MD5 hash)
- Cache size limited to 100 entries (FIFO eviction)
- Keyword extraction kept simple (no LLM, minimal latency)

### API Costs

**Per Request**:

- Topic extraction: ~50-100 tokens
- Topic continuity: ~100-150 tokens
- Total: ~150-250 tokens per request (first time)
- Cached requests: 0 tokens

**Monthly Estimate** (assuming 1000 unique queries/day):

- First requests: ~150-250k tokens/day = ~4.5-7.5M tokens/month
- Subsequent requests: cached, 0 tokens
- Actual usage depends on cache hit rate

---

## Error Handling

### Fallback Mechanisms

1. **Topic Extraction**:
   - If LLM unavailable: Falls back to first 4 words of query
   - If LLM error: Logs error, returns fallback
   - Cache miss handling: Generates and caches

2. **Topic Continuity**:
   - If LLM unavailable: Returns "Topic continuity analysis unavailable"
   - If no context: Returns "No previous context"
   - If LLM error: Logs error, returns "Topic continuity analysis failed"

3. **Keywords**:
   - Simple extraction, no LLM dependency
   - Error handling: Returns "General terms" on exception
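Combining the prompt from section 3 with the fallbacks just listed, the continuity check would look roughly like the method below. This extends the hypothetical `TopicExtractorSketch` from the earlier sketch; the `llm_router.generate()` call, the treatment of `interaction_contexts` as a list of strings, and the coercion of off-format replies are assumptions, while the prompt, temperature, format validation, and fallback strings follow this document.

```python
import logging

logger = logging.getLogger(__name__)

# Prompt from section 3
CONTINUITY_PROMPT = (
    "Determine if the current query continues the previous conversation topic "
    "or introduces a new topic.\n\n"
    "Session Summary: {session_summary}\n\n"
    "Recent Interactions: {recent_interactions}\n\n"
    'Current Query: "{user_input}"\n\n'
    "Respond with EXACTLY one of:\n"
    '- "Continuing [topic name] discussion" if same topic\n'
    '- "New topic: [topic name]" if different topic'
)


class ContinuityAnalyzerSketch(TopicExtractorSketch):
    """Adds the LLM-based continuity check to the earlier sketch class."""

    async def _analyze_topic_continuity(self, context: dict, user_input: str) -> str:
        if not context or not context.get("interaction_contexts"):
            return "No previous context"
        if self.llm_router is None:
            return "Topic continuity analysis unavailable"

        prompt = CONTINUITY_PROMPT.format(
            session_summary=context.get("session_context", "None"),
            recent_interactions="; ".join(context["interaction_contexts"][-3:]),
            user_input=user_input,
        )
        try:
            # Assumed router interface -- the real llm_router call may differ
            answer = (await self.llm_router.generate(prompt, temperature=0.3)).strip()
        except Exception as exc:
            logger.error("Topic continuity analysis failed: %s", exc)
            return "Topic continuity analysis failed"

        # Format validation: accept only the two response shapes requested in the prompt
        if answer.startswith("Continuing") or answer.startswith("New topic:"):
            return answer
        return f"New topic: {answer}"   # sketch choice: coerce off-format replies
```

Temperature 0.3 keeps the two-shape output format reasonably stable, and the final coercion is one possible way to keep the reasoning-chain string readable when the model drifts off format.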
---

## Testing Recommendations

### Unit Tests

1. **Topic Extraction**:
   - Test LLM-based extraction with various queries
   - Test caching behavior (cache hit/miss)
   - Test fallback behavior when LLM unavailable
   - Test context-aware extraction

2. **Topic Continuity**:
   - Test continuation detection
   - Test new topic detection
   - Test with empty context
   - Test format validation

3. **Integration Tests**:
   - Test full request flow with LLM calls
   - Test cache persistence across requests
   - Test error recovery with LLM failures

### Performance Tests

1. **Latency Measurement**:
   - Measure average latency with LLM calls
   - Measure latency with cache hits
   - Compare to previous pattern-based approach

2. **Cache Effectiveness**:
   - Measure cache hit rate
   - Test cache eviction behavior

---

## Migration Notes

### Breaking Changes

**None**: All changes are internal to the orchestrator. External API unchanged.

### Compatibility

- **LLM Router Required**: System requires `llm_router` to be available
- **Graceful Degradation**: Falls back to simple extraction if LLM unavailable
- **Backward Compatible**: Old pattern-based code removed, but fallbacks maintain functionality

---

## Benefits Realized

- ✅ **Accurate Topic Classification**: LLM understands context, synonyms, nuances
- ✅ **Domain Adaptive**: Works for any domain without code changes
- ✅ **Context-Aware**: Uses session_context and interaction_contexts
- ✅ **Human-Readable**: Maintains descriptive reasoning chain strings
- ✅ **Scalable**: No manual keyword list maintenance
- ✅ **Cached**: Reduces API calls for repeated queries

---

## Trade-offs

- ⚠️ **Latency**: Adds ~200-500ms per request (first time, cached after)
- ⚠️ **API Costs**: ~150-250 tokens per request (first time)
- ⚠️ **LLM Dependency**: Requires LLM router to be functional
- ⚠️ **Complexity**: More code to maintain (async handling, caching, error handling)
- ⚠️ **Inconsistency Risk**: LLM responses may vary slightly (mitigated by temperature=0.3)

---

## Files Modified

1. `src/orchestrator_engine.py`:
   - Added topic cache infrastructure
   - Rewrote `_extract_main_topic()` to use LLM
   - Rewrote `_analyze_topic_continuity()` to use LLM
   - Updated `_extract_keywords()` to async
   - Updated all 18+ usage sites to use cached `main_topic`
   - Updated `_generate_alternative_paths()` signature

---

## Next Steps

1. **Monitor Performance**: Track latency and cache hit rates
2. **Tune Caching**: Adjust cache size based on usage patterns
3. **Optional Enhancements**:
   - Consider LLM-based keyword extraction if needed
   - Add topic extraction metrics/logging
   - Implement cache persistence across restarts

---

## Conclusion

Option 2 implementation complete. The system now uses LLM-based zero-shot classification for topic extraction and continuity analysis, providing accurate, context-aware topic classification without hardcoded patterns. Caching minimizes latency and API costs for repeated queries.
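### Appendix: Test Sketch

As a concrete starting point for the "caching behavior" and "fallback behavior" unit tests recommended above, here is a hypothetical pytest sketch. It assumes `pytest` with the `pytest-asyncio` plugin and the `TopicExtractorSketch` class from the earlier sketch in scope; the real tests would exercise `OrchestratorEngine` and its actual router interface.

```python
import pytest


class StubRouter:
    """Deterministic stand-in for llm_router that counts calls."""

    def __init__(self, reply="Machine Learning"):
        self.reply = reply
        self.calls = 0

    async def generate(self, prompt, temperature=0.3):
        self.calls += 1
        return self.reply


@pytest.mark.asyncio
async def test_topic_extraction_hits_cache_on_repeat_query():
    router = StubRouter()
    engine = TopicExtractorSketch(llm_router=router)  # sketch class from earlier

    first = await engine._extract_main_topic("Explain gradient descent")
    second = await engine._extract_main_topic("Explain gradient descent")

    assert first == second == "Machine Learning"
    assert router.calls == 1   # second call served from the topic cache


@pytest.mark.asyncio
async def test_topic_extraction_falls_back_without_llm():
    engine = TopicExtractorSketch(llm_router=None)

    topic = await engine._extract_main_topic("How do transformers handle long context windows")

    assert topic == "How do transformers handle"   # first 4 words of the query
```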