LLM-Based Topic Extraction Implementation (Option 2)
Summary
Successfully implemented Option 2: LLM-based zero-shot classification for topic extraction and continuity analysis, replacing hardcoded pattern matching.
Changes Implemented
1. Topic Cache Infrastructure
Location: src/orchestrator_engine.py - __init__() (lines 34-36)
Added:
# Cache for topic extraction to reduce API calls
self._topic_cache = {}
self._topic_cache_max_size = 100 # Limit cache size
Purpose: Cache topic extraction results to minimize LLM API calls for identical queries.
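For illustration, a minimal sketch of how such a cache could be keyed and bounded, using the MD5-keyed, FIFO-evicting scheme described under Performance Characteristics below. The module-level names and helper functions here are hypothetical stand-ins, not the actual implementation:

```python
import hashlib

# Illustrative module-level stand-ins for the attributes set in __init__().
_topic_cache: dict = {}
_topic_cache_max_size = 100  # Limit cache size

def _topic_cache_key(user_input: str) -> str:
    # Key entries by an MD5 hash of the normalized query text.
    return hashlib.md5(user_input.strip().lower().encode("utf-8")).hexdigest()

def _topic_cache_put(key: str, topic: str) -> None:
    # FIFO eviction: Python dicts preserve insertion order, so the first
    # key is the oldest entry once the cache reaches its size limit.
    if len(_topic_cache) >= _topic_cache_max_size:
        _topic_cache.pop(next(iter(_topic_cache)))
    _topic_cache[key] = topic
```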
2. LLM-Based Topic Extraction
Location: src/orchestrator_engine.py - _extract_main_topic() (lines 1276-1343)
Changes:
- Method signature: Changed to `async def _extract_main_topic(self, user_input: str, context: dict = None) -> str`
- Implementation: Uses LLM zero-shot classification instead of hardcoded keywords
- Context-aware: Uses session_context and interaction_contexts from cache when available
- Caching: Implements cache with FIFO eviction (max 100 entries)
- Fallback: Falls back to simple word extraction if LLM unavailable
LLM Prompt:
Classify the main topic of this query in 2-5 words. Be specific and concise.
Query: "{user_input}"
[Session context if available]
Respond with ONLY the topic name (e.g., "Machine Learning", "Healthcare Analytics").
Temperature: 0.3 (for consistency)
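A hedged, method-style sketch of the flow this subsection describes (cache lookup, zero-shot LLM call, FIFO insert, word-based fallback). The `llm_router.generate(prompt=..., temperature=...)` call is an assumption about the router interface, not its documented API:

```python
import hashlib
import logging

logger = logging.getLogger(__name__)

async def _extract_main_topic(self, user_input: str, context: dict = None) -> str:
    """Sketch: cache lookup -> zero-shot LLM classification -> FIFO insert -> fallback."""
    cache_key = hashlib.md5(user_input.strip().lower().encode("utf-8")).hexdigest()
    if cache_key in self._topic_cache:
        return self._topic_cache[cache_key]  # cache hit: no API call

    fallback = " ".join(user_input.split()[:4])  # first four words of the query
    if not getattr(self, "llm_router", None):
        return fallback

    session_context = (context or {}).get("session_context", "")
    prompt = (
        "Classify the main topic of this query in 2-5 words. Be specific and concise.\n"
        f'Query: "{user_input}"\n'
        + (f"Session context: {session_context}\n" if session_context else "")
        + 'Respond with ONLY the topic name (e.g., "Machine Learning", "Healthcare Analytics").'
    )
    try:
        # Hypothetical router call; the real method name and signature may differ.
        topic = (await self.llm_router.generate(prompt=prompt, temperature=0.3)).strip()
    except Exception as exc:
        logger.error("Topic extraction failed: %s", exc)
        return fallback

    if len(self._topic_cache) >= self._topic_cache_max_size:
        self._topic_cache.pop(next(iter(self._topic_cache)))  # FIFO eviction
    self._topic_cache[cache_key] = topic
    return topic
```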
3. LLM-Based Topic Continuity Analysis
Location: src/orchestrator_engine.py - _analyze_topic_continuity() (lines 1029-1094)
Changes:
- Method signature: Changed to `async def _analyze_topic_continuity(self, context: dict, user_input: str) -> str`
- Implementation: Uses LLM to determine if query continues previous topic or introduces new topic
- Context-aware: Uses session_context and interaction_contexts from cache
- Format validation: Validates LLM response format ("Continuing X" or "New topic: X")
- Fallback: Returns descriptive message if LLM unavailable
LLM Prompt:
Determine if the current query continues the previous conversation topic or introduces a new topic.
Session Summary: {session_summary}
Recent Interactions: {recent_interactions}
Current Query: "{user_input}"
Respond with EXACTLY one of:
- "Continuing [topic name] discussion" if same topic
- "New topic: [topic name]" if different topic
Temperature: 0.3 (for consistency)
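A similar hedged sketch for continuity analysis, including the format-validation step. As above, the `llm_router.generate(...)` call is assumed, and the fallback strings mirror the ones listed under Error Handling below:

```python
import logging

logger = logging.getLogger(__name__)

async def _analyze_topic_continuity(self, context: dict, user_input: str) -> str:
    """Sketch: context check -> LLM classification -> response-format validation -> fallback."""
    if not context:
        return "No previous context"
    if not getattr(self, "llm_router", None):
        return "Topic continuity analysis unavailable"

    session_summary = context.get("session_context", "")
    recent_interactions = context.get("interaction_contexts", [])
    prompt = (
        "Determine if the current query continues the previous conversation topic "
        "or introduces a new topic.\n"
        f"Session Summary: {session_summary}\n"
        f"Recent Interactions: {recent_interactions}\n"
        f'Current Query: "{user_input}"\n'
        "Respond with EXACTLY one of:\n"
        '- "Continuing [topic name] discussion" if same topic\n'
        '- "New topic: [topic name]" if different topic'
    )
    try:
        # Hypothetical router call, as in the extraction sketch above.
        response = (await self.llm_router.generate(prompt=prompt, temperature=0.3)).strip()
    except Exception as exc:
        logger.error("Topic continuity analysis failed: %s", exc)
        return "Topic continuity analysis failed"

    # Accept only the two documented response shapes; otherwise degrade gracefully.
    if response.startswith("Continuing") or response.startswith("New topic:"):
        return response
    return "Topic continuity analysis unavailable"
```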
4. Keyword Extraction Update
Location: src/orchestrator_engine.py - _extract_keywords() (lines 1345-1361)
Changes:
- Method signature: Changed to `async def _extract_keywords(self, user_input: str) -> str`
- Implementation: Simple regex-based extraction (not LLM-based, for performance; see the sketch after this list)
- Stop word filtering: Filters common stop words
- Note: Can be enhanced with LLM if needed, but kept simple for performance
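A minimal sketch of the regex-plus-stop-word approach; the stop-word set shown is illustrative, not the actual list:

```python
import re

# Illustrative stop-word list; the real filter may be larger.
_STOP_WORDS = {
    "the", "a", "an", "and", "or", "but", "is", "are", "was", "were",
    "what", "which", "how", "can", "you", "please", "about", "with", "for", "to", "of",
}

async def _extract_keywords(self, user_input: str) -> str:
    """Sketch: regex word extraction with stop-word filtering; async only to match call sites."""
    try:
        words = re.findall(r"[a-zA-Z]{3,}", user_input.lower())
        keywords = [w for w in words if w not in _STOP_WORDS]
        # dict.fromkeys() de-duplicates while preserving order.
        return ", ".join(dict.fromkeys(keywords)) or "General terms"
    except Exception:
        return "General terms"
```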
5. Updated All Usage Sites
Location: src/orchestrator_engine.py - process_request() (lines 184-200)
Changes:
- Extract topic once: `main_topic = await self._extract_main_topic(user_input, context)`
- Extract continuity: `topic_continuity = await self._analyze_topic_continuity(context, user_input)`
- Extract keywords: `query_keywords = await self._extract_keywords(user_input)`
- Reuse main_topic: All 18+ usage sites now use the `main_topic` variable instead of calling the method repeatedly (see the sketch at the end of this section)
Updated Reasoning Chain Steps:
- Step 1: Uses `main_topic` (line 190)
- Step 2: Uses `main_topic` (lines 251, 259)
- Step 3: Uses `main_topic` (lines 268, 276)
- Step 4: Uses `main_topic` (lines 304, 312)
- Step 5: Uses `main_topic` (lines 384, 392)
- Alternative paths: Uses `main_topic` (lines 403, 1146-1166)
Error Recovery: Simplified to avoid async complexity (line 1733)
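A condensed, illustrative view of how `process_request()` wires these pieces together. The reasoning-chain strings, the placeholder `intent_result`, and the return keys are hypothetical; only the call order and the reuse of `main_topic` reflect the change described above:

```python
async def process_request(self, user_input: str, context: dict = None) -> dict:
    """Condensed sketch of the updated call sites; unrelated steps are omitted."""
    main_topic = await self._extract_main_topic(user_input, context)
    topic_continuity = await self._analyze_topic_continuity(context, user_input)
    query_keywords = await self._extract_keywords(user_input)

    # Placeholder reasoning-chain strings; the real steps reuse main_topic the same way.
    reasoning_chain = [
        f"Step 1: Identified main topic '{main_topic}' ({topic_continuity})",
        f"Step 2: Matched keywords [{query_keywords}] against available agents",
    ]

    intent_result: dict = {}  # produced earlier in the real method; placeholder here
    # Line 403: main_topic is now passed explicitly instead of being re-extracted.
    alternative_paths = self._generate_alternative_paths(intent_result, user_input, main_topic)

    return {
        "main_topic": main_topic,
        "reasoning_chain": reasoning_chain,
        "alternative_paths": alternative_paths,
    }
```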
6. Alternative Paths Method Update
Location: src/orchestrator_engine.py - _generate_alternative_paths() (lines 1136-1169)
Changes:
- Method signature: Added `main_topic` parameter
- Before: `def _generate_alternative_paths(self, intent_result: dict, user_input: str) -> list:`
- After: `def _generate_alternative_paths(self, intent_result: dict, user_input: str, main_topic: str) -> list:`
- Updated call site: Line 403 passes `main_topic` as the third parameter
Performance Characteristics
Latency Impact
Per Request:
- 2 LLM calls per request (topic extraction + continuity analysis)
- Estimated added latency: ~200-500ms total (depending on the LLM router)
- Caching eliminates repeat calls: a cache hit adds no LLM latency
Mitigation:
- Topic extraction cached per unique query (MD5 hash)
- Cache size limited to 100 entries (FIFO eviction)
- Keywords extraction kept simple (no LLM, minimal latency)
API Costs
Per Request:
- Topic extraction: ~50-100 tokens
- Topic continuity: ~100-150 tokens
- Total: ~150-250 tokens per request (first time)
- Cached requests: 0 tokens
Monthly Estimate (assuming 1000 unique queries/day):
- First-time requests: ~150k-250k tokens/day, or ~4.5M-7.5M tokens/month
- Subsequent requests: Cached, 0 tokens
- Actual usage depends on cache hit rate
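The arithmetic behind these figures, spelled out:

```python
# Token-volume arithmetic for the estimate above (1000 unique queries/day).
unique_queries_per_day = 1000
tokens_per_request_low, tokens_per_request_high = 150, 250

daily_low = unique_queries_per_day * tokens_per_request_low    # 150,000 tokens/day
daily_high = unique_queries_per_day * tokens_per_request_high  # 250,000 tokens/day

monthly_low = daily_low * 30    # 4,500,000 tokens/month
monthly_high = daily_high * 30  # 7,500,000 tokens/month
```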
Error Handling
Fallback Mechanisms
Topic Extraction:
- If LLM unavailable: Falls back to first 4 words of query
- If LLM error: Logs error, returns fallback
- Cache miss handling: Generates the topic via the LLM and caches the result
Topic Continuity:
- If LLM unavailable: Returns "Topic continuity analysis unavailable"
- If no context: Returns "No previous context"
- If LLM error: Logs error, returns "Topic continuity analysis failed"
Keywords:
- Simple extraction, no LLM dependency
- Error handling: Returns "General terms" on exception
Testing Recommendations
Unit Tests
Topic Extraction:
- Test LLM-based extraction with various queries
- Test caching behavior (cache hit/miss)
- Test fallback behavior when LLM unavailable
- Test context-aware extraction
Topic Continuity:
- Test continuation detection
- Test new topic detection
- Test with empty context
- Test format validation
Integration Tests:
- Test full request flow with LLM calls
- Test cache persistence across requests
- Test error recovery with LLM failures
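A hedged sketch of what the caching and fallback tests could look like, assuming pytest with the pytest-asyncio plugin and a hypothetical `orchestrator` fixture that yields a fresh engine instance (empty `_topic_cache`, stubbed `llm_router`):

```python
import pytest

# Assumes pytest-asyncio and a hypothetical `orchestrator` fixture providing a
# fresh engine instance (empty _topic_cache, stubbed llm_router).

@pytest.mark.asyncio
async def test_topic_extraction_cache_hit(orchestrator):
    query = "Explain machine learning model drift"
    first = await orchestrator._extract_main_topic(query)
    second = await orchestrator._extract_main_topic(query)
    assert first == second
    assert len(orchestrator._topic_cache) == 1  # second call served from cache

@pytest.mark.asyncio
async def test_topic_extraction_fallback_without_llm(orchestrator):
    orchestrator.llm_router = None
    topic = await orchestrator._extract_main_topic("Explain machine learning model drift")
    assert topic == "Explain machine learning model"  # first four words of the query

@pytest.mark.asyncio
async def test_continuity_with_empty_context(orchestrator):
    result = await orchestrator._analyze_topic_continuity({}, "Tell me about healthcare analytics")
    assert result == "No previous context"
```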
Performance Tests
Latency Measurement:
- Measure average latency with LLM calls
- Measure latency with cache hits
- Compare to previous pattern-based approach
Cache Effectiveness:
- Measure cache hit rate
- Test cache eviction behavior
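One way to make cache effectiveness measurable is a pair of hit/miss counters. The class below is only a possible enhancement (it pairs with the metrics/logging item under Next Steps), not part of the current code:

```python
class TopicCacheMetrics:
    """Illustrative hit/miss tracking; not part of the current implementation."""

    def __init__(self) -> None:
        self.hits = 0    # incremented on a cache hit in _extract_main_topic()
        self.misses = 0  # incremented on a cache miss, before the LLM call

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```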
Migration Notes
Breaking Changes
None. All changes are internal to the orchestrator; the external API is unchanged.
Compatibility
- LLM Router Required: System requires `llm_router` to be available
- Graceful Degradation: Falls back to simple extraction if LLM unavailable
- Backward Compatible: Old pattern-based code removed, but fallbacks maintain functionality
Benefits Realized
✅ Accurate Topic Classification: LLM understands context, synonyms, nuances
✅ Domain Adaptive: Works for any domain without code changes
✅ Context-Aware: Uses session_context and interaction_contexts
✅ Human-Readable: Maintains descriptive reasoning chain strings
✅ Scalable: No manual keyword list maintenance
✅ Cached: Reduces API calls for repeated queries
Trade-offs
⚠️ Latency: Adds ~200-500ms per request (first time, cached after)
⚠️ API Costs: ~150-250 tokens per request (first time)
⚠️ LLM Dependency: Requires LLM router to be functional
⚠️ Complexity: More code to maintain (async handling, caching, error handling)
⚠️ Inconsistency Risk: LLM responses may vary slightly (mitigated by temperature=0.3)
Files Modified
src/orchestrator_engine.py:
- Added topic cache infrastructure
- Rewrote `_extract_main_topic()` to use LLM
- Rewrote `_analyze_topic_continuity()` to use LLM
- Updated `_extract_keywords()` to async
- Updated all 18+ usage sites to use the cached `main_topic`
- Updated `_generate_alternative_paths()` signature
Next Steps
- Monitor Performance: Track latency and cache hit rates
- Tune Caching: Adjust cache size based on usage patterns
- Optional Enhancements:
- Consider LLM-based keyword extraction if needed
- Add topic extraction metrics/logging
- Implement cache persistence across restarts
Conclusion
Option 2 implementation complete. System now uses LLM-based zero-shot classification for topic extraction and continuity analysis, providing accurate, context-aware topic classification without hardcoded patterns. Caching minimizes latency and API costs for repeated queries.