LLM-Based Topic Extraction Implementation (Option 2)

Summary

Successfully implemented Option 2: LLM-based zero-shot classification for topic extraction and continuity analysis, replacing hardcoded pattern matching.

Changes Implemented

1. Topic Cache Infrastructure

Location: src/orchestrator_engine.py - __init__() (lines 34-36)

Added:

# Cache for topic extraction to reduce API calls
self._topic_cache = {}
self._topic_cache_max_size = 100  # Limit cache size

Purpose: Cache topic extraction results to minimize LLM API calls for identical queries.
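For illustration, the caching scheme described later in this document (entries keyed by an MD5 hash of the query, FIFO eviction once the cache is full) can be pictured as a small helper. The helper name and the normalization step are assumptions for this sketch, not the exact project code:

import hashlib

# Illustrative sketch of the cache write path: MD5-keyed entries, FIFO eviction.
# Helper name and query normalization are assumptions for this example.
def _cache_topic(self, user_input: str, topic: str) -> None:
    key = hashlib.md5(user_input.strip().lower().encode("utf-8")).hexdigest()
    if len(self._topic_cache) >= self._topic_cache_max_size:
        # dicts preserve insertion order, so the first key is the oldest entry
        del self._topic_cache[next(iter(self._topic_cache))]
    self._topic_cache[key] = topic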


2. LLM-Based Topic Extraction

Location: src/orchestrator_engine.py - _extract_main_topic() (lines 1276-1343)

Changes:

  • Method signature: Changed to async def _extract_main_topic(self, user_input: str, context: dict = None) -> str
  • Implementation: Uses LLM zero-shot classification instead of hardcoded keywords
  • Context-aware: Uses session_context and interaction_contexts from cache when available
  • Caching: Implements cache with FIFO eviction (max 100 entries)
  • Fallback: Falls back to simple word extraction if LLM unavailable

LLM Prompt:

Classify the main topic of this query in 2-5 words. Be specific and concise.

Query: "{user_input}"
[Session context if available]

Respond with ONLY the topic name (e.g., "Machine Learning", "Healthcare Analytics").

Temperature: 0.3 (for consistency)
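A minimal sketch of this flow is shown below. The router call (self.llm_router.generate(...)) is a placeholder for whatever interface the project's LLM router actually exposes, and the logging setup is added only to make the sketch self-contained:

import hashlib
import logging

logger = logging.getLogger(__name__)

# Sketch only: cache lookup, LLM zero-shot classification, fallback, FIFO caching.
async def _extract_main_topic(self, user_input: str, context: dict = None) -> str:
    cache_key = hashlib.md5(user_input.strip().lower().encode("utf-8")).hexdigest()
    if cache_key in self._topic_cache:
        return self._topic_cache[cache_key]  # cache hit: no LLM call

    fallback = " ".join(user_input.split()[:4])  # simple word-based fallback
    if not getattr(self, "llm_router", None):
        return fallback  # graceful degradation when no router is configured

    session_summary = (context or {}).get("session_context", "")
    prompt = (
        "Classify the main topic of this query in 2-5 words. Be specific and concise.\n\n"
        f'Query: "{user_input}"\n'
        + (f"Session context: {session_summary}\n" if session_summary else "")
        + '\nRespond with ONLY the topic name (e.g., "Machine Learning", "Healthcare Analytics").'
    )
    try:
        topic = (await self.llm_router.generate(prompt, temperature=0.3)).strip()
    except Exception as exc:
        logger.error(f"Topic extraction failed: {exc}")
        return fallback

    if len(self._topic_cache) >= self._topic_cache_max_size:
        del self._topic_cache[next(iter(self._topic_cache))]  # FIFO eviction
    self._topic_cache[cache_key] = topic
    return topic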


3. LLM-Based Topic Continuity Analysis

Location: src/orchestrator_engine.py - _analyze_topic_continuity() (lines 1029-1094)

Changes:

  • Method signature: Changed to async def _analyze_topic_continuity(self, context: dict, user_input: str) -> str
  • Implementation: Uses LLM to determine if query continues previous topic or introduces new topic
  • Context-aware: Uses session_context and interaction_contexts from cache
  • Format validation: Validates LLM response format ("Continuing X" or "New topic: X")
  • Fallback: Returns descriptive message if LLM unavailable

LLM Prompt:

Determine if the current query continues the previous conversation topic or introduces a new topic.

Session Summary: {session_summary}
Recent Interactions: {recent_interactions}

Current Query: "{user_input}"

Respond with EXACTLY one of:
- "Continuing [topic name] discussion" if same topic
- "New topic: [topic name]" if different topic

Temperature: 0.3 (for consistency)
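The same pattern applies to continuity analysis. Again, the llm_router call is a placeholder for the real router interface, and the handling of malformed responses is one illustrative choice rather than the project's exact behavior:

# Sketch only: context check, LLM call, and format validation for the two
# accepted response shapes ("Continuing ..." / "New topic: ...").
async def _analyze_topic_continuity(self, context: dict, user_input: str) -> str:
    if not context or not context.get("interaction_contexts"):
        return "No previous context"
    if not getattr(self, "llm_router", None):
        return "Topic continuity analysis unavailable"

    session_summary = context.get("session_context", "")
    recent_interactions = context.get("interaction_contexts", [])[-3:]  # last few turns (illustrative)
    prompt = (
        "Determine if the current query continues the previous conversation topic "
        "or introduces a new topic.\n\n"
        f"Session Summary: {session_summary}\n"
        f"Recent Interactions: {recent_interactions}\n\n"
        f'Current Query: "{user_input}"\n\n'
        "Respond with EXACTLY one of:\n"
        '- "Continuing [topic name] discussion" if same topic\n'
        '- "New topic: [topic name]" if different topic'
    )
    try:
        response = (await self.llm_router.generate(prompt, temperature=0.3)).strip()
    except Exception:
        return "Topic continuity analysis failed"

    if response.startswith("Continuing") or response.startswith("New topic:"):
        return response
    return f"New topic: {response[:50]}"  # one way to coerce unexpected output into the expected format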


4. Keyword Extraction Update

Location: src/orchestrator_engine.py - _extract_keywords() (lines 1345-1361)

Changes:

  • Method signature: Changed to async def _extract_keywords(self, user_input: str) -> str
  • Implementation: Simple regex-based extraction (not LLM-based for performance)
  • Stop word filtering: Filters common stop words
  • Note: Can be enhanced with LLM if needed, but kept simple for performance
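As a rough sketch of what this simple extractor might look like; the stop-word list and word-length threshold below are illustrative choices, not the project's exact values:

import re

# Illustrative stop-word list; the real list may differ.
_STOP_WORDS = {"the", "a", "an", "and", "or", "but", "in", "on", "at", "to",
               "for", "of", "with", "is", "are", "was", "were", "what", "how"}

# Sketch only: regex word extraction, stop-word filtering, de-duplication.
async def _extract_keywords(self, user_input: str) -> str:
    try:
        words = re.findall(r"[a-zA-Z]{3,}", user_input.lower())
        keywords = [w for w in words if w not in _STOP_WORDS]
        if not keywords:
            return "General terms"
        return ", ".join(list(dict.fromkeys(keywords))[:5])  # keep first 5 unique terms
    except Exception:
        return "General terms"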

5. Updated All Usage Sites

Location: src/orchestrator_engine.py - process_request() (lines 184-200)

Changes:

  • Extract topic once: main_topic = await self._extract_main_topic(user_input, context)
  • Extract continuity: topic_continuity = await self._analyze_topic_continuity(context, user_input)
  • Extract keywords: query_keywords = await self._extract_keywords(user_input)
  • Reuse main_topic: All 18+ usage sites now use the main_topic variable instead of calling the method repeatedly (see the sketch at the end of this section)

Updated Reasoning Chain Steps:

  • Step 1: Uses main_topic (line 190)
  • Step 2: Uses main_topic (lines 251, 259)
  • Step 3: Uses main_topic (lines 268, 276)
  • Step 4: Uses main_topic (lines 304, 312)
  • Step 5: Uses main_topic (lines 384, 392)
  • Alternative paths: Uses main_topic (lines 403, 1146-1166)

Error Recovery: Simplified to avoid async complexity (line 1733)
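Taken together, the calls in this section might look like the following inside process_request(); the reasoning chain strings here are illustrative, not the exact wording used in the code:

# Illustrative composition: each value is computed once and reused downstream.
main_topic = await self._extract_main_topic(user_input, context)
topic_continuity = await self._analyze_topic_continuity(context, user_input)
query_keywords = await self._extract_keywords(user_input)

reasoning_chain = [
    f"Step 1: Interpreting a request about {main_topic}",
    f"Context: {topic_continuity}",
    f"Key terms: {query_keywords}",
]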


6. Alternative Paths Method Update

Location: src/orchestrator_engine.py - _generate_alternative_paths() (lines 1136-1169)

Changes:

  • Method signature: Added main_topic parameter
  • Before: def _generate_alternative_paths(self, intent_result: dict, user_input: str) -> list:
  • After: def _generate_alternative_paths(self, intent_result: dict, user_input: str, main_topic: str) -> list:
  • Updated call site: Line 403 passes main_topic as the third parameter

Performance Characteristics

Latency Impact

Per Request:

  • 2 LLM calls per request (topic extraction + continuity analysis)
  • Estimated latency: ~200-500ms total, depending on the LLM router
  • Caching eliminates repeat calls: a cache hit adds no LLM latency

Mitigation:

  • Topic extraction results cached per unique query (keyed by an MD5 hash of the query)
  • Cache size limited to 100 entries (FIFO eviction)
  • Keyword extraction kept simple (no LLM call, minimal latency)

API Costs

Per Request:

  • Topic extraction: ~50-100 tokens
  • Topic continuity: ~100-150 tokens
  • Total: ~150-250 tokens per request (first time)
  • Cached requests: 0 tokens

Monthly Estimate (assuming 1000 unique queries/day):

  • First requests: ~150-250k tokens/day = ~4.5-7.5M tokens/month
  • Subsequent requests: Cached, 0 tokens
  • Actual usage depends on cache hit rate

Error Handling

Fallback Mechanisms

  1. Topic Extraction:

    • If LLM unavailable: Falls back to first 4 words of query
    • If LLM error: Logs error, returns fallback
    • Cache miss handling: Generates and caches
  2. Topic Continuity:

    • If LLM unavailable: Returns "Topic continuity analysis unavailable"
    • If no context: Returns "No previous context"
    • If LLM error: Logs error, returns "Topic continuity analysis failed"
  3. Keywords:

    • Simple extraction, no LLM dependency
    • Error handling: Returns "General terms" on exception

Testing Recommendations

Unit Tests

  1. Topic Extraction:

    • Test LLM-based extraction with various queries
    • Test caching behavior (cache hit/miss); see the sketch after this list
    • Test fallback behavior when LLM unavailable
    • Test context-aware extraction
  2. Topic Continuity:

    • Test continuation detection
    • Test new topic detection
    • Test with empty context
    • Test format validation
  3. Integration Tests:

    • Test full request flow with LLM calls
    • Test cache persistence across requests
    • Test error recovery with LLM failures
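As a starting point, a caching unit test might look like the sketch below. It assumes pytest-asyncio, an OrchestratorEngine class exported by src/orchestrator_engine.py, and a router exposing an async generate() method; all three are assumptions to adapt to the actual code:

import pytest
from unittest.mock import AsyncMock

from src.orchestrator_engine import OrchestratorEngine  # class name/path assumed

@pytest.mark.asyncio
async def test_topic_extraction_uses_cache():
    engine = OrchestratorEngine()  # hypothetical no-arg constructor
    engine.llm_router = AsyncMock()
    engine.llm_router.generate.return_value = "Machine Learning"

    first = await engine._extract_main_topic("Explain gradient descent", {})
    second = await engine._extract_main_topic("Explain gradient descent", {})

    assert first == second == "Machine Learning"
    engine.llm_router.generate.assert_awaited_once()  # second call was a cache hit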

Performance Tests

  1. Latency Measurement:

    • Measure average latency with LLM calls
    • Measure latency with cache hits
    • Compare to previous pattern-based approach
  2. Cache Effectiveness:

    • Measure cache hit rate
    • Test cache eviction behavior
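Continuing the same test module, eviction behavior could be exercised by exceeding the cache limit with unique queries (same assumptions as the previous sketch):

@pytest.mark.asyncio
async def test_topic_cache_evicts_oldest_entry():
    engine = OrchestratorEngine()  # hypothetical no-arg constructor
    engine.llm_router = AsyncMock()
    engine.llm_router.generate.return_value = "Some Topic"

    for i in range(engine._topic_cache_max_size + 1):
        await engine._extract_main_topic(f"unique query {i}", {})

    # FIFO eviction should keep the cache at or below its configured limit
    assert len(engine._topic_cache) <= engine._topic_cache_max_size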

Migration Notes

Breaking Changes

None. All changes are internal to the orchestrator; the external API is unchanged.

Compatibility

  • LLM Router Required: System requires llm_router to be available
  • Graceful Degradation: Falls back to simple extraction if LLM unavailable
  • Backward Compatible: Old pattern-based code removed, but fallbacks maintain functionality

Benefits Realized

Accurate Topic Classification: LLM understands context, synonyms, nuances
Domain Adaptive: Works for any domain without code changes
Context-Aware: Uses session_context and interaction_contexts
Human-Readable: Maintains descriptive reasoning chain strings
Scalable: No manual keyword list maintenance
Cached: Reduces API calls for repeated queries


Trade-offs

⚠️ Latency: Adds ~200-500ms per request (first time, cached after)
⚠️ API Costs: ~150-250 tokens per request (first time)
⚠️ LLM Dependency: Requires LLM router to be functional
⚠️ Complexity: More code to maintain (async handling, caching, error handling)
⚠️ Inconsistency Risk: LLM responses may vary slightly (mitigated by temperature=0.3)


Files Modified

  1. src/orchestrator_engine.py:
    • Added topic cache infrastructure
    • Rewrote _extract_main_topic() to use LLM
    • Rewrote _analyze_topic_continuity() to use LLM
    • Updated _extract_keywords() to async
    • Updated all 18+ usage sites to use cached main_topic
    • Updated _generate_alternative_paths() signature

Next Steps

  1. Monitor Performance: Track latency and cache hit rates
  2. Tune Caching: Adjust cache size based on usage patterns
  3. Optional Enhancements:
    • Consider LLM-based keyword extraction if needed
    • Add topic extraction metrics/logging
    • Implement cache persistence across restarts
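For the persistence idea, one simple (hypothetical) approach is to serialize the cache dict to JSON on shutdown and reload it at startup; the file path and function names here are illustrative:

import json

CACHE_PATH = "topic_cache.json"  # illustrative location

def save_topic_cache(engine) -> None:
    # Persist the in-memory topic cache so it survives a restart
    with open(CACHE_PATH, "w", encoding="utf-8") as f:
        json.dump(engine._topic_cache, f)

def load_topic_cache(engine) -> None:
    # Restore the cache if a previous run saved one; otherwise start empty
    try:
        with open(CACHE_PATH, "r", encoding="utf-8") as f:
            engine._topic_cache = json.load(f)
    except FileNotFoundError:
        engine._topic_cache = {}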

Conclusion

The Option 2 implementation is complete. The system now uses LLM-based zero-shot classification for topic extraction and continuity analysis, providing accurate, context-aware topic classification without hardcoded patterns, while caching minimizes latency and API costs for repeated queries.