LLM-Based Topic Extraction Implementation (Option 2)
Summary
Successfully implemented Option 2: LLM-based zero-shot classification for topic extraction and continuity analysis, replacing hardcoded pattern matching.
Changes Implemented
1. Topic Cache Infrastructure
Location: src/orchestrator_engine.py - __init__() (lines 34-36)
Added:
# Cache for topic extraction to reduce API calls
self._topic_cache = {}
self._topic_cache_max_size = 100 # Limit cache size
Purpose: Cache topic extraction results to minimize LLM API calls for identical queries.
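For illustration, a minimal sketch of how such a cache could be keyed and bounded, using the MD5-keyed, FIFO-evicting scheme described under Performance Characteristics below. The module-level names and helper functions here are hypothetical stand-ins, not the actual implementation:

```python
import hashlib

# Illustrative module-level stand-ins for the attributes set in __init__().
_topic_cache: dict = {}
_topic_cache_max_size = 100  # Limit cache size

def _topic_cache_key(user_input: str) -> str:
    # Key entries by an MD5 hash of the normalized query text.
    return hashlib.md5(user_input.strip().lower().encode("utf-8")).hexdigest()

def _topic_cache_put(key: str, topic: str) -> None:
    # FIFO eviction: Python dicts preserve insertion order, so the first
    # key is the oldest entry once the cache reaches its size limit.
    if len(_topic_cache) >= _topic_cache_max_size:
        _topic_cache.pop(next(iter(_topic_cache)))
    _topic_cache[key] = topic
```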
2. LLM-Based Topic Extraction
Location: src/orchestrator_engine.py - _extract_main_topic() (lines 1276-1343)
Changes:
- Method signature: Changed to `async def _extract_main_topic(self, user_input: str, context: dict = None) -> str`
- Implementation: Uses LLM zero-shot classification instead of hardcoded keywords
- Context-aware: Uses session_context and interaction_contexts from cache when available
- Caching: Implements cache with FIFO eviction (max 100 entries)
- Fallback: Falls back to simple word extraction if LLM unavailable
LLM Prompt:
Classify the main topic of this query in 2-5 words. Be specific and concise.
Query: "{user_input}"
[Session context if available]
Respond with ONLY the topic name (e.g., "Machine Learning", "Healthcare Analytics").
Temperature: 0.3 (for consistency)
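A hedged, method-style sketch of the flow this subsection describes (cache lookup, zero-shot LLM call, FIFO insert, word-based fallback). The `llm_router.generate(prompt=..., temperature=...)` call is an assumption about the router interface, not its documented API:

```python
import hashlib
import logging

logger = logging.getLogger(__name__)

async def _extract_main_topic(self, user_input: str, context: dict = None) -> str:
    """Sketch: cache lookup -> zero-shot LLM classification -> FIFO insert -> fallback."""
    cache_key = hashlib.md5(user_input.strip().lower().encode("utf-8")).hexdigest()
    if cache_key in self._topic_cache:
        return self._topic_cache[cache_key]  # cache hit: no API call

    fallback = " ".join(user_input.split()[:4])  # first four words of the query
    if not getattr(self, "llm_router", None):
        return fallback

    session_context = (context or {}).get("session_context", "")
    prompt = (
        "Classify the main topic of this query in 2-5 words. Be specific and concise.\n"
        f'Query: "{user_input}"\n'
        + (f"Session context: {session_context}\n" if session_context else "")
        + 'Respond with ONLY the topic name (e.g., "Machine Learning", "Healthcare Analytics").'
    )
    try:
        # Hypothetical router call; the real method name and signature may differ.
        topic = (await self.llm_router.generate(prompt=prompt, temperature=0.3)).strip()
    except Exception as exc:
        logger.error("Topic extraction failed: %s", exc)
        return fallback

    if len(self._topic_cache) >= self._topic_cache_max_size:
        self._topic_cache.pop(next(iter(self._topic_cache)))  # FIFO eviction
    self._topic_cache[cache_key] = topic
    return topic
```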
3. LLM-Based Topic Continuity Analysis
Location: src/orchestrator_engine.py - _analyze_topic_continuity() (lines 1029-1094)
Changes:
- Method signature: Changed to `async def _analyze_topic_continuity(self, context: dict, user_input: str) -> str`
- Implementation: Uses LLM to determine if query continues previous topic or introduces new topic
- Context-aware: Uses session_context and interaction_contexts from cache
- Format validation: Validates LLM response format ("Continuing X" or "New topic: X")
- Fallback: Returns descriptive message if LLM unavailable
LLM Prompt:
Determine if the current query continues the previous conversation topic or introduces a new topic.
Session Summary: {session_summary}
Recent Interactions: {recent_interactions}
Current Query: "{user_input}"
Respond with EXACTLY one of:
- "Continuing [topic name] discussion" if same topic
- "New topic: [topic name]" if different topic
Temperature: 0.3 (for consistency)
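A similar hedged sketch for continuity analysis, including the format-validation step. As above, the `llm_router.generate(...)` call is assumed, and the fallback strings mirror the ones listed under Error Handling below:

```python
import logging

logger = logging.getLogger(__name__)

async def _analyze_topic_continuity(self, context: dict, user_input: str) -> str:
    """Sketch: context check -> LLM classification -> response-format validation -> fallback."""
    if not context:
        return "No previous context"
    if not getattr(self, "llm_router", None):
        return "Topic continuity analysis unavailable"

    session_summary = context.get("session_context", "")
    recent_interactions = context.get("interaction_contexts", [])
    prompt = (
        "Determine if the current query continues the previous conversation topic "
        "or introduces a new topic.\n"
        f"Session Summary: {session_summary}\n"
        f"Recent Interactions: {recent_interactions}\n"
        f'Current Query: "{user_input}"\n'
        "Respond with EXACTLY one of:\n"
        '- "Continuing [topic name] discussion" if same topic\n'
        '- "New topic: [topic name]" if different topic'
    )
    try:
        # Hypothetical router call, as in the extraction sketch above.
        response = (await self.llm_router.generate(prompt=prompt, temperature=0.3)).strip()
    except Exception as exc:
        logger.error("Topic continuity analysis failed: %s", exc)
        return "Topic continuity analysis failed"

    # Accept only the two documented response shapes; otherwise degrade gracefully.
    if response.startswith("Continuing") or response.startswith("New topic:"):
        return response
    return "Topic continuity analysis unavailable"
```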
4. Keyword Extraction Update
Location: src/orchestrator_engine.py - _extract_keywords() (lines 1345-1361)
Changes:
- Method signature: Changed to `async def _extract_keywords(self, user_input: str) -> str`
- Implementation: Simple regex-based extraction (not LLM-based, for performance; see the sketch after this list)
- Stop word filtering: Filters common stop words
- Note: Can be enhanced with LLM if needed, but kept simple for performance
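A minimal sketch of the regex-plus-stop-word approach; the stop-word set shown is illustrative, not the actual list:

```python
import re

# Illustrative stop-word list; the real filter may be larger.
_STOP_WORDS = {
    "the", "a", "an", "and", "or", "but", "is", "are", "was", "were",
    "what", "which", "how", "can", "you", "please", "about", "with", "for", "to", "of",
}

async def _extract_keywords(self, user_input: str) -> str:
    """Sketch: regex word extraction with stop-word filtering; async only to match call sites."""
    try:
        words = re.findall(r"[a-zA-Z]{3,}", user_input.lower())
        keywords = [w for w in words if w not in _STOP_WORDS]
        # dict.fromkeys() de-duplicates while preserving order.
        return ", ".join(dict.fromkeys(keywords)) or "General terms"
    except Exception:
        return "General terms"
```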
5. Updated All Usage Sites
Location: src/orchestrator_engine.py - process_request() (lines 184-200)
Changes:
- Extract topic once: `main_topic = await self._extract_main_topic(user_input, context)`
- Extract continuity: `topic_continuity = await self._analyze_topic_continuity(context, user_input)`
- Extract keywords: `query_keywords = await self._extract_keywords(user_input)`
- Reuse main_topic: All 18+ usage sites now use the `main_topic` variable instead of calling the method repeatedly (see the sketch at the end of this section)
Updated Reasoning Chain Steps:
- Step 1: Uses `main_topic` (line 190)
- Step 2: Uses `main_topic` (lines 251, 259)
- Step 3: Uses `main_topic` (lines 268, 276)
- Step 4: Uses `main_topic` (lines 304, 312)
- Step 5: Uses `main_topic` (lines 384, 392)
- Alternative paths: Uses `main_topic` (lines 403, 1146-1166)
Error Recovery: Simplified to avoid async complexity (line 1733)
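A condensed, illustrative view of how `process_request()` wires these pieces together. The reasoning-chain strings, the placeholder `intent_result`, and the return keys are hypothetical; only the call order and the reuse of `main_topic` reflect the change described above:

```python
async def process_request(self, user_input: str, context: dict = None) -> dict:
    """Condensed sketch of the updated call sites; unrelated steps are omitted."""
    main_topic = await self._extract_main_topic(user_input, context)
    topic_continuity = await self._analyze_topic_continuity(context, user_input)
    query_keywords = await self._extract_keywords(user_input)

    # Placeholder reasoning-chain strings; the real steps reuse main_topic the same way.
    reasoning_chain = [
        f"Step 1: Identified main topic '{main_topic}' ({topic_continuity})",
        f"Step 2: Matched keywords [{query_keywords}] against available agents",
    ]

    intent_result: dict = {}  # produced earlier in the real method; placeholder here
    # Line 403: main_topic is now passed explicitly instead of being re-extracted.
    alternative_paths = self._generate_alternative_paths(intent_result, user_input, main_topic)

    return {
        "main_topic": main_topic,
        "reasoning_chain": reasoning_chain,
        "alternative_paths": alternative_paths,
    }
```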
6. Alternative Paths Method Update
Location: src/orchestrator_engine.py - _generate_alternative_paths() (lines 1136-1169)
Changes:
- Method signature: Added `main_topic` parameter
- Before: `def _generate_alternative_paths(self, intent_result: dict, user_input: str) -> list:`
- After: `def _generate_alternative_paths(self, intent_result: dict, user_input: str, main_topic: str) -> list:`
- Updated call site: Line 403 passes `main_topic` as the third parameter
Performance Characteristics
Latency Impact
Per Request:
- 2 LLM calls per request (topic extraction + continuity analysis)
- Estimated added latency: ~200-500ms total (depending on the LLM router)
- Caching eliminates repeat calls: a cache hit adds no LLM latency
Mitigation:
- Topic extraction cached per unique query (MD5 hash)
- Cache size limited to 100 entries (FIFO eviction)
- Keywords extraction kept simple (no LLM, minimal latency)
API Costs
Per Request:
- Topic extraction: ~50-100 tokens
- Topic continuity: ~100-150 tokens
- Total: ~150-250 tokens per request (first time)
- Cached requests: 0 tokens
Monthly Estimate (assuming 1000 unique queries/day):
- First-time requests: ~150k-250k tokens/day, or ~4.5M-7.5M tokens/month
- Subsequent requests: Cached, 0 tokens
- Actual usage depends on cache hit rate
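The arithmetic behind these figures, spelled out:

```python
# Token-volume arithmetic for the estimate above (1000 unique queries/day).
unique_queries_per_day = 1000
tokens_per_request_low, tokens_per_request_high = 150, 250

daily_low = unique_queries_per_day * tokens_per_request_low    # 150,000 tokens/day
daily_high = unique_queries_per_day * tokens_per_request_high  # 250,000 tokens/day

monthly_low = daily_low * 30    # 4,500,000 tokens/month
monthly_high = daily_high * 30  # 7,500,000 tokens/month
```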
Error Handling
Fallback Mechanisms
Topic Extraction:
- If LLM unavailable: Falls back to first 4 words of query
- If LLM error: Logs error, returns fallback
- Cache miss handling: Generates the topic via the LLM and caches the result
Topic Continuity:
- If LLM unavailable: Returns "Topic continuity analysis unavailable"
- If no context: Returns "No previous context"
- If LLM error: Logs error, returns "Topic continuity analysis failed"
Keywords:
- Simple extraction, no LLM dependency
- Error handling: Returns "General terms" on exception
Testing Recommendations
Unit Tests
Topic Extraction:
- Test LLM-based extraction with various queries
- Test caching behavior (cache hit/miss)
- Test fallback behavior when LLM unavailable
- Test context-aware extraction
Topic Continuity:
- Test continuation detection
- Test new topic detection
- Test with empty context
- Test format validation
Integration Tests:
- Test full request flow with LLM calls
- Test cache persistence across requests
- Test error recovery with LLM failures
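A hedged sketch of what the caching and fallback tests could look like, assuming pytest with the pytest-asyncio plugin and a hypothetical `orchestrator` fixture that yields a fresh engine instance (empty `_topic_cache`, stubbed `llm_router`):

```python
import pytest

# Assumes pytest-asyncio and a hypothetical `orchestrator` fixture providing a
# fresh engine instance (empty _topic_cache, stubbed llm_router).

@pytest.mark.asyncio
async def test_topic_extraction_cache_hit(orchestrator):
    query = "Explain machine learning model drift"
    first = await orchestrator._extract_main_topic(query)
    second = await orchestrator._extract_main_topic(query)
    assert first == second
    assert len(orchestrator._topic_cache) == 1  # second call served from cache

@pytest.mark.asyncio
async def test_topic_extraction_fallback_without_llm(orchestrator):
    orchestrator.llm_router = None
    topic = await orchestrator._extract_main_topic("Explain machine learning model drift")
    assert topic == "Explain machine learning model"  # first four words of the query

@pytest.mark.asyncio
async def test_continuity_with_empty_context(orchestrator):
    result = await orchestrator._analyze_topic_continuity({}, "Tell me about healthcare analytics")
    assert result == "No previous context"
```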
Performance Tests
Latency Measurement:
- Measure average latency with LLM calls
- Measure latency with cache hits
- Compare to previous pattern-based approach
Cache Effectiveness:
- Measure cache hit rate
- Test cache eviction behavior
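One way to make cache effectiveness measurable is a pair of hit/miss counters. The class below is only a possible enhancement (it pairs with the metrics/logging item under Next Steps), not part of the current code:

```python
class TopicCacheMetrics:
    """Illustrative hit/miss tracking; not part of the current implementation."""

    def __init__(self) -> None:
        self.hits = 0    # incremented on a cache hit in _extract_main_topic()
        self.misses = 0  # incremented on a cache miss, before the LLM call

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```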
Migration Notes
Breaking Changes
None. All changes are internal to the orchestrator; the external API is unchanged.
Compatibility
- LLM Router Required: System requires `llm_router` to be available
- Graceful Degradation: Falls back to simple extraction if LLM unavailable
- Backward Compatible: Old pattern-based code removed, but fallbacks maintain functionality
Benefits Realized
✅ Accurate Topic Classification: LLM understands context, synonyms, nuances
✅ Domain Adaptive: Works for any domain without code changes
✅ Context-Aware: Uses session_context and interaction_contexts
✅ Human-Readable: Maintains descriptive reasoning chain strings
✅ Scalable: No manual keyword list maintenance
✅ Cached: Reduces API calls for repeated queries
Trade-offs
⚠️ Latency: Adds ~200-500ms per request (first time, cached after)
⚠️ API Costs: ~150-250 tokens per request (first time)
⚠️ LLM Dependency: Requires LLM router to be functional
⚠️ Complexity: More code to maintain (async handling, caching, error handling)
⚠️ Inconsistency Risk: LLM responses may vary slightly (mitigated by temperature=0.3)
Files Modified
src/orchestrator_engine.py:
- Added topic cache infrastructure
- Rewrote `_extract_main_topic()` to use LLM
- Rewrote `_analyze_topic_continuity()` to use LLM
- Updated `_extract_keywords()` to async
- Updated all 18+ usage sites to use the cached `main_topic`
- Updated `_generate_alternative_paths()` signature
Next Steps
- Monitor Performance: Track latency and cache hit rates
- Tune Caching: Adjust cache size based on usage patterns
- Optional Enhancements:
- Consider LLM-based keyword extraction if needed
- Add topic extraction metrics/logging
- Implement cache persistence across restarts
Conclusion
Option 2 implementation complete. System now uses LLM-based zero-shot classification for topic extraction and continuity analysis, providing accurate, context-aware topic classification without hardcoded patterns. Caching minimizes latency and API costs for repeated queries.