# LLM-Based Topic Extraction Implementation (Option 2)
## Summary
Successfully implemented Option 2: LLM-based zero-shot classification for topic extraction and continuity analysis, replacing hardcoded pattern matching.
## Changes Implemented
### 1. Topic Cache Infrastructure
**Location**: `src/orchestrator_engine.py` - `__init__()` (lines 34-36)
**Added**:
```python
# Cache for topic extraction to reduce API calls
self._topic_cache = {}
self._topic_cache_max_size = 100  # Limit cache size
```
**Purpose**: Cache topic extraction results to minimize LLM API calls for identical queries.
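A minimal sketch of how the cache key and FIFO eviction described below (MD5-hashed query, 100-entry limit) could be implemented as methods on the orchestrator class; the helper names `_topic_cache_key` and `_cache_topic` are illustrative, not the actual code:

```python
import hashlib

# Sketched as methods on the orchestrator class; names are illustrative.
def _topic_cache_key(self, user_input: str) -> str:
    # Key on a hash of the normalized query so identical queries share an entry.
    return hashlib.md5(user_input.strip().lower().encode("utf-8")).hexdigest()

def _cache_topic(self, key: str, topic: str) -> None:
    # FIFO eviction: drop the oldest entry once the cache reaches its size limit.
    if len(self._topic_cache) >= self._topic_cache_max_size:
        oldest_key = next(iter(self._topic_cache))  # dicts preserve insertion order
        del self._topic_cache[oldest_key]
    self._topic_cache[key] = topic
```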
---
### 2. LLM-Based Topic Extraction
**Location**: `src/orchestrator_engine.py` - `_extract_main_topic()` (lines 1276-1343)
**Changes**:
- **Method signature**: Changed to `async def _extract_main_topic(self, user_input: str, context: dict = None) -> str`
- **Implementation**: Uses LLM zero-shot classification instead of hardcoded keywords
- **Context-aware**: Uses `session_context` and `interaction_contexts` from the cache when available
- **Caching**: Implements a cache with FIFO eviction (max 100 entries)
- **Fallback**: Falls back to simple word extraction if the LLM is unavailable
**LLM Prompt**:
```
Classify the main topic of this query in 2-5 words. Be specific and concise.
Query: "{user_input}"
[Session context if available]
Respond with ONLY the topic name (e.g., "Machine Learning", "Healthcare Analytics").
```
**Temperature**: 0.3 (for consistency)
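A hedged sketch of the overall flow (cache lookup, LLM call, fallback). It reuses the illustrative cache helpers from the Section 1 sketch; the `self.llm_router.generate(prompt, temperature=...)` call and the module-level `logger` are assumptions about the surrounding code, not the exact implementation:

```python
import logging

logger = logging.getLogger(__name__)

# Sketched as a method on the orchestrator class.
async def _extract_main_topic(self, user_input: str, context: dict = None) -> str:
    cache_key = self._topic_cache_key(user_input)
    if cache_key in self._topic_cache:
        return self._topic_cache[cache_key]  # cache hit: no LLM call

    fallback = " ".join(user_input.split()[:4]) or "General query"
    if not self.llm_router:
        return fallback  # graceful degradation when no LLM is available

    session_summary = (context or {}).get("session_context", "")
    prompt = (
        "Classify the main topic of this query in 2-5 words. Be specific and concise.\n"
        f'Query: "{user_input}"\n'
        + (f"Session context: {session_summary}\n" if session_summary else "")
        + 'Respond with ONLY the topic name (e.g., "Machine Learning", "Healthcare Analytics").'
    )
    try:
        topic = (await self.llm_router.generate(prompt, temperature=0.3)).strip()
    except Exception as exc:
        logger.error("Topic extraction failed: %s", exc)
        return fallback

    self._cache_topic(cache_key, topic)
    return topic
```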
---
### 3. LLM-Based Topic Continuity Analysis
**Location**: `src/orchestrator_engine.py` - `_analyze_topic_continuity()` (lines 1029-1094)
**Changes**:
- **Method signature**: Changed to `async def _analyze_topic_continuity(self, context: dict, user_input: str) -> str`
- **Implementation**: Uses the LLM to determine whether the query continues the previous topic or introduces a new one
- **Context-aware**: Uses `session_context` and `interaction_contexts` from the cache
- **Format validation**: Validates the LLM response format ("Continuing X" or "New topic: X")
- **Fallback**: Returns a descriptive message if the LLM is unavailable
**LLM Prompt**:
```
Determine if the current query continues the previous conversation topic or introduces a new topic.
Session Summary: {session_summary}
Recent Interactions: {recent_interactions}
Current Query: "{user_input}"
Respond with EXACTLY one of:
- "Continuing [topic name] discussion" if same topic
- "New topic: [topic name]" if different topic
```
**Temperature**: 0.3 (for consistency)
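A similar hedged sketch for the continuity check, including the response-format validation. The router call, the assumed `logger` from the previous sketch, and the context keys (`session_context`, `interaction_contexts`) follow the descriptions above rather than the exact code:

```python
# Sketched as a method on the orchestrator class; llm_router.generate() is assumed.
async def _analyze_topic_continuity(self, context: dict, user_input: str) -> str:
    if not context or not context.get("interaction_contexts"):
        return "No previous context"
    if not self.llm_router:
        return "Topic continuity analysis unavailable"

    prompt = (
        "Determine if the current query continues the previous conversation topic "
        "or introduces a new topic.\n"
        f"Session Summary: {context.get('session_context', '')}\n"
        f"Recent Interactions: {context.get('interaction_contexts', [])}\n"
        f'Current Query: "{user_input}"\n'
        "Respond with EXACTLY one of:\n"
        '- "Continuing [topic name] discussion" if same topic\n'
        '- "New topic: [topic name]" if different topic'
    )
    try:
        answer = (await self.llm_router.generate(prompt, temperature=0.3)).strip()
    except Exception as exc:
        logger.error("Topic continuity analysis failed: %s", exc)
        return "Topic continuity analysis failed"

    # Accept only the two expected response shapes; anything else is treated as a failure.
    if answer.startswith("Continuing") or answer.startswith("New topic:"):
        return answer
    return "Topic continuity analysis failed"
```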
---
### 4. Keyword Extraction Update
**Location**: `src/orchestrator_engine.py` - `_extract_keywords()` (lines 1345-1361)
**Changes**:
- **Method signature**: Changed to `async def _extract_keywords(self, user_input: str) -> str`
- **Implementation**: Simple regex-based extraction (deliberately not LLM-based, for performance)
- **Stop word filtering**: Filters common stop words
- **Note**: Can be upgraded to LLM-based extraction if needed, but is kept simple for performance
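A minimal sketch of the non-LLM extraction path; the specific regex and stop-word set shown here are illustrative choices, not the actual lists used in the code:

```python
import re

# Illustrative stop-word set; the real list in the code may differ.
_STOP_WORDS = {"the", "a", "an", "is", "are", "was", "were", "what", "how", "why",
               "of", "to", "for", "and", "or", "in", "on", "with", "about"}

async def _extract_keywords(self, user_input: str) -> str:
    try:
        words = re.findall(r"[a-zA-Z]{3,}", user_input.lower())
        keywords = [w for w in words if w not in _STOP_WORDS]
        return ", ".join(keywords[:8]) if keywords else "General terms"
    except Exception:
        return "General terms"  # safe default on any unexpected error
```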
---
### 5. Updated All Usage Sites
**Location**: `src/orchestrator_engine.py` - `process_request()` (lines 184-200)
**Changes**:
- **Extract topic once**: `main_topic = await self._extract_main_topic(user_input, context)`
- **Extract continuity**: `topic_continuity = await self._analyze_topic_continuity(context, user_input)`
- **Extract keywords**: `query_keywords = await self._extract_keywords(user_input)`
- **Reuse main_topic**: All 18+ usage sites now use the `main_topic` variable instead of calling the method repeatedly
**Updated Reasoning Chain Steps**:
- Step 1: Uses `main_topic` (line 190)
- Step 2: Uses `main_topic` (lines 251, 259)
- Step 3: Uses `main_topic` (lines 268, 276)
- Step 4: Uses `main_topic` (lines 304, 312)
- Step 5: Uses `main_topic` (lines 384, 392)
- Alternative paths: Use `main_topic` (lines 403, 1146-1166)
**Error Recovery**: Simplified to avoid async complexity (line 1733)
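Taken together, the top of `process_request()` now looks roughly like the following sketch; the surrounding code is elided and the reasoning-chain variable name is assumed for illustration:

```python
# Near the top of process_request(); surrounding code elided.
main_topic = await self._extract_main_topic(user_input, context)
topic_continuity = await self._analyze_topic_continuity(context, user_input)
query_keywords = await self._extract_keywords(user_input)

# Later steps reuse the already-extracted values instead of re-calling the methods, e.g.:
reasoning_chain.append(f"Step 1: Analyzing query about {main_topic} ({topic_continuity})")
```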
---
### 6. Alternative Paths Method Update
**Location**: `src/orchestrator_engine.py` - `_generate_alternative_paths()` (lines 1136-1169)
**Changes**:
- **Method signature**: Added a `main_topic` parameter
- **Before**: `def _generate_alternative_paths(self, intent_result: dict, user_input: str) -> list:`
- **After**: `def _generate_alternative_paths(self, intent_result: dict, user_input: str, main_topic: str) -> list:`
- **Updated call site**: Line 403 passes `main_topic` as the third parameter
---
## Performance Characteristics
### Latency Impact
**Per Request**:
- Two LLM calls per request (topic extraction + continuity analysis)
- Estimated latency: ~200-500ms total (depending on the LLM router)
- Caching eliminates repeat calls: a cache hit adds effectively no LLM latency
**Mitigation**:
- Topic extraction cached per unique query (keyed by MD5 hash)
- Cache size limited to 100 entries (FIFO eviction)
- Keyword extraction kept simple (no LLM, minimal latency)
### API Costs
**Per Request**:
- Topic extraction: ~50-100 tokens
- Topic continuity: ~100-150 tokens
- Total: ~150-250 tokens per request (first time)
- Cached requests: 0 tokens
**Monthly Estimate** (assuming 1000 unique queries/day):
- First requests: ~150-250k tokens/day = ~4.5-7.5M tokens/month
- Repeat requests: served from cache, 0 tokens
- Actual usage depends on the cache hit rate
---
## Error Handling
### Fallback Mechanisms
1. **Topic Extraction**:
   - If the LLM is unavailable: falls back to the first 4 words of the query
   - On LLM error: logs the error and returns the fallback
   - On cache miss: generates the topic and caches it
2. **Topic Continuity**:
   - If the LLM is unavailable: returns "Topic continuity analysis unavailable"
   - If there is no context: returns "No previous context"
   - On LLM error: logs the error and returns "Topic continuity analysis failed"
3. **Keywords**:
   - Simple extraction, no LLM dependency
   - Error handling: returns "General terms" on exception
---
## Testing Recommendations
### Unit Tests
1. **Topic Extraction**:
   - Test LLM-based extraction with various queries
   - Test caching behavior (cache hit/miss)
   - Test fallback behavior when LLM unavailable
   - Test context-aware extraction
2. **Topic Continuity**:
   - Test continuation detection
   - Test new topic detection
   - Test with empty context
   - Test format validation
3. **Integration Tests**:
   - Test full request flow with LLM calls
   - Test cache persistence across requests
   - Test error recovery with LLM failures
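A hedged sketch of what the caching and fallback unit tests could look like, assuming pytest with pytest-asyncio and an orchestrator class named `OrchestratorEngine` whose constructor takes no required arguments; adapt the construction and router mocking to the real API:

```python
import pytest
from unittest.mock import AsyncMock

from src.orchestrator_engine import OrchestratorEngine  # class name assumed


@pytest.mark.asyncio
async def test_topic_extraction_caches_llm_result():
    engine = OrchestratorEngine()  # adapt if the constructor needs arguments
    engine.llm_router = AsyncMock()
    engine.llm_router.generate.return_value = "Healthcare Analytics"

    first = await engine._extract_main_topic("How do hospitals use predictive models?")
    second = await engine._extract_main_topic("How do hospitals use predictive models?")

    assert first == second == "Healthcare Analytics"
    engine.llm_router.generate.assert_awaited_once()  # second call served from cache


@pytest.mark.asyncio
async def test_topic_extraction_falls_back_without_llm():
    engine = OrchestratorEngine()
    engine.llm_router = None

    topic = await engine._extract_main_topic("Explain transformer attention mechanisms please")
    assert topic  # falls back to a simple word-based topic instead of raising
```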
### Performance Tests
1. **Latency Measurement**:
   - Measure average latency with LLM calls
   - Measure latency with cache hits
   - Compare to previous pattern-based approach
2. **Cache Effectiveness**:
   - Measure cache hit rate
   - Test cache eviction behavior
---
## Migration Notes
### Breaking Changes
**None**: All changes are internal to the orchestrator; the external API is unchanged.
### Compatibility
- **LLM Router Required**: LLM-based extraction requires `llm_router` to be available
- **Graceful Degradation**: Falls back to simple extraction if the LLM is unavailable
- **Backward Compatible**: The old pattern-based code has been removed, but fallbacks maintain existing functionality
---
## Benefits Realized
✅ **Accurate Topic Classification**: The LLM understands context, synonyms, and nuance
✅ **Domain Adaptive**: Works for any domain without code changes
✅ **Context-Aware**: Uses `session_context` and `interaction_contexts`
✅ **Human-Readable**: Maintains descriptive reasoning chain strings
✅ **Scalable**: No manual keyword lists to maintain
✅ **Cached**: Reduces API calls for repeated queries
---
## Trade-offs
⚠️ **Latency**: Adds ~200-500ms per request (first occurrence; identical queries are then cached)
⚠️ **API Costs**: ~150-250 tokens per request (first time)
⚠️ **LLM Dependency**: Requires the LLM router to be functional
⚠️ **Complexity**: More code to maintain (async handling, caching, error handling)
⚠️ **Inconsistency Risk**: LLM responses may vary slightly (mitigated by temperature=0.3)
---
## Files Modified
1. `src/orchestrator_engine.py`:
   - Added topic cache infrastructure
   - Rewrote `_extract_main_topic()` to use the LLM
   - Rewrote `_analyze_topic_continuity()` to use the LLM
   - Updated `_extract_keywords()` to be async
   - Updated all 18+ usage sites to use the cached `main_topic`
   - Updated the `_generate_alternative_paths()` signature
---
## Next Steps
1. **Monitor Performance**: Track latency and cache hit rates
2. **Tune Caching**: Adjust cache size based on usage patterns
3. **Optional Enhancements**:
   - Consider LLM-based keyword extraction if needed
   - Add topic extraction metrics/logging
   - Implement cache persistence across restarts
---
## Conclusion
The Option 2 implementation is complete. The system now uses LLM-based zero-shot classification for topic extraction and continuity analysis, providing accurate, context-aware topic classification without hardcoded patterns. Caching minimizes latency and API costs for repeated queries.