# LLM-Based Topic Extraction Implementation (Option 2)
## Summary
Successfully implemented Option 2: LLM-based zero-shot classification for topic extraction and continuity analysis, replacing hardcoded pattern matching.
## Changes Implemented
### 1. Topic Cache Infrastructure
**Location**: `src/orchestrator_engine.py` - `__init__()` (lines 34-36)
**Added**:
```python
# Cache for topic extraction to reduce API calls
self._topic_cache = {}
self._topic_cache_max_size = 100 # Limit cache size
```
**Purpose**: Cache topic extraction results to minimize LLM API calls for identical queries.
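A minimal sketch of how the cache key and FIFO eviction could be implemented (the helper names `_topic_cache_key` and `_cache_topic` are illustrative, not necessarily the actual method names):
```python
import hashlib

def _topic_cache_key(self, user_input: str) -> str:
    # Key the cache on an MD5 hash of the normalized query text
    return hashlib.md5(user_input.strip().lower().encode("utf-8")).hexdigest()

def _cache_topic(self, cache_key: str, topic: str) -> None:
    # FIFO eviction: drop the oldest entry once the cache reaches its size limit
    if len(self._topic_cache) >= self._topic_cache_max_size:
        oldest_key = next(iter(self._topic_cache))  # dicts preserve insertion order
        del self._topic_cache[oldest_key]
    self._topic_cache[cache_key] = topic
```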
---
### 2. LLM-Based Topic Extraction
**Location**: `src/orchestrator_engine.py` - `_extract_main_topic()` (lines 1276-1343)
**Changes**:
- **Method signature**: Changed to `async def _extract_main_topic(self, user_input: str, context: dict = None) -> str`
- **Implementation**: Uses LLM zero-shot classification instead of hardcoded keywords
- **Context-aware**: Uses session_context and interaction_contexts from cache when available
- **Caching**: Implements cache with FIFO eviction (max 100 entries)
- **Fallback**: Falls back to simple word extraction if LLM unavailable
**LLM Prompt**:
```
Classify the main topic of this query in 2-5 words. Be specific and concise.
Query: "{user_input}"
[Session context if available]
Respond with ONLY the topic name (e.g., "Machine Learning", "Healthcare Analytics").
```
**Temperature**: 0.3 (for consistency)
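A hedged sketch of the extraction flow, using the cache helpers above. The `llm_router.generate(...)` call shape and the `logger` name are assumptions for illustration, not the confirmed router interface:
```python
async def _extract_main_topic(self, user_input: str, context: dict = None) -> str:
    cache_key = self._topic_cache_key(user_input)
    if cache_key in self._topic_cache:
        return self._topic_cache[cache_key]  # cache hit: no LLM call

    fallback = " ".join(user_input.split()[:4]) or "General query"
    if not getattr(self, "llm_router", None):
        return fallback  # LLM unavailable: simple word-based fallback

    session_summary = (context or {}).get("session_context", "")
    prompt = (
        "Classify the main topic of this query in 2-5 words. Be specific and concise.\n"
        f'Query: "{user_input}"\n'
        + (f"Session context: {session_summary}\n" if session_summary else "")
        + 'Respond with ONLY the topic name (e.g., "Machine Learning", "Healthcare Analytics").'
    )
    try:
        # Assumed call signature; adapt to the real llm_router interface
        response = await self.llm_router.generate(prompt, temperature=0.3)
        topic = str(response).strip() or fallback
    except Exception as exc:
        logger.error("Topic extraction failed: %s", exc)
        return fallback

    self._cache_topic(cache_key, topic)
    return topic
```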
---
### 3. LLM-Based Topic Continuity Analysis
**Location**: `src/orchestrator_engine.py` - `_analyze_topic_continuity()` (lines 1029-1094)
**Changes**:
- **Method signature**: Changed to `async def _analyze_topic_continuity(self, context: dict, user_input: str) -> str`
- **Implementation**: Uses LLM to determine if query continues previous topic or introduces new topic
- **Context-aware**: Uses session_context and interaction_contexts from cache
- **Format validation**: Validates LLM response format ("Continuing X" or "New topic: X")
- **Fallback**: Returns descriptive message if LLM unavailable
**LLM Prompt**:
```
Determine if the current query continues the previous conversation topic or introduces a new topic.
Session Summary: {session_summary}
Recent Interactions: {recent_interactions}
Current Query: "{user_input}"
Respond with EXACTLY one of:
- "Continuing [topic name] discussion" if same topic
- "New topic: [topic name]" if different topic
```
**Temperature**: 0.3 (for consistency)
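A corresponding sketch for continuity analysis, including the format validation step. The router call and the exact shapes of context keys such as `session_context` and `interaction_contexts` are assumptions based on the description above:
```python
async def _analyze_topic_continuity(self, context: dict, user_input: str) -> str:
    if not context or not context.get("interaction_contexts"):
        return "No previous context"
    if not getattr(self, "llm_router", None):
        return "Topic continuity analysis unavailable"

    prompt = (
        "Determine if the current query continues the previous conversation topic "
        "or introduces a new topic.\n"
        f"Session Summary: {context.get('session_context', '')}\n"
        f"Recent Interactions: {context.get('interaction_contexts', [])}\n"
        f'Current Query: "{user_input}"\n'
        "Respond with EXACTLY one of:\n"
        '- "Continuing [topic name] discussion" if same topic\n'
        '- "New topic: [topic name]" if different topic'
    )
    try:
        response = str(await self.llm_router.generate(prompt, temperature=0.3)).strip()
        # Format validation: accept only the two expected response shapes
        if response.startswith("Continuing") or response.startswith("New topic:"):
            return response
        return f"New topic: {response}"  # unexpected format: illustrative handling only
    except Exception as exc:
        logger.error("Topic continuity analysis failed: %s", exc)
        return "Topic continuity analysis failed"
```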
---
### 4. Keyword Extraction Update
**Location**: `src/orchestrator_engine.py` - `_extract_keywords()` (lines 1345-1361)
**Changes**:
- **Method signature**: Changed to `async def _extract_keywords(self, user_input: str) -> str`
- **Implementation**: Simple regex-based extraction (no LLM call, to keep latency minimal)
- **Stop word filtering**: Filters common stop words
- **Note**: Can be enhanced with an LLM later if needed, but kept simple for performance (see the sketch below)
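A sketch of the regex-based extraction (the stop-word list shown is illustrative, not the actual one):
```python
import re

# Illustrative stop-word list; the real set may differ
_STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "on",
               "with", "is", "are", "what", "how", "can", "you", "please"}

async def _extract_keywords(self, user_input: str) -> str:
    try:
        # Regex tokenization only; kept async for interface consistency, no LLM call
        words = re.findall(r"[a-zA-Z]{3,}", user_input.lower())
        keywords = [w for w in words if w not in _STOP_WORDS]
        return ", ".join(dict.fromkeys(keywords)) or "General terms"
    except Exception:
        return "General terms"  # matches the documented error fallback
```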
---
### 5. Updated All Usage Sites
**Location**: `src/orchestrator_engine.py` - `process_request()` (lines 184-200)
**Changes**:
- **Extract topic once**: `main_topic = await self._extract_main_topic(user_input, context)`
- **Extract continuity**: `topic_continuity = await self._analyze_topic_continuity(context, user_input)`
- **Extract keywords**: `query_keywords = await self._extract_keywords(user_input)`
- **Reuse main_topic**: All 18+ usage sites now use the `main_topic` variable instead of calling the method repeatedly (see the sketch below)
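Sketch of the once-per-request extraction in `process_request()` (the `reasoning_chain` usage is illustrative; the real step strings may differ):
```python
# Extract once per request, then reuse the results across all reasoning steps
main_topic = await self._extract_main_topic(user_input, context)
topic_continuity = await self._analyze_topic_continuity(context, user_input)
query_keywords = await self._extract_keywords(user_input)

# Illustrative use in the reasoning chain
reasoning_chain.append(f"Step 1: Identified main topic '{main_topic}' ({topic_continuity})")
```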
**Updated Reasoning Chain Steps**:
- Step 1: Uses `main_topic` (line 190)
- Step 2: Uses `main_topic` (lines 251, 259)
- Step 3: Uses `main_topic` (lines 268, 276)
- Step 4: Uses `main_topic` (lines 304, 312)
- Step 5: Uses `main_topic` (lines 384, 392)
- Alternative paths: Uses `main_topic` (lines 403, 1146-1166)
**Error Recovery**: Simplified to avoid async complexity (line 1733)
---
### 6. Alternative Paths Method Update
**Location**: `src/orchestrator_engine.py` - `_generate_alternative_paths()` (lines 1136-1169)
**Changes**:
- **Method signature**: Added `main_topic` parameter
- **Before**: `def _generate_alternative_paths(self, intent_result: dict, user_input: str) -> list:`
- **After**: `def _generate_alternative_paths(self, intent_result: dict, user_input: str, main_topic: str) -> list:`
- **Updated call site**: Line 403 passes `main_topic` as third parameter
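A sketch of the new signature and call site (the method body shown here is illustrative only):
```python
def _generate_alternative_paths(self, intent_result: dict, user_input: str, main_topic: str) -> list:
    # main_topic is passed in rather than re-extracted, avoiding an extra LLM call
    return [
        f"Direct answer focused on {main_topic}",
        f"Clarifying question to narrow the scope of {main_topic}",
    ]

# Call site (around line 403):
alternative_paths = self._generate_alternative_paths(intent_result, user_input, main_topic)
```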
---
## Performance Characteristics
### Latency Impact
**Per Request**:
- 2 LLM calls per request (topic extraction + continuity analysis)
- Estimated latency: ~200-500ms total (depending on LLM router)
- Caching avoids repeat calls: a cache hit adds no LLM latency
**Mitigation**:
- Topic extraction cached per unique query (keyed by an MD5 hash of the query text)
- Cache size limited to 100 entries (FIFO eviction)
- Keywords extraction kept simple (no LLM, minimal latency)
### API Costs
**Per Request**:
- Topic extraction: ~50-100 tokens
- Topic continuity: ~100-150 tokens
- Total: ~150-250 tokens per request (first time)
- Cached requests: 0 tokens
**Monthly Estimate** (assuming 1000 unique queries/day):
- First requests: ~150-250k tokens/day = ~4.5-7.5M tokens/month
- Subsequent requests: Cached, 0 tokens
- Actual usage depends on cache hit rate
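Back-of-envelope check of the figures above:
```python
# Monthly token estimate for first-time (uncached) requests
unique_queries_per_day = 1000
tokens_per_request = (150, 250)   # low/high estimate; cache hits cost 0 tokens
daily = tuple(t * unique_queries_per_day for t in tokens_per_request)   # (150_000, 250_000)
monthly = tuple(d * 30 for d in daily)                                  # (4_500_000, 7_500_000)
print(f"Daily: {daily}, Monthly: {monthly}")
```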
---
## Error Handling
### Fallback Mechanisms
1. **Topic Extraction**:
- If LLM unavailable: Falls back to first 4 words of query
- If LLM error: Logs error, returns fallback
- Cache miss handling: Generates and caches
2. **Topic Continuity**:
- If LLM unavailable: Returns "Topic continuity analysis unavailable"
- If no context: Returns "No previous context"
- If LLM error: Logs error, returns "Topic continuity analysis failed"
3. **Keywords**:
- Simple extraction, no LLM dependency
- Error handling: Returns "General terms" on exception
---
## Testing Recommendations
### Unit Tests
1. **Topic Extraction**:
- Test LLM-based extraction with various queries
- Test caching behavior (cache hit/miss)
- Test fallback behavior when LLM unavailable
- Test context-aware extraction
2. **Topic Continuity**:
- Test continuation detection
- Test new topic detection
- Test with empty context
- Test format validation
3. **Integration Tests**:
- Test full request flow with LLM calls
- Test cache persistence across requests
- Test error recovery with LLM failures
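A hedged pytest sketch for the caching and fallback tests, assuming the interfaces sketched above (`OrchestratorEngine`, `llm_router.generate`) and `pytest-asyncio`; constructor arguments and import paths may differ in the real codebase:
```python
import pytest
from unittest.mock import AsyncMock

from src.orchestrator_engine import OrchestratorEngine  # assumed import path


@pytest.mark.asyncio
async def test_topic_extraction_is_cached():
    engine = OrchestratorEngine()  # constructor args omitted for brevity
    engine.llm_router = AsyncMock()
    engine.llm_router.generate = AsyncMock(return_value="Machine Learning")

    first = await engine._extract_main_topic("Explain gradient descent", {})
    second = await engine._extract_main_topic("Explain gradient descent", {})

    assert first == second == "Machine Learning"
    engine.llm_router.generate.assert_awaited_once()  # second call served from cache


@pytest.mark.asyncio
async def test_topic_extraction_falls_back_without_llm():
    engine = OrchestratorEngine()
    engine.llm_router = None

    topic = await engine._extract_main_topic("Explain gradient descent please", {})
    assert topic  # falls back to the first few words of the query
```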
### Performance Tests
1. **Latency Measurement**:
- Measure average latency with LLM calls
- Measure latency with cache hits
- Compare to previous pattern-based approach
2. **Cache Effectiveness**:
- Measure cache hit rate
- Test cache eviction behavior
---
## Migration Notes
### Breaking Changes
**None**: All changes are internal to the orchestrator; the external API is unchanged.
### Compatibility
- **LLM Router Required**: System requires `llm_router` to be available
- **Graceful Degradation**: Falls back to simple extraction if LLM unavailable
- **Backward Compatible**: The old pattern-based code has been removed, but the fallbacks maintain equivalent functionality
---
## Benefits Realized
✅ **Accurate Topic Classification**: LLM understands context, synonyms, nuances
✅ **Domain Adaptive**: Works for any domain without code changes
✅ **Context-Aware**: Uses session_context and interaction_contexts
✅ **Human-Readable**: Maintains descriptive reasoning chain strings
✅ **Scalable**: No manual keyword list maintenance
✅ **Cached**: Reduces API calls for repeated queries
---
## Trade-offs
⚠️ **Latency**: Adds ~200-500ms per request (first time, cached after)
⚠️ **API Costs**: ~150-250 tokens per request (first time)
⚠️ **LLM Dependency**: Requires LLM router to be functional
⚠️ **Complexity**: More code to maintain (async handling, caching, error handling)
⚠️ **Inconsistency Risk**: LLM responses may vary slightly (mitigated by temperature=0.3)
---
## Files Modified
1. `src/orchestrator_engine.py`:
- Added topic cache infrastructure
- Rewrote `_extract_main_topic()` to use LLM
- Rewrote `_analyze_topic_continuity()` to use LLM
- Updated `_extract_keywords()` to async
- Updated all 18+ usage sites to use cached `main_topic`
- Updated `_generate_alternative_paths()` signature
---
## Next Steps
1. **Monitor Performance**: Track latency and cache hit rates
2. **Tune Caching**: Adjust cache size based on usage patterns
3. **Optional Enhancements**:
- Consider LLM-based keyword extraction if needed
- Add topic extraction metrics/logging
- Implement cache persistence across restarts
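If cache persistence is pursued, a minimal JSON-on-disk sketch (the path and hook points are hypothetical):
```python
import json
from pathlib import Path

CACHE_PATH = Path("data/topic_cache.json")  # hypothetical location

def save_topic_cache(cache: dict) -> None:
    # Call on shutdown or periodically to persist the in-memory topic cache
    CACHE_PATH.parent.mkdir(parents=True, exist_ok=True)
    CACHE_PATH.write_text(json.dumps(cache))

def load_topic_cache() -> dict:
    # Call during __init__ to warm the cache from the previous run
    return json.loads(CACHE_PATH.read_text()) if CACHE_PATH.exists() else {}
```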
---
## Conclusion
Option 2 implementation complete. System now uses LLM-based zero-shot classification for topic extraction and continuity analysis, providing accurate, context-aware topic classification without hardcoded patterns. Caching minimizes latency and API costs for repeated queries.