# LLM-Based Topic Extraction Implementation (Option 2)

## Summary

Successfully implemented Option 2: LLM-based zero-shot classification for topic extraction and continuity analysis, replacing hardcoded pattern matching.

## Changes Implemented

### 1. Topic Cache Infrastructure

**Location**: `src/orchestrator_engine.py` - `__init__()` (lines 34-36)

**Added**:
```python
# Cache for topic extraction to reduce API calls
self._topic_cache = {}
self._topic_cache_max_size = 100  # Limit cache size
```

**Purpose**: Cache topic extraction results to minimize LLM API calls for identical queries.

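Below is a minimal, self-contained sketch of how such a cache can be keyed and evicted, assuming the MD5-keyed, FIFO scheme described later in this document; the helper names are illustrative stand-ins, not the literal code in `orchestrator_engine.py`.

```python
import hashlib

# Illustrative stand-ins for self._topic_cache / self._topic_cache_max_size.
_topic_cache = {}
_TOPIC_CACHE_MAX_SIZE = 100

def _cache_key(user_input: str) -> str:
    # Normalize the query so trivially different strings share one cache entry.
    return hashlib.md5(user_input.strip().lower().encode("utf-8")).hexdigest()

def _store_topic(user_input: str, topic: str) -> None:
    key = _cache_key(user_input)
    if key not in _topic_cache and len(_topic_cache) >= _TOPIC_CACHE_MAX_SIZE:
        # dicts preserve insertion order (Python 3.7+), so the first key is the oldest.
        del _topic_cache[next(iter(_topic_cache))]  # FIFO eviction
    _topic_cache[key] = topic
```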
---

### 2. LLM-Based Topic Extraction

**Location**: `src/orchestrator_engine.py` - `_extract_main_topic()` (lines 1276-1343)

**Changes**:
- **Method signature**: Changed to `async def _extract_main_topic(self, user_input: str, context: dict = None) -> str`
- **Implementation**: Uses LLM zero-shot classification instead of hardcoded keywords
- **Context-aware**: Uses session_context and interaction_contexts from cache when available
- **Caching**: Implements cache with FIFO eviction (max 100 entries)
- **Fallback**: Falls back to simple word extraction if LLM unavailable

**LLM Prompt**:
```
Classify the main topic of this query in 2-5 words. Be specific and concise.

Query: "{user_input}"
[Session context if available]

Respond with ONLY the topic name (e.g., "Machine Learning", "Healthcare Analytics").
```

**Temperature**: 0.3 (for consistency)

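A condensed sketch of this flow is shown below. The `self.llm_router.generate(...)` awaitable and its `temperature` keyword are assumptions about the router interface rather than the confirmed API, and the prompt wiring is abbreviated; the cache handling mirrors the infrastructure added in change 1.

```python
import hashlib

# Hedged sketch only; llm_router.generate() is an assumed interface.
async def _extract_main_topic(self, user_input: str, context: dict = None) -> str:
    key = hashlib.md5(user_input.strip().lower().encode("utf-8")).hexdigest()
    if key in self._topic_cache:
        return self._topic_cache[key]  # cache hit: no LLM call

    session_hint = (context or {}).get("session_context", "")
    prompt = (
        "Classify the main topic of this query in 2-5 words. Be specific and concise.\n\n"
        f'Query: "{user_input}"\n'
        + (f"Session context: {session_hint}\n" if session_hint else "")
        + '\nRespond with ONLY the topic name (e.g., "Machine Learning", "Healthcare Analytics").'
    )
    try:
        topic = (await self.llm_router.generate(prompt, temperature=0.3)).strip()
    except Exception:
        # Fallback when the LLM is unavailable: first 4 words of the query.
        topic = " ".join(user_input.split()[:4]) or "General"

    if len(self._topic_cache) >= self._topic_cache_max_size:
        del self._topic_cache[next(iter(self._topic_cache))]  # FIFO eviction
    self._topic_cache[key] = topic
    return topic
```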
---

### 3. LLM-Based Topic Continuity Analysis

**Location**: `src/orchestrator_engine.py` - `_analyze_topic_continuity()` (lines 1029-1094)

**Changes**:
- **Method signature**: Changed to `async def _analyze_topic_continuity(self, context: dict, user_input: str) -> str`
- **Implementation**: Uses LLM to determine if query continues previous topic or introduces new topic
- **Context-aware**: Uses session_context and interaction_contexts from cache
- **Format validation**: Validates LLM response format ("Continuing X" or "New topic: X")
- **Fallback**: Returns descriptive message if LLM unavailable

**LLM Prompt**:
```
Determine if the current query continues the previous conversation topic or introduces a new topic.

Session Summary: {session_summary}
Recent Interactions: {recent_interactions}

Current Query: "{user_input}"

Respond with EXACTLY one of:
- "Continuing [topic name] discussion" if same topic
- "New topic: [topic name]" if different topic
```

**Temperature**: 0.3 (for consistency)

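A sketch of the continuity check is shown below. As above, `self.llm_router.generate(...)` and the exact context field names (`session_context`, `interaction_contexts`) are assumptions drawn from this document's description, not verified code; the return strings follow the fallbacks listed under "Error Handling" below.

```python
# Hedged sketch only; llm_router.generate() and the context fields are assumed.
async def _analyze_topic_continuity(self, context: dict, user_input: str) -> str:
    if not context:
        return "No previous context"
    if not getattr(self, "llm_router", None):
        return "Topic continuity analysis unavailable"

    prompt = (
        "Determine if the current query continues the previous conversation topic "
        "or introduces a new topic.\n\n"
        f"Session Summary: {context.get('session_context', '')}\n"
        f"Recent Interactions: {context.get('interaction_contexts', [])}\n\n"
        f'Current Query: "{user_input}"\n\n'
        "Respond with EXACTLY one of:\n"
        '- "Continuing [topic name] discussion" if same topic\n'
        '- "New topic: [topic name]" if different topic'
    )
    try:
        answer = (await self.llm_router.generate(prompt, temperature=0.3)).strip()
    except Exception:
        return "Topic continuity analysis failed"

    # Format validation: only trust responses matching the expected patterns.
    if answer.startswith("Continuing") or answer.startswith("New topic:"):
        return answer
    return "Topic continuity analysis unavailable"
```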
---

### 4. Keyword Extraction Update

**Location**: `src/orchestrator_engine.py` - `_extract_keywords()` (lines 1345-1361)

**Changes**:
- **Method signature**: Changed to `async def _extract_keywords(self, user_input: str) -> str`
- **Implementation**: Simple regex-based extraction (kept non-LLM for performance)
- **Stop word filtering**: Filters common stop words
- **Note**: Can be enhanced with LLM if needed, but kept simple for performance

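A minimal sketch of this kind of extraction is shown below; the stop-word list and the keyword cap are illustrative choices, and the real method is declared `async` purely for interface consistency with the other extractors.

```python
import re

# Illustrative stop-word list; the real one in _extract_keywords() may differ.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
    "with", "is", "are", "was", "were", "what", "how", "why", "can", "do",
}

def extract_keywords(user_input: str, limit: int = 5) -> str:
    words = re.findall(r"[a-zA-Z]{3,}", user_input.lower())
    keywords = [w for w in words if w not in STOP_WORDS]
    return ", ".join(keywords[:limit]) if keywords else "General terms"
```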
---

### 5. Updated All Usage Sites

**Location**: `src/orchestrator_engine.py` - `process_request()` (lines 184-200)

**Changes**:
- **Extract topic once**: `main_topic = await self._extract_main_topic(user_input, context)`
- **Extract continuity**: `topic_continuity = await self._analyze_topic_continuity(context, user_input)`
- **Extract keywords**: `query_keywords = await self._extract_keywords(user_input)`
- **Reuse main_topic**: All 18+ usage sites now use the `main_topic` variable instead of calling the method repeatedly

**Updated Reasoning Chain Steps**:
- Step 1: Uses `main_topic` (line 190)
- Step 2: Uses `main_topic` (lines 251, 259)
- Step 3: Uses `main_topic` (lines 268, 276)
- Step 4: Uses `main_topic` (lines 304, 312)
- Step 5: Uses `main_topic` (lines 384, 392)
- Alternative paths: Use `main_topic` (lines 403, 1146-1166)

**Error Recovery**: Simplified to avoid async complexity (line 1733)

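The intent is an extract-once, reuse-everywhere pattern, roughly as sketched below. This is an abridged illustration, not the literal `process_request()` body; the reasoning-chain strings and return shape are placeholders.

```python
async def process_request(self, user_input: str, context: dict = None) -> dict:
    # Extract once, up front (abridged; routing and error handling omitted).
    main_topic = await self._extract_main_topic(user_input, context)
    topic_continuity = await self._analyze_topic_continuity(context, user_input)
    query_keywords = await self._extract_keywords(user_input)

    reasoning_chain = [
        f"Step 1: Identified main topic '{main_topic}' ({topic_continuity})",
        f"Step 2: Selected keywords: {query_keywords}",
    ]
    # Later steps and _generate_alternative_paths(..., main_topic) reuse the
    # same main_topic variable instead of re-invoking _extract_main_topic().
    return {"topic": main_topic, "reasoning_chain": reasoning_chain}
```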
---

### 6. Alternative Paths Method Update

**Location**: `src/orchestrator_engine.py` - `_generate_alternative_paths()` (lines 1136-1169)

**Changes**:
- **Method signature**: Added `main_topic` parameter
- **Before**: `def _generate_alternative_paths(self, intent_result: dict, user_input: str) -> list:`
- **After**: `def _generate_alternative_paths(self, intent_result: dict, user_input: str, main_topic: str) -> list:`
- **Updated call site**: Line 403 passes `main_topic` as third parameter

---

## Performance Characteristics

### Latency Impact

**Per Request**:
- 2 LLM calls per request (topic extraction + continuity analysis)
- Estimated latency: ~200-500ms total (depending on the LLM router)
- Caching reduces repeat calls: a topic-extraction cache hit adds no LLM latency

**Mitigation**:
- Topic extraction cached per unique query (MD5 hash)
- Cache size limited to 100 entries (FIFO eviction)
- Keyword extraction kept simple (no LLM, minimal latency)

### API Costs

**Per Request**:
- Topic extraction: ~50-100 tokens
- Topic continuity: ~100-150 tokens
- Total: ~150-250 tokens per request (first time)
- Cached requests: 0 tokens

**Monthly Estimate** (assuming 1000 unique queries/day):
- First requests: ~150-250k tokens/day = ~4.5-7.5M tokens/month
- Subsequent requests: Cached, 0 tokens
- Actual usage depends on cache hit rate

---

## Error Handling

### Fallback Mechanisms

1. **Topic Extraction**:
   - If LLM unavailable: Falls back to first 4 words of query
   - If LLM error: Logs error, returns fallback
   - Cache miss handling: Generates and caches

2. **Topic Continuity**:
   - If LLM unavailable: Returns "Topic continuity analysis unavailable"
   - If no context: Returns "No previous context"
   - If LLM error: Logs error, returns "Topic continuity analysis failed"

3. **Keywords**:
   - Simple extraction, no LLM dependency
   - Error handling: Returns "General terms" on exception

---

## Testing Recommendations

### Unit Tests

1. **Topic Extraction**:
   - Test LLM-based extraction with various queries
   - Test caching behavior (cache hit/miss)
   - Test fallback behavior when LLM unavailable
   - Test context-aware extraction

2. **Topic Continuity**:
   - Test continuation detection
   - Test new topic detection
   - Test with empty context
   - Test format validation

3. **Integration Tests**:
   - Test full request flow with LLM calls
   - Test cache persistence across requests
   - Test error recovery with LLM failures

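A sketch of what the caching and fallback tests could look like, assuming `pytest-asyncio`, a default-constructible `OrchestratorEngine`, and an `llm_router.generate(...)` interface as in the earlier sketches; adapt the setup to the real constructor.

```python
import pytest
from unittest.mock import AsyncMock

from src.orchestrator_engine import OrchestratorEngine  # assumed import path

@pytest.mark.asyncio
async def test_topic_extraction_is_cached():
    engine = OrchestratorEngine()  # adjust construction to the real signature
    engine.llm_router = AsyncMock()
    engine.llm_router.generate.return_value = "Healthcare Analytics"

    first = await engine._extract_main_topic("How do I analyze patient data?")
    second = await engine._extract_main_topic("How do I analyze patient data?")

    assert first == second == "Healthcare Analytics"
    engine.llm_router.generate.assert_awaited_once()  # second call served from cache

@pytest.mark.asyncio
async def test_topic_extraction_falls_back_without_llm():
    engine = OrchestratorEngine()
    engine.llm_router = None  # simulate the LLM being unavailable

    topic = await engine._extract_main_topic("Summarize quarterly sales figures please")
    assert topic  # fallback should still return a non-empty topic string
```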
### Performance Tests

1. **Latency Measurement**:
   - Measure average latency with LLM calls
   - Measure latency with cache hits
   - Compare to previous pattern-based approach

2. **Cache Effectiveness**:
   - Measure cache hit rate
   - Test cache eviction behavior

---

## Migration Notes

### Breaking Changes

**None**: All changes are internal to the orchestrator; the external API is unchanged.

### Compatibility

- **LLM Router Required**: Full functionality requires `llm_router` to be available
- **Graceful Degradation**: Falls back to simple extraction if the LLM is unavailable
- **Backward Compatible**: The old pattern-based code has been removed; fallbacks maintain functionality if the LLM is unavailable

---

## Benefits Realized

✅ **Accurate Topic Classification**: LLM understands context, synonyms, nuances  
✅ **Domain Adaptive**: Works for any domain without code changes  
✅ **Context-Aware**: Uses session_context and interaction_contexts  
✅ **Human-Readable**: Maintains descriptive reasoning chain strings  
✅ **Scalable**: No manual keyword list maintenance  
✅ **Cached**: Reduces API calls for repeated queries  

---

## Trade-offs

⚠️ **Latency**: Adds ~200-500ms per request (first time, cached after)  
⚠️ **API Costs**: ~150-250 tokens per request (first time)  
⚠️ **LLM Dependency**: Requires LLM router to be functional  
⚠️ **Complexity**: More code to maintain (async handling, caching, error handling)  
⚠️ **Inconsistency Risk**: LLM responses may vary slightly (mitigated by temperature=0.3)  

---

## Files Modified

1. `src/orchestrator_engine.py`:
   - Added topic cache infrastructure
   - Rewrote `_extract_main_topic()` to use LLM
   - Rewrote `_analyze_topic_continuity()` to use LLM
   - Updated `_extract_keywords()` to async
   - Updated all 18+ usage sites to use cached `main_topic`
   - Updated `_generate_alternative_paths()` signature

---

## Next Steps

1. **Monitor Performance**: Track latency and cache hit rates
2. **Tune Caching**: Adjust cache size based on usage patterns
3. **Optional Enhancements**:
   - Consider LLM-based keyword extraction if needed
   - Add topic extraction metrics/logging
   - Implement cache persistence across restarts

---

## Conclusion

Option 2 implementation complete. System now uses LLM-based zero-shot classification for topic extraction and continuity analysis, providing accurate, context-aware topic classification without hardcoded patterns. Caching minimizes latency and API costs for repeated queries.