# LLM-Based Topic Extraction Implementation (Option 2)

## Summary

Successfully implemented Option 2: LLM-based zero-shot classification for topic extraction and continuity analysis, replacing hardcoded pattern matching.

## Changes Implemented

### 1. Topic Cache Infrastructure

**Location**: `src/orchestrator_engine.py` - `__init__()` (lines 34-36)

**Added**:

```python
# Cache for topic extraction to reduce API calls
self._topic_cache = {}
self._topic_cache_max_size = 100  # Limit cache size
```

**Purpose**: Cache topic extraction results to minimize LLM API calls for identical queries.

---

### 2. LLM-Based Topic Extraction

**Location**: `src/orchestrator_engine.py` - `_extract_main_topic()` (lines 1276-1343)

**Changes**:

- **Method signature**: Changed to `async def _extract_main_topic(self, user_input: str, context: dict = None) -> str`
- **Implementation**: Uses LLM zero-shot classification instead of hardcoded keywords
- **Context-aware**: Uses session_context and interaction_contexts from cache when available
- **Caching**: Implements cache with FIFO eviction (max 100 entries)
- **Fallback**: Falls back to simple word extraction if LLM unavailable

**LLM Prompt**:

```
Classify the main topic of this query in 2-5 words. Be specific and concise.

Query: "{user_input}"

[Session context if available]

Respond with ONLY the topic name (e.g., "Machine Learning", "Healthcare Analytics").
```

**Temperature**: 0.3 (for consistency)

---

### 3. LLM-Based Topic Continuity Analysis

**Location**: `src/orchestrator_engine.py` - `_analyze_topic_continuity()` (lines 1029-1094)

**Changes**:

- **Method signature**: Changed to `async def _analyze_topic_continuity(self, context: dict, user_input: str) -> str`
- **Implementation**: Uses LLM to determine if query continues previous topic or introduces new topic
- **Context-aware**: Uses session_context and interaction_contexts from cache
- **Format validation**: Validates LLM response format ("Continuing X" or "New topic: X")
- **Fallback**: Returns descriptive message if LLM unavailable

**LLM Prompt**:

```
Determine if the current query continues the previous conversation topic or introduces a new topic.

Session Summary: {session_summary}

Recent Interactions: {recent_interactions}

Current Query: "{user_input}"

Respond with EXACTLY one of:
- "Continuing [topic name] discussion" if same topic
- "New topic: [topic name]" if different topic
```

**Temperature**: 0.3 (for consistency)

---

### 4. Keyword Extraction Update

**Location**: `src/orchestrator_engine.py` - `_extract_keywords()` (lines 1345-1361)

**Changes**:

- **Method signature**: Changed to `async def _extract_keywords(self, user_input: str) -> str`
- **Implementation**: Simple regex-based extraction (not LLM-based, for performance)
- **Stop word filtering**: Filters common stop words
- **Note**: Can be enhanced with LLM if needed, but kept simple for performance
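To show how the pieces above fit together, here is a minimal sketch of sections 1, 2, and 4 in one place. It is illustrative, not the code in `src/orchestrator_engine.py`: the `TopicExtractorSketch` class, the `llm_router.generate(prompt, temperature=...)` call, the stop-word list, and the way session context is spliced into the prompt are assumptions; the cache fields, MD5 cache key, FIFO eviction, prompt wording, temperature, and fallbacks follow the descriptions in this document.

```python
import hashlib
import logging
import re

logger = logging.getLogger(__name__)

# Prompt from section 2; the {session_context} placeholder stands in for
# "[Session context if available]".
TOPIC_PROMPT = (
    "Classify the main topic of this query in 2-5 words. Be specific and concise.\n\n"
    'Query: "{user_input}"\n'
    "{session_context}\n"
    'Respond with ONLY the topic name (e.g., "Machine Learning", "Healthcare Analytics").'
)


class TopicExtractorSketch:
    """Illustrative stand-in for the orchestrator pieces described above."""

    def __init__(self, llm_router=None):
        self.llm_router = llm_router       # assumed to expose an async generate()
        # Cache for topic extraction to reduce API calls (section 1)
        self._topic_cache = {}
        self._topic_cache_max_size = 100   # Limit cache size

    async def _extract_main_topic(self, user_input: str, context: dict = None) -> str:
        # Cache key: MD5 of the raw query (see "Performance Characteristics")
        cache_key = hashlib.md5(user_input.encode("utf-8")).hexdigest()
        if cache_key in self._topic_cache:
            return self._topic_cache[cache_key]   # cache hit -> no LLM call

        session_context = ""
        if context and context.get("session_context"):
            session_context = f"\nSession context: {context['session_context']}\n"
        prompt = TOPIC_PROMPT.format(user_input=user_input, session_context=session_context)

        try:
            if self.llm_router is None:
                raise RuntimeError("LLM router unavailable")
            # Assumed router interface -- the real llm_router call may differ
            topic = (await self.llm_router.generate(prompt, temperature=0.3)).strip().strip('"')
        except Exception as exc:
            logger.error("Topic extraction failed, using fallback: %s", exc)
            # Fallback: first 4 words of the query (not cached in this sketch)
            return " ".join(user_input.split()[:4])

        # FIFO eviction: drop the oldest entry once the cache is full
        if len(self._topic_cache) >= self._topic_cache_max_size:
            self._topic_cache.pop(next(iter(self._topic_cache)))
        self._topic_cache[cache_key] = topic
        return topic

    async def _extract_keywords(self, user_input: str) -> str:
        # Simple regex-based extraction with stop-word filtering (section 4)
        stop_words = {"the", "a", "an", "and", "or", "of", "to", "in", "for",
                      "is", "are", "what", "how", "why", "can", "does"}
        words = re.findall(r"[a-zA-Z]{3,}", user_input.lower())
        keywords = [w for w in words if w not in stop_words]
        # Deduplicate while preserving order; cap at five terms
        return ", ".join(list(dict.fromkeys(keywords))[:5]) if keywords else "General terms"
```

In `process_request()` these coroutines are awaited once per request and their results reused, as described in section 5 below.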
---

### 5. Updated All Usage Sites

**Location**: `src/orchestrator_engine.py` - `process_request()` (lines 184-200)

**Changes**:

- **Extract topic once**: `main_topic = await self._extract_main_topic(user_input, context)`
- **Extract continuity**: `topic_continuity = await self._analyze_topic_continuity(context, user_input)`
- **Extract keywords**: `query_keywords = await self._extract_keywords(user_input)`
- **Reuse main_topic**: All 18+ usage sites now use the `main_topic` variable instead of calling the method repeatedly

**Updated Reasoning Chain Steps**:

- Step 1: Uses `main_topic` (line 190)
- Step 2: Uses `main_topic` (lines 251, 259)
- Step 3: Uses `main_topic` (lines 268, 276)
- Step 4: Uses `main_topic` (lines 304, 312)
- Step 5: Uses `main_topic` (lines 384, 392)
- Alternative paths: Use `main_topic` (lines 403, 1146-1166)

**Error Recovery**: Simplified to avoid async complexity (line 1733)

---

### 6. Alternative Paths Method Update

**Location**: `src/orchestrator_engine.py` - `_generate_alternative_paths()` (lines 1136-1169)

**Changes**:

- **Method signature**: Added `main_topic` parameter
- **Before**: `def _generate_alternative_paths(self, intent_result: dict, user_input: str) -> list:`
- **After**: `def _generate_alternative_paths(self, intent_result: dict, user_input: str, main_topic: str) -> list:`
- **Updated call site**: Line 403 passes `main_topic` as the third parameter

---

## Performance Characteristics

### Latency Impact

**Per Request**:

- 2 LLM calls per request (topic extraction + continuity analysis)
- Estimated latency: ~200-500ms total (depending on the LLM router)
- Caching reduces repeat calls: a cache hit adds no LLM latency

**Mitigation**:

- Topic extraction cached per unique query (MD5 hash)
- Cache size limited to 100 entries (FIFO eviction)
- Keyword extraction kept simple (no LLM, minimal latency)

### API Costs

**Per Request**:

- Topic extraction: ~50-100 tokens
- Topic continuity: ~100-150 tokens
- Total: ~150-250 tokens per request (first time)
- Cached requests: 0 tokens

**Monthly Estimate** (assuming 1000 unique queries/day):

- First requests: ~150-250k tokens/day = ~4.5-7.5M tokens/month
- Subsequent requests: cached, 0 tokens
- Actual usage depends on cache hit rate

---

## Error Handling

### Fallback Mechanisms

1. **Topic Extraction**:
   - If LLM unavailable: Falls back to first 4 words of query
   - If LLM error: Logs error, returns fallback
   - Cache miss handling: Generates and caches

2. **Topic Continuity**:
   - If LLM unavailable: Returns "Topic continuity analysis unavailable"
   - If no context: Returns "No previous context"
   - If LLM error: Logs error, returns "Topic continuity analysis failed"

3. **Keywords**:
   - Simple extraction, no LLM dependency
   - Error handling: Returns "General terms" on exception
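Combining the prompt from section 3 with the fallbacks just listed, the continuity check would look roughly like the method below. This extends the hypothetical `TopicExtractorSketch` from the earlier sketch; the `llm_router.generate()` call, the treatment of `interaction_contexts` as a list of strings, and the coercion of off-format replies are assumptions, while the prompt, temperature, format validation, and fallback strings follow this document.

```python
import logging

logger = logging.getLogger(__name__)

# Prompt from section 3
CONTINUITY_PROMPT = (
    "Determine if the current query continues the previous conversation topic "
    "or introduces a new topic.\n\n"
    "Session Summary: {session_summary}\n\n"
    "Recent Interactions: {recent_interactions}\n\n"
    'Current Query: "{user_input}"\n\n'
    "Respond with EXACTLY one of:\n"
    '- "Continuing [topic name] discussion" if same topic\n'
    '- "New topic: [topic name]" if different topic'
)


class ContinuityAnalyzerSketch(TopicExtractorSketch):
    """Adds the LLM-based continuity check to the earlier sketch class."""

    async def _analyze_topic_continuity(self, context: dict, user_input: str) -> str:
        if not context or not context.get("interaction_contexts"):
            return "No previous context"
        if self.llm_router is None:
            return "Topic continuity analysis unavailable"

        prompt = CONTINUITY_PROMPT.format(
            session_summary=context.get("session_context", "None"),
            recent_interactions="; ".join(context["interaction_contexts"][-3:]),
            user_input=user_input,
        )
        try:
            # Assumed router interface -- the real llm_router call may differ
            answer = (await self.llm_router.generate(prompt, temperature=0.3)).strip()
        except Exception as exc:
            logger.error("Topic continuity analysis failed: %s", exc)
            return "Topic continuity analysis failed"

        # Format validation: accept only the two response shapes requested in the prompt
        if answer.startswith("Continuing") or answer.startswith("New topic:"):
            return answer
        return f"New topic: {answer}"   # sketch choice: coerce off-format replies
```

Temperature 0.3 keeps the two-shape output format reasonably stable, and the final coercion is one possible way to keep the reasoning-chain string readable when the model drifts off format.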
---

## Testing Recommendations

### Unit Tests

1. **Topic Extraction**:
   - Test LLM-based extraction with various queries
   - Test caching behavior (cache hit/miss)
   - Test fallback behavior when LLM unavailable
   - Test context-aware extraction

2. **Topic Continuity**:
   - Test continuation detection
   - Test new topic detection
   - Test with empty context
   - Test format validation

3. **Integration Tests**:
   - Test full request flow with LLM calls
   - Test cache persistence across requests
   - Test error recovery with LLM failures

### Performance Tests

1. **Latency Measurement**:
   - Measure average latency with LLM calls
   - Measure latency with cache hits
   - Compare to previous pattern-based approach

2. **Cache Effectiveness**:
   - Measure cache hit rate
   - Test cache eviction behavior

---

## Migration Notes

### Breaking Changes

**None**: All changes are internal to the orchestrator. External API unchanged.

### Compatibility

- **LLM Router Required**: System requires `llm_router` to be available
- **Graceful Degradation**: Falls back to simple extraction if LLM unavailable
- **Backward Compatible**: Old pattern-based code removed, but fallbacks maintain functionality

---

## Benefits Realized

- ✅ **Accurate Topic Classification**: LLM understands context, synonyms, nuances
- ✅ **Domain Adaptive**: Works for any domain without code changes
- ✅ **Context-Aware**: Uses session_context and interaction_contexts
- ✅ **Human-Readable**: Maintains descriptive reasoning chain strings
- ✅ **Scalable**: No manual keyword list maintenance
- ✅ **Cached**: Reduces API calls for repeated queries

---

## Trade-offs

- ⚠️ **Latency**: Adds ~200-500ms per request (first time, cached after)
- ⚠️ **API Costs**: ~150-250 tokens per request (first time)
- ⚠️ **LLM Dependency**: Requires LLM router to be functional
- ⚠️ **Complexity**: More code to maintain (async handling, caching, error handling)
- ⚠️ **Inconsistency Risk**: LLM responses may vary slightly (mitigated by temperature=0.3)

---

## Files Modified

1. `src/orchestrator_engine.py`:
   - Added topic cache infrastructure
   - Rewrote `_extract_main_topic()` to use LLM
   - Rewrote `_analyze_topic_continuity()` to use LLM
   - Updated `_extract_keywords()` to async
   - Updated all 18+ usage sites to use cached `main_topic`
   - Updated `_generate_alternative_paths()` signature

---

## Next Steps

1. **Monitor Performance**: Track latency and cache hit rates
2. **Tune Caching**: Adjust cache size based on usage patterns
3. **Optional Enhancements**:
   - Consider LLM-based keyword extraction if needed
   - Add topic extraction metrics/logging
   - Implement cache persistence across restarts

---

## Conclusion

Option 2 implementation complete. The system now uses LLM-based zero-shot classification for topic extraction and continuity analysis, providing accurate, context-aware topic classification without hardcoded patterns. Caching minimizes latency and API costs for repeated queries.
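### Appendix: Test Sketch

As a concrete starting point for the "caching behavior" and "fallback behavior" unit tests recommended above, here is a hypothetical pytest sketch. It assumes `pytest` with the `pytest-asyncio` plugin and the `TopicExtractorSketch` class from the earlier sketch in scope; the real tests would exercise `OrchestratorEngine` and its actual router interface.

```python
import pytest


class StubRouter:
    """Deterministic stand-in for llm_router that counts calls."""

    def __init__(self, reply="Machine Learning"):
        self.reply = reply
        self.calls = 0

    async def generate(self, prompt, temperature=0.3):
        self.calls += 1
        return self.reply


@pytest.mark.asyncio
async def test_topic_extraction_hits_cache_on_repeat_query():
    router = StubRouter()
    engine = TopicExtractorSketch(llm_router=router)  # sketch class from earlier

    first = await engine._extract_main_topic("Explain gradient descent")
    second = await engine._extract_main_topic("Explain gradient descent")

    assert first == second == "Machine Learning"
    assert router.calls == 1   # second call served from the topic cache


@pytest.mark.asyncio
async def test_topic_extraction_falls_back_without_llm():
    engine = TopicExtractorSketch(llm_router=None)

    topic = await engine._extract_main_topic("How do transformers handle long context windows")

    assert topic == "How do transformers handle"   # first 4 words of the query
```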