# Interaction Context Retrieval Failure - Root Cause Analysis
## Executive Summary
Interaction contexts are being **stored correctly** in the database, but are **not being retrieved** on subsequent requests due to a cache invalidation failure. The system returns stale cached context that doesn't include newly generated interaction contexts.
## Problem Statement
When a user submits a request referencing previous context (e.g., "based on above inputs"), the system reports `Context retrieved: 0 interaction contexts`, causing:
- Loss of conversation continuity
- Responses generated for wrong topics
- Previous interaction context unavailable to agents
## Root Cause Analysis
### The Caching Flow
The system uses a two-tier caching mechanism:
1. **Context Manager Cache** (`src/context_manager.py`):
- Key: `session_{session_id}`
- Storage: `self.session_cache` dictionary
- Purpose: Cache session context to avoid database queries
2. **Orchestrator Cache** (`src/orchestrator_engine.py`):
- Key: `context_{session_id}`
- Storage: `self._context_cache` dictionary
- TTL: 5 seconds
- Purpose: Prevent rapid repeated context retrieval within same request processing
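The two-tier read path described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the cache key formats, attribute names (`session_cache`, `_context_cache`), and 5-second TTL come from this report, while the class structure and method bodies are assumptions.

```python
import time


class ContextManager:
    """Tier 2: session-level cache in front of the database."""

    def __init__(self):
        self.session_cache = {}  # keyed by "session_{session_id}"

    def manage_context(self, session_id):
        key = f"session_{session_id}"
        if key in self.session_cache:
            # Cache hit short-circuits the database entirely
            return self.session_cache[key]
        context = self._retrieve_from_db(session_id)
        self.session_cache[key] = context
        return context

    def _retrieve_from_db(self, session_id):
        # Stand-in for the SELECT against interaction_contexts
        return {"interaction_contexts": []}


class Orchestrator:
    """Tier 1: short-TTL cache in front of the context manager."""

    CACHE_TTL = 5  # seconds, per the report

    def __init__(self, context_manager):
        self.context_manager = context_manager
        self._context_cache = {}  # keyed by "context_{session_id}"

    def _get_or_create_context(self, session_id):
        key = f"context_{session_id}"
        entry = self._context_cache.get(key)
        if entry and time.monotonic() - entry["at"] < self.CACHE_TTL:
            # Tier-1 hit: tier 2 and the database are never consulted
            return entry["context"]
        context = self.context_manager.manage_context(session_id)
        self._context_cache[key] = {"context": context, "at": time.monotonic()}
        return context
```

Note that a tier-1 hit bypasses tier 2 entirely, which is why (as Issue 2 below notes) clearing only one tier is not sufficient.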
### The Failure Sequence
#### **First Request (Working - Context Storage)**:
```
1. User: "Tell me about Excel handling"
2. orchestrator.process_request() called
3. _get_or_create_context() checks orchestrator cache → MISS (empty)
4. Calls context_manager.manage_context()
5. manage_context() checks session_cache → MISS (empty)
6. Calls _retrieve_from_db()
7. Database query: SELECT interaction_summary FROM interaction_contexts WHERE session_id = ?
   → Returns 0 rows (new session)
8. Returns context: { interaction_contexts: [] }
9. Caches in session_cache: session_cache["session_cca279a4"] = { interaction_contexts: [] }
10. Response generated about Excel handling
11. generate_interaction_context() called
12. LLM generates 50-token summary
13. Database INSERT: INSERT INTO interaction_contexts (interaction_id, session_id, ...)
    → ✅ SUCCESS: Interaction context stored in database
14. **CRITICAL MISSING STEP**: Cache NOT invalidated
```
#### **Second Request (Broken - Context Retrieval)**:
```
1. User: "Based on above inputs, create a prototype"
2. orchestrator.process_request() called
3. _get_or_create_context() checks orchestrator cache:
   - If < 5 seconds old → Returns cached context (from step 1)
   - OR continues to step 4
4. Calls context_manager.manage_context()
5. manage_context() checks session_cache:
   session_cache.get("session_cca279a4")
   → CACHE HIT: Returns cached context from first request
   → Contains: { interaction_contexts: [] }
6. **NEVER queries database** because cache hit
7. Context returned with 0 interaction contexts
8. Logs show: "Context retrieved: 0 interaction contexts"
9. Intent agent receives empty context
10. Skills agent analyzes wrong topic
11. Response generated for wrong context (story generation, not Excel)
```
### Root Cause Identified
**PRIMARY ISSUE**: Cache Invalidation Failure
After `generate_interaction_context()` successfully stores an interaction context in the database, **the cache is never invalidated**. This causes:
1. **First Request**: Context cached with `interaction_contexts = []`
2. **Interaction Context Generated**: Stored in database β
3. **Cache Not Cleared**: `session_cache["session_{session_id}"]` still contains old context
4. **Second Request**: Cache hit returns stale context with 0 interaction contexts
5. **Database Never Queried**: Cache check happens before database query
**Location of Issue**:
- File: `src/orchestrator_engine.py`
- Method: `process_request()`
- Lines: 442-450 (after `generate_interaction_context()` call)
- **Missing**: Cache invalidation after interaction context generation
### Secondary Issues
#### Issue 2: Orchestrator-Level Cache Also Not Cleared
The orchestrator maintains its own cache (`_context_cache`) with a 5-second TTL. If requests come within 5 seconds:
- **Orchestrator cache hit**: Returns cached context immediately
- **Context manager never called**: Never checks session_cache or database
- **Result**: Even if session_cache were cleared, orchestrator cache would still return stale data
**Location**:
- File: `src/orchestrator_engine.py`
- Method: `_get_or_create_context()`
- Lines: 89-93
#### Issue 3: No Detection of Context Reference Mismatches
When a user explicitly references previous context (e.g., "based on above inputs"), but the system has 0 interaction contexts, there's no mechanism to:
1. Detect the mismatch
2. Force cache invalidation
3. Re-query the database
4. Warn about potential context loss
**Location**:
- File: `src/orchestrator_engine.py`
- Method: `process_request()`
- Lines: 172-174 (context retrieval happens, but no validation)
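A guard of the kind Issue 3 calls for could look like the sketch below. Everything here is hypothetical: the phrase patterns, the function names, and the callback-based wiring are illustrative assumptions, not part of the current codebase.

```python
import re

# Hypothetical phrases indicating the user is pointing back at earlier turns
CONTEXT_REFERENCE_PATTERNS = re.compile(
    r"\b(based on (the )?above|as (mentioned|discussed) (earlier|above)"
    r"|previous (answer|response))\b",
    re.IGNORECASE,
)


def references_prior_context(user_input: str) -> bool:
    """Return True when the request explicitly references earlier context."""
    return bool(CONTEXT_REFERENCE_PATTERNS.search(user_input))


def validate_context(user_input, context, invalidate_caches, retrieve_fresh):
    """Detect the mismatch described above: the user references prior turns
    but the (possibly stale) context holds 0 interaction contexts. If so,
    drop the caches and re-query the database once."""
    if references_prior_context(user_input) and not context.get("interaction_contexts"):
        invalidate_caches()
        return retrieve_fresh()
    return context
```

Such a check would not fix the invalidation bug itself, but it would turn a silent wrong-topic response into a recoverable cache miss.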
## Code Flow Analysis
### Storage Flow (Working)
```
orchestrator.process_request()
└─> generate_interaction_context()
    ├─> llm_router.route_inference() → Generate summary
    ├─> Database INSERT → Store in interaction_contexts table
    ├─> ✅ SUCCESS: Stored in database
    └─> ❌ MISSING: Cache invalidation
```
### Retrieval Flow (Broken)
```
orchestrator.process_request()
└─> _get_or_create_context()
    ├─> Check orchestrator cache (5s TTL)
    │   └─> If hit: Return cached (may be stale)
    └─> manage_context()
        ├─> Check session_cache
        │   └─> If hit: Return cached (STALE - has 0 contexts)
        └─> _retrieve_from_db() (NEVER REACHED if cache hit)
            └─> Query: SELECT FROM interaction_contexts WHERE session_id = ?
                └─> Would return stored contexts, but never called
```
## Database Verification
The interaction context **IS being stored** correctly. Evidence:
1. **Log Entry**:
```
2025-10-31 06:55:55,481 - src.context_manager - INFO - ✅ Generated interaction context for 64d4ace2_15ca4dec_1761890055
```
2. **Storage Code** (src/context_manager.py:426-438):
```python
cursor.execute("""
INSERT OR REPLACE INTO interaction_contexts
(interaction_id, session_id, user_input, system_response, interaction_summary, created_at)
VALUES (?, ?, ?, ?, ?, ?)
""", (interaction_id, session_id, user_input[:500], system_response[:1000], summary.strip(), datetime.now().isoformat()))
conn.commit()
conn.close()
```
✅ This executes successfully and commits
3. **Retrieval Code** (src/context_manager.py:656-671):
```python
cursor.execute("""
SELECT interaction_summary, created_at, needs_refresh
FROM interaction_contexts
WHERE session_id = ? AND (needs_refresh IS NULL OR needs_refresh = 0)
ORDER BY created_at DESC
LIMIT 20
""", (session_id,))
```
✅ This query would work, but is never executed due to the cache hit
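To confirm independently that rows really are landing in the table, a standalone check against the same database can be run outside the caching layers. The database path is an assumption here; substitute the actual file used by `context_manager`.

```python
import sqlite3


def count_interaction_contexts(db_path: str, session_id: str) -> int:
    """Count stored interaction contexts for a session, bypassing all caches."""
    conn = sqlite3.connect(db_path)
    try:
        (n,) = conn.execute(
            "SELECT COUNT(*) FROM interaction_contexts WHERE session_id = ?",
            (session_id,),
        ).fetchone()
        return n
    finally:
        conn.close()
```

A nonzero count here alongside "Context retrieved: 0 interaction contexts" in the logs is direct evidence that the failure is in the caching layer, not the persistence layer.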
## Cache Invalidation Points
Current cache invalidation only happens in these scenarios:
1. **Session End**: `end_session()` clears cache (line 534-536)
2. **User Change**: User mismatch detection clears cache (line 254-255)
3. **Never**: After generating interaction context ❌
## Expected vs Actual Behavior
### Expected Behavior:
```
Request 1 → Generate context → Store in DB → Clear cache
Request 2 → Cache miss → Query DB → Find stored context → Use it
```
### Actual Behavior:
```
Request 1 → Generate context → Store in DB → Keep cache (stale)
Request 2 → Cache hit → Return stale cache (0 contexts) → Never query DB
```
## Evidence from Logs
```
# First Request - Context Generation
2025-10-31 06:55:55,481 - src.context_manager - INFO - ✅ Generated interaction context for 64d4ace2_15ca4dec_1761890055
# Second Request - Cache Hit (No DB Query)
2025-10-31 07:02:55,911 - src.context_manager - INFO - Context retrieved: 0 interaction contexts
```
**Time Gap**: 7 minutes between requests (well beyond 5-second orchestrator cache TTL)
**Result**: Still 0 contexts β Session cache hit, database never queried
## Impact Assessment
### Functional Impact:
- **HIGH**: Conversation continuity completely broken
- Users cannot reference previous responses
- Each request treated as isolated, losing all context
### User Experience Impact:
- **HIGH**: Responses generated for wrong topics
- Frustration when "based on above inputs" is ignored
- Loss of trust in system reliability
### Performance Impact:
- **LOW**: Cache is working (too well - preventing fresh data retrieval)
- Database queries being avoided (but should happen after context generation)
## Conclusion
The interaction context system is **architecturally sound** but has a **critical cache invalidation bug**:
1. ✅ Interaction contexts are correctly generated
2. ✅ Interaction contexts are correctly stored in the database
3. ✅ The database retrieval query is correctly implemented
4. ❌ The cache is never invalidated after interaction context generation
5. ❌ A cache hit prevents the database query from executing
6. ❌ Stale cached context (with 0 interaction contexts) is returned
**The fix requires** invalidating both:
- Context Manager's `session_cache` after `generate_interaction_context()`
- Orchestrator's `_context_cache` after `generate_interaction_context()`
This will force fresh database queries on subsequent requests, allowing stored interaction contexts to be retrieved and used.
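A minimal sketch of that fix follows, assuming the attribute names given in this report (`session_cache`, `_context_cache`) and that the orchestrator holds a reference to the context manager; the exact call site would be inside `process_request()`, immediately after `generate_interaction_context()` succeeds.

```python
def invalidate_session_context(orchestrator, session_id: str) -> None:
    """Drop both cache tiers for one session so the next request re-queries the DB."""
    # Tier 2: context manager's session cache
    orchestrator.context_manager.session_cache.pop(f"session_{session_id}", None)
    # Tier 1: orchestrator's short-TTL cache
    orchestrator._context_cache.pop(f"context_{session_id}", None)
```

Using `dict.pop(key, None)` keeps the helper safe to call even when a tier has no entry for the session, and scoping the invalidation to one session avoids evicting other sessions' still-valid contexts.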
## Files Involved
1. `src/orchestrator_engine.py` - Lines 442-450 (missing cache invalidation)
2. `src/orchestrator_engine.py` - Lines 83-113 (orchestrator cache)
3. `src/context_manager.py` - Lines 235-289 (session cache management)
4. `src/context_manager.py` - Lines 396-451 (interaction context generation)
## Additional Notes
- The cache mechanism itself is working as designed (performance optimization)
- The bug is in the **cache lifecycle management** (invalidation timing)
- Database operations are functioning correctly
- The issue is purely in the caching layer, not the persistence layer