# Interaction Context Retrieval Failure - Root Cause Analysis
## Executive Summary
Interaction contexts are being **stored correctly** in the database, but are **not being retrieved** on subsequent requests due to a cache invalidation failure. The system returns stale cached context that doesn't include newly generated interaction contexts.
## Problem Statement
When a user submits a request referencing previous context (e.g., "based on above inputs"), the system reports `Context retrieved: 0 interaction contexts`, causing:
- Loss of conversation continuity
- Responses generated for wrong topics
- Previous interaction context unavailable to agents
## Root Cause Analysis
### The Caching Flow
The system uses a two-tier caching mechanism:
1. **Context Manager Cache** (`src/context_manager.py`):
- Key: `session_{session_id}`
- Storage: `self.session_cache` dictionary
- Purpose: Cache session context to avoid database queries
2. **Orchestrator Cache** (`src/orchestrator_engine.py`):
- Key: `context_{session_id}`
- Storage: `self._context_cache` dictionary
- TTL: 5 seconds
- Purpose: Prevent rapid repeated context retrieval within same request processing
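The two-tier read path described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the cache key formats, attribute names (`session_cache`, `_context_cache`), and 5-second TTL come from this report, while the class structure and method bodies are assumptions.

```python
import time


class ContextManager:
    """Tier 2: session-level cache in front of the database."""

    def __init__(self):
        self.session_cache = {}  # keyed by "session_{session_id}"

    def manage_context(self, session_id):
        key = f"session_{session_id}"
        if key in self.session_cache:
            # Cache hit short-circuits the database entirely
            return self.session_cache[key]
        context = self._retrieve_from_db(session_id)
        self.session_cache[key] = context
        return context

    def _retrieve_from_db(self, session_id):
        # Stand-in for the SELECT against interaction_contexts
        return {"interaction_contexts": []}


class Orchestrator:
    """Tier 1: short-TTL cache in front of the context manager."""

    CACHE_TTL = 5  # seconds, per the report

    def __init__(self, context_manager):
        self.context_manager = context_manager
        self._context_cache = {}  # keyed by "context_{session_id}"

    def _get_or_create_context(self, session_id):
        key = f"context_{session_id}"
        entry = self._context_cache.get(key)
        if entry and time.monotonic() - entry["at"] < self.CACHE_TTL:
            # Tier-1 hit: tier 2 and the database are never consulted
            return entry["context"]
        context = self.context_manager.manage_context(session_id)
        self._context_cache[key] = {"context": context, "at": time.monotonic()}
        return context
```

Note that a tier-1 hit bypasses tier 2 entirely, which is why (as Issue 2 below notes) clearing only one tier is not sufficient.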
### The Failure Sequence
#### **First Request (Working - Context Storage)**:
```
1. User: "Tell me about Excel handling"
2. orchestrator.process_request() called
3. _get_or_create_context() checks orchestrator cache → MISS (empty)
4. Calls context_manager.manage_context()
5. manage_context() checks session_cache → MISS (empty)
6. Calls _retrieve_from_db()
7. Database query: SELECT interaction_summary FROM interaction_contexts WHERE session_id = ?
   → Returns 0 rows (new session)
8. Returns context: { interaction_contexts: [] }
9. Caches in session_cache: session_cache["session_cca279a4"] = { interaction_contexts: [] }
10. Response generated about Excel handling
11. generate_interaction_context() called
12. LLM generates 50-token summary
13. Database INSERT: INSERT INTO interaction_contexts (interaction_id, session_id, ...)
    → ✅ SUCCESS: Interaction context stored in database
14. **CRITICAL MISSING STEP**: Cache NOT invalidated
```
#### **Second Request (Broken - Context Retrieval)**:
```
1. User: "Based on above inputs, create a prototype"
2. orchestrator.process_request() called
3. _get_or_create_context() checks orchestrator cache:
   - If < 5 seconds old → Returns cached context (from step 1)
   - OR continues to step 4
4. Calls context_manager.manage_context()
5. manage_context() checks session_cache:
   session_cache.get("session_cca279a4")
   → CACHE HIT: Returns cached context from first request
   → Contains: { interaction_contexts: [] }
6. **NEVER queries database** because cache hit
7. Context returned with 0 interaction contexts
8. Logs show: "Context retrieved: 0 interaction contexts"
9. Intent agent receives empty context
10. Skills agent analyzes wrong topic
11. Response generated for wrong context (story generation, not Excel)
```
### Root Cause Identified
**PRIMARY ISSUE**: Cache Invalidation Failure
After `generate_interaction_context()` successfully stores an interaction context in the database, **the cache is never invalidated**. This causes:
1. **First Request**: Context cached with `interaction_contexts = []`
2. **Interaction Context Generated**: Stored in database β
3. **Cache Not Cleared**: `session_cache["session_{session_id}"]` still contains old context
4. **Second Request**: Cache hit returns stale context with 0 interaction contexts
5. **Database Never Queried**: Cache check happens before database query
**Location of Issue**:
- File: `src/orchestrator_engine.py`
- Method: `process_request()`
- Lines: 442-450 (after `generate_interaction_context()` call)
- **Missing**: Cache invalidation after interaction context generation
### Secondary Issues
#### Issue 2: Orchestrator-Level Cache Also Not Cleared
The orchestrator maintains its own cache (`_context_cache`) with a 5-second TTL. If requests come within 5 seconds:
- **Orchestrator cache hit**: Returns cached context immediately
- **Context manager never called**: Never checks session_cache or database
- **Result**: Even if session_cache were cleared, orchestrator cache would still return stale data
**Location**:
- File: `src/orchestrator_engine.py`
- Method: `_get_or_create_context()`
- Lines: 89-93
#### Issue 3: No Detection of Context Reference Mismatches
When a user explicitly references previous context (e.g., "based on above inputs"), but the system has 0 interaction contexts, there's no mechanism to:
1. Detect the mismatch
2. Force cache invalidation
3. Re-query the database
4. Warn about potential context loss
**Location**:
- File: `src/orchestrator_engine.py`
- Method: `process_request()`
- Lines: 172-174 (context retrieval happens, but no validation)
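A guard of the kind Issue 3 calls for could look like the sketch below. Everything here is hypothetical: the phrase patterns, the function names, and the callback-based wiring are illustrative assumptions, not part of the current codebase.

```python
import re

# Hypothetical phrases indicating the user is pointing back at earlier turns
CONTEXT_REFERENCE_PATTERNS = re.compile(
    r"\b(based on (the )?above|as (mentioned|discussed) (earlier|above)"
    r"|previous (answer|response))\b",
    re.IGNORECASE,
)


def references_prior_context(user_input: str) -> bool:
    """Return True when the request explicitly references earlier context."""
    return bool(CONTEXT_REFERENCE_PATTERNS.search(user_input))


def validate_context(user_input, context, invalidate_caches, retrieve_fresh):
    """Detect the mismatch described above: the user references prior turns
    but the (possibly stale) context holds 0 interaction contexts. If so,
    drop the caches and re-query the database once."""
    if references_prior_context(user_input) and not context.get("interaction_contexts"):
        invalidate_caches()
        return retrieve_fresh()
    return context
```

Such a check would not fix the invalidation bug itself, but it would turn a silent wrong-topic response into a recoverable cache miss.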
## Code Flow Analysis
### Storage Flow (Working)
```
orchestrator.process_request()
└─> generate_interaction_context()
    ├─> llm_router.route_inference() → Generate summary
    ├─> Database INSERT → Store in interaction_contexts table
    ├─> ✅ SUCCESS: Stored in database
    └─> ❌ MISSING: Cache invalidation
```
### Retrieval Flow (Broken)
```
orchestrator.process_request()
└─> _get_or_create_context()
    ├─> Check orchestrator cache (5s TTL)
    │   └─> If hit: Return cached (may be stale)
    └─> manage_context()
        ├─> Check session_cache
        │   └─> If hit: Return cached (STALE - has 0 contexts)
        └─> _retrieve_from_db() (NEVER REACHED if cache hit)
            └─> Query: SELECT FROM interaction_contexts WHERE session_id = ?
                └─> Would return stored contexts, but never called
```
## Database Verification
The interaction context **IS being stored** correctly. Evidence:
1. **Log Entry**:
```
2025-10-31 06:55:55,481 - src.context_manager - INFO - ✅ Generated interaction context for 64d4ace2_15ca4dec_1761890055
```
2. **Storage Code** (src/context_manager.py:426-438):
```python
cursor.execute("""
INSERT OR REPLACE INTO interaction_contexts
(interaction_id, session_id, user_input, system_response, interaction_summary, created_at)
VALUES (?, ?, ?, ?, ?, ?)
""", (interaction_id, session_id, user_input[:500], system_response[:1000], summary.strip(), datetime.now().isoformat()))
conn.commit()
conn.close()
```
✅ This executes successfully and commits
3. **Retrieval Code** (src/context_manager.py:656-671):
```python
cursor.execute("""
SELECT interaction_summary, created_at, needs_refresh
FROM interaction_contexts
WHERE session_id = ? AND (needs_refresh IS NULL OR needs_refresh = 0)
ORDER BY created_at DESC
LIMIT 20
""", (session_id,))
```
✅ This query would work, but is never executed due to the cache hit
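To confirm independently that rows really are landing in the table, a standalone check against the same database can be run outside the caching layers. The database path is an assumption here; substitute the actual file used by `context_manager`.

```python
import sqlite3


def count_interaction_contexts(db_path: str, session_id: str) -> int:
    """Count stored interaction contexts for a session, bypassing all caches."""
    conn = sqlite3.connect(db_path)
    try:
        (n,) = conn.execute(
            "SELECT COUNT(*) FROM interaction_contexts WHERE session_id = ?",
            (session_id,),
        ).fetchone()
        return n
    finally:
        conn.close()
```

A nonzero count here alongside "Context retrieved: 0 interaction contexts" in the logs is direct evidence that the failure is in the caching layer, not the persistence layer.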
## Cache Invalidation Points
Current cache invalidation only happens in these scenarios:
1. **Session End**: `end_session()` clears cache (line 534-536)
2. **User Change**: User mismatch detection clears cache (line 254-255)
3. **Never**: After generating interaction context ❌
## Expected vs Actual Behavior
### Expected Behavior:
```
Request 1 → Generate context → Store in DB → Clear cache
Request 2 → Cache miss → Query DB → Find stored context → Use it
```
### Actual Behavior:
```
Request 1 → Generate context → Store in DB → Keep cache (stale)
Request 2 → Cache hit → Return stale cache (0 contexts) → Never query DB
```
## Evidence from Logs
```
# First Request - Context Generation
2025-10-31 06:55:55,481 - src.context_manager - INFO - ✅ Generated interaction context for 64d4ace2_15ca4dec_1761890055
# Second Request - Cache Hit (No DB Query)
2025-10-31 07:02:55,911 - src.context_manager - INFO - Context retrieved: 0 interaction contexts
```
**Time Gap**: 7 minutes between requests (well beyond 5-second orchestrator cache TTL)
**Result**: Still 0 contexts β Session cache hit, database never queried
## Impact Assessment
### Functional Impact:
- **HIGH**: Conversation continuity completely broken
- Users cannot reference previous responses
- Each request treated as isolated, losing all context
### User Experience Impact:
- **HIGH**: Responses generated for wrong topics
- Frustration when "based on above inputs" is ignored
- Loss of trust in system reliability
### Performance Impact:
- **LOW**: Cache is working (too well - preventing fresh data retrieval)
- Database queries being avoided (but should happen after context generation)
## Conclusion
The interaction context system is **architecturally sound** but has a **critical cache invalidation bug**:
1. ✅ Interaction contexts are correctly generated
2. ✅ Interaction contexts are correctly stored in the database
3. ✅ The database retrieval query is correctly implemented
4. ❌ The cache is never invalidated after interaction context generation
5. ❌ A cache hit prevents the database query from executing
6. ❌ Stale cached context (with 0 interaction contexts) is returned
**The fix requires** invalidating both:
- Context Manager's `session_cache` after `generate_interaction_context()`
- Orchestrator's `_context_cache` after `generate_interaction_context()`
This will force fresh database queries on subsequent requests, allowing stored interaction contexts to be retrieved and used.
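A minimal sketch of that fix follows, assuming the attribute names given in this report (`session_cache`, `_context_cache`) and that the orchestrator holds a reference to the context manager; the exact call site would be inside `process_request()`, immediately after `generate_interaction_context()` succeeds.

```python
def invalidate_session_context(orchestrator, session_id: str) -> None:
    """Drop both cache tiers for one session so the next request re-queries the DB."""
    # Tier 2: context manager's session cache
    orchestrator.context_manager.session_cache.pop(f"session_{session_id}", None)
    # Tier 1: orchestrator's short-TTL cache
    orchestrator._context_cache.pop(f"context_{session_id}", None)
```

Using `dict.pop(key, None)` keeps the helper safe to call even when a tier has no entry for the session, and scoping the invalidation to one session avoids evicting other sessions' still-valid contexts.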
## Files Involved
1. `src/orchestrator_engine.py` - Lines 442-450 (missing cache invalidation)
2. `src/orchestrator_engine.py` - Lines 83-113 (orchestrator cache)
3. `src/context_manager.py` - Lines 235-289 (session cache management)
4. `src/context_manager.py` - Lines 396-451 (interaction context generation)
## Additional Notes
- The cache mechanism itself is working as designed (performance optimization)
- The bug is in the **cache lifecycle management** (invalidation timing)
- Database operations are functioning correctly
- The issue is purely in the caching layer, not the persistence layer