feat: agent add context

Novice
2026-01-16 11:28:49 +08:00
parent 2591615a3c
commit a7826d9ea4
10 changed files with 458 additions and 681 deletions


This module provides memory management for LLM conversations, enabling context retention.
The memory module contains two types of memory implementations:
1. **TokenBufferMemory** - Conversation-level memory (existing)
2. **NodeTokenBufferMemory** - Node-level memory (**Chatflow only**)
> **Note**: `NodeTokenBufferMemory` is only available in **Chatflow** (advanced-chat mode).
> This is because it requires both `conversation_id` and `node_id`, which are only present in Chatflow.
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │ NodeTokenBufferMemory                                                │  │
│  │ Scope: Node within Conversation                                      │  │
│  │ Storage: WorkflowNodeExecutionModel.outputs["context"]               │  │
│  │ Key: (conversation_id, node_id, workflow_run_id)                     │  │
│  └──────────────────────────────────────────────────────────────────────┘  │
│                                                                            │
└────────────────────────────────────────────────────────────────────────────┘
---
## NodeTokenBufferMemory
### Purpose
2. **Iterative Processing**: An LLM node in a loop needs to accumulate context across iterations
3. **Specialized Agents**: Each agent node maintains its own dialogue history
### Design: Zero Extra Storage
**Key insight**: LLM node already saves complete context in `outputs["context"]`.
Each LLM node execution outputs:
```python
outputs = {
    "text": clean_text,
    "context": self._build_context(prompt_messages, clean_text),  # Complete dialogue history!
    ...
}
```
This `outputs["context"]` contains:
- All previous user/assistant messages (excluding system prompt)
- The current assistant response
**No separate storage needed** - we just read from the last execution's `outputs["context"]`.
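For concreteness, here is a minimal sketch of what `_build_context` might produce. The method name and its inputs come from the snippet above; the body, the message schema, and the import path are assumptions rather than the shipped implementation:

```python
import json
from collections.abc import Sequence

# Import path assumed; these entities are used elsewhere in the codebase.
from core.model_runtime.entities.message_entities import PromptMessage, PromptMessageRole


def build_context(prompt_messages: Sequence[PromptMessage], clean_text: str) -> str:
    """Serialize prior dialogue plus the current reply (hypothetical sketch)."""
    history = [
        # Multimodal content parts would need richer serialization than this.
        {"role": message.role.value, "content": message.content}
        for message in prompt_messages
        # System prompt is excluded, matching the bullet list above.
        if message.role != PromptMessageRole.SYSTEM
    ]
    history.append({"role": "assistant", "content": clean_text})
    return json.dumps(history, ensure_ascii=False)
```

Whether `context` ends up as a JSON string or a structured list is an implementation detail of the node; the point is that the full dialogue is already persisted with every execution.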
### Benefits
| Aspect | Old Design (Object Storage) | New Design (outputs["context"]) |
|--------|----------------------------|--------------------------------|
| Storage | Separate JSON file | Already in WorkflowNodeExecutionModel |
| Concurrency | Race condition risk | No issue (each execution is INSERT) |
| Cleanup | Need separate cleanup task | Follows node execution lifecycle |
| Migration | Required | None |
| Complexity | High | Low |
### Data Flow
```
WorkflowNodeExecutionModel             NodeTokenBufferMemory             LLM Node
            │                                    │                           │
            │                                    │◀── get_history_prompt_messages()
            │                                    │                           │
            │  SELECT outputs FROM               │                           │
            │  workflow_node_executions          │                           │
            │  WHERE workflow_run_id = ?         │                           │
            │    AND node_id = ?                 │                           │
            │◀───────────────────────────────────┤                           │
            │                                    │                           │
            │  outputs["context"]                │                           │
            ├───────────────────────────────────▶│                           │
            │                                    │                           │
            │                                    │ deserialize PromptMessages│
            │                                    │                           │
            │                                    │ truncate by max_token_limit
            │                                    │                           │
            │                                    │  Sequence[PromptMessage]  │
            │                                    ├──────────────────────────▶│
            │                                    │                           │
```
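The read path in this diagram could look roughly like the following sketch, assuming SQLAlchemy and the identifiers shown above (`WorkflowNodeExecutionModel`, `workflow_run_id`, `node_id`); the import path and exact column handling are assumptions:

```python
import json

from sqlalchemy import select
from sqlalchemy.orm import Session

from models.workflow import WorkflowNodeExecutionModel  # import path assumed


def load_last_context(session: Session, workflow_run_id: str, node_id: str) -> list[dict]:
    """Return outputs["context"] from the node's latest execution in this run, or []."""
    stmt = (
        select(WorkflowNodeExecutionModel)
        .where(
            WorkflowNodeExecutionModel.workflow_run_id == workflow_run_id,
            WorkflowNodeExecutionModel.node_id == node_id,
        )
        .order_by(WorkflowNodeExecutionModel.created_at.desc())
        .limit(1)
    )
    execution = session.scalars(stmt).first()
    if execution is None or not execution.outputs:
        return []
    outputs = json.loads(execution.outputs)  # outputs column stored as JSON text (assumed)
    return outputs.get("context") or []
```

Truncation by `max_token_limit` would then drop the oldest turns until the remaining history fits, mirroring `TokenBufferMemory`.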
### Thread Tracking
Thread extraction still uses `Message` table's `parent_message_id` structure:
1. Query `Message` table for conversation → get thread's `workflow_run_ids`
2. Get the last completed `workflow_run_id` in the thread
3. Query `WorkflowNodeExecutionModel` for that execution's `outputs["context"]`
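A hedged sketch of steps 1-3; the `Message` fields used (`parent_message_id`, `workflow_run_id`) mirror the existing `extract_thread_messages()` logic, but this helper itself is illustrative:

```python
from sqlalchemy.orm import Session

from models.model import Message  # import path assumed


def last_workflow_run_id_in_thread(
    session: Session, conversation_id: str, current_message_id: str
) -> str | None:
    """Walk parent_message_id links upward from the current message,
    returning the most recent workflow_run_id in the thread."""
    # 1. Load the conversation's messages keyed by id.
    messages = {
        message.id: message
        for message in session.query(Message).filter(
            Message.conversation_id == conversation_id
        )
    }
    # 2. Follow the parent chain to collect the thread, newest first.
    cursor = messages.get(current_message_id)
    while cursor is not None:
        # 3. The first thread message carrying a workflow_run_id is the last run.
        if cursor.workflow_run_id:
            return cursor.workflow_run_id
        cursor = (
            messages.get(cursor.parent_message_id) if cursor.parent_message_id else None
        )
    return None
```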
### API
```python
class NodeTokenBufferMemory:
    def __init__(
        self,
        app_id: str,
        conversation_id: str,
        node_id: str,
        tenant_id: str,
        model_instance: ModelInstance,
    ):
"""
Initialize node-level memory.
:param app_id: Application ID
:param conversation_id: Conversation ID
:param node_id: Node ID in the workflow
:param model_instance: Model instance for token counting
"""
...
def add_messages(
self,
message_id: str,
parent_message_id: str | None,
user_content: str,
user_files: Sequence[File],
assistant_content: str,
assistant_files: Sequence[File],
) -> None:
"""
Append a dialogue turn (user + assistant) to node memory.
Call this after LLM node execution completes.
:param message_id: Current message ID (from Message table)
:param parent_message_id: Parent message ID (for thread tracking)
:param user_content: User's text input
:param user_files: Files attached by user
:param assistant_content: Assistant's text response
:param assistant_files: Files generated by assistant
"""
"""Initialize node-level memory."""
...

    def get_history_prompt_messages(
        self,
        current_message_id: str,
        *,
        max_token_limit: int = 2000,
        file_upload_config: FileUploadConfig | None = None,
        message_limit: int | None = None,
    ) -> Sequence[PromptMessage]:
        """
        Retrieve history as PromptMessage sequence.

        Reads from the last completed execution's outputs["context"].
        """
        ...

    # Legacy methods (no-op, kept for compatibility)
    def add_messages(self, *args, **kwargs) -> None: pass
    def flush(self) -> None: pass
    def clear(self) -> None: pass
```
### Integration with LLM Node
```python
# In LLM Node execution

# 1. Fetch memory based on mode
if node_data.memory and node_data.memory.mode == MemoryMode.NODE:
    # Node-level memory (Chatflow only)
    memory = fetch_node_memory(
        variable_pool=variable_pool,
        app_id=app_id,
        node_id=self.node_id,
        node_data_memory=node_data.memory,
        model_instance=model_instance,
    )
elif node_data.memory and node_data.memory.mode == MemoryMode.CONVERSATION:
    # Conversation-level memory (existing behavior)
    memory = fetch_memory(
        variable_pool=variable_pool,
        app_id=app_id,
        node_data_memory=node_data.memory,
        model_instance=model_instance,
    )
else:
    memory = None

# 2. Get history for context
if memory:
    if isinstance(memory, NodeTokenBufferMemory):
        history = memory.get_history_prompt_messages(
            current_message_id=current_message_id,
            max_token_limit=max_token_limit,
        )
    else:  # TokenBufferMemory
        history = memory.get_history_prompt_messages(
            max_token_limit=max_token_limit,
        )
    prompt_messages = [*history, *current_messages]
else:
    prompt_messages = current_messages

# 3. Call LLM
response = model_instance.invoke(prompt_messages)

# 4. No explicit write-back: this execution's outputs["context"] is persisted
#    with its WorkflowNodeExecutionModel record and becomes the next run's history.
```
### Configuration
Add to `MemoryConfig` in `core/workflow/nodes/llm/entities.py`:
```python
class MemoryMode(StrEnum):
CONVERSATION = "conversation" # Use TokenBufferMemory (default, existing behavior)
NODE = "node" # Use NodeTokenBufferMemory (new, Chatflow only)
CONVERSATION = "conversation" # Use TokenBufferMemory (default)
NODE = "node" # Use NodeTokenBufferMemory (Chatflow only)
class MemoryConfig(BaseModel):
# Existing fields
role_prefix: RolePrefix | None = None
window: MemoryWindowConfig | None = None
query_prompt_template: str | None = None
# Memory mode (new)
mode: MemoryMode = MemoryMode.CONVERSATION
```
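A minimal usage sketch, assuming node definitions are validated with Pydantic as elsewhere in the codebase:

```python
# Opting a node into node-level memory; all other fields keep their defaults.
memory_config = MemoryConfig.model_validate({"mode": "node"})
assert memory_config.mode is MemoryMode.NODE
```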
| Mode           | Implementation        | Scope                    | Availability  |
| -------------- | --------------------- | ------------------------ | ------------- |
| `conversation` | TokenBufferMemory     | Entire conversation      | All app modes |
| `node`         | NodeTokenBufferMemory | Per-node in conversation | Chatflow only |
> When `mode=node` is used in a non-Chatflow context (no conversation_id), it falls back to no memory.
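A sketch of that fallback, assuming a `fetch_node_memory` helper shaped like the integration example above (the helper and the variable-pool selector are assumptions):

```python
def fetch_node_memory(
    variable_pool,
    app_id: str,
    node_id: str,
    tenant_id: str,
    node_data_memory,
    model_instance,
):
    """Return NodeTokenBufferMemory, or None outside Chatflow (sketch)."""
    conversation_id = variable_pool.get(["sys", "conversation_id"])
    if conversation_id is None:
        # No conversation_id means we are not in Chatflow: fall back to no memory.
        return None
    return NodeTokenBufferMemory(
        app_id=app_id,
        conversation_id=conversation_id.value,
        node_id=node_id,
        tenant_id=tenant_id,
        model_instance=model_instance,
    )
```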
---
## Comparison
| Feature | TokenBufferMemory | NodeTokenBufferMemory |
| -------------- | ------------------------ | ---------------------------------- |
| Scope | Conversation | Node within Conversation |
| Storage | Database (Message table) | WorkflowNodeExecutionModel.outputs |
| Thread Support | Yes | Yes |
| File Support | Yes (via MessageFile) | Yes (via context serialization) |
| Token Limit | Yes | Yes |
| Use Case | Standard chat apps | Complex workflows |
---
## Extending to Other Nodes
Currently only the **LLM Node** writes `context` to its outputs. To enable node memory for other nodes:
1. Add `outputs["context"] = self._build_context(prompt_messages, response)` in the node
2. The `NodeTokenBufferMemory` will automatically pick it up
Nodes that could potentially support this:
- `question_classifier`
- `parameter_extractor`
- `agent`
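For instance, step 1 inside one of these nodes might look like the following sketch (`response_text` and the `_build_context` helper are assumed to mirror the LLM node's):

```python
# At the end of e.g. parameter_extractor's execution (sketch):
outputs = {
    "text": response_text,
    # Same shape as the LLM node's context, so NodeTokenBufferMemory
    # can read it without any node-specific handling.
    "context": self._build_context(prompt_messages, response_text),
}
```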
---
## Future Considerations
1. **Cleanup**: Node memory lifecycle follows `WorkflowNodeExecutionModel`, which already has cleanup mechanisms
2. **Compression**: For very long conversations, consider summarization strategies
3. **Extension**: Other nodes may benefit from node-level memory