Merge remote-tracking branch 'origin/main' into feat/support-agent-sandbox

2026-05-03 17:08:03 +08:00 · 2026-01-21 14:52:11 +08:00
parent b94b7860d9 4b068022e1
commit b0a059250a
155 changed files with 1147 additions and 464 deletions
--- a/api/.env.example
+++ b/api/.env.example
@@ -715,6 +715,7 @@ ANNOTATION_IMPORT_MAX_CONCURRENT=5
 SANDBOX_EXPIRED_RECORDS_CLEAN_GRACEFUL_PERIOD=21
 SANDBOX_EXPIRED_RECORDS_CLEAN_BATCH_SIZE=1000
 SANDBOX_EXPIRED_RECORDS_RETENTION_DAYS=30
+SANDBOX_EXPIRED_RECORDS_CLEAN_TASK_LOCK_TTL=90000

 # Sandbox Dify CLI configuration
 # Directory containing dify CLI binaries (dify-cli-<os>-<arch>). Defaults to api/bin when unset.
--- a/api/agent-notes/controllers/console/datasets/datasets_document.py.md
+++ b/api/agent-notes/controllers/console/datasets/datasets_document.py.md
@@ -1,52 +0,0 @@
-## Purpose
-
-`api/controllers/console/datasets/datasets_document.py` contains the console (authenticated) APIs for managing dataset documents (list/create/update/delete, processing controls, estimates, etc.).
-
-## Storage model (uploaded files)
-
- For local file uploads into a knowledge base, the binary is stored via `extensions.ext_storage.storage` under the key:
-  - `upload_files/<tenant_id>/<uuid>.<ext>`
- File metadata is stored in the `upload_files` table (`UploadFile` model), keyed by `UploadFile.id`.
- Dataset `Document` records reference the uploaded file via:
-  - `Document.data_source_info.upload_file_id`
-
-## Download endpoint
-
- `GET /datasets/<dataset_id>/documents/<document_id>/download`
-
-  - Only supported when `Document.data_source_type == "upload_file"`.
-  - Performs dataset permission + tenant checks via `DocumentResource.get_document(...)`.
-  - Delegates `Document -> UploadFile` validation and signed URL generation to `DocumentService.get_document_download_url(...)`.
-  - Applies `cloud_edition_billing_rate_limit_check("knowledge")` to match other KB operations.
-  - Response body is **only**: `{ "url": "<signed-url>" }`.
-
- `POST /datasets/<dataset_id>/documents/download-zip`
-
-  - Accepts `{ "document_ids": ["..."] }` (upload-file only).
-  - Returns `application/zip` as a single attachment download.
-  - Rationale: browsers often block multiple automatic downloads; a ZIP avoids that limitation.
-  - Applies `cloud_edition_billing_rate_limit_check("knowledge")`.
-  - Delegates dataset permission checks, document/upload-file validation, and download-name generation to
-    `DocumentService.prepare_document_batch_download_zip(...)` before streaming the ZIP.
-
-## Verification plan
-
- Upload a document from a local file into a dataset.
- Call the download endpoint and confirm it returns a signed URL.
- Open the URL and confirm:
-  - Response headers force download (`Content-Disposition`), and
-  - Downloaded bytes match the uploaded file.
- Select multiple uploaded-file documents and download as ZIP; confirm all selected files exist in the archive.
-
-## Shared helper
-
- `DocumentService.get_document_download_url(document)` resolves the `UploadFile` and signs a download URL.
- `DocumentService.prepare_document_batch_download_zip(...)` performs dataset permission checks, batches
-  document + upload file lookups, preserves request order, and generates the client-visible ZIP filename.
- Internal helpers now live in `DocumentService` (`_get_upload_file_id_for_upload_file_document(...)`,
-  `_get_upload_file_for_upload_file_document(...)`, `_get_upload_files_by_document_id_for_zip_download(...)`).
- ZIP packing is handled by `FileService.build_upload_files_zip_tempfile(...)`, which also:
-  - sanitizes entry names to avoid path traversal, and
-  - deduplicates names while preserving extensions (e.g., `doc.txt` → `doc (1).txt`).
-    Streaming the response and deferring cleanup is handled by the route via `send_file(path, ...)` + `ExitStack` +
-    `response.call_on_close(...)` (the file is deleted when the response is closed).
--- a/api/agent-notes/services/dataset_service.py.md
+++ b/api/agent-notes/services/dataset_service.py.md
@@ -1,18 +0,0 @@
-## Purpose
-
-`api/services/dataset_service.py` hosts dataset/document service logic used by console and API controllers.
-
-## Batch document operations
-
- Batch document workflows should avoid N+1 database queries by using set-based lookups.
- Tenant checks must be enforced consistently across dataset/document operations.
- `DocumentService.get_documents_by_ids(...)` fetches documents for a dataset using `id.in_(...)`.
- `FileService.get_upload_files_by_ids(...)` performs tenant-scoped batch lookup for `UploadFile` (dedupes ids with `set(...)`).
- `DocumentService.get_document_download_url(...)` and `prepare_document_batch_download_zip(...)` handle
-  dataset/document permission checks plus `Document -> UploadFile` validation for download endpoints.
-
-## Verification plan
-
- Exercise document list and download endpoints that use the service helpers.
- Confirm batch download uses constant query count for documents + upload files.
- Request a ZIP with a missing document id and confirm a 404 is returned.
--- a/api/agent-notes/services/file_service.py.md
+++ b/api/agent-notes/services/file_service.py.md
@@ -1,35 +0,0 @@
-## Purpose
-
-`api/services/file_service.py` owns business logic around `UploadFile` objects: upload validation, storage persistence,
-previews/generators, and deletion.
-
-## Key invariants
-
- All storage I/O goes through `extensions.ext_storage.storage`.
- Uploaded file keys follow: `upload_files/<tenant_id>/<uuid>.<ext>`.
- Upload validation is enforced in `FileService.upload_file(...)` (blocked extensions, size limits, dataset-only types).
-
-## Batch lookup helpers
-
- `FileService.get_upload_files_by_ids(tenant_id, upload_file_ids)` is the canonical tenant-scoped batch loader for
-  `UploadFile`.
-
-## Dataset document download helpers
-
-The dataset document download/ZIP endpoints now delegate “Document → UploadFile” validation and permission checks to
-`DocumentService` (`api/services/dataset_service.py`). `FileService` stays focused on generic `UploadFile` operations
-(uploading, previews, deletion), plus generic ZIP serving.
-
-### ZIP serving
-
- `FileService.build_upload_files_zip_tempfile(...)` builds a ZIP from `UploadFile` objects and yields a seeked
-  tempfile **path** so callers can stream it (e.g., `send_file(path, ...)`) without hitting "read of closed file"
-  issues from file-handle lifecycle during streamed responses.
- Flask `send_file(...)` and the `ExitStack`/`call_on_close(...)` cleanup pattern are handled in the route layer.
-
-## Verification plan
-
- Unit: `api/tests/unit_tests/controllers/console/datasets/test_datasets_document_download.py`
-  - Verify signed URL generation for upload-file documents and ZIP download behavior for multiple documents.
- Unit: `api/tests/unit_tests/services/test_file_service_zip_and_lookup.py`
-  - Verify ZIP packing produces a valid, openable archive and preserves file content.
--- a/api/agent-notes/tests/unit_tests/controllers/console/datasets/test_datasets_document_download.py.md
+++ b/api/agent-notes/tests/unit_tests/controllers/console/datasets/test_datasets_document_download.py.md
@@ -1,28 +0,0 @@
-## Purpose
-
-Unit tests for the console dataset document download endpoint:
-
- `GET /datasets/<dataset_id>/documents/<document_id>/download`
-
-## Testing approach
-
- Uses `Flask.test_request_context()` and calls the `Resource.get(...)` method directly.
- Monkeypatches console decorators (`login_required`, `setup_required`, rate limit) to no-ops to keep the test focused.
- Mocks:
-  - `DatasetService.get_dataset` / `check_dataset_permission`
-  - `DocumentService.get_document` for single-file download tests
-  - `DocumentService.get_documents_by_ids` + `FileService.get_upload_files_by_ids` for ZIP download tests
-  - `FileService.get_upload_files_by_ids` for `UploadFile` lookups in single-file tests
-  - `services.dataset_service.file_helpers.get_signed_file_url` to return a deterministic URL
- Document mocks include `id` fields so batch lookups can map documents by id.
-
-## Covered cases
-
- Success returns `{ "url": "<signed>" }` for upload-file documents.
- 404 when document is not `upload_file`.
- 404 when `upload_file_id` is missing.
- 404 when referenced `UploadFile` row does not exist.
- 403 when document tenant does not match current tenant.
- Batch ZIP download returns `application/zip` for upload-file documents.
- Batch ZIP download rejects non-upload-file documents.
- Batch ZIP download uses a random `.zip` attachment name (`download_name`), so tests only assert the suffix.
--- a/api/agent-notes/tests/unit_tests/services/test_file_service_zip_and_lookup.py.md
+++ b/api/agent-notes/tests/unit_tests/services/test_file_service_zip_and_lookup.py.md
@@ -1,18 +0,0 @@
-## Purpose
-
-Unit tests for `api/services/file_service.py` helper methods that are not covered by higher-level controller tests.
-
-## What’s covered
-
- `FileService.build_upload_files_zip_tempfile(...)`
-  - ZIP entry name sanitization (no directory components / traversal)
-  - name deduplication while preserving extensions
-  - writing streamed bytes from `storage.load(...)` into ZIP entries
-  - yields a tempfile path so callers can open/stream the ZIP without holding a live file handle
- `FileService.get_upload_files_by_ids(...)`
-  - returns `{}` for empty id lists
-  - returns an id-keyed mapping for non-empty lists
-
-## Notes
-
- These tests intentionally stub `storage.load` and `db.session.scalars(...).all()` to avoid needing a real DB/storage.
--- a/api/agent_skills/infra.md
+++ b/api/agent_skills/infra.md
@@ -1,96 +0,0 @@
-## Configuration
-
- Import `configs.dify_config` for every runtime toggle. Do not read environment variables directly.
- Add new settings to the proper mixin inside `configs/` (deployment, feature, middleware, etc.) so they load through `DifyConfig`.
- Remote overrides come from the optional providers in `configs/remote_settings_sources`; keep defaults in code safe when the value is missing.
- Example: logging pulls targets from `extensions/ext_logging.py`, and model provider URLs are assembled in `services/entities/model_provider_entities.py`.
-
-## Dependencies
-
- Runtime dependencies live in `[project].dependencies` inside `pyproject.toml`. Optional clients go into the `storage`, `tools`, or `vdb` groups under `[dependency-groups]`.
- Always pin versions and keep the list alphabetised. Shared tooling (lint, typing, pytest) belongs in the `dev` group.
- When code needs a new package, explain why in the PR and run `uv lock` so the lockfile stays current.
-
-## Storage & Files
-
- Use `extensions.ext_storage.storage` for all blob IO; it already respects the configured backend.
- Convert files for workflows with helpers in `core/file/file_manager.py`; they handle signed URLs and multimodal payloads.
- When writing controller logic, delegate upload quotas and metadata to `services/file_service.py` instead of touching storage directly.
- All outbound HTTP fetches (webhooks, remote files) must go through the SSRF-safe client in `core/helper/ssrf_proxy.py`; it wraps `httpx` with the allow/deny rules configured for the platform.
-
-## Redis & Shared State
-
- Access Redis through `extensions.ext_redis.redis_client`. For locking, reuse `redis_client.lock`.
- Prefer higher-level helpers when available: rate limits use `libs.helper.RateLimiter`, provider metadata uses caches in `core/helper/provider_cache.py`.
-
-## Models
-
- SQLAlchemy models sit in `models/` and inherit from the shared declarative `Base` defined in `models/base.py` (metadata configured via `models/engine.py`).
- `models/__init__.py` exposes grouped aggregates: account/tenant models, app and conversation tables, datasets, providers, workflow runs, triggers, etc. Import from there to avoid deep path churn.
- Follow the DDD boundary: persistence objects live in `models/`, repositories under `repositories/` translate them into domain entities, and services consume those repositories.
- When adding a table, create the model class, register it in `models/__init__.py`, wire a repository if needed, and generate an Alembic migration as described below.
-
-## Vector Stores
-
- Vector client implementations live in `core/rag/datasource/vdb/<provider>`, with a common factory in `core/rag/datasource/vdb/vector_factory.py` and enums in `core/rag/datasource/vdb/vector_type.py`.
- Retrieval pipelines call these providers through `core/rag/datasource/retrieval_service.py` and dataset ingestion flows in `services/dataset_service.py`.
- The CLI helper `flask vdb-migrate` orchestrates bulk migrations using routines in `commands.py`; reuse that pattern when adding new backend transitions.
- To add another store, mirror the provider layout, register it with the factory, and include any schema changes in Alembic migrations.
-
-## Observability & OTEL
-
- OpenTelemetry settings live under the observability mixin in `configs/observability`. Toggle exporters and sampling via `dify_config`, not ad-hoc env reads.
- HTTP, Celery, Redis, SQLAlchemy, and httpx instrumentation is initialised in `extensions/ext_app_metrics.py` and `extensions/ext_request_logging.py`; reuse these hooks when adding new workers or entrypoints.
- When creating background tasks or external calls, propagate tracing context with helpers in the existing instrumented clients (e.g. use the shared `httpx` session from `core/helper/http_client_pooling.py`).
- If you add a new external integration, ensure spans and metrics are emitted by wiring the appropriate OTEL instrumentation package in `pyproject.toml` and configuring it in `extensions/`.
-
-## Ops Integrations
-
- Langfuse support and other tracing bridges live under `core/ops/opik_trace`. Config toggles sit in `configs/observability`, while exporters are initialised in the OTEL extensions mentioned above.
- External monitoring services should follow this pattern: keep client code in `core/ops`, expose switches via `dify_config`, and hook initialisation in `extensions/ext_app_metrics.py` or sibling modules.
- Before instrumenting new code paths, check whether existing context helpers (e.g. `extensions/ext_request_logging.py`) already capture the necessary metadata.
-
-## Controllers, Services, Core
-
- Controllers only parse HTTP input and call a service method. Keep business rules in `services/`.
- Services enforce tenant rules, quotas, and orchestration, then call into `core/` engines (workflow execution, tools, LLMs).
- When adding a new endpoint, search for an existing service to extend before introducing a new layer. Example: workflow APIs pipe through `services/workflow_service.py` into `core/workflow`.
-
-## Plugins, Tools, Providers
-
- In Dify a plugin is a tenant-installable bundle that declares one or more providers (tool, model, datasource, trigger, endpoint, agent strategy) plus its resource needs and version metadata. The manifest (`core/plugin/entities/plugin.py`) mirrors what you see in the marketplace documentation.
- Installation, upgrades, and migrations are orchestrated by `services/plugin/plugin_service.py` together with helpers such as `services/plugin/plugin_migration.py`.
- Runtime loading happens through the implementations under `core/plugin/impl/*` (tool/model/datasource/trigger/endpoint/agent). These modules normalise plugin providers so that downstream systems (`core/tools/tool_manager.py`, `services/model_provider_service.py`, `services/trigger/*`) can treat builtin and plugin capabilities the same way.
- For remote execution, plugin daemons (`core/plugin/entities/plugin_daemon.py`, `core/plugin/impl/plugin.py`) manage lifecycle hooks, credential forwarding, and background workers that keep plugin processes in sync with the main application.
- Acquire tool implementations through `core/tools/tool_manager.py`; it resolves builtin, plugin, and workflow-as-tool providers uniformly, injecting the right context (tenant, credentials, runtime config).
- To add a new plugin capability, extend the relevant `core/plugin/entities` schema and register the implementation in the matching `core/plugin/impl` module rather than importing the provider directly.
-
-## Async Workloads
-
-See `agent_skills/trigger.md` for more detailed documentation.
-
- Enqueue background work through `services/async_workflow_service.py`. It routes jobs to the tiered Celery queues defined in `tasks/`.
- Workers boot from `celery_entrypoint.py` and execute functions in `tasks/workflow_execution_tasks.py`, `tasks/trigger_processing_tasks.py`, etc.
- Scheduled workflows poll from `schedule/workflow_schedule_tasks.py`. Follow the same pattern if you need new periodic jobs.
-
-## Database & Migrations
-
- SQLAlchemy models live under `models/` and map directly to migration files in `migrations/versions`.
- Generate migrations with `uv run --project api flask db revision --autogenerate -m "<summary>"`, then review the diff; never hand-edit the database outside Alembic.
- Apply migrations locally using `uv run --project api flask db upgrade`; production deploys expect the same history.
- If you add tenant-scoped data, confirm the upgrade includes tenant filters or defaults consistent with the service logic touching those tables.
-
-## CLI Commands
-
- Maintenance commands from `commands.py` are registered on the Flask CLI. Run them via `uv run --project api flask <command>`.
- Use the built-in `db` commands from Flask-Migrate for schema operations (`flask db upgrade`, `flask db stamp`, etc.). Only fall back to custom helpers if you need their extra behaviour.
- Custom entries such as `flask reset-password`, `flask reset-email`, and `flask vdb-migrate` handle self-hosted account recovery and vector database migrations.
- Before adding a new command, check whether an existing service can be reused and ensure the command guards edition-specific behaviour (many enforce `SELF_HOSTED`). Document any additions in the PR.
- Ruff helpers are run directly with `uv`: `uv run --project api --dev ruff format ./api` for formatting and `uv run --project api --dev ruff check ./api` (add `--fix` if you want automatic fixes).
-
-## When You Add Features
-
- Check for an existing helper or service before writing a new util.
- Uphold tenancy: every service method should receive the tenant ID from controller wrappers such as `controllers/console/wraps.py`.
- Update or create tests alongside behaviour changes (`tests/unit_tests` for fast coverage, `tests/integration_tests` when touching orchestrations).
- Run `uv run --project api --dev ruff check ./api`, `uv run --directory api --dev basedpyright`, and `uv run --project api --dev dev/pytest/pytest_unit_tests.sh` before submitting changes.
--- a/api/agent_skills/plugin.md
+++ b/api/agent_skills/plugin.md
@@ -1 +0,0 @@
-// TBD
--- a/api/agent_skills/plugin_oauth.md
+++ b/api/agent_skills/plugin_oauth.md
@@ -1 +0,0 @@
-// TBD
--- a/api/agent_skills/trigger.md
+++ b/api/agent_skills/trigger.md
@@ -1,53 +0,0 @@
-## Overview
-
-A trigger is one of a collection of nodes that we call `Start` nodes. The concept of `Start` is the same as `RootNode` in the workflow engine `core/workflow/graph_engine`. A `Start` node is the entry point of a workflow: every workflow run starts from a `Start` node.
-
-## Trigger nodes
-
- `UserInput`
- `Trigger Webhook`
- `Trigger Schedule`
- `Trigger Plugin`
-
-### UserInput
-
-Before the `Trigger` concept was introduced, this was what we called the `Start` node. To avoid confusion, it was renamed to the `UserInput` node. It is closely related to `ServiceAPI` in `controllers/service_api/app`.
-
-1. The `UserInput` node declares a list of arguments that the user must provide; they are converted into variables in the workflow variable pool.
-1. `ServiceAPI` accepts those arguments and passes them through to the `UserInput` node.
-1. For the detailed implementation, refer to `core/workflow/nodes/start`.
-
-### Trigger Webhook
-
-Inside the Webhook node, Dify provides a UI panel that lets the user define an HTTP manifest (`core/workflow/nodes/trigger_webhook/entities.py`.`WebhookData`). Dify also generates a random webhook id for each `Trigger Webhook` node; this is implemented in `core/trigger/utils/endpoint.py`. `webhook-debug` is a debug mode for webhooks; you can find it in `controllers/trigger/webhook.py`.
-
-Finally, requests to the `webhook` endpoint are converted into variables in the workflow variable pool during workflow execution.
-
-### Trigger Schedule
-
-The `Trigger Schedule` node lets the user define a schedule that triggers the workflow; the detailed manifest is in `core/workflow/nodes/trigger_schedule/entities.py`. A poller and an executor handle millions of schedules; see `docker/entrypoint.sh` / `schedule/workflow_schedule_task.py` for details.
-
-To achieve this, a `WorkflowSchedulePlan` model was introduced in `models/trigger.py`, and `events/event_handlers/sync_workflow_schedule_when_app_published.py` syncs workflow schedule plans when an app is published.
-
-### Trigger Plugin
-
-The `Trigger Plugin` node lets users define their own distributed trigger plugin. Whenever a request is received, Dify forwards it to the plugin and waits for the parsed variables.
-
-1. Requests are saved in storage by `services/trigger/trigger_request_service.py`, referenced by `services/trigger/trigger_service.py`.`TriggerService`.`process_endpoint`.
-1. Plugins accept those requests and parse variables from them; see `core/plugin/impl/trigger.py` for details.
-
-Dify introduces a `subscription` concept: an endpoint address from Dify is bound to a third-party webhook service such as `Github`, `Slack`, `Linear`, `GoogleDrive`, `Gmail`, etc. Once a subscription is created, Dify continually receives requests from those platforms and handles them one by one.
-
-## Worker Pool / Async Task
-
-Every event that triggers a new workflow run is handled asynchronously; the unified entrypoint is `services/async_workflow_service.py`.`AsyncWorkflowService`.`trigger_workflow_async`.
-
-The infrastructure is `celery`, already configured in `docker/entrypoint.sh`; the consumers are in `tasks/async_workflow_tasks.py`. Three queues handle the different user tiers: `PROFESSIONAL_QUEUE`, `TEAM_QUEUE`, `SANDBOX_QUEUE`.
-
-## Debug Strategy
-
-Dify divides users into two groups: builders and end users.
-
-Builders are the users who create workflows. At this stage, debugging is a critical part of workflow development. As the start nodes of workflows, trigger nodes can `listen` to events from `WebhookDebug`, `Schedule`, and `Plugin`; the debugging process is implemented in `controllers/console/app/workflow.py`.`DraftWorkflowTriggerNodeApi`.
-
-A polling process is a combination of several single `poll` operations. Each `poll` operation fetches events cached in `Redis` and returns `None` if no event is found. In more detail: `core/trigger/debug/event_bus.py` handles the polling process, and `core/trigger/debug/event_selectors.py` selects the event poller based on the trigger type.
--- a/api/configs/feature/__init__.py
+++ b/api/configs/feature/__init__.py
@@ -1309,6 +1309,10 @@ class SandboxExpiredRecordsCleanConfig(BaseSettings):
        description="Retention days for sandbox expired workflow_run records and message records",
        default=30,
    )
+    SANDBOX_EXPIRED_RECORDS_CLEAN_TASK_LOCK_TTL: PositiveInt = Field(
+        description="Lock TTL for sandbox expired records clean task in seconds",
+        default=90000,
+    )


 class FeatureConfig(
--- a/api/context/flask_app_context.py
+++ b/api/context/flask_app_context.py
@@ -9,7 +9,7 @@ from typing import Any, final

 from flask import Flask, current_app, g

-from context import register_context_capturer
+from core.workflow.context import register_context_capturer
 from core.workflow.context.execution_context import (
    AppContext,
    IExecutionContext,
--- a/api/core/workflow/context/__init__.py
+++ b/api/core/workflow/context/__init__.py
@@ -7,16 +7,28 @@ execution in multi-threaded environments.

 from core.workflow.context.execution_context import (
    AppContext,
+    ContextProviderNotFoundError,
    ExecutionContext,
    IExecutionContext,
    NullAppContext,
    capture_current_context,
+    read_context,
+    register_context,
+    register_context_capturer,
+    reset_context_provider,
 )
+from core.workflow.context.models import SandboxContext

 __all__ = [
    "AppContext",
+    "ContextProviderNotFoundError",
    "ExecutionContext",
    "IExecutionContext",
    "NullAppContext",
+    "SandboxContext",
    "capture_current_context",
+    "read_context",
+    "register_context",
+    "register_context_capturer",
+    "reset_context_provider",
 ]
--- a/api/core/workflow/context/execution_context.py
+++ b/api/core/workflow/context/execution_context.py
@@ -4,9 +4,11 @@ Execution Context - Abstracted context management for workflow execution.

 import contextvars
 from abc import ABC, abstractmethod
-from collections.abc import Generator
+from collections.abc import Callable, Generator
 from contextlib import AbstractContextManager, contextmanager
-from typing import Any, Protocol, final, runtime_checkable
+from typing import Any, Protocol, TypeVar, final, runtime_checkable
+
+from pydantic import BaseModel


 class AppContext(ABC):
@@ -204,13 +206,75 @@ class ExecutionContextBuilder:
        )


+_capturer: Callable[[], IExecutionContext] | None = None
+
+# Tenant-scoped providers using tuple keys for clarity and constant-time lookup.
+# Key mapping:
+#   (name, tenant_id) -> provider
+# - name: namespaced identifier (recommend prefixing, e.g. "workflow.sandbox")
+# - tenant_id: tenant identifier string
+# Value:
+#   provider: Callable[[], BaseModel] returning the typed context value
+# Type-safety note:
+#   - This registry cannot enforce that all providers for a given name return the same BaseModel type.
+#   - Implementors SHOULD provide typed wrappers around register/read (like Go's context best practice),
+#     e.g. def register_sandbox_ctx(tenant_id: str, p: Callable[[], SandboxContext]) and
+#          def read_sandbox_ctx(tenant_id: str) -> SandboxContext.
+_tenant_context_providers: dict[tuple[str, str], Callable[[], BaseModel]] = {}
+
+T = TypeVar("T", bound=BaseModel)
+
+
+class ContextProviderNotFoundError(KeyError):
+    """Raised when a tenant-scoped context provider is missing for a given (name, tenant_id)."""
+
+    pass
+
+
+def register_context_capturer(capturer: Callable[[], IExecutionContext]) -> None:
+    """Register a single enterable execution context capturer (e.g., Flask)."""
+    global _capturer
+    _capturer = capturer
+
+
+def register_context(name: str, tenant_id: str, provider: Callable[[], BaseModel]) -> None:
+    """Register a tenant-specific provider for a named context.
+
+    Tip: use a namespaced "name" (e.g., "workflow.sandbox") to avoid key collisions.
+    Consider adding a typed wrapper for this registration in your feature module.
+    """
+    _tenant_context_providers[(name, tenant_id)] = provider
+
+
+def read_context(name: str, *, tenant_id: str) -> BaseModel:
+    """
+    Read a context value for a specific tenant.
+
+    Raises ContextProviderNotFoundError (a KeyError subclass) if the provider for (name, tenant_id) is not registered.
+    """
+    prov = _tenant_context_providers.get((name, tenant_id))
+    if prov is None:
+        raise ContextProviderNotFoundError(f"Context provider '{name}' not registered for tenant '{tenant_id}'")
+    return prov()
+
+
 def capture_current_context() -> IExecutionContext:
    """
    Capture current execution context from the calling environment.

-    Returns:
-        IExecutionContext with captured context
+    If a capturer is registered (e.g., Flask), use it. Otherwise, return a minimal
+    context with NullAppContext + copy of current contextvars.
    """
-    from context import capture_current_context
+    if _capturer is None:
+        return ExecutionContext(
+            app_context=NullAppContext(),
+            context_vars=contextvars.copy_context(),
+        )
+    return _capturer()

-    return capture_current_context()
+
+def reset_context_provider() -> None:
+    """Reset the capturer and all tenant-scoped context providers (primarily for tests)."""
+    global _capturer
+    _capturer = None
+    _tenant_context_providers.clear()
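
Review note on the type-safety comment in this file: the following is a minimal sketch of the typed wrappers it recommends, not code from this commit. To keep it runnable with only the standard library, the registry is re-declared locally and a frozen dataclass stands in for the pydantic `SandboxContext`. `SANDBOX_CONTEXT_NAME` is a hypothetical constant, and `register_sandbox_ctx` / `read_sandbox_ctx` are the illustrative names suggested in the comment.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass

# Local stand-in for the registry in execution_context.py: (name, tenant_id) -> provider.
_providers: dict[tuple[str, str], Callable[[], object]] = {}


class ContextProviderNotFoundError(KeyError):
    """Same name as the exception in the diff; re-declared so the sketch is self-contained."""


def register_context(name: str, tenant_id: str, provider: Callable[[], object]) -> None:
    _providers[(name, tenant_id)] = provider


def read_context(name: str, *, tenant_id: str) -> object:
    provider = _providers.get((name, tenant_id))
    if provider is None:
        raise ContextProviderNotFoundError(
            f"Context provider '{name}' not registered for tenant '{tenant_id}'"
        )
    return provider()


@dataclass(frozen=True)
class SandboxContext:
    """Assumption: a dataclass stand-in for the pydantic SandboxContext model."""

    sandbox_url: str | None = None
    sandbox_token: str | None = None


# Hypothetical namespaced key, following the "workflow.sandbox" tip in the docstring.
SANDBOX_CONTEXT_NAME = "workflow.sandbox"


def register_sandbox_ctx(tenant_id: str, provider: Callable[[], SandboxContext]) -> None:
    """Typed wrapper: only SandboxContext providers can be registered under this name."""
    register_context(SANDBOX_CONTEXT_NAME, tenant_id, provider)


def read_sandbox_ctx(tenant_id: str) -> SandboxContext:
    """Typed wrapper: narrows the untyped registry value back to SandboxContext."""
    value = read_context(SANDBOX_CONTEXT_NAME, tenant_id=tenant_id)
    if not isinstance(value, SandboxContext):
        raise TypeError(f"Expected SandboxContext, got {type(value).__name__}")
    return value
```

With wrappers like these, callers never pass the raw name string, and a provider of the wrong type registered under the same key fails with a `TypeError` at read time instead of surfacing later as an attribute error.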
--- a/api/core/workflow/context/models.py
+++ b/api/core/workflow/context/models.py
@@ -0,0 +1,13 @@
+from __future__ import annotations
+
+from pydantic import AnyHttpUrl, BaseModel
+
+
+class SandboxContext(BaseModel):
+    """Typed context for sandbox integration. All fields optional by design."""
+
+    sandbox_url: AnyHttpUrl | None = None
+    sandbox_token: str | None = None  # optional, if later needed for auth
+
+
+__all__ = ["SandboxContext"]
--- a/api/schedule/clean_messages.py
+++ b/api/schedule/clean_messages.py
@@ -2,9 +2,11 @@ import logging
 import time

 import click
+from redis.exceptions import LockError

 import app
 from configs import dify_config
+from extensions.ext_redis import redis_client
 from services.retention.conversation.messages_clean_policy import create_message_clean_policy
 from services.retention.conversation.messages_clean_service import MessagesCleanService

@@ -31,12 +33,16 @@ def clean_messages():
        )

        # Create and run the cleanup service
-        service = MessagesCleanService.from_days(
-            policy=policy,
-            days=dify_config.SANDBOX_EXPIRED_RECORDS_RETENTION_DAYS,
-            batch_size=dify_config.SANDBOX_EXPIRED_RECORDS_CLEAN_BATCH_SIZE,
-        )
-        stats = service.run()
+        # Lock the task so that overlapping runs cannot execute concurrently as data volume grows.
+        with redis_client.lock(
+            "retention:clean_messages", timeout=dify_config.SANDBOX_EXPIRED_RECORDS_CLEAN_TASK_LOCK_TTL, blocking=False
+        ):
+            service = MessagesCleanService.from_days(
+                policy=policy,
+                days=dify_config.SANDBOX_EXPIRED_RECORDS_RETENTION_DAYS,
+                batch_size=dify_config.SANDBOX_EXPIRED_RECORDS_CLEAN_BATCH_SIZE,
+            )
+            stats = service.run()

        end_at = time.perf_counter()
        click.echo(
@@ -50,6 +56,16 @@ def clean_messages():
                fg="green",
            )
        )
+    except LockError:
+        end_at = time.perf_counter()
+        logger.exception("clean_messages: acquire task lock failed, skip current execution")
+        click.echo(
+            click.style(
+                f"clean_messages: skipped (lock already held) - latency: {end_at - start_at:.2f}s",
+                fg="yellow",
+            )
+        )
+        raise
    except Exception as e:
        end_at = time.perf_counter()
        logger.exception("clean_messages failed")
--- a/api/schedule/clean_workflow_runs_task.py
+++ b/api/schedule/clean_workflow_runs_task.py
@@ -1,11 +1,16 @@
+import logging
 from datetime import UTC, datetime

 import click
+from redis.exceptions import LockError

 import app
 from configs import dify_config
+from extensions.ext_redis import redis_client
 from services.retention.workflow_run.clear_free_plan_expired_workflow_run_logs import WorkflowRunCleanup

+logger = logging.getLogger(__name__)
+

 @app.celery.task(queue="retention")
 def clean_workflow_runs_task() -> None:
@@ -25,19 +30,50 @@ def clean_workflow_runs_task() -> None:

    start_time = datetime.now(UTC)

-    WorkflowRunCleanup(
-        days=dify_config.SANDBOX_EXPIRED_RECORDS_RETENTION_DAYS,
-        batch_size=dify_config.SANDBOX_EXPIRED_RECORDS_CLEAN_BATCH_SIZE,
-        start_from=None,
-        end_before=None,
-    ).run()
+    try:
+        # Lock the task so that overlapping runs cannot execute concurrently as data volume grows.
+        with redis_client.lock(
+            "retention:clean_workflow_runs_task",
+            timeout=dify_config.SANDBOX_EXPIRED_RECORDS_CLEAN_TASK_LOCK_TTL,
+            blocking=False,
+        ):
+            WorkflowRunCleanup(
+                days=dify_config.SANDBOX_EXPIRED_RECORDS_RETENTION_DAYS,
+                batch_size=dify_config.SANDBOX_EXPIRED_RECORDS_CLEAN_BATCH_SIZE,
+                start_from=None,
+                end_before=None,
+            ).run()

-    end_time = datetime.now(UTC)
-    elapsed = end_time - start_time
-    click.echo(
-        click.style(
-            f"Scheduled workflow run cleanup finished. start={start_time.isoformat()} "
-            f"end={end_time.isoformat()} duration={elapsed}",
-            fg="green",
+        end_time = datetime.now(UTC)
+        elapsed = end_time - start_time
+        click.echo(
+            click.style(
+                f"Scheduled workflow run cleanup finished. start={start_time.isoformat()} "
+                f"end={end_time.isoformat()} duration={elapsed}",
+                fg="green",
+            )
        )
-    )
+    except LockError:
+        end_time = datetime.now(UTC)
+        elapsed = end_time - start_time
+        logger.exception("clean_workflow_runs_task: acquire task lock failed, skip current execution")
+        click.echo(
+            click.style(
+                f"Scheduled workflow run cleanup skipped (lock already held). "
+                f"start={start_time.isoformat()} end={end_time.isoformat()} duration={elapsed}",
+                fg="yellow",
+            )
+        )
+        raise
+    except Exception as e:
+        end_time = datetime.now(UTC)
+        elapsed = end_time - start_time
+        logger.exception("clean_workflow_runs_task failed")
+        click.echo(
+            click.style(
+                f"Scheduled workflow run cleanup failed. start={start_time.isoformat()} "
+                f"end={end_time.isoformat()} duration={elapsed} - {str(e)}",
+                fg="red",
+            )
+        )
+        raise
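
Review note on the locking pattern used in both retention tasks: the sketch below illustrates the non-blocking "acquire or skip" flow without needing a Redis server. `InMemoryLock`, the local `LockError`, and `run_exclusive` are hypothetical stand-ins, not redis-py or Dify APIs. The stand-in mimics only the context-manager behaviour the tasks rely on (entering a lock that is already held raises `LockError`) and does not model the TTL, which defaults to 90000 seconds (25 hours) in this commit. Unlike the tasks in the diff, which log and re-raise, this sketch reports a skip as a return value.

```python
from __future__ import annotations

import threading
from collections.abc import Callable


class LockError(Exception):
    """Stand-in for the lock error raised when a lock cannot be acquired."""


class InMemoryLock:
    """Hypothetical process-local lock with non-blocking context-manager semantics."""

    _held: set[str] = set()
    _guard = threading.Lock()

    def __init__(self, name: str) -> None:
        self.name = name

    def __enter__(self) -> InMemoryLock:
        with self._guard:
            if self.name in self._held:
                # Non-blocking: fail immediately instead of waiting for the holder.
                raise LockError(f"lock '{self.name}' is already held")
            self._held.add(self.name)
        return self

    def __exit__(self, *exc_info: object) -> None:
        with self._guard:
            self._held.discard(self.name)


def run_exclusive(lock_name: str, work: Callable[[], None]) -> str:
    """Run `work` under the named lock, or skip if another run holds it."""
    try:
        with InMemoryLock(lock_name):
            work()
        return "finished"
    except LockError:
        return "skipped"
```

A run that starts while another one still holds `retention:clean_messages` returns `"skipped"`, and the next run after the holder exits returns `"finished"`. An exception raised by `work` releases the lock and propagates to the caller.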
--- a/api/tests/unit_tests/core/workflow/context/test_execution_context.py
+++ b/api/tests/unit_tests/core/workflow/context/test_execution_context.py
@@ -5,6 +5,7 @@ from typing import Any
 from unittest.mock import MagicMock

 import pytest
+from pydantic import BaseModel

 from core.workflow.context.execution_context import (
    AppContext,
@@ -12,6 +13,8 @@ from core.workflow.context.execution_context import (
    ExecutionContextBuilder,
    IExecutionContext,
    NullAppContext,
+    read_context,
+    register_context,
 )


@@ -256,3 +259,31 @@ class TestCaptureCurrentContext:

        # Context variables should be captured
        assert result.context_vars is not None
+
+
+class TestTenantScopedContextRegistry:
+    def setup_method(self):
+        from core.workflow.context import reset_context_provider
+
+        reset_context_provider()
+
+    def teardown_method(self):
+        from core.workflow.context import reset_context_provider
+
+        reset_context_provider()
+
+    def test_tenant_provider_read_ok(self):
+        class SandboxContext(BaseModel):
+            base_url: str | None = None
+
+        register_context("workflow.sandbox", "t1", lambda: SandboxContext(base_url="http://t1"))
+        register_context("workflow.sandbox", "t2", lambda: SandboxContext(base_url="http://t2"))
+
+        assert read_context("workflow.sandbox", tenant_id="t1").base_url == "http://t1"
+        assert read_context("workflow.sandbox", tenant_id="t2").base_url == "http://t2"
+
+    def test_missing_provider_raises_keyerror(self):
+        from core.workflow.context import ContextProviderNotFoundError
+
+        with pytest.raises(ContextProviderNotFoundError):
+            read_context("missing", tenant_id="unknown")